Getting into programming: tools

If you study computer science you could be forgiven for not realising that programmers use tools; all that matters is the mighty language compiler. In practice, tools such as shells, editors, version control, testing frameworks and linters are what programmers spend most of their time interacting with. Getting your toolset right can turn programming from a frustrating chore into a satisfying, productive activity.

Command lines

Historically, operating systems offered their services via code libraries, such as libc, the C standard library. These libraries are thin layers over the system calls provided by the core of the operating system, known as the kernel. A convenient means of accessing those services, a wrapper around them if you like, is known as a shell, often referred to loosely as the command line interface or just CLI. Shells come from the Unix family of operating systems but Windows has the Command Shell and PowerShell too.

Your computer will have a default shell which is what you are interacting with when you open a “Command Prompt” or “Terminal”, and there are lots more to choose from: bash, zsh, ksh etc. If you are just starting out it does not make much difference which one you choose. What is important is to learn its features and use them to your advantage.

Minimise typing (and typos!) by using the shell’s command history: it remembers the commands you type and you can search back through that history to retrieve them. Auto-completion either completes what you are typing or shows you the options when you hit TAB.

Output redirection, using the > symbol, sends the results of your commands to files rather than the terminal, so you can inspect them at your leisure in an editor or use them as input to other commands. Feeding the output of one command directly into another, without an intermediate file, is called a pipe and uses the pipe symbol: |. For example, how many files are there in the current directory? Pipe the output of the file list command (in one-file-per-line mode) to the word counter command (in line counter mode): ls -1 | wc -l. Strictly, this counts every non-hidden entry, sub-directories included. Shell commands can be put in a file and run just like a Python program; just remember to make the file executable: chmod +x my_commands.sh. This is called a shell script.
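
Since these articles use Python, it may help to see roughly what the shell does on your behalf when you type ls -1 | wc -l: it starts two programs and connects the output of the first to the input of the second. The sketch below is only an illustration, assuming a Unix-like system where the ls and wc commands are available. The function name count_entries and the file names are made up for the example.

```python
import subprocess
import tempfile
from pathlib import Path


def count_entries(directory):
    """Run the equivalent of `ls -1 | wc -l` inside `directory`."""
    # First process: list the directory, one entry per line.
    ls = subprocess.Popen(["ls", "-1"], cwd=directory, stdout=subprocess.PIPE)
    # Second process: count lines, reading directly from the first one's output.
    wc = subprocess.Popen(["wc", "-l"], stdin=ls.stdout,
                          stdout=subprocess.PIPE, text=True)
    ls.stdout.close()  # only wc should hold the reading end of the pipe
    output, _ = wc.communicate()
    ls.wait()
    return int(output.strip())


# Try it on a throwaway directory containing three empty files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.txt", "b.txt", "c.txt"):
        Path(tmp, name).touch()
    print(count_entries(tmp))  # prints 3
```

In everyday use the one-line shell version is far more convenient. The point here is only that no intermediate file is involved.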

Editors

Programs are text files, so to write a program you need a text editor. In our article about starting programming in Excel we used its built-in editor. Most computers these days come with a primitive editor but you almost certainly will not want to use that for long for programming. There are dozens of far better alternatives.

Old-school programmers like your author like to use an editor that works inside the terminal, such as Vim or Emacs, but these are not the most approachable for beginners. Sublime Text and TextMate are easy to get started with: TextMate is free but runs only on macOS, while Sublime Text can be evaluated for free but needs a paid licence for continued use. For all the bells and whistles you can try Visual Studio Code, which is free.

You can try them all but be wary of tool distraction: they are supposed to be a means to an end.

Version control: git and GitHub

We are all used to having an “undo” feature in applications such as Word or Excel. Having a similar feature for your programs is invaluable. While your text editor will likely enable you to undo back to your last save, what about going back to the version you had yesterday, or last month, or to the program as it was five years ago? Keeping that history is called version control and these days almost everyone uses git to do it. Git keeps code in repositories (files within a .git folder) and can synchronise with other copies of those repositories. Those copies can be anywhere but they are often hosted on github.com. This allows you to share code with other developers. Even if you program alone, keeping your code in a GitHub repository is a good habit to cultivate.

Your git repository is a complete history of your program, assuming everyone working on it has been committing to the repository regularly. This should give you the confidence to make changes because, even if you completely break your program, you can revert to the state when it was last working. You can even commit a broken version of a program and it will not matter because you can revert to a version prior to that commit. If you are using GitHub to keep a replica of your repository you can even lose your local copy, for example if your hard disk fails, and it will not matter because you can just clone everything you have pushed from GitHub. Git and GitHub are your project’s undo button and safety net.

Git: set-up and workflow

To start using git on a project you need to initialise a repository, or repo for short. This is simply some files in a folder called .git/ alongside your code. The repo is empty until you add some files and make your first commit:

$ cd my_project
$ git init                                 # Initialise a repo in .git/
$ git add my_file.py                       # Stage the first file
$ git add README.md                        # Stage the second file
$ git commit -m 'Initial project version'  # Commit everything that is staged

Those two files in their current state are now part of the repo’s history. No matter what you later add, change or delete, you can return to that state with git checkout.

Note that this was a two-step process. Adding files is called “staging”: staged files are copied into a holding area known as the “index”. The two steps exist because a change to a program usually involves multiple files (the code, its tests, some data etc.), so staging often takes multiple git add commands. You can also remove files from the index if you make a mistake. Once the index is in the state you want (you may not want to stage all changed files), you make the commit. git commit takes all the files currently staged in the index, stores them in the repo and makes a commit object that records which files were committed, who did it, when it occurred and what the previous commit was.
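
To make the index and the commit object more concrete, here is a toy model in Python. It is emphatically not how git is implemented (git stores its data far more cleverly, and its real index works differently in detail). The names Commit, ToyRepo, Ada and scratch.txt are invented for this sketch. It only mirrors the steps described above: stage, optionally unstage, then commit.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass
class Commit:
    """What a commit records, reduced to the essentials (toy model)."""
    files: Dict[str, str]        # file name -> content at the time of the commit
    author: str                  # who did it
    when: datetime               # when it occurred
    message: str
    parent: Optional["Commit"]   # the previous commit (None for the first one)


class ToyRepo:
    """A toy stand-in for a repo: an index plus a chain of commits."""

    def __init__(self) -> None:
        self.index: Dict[str, str] = {}      # the staging area
        self.head: Optional[Commit] = None   # the most recent commit

    def add(self, name: str, content: str) -> None:
        """Stage a file, in the spirit of `git add`."""
        self.index[name] = content

    def unstage(self, name: str) -> None:
        """Remove a file from the index if it was staged by mistake."""
        self.index.pop(name, None)

    def commit(self, author: str, message: str) -> Commit:
        """Store everything staged, on top of the previous commit's files."""
        files = dict(self.head.files) if self.head else {}
        files.update(self.index)
        self.head = Commit(files, author, datetime.now(timezone.utc),
                           message, self.head)
        self.index.clear()
        return self.head


repo = ToyRepo()
repo.add("my_file.py", "print('hello')\n")
repo.add("README.md", "My project\n")
repo.add("scratch.txt", "not ready yet\n")
repo.unstage("scratch.txt")                 # changed our mind before committing
first = repo.commit("Ada", "Initial project version")

repo.add("my_file.py", "print('hello, world')\n")
second = repo.commit("Ada", "Friendlier greeting")

print(sorted(first.files))      # ['README.md', 'my_file.py']
print(second.parent is first)   # True
print(first.files["my_file.py"], end="")  # the original version is still there
```

Because each commit keeps a link to its parent, the earlier state of my_file.py remains reachable after the second commit, which is the property that makes reverting possible.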

This add and commit process works on the project as a whole. If you revert to a previous commit, it will likely revert several files. Git can work on a file-by-file basis but this whole project approach is the default.

Git: branches

Once you have some code in use, “in production” as we say, you will likely want to add more features to it. However, that takes time and there will be a period when the new features are incomplete or buggy. You don’t want the new code to interfere with the production code until it is ready and, conversely, you don’t want to lose what you are working on if you have to go back to the production code and make a change to that. This is what branches are for.

The entire code base currently used in production is one branch, usually known as the “master” or “main” branch, while the entire code base including your new feature or bug fix is another branch, sometimes referred to as a feature branch. Git allows you to have any number of branches and to switch between them easily. Once work is complete on a feature branch it is merged into another branch, usually the master branch. The feature branch can then be deleted. The merge process is very sophisticated, operating on all the files in the project and dealing with conflicts where the same file has been changed in the same place on different branches. Git is careful never to lose any changes, even if there are merge conflicts.

Branches are in fact just “pointers” to commits in the git repo and are therefore very lightweight. You should have no qualms about creating them. Committing frequently (for example, as soon as your tests pass) and developing each new feature or bug fix on its own branch are good habits to develop, even if you work alone. If you code with a team, branches are essential.
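
The idea that a branch is just a pointer can be sketched in a few lines of Python. This is a toy model under simplifying assumptions, not git’s real data structures: the Commit class, the history helper and the branch name feature are invented for illustration. The merge shown is only the simplest case, where main has not changed since the branch was created (git calls this a fast-forward).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    """A commit reduced to a message and a link to the previous commit."""
    message: str
    parent: Optional["Commit"] = None


def history(commit):
    """Return commit messages from newest to oldest by following parents."""
    messages = []
    while commit is not None:
        messages.append(commit.message)
        commit = commit.parent
    return messages


initial = Commit("Initial project version")
branches = {"main": initial}          # a branch is a name pointing at a commit

# Creating a branch only records another name for an existing commit.
branches["feature"] = branches["main"]

# Committing on the feature branch moves that pointer; main is untouched.
branches["feature"] = Commit("Add new feature", parent=branches["feature"])

print(history(branches["main"]))      # ['Initial project version']
print(history(branches["feature"]))   # ['Add new feature', 'Initial project version']

# Simplest possible merge: main has not moved, so it just catches up.
branches["main"] = branches["feature"]

# Deleting the branch removes the name, not the commits it pointed to.
del branches["feature"]
print(history(branches["main"]))      # ['Add new feature', 'Initial project version']
```

Nothing is copied when a branch is created or deleted, which is why branches are so cheap. Merges where both branches have moved on are where the real sophistication of git comes in.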

Using git is a huge subject but start with this tutorial.

Testing frameworks

Testing your code as you write it, preferably writing the tests before you write the code, is a good habit to acquire. Where your tests should be stored, how they are run and how the results are shown is the business of testing frameworks. Whatever language you use, there will be multiple testing frameworks to choose from. Most editors these days also integrate with these frameworks so you can run your tests without leaving the editor. You should run your tests very frequently, almost continuously, and certainly before you commit your changes to git.

In our articles we use Python. The language itself has the assert statement which can be used for elementary testing:

assert expected_val == my_func()  # Raises AssertionError if the values differ
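
A slightly fuller sketch of the same idea follows. The function add_vat and its numbers are hypothetical, chosen only to have something to test. The last assertion is deliberately wrong to show what a failure looks like.

```python
def add_vat(net_price, rate=0.2):
    """Return the gross price for a net price and a VAT rate (made-up example)."""
    return round(net_price * (1 + rate), 2)


# Passing assertions are silent.
assert add_vat(100) == 120.0
# An optional message after the comma is shown if the assertion fails.
assert add_vat(50, rate=0.1) == 55.0, "10% VAT on 50 should be 55"

# A failing assertion raises AssertionError carrying that message.
try:
    assert add_vat(100) == 125.0, "deliberately wrong expectation"
except AssertionError as error:
    print(f"Test failed: {error}")  # prints: Test failed: deliberately wrong expectation
```

Without the try/except, the first failing assert would stop the program, which is one reason larger test suites are handed over to a framework.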

For more extensive testing we usually separate the tests from the code they are testing by putting the tests in their own files. As the number of test cases grows we need ways to organise them and run a selection of them, usually just for the code we are working on. This is where testing frameworks come in.

Pytest

Pytest looks for all files with names matching test_*.py or *_test.py, runs the test functions it finds there (by default, those whose names start with test_) and presents the results in a way that allows you to go straight to the code causing any test failures. Within a test_*.py file you can group tests in a class, whose name should start with Test, and you can ask pytest to run only that class. Writing a test is as simple as importing the function under test and writing some assert statements.
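
As a minimal sketch, a pytest test file might look like the following. The file name test_wordcount.py, the function count_words and the test names are all hypothetical. To keep the example self-contained the function is defined in the same file, whereas in a real project it would normally be imported from the module under test.

```python
# test_wordcount.py (hypothetical name; it matches pytest's test_*.py pattern)

# In a real project this would be imported instead, for example:
#     from wordcount import count_words   (hypothetical module)
def count_words(text):
    """Return the number of whitespace-separated words in text."""
    return len(text.split())


def test_empty_string_has_no_words():
    assert count_words("") == 0


class TestCountWords:
    """Related tests grouped in a class whose name starts with Test."""

    def test_single_word(self):
        assert count_words("hello") == 1

    def test_ignores_extra_whitespace(self):
        assert count_words("  hello   world \n") == 2
```

Running pytest in the folder containing this file should collect all three tests, while pytest test_wordcount.py::TestCountWords should run only the two in the class.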

Linters

You will quickly get a sense of what well-written code looks like. This can be codified into style guides, which in turn are used by so-called linters (the name comes from removing “fluff” from code) that automatically check your code against a style guide. As with git and testing frameworks, many modern editors integrate with linters to highlight style problems in your code as you write it. Many linters can also correct styling deviations for you.
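
To give a flavour of what a linter looks for, here is the same function written twice. Both versions run and give the same answer. The first contains the kind of layout problems that a Python linter following the PEP 8 style guide would typically flag. Tools such as flake8 or pylint are mentioned only as examples, since the article does not say which linter it uses, and the function names are invented.

```python
# Before: correct, but untidy.
import os, sys          # two imports on one line, and neither is used
def  mean_untidy(xs) :
    total=0
    for x in xs: total+=x
    return total/len(xs)


# After: the same logic with the style problems fixed.
def mean_tidy(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    total = 0
    for value in values:
        total += value
    return total / len(values)


print(mean_untidy([1, 2, 3]) == mean_tidy([1, 2, 3]))  # prints True
```

None of these problems changes what the program does, which is exactly why it is useful to have a tool, rather than a person, point them out.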

There are literally hundreds of tools for programmers and the more you get into programming the more of them you will use. In future articles we will show some of them in action in both Excel and Python.