Getting into programming: tools
If you study computer science you could be forgiven for not realising that programmers use tools at all: the impression is that all that matters is the mighty language compiler. In practice, tools such as shells, editors, version control, testing frameworks and linters are what programmers spend most of their time interacting with. Getting your toolset right can turn programming from a frustrating chore into a satisfying, productive activity.
Command lines
Operating systems have long offered their services to programs via code libraries, such as libc, the C Standard Library. The services themselves are provided by the core of the operating system, known as the kernel; libraries like libc are the programmer’s way in. A convenient means for people to access those services, a wrapper around them if you like, is known as a shell, usually used through a command line interface, or just CLI. Shells are most closely associated with the Unix family of operating systems, but Windows has the Command Shell and PowerShell too.
Your computer will have a default shell, which is what you are interacting with when you open a “Command Prompt” or “Terminal”, and there are lots more to choose from: bash, zsh, ksh etc. If you are just starting out it does not make much difference which one you choose. What is important is to learn its features and use them to your advantage.
Minimise typing (and typos!) by using the shell’s command history: it remembers the commands you have entered, and you can search back through that history to retrieve them. Auto-completion either completes what you are typing or shows you the options, simply by pressing TAB. Output redirection can send the results of your commands to a file rather than the terminal, so you can inspect them at your leisure in an editor (ls -1 > files.txt saves a directory listing in files.txt), or use them as input to other commands. Sending one command’s output straight into another without an intermediate file is called a pipe and uses the pipe symbol: |. For example, how many files and folders are there in the current directory? Pipe the output of the list command (in one-entry-per-line mode) to the word count command (in line-counting mode): ls -1 | wc -l.
Shell commands can be put in a file and run just like a Python program. Start the file with a line such as #!/bin/bash so the system knows which shell should run it, make the file executable with chmod +x my_commands.sh, then run it with ./my_commands.sh. This is called a shell script.
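To see what the pipe does, here is a minimal Python sketch that wires the same two commands together with the standard library’s subprocess module. It connects the output of ls -1 directly to the input of wc -l with no intermediate file, which is the plumbing the shell sets up when you type |. It assumes a Unix-like system with ls and wc available. The function name count_entries is invented for this example.

```python
import subprocess


def count_entries(directory: str = ".") -> int:
    """Count entries in a directory the way `ls -1 | wc -l` does (hypothetical helper)."""
    # First process: list the directory, one entry per line (hidden files are skipped).
    ls = subprocess.Popen(["ls", "-1", directory], stdout=subprocess.PIPE)
    # Second process: count lines, reading directly from ls's output; this is the pipe.
    wc = subprocess.Popen(["wc", "-l"], stdin=ls.stdout, stdout=subprocess.PIPE, text=True)
    # Close our copy of ls's output so that only wc holds the read end of the pipe.
    ls.stdout.close()
    output, _ = wc.communicate()
    ls.wait()
    # wc may pad its number with spaces, so strip before converting.
    return int(output.strip())


if __name__ == "__main__":
    print(count_entries())
```

In day-to-day work you would just type the pipeline in the shell. The sketch is only there to show that a pipe is two running programs joined output-to-input.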
Editors
Programs are text files, so to write a program you need a text editor. In our article about starting programming in Excel we used its built-in editor. Most computers these days come with a basic editor, but you almost certainly will not want to program in it for long. There are dozens of far better alternatives.
Old-school programmers like your author prefer an editor that works inside the terminal, such as Vim or Emacs, but these are not the most approachable for beginners. Sublime Text (free to evaluate) and TextMate (free, macOS only) are easy to get started with. For all the bells and whistles, try the free Visual Studio Code.
You can try them all, but be wary of tool distraction: tools are supposed to be a means to an end.
Version control: git and GitHub
We are all used to having an “undo” feature in applications such as Word or Excel. Having a similar feature for your programs is invaluable. Your text editor will probably let you undo recent changes, but what about going back to the version you had yesterday, or last month, or to the program as it was five years ago? This is called version control, and these days almost everyone uses git to do it. Git keeps code in repositories (files within a .git folder) and can synchronise with other copies of those repositories. The copies can be anywhere, but most often they are hosted on github.com, which lets you share code with other developers. Even if you program alone, keeping your code in a GitHub repository is a good habit to cultivate.
Your git repository is a complete history of your program, provided everyone working on it commits to it regularly. This should give you the confidence to make changes: even if you completely break your program, you can go back to the state it was in when it last worked. You can even commit a broken version of a program and it will not matter, because you can revert to a version from before that commit. If you keep a replica of your repository on GitHub, you can even lose your local copy, for example if your hard disc breaks, and it will not matter, because you can simply clone a fresh copy from GitHub. Git and GitHub are your project’s undo button and safety net.
Git: set-up and workflow
To start using git on a project you need to initialise a repository, or repo for short. This is simply some files in a folder called .git/ alongside your code. The repo is empty until you add some files and make your first commit:
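$ git config --global user.name 'Your Name'          # One-off setup: tell git who you are
$ git config --global user.email 'you@example.com'   # (placeholder values)
$ cd my_project
$ git init                                           # Initialise a repo
$ git add my_file.py
$ git add README.md
$ git commit -m 'Initial project version'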
Those two files, in their current state, are now in the repo forever. No matter what you add, change or delete, you can always get back to that state with git checkout followed by the commit’s identifier, which git log lists.
Note this was a two-step process. Adding files is called “staging”: staged files are copied into a holding area known as the “index”. The extra step exists because a change to a program usually involves multiple files: the code, its tests, some data etc. Staging often takes several git add commands, and you can remove files from the index if you make a mistake. Once the index is in the state you want (you may not want to stage every changed file), you make the commit. git commit takes the files in the index, stores a snapshot of them in the repo and creates a commit object that records that snapshot, your commit message, who made the commit, when it occurred and which commit came before it.
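The staging-then-commit idea can be modelled in a few lines of Python. This is a deliberately simplified toy, not how git works internally. It assumes the index is a dictionary mapping file names to contents and that each commit keeps a full copy of that dictionary. Real git stores each file’s contents once, identified by a hash, and shares unchanged files between commits. The names ToyRepo, stage, unstage and commit are invented for this sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass(frozen=True)
class Commit:
    """A toy commit object: a snapshot plus who, when, why and what came before."""
    snapshot: Dict[str, str]     # file name -> contents at the moment of committing
    author: str
    timestamp: datetime
    message: str
    parent: Optional["Commit"]   # previous commit; None for the first one


class ToyRepo:
    """A hypothetical, in-memory stand-in for a git repository."""

    def __init__(self) -> None:
        self.index: Dict[str, str] = {}      # the staging area
        self.head: Optional[Commit] = None   # most recent commit

    def stage(self, name: str, contents: str) -> None:
        """Like git add: copy a file's current contents into the index."""
        self.index[name] = contents

    def unstage(self, name: str) -> None:
        """Undo a mistaken stage: restore the last committed version, or drop a new file."""
        if self.head is not None and name in self.head.snapshot:
            self.index[name] = self.head.snapshot[name]
        else:
            self.index.pop(name, None)

    def commit(self, author: str, message: str) -> Commit:
        """Like git commit: record a snapshot of everything in the index."""
        self.head = Commit(
            snapshot=dict(self.index),  # a copy, so later staging cannot rewrite history
            author=author,
            timestamp=datetime.now(timezone.utc),
            message=message,
            parent=self.head,
        )
        return self.head


if __name__ == "__main__":
    repo = ToyRepo()
    repo.stage("my_file.py", "print('hello')")
    repo.stage("README.md", "# My project")
    first = repo.commit("Your Name", "Initial project version")
    repo.stage("my_file.py", "print('hello, world')")
    second = repo.commit("Your Name", "Greet the world")
    print(second.parent is first)          # True
    print(first.snapshot["my_file.py"])    # print('hello')
```

The snapshot covers the whole index, not just the file that changed, which mirrors how a commit captures the state of the project as a whole.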
This add and commit process works on the project as a whole. If you revert to a previous commit, it will likely revert several files. Git can work on a file-by-file basis but this whole project approach is the default.
Git: branches
Once you have some code in use, “in production” as we say, you will likely want to add more features to it. However, that takes time and there will be a period where the new features are incomplete or buggy. You don’t want the new code to interfere with the production code until it is ready and, conversely, you don’t want to lose what you are working on if you have to go back to the production code and make a change to that. This is what branches are for.
The entire code base currently used in production is one branch, usually known as the “master” or “main” branch, while the entire code base including your new feature or bug fix is another branch, sometimes referred to as a feature branch. Git allows you to have any number of branches and to switch between them easily. Once work on a feature branch is complete it is merged into another branch, usually the master branch, and the feature branch can then be deleted. The merge process is sophisticated: it operates on all the files in the project, combines changes automatically where it can and flags conflicts, where the same part of a file has been changed differently on two branches, for you to resolve. Git is careful never to lose any changes, even when there are merge conflicts.
Branches are in fact just “pointers” to commits in the git repo and are therefore very lightweight, so you should have no qualms about creating them. Committing frequently, for example as soon as your tests pass, and developing each new feature or bug fix on its own branch are good habits to develop, even if you work alone. If you code with a team, branches are essential.
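Why branches are cheap is easiest to see in a toy model. If each commit remembers its parent, a branch needs to be nothing more than a name mapped to one commit. The sketch below assumes that simplification, with a dictionary playing the role of the branch list. Real git typically keeps each branch as a small file under .git/refs/heads/ holding a commit’s hash. Every name in the sketch is hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Commit:
    """A toy commit: just a message and a pointer to its parent."""
    message: str
    parent: Optional["Commit"] = None


def history(commit: Optional[Commit]) -> List[str]:
    """Follow parent pointers back to the first commit, newest first."""
    messages = []
    while commit is not None:
        messages.append(commit.message)
        commit = commit.parent
    return messages


# A branch is nothing more than a name pointing at a commit.
branches: Dict[str, Commit] = {}
initial = Commit("Initial project version")
branches["main"] = initial

# Creating a branch copies one pointer; no files are duplicated.
branches["feature"] = branches["main"]

# Committing on the feature branch adds a commit and moves only that branch's pointer.
branches["feature"] = Commit("Add new feature", parent=branches["feature"])

print(history(branches["main"]))     # ['Initial project version']
print(history(branches["feature"]))  # ['Add new feature', 'Initial project version']

# The simplest merge, a "fast-forward": main has not moved since feature branched
# off, so main's pointer can simply move to feature's latest commit.
branches["main"] = branches["feature"]

# Deleting the branch removes the name, not the commits it pointed to.
del branches["feature"]
print(history(branches["main"]))     # ['Add new feature', 'Initial project version']
```

If both branches have gained new commits, git instead creates a merge commit with two parents, which this toy model leaves out.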
Using git is a huge subject but start with this tutorial.
Testing frameworks
Testing your code as you write it, preferably writing the tests before you write the code, is a good habit to acquire. Where your tests are stored, how they are run and how the results are shown is the business of testing frameworks. Whatever language you use, there will be several testing frameworks to choose from. Also, most editors these days integrate with these frameworks so you can run your tests without leaving the editor. You should run your tests very frequently, almost continuously, and certainly before you commit your changes to git.
In our articles we use Python. The language itself has the assert
statement which can be used for elementary testing:
assert expected_val == my_func()  # Raises AssertionError if the comparison is False
For more extensive testing we usually separate the tests from the code they test by putting them in their own files. As the number of test cases grows we need ways to organise them and to run a selection of them, usually just the ones for the code we are working on. This is where testing frameworks come in.
Pytest
Pytest looks for all files with names of the form test_*.py or *_test.py, executes the tests it finds there and presents the results in a way that lets you go straight to the code causing any test failures. Within a test file you can group tests in a class whose name starts with Test, and you can ask pytest to run only that class. Writing a test is as simple as importing the function under test and writing some assert statements inside a function whose name starts with test.
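Here is a sketch of such a test file. It assumes pytest is installed, which you can do with pip install pytest, since pytest is not part of Python’s standard library. The module name stats and the function mean are hypothetical. In a real project you would import mean from its own module, but it is defined inline here so the example stands alone.

```python
# test_stats.py: a hypothetical test file that pytest would discover by its name.

# In a real project the function under test lives in its own module and is
# imported, e.g. `from stats import mean`; it is defined here so the example runs alone.
def mean(values):
    """Hypothetical function under test: the arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def test_mean_of_single_value():
    assert mean([7]) == 7


class TestMean:
    """Related tests grouped in a class, so they can be run on their own."""

    def test_whole_numbers(self):
        assert mean([1, 2, 3]) == 2

    def test_fractional_result(self):
        assert mean([1, 2]) == 1.5  # 1.5 is exact in binary floating point

    def test_negative_numbers(self):
        assert mean([-2, 2]) == 0
```

From the project folder, running pytest on its own runs every test it can find. Running pytest test_stats.py::TestMean runs only the tests in that class. For checking that code raises an exception, pytest also provides pytest.raises.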
Linters
You will quickly get a sense of what well-written code looks like. This can be codified into style guides, which in turn are used by so-called linters (the name likens style problems to lint, or fluff, to be removed from code) to check your code against a style guide automatically. As with testing frameworks, many modern editors integrate with linters to highlight style problems in your code as you write it. Many linters can also correct style deviations for you.
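To demystify what a linter does, here is a toy one in Python that checks two rules from PEP 8, the Python style guide. Lines should be at most 79 characters long, and they should not end with trailing whitespace. The function can also fix the second rule automatically. Real linters such as flake8 or pylint check many more rules. The names check_style and fix_trailing_whitespace are invented for this sketch.

```python
from typing import List, Tuple

MAX_LINE_LENGTH = 79  # PEP 8's recommended maximum


def check_style(source: str) -> List[Tuple[int, str]]:
    """Return (line number, problem) pairs for two simple PEP 8 rules."""
    problems = []
    for number, line in enumerate(source.splitlines(), start=1):
        if len(line) > MAX_LINE_LENGTH:
            problems.append((number, f"line too long ({len(line)} > {MAX_LINE_LENGTH})"))
        if line != line.rstrip():
            problems.append((number, "trailing whitespace"))
    return problems


def fix_trailing_whitespace(source: str) -> str:
    """Automatically correct one kind of problem, as many real linters can."""
    fixed = "\n".join(line.rstrip() for line in source.splitlines())
    # Keep the final newline if the original text had one.
    return fixed + "\n" if source.endswith("\n") else fixed


if __name__ == "__main__":
    sample = "x = 1   \n" + "total = " + "1 + " * 20 + "1\n"
    for number, problem in check_style(sample):
        print(f"line {number}: {problem}")
```

Real linters parse the code rather than just scanning lines of text, which lets them also spot likely bugs such as unused variables.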
There are literally hundreds of tools for programmers, and the further you get into programming the more of them you will use. In future articles we will show them in action in both Excel and Python.