Getting into programming with Excel
If you want to get into programming, Excel is an easy way to do it. Everyone has it on their laptop or work computer so you don’t need to worry about installing software or getting permission to use it. That is important because to become an effective progammer you need to practice. It is as easy as openning Excel and typing a few key strokes. So, please have Excel open as you read this article and follow along.
We are going to start with basic built-in Excel functions in the spreadsheet then move on to Visual Basic to build our own functions. We will see how functions you write can be called from the spreadsheet and see a way of testing the results using conditional formatting.
That will give us all the tools we need to write a proper program which we do in the next article with an extended step-by-step example. That will give you a feel for how real programs are written, tested and debugged. In the following article we will move out of Excel and Visual Basic and into Python and its spreadsheet-like helpers, Pandas.
Functions
A fundamental idea in programming is the function. Just like functions in maths they take inputs (called “arguments”) and return a result.
Let’s try one of Excel’s built-in functions, MAX(). It takes two or more numbers as inputs, given in brackets after the function name, and returns the biggest, the MAXimum. Excel displays this result in the cell where the function is, so you see the result of the function, not the function itself.
Start a fresh spreadsheet in Excel, click on an empty cell and type
=MAX(5,7)
then hit Enter. You should see the number 7. If you see
MAX(5,7)
then you missed off the equals symbol in the first
character. Excel needs that to know you meant to call its MAX()
function rather than just putting the word MAX in a cell. If you
really do want to have the text “=MAX” show in a cell, you have to
start with a single quote.
MAX()
can take more arguments. Either double click on the cell
containing the function or in the box above the grid showing the
function and give it lots of arguments, such as
=MAX(5,7,-1,234,-100,25)
and hit enter. Check that the result makes
sense to you.
As well as taking numbers as arguments, MAX()
can get its arguments
from other cells, it just needs the cell addresses. Let’s try it. Type
three different numbers in three different cells, say 2, 7
and 11. Then in a fourth cell, type =MAX(
without pressing Enter
then click on the cell containing the number 2, then type a comma,
then click on the cell containing 7, type another comma, click on the
cell containing 11, type a closing bracket and hit Enter. You should
see the number 11.
If you need to pass lots of cells as arguments you can type in a very long list, but so long as the cells are all next to each other in a row or column, you can just tell it the first and last cell, separated by a colon and Excel works out the rest. This is called a range. Let’s use this to find the oldest patient admitted for an ultrasound:
Some functions operate on a whole dataset at once, such as MAX()
but
others you want to apply to each row (or column) individually. In that
case just put a copy of the function on every row. Excel is set-up to
help you do this. Write the function once on the first row then drag
the dot in the bottom right corner until all the rows are
covered. Excel updates the cell address of the argument as it copies
the function into each new cell so that it points to the correct
row. It will do this for the row and column part of the address. This
is sometimes called a “copy down”. If you don’t want that to happen
you can fix, or anchor the row or column or both by prefixing that
part of the address with a dollar sign.
Chaining functions and loops
The idea of applying a function to every row in a dataset is important. It will form the basis of most of our programming. Programs can be thought of as combinations of functions applied to rows of a spreadsheet. Let’s have a look at how to combine functions.
We have seen how functions can get their arguments from cells. If
those cells in turn get their values from functions then we are
chaining functions together with the outputs of one function being
passed as the inputs to another. There are two ways to do this. Either
put the results of the first function into a cell and use that cell
address as the argument to the next function, or if you don’t need to
see the intermediate result, write the first function directly as the
argument of the second. A common use for this is with the IF
function which allows you to make a decision in your calculation. If
the condition given as the first argument is true then return the
second argument, else return the third one. In this example we change
our calculation depending on whether the patient is male or female.
Chaining functions is the start of building larger, more sophisticated programs. In theory you could keep piling up functions in a single cell, but in practice it becomes incomprehensible. Excel doesn’t care but from the human author’s point of view things are more comprehensible and managable if they are written out in plain text with intermediate values being stored in what are called variables.
Writing your own functions
Although Excel has dozens of built-in functions we soon need one that it doesn’t have or we want to combine them to form a program of our own. In Excel this means writing “Visual Basic for Applications” or VBA. To do this you need to open the Visual Basic editor. How to do this varies on different versions of Excel, so you may need to check Excel Help or Google around a bit. On my version it is under the Tools menu.
You should see a new window with a tree-like navigator on the left. This shows a “VBA Project” with a “Microsoft Excel Objects” below it, with “Sheet1” and “This Workbook” below that. The function we are going to write will live in what is called a “module” and you need to add one to start with. To do that either right-click anywhere on the tree to open a menu and find Insert > Module, or there is a button on the toolbar. Either way you should end up with a window called “Module 1”.
My first function
Click in the module window and enter the text you see in the image
below. This is Visual Basic for Applications, and it is creating a new
function called twice
. It takes one argument, which it expects to be
a whole number, and it will be refered to in the code below by the
name x
. It returns another whole number to the cell in the sheet from
where it is called. That returned value is just two times whatever x
is.
To call the function, find an empty cell in the spreadsheet and type
=twice(2)
and hit enter. You should see 4.
As with built-in functions you can call your function on every row of a sheet, by copying it down, to apply it throughout your dataset.
Tests
A good habit to cultivate is checking that the code you just wrote does what you expected it to do. This is called testing. Even better is to write down what you expect the result of a function to be before you write it. In this way the test becomes the specification for the function. This is called Test Driven Development and leads to very high quality code.
In Excel we can use conditional formating to give visual feedback on whether a test is passing. We colour a cell green if its value matches the expected result and red otherwise. Getting conditional formatting to work varies across different versions of Excel, but on my version I select Format > Conditional Formatting… to get to a “Manage rules” dialog. Mine shows the two rules I already set-up but yours will be empty to start.
Click +
at the bottom to add a rule and select a style of
“Classic”. You will need two of these rules. A rule to show green
formatting when the result equals the expected outcome and another to
show red when the two are not equal.
With that conditional formatting applied to all our “Actual” cells,
here are some tests for our =twice()
function:
For a trivial function such as =twice()
tests do not contribute
much. But for even slighty more complicated functions they really help
in several ways:
- Edge cases. Write some tests for what the function should return for out-of-range or unusual values.
- Defending against future changes. Often we return to code in order to fix or improve it, but those changes can break previously made assumptions which you may have forgotten but other functions depend on. By having tests with the code that check those assumptions your code is protected against the unintended consequences of future changes.
- Documentation. Simple tests explain to other users what the intention of the function is, especially for edge cases. That other user is often a future you, when you have forgotten the details of what you did months ago.
- Start simple then add complexity. For complex functions it can be impossible to just write down the whole thing in one go. Practically it helps to start with some simple cases then make adaptions for more complex ones. By building up a set of test cases you can ensure that your changes are not regressing and breaking what you have already achieved.
In the next article we will put everything we have learned together and write our first real program.