2.3 – Producing Robust Programs

This section is all about creating programs that don’t fall apart as soon as something unexpected happens. We should all know by now that computers have zero intelligence and can only do exactly as they are told. Therefore, when we do something like taking input in a program and we tell the computer to expect a number to be typed in – it literally does not know what to do should the user decide to faceplant the keyboard and enter a random string of characters. Without some form of defensive or robust design, such a program would either crash or produce very unexpected output as a result.

Defensive design encourages programmers to think about the ways in which a program may be used, abused or misused when it is launched to the general public. Each program will require its own specific design and code to mitigate against issues that could arise, but there are generic methods of protecting and strengthening a program. We will look at these in turn now.

In this section:

2.3.1 – Defensive Design
2.3.2 – Testing

2.3.1 – Defensive Design

“It’s a defensive masterclass, they’ve absolutely parked the bus, Gary, game of two halves, jumpers for goalposts. Marvellous, isn’t it?”

Anticipating Misuse

Computer users are an absolute nightmare. They never use a system as you intend – and don’t ever offer “tech support” to anyone, ever. You’ll be 9 hours into resolving a printer and driver conflict, questioning your very existence.

Anticipating misuse simply means that as programmers and designers we actively try to predict and plan for ways in which users might try to use a system. This can range from simple things like accidentally entering the wrong kind of data into a form, to actively trying to break into a system. By proactively working to predict ways in which our programs might be abused by users, we can make them more robust and secure during development.

Taking the time to test, improve and secure a piece of code is costly – any time a programmer spends on a project will have a cost implication. However, this is nothing compared to the money a company may lose if software is compromised, a database stolen, and user information made available on the internet. In cases such as these, large and expensive legal action tends to follow, resulting in significant losses.

The methods of anticipating and preventing misuse are, unsurprisingly, the ones we are about to cover from the specification:

Authenticating users – only letting authorised users into your system or program.
Validation – checking data entry matches a set of rules to remove problematic or garbage data.
Verification – ensuring data entered is correct and free of errors.
Testing and test planning – actively seeking out and fixing errors in code.

Authentication

Authentication is one of the simplest and most common methods of securing either an entire system, an account or a piece of software. The use of usernames and passwords ensures that there is a log or record of every user, when they accessed the system and in some cases what they did whilst logged on.

By controlling access in this way, administrators can ensure that the system is only used by approved users, that there is an audit trail of evidence should something go wrong and there is control over who can use a system and when.
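
As a rough sketch of the idea, a login check with an audit trail might look something like the code below. Everything here is made up for illustration: the USERS store, the "alice" account, the salt and the log_in name are all assumptions. Real systems also use slow, purpose-built password hashing rather than a single round of SHA-256.

```python
import hashlib
import hmac
from datetime import datetime


def hash_password(password, salt):
    """Return a salted SHA-256 hash of the password (illustration only)."""
    return hashlib.sha256((salt + password).encode("utf-8")).hexdigest()


# Hypothetical user store: username -> (salt, salted password hash).
USERS = {"alice": ("x7f2", hash_password("correct horse", "x7f2"))}

# Audit trail: one (timestamp, username, outcome) entry per login attempt.
audit_log = []


def log_in(username, password):
    """Return True only for a known user with the right password."""
    record = USERS.get(username)
    ok = False
    if record is not None:
        salt, stored_hash = record
        # compare_digest avoids leaking information through timing.
        ok = hmac.compare_digest(stored_hash, hash_password(password, salt))
    outcome = "success" if ok else "failure"
    audit_log.append((datetime.now().isoformat(), username, outcome))
    return ok


print(log_in("alice", "correct horse"))  # True
print(log_in("alice", "letmein"))        # False
```

Every attempt, successful or not, ends up in audit_log, which is the "record of every user" described above.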

Authentication is covered in more detail in 1.5 – Systems Software (Operating Systems), as username and password logins are almost ubiquitous in modern operating systems. The main exception is mobile phones, where it is more common to authenticate users with biometrics such as a fingerprint or facial recognition, or in some cases with a pattern drawn, or code entered, on screen.

In recent years, there has been a large push by big software companies such as Microsoft to move away from passwords as they are largely insecure – users often select terrible passwords such as “letmein” and “password123” which can be instantly broken using brute force and dictionary attacks. Biometrics are far more secure, but not always available or convenient. Therefore, the future of passwords is probably in the use of authenticator applications which generate single use codes to log users in. These cannot be re-used and are useless if intercepted and stolen.

Input Validation

Forms are everywhere – on websites when we sign up for a new account, online shops to enter your address, surveys… the list goes on and on. Any time we allow users to enter data into a form, we are inviting them to misuse our website or program. When a program collects data from users it needs to be as accurate, correct and useful as possible. If we allow users to enter garbage into a form (as we surely all have done to get past a registration screen or similar with email addresses like aaa@aaa.com) then the data collected is useless. This can cause all kinds of problems, but there is worse.

Back in unit 1.4 we learned about threats to computer systems. One method of attack is SQL injection, and for the sake of revision, the definition is “a mal-formed or malicious query sent through a web form, which if executed by the webserver, will grant access / delete data / amend data etc.” To prevent SQL injection attacks, websites can implement some form of sanitisation (literally cleaning) of any text which is entered into a form. Sanitising text would mean things like symbols are removed automatically to prevent “code like” input.
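
The "remove the symbols" idea can be sketched in a couple of lines. The function name sanitise is an assumption for this example, and stripping symbols on its own is a blunt tool: it would also mangle a perfectly innocent surname like O'Brien, which is one reason real websites normally combine it with other protections.

```python
def sanitise(text):
    """Keep only letters, digits and spaces; silently drop everything else."""
    return "".join(ch for ch in text if ch.isalnum() or ch == " ")


# A classic "code like" input typed into a name box.
print(sanitise("Robert'); DROP TABLE Students;--"))  # Robert DROP TABLE Students
```

With the quotes, brackets and semicolons gone, what is left is just harmless text.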

Another method of sanitising data is to apply what are called “validation rules.”

Validation ensures that whatever has been typed in adheres to a particular rule. There are a set of rules that we can apply when validating data:

Length check – how long (in characters) is the data that has been entered? A boundary can be set so users cannot enter any text longer than this limit. This prevents a lot of attacks that may happen when users try to enter code or code-like queries into text input boxes.
Type check – specifies a certain data type that must be used. For example, if integer data is expected and the inputted data does not match this type, then it is rejected.
Range check – specifies a range for numeric data, such as “between 1 and 100.” Any data that falls outside this range is rejected.
Format check – specifies a format that data must follow. The most common is for dates such as “DD/MM/YYYY”. If data entered is not in this exact format then it is rejected.
Presence check – quite simply, something must be entered! Blank entries are not allowed. This prevents users from skipping or missing out important or compulsory data fields.
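
The five checks above can each be written as a tiny function. This is a minimal sketch in Python, and the function names are made up for this example rather than taken from any library. Note that the format check only tests the shape of the text (two digits, slash, two digits, slash, four digits), not whether the date actually exists.

```python
import re


def presence_check(text):
    """Something other than blank space must have been entered."""
    return text.strip() != ""


def length_check(text, max_length):
    """The text must be no longer than max_length characters."""
    return len(text) <= max_length


def type_check_integer(text):
    """The text must be convertible to a whole number."""
    try:
        int(text)
        return True
    except ValueError:
        return False


def range_check(number, lowest, highest):
    """The number must lie between lowest and highest (inclusive)."""
    return lowest <= number <= highest


def format_check_date(text):
    """The text must have the shape DD/MM/YYYY."""
    return re.fullmatch(r"[0-9]{2}/[0-9]{2}/[0-9]{4}", text) is not None


print(range_check(101, 1, 100))         # False
print(format_check_date("25/12/2024"))  # True
```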

Validation rules are particularly powerful as they prevent a lot of incorrect/poor data from being entered into a system. Validation rules may be combined to create a particularly restrictive set of checks that would prevent all but the correct type of information being input. However, this does not stop people simply lying – for example in an age entry field, there is no combination of validation rules that could stop someone putting in a made up date of birth.

Furthermore, whilst validation can stop a lot of erroneous data being entered, it is not foolproof. There are ways of circumventing data validation rules, especially in websites where it is possible to manipulate the data sent by web forms or simply bypass the validation checks entirely. For this reason it is often the case that validation on website data is carried out both at the client and again when it arrives at the server to ensure it has not been altered.

In programs and applications, validation is important as it prevents a range of scenarios where incorrect input would crash a program. When we write solutions to exam questions at GCSE level, we rarely worry about whether the user types in the wrong thing. However, in reality if a user typed text into an input prompt which expected numeric data and stored this in an integer variable, it would cause the application to crash unless you had used validation, error trapping code or both.
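
Here is one way that "validation, error trapping code or both" might look in Python. It is only a sketch: the names read_whole_number and ask_for_age and the 0 to 120 age range are assumptions chosen for the example.

```python
def read_whole_number(text):
    """Error trapping: return the number typed in, or None if it is not one."""
    try:
        return int(text)
    except ValueError:
        # Without this, int("asdfgh") would crash the program.
        return None


def ask_for_age(get_input=input):
    """Keep asking until the user enters a sensible age."""
    while True:
        age = read_whole_number(get_input("Enter your age: "))
        # Range check: validation on top of the error trapping.
        if age is not None and 0 <= age <= 120:
            return age
        print("Please enter a whole number between 0 and 120.")


print(read_whole_number("17"))      # 17
print(read_whole_number("asdfgh"))  # None
```

Calling ask_for_age() in a real program would keep prompting at the keyboard until something valid is typed in, no matter how many times the user faceplants it.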

In other words, programs are extremely fragile and we as programmers have a responsibility to ensure that code is robust – that we specify exactly what happens when users do the wrong thing or unexpected input is provided. Validation simply helps to reduce the amount of this error correction code that we need to write.

Maintainability

It works, doesn’t it? I don’t see your problem…

According to Google (and who doesn’t just automatically take everything that comes up on there as cold, hard facts?) there are approximately 50 million lines of code inside Windows 10. Furthermore, it is estimated that there are 2 billion lines of code that have been written by Google engineers in their own systems (search, mail, YouTube, drive etc). To put it one way, that’s a lot of code. To put it another, if you printed it out, that’s about 17.5 badgers worth of code.

We don’t think about other people when we write code ourselves, simply because it is never meant to be seen by anyone else. We don’t want other people seeing the deep, dark secrets of our awful, unstable code that would happily break at a moment’s notice should you just press the wrong key. I can remember several occasions where I’ve written code, it somehow works as I wanted it to, but I couldn’t for the life of me explain why. Best not to touch it just in case it breaks.

In the real world, however, programmers work in teams on large projects that can be managed across different countries even. This means they have to work in a way in which other people can understand what they are doing or have done. People leave their jobs and move on to pastures new, it’s just a fact of life, but this would cause huge problems if your lead programmer left and they were the only person in the world who could decrypt their cosmic code that only they could understand.

Many, many millions of pounds have been spent in the past, and will be spent in the future, on hiring programmers to update old systems. Most of that money is not spent on actually doing the upgrades, but on spending weeks and weeks going through old, badly documented code, trying to work out what on earth it does. It’s like taking your car to the garage because it doesn’t work properly, but point blank refusing to tell them what’s wrong – “it’s your job to work it out!” This is utter madness, but happens in computing all the time.

This section, then, is all about a few things we can do as programmers and project leaders to ensure that in future our systems can be maintained, fixed, upgraded or expanded with the minimum amount of pain for developers who may have to delve into our code long after we have moved on.

Sub Programs

One day you might understand this joke. Then you’ll be the most interesting person in the room…

One of the simplest methods of making code easier to read and understand is to break it up into logical chunks. If you cast your mind back to the start of unit 2, you learned about abstraction and decomposition. These are not just essential tools in the concept/design phase, but throughout development. If you decompose a problem properly, you should end up with some very specific “blocks” of code that then need to be written. If you can keep these as discrete (they work on their own) blocks, then you can give them to other people to write, you can break your problems down efficiently and you ultimately make your program much easier to write and maintain.

There are lots of ways of breaking up a program into “sub-programs.” Some of the most common are:

Classes
Functions
Procedures

A class is a special type of structure used in programming languages which are Object Oriented. This is not something we study at GCSE, but you have probably used one without realising it. For now, you just need to know that a “Class” is simply a template for an object you wish to create in your program, and that classes contain both functions and procedures.

Procedures

Procedures are one of the oldest methods of programming and breaking up programs. A Procedure is:

A block of code.
With a clear, well-defined purpose.
That has a name.
Which can be called (and will then execute when called).
That can take in data in the form of parameters (things you give to the procedure so it can do its job).

Procedures are used for two main reasons. The first is that they simply help us to write clear and well-defined blocks of code. When you decompose a problem you will have a list of significant things that your program needs to do such as “read data from file” and “display menu.” These are obviously well-defined problems that need to be solved and also have nothing to do with each other. It would make so much sense to put them in their own block of code – so we do!

The second reason we use procedures is efficiency and reusability. You should know by now that we are constantly doing the same things over and over again in a program. There are lots of things we do repeatedly as users or in a program – for example, displaying a menu, moving the mouse across the screen. If we write a block of code to do common operations such as these then it can be re-used as many times as we wish. Even better than this, we could save these procedures to a file and use them in other programs. Now that is efficient use of your time.

There is one final reason to use procedures and that is robustness. As a procedure is a small, tightly defined block of code, it should be the case that it can be thoroughly tested, refined and perfected. This means that you can take these blocks of code and put them in other programs knowing that they are not going to fail or produce unexpected results. This again is efficient and saves time.

The way in which Procedures are used is covered in our programming lessons.

Functions

Functions are almost identical to Procedures. A Procedure is a block of code which performs a specific task. We can send data to a Procedure (like variables or arrays) in the form of parameters, however it gives you nothing back. A Function has all of these characteristics but, when it is finished, returns a value. In simple terms, Functions give you something back when you call them.

You’ve used functions all the time just without realising it. Every time you do INPUT in a program, you are calling a function – in this case a block of code which goes away, reads whatever is typed in on the keyboard and then when the user presses the enter key, it sends you back whatever they typed in.

Other Functions you’ve used might include things like string.Length() which gives you back the length of a string and string.Substring(x,y) which gives you back some characters from a string. You get the idea that functions give you something back!
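
To see the difference side by side, here is a small Python sketch. The names display_menu and average are made up for this example. Python does not have a separate keyword for procedures, so a "procedure" here is simply a function that does not return anything useful.

```python
def display_menu(options):
    """Procedure: does a job (printing a menu) but gives nothing back."""
    for number, option in enumerate(options, start=1):
        print(f"{number}. {option}")


def average(scores):
    """Function: works something out and returns it to the caller."""
    return sum(scores) / len(scores)


# Calling the procedure: we just ask it to do its job.
display_menu(["New game", "Load game", "Quit"])

# Calling the function: we catch the value it gives back.
mean_score = average([10, 20, 30])
print(mean_score)  # 20.0
```

Both take in data through parameters (options and scores), but only average hands something back that we can store in a variable.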

To conclude, procedures and functions make our programs more maintainable because:

They break the program up into clear chunks / parts.
Each procedure or function has a single, well-defined role.
They can be re-used repeatedly, even in other programs.
They are usually robust or can be easily tested to ensure they are robust.

Naming Conventions

“We turn now to our Scottish naming correspondents, Mr T. What do you think of Mr Musk’s naming convention?”
“We don’t know what to think, so we asked someone else 4real.”
“For real?”
“4real.”

Maths is odd. When you use algebra, you are constrained to single letters such as x, y and t. When mathematicians run out of letters they start using the Greek alphabet for fun. Whilst Mr Musk has used this as a naming convention for his own children, it doesn’t make anything easy to read or understand. After all, what is x? What does it represent?

In programming, we use the concept of algebra when making variables, only they’re much better because we can use whole words instead of letters. If you want to store the score a player has attained in a game, you’d call it “score” and not “s” or perhaps we are storing an email address, the variable would be “email” and not “e.”

This ability to give variables sensible, descriptive names is essential to making programs easier to follow and to aid in the future maintenance of any program that has been written. Weird, unfathomable variable names are one of the worst things a programmer can come across when trying to understand how a piece of code works. Conversely, when variable names are obvious and descriptive, the problem becomes trivial.

Due to this, there are a number of rules or conventions that programmers tend to follow when naming variables. Once agreed upon, everyone who works on the project should follow the same method to ensure consistency. Some possible conventions are:

Using long, descriptive variable names
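Using Camel Case – eachWordAfterTheFirstIsCapitalised (when the first word is capitalised too, as in EachWordIsCapitalisedAtTheStart, this is often called Pascal Case)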
Using underscores – each_word_is_separated_by_underscores
Using prefixes to indicate data types that have been used – int_NumOfLivesLeft, char_Initial, str_PlayerName
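
A quick before-and-after shows how much difference names make. Both functions below do exactly the same calculation. The scoring rule itself is invented for this example, and the second version uses the underscore convention.

```python
# Hard to follow: what on earth are s, l and b?
def f(s, l, b):
    return s + l * b


# The same calculation with long, descriptive names.
def final_score(points_scored, lives_left, bonus_per_life):
    return points_scored + lives_left * bonus_per_life


print(f(100, 3, 50))            # 250
print(final_score(100, 3, 50))  # 250
```

The computer does not care which version you write, but the next programmer to open the file certainly will.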

Indentation

Once upon a time, programmers used to have to number each line of code manually and then type in program statements up to a certain limit before moving on to a new line. If you wanted to break your program up, you had to just invent a new block of numbers that only you knew the purpose of and off you went. This made programs horrific to decipher – with programmers using all sorts of tricks to cram statements on to one line or using seemingly arbitrary line numbers without commenting what they were for.

With the advent of procedural languages, things moved on and programs lost their reliance on line numbering and were stored in text files instead. This had numerous advantages, especially that we could now apply some basic formatting to our code to make it easier to read – the best of which was indentation.

These days, some programming languages are entirely dependent on indentation. Python, for example, literally will not work without lines of code being correctly indented to show which block of code they belong to. We rarely think about indentation now because modern IDEs and code editors will often automatically indent our badly written code for us and tidy it up as we go.

In most languages, indentation makes no difference to how code is executed by the CPU: when the compiler or interpreter converts code into machine code, the spaces and tabs used for layout are ignored. (Python is the exception noted above, because there the indentation itself marks where each block starts and ends.) Indentation is there for us as programmers, to make code which is logically laid out, easy to follow and intuitively shows which lines of code belong to which “blocks” in the program. For example, when the line “total = total + 1” is indented beneath an IF statement, you can see at a glance that it belongs inside the IF statement and is only executed if the condition evaluates to true.
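
A small Python sketch of the “total = total + 1” idea is shown here. The function name count_passes and the marks are made up for this example. Each level of indentation shows which block a line belongs to.

```python
def count_passes(marks, pass_mark):
    total = 0
    for mark in marks:
        # Indented once more: this IF belongs to the FOR loop.
        if mark >= pass_mark:
            # Indented again: only runs when the IF is true.
            total = total + 1
    # Back in line with the FOR: runs once, after the loop has finished.
    return total


print(count_passes([40, 55, 70], 50))  # 2
```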

Commenting

Tip 1: Don’t bother. Tips 2-6: See tip 1.

Finally, the most obvious and important method of making code maintainable – comments. Comments are the closest thing we have to a golden ticket for maintainability. A well maintained program will have a large number of comments, and some files may even contain more comments than actual lines of code.

As with variable names, there are lots of conventions and ideas about how comments should be used, but a general rule of thumb is that at the top of each file containing code, there should be a block of comments that explains what the code does, what data comes in and what data is saved or sent out. Important variables should be identified and their purpose explained. Each block of code, whether that is a function, procedure or simply a complex algorithm should be preceded by a block of comments explaining what is happening, purpose etc.

In particularly complex algorithms, each line should have a comment which explains what is going on. This not only helps the code become easier to maintain, but gives a future programmer the most valuable of insights – what the previous programmer was thinking. Because there is always more than one way of solving a problem, what may have been obvious to the first programmer may be complete gibberish to the next – we all think differently and commenting code can go a long, long way to fixing this issue.
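
Here is a sketch of what that advice might look like in practice. The file name, the discount rules and the function name are all invented for this example. The point is the layout: a header block at the top, the important values explained, and a comment on each step.

```python
# File: discount.py (hypothetical)
# Purpose:  works out the price a customer pays after a loyalty discount.
# Data in:  the basket total in pounds and the number of years as a customer.
# Data out: the price to pay, rounded to the nearest penny.

DISCOUNT_PER_YEAR = 0.02  # 2% off for each full year as a customer
MAX_DISCOUNT = 0.10       # the discount is capped at 10%


def price_after_discount(basket_total, years_as_customer):
    # Work out the discount earned, but never go above the cap.
    discount = min(years_as_customer * DISCOUNT_PER_YEAR, MAX_DISCOUNT)
    # Take the discount off the total and round to the nearest penny.
    return round(basket_total * (1 - discount), 2)


print(price_after_discount(100.0, 10))  # 90.0
```

Notice the comments explain why the code does something (the cap, the rounding), not just what each line is.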

Almost all programming languages have the facility to add comments to code, and comments are ignored by compilers and interpreters, so there is no concern that lengthy comments will somehow make the program slower or less efficient than one without comments. In other words – there is no excuse for not commenting code.

The only problem is programmers hate writing comments because it slows them down, might break their chain of thought or simply because adding them in is… boring!

2.3.2 – Testing

Without wishing to sound dramatic, thorough testing of code can be the difference between life and death. If there is a small bug in the latest Rockstar Games AAA title, someone might be a little disgruntled when their save game doesn’t quite work as they expected. If there is a small compound rounding error in the autopilot for a passenger plane, the outcome could be the deaths of hundreds of people.

Whilst not glamorous or even particularly interesting, testing is arguably more important than writing the code for a system. People are weird and will find ways of breaking programs that programmers wouldn’t have even considered – there is no greater surprise to a system developer than when a user does something with your program that you never envisaged.

When Microsoft were developing Windows 95, usability was one of their key goals. They were trying to bring PCs to the masses, and that meant making them easy to use for people who’d never even seen a computer before. During development, they did lots of tests with members of the public and the developers would watch with equal amounts of amazement and confusion through one-way windows, as users would completely fail to complete simple tasks such as printing a document. What the developers thought was utterly, blindingly obvious was a complete mystery to novice computer users.

As the victims of the Horizon Post Office scandal discovered, computer systems are almost implicitly trusted and relied on by organisations, to the point where the results they produce can be used to bring prosecutions against innocent people. A lack of testing in the Horizon system meant that Fujitsu and the Post Office were convinced that sub-postmasters had cheated the accounting system out of thousands of pounds. Multiple convictions were made in court, changing the lives of hundreds of people and their families. Turns out the software contained errors which meant it regularly reported incorrect accounting figures.

Testing is important.

The Purpose of Testing

This is a classic exam question. Programs are tested for the following reasons:

To try and break them!
To ensure the program works as expected.
To check the program meets all user / client requirements.
To check how usable the program is – will users be confused?
To prevent crashes and data loss when the program is in use in the real world.

Ultimately, testing takes place for one overarching reason – so that problems can be discovered and fixed before software is released to users. The reasons why this is important should be obvious and they are:

It saves money. Fixing errors after release is far more expensive than during production.
Reputation – consumers and users do not like buying and using products that are broken. Their tolerance for dealing with errors is usually low and users will find alternatives if a product does not work.

Types of Testing

You’d think it would be great to be a games tester for a living. However, the reality is not the dream you think it is. Testers who worked on GTA V reported the job was mind-numbingly dull and repetitive. They would sign in to work in the morning – handing over all smartphones, watches and bags because of privacy and security concerns. Then, they would sit in a room with a certain level, scenario or mission to test and there would be a prescribed set of tests that had to be carried out that day. They would then play the same part of the game over and over and over again until the tests were complete. I’m fairly sure you’ve come up against a hard section in a game before and perhaps you overcame it in the end, but if I asked you to play the same part of a game for 8 hours straight you’d probably shrug and walk away.

Testing is an arduous, complex and long task. Testing needs to be extremely detailed and methodical so that no part of the system can be left untested. Each part, each function or button must be tested in every conceivable scenario to ensure that when the program is released, all bugs have been found and corrected. Failure to do so can result in some hefty bills for the developers, damage to their reputation or as we discussed earlier – even loss of life.

Errors in programs fall under one of two main types – syntax or logical. Syntax errors are your best friend. Logical errors can be the most frustrating things in the world to find, let alone fix.

Syntax Errors

I cannot begin to imagine how interesting a newsletter dedicated to debugging would be. No, really, I literally cannot imagine it.

Syntax errors are errors in the spelling or grammar of program code.

When you learn to program, you realise that there is a very fixed way in which code can be written. This is because a programming language has strict rules about how each statement must be written. When you write a line of code it either follows those rules or it does not, there are no grey areas here.

The grammar rules, or the way in which you are allowed to use key words in your code, are often called the “syntax.” Therefore, if you get this wrong, you generate a syntax error.

Syntax errors are easy to fix for two reasons:

Nearly all IDEs and code editors will highlight or alert you in some way to syntax errors as you write your code.
Programs will not run if they have a syntax error in them.

This is brilliant because when you click run… nothing happens! This is quite the obvious indication that something has gone wrong. Of course, an IDE will give you an error message to alert you to the fact you have syntax errors and will provide a host of tools to help you correct them. If a program does not run, then you have no choice but to fix any problems!
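
You can watch this happen in Python without an IDE. The sketch below hands a deliberately broken snippet (the IF statement is missing its colon) to Python's built-in compile function, which checks the syntax without running anything. The snippet and the variable names are made up for this example.

```python
# A tiny program with a syntax error: the IF line is missing its colon.
broken_source = "if total > 10\n    print(total)\n"

try:
    # compile() checks the grammar of the code but does not run it.
    compile(broken_source, "<example>", "exec")
    outcome = "compiled"
except SyntaxError as error:
    # Python refuses to go any further and says where the problem is.
    outcome = f"syntax error on line {error.lineno}"

print(outcome)
```

The broken code never gets the chance to run, which is exactly why syntax errors are the friendly kind.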

Logic Errors

Logic errors are nasty. If you get over the hurdle of fixing syntax errors in your code, you’ll probably be very relieved when your program runs. That must mean that it works! Well… Maybe it does. Logic errors do not stop the code from executing, instead logic errors in your program will produce unexpected results. The biggest problem here is that some logic errors may not come to light under normal use and the program may work perfectly under normal, light or simple testing conditions.

Our definition of a logic error is an error which produces unexpected results when the program is run. To be clear – the program still runs.

Logic errors are extremely hard to find because you have to then work out where the error is in the code and why the results are incorrect. This is where a range of debugging tools come into their own – breakpoints, code stepping, variable watch are all valuable tools which can help programmers to step through the code and observe exactly where incorrect results are being generated.

Some logic errors are easier to spot, for example where < has been used instead of > or a missing symbol. Others can be far more difficult such as when a formula does not produce the answers you expected or just simply the logic is so wrong you need to re-think how you coded an entire section of the program!
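
Here is the “< instead of >” mistake in action, as a small Python sketch with made-up function names. Both versions run without any complaint at all. Only one of them gives the right answer.

```python
def highest_buggy(scores):
    best = scores[0]
    for score in scores:
        if score < best:  # logic error: this should be >
            best = score
    return best


def highest(scores):
    best = scores[0]
    for score in scores:
        if score > best:
            best = score
    return best


print(highest_buggy([3, 9, 5]))  # 3 (runs fine, but this is the lowest!)
print(highest([3, 9, 5]))        # 9
```

Nasty detail: if every score in the list were the same, the buggy version would give the correct answer, which is how logic errors slip through light testing.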

Test Plans and Test Data

I would hope by now that you understand the importance of testing and why it is an essential phase of the software development process. The final piece of the puzzle is to create a test plan which will ensure two things – that all areas of the software are thoroughly tested and secondly that there is clear documentation of the tests that were carried out and the outcomes of each test.

There are three main types of test that can be carried out on a particular part of a software system. These are:

Normal – the data used is exactly as the system would expect, or you would expect the user to enter.
Incorrect/erroneous – the data used is deliberately incorrect, or of the wrong type.
Boundary/extreme – the data used is correct but it is at the very edge of what’s acceptable. For example, in a field for the day of the month, 1 and 31 would be boundary data – you can’t have days outside this range.
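
A test plan can be as simple as a table of test type, test data and expected result, which you then work through and record the outcome. The sketch below does this in Python for the day-of-the-month example. The valid_day function and the test values are assumptions chosen for illustration.

```python
def valid_day(text):
    """Accept only whole numbers from 1 to 31."""
    try:
        day = int(text)
    except ValueError:
        return False
    return 1 <= day <= 31


# Each row of the plan: (type of test, test data, expected result).
test_plan = [
    ("normal", "15", True),
    ("boundary", "1", True),
    ("boundary", "31", True),
    ("erroneous", "32", False),
    ("erroneous", "banana", False),
]

# Carry out every test and keep a record of what actually happened.
results = []
for test_type, data, expected in test_plan:
    actual = valid_day(data)
    results.append((test_type, data, expected, actual, actual == expected))

for row in results:
    print(row)
```

The results list is the documentation: for every test it shows what was entered, what should have happened and whether it did.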

These tests can be carried out at two different points of the development process:

Iteratively, during the development of the program. Usually this takes place as each small section/block/procedure/function has been completed. These tests inform the development process.

It is much easier to iteratively test code as there is usually only a small amount of code to be tested. Once a fix has been applied, tests can be repeated to check that the changes have actually worked.

Terminally – terminal (or final) testing is carried out at the end of the development process. This is to ensure that the whole program functions as expected, and that it performs as expected in a production or “real world” environment.