2.1.3 – Searching and Sorting Algorithms

This is one of the most straightforward parts of unit 2. All of the algorithms here have easy to understand methods and are sufficiently different that you shouldn’t confuse one for the other. As usual, at GCSE level, you need to know the algorithm and some advantages and disadvantages. It seems like a lot at first but once you have been through the methods you shouldn’t have too much difficulty remembering them.

Just remember, before we start, that an algorithm is a set of logical steps used to solve a problem.

In this section (click to jump)

Sorting Algorithms

Sorting algorithms are used to put data in ascending or descending order. For simplicity, we use numbers to illustrate these algorithms and show you how they work, but they also apply equally to text based data. In the exam you may be asked questions which use numbers, letters or words – you’ll be fine as long as you remember to double check whether the sort is ascending or descending!

Note that, whilst you don’t need to remember the code for these algorithms off the top of your head, the specification does say that you should be able to recognise the code. That’s a bit of a sitting on the fence situation for you because without learning the code, you’d find it hard to recognise…! My advice is that if you struggle with code in general, you should look for the key characteristics of each algorithm in the code examples and hopefully you will see some differences that stand out to help you differentiate each one.

Bubble Sort

Bubble sort is the simplest sorting algorithm there is. It’s also the most useless as you will soon find out…

Bubble Sort Method

To sort an array of numbers, bubble sort will perform the following steps:

Set a flag (boolean variable) to record if any swaps were made:
Swaps = FALSE

Starting at position 0 in the array (the start):

Step 1: Get two elements in the array - A and B
Step 2: Compare them. If element A is bigger than element B then:
        Swap the elements
        Set Swaps = TRUE
Step 3: Move one position up the array
Step 4: Repeat from Step 1 until the end of the array is reached
Step 5: If Swaps = TRUE, repeat the entire algorithm.

Bubble Sort Advantages

  • It is a very simple algorithm to implement in code
  • It is a simple algorithm to understand
  • It works on very small arrays of numbers

Bubble Sort Disadvantages

  • It is an extremely slow and inefficient algorithm
  • Bubble sort has an exponential worst case time complexity. This means that as the size of the data set is doubled, the time taken is squared. In other words – bad.
  • It will effectively take forever to sort large data sets.

Bubble Sort Code

//The following code would bubble sort an array called "sortData[]" 

swaps = true //no swaps have been made at the start but we set this flag to true so that the loop will work at least once.

//we will loop through the list repeatedly if a swap has been made
While (swaps == true)

swaps = false //no swaps have been made yet this time round the loop.

for i=0 to sortData[].length - 1 
//we will loop round the data set checking each pair of numbers to see if the first is bigger than the second

    if sortData[i + 1] > sortData[i] then 
        //the first number is bigger than the second, so need to be swapped

        temp = sortData[i + 1]
        sortData[i + 1] = sortData[i]
        sortData[i] = temp 
        //these three lines do the swap in the array
        
        swaps = true //a swap was made, so set the flag
    end if

    next i 

end while

Merge Sort

Merge sort breaks a list of numbers down into atomic elements (lists containing one item each) and then combines these lists together to sort it into order. This is often referred to as a “divide and conquer” approach. In the exam, the focus is on the reconstruction of the list rather than the first few steps involved in breaking the list down into individual elements.

Merge sort example

Imagine we have the list of numbers shown below and they need to be sorted into ascending order.

Step one: Break the list down into “atomic lists.” This is far less exciting than it sounds – an atomic list is a list with only one single element in it.

The image below shows seven numbers, all in their own list:

Step two:

  • Take a pair of lists.
  • Get the first element from each list and compare.
  • Whichever element is smallest, put this in a new list.

In this example, we compare 23 and 4. 4 is obviously smallest so it is moved in to the new list as shown:

The process repeats:

Take the first element in each list, compare and then add the smallest to the new list.

This continues until all elements from the two lists have been placed in a new list. However, there is a problem! What do we do when one list is empty and the other still has some elements?

In our example, the next step is to compare 23 with….? In the case of an empty list we assume it contains “infinity” – therefore all numbers left to compare will be smaller.

So, 23 is less than infinity, so we add it to the new list.

Our first pair of lists is now empty and we move to the next pair of lists and repeat the exact same actions – compare the first number in each list, add the smallest to the new list.

We have now combined 8 and 99 successfully into a new list. Again, repeat the process – get the next two lists, compare and combine:

We now hit our final problem – what do you do if there is an odd number of lists? As you can see above, the list containing the number 1 is on its own – we do not have another list to combine this with.

The answer is simple – just leave it! In the next iteration, we will again combine pairs of lists and at that point we can fix the problem of the lonely list…

After one iteration of the algorithm, our collection of lists now looks like this:

Step 3: Go back to step two and repeat the process:

  • Take a pair of lists,
  • Get the first element from each list
  • Compare – place the smallest element in a new list
  • Continue this process until all elements have been added to the new list.

Take the first pair of lists…

Compare the first elements from each – 4 and 8. 4 is smallest, so place it in the new list.

Continue until all elements have been placed in the new list:

Take the final two lists, compare and combine as before:

All elements have now been compared and combined into a new list.

This is the final iteration.

We now have only two lists remaining. The comparing and combining is exactly the same as before:

When all numbers have been compared and placed in the new list, giving only one list containing all elements, then the list is sorted:

Merge sort code

I'll let you in to a secret. At GCSE you don't need to know the code for Merge Sort. This is because it uses a dark, mythical technique passed down by generation after generation of the most beardy computer scientists. Only those who are true of spirit and brave of heart ever get to know of it's legendary powers. Only the foolish would attempt to use it without the guidance of a 10th dan master of algorithms.
The force of which we speak is.... Recursion!

Seriously, we do it at A-Level and it makes your brain hurt.

Insertion Sort

Everyone says it and I can’t help myself either – insertion sort is the method you’d probably use to put a pile of cards in order. You start with an unsorted list and then gradually build a sorted list by taking one element at a time and… inserting it into the correct place in the sorted list.

There are a couple of ways of doing this:

  • Take an element off the unsorted list
  • Compare with the first element in the sorted list
  • if it is smaller, insert at position one
  • If not, move up the list one element/position/index
  • Repeat as above – compare and if smaller, insert. If not, move up one element in the list

In code you can just add the unsorted element to the end of the sorted list, then compare with the previous element and swap them over if the new element is smaller. Have a look at the code example below.

Here’s a pseudocode algorithm that OCR printed in a recent exam:

names = ["Kareem", "Sarah", "Zac", "Sundip", "Anika"] //an array of names to sort

for count = 1 to names.length – 1 //loop through the array
    pos = count //where to start in the sort in the array

    //loop round, moving elements up the array, past the current
    //element until it is in the correct position
    while (pos > 0 and names[pos] < names[pos – 1])
        temp = names[pos]
        names[pos] = names[pos – 1]
        names[pos – 1] = temp
        //move an element up the array

        pos = pos – 1 //move back one element in the array
    endwhile
next count

Method and Example

Set up:

Make a new list – call this the “sorted list.”

Take one element from the unsorted list – place this in the sorted list. It’s in the correct place!

Main algorithm:

  1. Take an element from the unsorted list.
  2. Place it at the end of the sorted list.
  3. Is the new element bigger than the previous element in the sorted list?

                        If yes – leave it, it is in the correct place.

                        If no, move it one place to the left. Repeat step 3 until it is in the correct place.

Example:

Sort the following list of numbers:

23491125
Sorted ListUnsorted List
23   23 is placed in the sorted list. It is the first element so it is already in the correct place.  49, 1, 12, 5
23, 49   49 is placed in the sorted list at the end. It is compared to 23. Because 49 is bigger than 23 it is in the correct place.  1, 12, 5
23, 49, 1   1 is placed at the end of the list. It is compared with 49. 1 is not bigger so it is moved down one place:   23, 1, 49   1 is compared with 23. 1 is not bigger so it is moved down one place:   1, 23, 49   1 is now at the start of the list and in the correct place.  12, 5
1, 23, 49, 12   12 is placed at the end of the list. 12 is compared with 49, it is not bigger than 49 so it is moved down one place.   1, 23, 12, 49   12 is compared with 23, it is not bigger than 23 so it is moved down one place.   1, 12, 23, 49   12 is compared with 1, it is bigger so it is now in the correct place.  5
      1, 12, 23, 49, 5   5 is placed at the end of the sorted list. It is compared with 49. It is not bigger so it is moved down one place.   1, 12, 23, 5, 49   5 is compared with 23. It is not bigger so it is moved down one place.   1, 12, 5, 23, 49   5 is compared with 12. It is not bigger, so it is moved down one place.   1, 5, 12, 23, 49   5 is compared with 1, it is bigger so it is in the correct place. 
1, 5, 12, 23, 49The unsorted list is empty so stop.

Searching Algorithms

Searching algorithms are used when we need to find a specific item in a given data set.

And people say maths isn’t exciting….

This is by far the simplest algorithm you will ever learn.

Method:

  • Take a list of values.
  • Start with the first element in the list.
  • Compare with the target value – if they match, report found and stop.
  • If they do not match, move to the next element in the list.
  • Repeat until a match is found or the end of the list is reached.

The above method is a more complex way of saying “look at each element, one after the other, until you find the value you’re looking for.”

There is no intelligence or optimisation in Linear Search – it is a brute force method in every sense of the phrase, systematically trying until a value is found or not found. This might seem like an inefficient or even stupid method of finding a piece of data but it does have some advantages that should not be overlooked. You must also remember that computers love this kind of repetitive logic task and is something that an average CPU can do many billions of times per second.

“…and I still haven’t found what I’m looking for.” Should’ve used Linear Search, Bono. Schoolboy error.

Advantages of linear search are:

  • The list does not need to be in order or sorted in any way. This reduces “overheads” meaning work that must be done by the CPU before the task of finding a piece of data can be completed.
  • It can work with multiple data types.
  • It is extremely simple to implement in code.
  • It has a worst-case time complexity which grows in a linear manner – meaning the time taken to find an element as the data set grows is proportional to the size of the data set. In simple terms, things don’t rapidly get out of control if the data set grows in size.

Obvious downsides of linear search are:

  • The larger the data set grows, the longer the average time to find an element becomes.
  • Performance is affected by where the target is in the data set – the closer to the end of the list, the worse performance will be.

Example code:

Counter = 0

Found = FALSE

WHILE found == False AND counter <= list.length()

     IF list[counter] = find THEN

          found = True

          PRINT (“Found at position: “ + counter)

        ELSE

          counter = counter + 1

        ENDIF

ENDWHILE

IF found = False THEN

   PRINT (“The item is not in the list”)

ENDIF

Worked example

Find the value 3.

Step 1: start at index position 0, get the element in this position, compare to the target of 3.

22 is not equal to the target of 3 so move on.

The current index is now 1. Repear the process – get the element at position 1, compare to the target of 3.

909 is not equal to 3 increment the index counter to 2, move to the next element in the list.

The element has been found, we can no return “true” or “success” and also report at which index it was found.

The weird thing about Binary Search is that there’s no binary involved. The term comes from a binary choice that is made during each iteration – do we throw away bigger numbers or smaller numbers?

Binary Search is one of the most effective and elegant algorithms in computing. Not only is it relatively simple to understand, its performance is incredible – even for extremely large data sets. Binary Search has only one downside, that the list must be in order (sorted) before the search can take place. This fact should not be overlooked as with large data sets this could result in huge overheads. If you were planning to search a data set of billions of items, it would be wise to consider storing elements in order in the first place!

Method:

Pre-requisite: The list must be sorted.

  1. Find the median element (middle)
  2. Should there be no direct median, simply choose the element to the right (or left, it actually doesn’t matter as long as you stick with this decision each time).

Once the median has been found:

  1. Compare the median with the target – if they match, stop.
  2. If the median is lower than your target, discard all smaller elements in the list (including the median)
  3. If the median is higher than your target, discard all larger elements in the list (including the median)
  4. Repeat the algorithm with the new, shorter list.

Example:

“Find the letter X “

Iteration 1:

Find the median, which is “M”

M does not match with the target of X.

M is SMALLER than X so we discard all smaller letters, including M.

Our new list is as below:

Iteration 2:

The median is T.

T does not match with the target of X.

T is SMALLER than X, so we discard all smaller letters, including T.

Our new list is as below:

Iteration 3:

The median is X.

X is equal to the target of X, so we have found our target at position 23. Stop.

Example Code:

The example below finds the number 7 in a list called “list”

Find = 7, Start = 0, End = list.length()

Found = False

WHILE Found == False AND Start <= End

     Mid = (Start + End) DIV 2

     IF list[Mid] = Find THEN

        PRINT(“Found at ” +  Mid)

        Found = True

     ELSE

        IF List[Mid] > Find THEN

           End = Mid - 1

        ELSE

          Start = Mid + 1

        ENDIF

     ENDIF

ENDWHILE

Note, the code for binary search is often not implemented in this way. The process normally used is called “recursion” which is beyond the scope of your GCSE. For this reason, it is unlikely you will see many code based questions on Binary Search in your exam – but not impossible!

Summary:

  • Binary search is an extremely efficient searching algorithm
  • It has a logarithmic worst case time complexity, which is a posh  way of saying “it hardly slows down at all when the size of the data set increases.”
  • Binary search must be performed on a sorted list or it will not work
  • Binary search performs so well because it is reducing the size of the data set by half during each iteration.
  • You may not see any great improvement over other searching algorithms if you are searching a very small data set.