3 Functions and Loops

3.1 Functions

In mathematics, a function is a rule that assigns to each element of one set an element of another set. Here we give a more “practical” definition.

In the realm of programming, a function is a reusable piece of code that performs a specific task. Think of it as a mini-program within your program, a tool you create to do a job so you don’t have to. Like any program, you give it an input, it does some voodoo magic, and it spits out an output.

What’s the point of a function you might ask? Well, in programming, we use functions when we have tasks that must be performed again and again. When we notice a repetitive task occurring, it is a good idea to write a function that performs this task. If you find yourself copying and pasting the same piece of code more than three times, it’s time to stop and think: “Should I write a function for this?” Remember, in the world of coding, being lazy is often a good thing! We aim for efficiency, not repetitive strain injury.

Imagine if, every time you needed a square root, you had to write the code to compute it all over again. In practice, you don’t have to! There’s a function that already does the job for you: sqrt. In fact, it won’t come as a surprise that we’ve already encountered many functions in the past weeks. For instance:

  • Mathematical functions like sqrt() or sin() in R, and math.sqrt() or math.sin() in Python.
  • Functions like np.array() or c() to create vectors.
  • Functions like length() in R or len() in Python to get information about our data structures.

Again, each function mentioned above has an input and an output: sqrt takes a number as input and returns its square root, while length or len take a whole vector as input and return its length.

Another incredibly useful function is the help() function, available in both R and Python. This function takes a function as input and provides us with information about said function as output! For example:
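As a small Python illustration (math.sqrt is just an arbitrary choice of function to look up), calling help prints the documentation to the console. The same text is stored in the function’s docstring, which we can also read directly:

```python
import math

# Ask Python for the documentation of math.sqrt; the text is printed to the console.
help(math.sqrt)

# The same description is stored in the function's docstring.
doc = math.sqrt.__doc__
print(doc)
```

In an interactive session you would normally type only the help(...) line.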

Whenever you meet a built-in function, or a function from a documented external library, you can call help on it to get an explanation! Documentation is a requirement in R libraries, so pretty much all the functions you will meet are documented. Unfortunately this is not the case for Python, where documenting functions is not a requirement, so help will not always return something useful.

3.1.1 Writing new functions

The real power of functions comes when we start creating our own. We can write new functions that perform specific tasks, tailored to our needs. This allows us to do more complex, interesting, and fun things with our code. So let’s dive in and learn how to create our own functions! We can wrap our code in a function, and every time the function is called, this code is run. This is incredibly useful for tasks we need to perform multiple times. Let’s build for example a simple function that converts pounds (lbs) to grams (g).

Mathematically, this is simply done by the formula: \[ g = 453.5924 * lbs \]

In R, we define a function using the function() command. The arguments of the function are placed within the parentheses. Here’s how we can create a function in R:

In this function, lbs is the input (or argument), and the function returns the equivalent weight in grams. In R, we create a new function with the function statement and then assign it to a variable, which will hold our function. The return() statement is used to specify the result that the function should output.

In Python, we define a function using the def keyword. The arguments of the function are placed within the parentheses. Again we need to use indentation: this is crucial in Python as it determines the code blocks. Here’s how we can create a function in Python:
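A minimal sketch of what such a Python definition could look like, using the names mentioned in the text (lbs_to_grams, lbs, grams) and the conversion factor from the formula above:

```python
def lbs_to_grams(lbs):
    # Convert a weight in pounds to grams.
    grams = 453.5924 * lbs
    return grams
```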

In this function, lbs is the input (or argument), and the function returns the equivalent weight in grams. Following the def keyword we need the function name, followed by the argument in parentheses and a colon. The return statement is used to specify the result that the function should output.

Now, once we have made the new function, we can call it with:
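A possible call in Python is sketched here. The definition is repeated so that the snippet runs on its own, and the input value 2 is just an illustrative choice:

```python
def lbs_to_grams(lbs):
    # Convert a weight in pounds to grams.
    grams = 453.5924 * lbs
    return grams

# Call the function with 2 pounds and store the result.
two_pounds_in_grams = lbs_to_grams(2)
print(two_pounds_in_grams)
```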

By creating functions like these, we can make our code more efficient and easier to read. Plus, it saves us from having to remember the conversion rate each time we want to convert pounds to grams!

The name of the function lbs_to_grams and its argument lbs are just names that I chose. There are a few guidelines that could be useful when naming functions:

  • Names should be lowercase.

  • Use an underscore, _, to separate words within a name.

  • Strive for names that are concise and meaningful (this is not easy!).

  • Avoid existing function names in R and Python, such as length() or len().

Also, you might have noticed we made a variable within the function code, grams. But what does it mean to have variables within functions? The variables used inside a function are local to that function. Think of your function as a guarded sandbox, where no child leaves unless you specifically tell them to. This sandbox is called an environment. Let’s cover this concept formally!

Environments

In programming, an environment refers to a structure that holds variables. When you create a variable in a program, the environment is where this variable lives. The environment keeps track of the variable’s name and its current value.

There are two types of environments: global and local.

  • The global environment is the default environment where your variables live unless you specify otherwise. When you create a variable at the top level of your script, it’s stored in the global environment.

  • A local environment is created when you call a function. Each time a function is called, a new local environment is created for that function call. This environment holds the variables that are created within the function. These variables are only accessible within the function call and cease to exist once the function call is over.

Let’s consider the lbs_to_grams function we made:
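Here is a Python sketch of the situation. How glo_grams is created is an assumption on my side (any assignment at the top level of the script would do), and the try/except is only there so that the snippet keeps running after the expected error:

```python
def lbs_to_grams(lbs):
    # lbs and grams live in the local environment of this function call.
    grams = 453.5924 * lbs
    return grams

# Created at the top level of the script, so it lives in the global environment.
glo_grams = lbs_to_grams(2)

try:
    print(grams)  # grams only existed while the function call was running
    grams_visible = True
except NameError:
    grams_visible = False
    print("grams is not defined outside the function")
```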

In these functions, lbs and grams are variables in the local environment of the function. They are created when the function is called and cease to exist when the function call is over. If you try to access grams outside of the function, you’ll get an error because grams is not in the global environment.

On the other hand, glo_grams is in the global environment because it’s created at the top level of the script, not within a function. You can access glo_grams anywhere in your script even within a function.

This distinction between global and local variables helps keep our code clean and reduces the chance of errors. It ensures that the function does its job without interfering with the rest of our script! However, it can be prone to errors too. Say, for instance, that I make a typo in the argument name of my function above and, for whatever reason, I have also created an lbs variable in the global environment. For instance:
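In Python, this kind of mistake could look like the sketch below. The value given to the global lbs and the inputs passed to the function are arbitrary choices for illustration:

```python
lbs = 10  # a global variable that happens to share its name with the one used in the function

def lbs_to_grams(lb):  # typo: the argument is called lb, not lbs
    # lbs is not a local variable here, so Python falls back on the global lbs.
    grams = 453.5924 * lbs
    return grams

result_1 = lbs_to_grams(1)
result_2 = lbs_to_grams(5)
print(result_1, result_2)  # both calls return the same value
```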

What happened above is that I pass the input through the lb argument (without the s), but lb is never used inside the function: the line grams = 453.5924 * lbs refers to lbs. Since the interpreter can’t find any lbs within the local environment, it falls back on the lbs I defined in the global environment. Hence, no matter what I feed to the function, it returns the conversion of the global lbs. For this reason, try not to give the variables inside your functions the same names as those in the rest of your script. What happens if I do give a local variable and a global variable the same name? In this case, the function will use the local lbs variable, and code outside the function will use the global lbs variable, but this is still risky and prone to bugs.

For those who are newer to programming, don’t worry if the idea of dealing with bugs and errors seems daunting. One of the most effective ways to understand what’s happening in your code is to use print statements. If you were confused by this section, try to add those print statements to display the value of variables lbs and grams within the functions above! In fact, by strategically placing print statements in your code, you can see the values of variables at different points in your program’s execution. This can help you understand how your code is working and where potential issues might be. As you gain more experience and confidence, you’ll naturally start to develop more advanced strategies for managing and resolving bugs in your code. Remember, everyone was a beginner once, and every expert has made plenty of mistakes along the way. So don’t be discouraged by bugs in your code - they’re just opportunities to learn and improve!

NOTE for the more experienced users. It’s worth noting that debugging tools can be incredibly helpful for identifying and resolving issues in your code. A debugger is a program that helps you inspect what’s happening in your program while it’s running. It allows you to pause your program, inspect the values of variables in a given environment at any point in time, and step through your code line by line. This can be particularly useful for identifying issues like the one we’ve discussed above. RStudio includes a built-in debugger that can be a great help in these situations. You can learn more about debugging in RStudio in this guide.

Multiple arguments

Say we now want to convert pounds to milliliters. To convert from mass to volume, we need an additional piece of information: the specific mass. In physics, the specific mass (also known as the volumetric mass density) of a substance is its mass per unit volume.

Fortunately for us, functions can take multiple arguments, allowing us to give more elements to the local environment of the function. This means we can customize the function’s behavior based on these inputs. Let’s see how we can add more arguments to a function.

We’ll create a new function to convert pounds to milliliters. This function will take a second argument: specific_mass. We’ll use the function we created earlier to convert pounds to grams, and then we’ll convert grams to milliliters by dividing by the specific_mass (expressed in grams per milliliter).

Here’s how we can do this in both R and Python:
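A possible Python version is sketched below. The name lbs_to_ml is my own choice (the text does not name the function), and specific_mass is assumed to be expressed in grams per milliliter:

```python
def lbs_to_grams(lbs):
    # Convert a weight in pounds to grams.
    return 453.5924 * lbs

def lbs_to_ml(lbs, specific_mass):
    # First convert pounds to grams, then grams to milliliters.
    # specific_mass is assumed to be in grams per milliliter.
    grams = lbs_to_grams(lbs)
    ml = grams / specific_mass
    return ml
```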

To run the functions we just made:
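For instance, in Python (again with the hypothetical name lbs_to_ml, written compactly so the snippet runs on its own; the specific masses are illustrative values in g/ml):

```python
def lbs_to_ml(lbs, specific_mass):
    # Pounds -> grams -> milliliters; specific_mass is assumed to be in g/ml.
    return 453.5924 * lbs / specific_mass

water_ml = lbs_to_ml(2, 1.0)   # 2 pounds of water
oil_ml = lbs_to_ml(2, 0.92)    # 2 pounds of oil, which is lighter than water
print(water_ml, oil_ml)
```

Since oil has a lower specific mass than water, the same weight takes up more volume.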

Let’s say that 90% of the time, these calculations are for water. Then, to be more efficient, rather than having to supply the specific mass of water every time, we can set it as the default value of the argument.

This is where default arguments in functions come into play. They are incredibly useful as they allow us (and any potential future user) to make certain parameters optional. This makes the function easier to use and the code more readable in general!

Here’s how we can do this in both R and Python:
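A minimal Python sketch, assuming a specific mass of 1 g/ml for water and keeping the hypothetical name lbs_to_ml:

```python
def lbs_to_ml(lbs, specific_mass=1.0):
    # specific_mass defaults to that of water (assumed 1 g/ml).
    grams = 453.5924 * lbs
    return grams / specific_mass

water_ml = lbs_to_ml(2)                     # no second argument: the default is used
oil_ml = lbs_to_ml(2, specific_mass=0.92)   # the default can still be overridden
print(water_ml, oil_ml)
```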

As you can see, default arguments make our function more flexible and easier to use. They allow the function to handle a wider range of scenarios while keeping the code clean and readable.

3.1.2 Exercise: cups to grams converter

So I once watched this movie with a small rat chef named Remy. Like it’s a very popular one, but I can’t say the title for copyright reasons. Anyway, let’s say that one day, Remy decides to leave his home in France and set sail for the culinary world of the United States.

But as soon as Remy gets onto American soil, and he starts to explore American recipes, he encounters a problem. All the measurements are in cups! Back in France, he was used to grams and liters. “Mon Dieu!” he exclaimed, “How am I supposed to cook with these cup measurements?” [Imagine this phrase with a French accent].

But Remy is not a rat to be easily defeated: To make his life easier, he decides to hire a programmer on Fiverr to convert cups to grams. You are that programmer.

As we did with the functions we created before, you should:

  1. Create a new function cups_to_ml that takes as input the number of cups and returns as output the corresponding value in milliliters. Use the relation \[ ml = 236.588 * cups \]
  2. Create a new function called ml_to_grams. This function, as above, will need the specific weight, but now the relation is given by the inverse: \[ g = ml * \rho, \] where \(\rho\) is the specific weight.
  3. Create a last function called cups_to_grams. This function should:
    1. Take as input two arguments: the number of cups, and a string that specifies an ingredient, e.g. "flour" or "water". Default this second argument to water.

    2. Convert the amount of cups in milliliters using the function cups_to_ml.

    3. With an if-else if-else statement, it should check the argument ingredient, and set a specific_weight variable based on the ingredient. You can find the values in the table below:

      Ingredient   Specific weight (g/ml)
      water        1
      flour        0.53
      oil          0.92
      oat milk     1.03
    4. Call the ml_to_grams function with the relative ml and specific weight computed above, and return the result.
  4. Run the function to convert the following:
    1. 2 cups of water
    2. Half a cup of oil
    3. 3 cups of oat milk
    4. 2 cups of flour plus one cup of water

3.2 State and For Loop

Normally, if you have to run some operations on multiple objects, you would store these objects in a vector, and then work with the vector directly. This is what we’ve been doing in the past weeks, and it works most of the time. For instance, let’s say we have a vector called measurements in lbs and we want to convert all these measurements to grams. Then, we can simply take the lbs_to_grams function we made above, and run it on the measurements vector:
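In Python with numpy, this could look like the sketch below. The values in measurements are made up for illustration:

```python
import numpy as np

def lbs_to_grams(lbs):
    # Works on a single number or on a whole numpy array.
    grams = 453.5924 * lbs
    return grams

measurements = np.array([1.0, 2.5, 10.0])   # hypothetical measurements in pounds
measurements_g = lbs_to_grams(measurements)
print(measurements_g)
```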

This will run the function on each element of the vector independently.

More formally: in the first two weeks, we learned how to work with vectors and vectorised functions, designed to operate on whole vectors directly. Technically, we say that vectorised functions are trivially parallelizable because there’s no dependency between elements (the value of an element of the output vector does not depend on any other element).

However, while you should use vectorised functions as much as you can, they are not suitable for all programming tasks, particularly when the computation of an element depends on the previous ones, i.e., when there is a state involved.

A state is a scenario where certain steps in your code must be executed in a specific order because the output of one step is the input to the next step. In these circumstances, we find an answer in looping. Loops allow us to execute a block of code multiple times, which is exactly what we want in these scenarios.

The Fibonacci sequence is a classic example of a state. If you’re not familiar with it, the Fibonacci sequence is a series of numbers in which each number is the sum of the two preceding ones, usually starting with 0 and 1. The first 10 values in the series will be:

This table shows the first 10 numbers of the Fibonacci sequence, starting from \(r_{0}\) to \(r_{9}\).
\(r_{i}\) \(r_{0}\) \(r_{1}\) \(r_{2}\) \(r_{3}\) \(r_{4}\) \(r_{5}\) \(r_{6}\) \(r_{7}\) \(r_{8}\) \(r_{9}\)
Value 0 1 1 2 3 5 8 13 21 34

More formally, the Fibonacci sequence is defined by the recurrence relation:

\[ r_{i} = r_{i - 1} + r_{i - 2} \quad \text{for}\quad i \geq 2, \]

with initial conditions \(r_{0} = 0\) and \(r_{1} = 1\).

Unsurprisingly, you have already met “for” statements in mathematics! Just as a mathematical “for” states that a relation holds for every index in a range, a “for” loop in programming is a control flow statement that executes a block of code once for every value in a sequence.

Therefore, to generate the Fibonacci sequence using a “for” loop, we could use the following algorithm:

  1. Start by defining the first two numbers in the sequence, 0 and 1.
  2. For a given number of iterations, do the following:
    • Calculate the next number in the sequence as the sum of the previous two numbers.
    • Update the previous two numbers to be the last number and the newly calculated number.

We will return to the Fibonacci example after having introduced the for loop syntax.

3.2.1 The syntax

As the syntax for looping in R and Python is quite different, the R and Python chunks are presented separately. You should be able to read each language independently.

We can code a for loop as follows.

In R:

## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10

In this R code, to explain the syntax:

  • for (i in 1:10) is the start of the for loop. The iterator i goes from 1 to 10. 1:10 creates a vector: this is the sequence of values we want to iterate over.
  • print(i) is the code chunk that we want to repeat. In this case it prints the current value of i.
  • i is the object that stores our current index. It gets updated at each step over the sequence of values from 1 to 10.
  • The for loop ends when it has exhausted the sequence, i.e., when i has taken all values from 1 to 10.

We can also use the for loop to iterate over the elements of a vector directly, rather than their indices. For example, a for loop that prints the elements of a vector:

## [1] "apple"
## [1] "banana"
## [1] "cherry"

In this case, fruit is the iterator that gets updated at each step over the sequence with the current element of the sequence. The for loop ends when it has exhausted the sequence.

Try to edit both code chunks above, by adding a new variable called iter_counts. Initialize this variable at 0, e.g. iter_counts <- 0, and update it in the cycle with iter_counts <- iter_counts + 1. Print it at the end: what’s the value for the first cycle? And for the second?

In Python:

In Python, we can print the numbers from 1 to 10 using a for loop as follows:
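A minimal version of such a loop, consistent with the syntax explained below:

```python
# range(1, 11) produces the integers 1, 2, ..., 10 (the stop value 11 is excluded).
for i in range(1, 11):
    print(i)
```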

## 1
## 2
## 3
## 4
## 5
## 6
## 7
## 8
## 9
## 10

In this Python code, to explain the syntax:

  • for i in range(1, 11) is the start of the for loop. The iterator i goes from 1 to 10: range includes the start value but excludes the stop value, which is why we write 11. This is the sequence we want to iterate over. The range function is analogous to np.arange from numpy, but it generates a different object, called “range”, that is designed for iteration.
  • print(i) is the code chunk that we want to repeat. It prints the current value of i.
  • i is the object that stores our current index. It gets updated at each step over the sequence.
  • The for loop ends when it has exhausted the sequence, i.e., when i has taken all values from 1 to 10.

We can also use the for loop to iterate over the elements of a numpy array directly, rather than their indices. For example, a for loop that prints the elements of a numpy array:
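A sketch of this loop, using the array and variable names described in the explanation below:

```python
import numpy as np

arr = np.array(["apple", "banana", "cherry"])

# fruit takes the value of each element of arr in turn.
for fruit in arr:
    print(fruit)
```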

## apple
## banana
## cherry

In this Python code:

  • arr = np.array(["apple", "banana", "cherry"]) creates a numpy array with the elements “apple”, “banana”, and “cherry”.
  • for fruit in arr: is the start of the for loop. The iterator fruit goes over each element in the numpy array arr.
  • print(fruit) is the code chunk that we want to repeat. It prints the current value of fruit.
  • fruit is the object that stores the current element of the array. It gets updated at each step over the sequence.
  • The for loop ends when it has exhausted the sequence, i.e., when fruit has taken all values in the numpy array arr.

Try to edit both code chunks above, by adding a new variable called iter_counts. Initialize this variable at 0, e.g. iter_counts = 0, and update it in the cycle with iter_counts = iter_counts + 1. Print it at the end: what’s the value for the first cycle? And for the second?

3.2.2 Coding The Fibonacci sequence

Having covered how to write a for loop, we come back to our Fibonacci example. Unlike the loops above, to implement our algorithm we need to update a variable within the loop. In our case, this variable is a vector storing our Fibonacci sequence, called fibonacci. Before the first iteration, the vector is 2 elements long (the first two numbers of the sequence), and as we go through the loop, we add more and more elements to it.

In the R code:

  • fibonacci <- c(0, 1) initializes the first two numbers in the Fibonacci sequence.
  • for (i in 3:20) is the start of the for loop. The iterator i goes from 3 to 20. This is the sequence we want to iterate over.
  • fibonacci[i] <- fibonacci[i - 1] + fibonacci[i - 2] is the code chunk that we want to repeat. It calculates the ith number in the Fibonacci sequence as the sum of the two preceding numbers and appends it to the fibonacci vector. Given that at iteration i there is no element i in the vector yet (the vector is still i - 1 elements long), R automatically extends the vector by one element. Alternatively, you can use fibonacci <- c(fibonacci, fibonacci[i - 1] + fibonacci[i - 2]) to concatenate an extra vector (with one element) to the existing vector.

In the Python code:

  • fibonacci = [0, 1] initializes the first two numbers in the Fibonacci sequence.
  • for i in range(2, 20) is the start of the for loop. The iterator i goes from 2 to 19. This is the sequence we want to iterate over.
  • fibonacci.append(fibonacci[i - 1] + fibonacci[i - 2]) is the code chunk that we want to repeat. It calculates the element at index i of the Fibonacci sequence as the sum of the two preceding numbers and appends it to the fibonacci list using the append() method. The append method extends our list, adding a new element at the end. Alternatively, you can use fibonacci = fibonacci + [fibonacci[i - 1] + fibonacci[i - 2]] to concatenate a one-element list to the existing one.
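Putting the Python bullets above together gives the following sketch, which stores the sequence in a plain Python list:

```python
fibonacci = [0, 1]  # the first two numbers of the sequence

for i in range(2, 20):
    # The new element is the sum of the two preceding ones.
    fibonacci.append(fibonacci[i - 1] + fibonacci[i - 2])

print(fibonacci)
```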

Our vectors, where we stored our results at every iteration, should be 20 elements long, containing the sequence.


##  [1]    0    1    1    2    3    5    8   13   21   34   55
## [12]   89  144  233  377  610  987 1597 2584 4181


## array([   0,    1,    1,    2,    3,    5,    8,   13,   21,   34,   55,
##          89,  144,  233,  377,  610,  987, 1597, 2584, 4181])

Before you proceed, make sure you understand what’s going on inside the for loop. Add extra print statements to understand what’s happening at every iteration, e.g. try to add print(i), print(fibonacci[i-1]) and print(fibonacci[i-2]) before the computation, or print(fibonacci) after the computation has been done. print statements are extremely useful for debugging for loops!

Safer for loops

The for loop above is straightforward and works fine for the specific task of computing the Fibonacci sequence alone. However, it could potentially lead to issues if you wanted to modify or extend the code. For example, if you wanted to change the calculation logic or use it in a different context, you would need to modify the loop itself. This could introduce errors and make the code harder to maintain.

However, there is an approach that is generally safer and more robust: that of working with functions within your loops.

Have a look at the code below:
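One way this could be written in Python is sketched here. The function names update_fibonacci and compute_fibonacci come from the text, while their exact arguments and return values are my assumptions:

```python
def update_fibonacci(fibonacci):
    # Return a new list with the next Fibonacci number added at the end.
    next_value = fibonacci[-1] + fibonacci[-2]
    return fibonacci + [next_value]

def compute_fibonacci(n):
    # Return the first n Fibonacci numbers (n is assumed to be at least 2).
    fibonacci = [0, 1]
    for _ in range(2, n):
        fibonacci = update_fibonacci(fibonacci)
    return fibonacci

print(compute_fibonacci(10))
```

Everything the loop needs is passed in as an argument or created inside the functions, so nothing depends on variables defined elsewhere in the script.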

This solution is better in a lot of ways. The calculation logic of the Fibonacci sequence is encapsulated within its own function, update_fibonacci. Now, suppose you found a better method to compute the next number: you’d only need to change that part (spoiler: you can, ask a GPT model if it has ways to improve on update_fibonacci).

Also, the computation of the full Fibonacci sequence is encapsulated in its own function, compute_fibonacci! This makes the code more organized and less prone to errors due to variables defined elsewhere as it does not touch global variables.

For this reason, you can easily reuse the compute_fibonacci function in other parts of your code or even in different programs! For instance, this function might turn out to be useful in the next exercise, where you’ll have to call it with n=10.

3.2.3 Exercise

Now write a new function, my_cusum, that computes the cumulative sum of a list of numbers using a for loop. Use the function to calculate the cumulative sum of the first 10 elements of the Fibonacci sequence.

The function should:

  • Take as input a vector

  • Initialize a new vector to store the cumulative sum

  • Then, in a cycle, update this vector with the cumulative sum of the elements of the input vector

    • The output vector should be of the same length as the input vector

    • The first element of the output vector should be the same first element of the input vector

  • Using the code above, generate a Fibonacci sequence of 10 numbers

  • Run the function my_cusum(fibonacci) to obtain the cumulative sum of the first ten numbers of the Fibonacci sequence.

  • Compare the output of my_cusum with the built-in function cumsum (in R) or numpy’s np.cumsum (in Python).

3.3 More about looping

In this section we give a few extra details and concepts you should know about looping. If you are familiar with the for loop above, they should be fairly straightforward to understand!

3.3.1 Nested loops

Nested loops are useful when we have to repeat a block of code for each combination of elements from two or more sequences. This can be incredibly useful in many situations, such as when we want to perform an operation for each pair of elements in two lists or vectors.

A simple example, for instance, could be a nested loop that prints the multiplication table from 1 to 4.
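In Python, a sketch of this nested loop could be the following, using the f-string print statement discussed in the note further down:

```python
for i in range(1, 5):        # outer loop: first factor, 1 to 4
    for j in range(1, 5):    # inner loop: second factor, restarts for every i
        print(f"{i} x {j} = {i*j}")
```

For each value of i, the inner loop runs through all four values of j, which gives 16 lines in total.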


## [1] "1 x 1 = 1"
## [1] "1 x 2 = 2"
## [1] "1 x 3 = 3"
## [1] "1 x 4 = 4"
## [1] "2 x 1 = 2"
## [1] "2 x 2 = 4"
## [1] "2 x 3 = 6"
## [1] "2 x 4 = 8"
## [1] "3 x 1 = 3"
## [1] "3 x 2 = 6"
## [1] "3 x 3 = 9"
## [1] "3 x 4 = 12"
## [1] "4 x 1 = 4"
## [1] "4 x 2 = 8"
## [1] "4 x 3 = 12"
## [1] "4 x 4 = 16"


## 1 x 1 = 1
## 1 x 2 = 2
## 1 x 3 = 3
## 1 x 4 = 4
## 2 x 1 = 2
## 2 x 2 = 4
## 2 x 3 = 6
## 2 x 4 = 8
## 3 x 1 = 3
## 3 x 2 = 6
## 3 x 3 = 9
## 3 x 4 = 12
## 4 x 1 = 4
## 4 x 2 = 8
## 4 x 3 = 12
## 4 x 4 = 16

NOTE for Python users. In the print statement print(f"{i} x {j} = {i*j}") we used f-string formatting, which was introduced in Python 3.6 (the code above will break on previous versions). It’s a way to embed expressions (like i*j) inside strings, using curly braces {}. The expressions will be replaced with their values when the string is created. The letter f at the beginning of the string tells Python to allow these embedded expressions.

In our example {i} and {j} will be replaced by the values of the variables i and j, and {i*j} will be replaced by the result of the expression i*j. So if i is 2 and j is 3, the string would become "2 x 3 = 6". F-string formatting can be very useful for debugging because it allows you to easily insert the values of variables into strings, which you can then print to see what’s happening in your loops!

3.3.2 While loops

Similarly to for loops, we have while loops. They allow us to repeat a block of code as long as a certain condition holds. This can be incredibly useful in many situations, such as when we want to perform an operation until a certain threshold is reached.

We can see how a while loop works below:
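A Python sketch that matches the explanation further down:

```python
i = 1              # initialize the counter
while i <= 10:     # the condition is checked before every iteration
    print(i)
    i += 1         # without this update the loop would never end
```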


## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10


## 1
## 2
## 3
## 4
## 5
## 6
## 7
## 8
## 9
## 10

In these code chunks:

  • i <- 1 and i = 1 initialize the counter at 1.
  • while (i <= 10) and while i <= 10: start the while loop. The condition i <= 10 is what we check at each step. If it’s true, we execute the code chunk inside the loop.
  • print(i) and print(i) are the code chunks that we want to repeat. They print the current value of i.
  • i <- i + 1 and i += 1 are crucial. They update i at each step, ensuring that our condition will eventually be false. Without these lines, i would always be 1, the condition would always be true, and the while loop would run indefinitely.
  • The while loop ends when the condition i <= 10 is no longer met, i.e., when i is greater than 10.

We are missing one last ingredient to looping! Break and next.

3.3.3 Break and next/continue

Break and next/continue are control flow statements that can be used to alter the flow of a loop. They can be used when we want to stop a loop if a certain condition is met or skip an iteration if a certain condition is met.

To see how they work, we write a loop that prints the numbers from 1 to 10, but skips the number 5 and stops after the number 8.
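In Python, such a loop could be sketched as follows, using the two conditions discussed in the breakdown below:

```python
for i in range(1, 11):
    if i == 5:
        continue   # skip the rest of this iteration, so 5 is never printed
    if i > 8:
        break      # leave the loop entirely once i reaches 9
    print(i)
```

This prints 1, 2, 3, 4, 6, 7 and 8.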

Let’s break it down:

  • if (i == 5) { next } (R) and if i == 5: continue (python) are the next/continue statements. If i is equal to 5, they skip the rest of the loop and continue with the next iteration. 5 won’t be printed!
  • if (i > 8) { break } and if i > 8: break are the break statements. If i is greater than 8, they immediately terminate the loop, regardless of the loop condition. As a result, we stop at 8 and never get to 10. break can also serve as an alternative stopping condition in a while loop.

Using break and next (or continue in Python) statements in loops can sometimes make the code harder to understand and debug, because they can lead to unexpected jumps in the control flow. This is especially true in more complex loops where it’s not immediately clear when or if the loop will be prematurely terminated or skipped.

There are usually alternatives to break. Let’s consider a simple example where we want to find the first number in a list that is divisible by a certain number. Using break:
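A possible Python sketch follows. The list numbers, the divisor and the variable name first_divisible are made up for illustration:

```python
numbers = [7, 11, 15, 22, 30]   # hypothetical list of numbers
divisor = 5                     # hypothetical divisor

first_divisible = None
for number in numbers:
    if number % divisor == 0:
        first_divisible = number
        break                   # stop at the first match

print(first_divisible)
```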

In this case, as soon as we find a number that meets our condition, we break out of the loop. However, if we want to avoid using break, we could rewrite the loop with a while to use a boolean flag that indicates whether we’ve found a suitable number:
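A sketch of this alternative in Python is given here. The variable found comes from the text, while numbers, divisor, idx and first_divisible are illustrative names of my own:

```python
numbers = [7, 11, 15, 22, 30]   # hypothetical list of numbers
divisor = 5                     # hypothetical divisor

found = False
first_divisible = None
idx = 0

# Keep looping only while nothing has been found and there are elements left.
while not found and idx < len(numbers):
    if numbers[idx] % divisor == 0:
        first_divisible = numbers[idx]
        found = True
    idx += 1

print(first_divisible)
```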

In this version, instead of breaking out of the loop, we use the found variable to keep track of whether we’ve found a number that meets our condition. If we have, we skip the rest of the loop iterations without explicitly using a break statement.

