Blog of Andrés Aravena
CMB2:

18 June 2019

This course has three main ideas:

• computational thinking,
• simulation of deterministic systems,
• simulation of random systems.

Therefore the makeup exam will have three questions, one for each idea.

# Computational Thinking

This is essentially programming. There are three elements to combine:

• functions (identify input and output)
• for loops
• if conditionals

These are the “Lego” pieces that you have to combine. What we want to measure is how well you can solve a complex problem using these basic parts. For that you will use decomposition, pattern recognition, abstraction and algorithm design.

Examples of this kind of problem are questions 1 and 2 on the midterm exam. In most cases I give you a description of the program in English. You have to translate the English description into R.

Pay special attention to keywords like function, vector, list or data frame. Sometimes you have to make a function, sometimes a vector, sometimes a list. This is always indicated in the text.
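As an illustration of how the three pieces fit together, here is a minimal sketch. The task (count how many elements of a vector are greater than a threshold) and the names `count_above`, `values` and `threshold` are made up for this example; they are not taken from any exam.

```r
# Hypothetical task: write a function that takes a numeric vector and a
# threshold, and returns how many elements are greater than the threshold.
count_above <- function(values, threshold) {
    count <- 0
    for (v in values) {           # for loop: visit each element
        if (v > threshold) {      # if conditional: test one element
            count <- count + 1
        }
    }
    return(count)                 # output: a single number
}

count_above(c(3, 8, 1, 9, 5), 4)  # 3, because 8, 9 and 5 are greater than 4
```

Here the English description asks for a function, so the answer is a function whose input is a vector and a number, and whose output is a number.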

# Simulation of Deterministic Systems

These are the systems that we model with boxes and circles. There are several classes and exercises on this, including the zombies question in the midterm, the water formation system, and the qPCR simulation.

We teach this because many real scientific questions can be answered with this philosophy. Even if you do not use R in the future, models of this kind remain valid. Moreover, they are a nice way to explain your ideas to other people.

The procedure to answer these questions is always the same. I will give you a diagram, and you have to build the formulas using the procedure I described previously on the blog. Then you have to answer three parts, like these:

• Find the formulas and simulate the system starting with given initial conditions and rates
• Find how the result changes when the initial conditions (or rates) change in a range, as in the qPCR case.
• Find how the result changes when the initial conditions (or rates) change randomly. We want to see the variability of final conditions when the initial conditions change randomly. In other words, we want to test the sensitivity to initial conditions.
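The three parts can be sketched as follows. This is only an illustration under stated assumptions: the system (boxes `A`, `B`, `C` and one circle that turns one `A` and one `B` into one `C`), the function name `simulate_system`, the update formulas and all numeric values are hypothetical, and the formulas you build in the exam must come from the diagram you are given.

```r
# Hypothetical system: boxes A, B and C, and one circle (a process) that
# consumes one A and one B to produce one C, at a given rate.
simulate_system <- function(N, rate, A_ini, B_ini, C_ini = 0) {
    A <- B <- C <- rep(NA, N)
    A[1] <- A_ini
    B[1] <- B_ini
    C[1] <- C_ini
    for (i in 2:N) {
        # the circle: how much is processed in this step
        process <- rate * A[i-1] * B[i-1]
        # the boxes: previous value, plus what enters, minus what leaves
        A[i] <- A[i-1] - process
        B[i] <- B[i-1] - process
        C[i] <- C[i-1] + process
    }
    return(data.frame(A, B, C))
}

# Part 1: given initial conditions and rate
result <- simulate_system(N = 100, rate = 0.01, A_ini = 10, B_ini = 5)

# Part 2: the rate changes in a range; we keep the final value of C
rates <- seq(from = 0.001, to = 0.02, by = 0.001)
final_C <- rep(NA, length(rates))
for (j in seq_along(rates)) {
    sim <- simulate_system(N = 100, rate = rates[j], A_ini = 10, B_ini = 5)
    final_C[j] <- sim$C[100]
}

# Part 3: the initial condition changes randomly; we look at the
# variability of the final value of C
set.seed(1)
random_final_C <- replicate(1000, {
    A_ini <- rnorm(1, mean = 10, sd = 1)
    simulate_system(N = 100, rate = 0.01, A_ini = A_ini, B_ini = 5)$C[100]
})
mean(random_final_C)
sd(random_final_C)
```

Wrapping the simulation in a function is a design choice that makes parts 2 and 3 short: each one only calls the same function with different inputs.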

# Simulating Random Systems

The last part of the course focused on the idea that you can decompose a complex random system into a series of simpler systems that are easier to simulate, and then combine them. Simulating this combination of simple systems shows you what the result of the complex system will be. When you simulate several times (that is, when you replicate), you can make a barplot of the frequencies of each outcome. In the exam we replicate ten thousand times, but in real life we use bigger numbers.

Examples of these questions are: the number of people with epilepsy in a group (depending on group size and probability), the total travel time (the sum of all waiting and transport times, depending on how many transports are used and the probabilities of each delay), same-birthday probabilities, and the GC content of a random piece of DNA.
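For the epilepsy example, the decomposition could look like the sketch below. The group size, the probability and the function name `people_with_epilepsy` are assumed values chosen only for illustration.

```r
# Assumed values, only for illustration
group_size <- 50
prob_epilepsy <- 0.01

# Simple system: one person either has epilepsy (TRUE) or not (FALSE).
# Complex system: a group is many people, and we count the TRUE values.
people_with_epilepsy <- function(size, prob) {
    person <- sample(c(TRUE, FALSE), size = size, replace = TRUE,
                     prob = c(prob, 1 - prob))
    return(sum(person))
}

# Replicate the simulation ten thousand times
set.seed(2019)
cases <- replicate(10000, people_with_epilepsy(group_size, prob_epilepsy))

# Frequency of each outcome
freq <- table(cases)

# Draw the plot only in an interactive session (for example in RStudio)
if (interactive()) {
    barplot(freq, xlab = "people with epilepsy in the group",
            ylab = "number of simulations")
}
```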

The typical questions here are: simulate many times and find the mean and standard deviation; write a function to find events in the simulation (that is, determine if there is a pattern in the simulation result, and return TRUE or FALSE); find a confidence interval containing (1-𝛼) of the total cases (for example 95% and 99%).
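A minimal sketch of an event function, using the same-birthday case. It assumes that all 365 days are equally likely and ignores leap years; the names `birthdays` and `has_shared_birthday` and the group size of 23 are chosen for this example.

```r
# Simulate the birthdays of n people as day numbers from 1 to 365
# (assumption: all days equally likely, no leap years)
birthdays <- function(n) {
    sample(1:365, size = n, replace = TRUE)
}

# Event: TRUE if at least two people share a birthday, FALSE otherwise
has_shared_birthday <- function(days) {
    return(any(duplicated(days)))
}

set.seed(42)
events <- replicate(10000, has_shared_birthday(birthdays(23)))
mean(events)     # fraction of the simulations where the event happened

# A numeric outcome of the same simulation: number of distinct birthdays
distinct <- replicate(10000, length(unique(birthdays(23))))
mean(distinct)
sd(distinct)
```

Since `TRUE` counts as 1 and `FALSE` as 0, the mean of the event vector estimates the probability of the event.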

The confidence intervals can be written as two numbers: mean(data) - k*sd(data) and mean(data) + k*sd(data). That is, the same formula, just changing the sign of k. The value of k depends on the chosen 𝛼.

There are several ways to determine k, depending on how much you know about the probability distribution, and on whether the population standard deviation is known or not.

• If you know that the distribution is Normal, and you know the standard deviation of the population, then k <- qnorm(1-alpha/2). This is the most common assumption, but it does not always hold.
• If you know that the distribution is Normal, but you do not know the standard deviation of the population, then k <- qt(1-alpha/2, df), where the degrees of freedom parameter df is usually length(data)-1.
• If you do not know anything, you have to use Chebyshev's formula. 𝛼
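The sketch below compares the three choices of k on made-up data. The data values and the helper name `interval` are hypothetical. The Chebyshev value used here, `k <- 1/sqrt(alpha)`, is an assumption derived from Chebyshev's inequality, which says that at most 1/k^2 of the cases lie more than k standard deviations away from the mean.

```r
# Made-up data, only for illustration
set.seed(1)
data <- rnorm(30, mean = 10, sd = 2)
alpha <- 0.05

# Normal distribution, population standard deviation known
k_normal <- qnorm(1 - alpha/2)

# Normal distribution, population standard deviation unknown
k_student <- qt(1 - alpha/2, df = length(data) - 1)

# Nothing known about the distribution.
# Assumption: from Chebyshev's inequality, 1/k^2 = alpha
k_chebyshev <- 1 / sqrt(alpha)

# Same formula for both limits, just changing the sign of k
interval <- function(data, k) {
    c(mean(data) - k * sd(data), mean(data) + k * sd(data))
}

interval(data, k_normal)
interval(data, k_student)
interval(data, k_chebyshev)
```

With these inputs `k_normal` is the smallest and `k_chebyshev` the largest, so the less you know about the distribution, the wider the interval becomes.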