We need to do a lot of exercises to be ready for the midterm. Here are several exercises. Some of them can be answered quickly; others require more thought. Start thinking about all of them. The deadline applies only to the short term questions. Long term questions should be answered before the midterm exam.
Please use the official template for answers.
Short term questions
Calculate the GC content for only part of the genome
Instead of the whole genome, we only look through a window. That is, we look at only one region of the genome, with a fixed size, starting at a given position. For example, we examine only the genome region starting at position 250000 and we look at only 100 letters. That is, only the letters in positions 250000 to 250099.
The result should depend on:
- the genomic sequence
- the position of the window
- the size of the window
Write a function called window_gc_content() that takes the genomic sequence, the position of the window, and the window size as input, and returns a single value with the window GC content. You can test this function with the genome of E. coli by following these steps:
- Download the genome of E. coli from NCBI or from the blog. Take note of the folder where the file is downloaded. Different web browsers may use different folders.
- Load the seqinr library with library(seqinr). If you do not have it installed, please install it first.
- Set your working directory to the folder where the file was downloaded.
- Read the sequences with the command sequences <- read.fasta("NC_000913.fna"). Be careful: the file may have a different name on your computer.
- Then you can test using the command window_gc_content(sequences[[1]], 250000, 100)
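As a rough illustration of the window idea, here is a minimal sketch on a toy sequence. The argument names sequence, position, and size are assumptions (the exercise fixes only the function name and the order of the inputs in the test call), and GC content is taken here to mean the fraction of letters in the window that are G or C.

```r
# Sketch only: argument names are assumptions, not part of the exercise text.
window_gc_content <- function(sequence, position, size) {
  # The window covers `size` letters, starting at `position`.
  window <- sequence[position:(position + size - 1)]
  # Count G and C, ignoring upper/lower case.
  n_gc <- sum(toupper(window) %in% c("G", "C"))
  n_gc / length(window)
}

# Toy sequence: one letter per element, as seqinr stores sequences.
toy <- c("a", "c", "g", "t", "g", "g", "c", "a")
window_gc_content(toy, 3, 4)  # window is g t g g, so the result is 0.75
```

The same call pattern should carry over to the real genome once it has been read with read.fasta().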
window_gc_content() in many places
We want to evaluate window_gc_content() at different positions of the genome. Specifically, we want to evaluate it at these positions:
positions <- seq(from=1, to=length(genome)-window_size, by=window_size)
Obviously, the result depends on genome and window_size. Please write a function that takes genome and window_size as inputs, and returns a vector with the GC content of the window at each of the positions.
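One possible way to evaluate every window can be sketched as below. The function name gc_content_by_window is hypothetical (the exercise does not name it), and the helper window_gc_content() is repeated here, with assumed argument names, only so that the block runs on its own.

```r
# Helper repeated so the sketch is self-contained (argument names are assumptions).
window_gc_content <- function(sequence, position, size) {
  window <- sequence[position:(position + size - 1)]
  sum(toupper(window) %in% c("G", "C")) / length(window)
}

# Hypothetical name: the exercise leaves the function name open.
gc_content_by_window <- function(genome, window_size) {
  # Start positions of the windows, as defined in the exercise.
  positions <- seq(from = 1, to = length(genome) - window_size, by = window_size)
  # One GC value per start position.
  sapply(positions, function(p) window_gc_content(genome, p, window_size))
}

toy <- c("g", "c", "a", "t", "g", "g", "a", "a", "c", "c")
gc_content_by_window(toy, 2)  # windows gc, at, gg, aa give 1 0 1 0
```

Note that with this definition of positions the toy example stops at position 7, so the final window (letters 9 and 10) is not evaluated.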
Write a function that takes a list of genes and calculates the ratio (nG-nC)/(nG+nC) for each gene, where nG and nC are the numbers of G and C letters in the gene. The function should be called gene_gc_skew and take only one input: a list called genes. What should be the output?
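To make the ratio concrete, the arithmetic for a single gene can be sketched as follows. The toy gene and the variable names n_g, n_c, and skew are illustrative assumptions, and the sketch assumes lowercase letters, which is what read.fasta() from seqinr produces by default.

```r
# Toy gene, one letter per element (illustrative data, not a real gene).
gene <- c("a", "t", "g", "g", "g", "c", "t", "a")

n_g <- sum(gene == "g")  # number of G letters: 3
n_c <- sum(gene == "c")  # number of C letters: 1

# GC skew for this one gene: (3 - 1) / (3 + 1) = 0.5
skew <- (n_g - n_c) / (n_g + n_c)
skew
```

Applying the same steps to every element of the list is left to gene_gc_skew.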
Long term questions
In many important cases we have a vector x with non-decreasing values. That is, each value is greater than or equal to the previous one, so x[i+1] >= x[i] for all values of the index i. It is easy to see that the minimum value is at position 1. We also know that the maximum value is at the last position. What about the position of the half value?
The half value is the average of the minimum and the maximum. For example, if x is the vector c(1, 4, 4, 6, 10, 15) then the half value is (1+15)/2, that is, 8.
The position of the half value of the vector x is the index of the first value that is greater than or equal to the half value of x. In the example the position of the half value is 5, since x[5] is the first value that is greater than or equal to 8.
Please write a function called position_of_half(), with one input called x. The function must return a single number, which is the index of the first value in x that is greater than or equal to the average of the minimum and maximum of x.
You can test your functions with the following code.
x <- 1:9
position_of_half(x)
position_of_half(x + 20)
position_of_half(x * x)
position_of_half(sqrt(x))
The answers should be 5, 5, 7, 4, respectively.
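A minimal sketch of one way to do this is shown below. It relies on the stated assumption that x is already sorted in non-decreasing order, and it uses which() to find the first qualifying index; a loop over the indices would work just as well.

```r
# Sketch only: assumes x is sorted in non-decreasing order.
position_of_half <- function(x) {
  # Half value: average of the minimum and the maximum.
  half <- (min(x) + max(x)) / 2
  # Index of the first element that reaches the half value.
  which(x >= half)[1]
}

position_of_half(c(1, 4, 4, 6, 10, 15))  # half value is 8, first value >= 8 is x[5]
```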
Merge two sorted vectors
Please write a function called vector_merge(x, y) that receives two sorted vectors x and y and returns a new vector with the elements of x and y together, sorted. The output vector has size length(x) + length(y).
You must assume that each of the input vectors is already sorted.
In your code you have to use three indices: i, j, and k, to point into x, y, and the output vector answer, respectively. On each step you have to compare x[i] and y[j]. If x[i] < y[j] then you make answer[k] <- x[i]; otherwise make answer[k] <- y[j].
You have to increment i, j, and k carefully. To test your function, you can use this code:
x <- c("a", "d", "e", "h", "i", "k", "m", "s", "t", "u", "v", "w", "z")
y <- c("b", "c", "f", "g", "j", "l", "n", "o", "p", "q", "r", "x", "y")
vector_merge(x, y)
The output must be a sorted alphabet.
"a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"