We need to do a lot of exercises to be ready for the midterm. Here you have several exercises. Some of them can be answered in a short time; others require more thinking. Start thinking about all of them. The deadline applies only to the short-term questions. The long-term questions should be answered before the midterm exam.

Please use the official template for answers.

# Short-term questions

## Calculate the GC content for only part of the genome

Instead of the whole genome, we only look through a *window*. That is, we look only at *a region* of the genome with a fixed *size*, starting at a given *position*. For example, we examine only the genome region starting at position 250000, and we look at only 100 letters, that is, only the letters at the positions in `seq(from=250000, length=100)`.

The result should depend on:

- the genomic **sequence**
- the **position** of the window
- the **size** of the window

Write a function called `window_gc_content()` that takes `sequence`, `position`, and `size` as input and returns a single value: the GC content of the window. You can test this function with the genome of *E. coli* by following these steps:

1. Download the genome of *E. coli* from NCBI or from the blog. Take note of the folder where the file is downloaded; different web browsers may use different folders.
2. Load `library(seqinr)`. If you do not have it installed, please install it.
3. Set your working directory to the folder where the file was downloaded.
4. Read the sequences with the command `sequences <- read.fasta("NC_000913.fna")`. Be careful: the file may have a different name on your computer.
5. Then you can test using the command `window_gc_content(sequences[[1]], 250000, 100)`
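If you want to compare with your own answer, here is a minimal sketch of `window_gc_content()`. It assumes that `sequence` is a character vector with one nucleotide per element (this is what `read.fasta()` from seqinr gives by default, in lower case) and that the whole window lies inside the sequence. The letters are converted with `toupper()`, so both cases are counted.

```r
# Sketch: GC content of the window seq(from = position, length = size).
# Assumes one nucleotide per element; upper or lower case are both accepted.
window_gc_content <- function(sequence, position, size) {
  # The window must fit entirely inside the sequence
  stopifnot(position >= 1, position + size - 1 <= length(sequence))
  window <- toupper(sequence[seq(from = position, length = size)])
  n_gc <- sum(window == "G" | window == "C")  # count G and C letters
  n_gc / size                                 # fraction of the window that is G or C
}
```

With the *E. coli* genome loaded as in the steps above, the call is `window_gc_content(sequences[[1]], 250000, 100)`.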

## Using `window_gc_content()` in many places

We want to evaluate `window_gc_content()` at different positions of the genome. Specifically, we want to evaluate it at these positions:

`positions <- seq(from=1, to=length(genome)-window_size, by=window_size)`

Obviously, the result depends on the genome and `window_size`. Please write a function that takes `genome` and `window_size` as inputs and returns *a vector* with the GC content of the window starting at each of the `positions`.
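One possible sketch, assuming the same one-nucleotide-per-element representation as before. The exercise does not name this function, so `gc_content_by_window()` is a hypothetical name; `window_gc_content()` is redefined here only so that the block runs on its own.

```r
# Same sketch as for the previous exercise, repeated so this block is self-contained
window_gc_content <- function(sequence, position, size) {
  window <- toupper(sequence[seq(from = position, length = size)])
  sum(window == "G" | window == "C") / size
}

# Hypothetical name: GC content of consecutive, non-overlapping windows
gc_content_by_window <- function(genome, window_size) {
  positions <- seq(from = 1, to = length(genome) - window_size, by = window_size)
  answer <- numeric(length(positions))  # one result per window
  for (k in seq_along(positions)) {
    answer[k] <- window_gc_content(genome, positions[k], window_size)
  }
  answer
}
```

For example, with `genome <- c("g", "g", "c", "a", "t", "c", "a", "a", "a", "t")` and `window_size = 3`, the windows start at 1, 4, and 7, and the result is `1`, `1/3`, and `0`.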

## GC Skew

Write a function that takes a list of genes and calculates the ratio `(nG-nC)/(nG+nC)` for each gene, where `nG` and `nC` are the numbers of G and C nucleotides in the gene. The function should be called `gene_gc_skew` and takes only one input: a list called `genes`. What should be the output?
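A minimal sketch, assuming each element of `genes` is a character vector with one nucleotide per element. Here the output is a numeric vector with one skew value per gene, keeping the gene names if the list has them; this is only one reasonable answer to the question above. A gene with no G and no C gives `0/0`, which R returns as `NaN`.

```r
# Sketch: GC skew (nG - nC) / (nG + nC) for every gene in a list
gene_gc_skew <- function(genes) {
  skew <- numeric(length(genes))         # one value per gene
  for (i in seq_along(genes)) {
    gene <- toupper(genes[[i]])          # accept upper- or lower-case letters
    n_g <- sum(gene == "G")
    n_c <- sum(gene == "C")
    skew[i] <- (n_g - n_c) / (n_g + n_c)
  }
  names(skew) <- names(genes)            # keep gene names, if any
  skew
}
```

For example, `gene_gc_skew(list(a = c("g", "g", "g", "c"), b = c("c", "c", "a", "t")))` gives `0.5` for `a` and `-1` for `b`.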

# Long-term questions

## Algorithm design

In many important cases we have a vector `x` with growing values. That is, each value is greater than or equal to the previous one, so `x[i+1] >= x[i]` for every valid index `i`. It is easy to see that the minimum value is at position 1 and that the maximum value is at the last position. What about the position of the *half value*?

The *half value* is the average of the minimum and the maximum. For example, if `x` is the vector `c(1, 4, 4, 6, 10, 15)`, then the *half value* is `(1+15)/2`, that is, 8.

The *position of the half value* of the vector `x` is the **index of the first value** that is greater than or equal to the *half value* of `x`. In the example, the *position of the half value* is 5, since `x[5]` (which is 10) is the first value that is greater than or equal to 8.

Please write a function called `position_of_half()` with one input called `x`. The function must return a single number: the index of the first value in `x` that is greater than or equal to the average of the minimum and the maximum of `x`.

You can test your function with the following code.

```
x <- 1:9
position_of_half(x)
position_of_half(x + 20)
position_of_half(x * x)
position_of_half(sqrt(x))
```

The answers should be 5, 5, 7, 4, respectively.
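Because `x` is sorted, one possible approach is a binary search. The sketch below is only one option: it assumes `x` is non-empty and non-decreasing, and it reads the minimum and maximum directly as `x[1]` and `x[length(x)]`. A simpler linear version would be `which(x >= half)[1]`.

```r
# Sketch: binary search for the first index i with x[i] >= half value.
# Assumes x is non-empty and sorted in non-decreasing order.
position_of_half <- function(x) {
  half <- (x[1] + x[length(x)]) / 2  # min is the first value, max is the last
  lo <- 1
  hi <- length(x)  # x[hi] is the maximum, so x[hi] >= half always holds
  # Invariant: the answer is always between lo and hi (inclusive)
  while (lo < hi) {
    mid <- (lo + hi) %/% 2
    if (x[mid] >= half) {
      hi <- mid        # mid may be the answer; keep it in the range
    } else {
      lo <- mid + 1    # x[mid] is too small; the answer is to the right
    }
  }
  lo
}
```

Each step halves the range between `lo` and `hi`, so the loop runs about `log2(length(x))` times instead of looking at every element.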

## Merge two sorted vectors

Please write a function called `vector_merge(x, y)` that receives two **sorted** vectors `x` and `y` and returns a new vector with the elements of `x` and `y` together, **sorted**. The output vector has length `length(x)+length(y)`.

You *must assume* that each of the input vectors is already sorted.

In your code you have to use three indices, `i`, `j`, and `k`, to point into `x`, `y`, and the output vector `answer`, respectively. At each step you have to compare `x[i]` and `y[j]`. If `x[i] < y[j]`, then make `answer[k] <- x[i]`; otherwise, make `answer[k] <- y[j]`.

You have to increment `i` or `j`, and `k`, carefully. To test your function, you can use this code:

```
x <- c("a", "d", "e", "h", "i", "k", "m", "s", "t", "u", "v", "w", "z")
y <- c("b", "c", "f", "g", "j", "l", "n", "o", "p", "q", "r", "x", "y")
vector_merge(x, y)
```

The output must be the whole alphabet, in order.

```
"a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m"
"n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"
```