This course is an introduction to Data Science for students of Molecular Biology. We use the R language to learn the basic tools to handle structured data and extract valuable scientific information from it.

This page will be updated during the semester. Please check it regularly.

# Welcome survey

Everybody **must** register for the course forum. To do so, fill in the *Welcome survey* and register your email address. I recommend that you use a *gmail* account, such as the `ogr.iu.edu.tr` email service provided by the university. In any case, use an email address that you check regularly. We will use this forum to send important material.

# Course Forum

Once you have registered for the forum, you should send all your questions by email to iu-cmb@googlegroups.com. You can also write your questions, and check for previous messages, on the web page https://groups.google.com/d/forum/iu-cmb.

# Homework

All quizzes and homework should be sent to **andres.aravena+cmb@istanbul.edu.tr** before the deadline to get a grade. Please be careful; otherwise you will get a grade of zero.

**Homework 1** *(Deadline: Tuesday 8 of October at 9:00).* Create an RMarkdown document with the same content and the same *structure* as a published paper.

**Homework 2** *(Deadline: Tuesday 15 of October at 9:00).* Practice for midterm exam. Vectors, indices, and general ideas about using R.

**Homework 3** *(Deadline: Monday 4 of November at 9:00).* Practice for midterm exam. Lists and data frames.

**Homework 4** *(Deadline: Tuesday 3 of December at 9:00).* Plot vectors, choose colors, symbols, and size.

**Homework 5** *(Deadline: Tuesday 10 of December at 9:00).* Scatter plots, choose colors, size, titles, and scale.

**Homework 6** *(Deadline: Tuesday 17 of December at 8:00).* Subsets and linear models.
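As a rough illustration of the topics the homework covers (vectors, indices, data frames, subsets, and linear models), here is a minimal R sketch. It is not taken from any actual assignment: the variable names and the numbers are hypothetical and were invented only for this example.

```r
# Hypothetical measurements, invented only for illustration
height <- c(160, 172, 181, 168, 175)   # numeric vector (cm)
weight <- c(55, 68, 80, 62, 74)        # numeric vector (kg)

# Indexing a vector by position and by a logical condition
second_height <- height[2]
tall_heights  <- height[height > 170]

# A data frame stores several vectors of the same length as columns
people <- data.frame(height = height, weight = weight)

# subset() keeps only the rows that satisfy a condition
tall <- subset(people, height > 170)

# A linear model of weight as a function of height
model <- lm(weight ~ height, data = people)
coef(model)   # intercept and slope
```

The formula `weight ~ height` reads as "weight explained by height"; the same notation is used by `plot()` and `lm()`.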

# Classes

Here you will find the slides from the classes and other supplementary material. Note that some things are said but not written, so you had better take good notes. We recommend taking notes with pen and paper using the Cornell Method.

**Why “Computing in Molecular Biology”?** *(Sep 17, 2019).* What is a *computer*? Why do we care? **[Slides]**.

**Structured Documents.** *(Sep 17, 2019).* Introduction to RStudio and to Markdown. **[Slides]**.

**Practice with Structured Documents.** *(Sep 24, 2019).* Introduction to RStudio and to Markdown. **[Slides]**.

**Using R and RStudio.** *(Oct 1, 2019).* Basic usage of RStudio. Introduction to R. Basic Data Types: Numeric, Character, Logical and Factor. **[Slides]**.

**Making and Indexing Vectors.** *(Oct 8, 2019).* Handling structured data. **[Slides]**.

**Combining Markdown and R.** *(Oct 8, 2019).* How to answer Quizzes, Exams and Make-ups. [class06.Rmd], [slides06.Rmd], **[Slides]**.

**Lists: Mixing different types of data.** *(Oct 15, 2019).* Also, a comment about digital signatures, and a Quiz you have to do. **[Slides]**.

**Welcome to the Matrix.** *(Oct 15, 2019).* Structures in two dimensions. Matrices and Data Frames. **[Slides]**.

**Using Data Frames.** *(Oct 22, 2019).* Telling stories. **[Slides]**.

**Telling stories.** *(Oct 22, 2019).* Introduction to Descriptive Statistics. **[Slides]**.

**Data Visualization.** *(Nov 19, 2019).* Telling stories with pictures. “One image is worth a thousand words”. Plots, barplots, histograms. Making “nice” drawings. Adding points and lines. [survey1-tidy.txt], [midterm.txt], **[Slides]**.

**More Data Visualization.** *(Nov 26, 2019).* Plotting two vectors, numeric or factor. Formulas. **[Slides]**.

**Handling Lists and Data Frames.** *(Nov 26, 2019).* **[Slides]**.

**Subsets and formulas.** *(Dec 3, 2019).* Easier ways to plot. Also, introduction to Linear Models. [survey2019.txt], **[Slides]**.

**Hooke’s Law.** *(Dec 3, 2019).* A simple application of linear models. [rubber.txt], **[Slides]**.

**Logarithmic scales.** *(Dec 10, 2019).* Not all lines are straight lines. Exponential growth in Science and Technology. What will be your future? [kleiber.txt], [Transistor_count.txt], [dna_price.txt], **[Slides]**.

**Logarithmic models.** *(Dec 10, 2019).* Not all lines are straight lines. **[Slides]**.
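The last classes in the list deal with logarithmic scales and models. The idea that "not all lines are straight lines" can be sketched in R as follows. This is only an illustration under stated assumptions: the data are hypothetical (a quantity that doubles every year) and do not come from the course files listed above.

```r
# Hypothetical data, invented for illustration:
# a quantity that starts at 100 and doubles every year
year  <- 0:10
count <- 100 * 2^year

# Exponential growth looks like a straight line on a logarithmic scale,
# so a linear model can be fitted to log(count)
model <- lm(log(count) ~ year)

# The slope is the growth rate on the natural-log scale
growth_rate   <- coef(model)["year"]
doubling_time <- log(2) / growth_rate

# Scatter plot with a logarithmic vertical axis
plot(year, count, log = "y", main = "Exponential growth on a log scale")
```

With these invented numbers the estimated doubling time is one year, which matches how the data were built.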

## Attendance

By regulation from the Rectory, students need to attend at least 70% of the classes. The attendance book is updated every week and can be seen in Google Sheets.

# Some Free Online Resources about R

- How to read an R help page
- Getting Started with R
- Free Course: Introduction to R
- TryR
- Introduction to Data Science
- Book R for Data Science
- Book Data Visualization: A practical introduction by Kieran Healy, Duke University

## About RMarkdown

# Recommended readings

Polya, G. and Conway, John H. *How to Solve It: A New Aspect of Mathematical Method.* Princeton Science Library.

Zeeberg, Barry R, Joseph Riss, David W Kane, Kimberly J Bussey, Edward Uchio, W Marston Linehan, J Carl Barrett, and John N Weinstein. *Mistaken Identifiers: Gene Name Errors Can Be Introduced Inadvertently When Using Excel in Bioinformatics*. BMC Bioinformatics 5 (2004): 80. doi:10.1186/1471-2105-5-80.