Monday, June 25, 2012

Intro to Bioinformatics


1.     Basic molecular biology: central dogma
       DNA:
       RNA:
       Protein:

2.     NCBI
The National Center for Biotechnology Information advances science and health by providing access to biomedical and genomic information.

3.     Sequence Databases
Primary sequence repositories: GenBank (USA), EMBL (Europe) & DDBJ (Japan)
Each of the three groups collects a portion of the total sequence data reported worldwide, and all new and updated database entries are exchanged between the groups on a daily basis.

4.     DNA sequencing and next-generation sequencing

5.     Get familiar with sequence format.

6.     Sequence alignment

7.     BLAST: Basic Local Alignment Search Tool


9.     Programming Strategies in Bioinformatics
·       Avoid programming
·       Don’t change other people’s code… too much!
·       Reuse, Reuse, Reuse!
·       Wrappers on available tools.
·       Comment copiously
·       There is more than one way to do it!
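
As a rough illustration of the "wrappers on available tools" strategy, the sketch below wraps an external command in a small Python function instead of reimplementing the tool. The helper name `run_tool` is hypothetical, and the Python interpreter itself stands in for a real bioinformatics program, since these notes do not give any tool's command line.

```python
import subprocess
import sys


def run_tool(args, input_text=None):
    """Run an external command and return its standard output as text.

    `args` is the full command line as a list of strings. A non-zero exit
    status raises subprocess.CalledProcessError, so failures are not silent.
    """
    result = subprocess.run(
        args,
        input=input_text,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Stand-in command (an assumption for illustration): ask the Python
    # interpreter to reverse a short sequence, as if it were an external tool.
    output = run_tool([sys.executable, "-c", "print('ACGT'[::-1])"])
    print(output.strip())
```

The point of the design is that the existing tool stays untouched; only the thin layer that builds the command and collects the output is new code.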




Thursday, June 21, 2012

Kmer rarefaction

Tool: JellyFish 1.1.5
         Kmer=31
Four samples:
         Complex: Amazon Pasture72 2010 replicate I A100
         Medium complex: Sakinaw Lake metagenomics (120m)
         Simple: Newbly Island Compost Facility,Passage 4_SG
         Isolate genome: E. coli

The E. coli MiSeq data set has only 12M reads, so the E. coli line on the kmer rarefaction plot stops at ~12M. All the metagenome samples were sampled at 50M reads. Both the rarefaction plot and the frequency plot indicate the complexity of a sample. The flatter the rarefaction curve, the simpler the sample. The one major peak on the frequency plot belongs to the isolate genome, while the complex sample shows no obvious peak. The upper-right figure is a zoomed-in portion of the frequency plot.
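
The two kinds of curve described above can be sketched in a few lines of Python. This is a toy illustration, not how JellyFish works internally: it keeps every kmer in memory, ignores reverse complements, and uses k=4 on made-up reads rather than k=31 on real data. The function names `rarefaction` and `frequency_histogram` and both read sets are assumptions made for the example.

```python
from collections import Counter


def kmers(read, k):
    """Yield every overlapping substring of length k in a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def rarefaction(reads, k, step=1):
    """Return (reads sampled, distinct kmers seen so far) every `step` reads."""
    seen = set()
    curve = []
    for n, read in enumerate(reads, start=1):
        seen.update(kmers(read, k))
        if n % step == 0:
            curve.append((n, len(seen)))
    return curve


def frequency_histogram(reads, k):
    """Map each kmer multiplicity to the number of distinct kmers having it."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return Counter(counts.values())


if __name__ == "__main__":
    # Hypothetical toy data: a "simple" sample that repeats one read, and a
    # "complex" sample in which every read contributes new kmers.
    simple_reads = ["ACGTACGTAC"] * 4
    complex_reads = ["AAAACCCC", "GGGGTTTT", "ACACACAC", "TGTGTGTG"]

    print("simple: ", rarefaction(simple_reads, k=4))
    print("complex:", rarefaction(complex_reads, k=4))
    print("simple histogram:", dict(frequency_histogram(simple_reads, k=4)))
```

In the toy run, the simple sample's curve is flat after the first read, while the complex sample's curve keeps climbing. The histogram of the simple sample concentrates its kmers at a few high multiplicities, which is the small-scale analogue of the single peak seen for the isolate genome.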