For those who do NGS, what is a reliable “ballpark” number of reads per library? For example, for a global gene expression comparison between two conditions, control and treatment, how many reads would you want for each of the two libraries? This would be run on a single lane of a HiSeq 2500 flow cell, but there will be other libraries in the lane, so I want to know the maximum number of libraries I can fit in a lane and still get meaningful coverage. Thank you.
There is no one correct answer. It depends upon the relative representation of the genes in which you’re interested. If your goal is the top 100 genes in muscle, then 1–2M reads would suffice. If it’s low-abundance transcription factors in the distal tip cell (DTC), then even 50M might be inadequate.
Also, biological replicates (a minimum of three) are essential for statistical analysis of the data. Replicates also provide more power to detect differential gene expression (i.e., 3 × 5M reads is much better than 1 × 15M; see http://www.ncbi.nlm.nih.gov/pubmed/24319002).
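To make the point about relative representation concrete, here is a minimal sketch that treats reads as Poisson draws from the transcript pool. That is a simplifying assumption of mine: it ignores transcript length, library biases, and biological variance. The abundance fractions are hypothetical values chosen for illustration, not measurements from muscle or the DTC.

```python
import math

def expected_reads(total_reads: int, fraction: float) -> float:
    """Expected reads for a transcript that makes up `fraction` of all fragments."""
    return total_reads * fraction

def prob_zero_reads(total_reads: int, fraction: float) -> float:
    """Poisson probability that the transcript receives no reads at all."""
    return math.exp(-expected_reads(total_reads, fraction))

# Hypothetical abundance fractions (share of all sequenced fragments).
abundant = 1e-3  # a highly expressed gene
rare = 1e-7      # a low-abundance transcript expressed in only a few cells

for depth in (1_000_000, 2_000_000, 50_000_000):
    print(f"{depth:>11,} reads: "
          f"abundant ~{expected_reads(depth, abundant):,.0f}, "
          f"rare ~{expected_reads(depth, rare):.1f} "
          f"(P(zero) = {prob_zero_reads(depth, rare):.2f})")
```

Under these made-up numbers the abundant gene already has about a thousand reads at 1M, while the rare one averages only about five reads even at 50M, which is too few for a confident differential-expression call.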
Lei R, Ye K, Gu Z, Sun X. Diminishing returns in next-generation sequencing (NGS) transcriptome data. Gene. 2015 Feb 15;557(1):82-7. doi: 10.1016/j.gene.2014.12.013. Epub 2014 Dec 10. PubMed: http://www.ncbi.nlm.nih.gov/pubmed/25497830
From the abstract: “Our results indicated that as low as one million reads can provide the same sequencing accuracy in transcript abundance (r=0.99) as >30 million reads for highly-expressed genes in all six species [including C. elegans].”
This is consistent with the reply from “Sperm or Egg?” with respect to abundant mRNAs.
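One way to see why the returns diminish, under the same simplifying Poisson-sampling assumption (no biological variance, no technical bias): the counting noise on a gene's estimate has a coefficient of variation of about 1/sqrt(expected reads), so it shrinks only with the square root of depth. The gene fraction below is hypothetical, and this is an illustration, not a reproduction of the Lei et al. analysis.

```python
import math

def poisson_cv(total_reads: int, fraction: float) -> float:
    """Coefficient of variation of a Poisson count with mean total_reads * fraction."""
    mean = total_reads * fraction
    return 1.0 / math.sqrt(mean)

highly_expressed = 1e-3  # hypothetical: 0.1% of all fragments

for depth in (1_000_000, 10_000_000, 30_000_000):
    print(f"{depth:>11,} reads: CV ~ {poisson_cv(depth, highly_expressed):.2%}")
```

Going from 1M to 30M reads cuts the counting noise only by a factor of sqrt(30), about 5.5, from roughly 3% to roughly 0.6%. For a gene that is already well sampled, that extra depth buys very little.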
Thank you both for the great references. My design includes three biological replicates per group, and I am planning on 6.25M reads per library. This is a global DGE experiment, so I’m looking at “everything”. I realize I am less likely to pick out low-expressing genes, but it sounds like my current setup would be acceptable. On a related note, is anyone aware of a worm “Bioinformatics” page, blog, or thread, within or outside this forum, with a place for active discussion and questions? If not, would there be interest in setting one up? I am just starting to teach myself the analysis side, and this type of resource would be wonderful, especially for people who are just starting out, but also for more advanced users.
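For the original “how many libraries per lane” question, the arithmetic is just lane yield divided by the per-library target. Here is a small sketch using the design above (2 groups × 3 replicates at 6.25M reads each). The lane yield is a hypothetical placeholder, so substitute the figure your sequencing core quotes for your run mode and read length. The optional 10% margin for uneven pooling is my own assumption, not a standard value.

```python
import math

def libraries_per_lane(lane_yield: int, reads_per_library: int, margin: float = 0.0) -> int:
    """Largest number of libraries that each still get `reads_per_library`
    after setting aside `margin` (a fraction) of the lane for uneven pooling."""
    usable = lane_yield * (1.0 - margin)
    return math.floor(usable / reads_per_library)

reads_per_library = 6_250_000  # target from the design above
n_libraries = 2 * 3            # two groups x three biological replicates
print(f"Reads needed for this experiment: {n_libraries * reads_per_library:,}")

# HYPOTHETICAL lane yield: replace with the figure your sequencing core quotes.
lane_yield = 150_000_000
print("Libraries per lane at this depth:",
      libraries_per_lane(lane_yield, reads_per_library, margin=0.10))
```

With these placeholder numbers the six libraries need 37.5M reads in total, and the lane would hold 24 libraries with no margin or 21 with the 10% cushion.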