Question: For low-coverage RNA-seq, how many assigned reads is the bare minimum for differential gene expression analysis?
19 months ago by senowinski (European Union):

I have low-coverage RNA-seq of human tissue, with ~6 million reads aligned using STAR. Across the 84 samples, the number of reads assigned to genes ranges from 2 to 7 million. What is the bare minimum number of reads I can use for differential gene expression analysis? What is a sensible cut-off? Ideally I would like to retain as many samples as possible.

rna-seq • 895 views
modified 19 months ago • written 19 months ago by senowinski

Depends on the genome. For example, you need more read depth for human alignments than you do for fly alignments.

What is the bare minimum number of reads I can use for differential gene expression analysis?

There's not really a bare minimum. It depends on how sensitive your analysis needs to be. It also depends on sequencing quality (how many good reads remain after processing) and genome size, as I mentioned already.

You should go ahead with the differential expression analysis. That part doesn't take that long. And if you decide to do more sequencing, you will already have the differential expression pipeline set up.

written 19 months ago by goodez

These are human alignments. When you say go ahead with the differential gene expression analysis, do you think I should try it with all the samples?

written 19 months ago by senowinski

Well, as I say, it depends on whether they are outliers on metrics other than read count.

written 19 months ago by i.sudbery

What is your organism? Six million reads is low coverage for human, but not for yeast, for example. And how are the 84 samples distributed across treatments? The literature shows that biological replicates matter more than read depth per sample when it comes to statistical power.

written 19 months ago by h.mon

We normally talk about reads in the sample, rather than reads assigned to genes. A dirty little secret that people often don't talk about is that often only around a third (total ribo-depleted) to two thirds (polyA) of reads map to exons. So when someone says they have 20M polyA reads, they probably only really have about 13M assigned to exons.
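To make the arithmetic above concrete, here is a minimal Python sketch. The two fractions are just the rough figures quoted in this post (about one third for total ribo-depleted, about two thirds for polyA); the function names are made up for illustration, and real fractions vary with library prep and annotation.

```python
# Rough exonic fractions quoted in the post; assumptions, not measured values.
EXON_FRACTION = {"polyA": 2 / 3, "total_ribo_depleted": 1 / 3}


def expected_exonic_reads(total_reads, library_type):
    """Estimate how many of the sequenced reads end up assigned to exons."""
    return total_reads * EXON_FRACTION[library_type]


def total_reads_needed(assigned_reads, library_type):
    """Invert the estimate: total depth implied by a given assigned-read count."""
    return assigned_reads / EXON_FRACTION[library_type]


for lib in EXON_FRACTION:
    est = expected_exonic_reads(20_000_000, lib)
    print(f"{lib}: ~{est / 1e6:.1f}M of 20M reads assigned to exons")
```

Read the other way round, under the same rough polyA fraction, 2M reads assigned to genes would correspond to something like 3M reads in the sample.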

I'd normalise your samples with DESeq2's rlog and see which samples stand out on the PCA/MDS. Do you have low-read samples that are a million miles away from all the other samples? Do they have other things wrong with them (GC distribution, over-represented sequences, etc.)? If your low-coverage samples cluster on a PCA/MDS with the high-coverage ones, I'd probably use them. If they are miles away, I'd discard them.
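As a rough illustration of that check, here is a small, standard-library-only Python sketch. It is not DESeq2's rlog and it is not a PCA: it swaps in a plain log2 counts-per-million transform and measures each sample's distance to the median expression profile, which captures the same idea of "is this sample miles away from the rest, regardless of depth". The function names, the toy counts and the 5-MAD cut-off are all assumptions made up for the example.

```python
import math
from statistics import median


def log_cpm(counts, prior=0.5):
    """log2 counts-per-million per sample; counts maps sample -> gene counts."""
    out = {}
    for sample, row in counts.items():
        lib_size = sum(row)
        out[sample] = [math.log2((c + prior) / (lib_size + 1) * 1e6) for c in row]
    return out


def distance_to_median_profile(logexpr):
    """Euclidean distance of each sample to the gene-wise median profile."""
    samples = list(logexpr)
    n_genes = len(logexpr[samples[0]])
    centre = [median(logexpr[s][g] for s in samples) for g in range(n_genes)]
    return {s: math.dist(logexpr[s], centre) for s in samples}


def flag_outliers(distances, n_mads=5.0):
    """Flag samples more than n_mads MADs above the median distance (arbitrary cut-off)."""
    values = list(distances.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    cutoff = med + n_mads * max(mad, 1e-9)
    return [s for s, d in distances.items() if d > cutoff]


# Toy counts, invented for illustration (4 genes, 5 samples).
counts = {
    "s1": [100, 200, 300, 400],
    "s2": [110, 190, 310, 390],
    "s3": [95, 210, 290, 405],
    "s4": [30, 62, 88, 120],   # low depth, but same expression profile
    "s5": [400, 30, 20, 50],   # different profile
}
flagged = flag_outliers(distance_to_median_profile(log_cpm(counts)))
print(flagged)
```

In this toy data the low-depth sample s4 sits with the others after normalisation, while s5 is flagged because its profile differs, not because of its read count. For real data you would still want the rlog/PCA plot and the QC metrics mentioned above before discarding anything.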

As was pointed out by @h.mon, a lot of power in RNA-seq comes from replicates rather than read number.

written 19 months ago by i.sudbery

For reference: https://academic.oup.com/bioinformatics/article/30/3/301/228651

written 19 months ago by grant.hovhannisyan