Question: Will variance between the number of reads mapped in samples impact my differential analysis?
asked 3.2 years ago by a.rex • 220:

I have mapped FASTQ files to my transcriptome of interest with Kallisto and performed a differential expression analysis in Sleuth (control versus sample). Although I obtain a list of differentially expressed loci after performing a Wald test, a look at the read-mapping metrics makes me unsure about the data. The mappings I obtain are as follows:

    sample            reads_mapped  reads_proc frac_mapped       bootstraps 
    control_replicate1  39333462    48943096    0.8037              100 
    control_replicate2  41571482    51237738    0.8113              100 
    sample_replicate1   15284000    21768028    0.7021              100 
    sample_replicate2   9515367     21270730    0.4473              100

As you can see, sample_replicate2 has considerably fewer reads mapped than the other sample and the controls. Will this impact my differential analysis? Since the values are scaled in Sleuth (i.e. Sleuth-normalised TPM), will this be an issue?
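(Not part of the original thread: a toy sketch of the TPM scaling the question refers to, with made-up counts and effective lengths, just to show how each sample is rescaled onto a common per-million scale regardless of its total read count.)

```python
# Toy TPM (transcripts per million) calculation: length-normalise the
# counts, then scale so every sample sums to one million. All numbers
# below are hypothetical example values, not data from this thread.

def tpm(counts, eff_lengths):
    """Return TPM values for one sample (counts per transcript)."""
    rates = [c / l for c, l in zip(counts, eff_lengths)]
    total = sum(rates)
    return [r / total * 1_000_000 for r in rates]

counts = [100, 300, 50]          # hypothetical reads per transcript
eff_lengths = [1000, 1500, 500]  # hypothetical effective lengths (bp)
print(tpm(counts, eff_lengths))  # always sums to 1e6, whatever the depth
```

Because every sample is forced to sum to one million, a sample with half the sequencing depth still lands on the same scale, which is why raw library size alone is not the whole story.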

rna-seq kallisto sleuth
answered 3.2 years ago by Istvan Albert ♦♦ 84k (University Park, USA):

You don't need to worry unless you devise your own method.

Typical differential expression detection methods account for differences in the number of reads across samples.
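(Illustration added for clarity, not from the original answer: a minimal sketch of median-of-ratios size-factor normalisation, the DESeq2-style approach many DE tools use to absorb library-size differences. The count matrix is a made-up example.)

```python
# Median-of-ratios size factors: for each sample, take the median of its
# counts divided by each gene's geometric mean across samples. Samples
# sequenced twice as deeply get a size factor roughly twice as large,
# so dividing counts by the factors puts samples on a common scale.
import math
from statistics import median

def size_factors(count_matrix):
    """count_matrix: rows = genes, columns = samples."""
    ref = []
    for row in count_matrix:
        if all(c > 0 for c in row):  # genes with zeros are skipped
            gm = math.exp(sum(math.log(c) for c in row) / len(row))
            ref.append((row, gm))
    n_samples = len(count_matrix[0])
    return [median(row[j] / gm for row, gm in ref)
            for j in range(n_samples)]

counts = [        # hypothetical genes x 2 samples
    [100, 200],
    [50, 100],
    [10, 20],
]
print(size_factors(counts))  # sample 2's factor is ~2x sample 1's
```

This is why a global difference in read totals, on its own, is handled by the method rather than being something you must correct manually.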


Thanks for your reply, Istvan. Typically, though, for RNA-seq I have heard that you need at least 20 million reads per replicate. Is this true, and do you know what it is based on? Or can I get away with the figures above?

written 3.2 years ago by a.rex • 220

As mentioned by Albert, the software will account for this factor, for example through normalization between samples. However, the biological results from the low-read samples are less convincing. Samples with few reads will show larger variation by chance, which results in lower significance. More importantly, genes in these samples will often have lower measured expression than in samples with many reads; in the extreme case where a gene receives no reads at all, normalization cannot help, since any factor times zero is still zero.
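(A toy demonstration of that last point, not from the original comment; all counts are made up.) Scaling by a size factor rescues depth differences for nonzero counts, but a zero count stays zero no matter what factor you apply:

```python
# Divide counts by a library size factor to put samples on one scale.
def scale(counts, factor):
    return [c / factor for c in counts]

deep    = [40, 8, 0]   # hypothetical counts from a deep library
shallow = [10, 2, 0]   # same genes at a quarter of the depth

print(scale(deep, 4.0))     # -> [10.0, 2.0, 0.0]
print(scale(shallow, 1.0))  # -> [10.0, 2.0, 0.0]
# The nonzero genes line up after scaling, but the zero stays zero:
# a transcript that was never sampled cannot be recovered by normalization.
```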

written 3.2 years ago by huwenhuo • 40

