Good mean coverage but big std in coverage
2.9 years ago
Pin.Bioinf ▴ 300

Hello, I have computed the coverage of my reads with bamQC and I got, for one of my samples:

>>>>>>> Coverage

mean coverageData = 33.5984X
std coverageData = 4,599.979X

There is a 1,3% of reference with a coverageData >= 1X
There is a 1,14% of reference with a coverageData >= 2X
There is a 0,85% of reference with a coverageData >= 3X


A mean of 33X looks good for RNA-seq analysis, but such a large std and such a low percentage of the reference covered at 3X or more look very bad. What should my conclusion be? Even if the mean coverage is high, does a standard deviation this large mean the coverage is bad? Or is the mean coverage still good despite the huge std?

Please, help me interpret these results.

Thank you

bamqc qualimap coverage rnaseq • 953 views

What is the motivation for this analysis, i.e. what question are you trying to answer? With the information given it is hard to interpret anything, because the underlying question is missing.


I want to know if my reads are good enough to use for differential expression analysis. I have read that 2X is enough coverage for that. What do you think?


Fold coverage is generally a useless concept in RNAseq.


So how do I know if my fastq file is good enough for differential expression analysis? In this sample, 4.7% of the reads mapped to the reference.


4.7% is crazy low; there is obviously something wrong. Focus on figuring out why such a small percentage of your reads are aligning.


Okay, thank you, but I have read that 2X is the minimum mapped coverage for using the reads in differential expression analysis. I want to know if this fastq is good enough or if I need to repeat the sequencing of this sample.


Bioconductor RNA-seq workflow: gene-level exploratory analysis and differential expression


Hello Corentin, thank you for your detailed answer. So would looking at the % of uniquely mapped reads be enough? How can I decide if the percentage of uniquely mapped reads is good enough for analysis?


Please use ADD COMMENT or ADD REPLY to respond to earlier posts, so that this thread remains logically structured and easy to follow. I have now moved your reaction, but as you can see it's not optimal. Adding an answer should only be used for providing a solution to the question asked.


You can look in the literature and tutorials for additional steps to check and analyse your data. RNA-seq is a popular method and a lot of resources are available that will explain it more comprehensibly and better than me.


This is the table of some of the samples after mapping:

Sample Name    % Aligned    M Aligned
I_S1           4.7%         1.0
I_S2           48.7%        3.2
I_S3           49.0%        3.0
I_S4           46.6%        7.6
I_S6           82.4%        49.3
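
To make the comparison between samples explicit, here is a minimal Python sketch that flags the samples whose alignment rate falls below a cutoff. The percentages are copied from the table above; the 70% cutoff and the function name are only assumptions for illustration, not an established threshold.

```python
# % aligned per sample, copied from the table above.
pct_aligned = {
    "I_S1": 4.7,
    "I_S2": 48.7,
    "I_S3": 49.0,
    "I_S4": 46.6,
    "I_S6": 82.4,
}

# Hypothetical cutoff chosen for illustration only; pick one that suits
# your organism, library prep and aligner.
MIN_PCT_ALIGNED = 70.0


def flag_low_alignment(stats, min_pct):
    """Return the sample names whose alignment rate is below min_pct."""
    return [name for name, pct in stats.items() if pct < min_pct]


flagged = flag_low_alignment(pct_aligned, MIN_PCT_ALIGNED)
print("Samples to investigate:", ", ".join(flagged))
```

With this cutoff, every sample except I_S6 is flagged for a closer look.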


Something appears to have gone wrong with your first 4 samples, presumably they're heavily contaminated or something. Blast some of the reads that didn't align to try and determine what went wrong.
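
One way to get a handful of unaligned reads into a form that can be pasted into the BLAST web form is to convert them to FASTA. A minimal Python sketch, assuming the unaligned reads have already been written to a plain FASTQ file with exactly four lines per record; the function name and the example reads are made up for illustration.

```python
from itertools import islice

# Made-up reads, only to show the expected input layout (4 lines per record).
EXAMPLE_FASTQ = """\
@read1
ACGTACGT
+
IIIIIIII
@read2
GGCCTTAA
+
IIIIIIII
@read3
TTTTCCCC
+
IIIIIIII
"""


def fastq_to_fasta(fastq_lines, max_reads=10):
    """Convert the first max_reads FASTQ records to a FASTA-formatted string."""
    lines = iter(fastq_lines)
    records = []
    while len(records) < max_reads:
        chunk = list(islice(lines, 4))  # header, sequence, '+', qualities
        if len(chunk) < 4:
            break  # end of input or truncated record
        header = chunk[0].strip()
        sequence = chunk[1].strip()
        records.append(">" + header[1:] + "\n" + sequence)  # drop leading '@'
    return "\n".join(records)


print(fastq_to_fasta(EXAMPLE_FASTQ.splitlines(), max_reads=2))
```

For a real file, pass the open file handle instead of the example lines.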

2.9 years ago
michael.ante ★ 3.6k

Hi Pin.Bioinf,

You've tagged rnaseq. In RNA-Seq data, the coverage is not uniformly distributed. Exons of expressed transcripts have coverage, while introns and intergenic regions have nearly none. On top of that, differences in transcript abundance lead to different read-depth levels. Thus, don't use coverage statistics in RNA-Seq.

Cheers,

Michael

2.9 years ago
Corentin ▴ 470

Hello,

From this report alone, it would seem that almost all your reads mapped to a very small portion of your genome. Do not forget that RNA-seq reads will mostly map to the exonic (largely protein-coding) parts of your genome, which are estimated (in humans) at about 2% of the total genome size. Moreover, in RNA-seq the coverage depends on the expression level of the transcript: a heavily expressed transcript will have deep coverage, and this could explain the high std.
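
A quick back-of-the-envelope check shows why a large std is expected here. The sketch below uses only the two figures from the bamQC report in the question (mean 33.5984X, 1.3% of the reference at >= 1X); the variable names are my own. It computes the mean depth over the covered positions, and the std you would get in the idealised case where every covered position had exactly that depth and the rest of the reference had zero.

```python
import math

# Figures taken from the bamQC report in the question.
mean_cov = 33.5984      # mean coverage over the whole reference
frac_covered = 0.013    # fraction of the reference with coverage >= 1X

# Mean depth restricted to the positions that have any coverage at all.
mean_over_covered = mean_cov / frac_covered

# Idealised two-level model: a fraction f of positions at depth d, the rest at 0.
# Then mean = f * d and variance = f * (1 - f) * d**2,
# so std = mean * sqrt((1 - f) / f).
std_if_uniform = mean_cov * math.sqrt((1 - frac_covered) / frac_covered)

print(f"mean depth over covered positions: {mean_over_covered:.0f}X")
print(f"std if covered positions were uniform: {std_if_uniform:.0f}X")
```

So the covered 1.3% of the reference averages roughly 2,580X, and even with perfectly even depth across it the std would already be roughly 290X, several times the mean. The reported std of about 4,600X is much larger still, which is consistent with a few very highly expressed regions dominating the coverage.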

If you want a qc statistic on your alignment you can check how many of your reads mapped to the reference (for example with samtools flagstat).
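
If you want to pull that number out in a script, here is a minimal Python sketch that parses the text printed by samtools flagstat. The example text is made up to show the line layout, and the function name is hypothetical. Note that the "mapped" and "in total" lines count alignment records, so secondary and supplementary alignments are included.

```python
import re

# Made-up flagstat-style text, only to illustrate the layout of the lines.
EXAMPLE_FLAGSTAT = """\
1000 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
47 + 0 mapped (4.70% : N/A)
"""


def mapped_percentage(flagstat_text):
    """Return the percentage of QC-passed records that are mapped."""
    counts = {}
    for line in flagstat_text.splitlines():
        # Matches e.g. "47 + 0 mapped (" but not "47 + 0 primary mapped (".
        match = re.match(r"(\d+) \+ (\d+) (in total|mapped) \(", line)
        if match:
            counts[match.group(3)] = int(match.group(1))  # QC-passed count
    if "in total" not in counts or "mapped" not in counts:
        raise ValueError("could not find 'in total' and 'mapped' lines")
    if counts["in total"] == 0:
        raise ValueError("no reads in input")
    return 100.0 * counts["mapped"] / counts["in total"]


pct = mapped_percentage(EXAMPLE_FLAGSTAT)
print(f"{pct:.1f}% mapped")
```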

For more insight, you can visualize the alignment with tools like IGV https://software.broadinstitute.org/software/igv/. Just load your BAM file (it needs to be sorted and indexed) and you will be able to see the number of reads mapped along the reference (in your case probably long regions of 0X coverage and then peaks where a transcript is expressed).

Regards,


For quality control in RNA-Seq data, I'd like to mention: