Question: Good mean coverage but big std in coverage
Pin.Bioinf170 wrote:

Hello, I have computed the coverage of my reads with Qualimap bamQC and got the following for one of my samples:

>>>>>>> Coverage

     mean coverageData = 33.5984X
     std coverageData = 4,599.979X

     There is a 1,3% of reference with a coverageData >= 1X
     There is a 1,14% of reference with a coverageData >= 2X
     There is a 0,85% of reference with a coverageData >= 3X

A mean of 33X seems good for RNA-seq analysis, but the very large std and the very low percentage of the reference covered at 3X or more look very bad. What should my conclusion be? Is coverage with such a high standard deviation bad even though the mean is high, or is the mean coverage still good despite the huge std?

Please, help me interpret these results.

Thank you

coverage qualimap rnaseq bamqc • 204 views
ADD COMMENTlink modified 8 weeks ago • written 8 weeks ago by Pin.Bioinf170

What is the motivation for this analysis, i.e. what question are you trying to answer? Without knowing that, it is hard to interpret these numbers.

ADD REPLYlink written 8 weeks ago by ATpoint9.2k

I want to know if my reads are good enough to use for differential expression analysis. I have read that 2X coverage is enough for that. What do you think?

ADD REPLYlink written 8 weeks ago by Pin.Bioinf170

Fold coverage is generally a useless concept in RNAseq.

ADD REPLYlink written 8 weeks ago by Devon Ryan86k

So how do I know if my fastq file is good enough for differential expression analysis? In this sample, only 4.7% of the reads mapped to the reference.

ADD REPLYlink written 8 weeks ago by Pin.Bioinf170

4.7% is crazy low, there is obviously something wrong, focus on figuring out why such a small percentage of your reads are aligning.

ADD REPLYlink written 8 weeks ago by Devon Ryan86k

Okay, thank you, but I have read that 2X is the minimum mapped coverage for using the reads for differential expression analysis. I want to know if this fastq is good enough or if I need to repeat the sequencing of this sample.

ADD REPLYlink written 8 weeks ago by Pin.Bioinf170

I don't know where you read that, but please follow guidelines such as this one:

Bioconductor RNA-seq workflow: gene-level exploratory analysis and differential expression

ADD REPLYlink written 8 weeks ago by WouterDeCoster34k

Hello Corentin, thank you for your detailed answer. So is looking at the percentage of uniquely mapped reads enough? How can I decide if the percentage of uniquely mapped reads is good enough for analysis?

ADD REPLYlink written 8 weeks ago by Pin.Bioinf170

Please use ADD COMMENT or ADD REPLY to respond to previous posts, so that this thread remains logically structured and easy to follow. I have now moved your post, but as you can see the result is not optimal. Adding an answer should only be used for providing a solution to the question asked.

ADD REPLYlink written 8 weeks ago by WouterDeCoster34k

"Good enough" is subjective and depends on your experiment, your genome, your read quality, etc. But the higher the percentage of mapped reads, the better.

You can look in the literature and tutorials for additional steps to check and analyse your data. RNA-seq is a popular method, and a lot of resources are available that will explain it more thoroughly and better than I can.

ADD REPLYlink modified 8 weeks ago • written 8 weeks ago by Corentin170

This is the table of some of the samples after mapping:

Sample Name   % Aligned   M Aligned
I_S1          4.7%        1.0
I_S2          48.7%       3.2
I_S3          49.0%       3.0
I_S4          46.6%       7.6
I_S6          82.4%       49.3
ADD REPLYlink written 8 weeks ago by Pin.Bioinf170
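
As a rough cross-check of the table above, the total number of reads per sample can be back-calculated from the two columns. This is only a sketch, and it assumes that "M Aligned" means millions of aligned reads (the column is not defined in the thread); the variable names are made up for illustration.

```python
# Values copied from the table above: (fraction aligned, millions aligned).
# ASSUMPTION: "M Aligned" is the number of aligned reads in millions.
samples = {
    "I_S1": (0.047, 1.0),
    "I_S2": (0.487, 3.2),
    "I_S3": (0.490, 3.0),
    "I_S4": (0.466, 7.6),
    "I_S6": (0.824, 49.3),
}

# total reads (millions) = aligned reads (millions) / fraction aligned
totals = {name: aligned / frac for name, (frac, aligned) in samples.items()}

for name, total in totals.items():
    frac, aligned = samples[name]
    print(f"{name}: ~{total:.1f} M reads in total, {aligned} M aligned ({frac:.1%})")
```

Under that assumption, the samples differ both in how many reads were sequenced and in how many of those aligned, so the two numbers are worth reading together.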

Something appears to have gone wrong with your first 4 samples; presumably they're heavily contaminated or something similar. BLAST some of the reads that didn't align to try to determine what went wrong.

ADD REPLYlink written 8 weeks ago by Devon Ryan86k
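
One possible way to collect unaligned reads for BLAST is sketched below. It relies only on the SAM format itself: column 2 is the FLAG, and bit 0x4 marks a read as unmapped. The function name, the `limit` parameter and the example records are hypothetical, and the input is assumed to be SAM text (for example produced by `samtools view` from the BAM).

```python
def unmapped_to_fasta(sam_lines, limit=100):
    """Return up to `limit` unmapped reads from SAM text lines as FASTA."""
    records = []
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete alignment record
            continue
        name, flag, seq = fields[0], int(fields[1]), fields[9]
        # FLAG bit 0x4 set means the read itself is unmapped
        if flag & 0x4 and seq != "*":
            records.append(f">{name}\n{seq}")
            if len(records) >= limit:
                break
    return "\n".join(records)


# Hypothetical toy records: one mapped read (flag 0), one unmapped (flag 4).
example = [
    "@HD\tVN:1.6\tSO:unsorted",
    "read1\t0\tchr1\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGGCCAA\tIIIIIIII",
]
print(unmapped_to_fasta(example))
```

The resulting FASTA text can be pasted into a BLAST search; a few dozen reads are usually enough to see whether one contaminant dominates.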
michael.ante2.7k wrote:

Hi Pin.Bioinf,

You've tagged rnaseq. In RNA-Seq data, the coverage is not uniformly distributed. The exons of expressed transcripts have coverage, while introns and intergenic regions have almost none. On top of that, differences in transcript abundance lead to different read-depth levels. Thus, don't use coverage statistics in RNA-Seq.

Cheers,

Michael

ADD COMMENTlink modified 8 weeks ago • written 8 weeks ago by michael.ante2.7k
Corentin170 wrote:

Hello,

From this report alone, it would seem that almost all your reads mapped to a very small portion of your genome. Do not forget that RNA-seq reads map mostly to the transcribed, largely exonic, parts of the genome; protein-coding sequence is estimated at about 2% of the total genome size in humans. Moreover, in RNA-seq the coverage depends on the expression level of the transcript: a heavily expressed transcript will have deep coverage, which could explain the high std.
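
To make this concrete, here is a small sketch with made-up numbers (not the poster's data): a toy reference in which all reads pile up on 1% of the positions. The mean looks respectable, while the std is roughly ten times larger and almost none of the reference is covered.

```python
import statistics


def coverage_summary(depths):
    """Return (mean, population std, fraction of positions with depth >= 1)."""
    mean = statistics.fmean(depths)
    std = statistics.pstdev(depths)
    covered = sum(1 for d in depths if d >= 1) / len(depths)
    return mean, std, covered


# Hypothetical toy "genome": 10,000 positions, all reads on 1% of them.
genome_length = 10_000
depths = [0] * genome_length
for pos in range(100):        # 100 positions = 1% of the reference
    depths[pos] = 3_000       # stands in for one highly expressed transcript

mean, std, covered = coverage_summary(depths)
print(f"mean = {mean:.1f}X, std = {std:.1f}X, covered >= 1X: {covered:.1%}")
# prints: mean = 30.0X, std = 298.5X, covered >= 1X: 1.0%
```

A genome-wide mean and std therefore say little about an RNA-seq sample, because most of the positions in the average are expected to be at 0X.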

If you want a QC statistic for your alignment, you can check how many of your reads mapped to the reference (for example with samtools flagstat).
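
As a sketch of how that number could be pulled out programmatically, the snippet below parses the plain-text output of `samtools flagstat`. The helper name is hypothetical, and the example text uses invented counts in the usual `N + M in total` / `N + M mapped` layout; the exact set of lines varies between samtools versions. Note that flagstat counts alignment records, so secondary and supplementary alignments are included in both numbers.

```python
import re


def mapped_fraction(flagstat_text):
    """Return mapped / total QC-passed records from `samtools flagstat` text."""
    total = mapped = None
    for line in flagstat_text.splitlines():
        # Matches "N + M in total ..." and "N + M mapped ..." only;
        # lines such as "N + M primary mapped ..." are deliberately skipped.
        m = re.match(r"(\d+) \+ (\d+) (in total|mapped) ", line)
        if m is None:
            continue
        if m.group(3) == "in total":
            total = int(m.group(1))
        else:
            mapped = int(m.group(1))
    if not total or mapped is None:
        raise ValueError("could not find 'in total' and 'mapped' lines")
    return mapped / total


# Invented example output, for illustration only.
example_flagstat = """\
1000 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
47 + 0 mapped (4.70% : N/A)
"""
print(f"{mapped_fraction(example_flagstat):.1%} of records mapped")
```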

For more insight, you can visualize the alignment with tools like IGV https://software.broadinstitute.org/software/igv/. Just load your BAM file and you will be able to see the number of reads mapped along the reference (in your case, probably long regions of 0X coverage and then peaks where a transcript is expressed).

Regards,

ADD COMMENTlink written 8 weeks ago by Corentin170

For quality control in RNA-Seq data, I'd like to mention:

ADD REPLYlink written 8 weeks ago by michael.ante2.7k