Question: Understanding the Output of the DiagnoseTargets and DepthOfCoverage Walkers in GATK
ivivek_ngs (Seattle, WA, USA) wrote, 7.3 years ago:

Hello All,

I have run the DepthOfCoverage and DiagnoseTargets walkers on my tumor samples and their iPS lines. The DepthOfCoverage outputs are quite clear, but I am finding it difficult to make sense of the DiagnoseTargets output. I have some questions I would like clarified. I was looking at the _cumulative_coverage_counts file, which can be used to draw a cumulative-frequency histogram of all bases mapped on the exome. At the beginning of it I see two columns:

               gte_0     gte_1
    NSamples_1 80050421  64522700

What does "gte" stand for? If I want to know the highest number of reads mapped on the exonic regions in my samples, should I use the 64 million count in the second column? That number varies across samples, whereas gte_0 shows the same 80 million in every sample, so I assume the reads that ultimately mapped on the exome are the ones in the second column.

I am trying to find out how many reads from my aligned BAM file ultimately mapped on the exonic regions, for QC purposes. Against the reference genome I had a mapping rate of around 98%, but the FastQC report showed well over 50% duplicates, so I want to check whether the number of reads mapped on the exome is close to that figure or much lower. It would be nice if someone could share some statistics on read alignment to the exome. I read on the wiki page on exome analysis that 60-70% of reads map on the exome once duplicates and errors are removed, but I am a bit perplexed by my stats, so I would appreciate advice from experts who have already done this kind of QC.

Regards,

Vivek

Tags: gatk, exome-sequencing, qc

"gte" = "greater than or equal to"

(reply by Pierre Lindenbaum, 7.3 years ago)
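A minimal sketch of how the gte_N columns can be derived from per-locus depths. It assumes, following the GATK DepthOfCoverage documentation for the _cumulative_coverage_counts output, that each gte_N column counts target loci (bases) with a depth of at least N, not reads. Under that reading, gte_0 is simply the number of bases in the target intervals, which would explain why it is identical across samples, and gte_1 / gte_0 is the fraction of target bases covered by at least one read. The function name `cumulative_counts` and the toy depths are hypothetical.

```python
from collections import Counter


def cumulative_counts(depths, max_depth):
    """Return [gte_0, ..., gte_max_depth]: number of loci with depth >= N."""
    histogram = Counter(depths)  # depth -> number of loci with exactly that depth
    return [
        sum(count for depth, count in histogram.items() if depth >= n)
        for n in range(max_depth + 1)
    ]


# Hypothetical per-locus depths for a toy 10-base target
depths = [0, 0, 3, 5, 1, 0, 12, 7, 2, 0]
print(cumulative_counts(depths, 3))  # [10, 6, 5, 4]

# gte_0 and gte_1 as reported in the question
gte_0, gte_1 = 80050421, 64522700
print(f"{100 * gte_1 / gte_0:.2f}% of target bases have depth >= 1")  # 80.60% ...
```

With the numbers from the question, roughly 80.6% of target bases have at least one read. This is a base-level coverage figure; it does not by itself say how many reads landed on the exome.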

Thank you for the reply, Pierre, but which column shows the reads that mapped on the exonic regions: gte_0 or gte_1? The gte_0 value is over 80 million and identical in all samples, whereas gte_1 differs from sample to sample, so it should be the gte_1 value, right?

(reply by ivivek_ngs, 7.3 years ago)

Can anyone give me suggestions here?

(reply by ivivek_ngs, 7.3 years ago)
JuJo wrote, 7.1 years ago:

Hi,

"As for the reference genome I had a mapping of around 98% so I want to know how much reads got mapped on the exome"

You can calculate this yourself from the .DepthofCoverage file, although the result is in terms of bases, not reads.

The header of the file should be:

Locus    Total_Depth    Average_Depth_sample    Depth_for_yoursample

Just grep the loci whose depth is zero and count them. Dividing that count by the total number of lines in the .DepthofCoverage file (minus one for the header) gives the fraction of bases that are not covered; multiply by 100 for a percentage.

Regards,

JuJo

(answer by JuJo, 7.1 years ago)
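The zero-depth count described in the answer can be sketched in Python as follows. It assumes the per-locus file is tab-separated with the header shown above and, by default, reads the Total_Depth column, which for a single-sample run should equal the per-sample depth; for a multi-sample file, pass the matching Depth_for_<sample> column name instead. The function name `uncovered_fraction` and the toy rows are hypothetical.

```python
import csv
import io


def uncovered_fraction(handle, depth_column="Total_Depth"):
    """Fraction of loci whose depth in `depth_column` is zero.

    Assumes a tab-separated per-locus table with a header row.
    """
    reader = csv.DictReader(handle, delimiter="\t")  # consumes the header row itself
    total = zero = 0
    for row in reader:
        total += 1
        if int(row[depth_column]) == 0:
            zero += 1
    return zero / total if total else 0.0


# Hypothetical four-locus example in the layout described above
example = io.StringIO(
    "Locus\tTotal_Depth\tAverage_Depth_sample\tDepth_for_sample1\n"
    "chr1:14362\t0\t0.00\t0\n"
    "chr1:14363\t4\t4.00\t4\n"
    "chr1:14364\t5\t5.00\t5\n"
    "chr1:14365\t0\t0.00\t0\n"
)
print(f"{100 * uncovered_fraction(example):.1f}% of target bases have zero depth")  # 50.0% ...
```

For a real file, open it and pass the handle, e.g. `with open(path) as fh: uncovered_fraction(fh)`, where `path` is your own per-locus output. Because `csv.DictReader` consumes the header row, the "minus one for the header" step is handled automatically.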