Question

Understanding the Output of the DiagnoseTargets and DepthOfCoverage Walkers in GATK

0

10.4 years ago

ivivek_ngs ★ 5.2k

Hello All,

I have run the DepthOfCoverage and DiagnoseTargets walkers on my tumor samples and their iPS lines. I find the DiagnoseTargets output hard to interpret, but the DepthOfCoverage outputs are quite clear. I have a question I would like clarified. I was looking at the _cumulative_coverage_counts file, which can be used to draw a cumulative-frequency histogram of all bases mapped on the exome. It starts with these two columns:

            gte_0     gte_1
NSamples_1  80050421  64522700

Can you tell me what gte stands for? Also, if I want to know the highest number of reads mapped to the exonic regions in my samples, it should be the 64 million in the second column, right? That number varies across samples, whereas gte_0 is identical in all of them: the same 80 million shows up in every sample. So the reads that ultimately mapped to the exome should be in the second column, right?

I am trying to find out how many reads in my aligned BAM file ultimately mapped to the exonic regions, for QC purposes. About 98% of my reads mapped to the reference genome, and the FastQC report showed well over 50% duplicates, so I want to check whether the number of reads that mapped to the exome is close to that value or much lower. It would be nice if someone could share some statistics on read alignment to the exome. I read on the exome analysis wiki page that 60-70% of reads map to the exome once duplicates and errors are removed, but I am a bit perplexed by my stats and would appreciate advice from people who have already done this kind of QC analysis.

Regards,

Vivek

exome-sequencing gatk qc • 4.6k views

ADD COMMENT • link updated 10.3 years ago by JuJo ▴ 10 • written 10.4 years ago by ivivek_ngs ★ 5.2k

1

"gte" = "greater than or equal to"
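As a rough illustration of how the two columns relate, here is a minimal sketch using the numbers quoted in the question. It assumes the counts are positions (bases) in the target intervals at or above each depth, not reads; the function name is made up for this example.

```python
# Values copied from the NSamples_1 row quoted in the question.
# Assumption: each gte_N column counts target positions with depth >= N.
gte_0 = 80_050_421  # depth >= 0 is true for every position, so this is the interval size
gte_1 = 64_522_700  # positions covered by at least one read


def covered_fraction(total_positions, covered_positions):
    """Fraction of positions that have depth >= 1."""
    return covered_positions / total_positions


frac = covered_fraction(gte_0, gte_1)
print(f"{frac:.1%} of positions have depth >= 1")
```

Under that assumption, gte_0 would be identical for every sample run over the same intervals, which would match what the question describes.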

ADD REPLY • link 10.4 years ago by Pierre Lindenbaum 161k

0

Thank you for the reply, Pierre, but I would like to know which column shows the reads that mapped to the exonic regions. Is it the first (gte_0) or the second (gte_1)? The first value is over 80 million and is the same in all samples, while the second column shows a different cumulative count for each sample. So it should be the second value, right?

ADD REPLY • link 10.4 years ago by ivivek_ngs ★ 5.2k

0

Can anyone give me suggestions here?

ADD REPLY • link 10.4 years ago by ivivek_ngs ★ 5.2k

score 0 · Answer 1 · 2014-01-14

Hi,

"As for the reference genome I had a mapping of around 98% so I want to know how much reads got mapped on the exome"

You could calculate this yourself (per base, not per read) using the .DepthofCoverage file.

The header of the file should be:

Locus    Total_Depth    Average_Depth_sample    Depth_for_yoursample

Just grep all the positions where the depth is zero and count them. Dividing that count by the number of lines in the .DepthofCoverage file (minus 1 for the header) gives the fraction of bases that are not covered; multiply by 100 for a percentage.
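The same count can be sketched in Python instead of grep. This is only an illustration: it assumes the per-locus file is tab-delimited with the header shown above, and the function name and the tiny in-memory example are made up here.

```python
import csv
import io


def uncovered_fraction(handle, depth_column="Total_Depth"):
    """Return the fraction of loci whose depth is exactly zero.

    Assumes a tab-delimited table with a header row (an assumption about
    the output format, not something confirmed in this thread).
    """
    reader = csv.DictReader(handle, delimiter="\t")
    total = 0
    zero = 0
    for row in reader:
        total += 1
        if int(row[depth_column]) == 0:
            zero += 1
    return zero / total if total else 0.0


# Made-up four-locus example, not real data.
EXAMPLE = (
    "Locus\tTotal_Depth\tAverage_Depth_sample\tDepth_for_yoursample\n"
    "chr1:100\t0\t0.00\t0\n"
    "chr1:101\t12\t12.00\t12\n"
    "chr1:102\t0\t0.00\t0\n"
    "chr1:103\t10\t10.00\t10\n"
)

result = uncovered_fraction(io.StringIO(EXAMPLE))
print(f"{result:.1%} of loci are uncovered")
```

For a real file, something like `with open(path) as fh: uncovered_fraction(fh)` would apply, where `path` is a placeholder. Comparing the depth column as an integer avoids a pitfall of a plain grep for "0", which would also match depths such as 10 or 100.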

Regards,

JuJo