I have run the DepthOfCoverage and DiagnoseTargets walkers on my tumor samples and their iPS lines. I am finding the DiagnoseTargets output hard to interpret, but the DepthOfCoverage outputs are quite clear. I have a few questions I would like clarified. I was looking at the _cumulative_coverage_counts file, which can be used to draw a cumulative-frequency histogram of all bases mapped on the exome. At the beginning of the row I see two columns:
NSamples_1 80050421 64522700
Can you tell me what gte stands for? If I want to know the highest number of reads that mapped to the exonic regions in my samples, should I use the 64 million value in the second column? That number varies across samples, whereas the gte_0 value is the same 80 million in every sample. So the reads that ultimately mapped to the exome should be the second column, right?

I am trying to work out how many reads in my aligned BAM file ultimately mapped to the exonic regions, for QC purposes. About 98% of my reads mapped to the reference genome, but the FastQC report showed well over 50% duplicates, so I want to check whether the number of reads on the exome is close to that figure or much lower. It would be helpful if someone could share some statistics on read alignment to the exome. I read on the exome analysis wiki page that 60-70% of reads map to the exome once duplicates and errors are removed, but I am a bit perplexed by my stats and would appreciate advice from people who have already done this kind of analysis and QC. Please advise.
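In case it helps to see the calculation I have in mind, here is a minimal Python sketch. It assumes the two columns are named gte_0 and gte_1 (the header line is not shown above, so those names are my assumption) and that gte means "greater than or equal to", which is only my guess. The function name `parse_cumulative_row` is hypothetical, and whether the values count reads or bases is exactly what I am unsure about.

```python
def parse_cumulative_row(header_line, data_line):
    """Pair each count in a cumulative-coverage row with its column name.

    header_line: whitespace-separated column names (assumed: "gte_0 gte_1 ...")
    data_line:   sample label followed by one integer per column
    """
    names = header_line.split()
    fields = data_line.split()
    sample = fields[0]
    counts = [int(value) for value in fields[1:]]
    return sample, dict(zip(names, counts))


# Assumed header; only the data row comes from my actual output file.
header = "gte_0 gte_1"
row = "NSamples_1 80050421 64522700"

sample, counts = parse_cumulative_row(header, row)

# Share of the gte_0 total that also appears in the gte_1 column.
fraction = counts["gte_1"] / counts["gte_0"]
print(f"{sample}: gte_1 / gte_0 = {fraction:.1%}")
```

For my sample this ratio comes out at roughly 80%, and I would like to know whether that is the right number to compare against the 60-70% quoted on the wiki page.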