GATK DepthOfCoverage Output
12.0 years ago
huskerjeff492

One of the files DepthOfCoverage creates is *.sample_interval_summary. Its column headings are Target, total_coverage, average_coverage, total_cvg, mean_cvg, granular_Q1, granular_median, granular_Q3 and %_above_15.

I don't understand the "granular_Q1", "granular_median" and "granular_Q3" columns. Could someone help me interpret them? If it's relevant, I was looking at coverage for my exome data, i.e. restricted with -L "list of all exons covered by the kit".

Thanks
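To pull these three columns out of the file, here is a minimal Python sketch. It assumes the summary is a delimited text file (tab by default; pass `delimiter=","` for CSV output) with a header row containing `Target` plus exactly one column whose name ends in `granular_Q1`, `granular_median` and `granular_Q3`, i.e. a single sample. The function name `interval_iqrs` and the example row are hypothetical, not real GATK output.

```python
import csv
import io

def interval_iqrs(handle, delimiter="\t"):
    """Yield (target, Q1, median, Q3, IQR) for each row of an interval summary.

    Assumption: the header has a 'Target' column and exactly one column
    ending in each of 'granular_Q1', 'granular_median' and 'granular_Q3'
    (one sample only; sample-name prefixes are tolerated).
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    fields = reader.fieldnames or []

    def column(suffix):
        # Locate the column by suffix so a sample-name prefix does not matter.
        matches = [f for f in fields if f.endswith(suffix)]
        if len(matches) != 1:
            raise ValueError(f"expected exactly one column ending in {suffix!r}")
        return matches[0]

    q1_col = column("granular_Q1")
    med_col = column("granular_median")
    q3_col = column("granular_Q3")
    for row in reader:
        q1, med, q3 = float(row[q1_col]), float(row[med_col]), float(row[q3_col])
        yield row["Target"], q1, med, q3, q3 - q1

# Hypothetical one-row file, modeled on the column names listed above.
example = io.StringIO(
    "Target\ttotal_coverage\taverage_coverage\tS1_total_cvg\tS1_mean_cvg\t"
    "S1_granular_Q1\tS1_granular_median\tS1_granular_Q3\tS1_%_above_15\n"
    "chr1:100-200\t5000\t50.0\t5000\t50.0\t40\t50\t61\t100.0\n"
)
for record in interval_iqrs(example):
    print(record)  # prints ('chr1:100-200', 40.0, 50.0, 61.0, 21.0)
```

Matching on the column-name suffix rather than the full name keeps the sketch independent of whatever sample name prefixes the per-sample columns.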

11.9 years ago

"Granular" is a term applied to histograms when the distribution of values is rendered more finely. I've seen it used in statistics documentation without ever seeing a good definition. Basically, it is used with discrete data rather than continuous data. GATK uses granular histograms to profile coverage, also without really defining the term.

http://www.broadinstitute.org/gsa/wiki/index.php/Depth_of_Coverage_v3.0

"granular_median" is the middle value of the data set (here, the per-base coverage values being summarized). "granular_Q1" is the lower quartile, or the median of the lower half of the data set. "granular_Q3" is the upper quartile, or the median of the upper half of the data set.

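The quartile definitions above can be illustrated with a small Python sketch that reads quartiles off a histogram of per-base depths (the discrete, "granular" view). It assumes the input is a plain list of per-base depths. Each quartile is taken as the smallest depth at which the cumulative share of bases reaches 25%, 50% or 75%. The function name `granular_quartiles` is hypothetical. GATK's own binning may give slightly different values, both from this sketch and from the "median of each half" definition.

```python
from collections import Counter

def granular_quartiles(depths):
    """Approximate (Q1, median, Q3) of per-base depths from a histogram.

    Each quartile is the smallest depth whose cumulative share of bases
    reaches 25%, 50% or 75%. Illustrative only; GATK's binning may differ.
    """
    if not depths:
        raise ValueError("need at least one depth value")
    hist = Counter(depths)          # depth -> number of bases at that depth
    n = len(depths)
    targets = [0.25, 0.50, 0.75]    # cumulative fractions for Q1, median, Q3
    results = []
    cumulative = 0
    t = 0
    for depth in sorted(hist):
        cumulative += hist[depth]
        # One depth bin can satisfy several quartiles at once.
        while t < len(targets) and cumulative >= targets[t] * n:
            results.append(depth)
            t += 1
    return tuple(results)
```

For example, `granular_quartiles([10, 10, 10, 10, 10, 20, 20, 20, 20, 20])` returns `(10, 10, 20)`. Because the values come straight from the histogram, each quartile is always an observed depth rather than an interpolated one.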

Speaking of the GATK DepthOfCoverage output: in the sample_summary file, which gives the total, mean, median, quartiles and threshold proportions aggregated over all bases, does the first column refer to the number of bases covered in the exome region? I used the walker to find out how many of the reads mapped to the whole genome fall in the exome region, by providing the interval BED file (which contains the probes used for target enrichment). This should give me the total number of bases that lie in the exome region defined by the BED file, right? From that total I would then calculate the number of reads on the exome by simply dividing by 100, since each read is 100 bases. Does this sound correct, or is there another way to calculate the total number of reads mapped to the exome region from the DepthOfCoverage output?
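One caveat to the divide-by-100 idea shows up on toy data. The Python sketch below assumes the first summary column is the sum of per-base depth over the target bases rather than a count of covered bases. It also uses made-up read and target coordinates (half-open `[start, end)`), and all names in it are hypothetical. Reads that only partly overlap a target add fewer than 100 bases to that sum. Dividing by the read length therefore gives "read equivalents", which can undercount the reads actually touching the targets.

```python
# Toy illustration with hypothetical data: one target and four 100 bp reads,
# all in half-open [start, end) coordinates.
TARGET = (100, 300)
READS = [(0, 100), (50, 150), (120, 220), (250, 350)]
READ_LENGTH = 100

def overlap(a, b):
    """Number of bases shared by two half-open intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))

def summed_depth(reads, target):
    """Sum of per-base depth over the target (what a 'total' column would sum)."""
    return sum(overlap(r, target) for r in reads)

def reads_overlapping(reads, target):
    """Number of reads that touch the target by at least one base."""
    return sum(1 for r in reads if overlap(r, target) > 0)

total = summed_depth(READS, TARGET)        # 0 + 50 + 100 + 50 = 200
estimate = total / READ_LENGTH             # 2.0 "read equivalents"
actual = reads_overlapping(READS, TARGET)  # 3 reads touch the target
print(total, estimate, actual)             # prints 200 2.0 3
```

A direct count from the BAM file avoids this approximation, for example `samtools view -c -L targets.bed sample.bam`, where the file names are placeholders.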

7.9 years ago
gsr9999

Q3 - Q1 gives the IQR (interquartile range). The IQR is a measure of how uniformly reads cover the genome or exome: the smaller the IQR, the more uniform the coverage. Source: http://www.illumina.com/science/education/sequencing-coverage.html

We are doing exome sequencing, and for the NA12878 sample I got granular_Q3 = 144 and granular_Q1 = 60, so my IQR is 84 (144 - 60).

Should I be worried that my coverage is not uniform, given that my IQR seems high?
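The IQR arithmetic above is easy to script in Python. Because the IQR grows with sequencing depth, one way to compare uniformity across samples of different depth is the quartile coefficient of dispersion, (Q3 - Q1) / (Q3 + Q1). That statistic is an addition here, not something taken from the Illumina page, and the function names are illustrative.

```python
def iqr(q1: float, q3: float) -> float:
    """Interquartile range of per-base coverage: Q3 - Q1."""
    if q3 < q1:
        raise ValueError("Q3 must be >= Q1")
    return q3 - q1

def quartile_coefficient_of_dispersion(q1: float, q3: float) -> float:
    """Scale-free spread (Q3 - Q1) / (Q3 + Q1); lies in [0, 1] for non-negative depths."""
    if q1 + q3 == 0:
        raise ValueError("Q1 + Q3 must be non-zero")
    return (q3 - q1) / (q3 + q1)

# Values reported above for NA12878.
print(iqr(60, 144))                                           # prints 84
print(round(quartile_coefficient_of_dispersion(60, 144), 3))  # prints 0.412
```

Doubling every depth (Q1 = 120, Q3 = 288) doubles the IQR to 168 but leaves the coefficient unchanged. A scale-free measure like this can therefore be easier to judge across runs than the raw IQR.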
