GATK DepthOfCoverage: 'total coverage' and 'mean coverage' meaning
5.1 years ago
user31888 ▴ 100

Hi,

I ran DepthOfCoverage with a RefSeq interval list passed to the -genelist argument. The resulting gene summary table includes, among other columns, the 'total coverage' and 'mean coverage' values (see here for an example).

I annotated the file and got values like these:

Gene      total coverage      mean coverage
AGL       54765               5.75
AGMAT     1089                0.33


Q1: Is the 'total coverage' value the sum of all the bases or of all the reads (across all my samples) mapped to each RefSeq interval provided?

Q2: How is the 'mean coverage' calculated, and how does it relate to the 'total coverage'? Does it take the read length or gene length (or something else) into account? Is it the value usually reported in papers as, for example, 100X coverage?

GATK DepthOfCoverage • 3.0k views

What format is your RefSeq -genelist file in? I tried BED, GTF and "all fields from UCSC", and none of them worked for me. Can you attach the list?


Hello,

Did you sort the file according to your reference genome?

I downloaded the RefSeq file by following the GATK RefSeq file download instructions. Only the "all fields from UCSC" option is recommended by GATK (and it worked for me).

I use GATK 4.1.7, which has a bug: the gene list file name must end with .refSeq rather than .txt, otherwise the tool does not recognize it.

Some posts suggest the following script for sorting the RefSeq file: sortByRef.pl

That script did not work for me. I had to write an ad hoc Python script to create the sorted file, after which the command ran smoothly.

#bin name chrom strand txStart txEnd cdsStart cdsEnd exonCount exonStarts exonEnds score name2 cdsStartStat cdsEndStat exonFrames
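
For anyone who needs to do the same, here is a minimal sketch of such a sorting step. It is not the original script. It assumes the file is tab-separated with the columns listed above (chrom in the third column, txStart in the fifth) and that the contig order comes from the reference's .fai index. The function names are made up for illustration, and dropping genes on contigs missing from the reference is my own choice.

```python
def contigs_from_fai(fai_path):
    """Return contig names in reference order (first column of a .fai index)."""
    with open(fai_path) as handle:
        return [line.split("\t")[0] for line in handle if line.strip()]


def sort_refseq_lines(lines, contig_order):
    """Sort refSeq rows by reference contig order, then by txStart.

    Header lines (starting with '#') are kept at the top.
    Rows on contigs absent from contig_order are dropped (an assumption).
    """
    rank = {name: i for i, name in enumerate(contig_order)}
    header = [ln for ln in lines if ln.startswith("#")]
    rows = [ln for ln in lines if ln.strip() and not ln.startswith("#")]

    def sort_key(row):
        fields = row.split("\t")
        # fields[2] is chrom, fields[4] is txStart
        return rank[fields[2]], int(fields[4])

    kept = [row for row in rows if row.split("\t")[2] in rank]
    return header + sorted(kept, key=sort_key)
```

To use it, read the downloaded file into a list of lines, pass it through sort_refseq_lines together with contigs_from_fai("your_reference.fasta.fai"), and write the result to a file whose name ends in .refSeq.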


Hello,

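total coverage = total read depth summed over every base in the specified region (for all the samples combined)

mean coverage = average read depth per base in the specified region, i.e. the total coverage divided by the number of bases in the region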
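
To make the relationship concrete, here is a small sketch. It assumes that the mean is simply the total divided by the number of positions in the interval. The function names are hypothetical and are not part of GATK.

```python
def coverage_summary(depths):
    """Return (total, mean) coverage for a list of per-base depths."""
    total = sum(depths)
    mean = total / len(depths) if depths else 0.0
    return total, mean


def implied_interval_length(total_coverage, mean_coverage):
    """Back out the number of bases implied by a total and a mean coverage."""
    return total_coverage / mean_coverage


# Toy interval of four bases with depths 0, 2, 4 and 6.
total, mean = coverage_summary([0, 2, 4, 6])
print(total, mean)

# Values from the question; the reported means are rounded,
# so the implied lengths are only approximate.
print(round(implied_interval_length(54765, 5.75)))  # AGL
print(round(implied_interval_length(1089, 0.33)))   # AGMAT
```

Under this assumption, the AGL row corresponds to roughly 9,500 covered positions and the AGMAT row to about 3,300. So neither read length nor the full gene length enters the calculation, only the number of bases in the intervals listed for that gene.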