GATK SNP Calling Test Set
12.2 years ago

Does anyone have, or know of, a test exome data set for GATK? GATK provides a single-chromosome set and another small example, but I am looking for a large set of reads that I could map and then call using my GATK pipeline. The obvious answer is to use existing data from something like the 1000 Genomes Project. However, all of the exome data there is (as far as I can tell) low coverage, and I would like something with roughly 30-50x coverage. The second requirement is that it has been called using current best practices, V2 or V3, because I would also like to compare the VCF I get to this known "good" VCF file. I have searched some papers but I can't find anything that fits all of these requirements, and I was hoping someone out there might have a good idea.

Thanks
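
For the comparison step described above, a minimal sketch of one way to check a pipeline's VCF against a reference call set is to reduce each file to a set of (CHROM, POS, REF, ALT) keys and count the overlap. The function names `read_variants` and `compare_calls` and the toy records are hypothetical, chosen for illustration only. The sketch assumes both files use the same reference build and contig naming, and it matches records exactly, so differently represented indels would be counted as mismatches.

```python
from typing import Dict, Iterable, Set, Tuple

# A variant is identified by (CHROM, POS, REF, ALT).
Variant = Tuple[str, int, str, str]


def read_variants(lines: Iterable[str]) -> Set[Variant]:
    """Collect variant keys from VCF lines, skipping header lines."""
    variants: Set[Variant] = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        # Multi-allelic records list ALT alleles separated by commas.
        for alt in alts.split(","):
            if alt != ".":
                variants.add((chrom, pos, ref, alt))
    return variants


def compare_calls(test: Set[Variant], truth: Set[Variant]) -> Dict[str, float]:
    """Summarize the overlap between a test call set and a reference call set."""
    shared = test & truth
    return {
        "shared": len(shared),
        "only_in_test": len(test - truth),
        "only_in_truth": len(truth - test),
        "sensitivity": len(shared) / len(truth) if truth else float("nan"),
        "precision": len(shared) / len(test) if test else float("nan"),
    }


if __name__ == "__main__":
    # Toy records for illustration only; real input would come from open("my_calls.vcf").
    truth_lines = [
        "##fileformat=VCFv4.1",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "1\t100\t.\tA\tG\t50\tPASS\t.",
        "1\t200\t.\tC\tT\t50\tPASS\t.",
        "2\t50\t.\tG\tA,C\t50\tPASS\t.",
    ]
    test_lines = [
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "1\t100\t.\tA\tG\t40\tPASS\t.",
        "1\t300\t.\tT\tC\t40\tPASS\t.",
        "2\t50\t.\tG\tA\t40\tPASS\t.",
    ]
    print(compare_calls(read_variants(test_lines), read_variants(truth_lines)))
```

Exact matching keeps the sketch short; in practice it would usually be restricted to the exome target regions so that off-target calls do not distort the counts.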

gatk • 3.2k views
12.2 years ago

The latest 1000 Genomes release includes 'high' coverage exomes that can be downloaded in BAM format. VCF files are also available for these individuals, so this would make a good training set.

Here is the link to phase 1: ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/phase1/data/


These new exome data sets are pretty big (15-20 GB). My rough estimate is something like 4000x coverage, if that is right? Are there any more manageably sized exome data sets out there?
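
File size alone does not determine coverage, so a rough estimate like the one above is easier to check from read counts. A minimal sketch, assuming the number of mapped reads, the read length, and the size of the exome target in bases are known (all inputs and the function name `estimate_mean_coverage` are hypothetical):

```python
def estimate_mean_coverage(
    num_reads: int,
    read_length: int,
    target_bases: int,
    on_target_fraction: float = 1.0,
) -> float:
    """Approximate mean depth as sequenced on-target bases divided by target size."""
    if target_bases <= 0:
        raise ValueError("target_bases must be positive")
    if not 0.0 <= on_target_fraction <= 1.0:
        raise ValueError("on_target_fraction must be between 0 and 1")
    return num_reads * read_length * on_target_fraction / target_bases


if __name__ == "__main__":
    # Illustrative numbers only, not taken from any real data set.
    print(estimate_mean_coverage(1_000_000, 100, 50_000_000))
```

The `on_target_fraction` parameter is there because only part of the reads in an exome BAM fall on the capture targets, so leaving it at 1.0 gives an upper bound.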
