Need help finding data from Google's DeepVariant paper
4.2 years ago
Tails ▴ 80

Apologies if this is not the usual type of question posted here.

I'm a postgraduate student trying to understand Google's DeepVariant paper. As part of that, I'm trying to run VQSR to see how it compares with DeepVariant. The authors have kindly provided their GATK code:

https://static-content.springer.com/esm/art%3A10.1038%2Fnbt.4235/MediaObjects/41587_2018_BFnbt4235_MOESM84_ESM.txt

I see the output of the variant calling step is a file called gatk3-raw.vcf.gz

Does anyone know where I might be able to find this file? The paper says "All data used in this manuscript is publicly available from Genome in a Bottle or the Mouse Genome project, with the exception of 35 NA12878 WGS replicates from the Verily sequencing laboratory, which were licensed from Verily for the current study and are not publicly available."

I went to the GIAB website and couldn't easily find it.

If that file isn't available, I can recreate it by running HaplotypeCaller myself. But then I run into another problem: I can't find the input BAM file either. All I know is that they refer to it as "HG002_NIST_150bp_50x.bam". I'm not familiar with these abbreviations, so if anyone can guess where I might find this dataset, I would be very grateful!

genome snp sequencing • 805 views

Most of the data from the Genome in a Bottle project is available from this FTP page. You may need to dig through it to find the BAM file; otherwise, the FASTQ files should also be there. The page referenced above is supposed to be replaced by this link at some point.
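
If clicking through the directories by hand gets tedious, something like the following might help. It is only a rough sketch using Python's standard `ftplib`: the helper names `matches`, `is_dir` and `find_files` are made up for illustration, and the host and starting directory are placeholders you would need to fill in from the FTP page yourself. It assumes the server allows anonymous login and that you pass an absolute starting path.

```python
import fnmatch
from ftplib import FTP, error_perm


def matches(name, patterns):
    """Return True if a file name matches any shell-style pattern (case-insensitive)."""
    lowered = name.lower()
    return any(fnmatch.fnmatch(lowered, p.lower()) for p in patterns)


def is_dir(ftp, path):
    """Treat a path as a directory if the server lets us change into it."""
    try:
        ftp.cwd(path)
        return True
    except error_perm:
        return False


def find_files(ftp, start_dir, patterns, max_depth=4):
    """Walk an FTP tree from an absolute start_dir and yield matching file paths."""
    stack = [(start_dir, 0)]
    while stack:
        directory, depth = stack.pop()
        try:
            entries = ftp.nlst(directory)
        except error_perm:
            # Some servers answer with an error for empty or unreadable directories.
            continue
        for entry in entries:
            # Servers return either bare names or full paths; handle both.
            name = entry.rsplit("/", 1)[-1]
            path = entry if "/" in entry else f"{directory.rstrip('/')}/{entry}"
            if matches(name, patterns):
                yield path
            elif depth < max_depth and is_dir(ftp, path):
                stack.append((path, depth + 1))


# Example usage (host and directory are hypothetical placeholders):
#
#     ftp = FTP("ftp.example.org")
#     ftp.login()  # anonymous login
#     for path in find_files(ftp, "/some/start/dir", ["*50x*.bam", "*.fastq.gz"]):
#         print(path)
#     ftp.quit()
```

The depth limit is there because these trees can be large, and checking each entry with `cwd` costs one round trip per file, so it is worth starting as deep in the tree as you can.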


Thanks for that. I looked and couldn't find the 50x BAM file, although there were other BAM files at 300x. Is it possible that only the most up-to-date versions are kept there? Are there archived versions anywhere? By the way, the second link doesn't work.


Try the second link now.
