Question: Best Human Genome Reference File for GATK?
6.6 years ago by
newDNASeqer670
United States
newDNASeqer670 wrote:

On the GATK website (http://gatkforums.broadinstitute.org/discussion/1204/what-input-files-does-the-gatk-accept) and their public FTP server, I found a few different human genome references in the resource bundle: b37, b36, hg18, and hg19. Which one should I use for exome-sequencing data analysis?

In my pipeline I started with BWA-MEM and the hg19 reference. Should I stay with that same reference for GATK? It seems the Broad Institute people recommend b37, since they say hg18, b36, etc. were lifted over from b37, so I'm confused here. Thanks.

gatk reference bwa • 8.0k views

My practical advice would be to perform your BWA-MEM alignment using b37.

modified 4 months ago by RamRS • written 6.6 years ago by Matt Shirley
6.6 years ago by
Matt Shirley9.2k
Cambridge, MA
Matt Shirley9.2k wrote:

Yes, you do seem a bit confused. I find it's best to take a look at this FAQ from 1000 Genomes, which describes the reference they align to:

This GRCh37-derived alignment set includes chromosomal plus unlocalized and unplaced contigs, the rCRS mitochondrial sequence (AC:NC_012920), Human herpesvirus 4 type 1 (AC:NC_007605) and decoy sequence derived from HuRef, Human Bac and Fosmid clones and NA12878.

So, it's derived from GRCh37, just as UCSC hg19 is, but contains a different mitochondrial sequence, a herpesvirus, and some other unplaced contigs and sequences. The most practical difference is that the contigs are named 17 instead of chr17, using human chromosome 17 as an example.
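To make the naming difference concrete, here is a minimal Python sketch that guesses which convention a list of contig names follows. The function name `naming_style` is hypothetical, and in practice the names would come from a FASTA index (.fai) or a BAM header rather than a hand-written list.

```python
def naming_style(contigs):
    """Guess whether contig names look hg19-like ("chr17") or b37-like ("17")."""
    if not contigs:
        return "unknown"
    # One flag per contig: does the name carry the UCSC-style "chr" prefix?
    prefixed = [name.startswith("chr") for name in contigs]
    if all(prefixed):
        return "chr-prefixed"   # e.g. chr1, chr17, chrM
    if not any(prefixed):
        return "unprefixed"     # e.g. 1, 17, MT
    return "mixed"              # two conventions in one list: something is off
```

If the reference and the alignment come back with different styles, that is a strong hint they were not built from the same reference file.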

And finally, yes, once you have generated an alignment to a specific reference genome you need to use the same reference genome in all of your downstream analyses. There is no "switching" for any good reason that I can think of. It's like asking if you should use a French dictionary to decode an English text.


And note that the GATK contig/chromosome ordering is important for downstream processing, so pick a reference and stick with it throughout.
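One way to check the ordering point is to compare the contig lists of two FASTA indexes. The sketch below assumes the standard tab-separated .fai layout (name, then length, as the first two columns); the helper names `parse_fai` and `first_mismatch` and the toy lengths are made up for illustration.

```python
def parse_fai(text):
    """Return [(name, length), ...] in file order from the text of a .fai index."""
    contigs = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        contigs.append((fields[0], int(fields[1])))
    return contigs


def first_mismatch(ref_a, ref_b):
    """Index of the first position where name or length differs, or None if identical."""
    for i, (a, b) in enumerate(zip(ref_a, ref_b)):
        if a != b:
            return i
    if len(ref_a) != len(ref_b):
        # One list is a strict prefix of the other.
        return min(len(ref_a), len(ref_b))
    return None


# Toy example (lengths are invented): same contigs, different order.
karyotypic = parse_fai("1\t1000\t0\t60\t61\n2\t900\t0\t60\t61\n10\t500\t0\t60\t61\n")
lexicographic = sorted(karyotypic, key=lambda c: c[0])
print(first_mismatch(karyotypic, lexicographic))
```

A result of `None` means the two indexes list the same contigs with the same lengths in the same order; anything else points at the first place they diverge.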

written 6.6 years ago by Sean Davis

Yes, I had just gone back to make this point in an edit!

written 6.6 years ago by Matt Shirley

I switched references once in the GATK pipeline, and it was like walking on hot coals every step.

modified 6.6 years ago • written 6.6 years ago by Zev.Kronenberg