Question: dbSNP VCF Data Corresponding to the hg19/GRCh37 Assembly
Khader Shameer (Manhattan, NY), 8.4 years ago, wrote:

I aligned my exome data to the hg19 release (downloaded from here as hg19.2bit). I also included the chrUn and random sequences in my reference, following the discussion here. I followed the steps suggested by David and lh3 here and have now generated my VCF file with GATK UnifiedGenotyper (step 10). My current output file has no dbSNP IDs.

I then tried the following command, which supplies the dbSNP VCF file:

java -jar /software/GenomeAnalysisTK.jar -R /data/hg19/hg19.fa -T UnifiedGenotyper -I FOO.bam -B:dbsnp,VCF /data/dbsnp132/00-All.vcf -o Foo_raw.vcf

This gives the following error:

ERROR MESSAGE: Input files reads and reference have incompatible contigs: Order of contigs differences, which is unsafe.

The cause is clear: the CHROM field of the VCF file uses the names 1, 2, ..., 22, X, Y, M and PAR, whereas my reference genome uses chr1, chr2, ..., chr22, chrX, chrY, chrM, followed by the chrUn and random contigs. I can modify the FASTA headers to fix the naming for chromosomes 1 to 22, X, Y and M.
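The renaming can also be done in the opposite direction, on the VCF instead of the FASTA headers. A minimal Python sketch, assuming the VCF only needs a "chr" prefix on its CHROM values and `##contig` header IDs; the `SPECIAL_NAMES` mapping (e.g. MT to chrM) and all function names are hypothetical, so adjust them to the names your files actually use. This fixes names only: GATK also checks contig order, and contigs such as PAR that have no counterpart in the reference are left as they are.

```python
import gzip
import re

# Hypothetical special cases: names that differ by more than a "chr" prefix.
# Adjust to whatever the VCF and reference actually use.
SPECIAL_NAMES = {"MT": "chrM"}

# Matches the ID of a "##contig=<ID=...,...>" meta-information line.
CONTIG_HEADER = re.compile(r"^(##contig=<ID=)([^,>]+)")


def ucsc_name(contig):
    """Map an NCBI/Ensembl-style name (1, X, MT) to UCSC style (chr1, chrX, chrM)."""
    if contig in SPECIAL_NAMES:
        return SPECIAL_NAMES[contig]
    if contig.startswith("chr"):
        return contig  # already UCSC style
    return "chr" + contig


def rename_vcf_line(line):
    """Return one VCF line with its contig name converted to UCSC style."""
    if line.startswith("##"):
        # Only ##contig lines carry contig names; other meta lines pass through.
        return CONTIG_HEADER.sub(lambda m: m.group(1) + ucsc_name(m.group(2)), line)
    if line.startswith("#"):
        return line  # the #CHROM column header line
    chrom, sep, rest = line.partition("\t")
    return ucsc_name(chrom) + sep + rest


def rename_vcf(src_path, dst_path):
    """Stream a (optionally gzipped) VCF to an uncompressed, renamed copy."""
    opener = gzip.open if src_path.endswith(".gz") else open
    with opener(src_path, "rt") as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(rename_vcf_line(line))
```

For example, `rename_vcf("00-All.vcf.gz", "00-All.chr.vcf")` would write an uncompressed copy with UCSC-style names. The file is streamed line by line, so the full dbSNP VCF never has to fit in memory.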

Before redoing the alignment from the beginning, I would like to know whether using this VCF file with the chrUn and random contigs in my reference genome may cause further errors. Also, my reference genome does not have the PAR contig that appears in the VCF file.

Do you have any suggestions on how to deal with the extra contigs in hg19 that are not in the dbSNP VCF file, and with the PAR entries in the dbSNP VCF file that are not in my reference genome?
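Before realigning, it may help to list exactly which contigs appear in one file but not the other. A minimal Python sketch, assuming a samtools faidx index (`.fai`) exists for the reference (its first tab-separated column is the contig name) and reading the CHROM column of an optionally gzipped VCF; the function names are hypothetical.

```python
import gzip


def fasta_contigs(fai_path):
    """Contig names, in file order, from a samtools faidx index (.fai)."""
    with open(fai_path) as fai:
        return [line.split("\t", 1)[0] for line in fai if line.strip()]


def vcf_contigs(vcf_path):
    """Contig names, in order of first appearance, from a VCF's CHROM column."""
    opener = gzip.open if vcf_path.endswith(".gz") else open
    seen = {}  # dict keeps insertion order (Python 3.7+)
    with opener(vcf_path, "rt") as vcf:
        for line in vcf:
            if line.startswith("#"):
                continue  # skip meta lines and the #CHROM header
            seen.setdefault(line.split("\t", 1)[0], None)
    return list(seen)


def compare_contigs(reference, variants):
    """Contigs present in only one of two ordered name lists."""
    ref_set, var_set = set(reference), set(variants)
    return {
        "only_in_reference": [c for c in reference if c not in var_set],
        "only_in_vcf": [c for c in variants if c not in ref_set],
    }
```

Scanning the whole dbSNP VCF this way is slow, but it needs very little memory. With the files from the question, chrUn and random contigs would be expected under `only_in_reference` and PAR under `only_in_vcf`.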

Is there an alternative hg19/GRCh37 assembly with a matching dbSNP 132 VCF file that I can use for my exome analysis?

Brad Chapman (Boston, MA), 8.4 years ago, wrote:

GATK requires contig names and ordering to be consistent between the reference and all input files; the GATK FAQ discusses this. My recommendation is to use the Broad reference genome for alignments:

ftp://ftp.broadinstitute.org/pub/seq/references/Homo_sapiens_assembly19.fasta

which is compatible with dbSNP:

ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606/VCF/v4.0/00-All.vcf.gz

and will get you through the process cleanly. I believe you can also include the random and other chromosomes as long as they are after the karyotypically sorted reference chromosomes.
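Both requirements (consistent order, extra contigs after the main chromosomes) can be checked with a short Python sketch. `order_conflicts` reports adjacent pairs of shared contigs that the VCF lists in a different relative order than the reference, and `karyotypic_order` puts chr1 to chr22, chrX, chrY and chrM first, followed by every other contig in its original order. Both helpers are hypothetical, and placing chrM after chrY is an assumption; edit `MAIN_CHROMOSOMES` to match your reference.

```python
# Assumed main-chromosome order; edit to match the reference actually used.
MAIN_CHROMOSOMES = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY", "chrM"]


def order_conflicts(reference, variants):
    """Adjacent shared contigs (a, b) listed a-then-b in the VCF but b-then-a in the reference."""
    rank = {name: i for i, name in enumerate(reference)}
    shared = [c for c in variants if c in rank]  # ignore contigs missing from the reference
    # The shared contigs are consistently ordered iff no adjacent pair is inverted.
    return [(a, b) for a, b in zip(shared, shared[1:]) if rank[a] > rank[b]]


def karyotypic_order(contigs):
    """Main chromosomes in karyotypic order, then all other contigs in input order."""
    main_rank = {name: i for i, name in enumerate(MAIN_CHROMOSOMES)}
    main = sorted((c for c in contigs if c in main_rank), key=main_rank.get)
    rest = [c for c in contigs if c not in main_rank]
    return main + rest
```

Here `reference` and `variants` are lists of contig names in file order. An empty result from `order_conflicts` means every contig the two files share appears in the same relative order, while contigs present in only one file (PAR, for example) are ignored by this check.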


Lee Baker, 6.3 years ago, replied:

The NIH link appears to be broken.

Zhaorong, 5.6 years ago, replied:

The current link is: ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606/VCF/00-All.vcf.gz

Khader Shameer, 8.4 years ago, replied:

Thanks a lot, Brad. I didn't know about this hg19 assembly from the Broad.