dbSNP VCF Data Corresponding to hg19/GRCh37 Assembly
11.6 years ago

I have done my exome alignment using the hg19 release (data downloaded from here, hg19.2bit). I also included the chrUn and random contigs in my reference sequence, following the discussion here. I followed the steps suggested by David and lh3 here and have now generated my VCF file using GATK UnifiedGenotyper (step 10). My current output file does not contain dbSNP IDs.

I then reran this step, supplying the dbSNP file in VCF format:

java -jar /software/GenomeAnalysisTK.jar -R /data/hg19/hg19.fa -T UnifiedGenotyper -I FOO.bam -B:dbsnp,VCF /data/dbsnp132/00-All.vcf -o Foo_raw.vcf


This gives me the following error:

ERROR MESSAGE: Input files reads and reference have incompatible contigs: Order of contigs differences, which is unsafe.

The cause is clear: the CHROM field of the VCF file uses the names 1, 2 ... 22, X, Y, M, PAR, whereas my reference genome uses chr1, chr2 ... chr22, chrX, chrY, chrM, followed by the chrUn and random contigs. I can modify the FASTA headers to fix this for the numbered chromosomes, X, Y and M.
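The naming mismatch described above can also be approached from the VCF side instead of editing the FASTA headers. The following is only a minimal sketch, not a GATK feature: `to_ucsc_name` and `rename_vcf_contigs` are hypothetical helper names, the handling of `M`/`MT` and the decision to drop `PAR` records are assumptions, and `##contig` header lines (if present) are not rewritten.

```python
def to_ucsc_name(chrom):
    """Map a dbSNP-style contig name (1, 2, ..., X, Y, M/MT) to UCSC style (chr1, ...)."""
    if chrom.startswith("chr"):
        return chrom  # already UCSC style
    if chrom in ("M", "MT"):
        return "chrM"  # assumption: both spellings refer to the mitochondrial contig
    return "chr" + chrom


def rename_vcf_contigs(lines, skip=("PAR",)):
    """Yield VCF lines with the CHROM column renamed.

    Header lines (starting with '#') pass through unchanged.
    Records whose CHROM is in `skip` are dropped, because the
    reference has no matching contig (assumption: dropping is acceptable).
    """
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        chrom, sep, rest = line.partition("\t")
        if chrom in skip:
            continue
        yield to_ucsc_name(chrom) + sep + rest


if __name__ == "__main__":
    import sys
    # Usage (hypothetical file names): python rename_contigs.py < 00-All.vcf > 00-All.chr.vcf
    sys.stdout.writelines(rename_vcf_contigs(sys.stdin))
```

Renaming only changes labels. It does not reorder records, so the contig order must still agree with the reference. The mitochondrial sequences of UCSC hg19 and GRCh37 are also reported to differ, so positions on chrM may not line up even after renaming.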

Before redoing my alignment from the beginning, I would like to know whether having chrUn and random contigs in my reference genome, but not in the VCF file, may cause further errors. Also, my reference genome doesn't have PAR, which is in the VCF file.

Do you have any suggestions on how to deal with the additional hg19 contigs that are not in the dbSNP VCF file, and with PAR, which is in the dbSNP VCF file but not in my reference genome?

Is there an alternative hg19/GRCh37 assembly with a corresponding dbSNP 132 file in VCF format that I can use for my exome analysis?

11.6 years ago

GATK requires consistency in the reference ordering and names; their FAQ has some discussion on this. My recommendation is to use the Broad reference genome for alignments:

which is compatible with dbSNP:

ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606/VCF/v4.0/00-All.vcf.gz

and will get you through the process cleanly. I believe you can also include the random and other chromosomes as long as they are after the karyotypically sorted reference chromosomes.
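Before rerunning GATK, it may help to compare contig names and order yourself. This is a rough sketch rather than GATK's own validation: `check_contig_order` is a hypothetical helper. It assumes you have already extracted the reference contig names in order (for example, from the first column of a `.fai` index) and the order in which contigs first appear in the VCF.

```python
def check_contig_order(reference_contigs, vcf_contigs):
    """Compare VCF contig names and order against the reference.

    Returns (missing, in_order):
      missing  - VCF contigs that have no contig of the same name in the reference
      in_order - True if the shared contigs appear in the same relative
                 order in the VCF as in the reference
    """
    index = {name: i for i, name in enumerate(reference_contigs)}
    missing = [c for c in vcf_contigs if c not in index]
    positions = [index[c] for c in vcf_contigs if c in index]
    in_order = all(a < b for a, b in zip(positions, positions[1:]))
    return missing, in_order


if __name__ == "__main__":
    # Illustrative contig lists only; real lists would come from your own files.
    reference = ["chr1", "chr2", "chrX", "chrY", "chrM", "chrUn_example"]
    print(check_contig_order(reference, ["chr1", "chr2", "chrX"]))
    print(check_contig_order(reference, ["chr1", "chrX", "chr2"]))
    print(check_contig_order(reference, ["1", "2", "PAR"]))
```

Extra contigs that exist only in the reference, such as chrUn and random contigs placed after the main chromosomes, do not affect either result, which is in line with the suggestion above.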


The NIH link appears to be broken.
