Question

convert hg38 variant calls to hg19

0

7.3 years ago

bioguy24 ▴ 230

I am looking for a tool that can convert variants called against hg38 to hg19 coordinates. The variants below (~370 in total) throw an error because I am using hg19 but they were aligned to hg38, so the reference bases may differ at those positions.

chr4    70501545    rs28560191  C   A   UGT2A1;UGT2A2
chr5    112385005   rs7726162   A   C   MCC
chr7    2578238 rs62907961  C   T   BRAT1

Thank you :).

ngs • 4.5k views

updated 7.3 years ago by Pierre Lindenbaum 161k • written 7.3 years ago by bioguy24 ▴ 230

score 3 · Answer 1 · 2017-01-07

3

7.3 years ago

Pierre Lindenbaum 161k

Picard LiftoverVcf: https://broadinstitute.github.io/picard/command-line-overview.html#LiftoverVcf

7.3 years ago by Pierre Lindenbaum 161k

0

The tool seems to work using the command:

java -jar /home/cmccabe/Desktop/NGS/picard-tools-1.140/picard.jar LiftoverVcf \
 I=/home/cmccabe/Desktop/tvc/IDP.vcf \
 O=/home/cmccabe/Desktop/tvc/IDP.lifted_over.vcf \
 CHAIN=/home/cmccabe/Desktop/NGS/picard-tools-1.140/resources/hg38ToHg19.over.chain \
 REJECT=/home/cmccabe/Desktop/tvc/IDP_rejected_variants.vcf \
 R=/home/cmccabe/Desktop/NGS/picard-tools-1.140/resources/ucsc.hg19.fasta

However, it throws an error:

Exception in thread "main" htsjdk.tribble.TribbleException$MalformedFeatureFile: Error parsing line at byte position: htsjdk.tribble.readers.LineIteratorImpl@100fc185, for input source: /home/cmccabe/Desktop/tvc/IDP.vcf

The input VCF looks like:

##fileformat=VCFv4.1
##fileDate=20130610
##source=ensembl;version=75;url=http://e75.ensembl.org/homo_sapiens
##reference=ftp://ftp.ensembl.org/pub/release-75/fasta/homo_sapiens/dna/
##INFO=<ID=TSA,Number=0,Type=String,Description="Type of sequence alteration. Child of term sequence_alteration as defined by the sequence ontology project.">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">
##INFO=<ID=E_MO,Number=0,Type=Flag,Description="Multiple_observations.http://www.ensembl.org/info/docs/variation/data_description.html#evidence_status">
##INFO=<ID=E_1000G,Number=0,Type=Flag,Description="1000Genomes.http://www.ensembl.org/info/docs/variation/data_description.html#evidence_status">
##INFO=<ID=E_HM,Number=0,Type=Flag,Description="HapMap.http://www.ensembl.org/info/docs/variation/data_description.html#evidence_status">
##INFO=<ID=E_Freq,Number=0,Type=Flag,Description="Frequency.http://www.ensembl.org/info/docs/variation/data_description.html#evidence_status">
##INFO=<ID=E_C,Number=0,Type=Flag,Description="Cited.http://www.ensembl.org/info/docs/variation/data_description.html#evidence_status">
##INFO=<ID=dbSNP_137,Number=0,Type=Flag,Description="Variants (including SNPs and indels) imported from dbSNP">
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
chr4    70501545    rs28560191  C   A   UGT2A1;UGT2A2   .   .   .
chr5    112385005   rs7726162   A   C   MCC .   .   .
chr7    2578238 rs62907961  C   T   BRAT1   .   .   .

Any suggestions on the vcf format? Thank you :).
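Reading the rows above as tab-separated, each one has nine fields under an eight-column header, and the gene name sits where QUAL is expected. Whether that is exactly what trips the parser is an assumption, but a quick layout check can point at suspicious rows before running Picard. The sketch below is a hypothetical helper (`check_vcf_rows` is not part of Picard or htsjdk) that only compares field counts and tests that QUAL is `.` or a number:

```python
def check_vcf_rows(lines):
    """Return (line_number, problem) tuples for rows that break basic VCF layout."""
    problems = []
    n_cols = None  # number of columns declared on the #CHROM line
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            n_cols = len(fields)
            continue
        if n_cols is not None and len(fields) != n_cols:
            problems.append((lineno, f"{len(fields)} fields, header has {n_cols}"))
        # QUAL (6th column) must be '.' or a number
        if len(fields) > 5 and fields[5] != ".":
            try:
                float(fields[5])
            except ValueError:
                problems.append((lineno, f"QUAL is not numeric: {fields[5]!r}"))
    return problems


# Two illustrative rows: one shaped like the rows above, one well-formed.
example = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr4\t70501545\trs28560191\tC\tA\tUGT2A1;UGT2A2\t.\t.\t.",
    "chr1\t11107061\trs77977199\tT\tG\t.\t.\tTSA=SNV",
]
for lineno, problem in check_vcf_rows(example):
    print(lineno, problem)
```

On this example only line 3 is reported, once for the field count and once for the non-numeric QUAL. To check a real file, pass an open file handle instead of the list.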

7.3 years ago by bioguy24 ▴ 230

0

Did you check your Java version?

7.3 years ago by jonessara770 ▴ 240

0

java version "1.8.0_111"
Java(TM) SE Runtime Environment (build 1.8.0_111-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.111-b14, mixed mode)

OS: Ubuntu 14.04

Thank you :).

7.3 years ago by bioguy24 ▴ 230

0

Just putting the gene in the INFO column is not valid VCF. It should be something like

GENE=UGT2A1|UGT2A2

with a matching ##INFO header line.
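One way to apply that suggestion is to rewrite each row so the gene becomes a `GENE=` entry in INFO and to declare the key in the header. The sketch below is a minimal illustration, not a tested conversion tool. It assumes the gene name is the 6th tab-separated field (as in the rows posted earlier), that any trailing `.` fields can be dropped, and that `Number=1,Type=String` is an acceptable declaration for a single pipe-joined value. The `move_gene_to_info` name and the header description text are made up for this example.

```python
# Hypothetical header line declaring the GENE key (description text is illustrative).
GENE_HEADER = '##INFO=<ID=GENE,Number=1,Type=String,Description="Gene symbol(s), pipe-separated">'


def move_gene_to_info(lines):
    """Rewrite rows with a gene name in column 6 into 8-column VCF rows."""
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#CHROM"):
            out.append(GENE_HEADER)  # declare GENE just before the column header
            out.append(line)
        elif line.startswith("#"):
            out.append(line)
        else:
            chrom, pos, vid, ref, alt, gene = line.split("\t")[:6]
            # ';' separates INFO entries, so join multiple genes with '|'
            genes = gene.replace(";", "|")
            out.append("\t".join([chrom, pos, vid, ref, alt, ".", ".", f"GENE={genes}"]))
    return out


rows = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr4\t70501545\trs28560191\tC\tA\tUGT2A1;UGT2A2\t.\t.\t.",
]
print("\n".join(move_gene_to_info(rows)))
```

QUAL and FILTER are set to `.` (missing) because the original rows carry no such values.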

7.3 years ago by Pierre Lindenbaum 161k

0

I have changed the VCF to the following (just filtering the original VCF to the regions of interest):

##fileformat=VCFv4.1
##fileDate=20130610
##source=ensembl;version=75;url=http://e75.ensembl.org/homo_sapiens
##reference=ftp://ftp.ensembl.org/pub/release-75/fasta/homo_sapiens/dna/
##INFO=<ID=TSA,Number=0,Type=String,Description="Type of sequence alteration. Child of term sequence_alteration as defined by the sequence ontology project.">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">
##INFO=<ID=E_MO,Number=0,Type=Flag,Description="Multiple_observations.http://www.ensembl.org/info/docs/variation/data_description.html#evidence_status">
##INFO=<ID=E_1000G,Number=0,Type=Flag,Description="1000Genomes.http://www.ensembl.org/info/docs/variation/data_description.html#evidence_status">
##INFO=<ID=E_HM,Number=0,Type=Flag,Description="HapMap.http://www.ensembl.org/info/docs/variation/data_description.html#evidence_status">
##INFO=<ID=E_Freq,Number=0,Type=Flag,Description="Frequency.http://www.ensembl.org/info/docs/variation/data_description.html#evidence_status">
##INFO=<ID=E_C,Number=0,Type=Flag,Description="Cited.http://www.ensembl.org/info/docs/variation/data_description.html#evidence_status">
##INFO=<ID=dbSNP_137,Number=0,Type=Flag,Description="Variants (including SNPs and indels) imported from dbSNP">
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  Individual
chr1    11082610    rs11689432  G   A   .   .   TSA=SNV;E_MO;E_Freq;E_HM;E_C;CS=pathogenic;dbSNP_137    GT  1/0
chr1    11107061    rs77977199  T   G   .   .   TSA=SNV;E_Freq;dbSNP_137    GT  1/0
chr1    17380507    rs11203289  G   C   .   .   TSA=SNV;E_MO;E_Freq;E_1000G;CS=pathogenic;dbSNP_137 GT  1/1

Then I run Picard LiftoverVcf and get:

Exception in thread "main" java.lang.IllegalStateException: Key CS found in VariantContext field INFO at chr1:11082610 but this key isn't defined in the VCFHeader.  We require all VCFs to have complete VCF headers by default.
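The message says a key used in a record (here `CS`) has no matching declaration in the header. Rather than fixing keys one error at a time, it can help to list every undeclared key up front. The following is a rough sketch, assuming a tab-separated file with INFO in column 8 and FORMAT in column 9. `undeclared_keys` is a hypothetical helper, not a Picard or htsjdk function, and the header descriptions in the example are abbreviated.

```python
import re

# Matches '##INFO=<ID=XYZ' or '##FORMAT=<ID=XYZ' and captures the kind and the ID.
HEADER_ID = re.compile(r"^##(INFO|FORMAT)=<ID=([^,>]+)")


def undeclared_keys(lines):
    """Return INFO and FORMAT keys used in records but missing from the header."""
    declared = {"INFO": set(), "FORMAT": set()}
    missing = {"INFO": set(), "FORMAT": set()}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        m = HEADER_ID.match(line)
        if m:
            declared[m.group(1)].add(m.group(2))
            continue
        if line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) > 7 and fields[7] != ".":
            for item in fields[7].split(";"):
                key = item.split("=", 1)[0]  # flags have no '=value' part
                if key and key not in declared["INFO"]:
                    missing["INFO"].add(key)
        if len(fields) > 8:
            for key in fields[8].split(":"):
                if key not in declared["FORMAT"]:
                    missing["FORMAT"].add(key)
    return {kind: sorted(keys) for kind, keys in missing.items()}


vcf_lines = [
    "##fileformat=VCFv4.1",
    '##INFO=<ID=TSA,Number=0,Type=String,Description="Type of sequence alteration.">',
    '##INFO=<ID=E_MO,Number=0,Type=Flag,Description="Multiple_observations">',
    '##INFO=<ID=dbSNP_137,Number=0,Type=Flag,Description="Imported from dbSNP">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tIndividual",
    "chr1\t11082610\trs11689432\tG\tA\t.\t.\tTSA=SNV;E_MO;CS=pathogenic;dbSNP_137\tGT\t1/0",
]
print(undeclared_keys(vcf_lines))
```

On this abbreviated example, `CS` is reported as an undeclared INFO key and `GT` as an undeclared FORMAT key. INFO and FORMAT declarations are tracked separately, so a key declared under one does not count for the other.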

Thank you :).

7.3 years ago by bioguy24 ▴ 230

0

I guess I don't understand the error. I added the lines in bold to the ##INFO header:

##fileformat=VCFv4.1
##fileDate=20130610
##source=ensembl;version=75;url=http://e75.ensembl.org/homo_sapiens
##reference=ftp://ftp.ensembl.org/pub/release-75/fasta/homo_sapiens/dna/
##INFO=<ID=TSA,Number=0,Type=String,Description="Type of sequence alteration. Child of term sequence_alteration as defined by the sequence ontology project.">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">
**##INFO=<ID=CS,Number=0,Type=String,Description="Classification">**
##INFO=<ID=E_MO,Number=0,Type=Flag,Description="Multiple_observations.http://www.ensembl.org/info/docs/variation/data_description.html#evidence_status">
##INFO=<ID=E_1000G,Number=0,Type=Flag,Description="1000Genomes.http://www.ensembl.org/info/docs/variation/data_description.html#evidence_status">
##INFO=<ID=E_HM,Number=0,Type=Flag,Description="HapMap.http://www.ensembl.org/info/docs/variation/data_description.html#evidence_status">
##INFO=<ID=E_Freq,Number=0,Type=Flag,Description="Frequency.http://www.ensembl.org/info/docs/variation/data_description.html#evidence_status">
##INFO=<ID=E_C,Number=0,Type=Flag,Description="Cited.http://www.ensembl.org/info/docs/variation/data_description.html#evidence_status">
##INFO=<ID=dbSNP_137,Number=0,Type=Flag,Description="Variants (including SNPs and indels) imported from dbSNP">
**##INFO=<ID=GT,Number=0,Type=String,Description="Tag.">**
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  Individual
chr1    11082610    rs11689432  G   A   .   .   TSA=SNV;E_MO;E_Freq;E_HM;E_C;CS=pathogenic;dbSNP_137    GT  1/0
chr1    11107061    rs77977199  T   G   .   .   TSA=SNV;E_Freq;dbSNP_137    GT  1/0
chr1    17380507    rs11203289  G   C   .   .   TSA=SNV;E_MO;E_Freq;E_1000G;CS=pathogenic;dbSNP_137 GT  1/1

Now I get:

Key GT found in VariantContext field FORMAT at chrX:153247722 but this key isn't defined in the VCFHeader.  We require all VCFs to have complete VCF headers by default.
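Note that this message names the FORMAT field, while the bolded line declares `GT` under `##INFO`. In the VCF specification, genotype is a FORMAT key, declared as `##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">`. Also, `Number=0` is meant for `Flag` keys, so `Number=1` would be the more consistent choice for a key that carries one value, such as `CS=pathogenic`. Whether Picard then accepts the file is not verified here. The sketch below shows one way to insert missing declarations just before the `#CHROM` line; `add_header_lines` and `header_key` are hypothetical helper names.

```python
GT_FORMAT = '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">'


def header_key(line):
    """Map '##FORMAT=<ID=GT,...>' to 'FORMAT:GT' so INFO and FORMAT stay distinct."""
    kind, _, rest = line[2:].partition("=<ID=")
    return f"{kind}:{rest.split(',', 1)[0].rstrip('>')}"


def add_header_lines(lines, new_headers):
    """Insert header lines before #CHROM unless the same kind and ID already exist."""
    existing = {header_key(line) for line in lines if line.startswith("##")}
    out = []
    for line in lines:
        if line.startswith("#CHROM"):
            out.extend(h for h in new_headers if header_key(h) not in existing)
        out.append(line)
    return out


vcf = [
    "##fileformat=VCFv4.1",
    '##INFO=<ID=GT,Number=0,Type=String,Description="Tag.">',  # INFO, not FORMAT
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tIndividual",
    "chr1\t11107061\trs77977199\tT\tG\t.\t.\tTSA=SNV\tGT\t1/0",
]
for line in add_header_lines(vcf, [GT_FORMAT]):
    print(line)
```

Because the check is keyed on both kind and ID, the existing `##INFO` line for `GT` does not block the `##FORMAT` declaration from being added, and running the function twice does not duplicate it.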

Thank you :).

7.3 years ago by bioguy24 ▴ 230