Convert hg38 variant calls to hg19
Asked 7.3 years ago by bioguy24

I am looking for a tool that can convert variants aligned to hg38 to hg19. The variants below (~370 in total) throw an error because I am using hg19 and they were aligned to hg38, so the reference bases may differ at these positions.

chr4    70501545    rs28560191  C   A   UGT2A1;UGT2A2
chr5    112385005   rs7726162   A   C   MCC
chr7    2578238 rs62907961  C   T   BRAT1

Thank you :).

Tag: ngs

I tried the tool with the following command:

java -jar /home/cmccabe/Desktop/NGS/picard-tools-1.140/picard.jar LiftoverVcf \
 I=/home/cmccabe/Desktop/tvc/IDP.vcf \
 O=/home/cmccabe/Desktop/tvc/IDP.lifted_over.vcf \
 CHAIN=/home/cmccabe/Desktop/NGS/picard-tools-1.140/resources/hg38ToHg19.over.chain \
 REJECT=/home/cmccabe/Desktop/tvc/IDP_rejected_variants.vcf \
 R=/home/cmccabe/Desktop/NGS/picard-tools-1.140/resources/ucsc.hg19.fasta

However, it throws an error:

Exception in thread "main" htsjdk.tribble.TribbleException$MalformedFeatureFile: Error parsing line at byte position: htsjdk.tribble.readers.LineIteratorImpl@100fc185, for input source: /home/cmccabe/Desktop/tvc/IDP.vcf

The input VCF looks like this:

##fileformat=VCFv4.1
##fileDate=20130610
##source=ensembl;version=75;url=http://e75.ensembl.org/homo_sapiens
##reference=ftp://ftp.ensembl.org/pub/release-75/fasta/homo_sapiens/dna/
##INFO=<ID=TSA,Number=0,Type=String,Description="Type of sequence alteration. Child of term sequence_alteration as defined by the sequence ontology project.">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">
##INFO=<ID=E_MO,Number=0,Type=Flag,Description="Multiple_observations.http://www.ensembl.org/info/docs/variation/data_description.html#evidence_status">
##INFO=<ID=E_1000G,Number=0,Type=Flag,Description="1000Genomes.http://www.ensembl.org/info/docs/variation/data_description.html#evidence_status">
##INFO=<ID=E_HM,Number=0,Type=Flag,Description="HapMap.http://www.ensembl.org/info/docs/variation/data_description.html#evidence_status">
##INFO=<ID=E_Freq,Number=0,Type=Flag,Description="Frequency.http://www.ensembl.org/info/docs/variation/data_description.html#evidence_status">
##INFO=<ID=E_C,Number=0,Type=Flag,Description="Cited.http://www.ensembl.org/info/docs/variation/data_description.html#evidence_status">
##INFO=<ID=dbSNP_137,Number=0,Type=Flag,Description="Variants (including SNPs and indels) imported from dbSNP">
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
chr4    70501545    rs28560191  C   A   UGT2A1;UGT2A2   .   .   .
chr5    112385005   rs7726162   A   C   MCC .   .   .
chr7    2578238 rs62907961  C   T   BRAT1   .   .   .

Any suggestions on fixing the VCF format? Thank you :).
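One likely cause of the `MalformedFeatureFile` error is visible in the records above: the gene symbols sit in column 6, which VCF reserves for QUAL, so a parser expecting a number or `.` there fails. A minimal sketch of a check for this, assuming a tab-delimited file; `check_vcf_columns` is a hypothetical helper name, not part of Picard or htsjdk:

```shell
check_vcf_columns() {
  # Reads a VCF on stdin and reports data lines whose fixed columns look wrong:
  # fewer than 8 tab-separated fields, or a QUAL value (column 6) that is
  # neither "." nor a plain decimal number.
  awk -F'\t' '
    /^#/ { next }
    NF < 8 {
      printf "line %d: only %d tab-separated columns\n", NR, NF
      next
    }
    $6 != "." && $6 !~ /^[0-9]+(\.[0-9]+)?([eE][-+]?[0-9]+)?$/ {
      printf "line %d: QUAL is \"%s\"\n", NR, $6
    }'
}

# Example: check_vcf_columns < IDP.vcf
```

The reported line numbers count header lines too, so they match the file. The forum copy of the records shows spaces between columns; if the real file is space-separated rather than tab-separated, the column-count check reports that as well.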


Did you check your Java version?


java version "1.8.0_111"
Java(TM) SE Runtime Environment (build 1.8.0_111-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.111-b14, mixed mode)

OS: Ubuntu 14.04

Thank you :).


Just putting the gene name in its own column is not valid VCF: in your file it lands in the QUAL column. It should go into INFO as something like

GENE=UGT2A1|UGT2A2

with a matching ##INFO header line.
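A minimal sketch of that suggestion, assuming the original list is whitespace-separated with the gene symbol(s) in the sixth column. The function name `list_to_vcf`, the `GENE` key and its description are illustrative choices, not a standard. `;` is replaced by `|` because `;` separates INFO entries:

```shell
list_to_vcf() {
  # Reads lines of "CHROM POS ID REF ALT GENE[;GENE...]" on stdin and writes
  # a minimal VCF 4.1 file to stdout, keeping the gene symbols in INFO.
  # Every INFO key used in the records is declared in the header first.
  printf '##fileformat=VCFv4.1\n'
  printf '##INFO=<ID=GENE,Number=1,Type=String,Description="Gene symbol(s), several joined with |">\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  awk 'BEGIN { OFS = "\t" }
       NF >= 6 {
         gene = $6
         gsub(/;/, "|", gene)   # ";" would split the INFO field, so use "|"
         # QUAL and FILTER are unknown here, so both are written as "."
         print $1, $2, $3, $4, $5, ".", ".", "GENE=" gene
       }'
}

# Example (variants.txt is a hypothetical input file):
# list_to_vcf < variants.txt > variants.vcf
```

The output has only the eight fixed columns, so no `##FORMAT` lines are needed.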


I have changed the VCF to the following (just filtering the original VCF to the regions of interest):

##fileformat=VCFv4.1
##fileDate=20130610
##source=ensembl;version=75;url=http://e75.ensembl.org/homo_sapiens
##reference=ftp://ftp.ensembl.org/pub/release-75/fasta/homo_sapiens/dna/
##INFO=<ID=TSA,Number=0,Type=String,Description="Type of sequence alteration. Child of term sequence_alteration as defined by the sequence ontology project.">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">
##INFO=<ID=E_MO,Number=0,Type=Flag,Description="Multiple_observations.http://www.ensembl.org/info/docs/variation/data_description.html#evidence_status">
##INFO=<ID=E_1000G,Number=0,Type=Flag,Description="1000Genomes.http://www.ensembl.org/info/docs/variation/data_description.html#evidence_status">
##INFO=<ID=E_HM,Number=0,Type=Flag,Description="HapMap.http://www.ensembl.org/info/docs/variation/data_description.html#evidence_status">
##INFO=<ID=E_Freq,Number=0,Type=Flag,Description="Frequency.http://www.ensembl.org/info/docs/variation/data_description.html#evidence_status">
##INFO=<ID=E_C,Number=0,Type=Flag,Description="Cited.http://www.ensembl.org/info/docs/variation/data_description.html#evidence_status">
##INFO=<ID=dbSNP_137,Number=0,Type=Flag,Description="Variants (including SNPs and indels) imported from dbSNP">
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  Individual
chr1    11082610    rs11689432  G   A   .   .   TSA=SNV;E_MO;E_Freq;E_HM;E_C;CS=pathogenic;dbSNP_137    GT  1/0
chr1    11107061    rs77977199  T   G   .   .   TSA=SNV;E_Freq;dbSNP_137    GT  1/0
chr1    17380507    rs11203289  G   C   .   .   TSA=SNV;E_MO;E_Freq;E_1000G;CS=pathogenic;dbSNP_137 GT  1/1

Then I run Picard LiftoverVcf and get:

Exception in thread "main" java.lang.IllegalStateException: Key CS found in VariantContext field INFO at chr1:11082610 but this key isn't defined in the VCFHeader.  We require all VCFs to have complete VCF headers by default.

Thank you :).
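The error means every INFO key used in a record (here `CS`) needs an `##INFO` line in the header. A sketch that inserts such a line just before `#CHROM`; `add_header_line` is a hypothetical helper, and the description "Classification" is only a placeholder. In the VCF specification, `Number=0` describes a field with no value (a Flag), so a key that carries a value such as `CS=pathogenic` should use `Number=1`:

```shell
add_header_line() {
  # $1: one complete meta-information line, e.g. an ##INFO or ##FORMAT declaration.
  # Reads a VCF on stdin and writes it to stdout with that line inserted
  # immediately before the #CHROM column-header line.
  # The line is passed through the environment so awk does not reinterpret backslashes.
  NEW_HEADER_LINE="$1" awk '/^#CHROM/ { print ENVIRON["NEW_HEADER_LINE"] } { print }'
}

# Example (IDP.fixed.vcf is a hypothetical output name):
# add_header_line '##INFO=<ID=CS,Number=1,Type=String,Description="Classification">' \
#   < IDP.vcf > IDP.fixed.vcf
```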


I guess I don't understand the error. I added the two lines in bold to the header:

##fileformat=VCFv4.1
##fileDate=20130610
##source=ensembl;version=75;url=http://e75.ensembl.org/homo_sapiens
##reference=ftp://ftp.ensembl.org/pub/release-75/fasta/homo_sapiens/dna/
##INFO=<ID=TSA,Number=0,Type=String,Description="Type of sequence alteration. Child of term sequence_alteration as defined by the sequence ontology project.">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">
**##INFO=<ID=CS,Number=0,Type=String,Description="Classification">**
##INFO=<ID=E_MO,Number=0,Type=Flag,Description="Multiple_observations.http://www.ensembl.org/info/docs/variation/data_description.html#evidence_status">
##INFO=<ID=E_1000G,Number=0,Type=Flag,Description="1000Genomes.http://www.ensembl.org/info/docs/variation/data_description.html#evidence_status">
##INFO=<ID=E_HM,Number=0,Type=Flag,Description="HapMap.http://www.ensembl.org/info/docs/variation/data_description.html#evidence_status">
##INFO=<ID=E_Freq,Number=0,Type=Flag,Description="Frequency.http://www.ensembl.org/info/docs/variation/data_description.html#evidence_status">
##INFO=<ID=E_C,Number=0,Type=Flag,Description="Cited.http://www.ensembl.org/info/docs/variation/data_description.html#evidence_status">
##INFO=<ID=dbSNP_137,Number=0,Type=Flag,Description="Variants (including SNPs and indels) imported from dbSNP">
**##INFO=<ID=GT,Number=0,Type=String,Description="Tag.">**
#CHROM POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  Individual
chr1    11082610    rs11689432  G   A   .   .   TSA=SNV;E_MO;E_Freq;E_HM;E_C;CS=pathogenic;dbSNP_137    GT  1/0
chr1    11107061    rs77977199  T   G   .   .   TSA=SNV;E_Freq;dbSNP_137    GT  1/0
chr1    17380507    rs11203289  G   C   .   .   TSA=SNV;E_MO;E_Freq;E_1000G;CS=pathogenic;dbSNP_137 GT  1/1

Now I get:

Key GT found in VariantContext field FORMAT at chrX:153247722 but this key isn't defined in the VCFHeader.  We require all VCFs to have complete VCF headers by default.

Thank you :).
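`GT` is a FORMAT key (column 9), so it must be declared with `##FORMAT`, not `##INFO`; the standard declaration from the VCF specification is `##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">`. Rather than fixing one key per Picard run, a sketch like the following lists every INFO and FORMAT key used in the records but missing from the header, assuming a tab-delimited VCF; `undeclared_keys` is a hypothetical helper name:

```shell
undeclared_keys() {
  # Reads a tab-delimited VCF on stdin and prints "INFO<TAB>KEY" or
  # "FORMAT<TAB>KEY" once for each key used in the data lines but not
  # declared by an ##INFO or ##FORMAT header line.
  awk -F'\t' '
    /^##INFO=<ID=/ {
      id = $0; sub(/^##INFO=<ID=/, "", id); sub(/[,>].*/, "", id)
      info[id] = 1; next
    }
    /^##FORMAT=<ID=/ {
      id = $0; sub(/^##FORMAT=<ID=/, "", id); sub(/[,>].*/, "", id)
      fmt[id] = 1; next
    }
    /^#/ { next }
    {
      # Column 8 (INFO): "." or KEY[=VALUE] entries separated by ";"
      if ($8 != "" && $8 != ".") {
        n = split($8, entries, ";")
        for (i = 1; i <= n; i++) {
          key = entries[i]; sub(/=.*/, "", key)
          if (!(key in info) && !seen["INFO", key]++) print "INFO\t" key
        }
      }
      # Column 9 (FORMAT): keys separated by ":"
      if (NF >= 9 && $9 != ".") {
        n = split($9, keys, ":")
        for (i = 1; i <= n; i++) {
          key = keys[i]
          if (!(key in fmt) && !seen["FORMAT", key]++) print "FORMAT\t" key
        }
      }
    }'
}

# Example: undeclared_keys < IDP.vcf
```

Each reported key then needs its own `##INFO` or `##FORMAT` line before the `#CHROM` line.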
