Question

Lifting over from Hg37 to Hg38 coordinates doesn't work

0

Entering edit mode

4.6 years ago

Jamie Watson ▴ 10

I downloaded these csv files from the following link:

https://human.genome.dating/download/index

These csv files are comma delimited but have white spaces as well. That is each comma is followed by a white space. So I read each file using pandas and remove the white spaces using the following:

df = pd.read_csv('/Atlas/atlas.chr1.csv',skiprows = 3, sep=',',skipinitialspace=True)

And it does remove the white spaces and then I save these csv files again which I then convert to vcf using the following command:

#awk -F, '/^#/ {next;} /^VariantID/ {printf("##fileformat=VCFv4.2\n");split($0,header);for(i=5;i<=NF;i++) printf("##INFO=<ID=%s,Number=1,Type=String,Description=\"%s\">\n",$i,$i);printf("#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n");next ;} {printf("%s\t%s\t%s\t%s\t%s\t.\t.\t",$2,$3,$1,$5,$6);for(i=5;i<=NF;i++) printf("%s=%s;",header[i],$i);printf("\n");}' /Atlas/cleanAtlas/atlas.chr1.csv > /Atlas/cleanAtlas/atlas_chr1.vcf

But when I use GATK Picard tool to liftover to hg38 coordinates it gives me an error:

Exception in thread "main" java.lang.IllegalStateException: Key  found in VariantContext field INFO at chr1:10642 but this key isn't defined in the VCFHeader.  We require all VCFs to have complete VCF headers by default.

What it means is that it finds a space in INFO column which I don't specify in the header. But when I look into my file I don't see any space because I set skipinitialspace=True. And when I specify space in my header as:

##INFO=<ID= ,Number=1,Type=String,Description="empty">

Then liftover command from GATK Picard works. But then I worry that these might be messing up things and it might not be the correct way to do it. So can someone give an alternative solution or insights into what I might be doing wrong.

liftover genome • 1.4k views

ADD COMMENT • link updated 4.6 years ago by Qiongyi ▴ 180 • written 4.6 years ago by Jamie Watson ▴ 10

score 2 · Accepted Answer · 2020-12-03

Upon reading the error message, it seems to me that your VCF file doesn't contain headers. Check you VCF file. You need something like below in the beginning of your VCF file:

##contig=<ID=chrM,length=16571,assembly=hg19>
##contig=<ID=chr1,length=249250621,assembly=hg19>
##contig=<ID=chr2,length=243199373,assembly=hg19>
##contig=<ID=chr3,length=198022430,assembly=hg19>
##contig=<ID=chr4,length=191154276,assembly=hg19>
##contig=<ID=chr5,length=180915260,assembly=hg19>
##contig=<ID=chr6,length=171115067,assembly=hg19>
...