Question: Error: malformed header while uploading .vcf.gz files to Michigan imputation server
14 months ago, manprees wrote:

I have been stuck for two days on a problem with uploading data to the Michigan Imputation Server. Any help is much appreciated!

I received .bgen files from 23andMe; a single .bgen file contains all the participants and their genotype data. Following the Michigan Imputation Server guidelines, I converted the .bgen file to a .vcf file with qctool, using the command:

$ qctool -g example.bgen -og example.vcf

Then I ran the following steps (using bgzip and tabix) so that the data could be uploaded to the server:

# compress vcf to gz
bgzip -c ${1}.vcf > ${1}.vcf.gz

# make tabix index
tabix -p vcf ${1}.vcf.gz

# split into 22 separate chromosomes.

tabix -h ${1}.vcf.gz 1 > ${1}.chr01.vcf
tabix -h ${1}.vcf.gz 2 > ${1}.chr02.vcf
tabix -h ${1}.vcf.gz 3 > ${1}.chr03.vcf
tabix -h ${1}.vcf.gz 4 > ${1}.chr04.vcf
tabix -h ${1}.vcf.gz 5 > ${1}.chr05.vcf
tabix -h ${1}.vcf.gz 6 > ${1}.chr06.vcf
tabix -h ${1}.vcf.gz 7 > ${1}.chr07.vcf
tabix -h ${1}.vcf.gz 8 > ${1}.chr08.vcf
tabix -h ${1}.vcf.gz 9 > ${1}.chr09.vcf
tabix -h ${1}.vcf.gz 10 > ${1}.chr10.vcf
tabix -h ${1}.vcf.gz 11 > ${1}.chr11.vcf
tabix -h ${1}.vcf.gz 12 > ${1}.chr12.vcf
tabix -h ${1}.vcf.gz 13 > ${1}.chr13.vcf
tabix -h ${1}.vcf.gz 14 > ${1}.chr14.vcf
tabix -h ${1}.vcf.gz 15 > ${1}.chr15.vcf
tabix -h ${1}.vcf.gz 16 > ${1}.chr16.vcf
tabix -h ${1}.vcf.gz 17 > ${1}.chr17.vcf
tabix -h ${1}.vcf.gz 18 > ${1}.chr18.vcf
tabix -h ${1}.vcf.gz 19 > ${1}.chr19.vcf
tabix -h ${1}.vcf.gz 20 > ${1}.chr20.vcf
tabix -h ${1}.vcf.gz 21 > ${1}.chr21.vcf
tabix -h ${1}.vcf.gz 22 > ${1}.chr22.vcf

# create gz files for each chromosome

bgzip -c ${1}.chr01.vcf > ${1}.chr01.vcf.gz
bgzip -c ${1}.chr02.vcf > ${1}.chr02.vcf.gz
bgzip -c ${1}.chr03.vcf > ${1}.chr03.vcf.gz
bgzip -c ${1}.chr04.vcf > ${1}.chr04.vcf.gz
bgzip -c ${1}.chr05.vcf > ${1}.chr05.vcf.gz
bgzip -c ${1}.chr06.vcf > ${1}.chr06.vcf.gz
bgzip -c ${1}.chr07.vcf > ${1}.chr07.vcf.gz
bgzip -c ${1}.chr08.vcf > ${1}.chr08.vcf.gz
bgzip -c ${1}.chr09.vcf > ${1}.chr09.vcf.gz
bgzip -c ${1}.chr10.vcf > ${1}.chr10.vcf.gz
bgzip -c ${1}.chr11.vcf > ${1}.chr11.vcf.gz
bgzip -c ${1}.chr12.vcf > ${1}.chr12.vcf.gz
bgzip -c ${1}.chr13.vcf > ${1}.chr13.vcf.gz
bgzip -c ${1}.chr14.vcf > ${1}.chr14.vcf.gz
bgzip -c ${1}.chr15.vcf > ${1}.chr15.vcf.gz
bgzip -c ${1}.chr16.vcf > ${1}.chr16.vcf.gz
bgzip -c ${1}.chr17.vcf > ${1}.chr17.vcf.gz
bgzip -c ${1}.chr18.vcf > ${1}.chr18.vcf.gz
bgzip -c ${1}.chr19.vcf > ${1}.chr19.vcf.gz
bgzip -c ${1}.chr20.vcf > ${1}.chr20.vcf.gz
bgzip -c ${1}.chr21.vcf > ${1}.chr21.vcf.gz
bgzip -c ${1}.chr22.vcf > ${1}.chr22.vcf.gz
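
The per-chromosome commands above could also be written as a loop. This is only a sketch: `split_by_chrom` is a hypothetical function name, it assumes `bgzip` and `tabix` are on the PATH, and it assumes the chromosomes are named 1 to 22 in the VCF, as in the commands above. Unlike the commands above, it pipes straight into `bgzip` and does not keep the intermediate uncompressed .vcf files.

```shell
# Hypothetical helper: split an indexed, bgzipped VCF into one file per autosome.
# Usage: split_by_chrom PREFIX   (expects PREFIX.vcf.gz and its tabix index)
split_by_chrom() {
  prefix=$1
  for chr in $(seq 1 22); do
    # Zero-pad the chromosome number to match the chr01 ... chr22 file names.
    padded=$(printf '%02d' "$chr")
    # Extract the header plus all records on this chromosome, then recompress.
    tabix -h "${prefix}.vcf.gz" "$chr" | bgzip -c > "${prefix}.chr${padded}.vcf.gz"
  done
}
```

Called as `split_by_chrom example`, this would write example.chr01.vcf.gz through example.chr22.vcf.gz.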

Then I uploaded the compressed .vcf.gz files to the server and got a malformed-header error:

Unable to parse header with error: Your input file has a malformed header: Unexpected tag Type in line , for input source: /data3/imputation-server/workspace/job-20190822-200703-201/input/files/64ba01fa-b382-4b48-80b7-fdced5a84e11.vcf (see Help).

I understand that the header is malformed. Is this because I did not supply a .sample file (which contains header information) when converting .bgen to .vcf with qctool, or is it something else?

I would really appreciate it if you could suggest a workaround!
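
One thing that may be worth checking, as an assumption that is not confirmed anywhere in this thread: the message "Unexpected tag Type" could come from an ##INFO or ##FORMAT header line whose keys are not in the order ID, Number, Type, Description, which some parsers appear to expect. The sketch below lists such lines; `check_header_tag_order` is a hypothetical name, and it only needs awk.

```shell
# Hypothetical helper: list INFO/FORMAT header lines whose keys do not follow
# the order ID, Number, Type, Description.
# Usage: check_header_tag_order FILE.vcf
check_header_tag_order() {
  awk '
    /^##(INFO|FORMAT)=</ {
      if ($0 !~ /^##(INFO|FORMAT)=<ID=[^,]+,Number=[^,]+,Type=[^,]+,Description=/)
        printf "line %d: %s\n", NR, $0
    }
    /^#CHROM/ { exit }   # stop at the end of the header
  ' "$1"
}
```

If this prints nothing, the key order is probably not the problem and the header needs to be inspected by eye.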

snp software error • 666 views
modified 14 months ago • written 14 months ago by manprees

Please show us the header and some examples of the variants within the vcf file. Otherwise we can just guess.

Thanks!

fin swimmer
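
A header and a few records can be pulled out of a VCF for posting with something like the sketch below. `show_vcf_preview` is a hypothetical name; it assumes an uncompressed VCF whose header lines all come before the records.

```shell
# Hypothetical helper: print the VCF header and the first few variant records.
# Usage: show_vcf_preview FILE.vcf [N_RECORDS]   (N_RECORDS defaults to 5)
show_vcf_preview() {
  awk -v n="${2:-5}" '
    /^#/ { print; next }                              # header lines: print all
    { if (shown < n) { print; shown++ } else exit }   # then the first n records
  ' "$1"
}
```

For a compressed file, the same function can read from standard input, for example `gzip -dc example.vcf.gz | show_vcf_preview -`.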

written 14 months ago by finswimmer

I created a link where you can see the .vcf file opened in bash and notepad for your reference: .vcf file in bash and notepad

Any help is much appreciated. Thanks!

modified 14 months ago • written 14 months ago by manprees

Using a vcf-validator may help you pinpoint exactly what could be causing the error.

written 14 months ago by jared.andrews07

I used checkVCF to pinpoint the source of error: https://github.com/zhanxw/checkVCF

It showed the following errors:

Line [ %d ] does not have GT defined in the FORMAT field 
Duplicated site [ 1:2526746 ]
Line [ 1845 ] does not have correct column number, exiting!

Does this mean that the .bgen file I converted to .vcf was not in the right format? Do you know of any way to fix this?

modified 14 months ago by finswimmer • written 14 months ago by manprees

I have no idea what a .bgen file is, nor what its format is like. However, there are only 3 errors (each of which is explained quite plainly). You can correct them manually easily enough. One line is duplicated, one has the improper number of columns based on the headers, and one is missing GT in the format field.
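
The column-count problem can be located without a validator. A minimal sketch, assuming a tab-delimited VCF with a single #CHROM header line (`find_bad_column_counts` is a hypothetical name):

```shell
# Hypothetical helper: report records whose column count differs from the
# #CHROM header line.
# Usage: find_bad_column_counts FILE.vcf
find_bad_column_counts() {
  awk -F '\t' '
    /^##/     { next }                  # skip meta-information lines
    /^#CHROM/ { expected = NF; next }   # remember the header column count
    NF != expected {
      printf "line %d: %d columns, expected %d\n", NR, NF, expected
    }
  ' "$1"
}
```

The line numbers it prints count from the top of the file, header included, so they can be compared directly with what a text editor shows.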

written 13 months ago by jared.andrews07

I am new to this! It would be really helpful if you could help me out with the errors!

written 13 months ago by manprees

It is best to learn by doing. We don't have the ability to scroll through your file. Based on what you've found, you know there's likely an issue with the header and maybe with certain records. Look at the VCF specs and ensure your file meets them (particularly the metadata/header sections).

written 13 months ago by jared.andrews07

I was able to rectify the duplicated site error using:

( grep  '^#' input.vcf ; grep -v "^#" input.vcf | LC_ALL=C sort -t $'\t' -k1,1 -k2,2n -k4,4 | awk -F '\t' 'BEGIN{ prev="";} {key=sprintf("%s\t%s\t%s",$1,$2,$4);if(key==prev) next;print;prev=key;}' )  > out.vcf

I figured out that the commands I had used previously messed up my VCF header.

But I am still not able to resolve this error:

Line [ %d ] does not have GT defined in the FORMAT field

The FORMAT field in my file is clearly defined, but it does not include GT. I am attaching a snippet of the file for your reference: vcf file snip
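
To see which FORMAT values the records actually carry, the ninth column can be tallied. This is a sketch under the assumption that the file is tab-delimited and has the standard column layout, with FORMAT in column 9; `tally_format_fields` is a hypothetical name.

```shell
# Hypothetical helper: tally the distinct values of the FORMAT column (column 9)
# so that records lacking GT are easy to spot.
# Usage: tally_format_fields FILE.vcf
tally_format_fields() {
  awk -F '\t' '
    /^#/ { next }        # skip header lines
    { count[$9]++ }      # column 9 is FORMAT when genotype columns are present
    END { for (f in count) printf "%d\t%s\n", count[f], f }
  ' "$1" | sort -k1,1nr
}
```

If no value in the output contains GT, the checkVCF message applies to every record rather than to a single line, and the fix would have to happen at the conversion step rather than by hand.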

Line 1841 does not have correct column number, exiting! I have highlighted the line in the snippet for your reference. A roadmap for solving these errors would be much appreciated.

vcf snip with line 1841

written 13 months ago by manprees