Question: The pruned in file by bcftools gives error "Wrong VCF header" while calculating kinship matrix using Rvtests
0
gravatar for AVA
12 days ago by
AVA10
Finland
AVA10 wrote:

I want to calculate kinship matrix for autosomes in order to run a LMM. For this purpose, I first pruned each autosome using bcftools (version 1.11) and excluded the snps that had r2 greater than 0.15 by utilizing the command:

/bcftools +prune -m 0.15 -w 1000 input.vcf -Ob -o output.vcf

When I use the file output.vcf for calculating the kinship matrix using Rvtests utilizing the command:

/vcf2kinship --inVcf output.vcf --bn --out kinship_output

I get an error: 'Wrong VCF header'

However, if I use the above command with input.vcf (i.e., the original file without pruning from the first command), the same command runs with the Rvtest and I get a kinship matrix. I assume while pruning, I am ouputting the vcf file in a wrong format. I have tried reading the bcftools tutorial but I have no clue where I am going wrong.

software error • 212 views
ADD COMMENTlink modified 10 days ago • written 12 days ago by AVA10

Can you post the .vcf header for output.vcf

ADD REPLYlink written 12 days ago by 4galaxy77310

When I open the file,I get a warning: may be a binary file. See it anyway?

When I use an extension vcf.gz in the output file while pruning. The file header looks like below:

BCF^B^B^A<b6>^A^@##fileformat=VCFv4.2 A lot of meta-data in between followed by the usual vcf header CHROM POS etc followed by rest of the chromosome file.

Again, I get the same error: 'Wrong VCF header'

ADD REPLYlink written 11 days ago by AVA10

The header is compressed so you need to use `bcftools view -h output.vcf'. Please edit the output into your original question.

ADD REPLYlink written 11 days ago by 4galaxy77310

Do you mean in this command?

/bcftools +prune -m 0.15 -w 1000 input.vcf -Ob -o output.vcf

Can you please elaborate?

ADD REPLYlink written 11 days ago by AVA10

No. We need to see what the header of your vcf file looks like before we can diagnose the error. So you need to run bcftools view -h output.vcf which prints out just the header of the file which doesn't work in /vcf2kinship. Then edit the header into your original question.

ADD REPLYlink written 11 days ago by 4galaxy77310

Using bcftools view -h output.vcf, the header has IDs of the participants as: number_string

ADD REPLYlink written 11 days ago by AVA10

Can you edit your question to include the entire output of bcftools view -h output.vcf.

ADD REPLYlink written 11 days ago by 4galaxy77310

Thank you for your assistance. I have edited the original question

ADD REPLYlink written 11 days ago by AVA10

Are you sure that's the whole header? It should look something like this:

##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##source=atlas
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=AB,Number=1,Type=Float,Description="Allelic imbalance">
##FORMAT=<ID=AI,Number=1,Type=Float,Description="Binomial probability of allelic imbalance if Hz site">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype quality">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Phred-scaled normalized genotype likelihoods">
##contig=<ID=1>
##contig=<ID=2>
##contig=<ID=3>
##contig=<ID=4>
##contig=<ID=5>
##contig=<ID=6>
##contig=<ID=7>
##contig=<ID=8>
##contig=<ID=9>
##contig=<ID=10>
##contig=<ID=11>
##contig=<ID=12>
##contig=<ID=13>
##contig=<ID=14>
##contig=<ID=15>
##contig=<ID=16>
##contig=<ID=17>
##contig=<ID=18>
##contig=<ID=19>
##contig=<ID=20>
##contig=<ID=21>
##contig=<ID=22>
##bcftools_viewVersion=1.9-207-g2299ab6+htslib-1.9-271-g6738132
##bcftools_viewCommand=view -h XN224_MaximumLikelihood.bgz.vcf.gz; Date=Tue Feb 23 18:13:25 2021
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  XN224
ADD REPLYlink written 11 days ago by 4galaxy77310

Sorry, I misunderstood you. I am quite new to working in genomics. Using bcftools view -h output.vcf, the header of a an output file for e.g. for chr10 appears as below. I am working on single chromosome imputed data.

fileformat=VCFv4.2

FILTER=<id=pass,description="all filters="" passed"="">

filedate=20190717

source="beagle.27Jan18.7e1.jar (version 4.1)"

FORMAT=<id=gt,number=1,type=string,description="genotype">

FORMAT=<id=ds,number=a,type=float,description="estimated alt="" dose="" [p(ra)="" +="" p(aa)]"="">

FORMAT=<id=gp,number=g,type=float,description="estimated genotype="" probability"="">

contig=<id=chr10>

Some more lines

INFO=<id=an,number=1,type=integer,description="total number="" of="" alleles="" in="" called="" genotypes"="">

INFO=<id=ac,number=a,type=integer,description="allele count="" in="" genotypes"="">

INFO=<id=ns,number=1,type=integer,description="number of="" samples="" with="" data"="">

INFO=<id=ac_hom,number=a,type=integer,description="allele counts="" in="" homozygous="" genotypes"="">

INFO=<id=ac_het,number=a,type=integer,description="allele counts="" in="" heterozygous="" genotypes"="">

INFO=<id=ac_hemi,number=a,type=integer,description="allele counts="" in="" hemizygous="" genotypes"="">

INFO=<id=af,number=a,type=float,description="allele frequency"="">

INFO=<id=maf,number=a,type=float,description="minor allele="" frequency"="">

INFO=<id=hwe,number=a,type=float,description="hwe test="" (pmid:15789306)"="">

INFO=<id=exchet,number=a,type=float,description="probability of="" excess="" heterozygosity"="">

bcftools_pluginCommand=plugin fill-tags -Ou; Date=Wed Jan 22 17:06:56 2020

INFO=<id=info,number=1,type=float,description="impute2 info="" score"="">

bcftools_pluginCommand=plugin impute-info -Oz -o Botnia_df2_R4_chr10.vcf.gz; Date=Wed Jan 22 17:06:56 2020

bcftools_viewVersion=1.11+htslib-1.11

bcftools_viewCommand=view -S IDs.txt -o filtered-chr10.vcf /fs/projects/botnia_study/FinnGen_Botnia_data/imputed/Botnia_FinnGen_df2_R4/Botnia_imputed_df2_R4_chr10.vcf.gz; Date=Thu Feb 11 17:49:33 2021

bcftools_viewCommand=view -h pruned-chr10.vcf; Date=Wed Feb 24 13:06:06 2021

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT

ADD REPLYlink modified 11 days ago • written 11 days ago by AVA10

Thank you for your time and suggestions. I used bcftools view (/bcftools view output.vcf > output_vcf.vcf) convert the binary format to vcf format and rvtest accepts this file.

ADD REPLYlink written 10 days ago by AVA10
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1143 users visited in the last hour
_