Question: bcftools consensus not working when using -H flag
0
gravatar for kirannbishwa01
16 months ago by
kirannbishwa011.1k
United States
kirannbishwa011.1k wrote:

I am trying to get both the left and right reference genome using a sample name in phase VCF file.

This works:

bcftools consensus -f lyrata_chr01-short.fasta phasedVCF-short.vcf.gz -s ms01e -H 1 > ms01e-left.fa

This doens't work though: when I am trying to make the new Ref Sequence by imputing genotype from the right haplotype.

bcftools consensus -f lyrata_chr01-short.fasta phasedVCF-short.vcf.gz -s ms01e -H 2 > ms01e-right.fa
Broken VCF, too few alts at 1:141

What is the issue ?

ADD COMMENTlink modified 16 months ago by finswimmer12k • written 16 months ago by kirannbishwa011.1k

Hello kirannbishwa01,

could you show us the vcf entry of the position 1:141?

fin swimmer

ADD REPLYlink written 16 months ago by finswimmer12k

@finswimmer:

Here is my VCF data:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  MA605   MA611   MA622   MA625   MA629   Ncm8    Sp154   Sp164   Sp21    Sp3 Sp76    SpNor33 ms01e   ms02g   ms03g   ms04h
1   141 .   C   T   .   .   .   GT:PG:PG_al:PI  .   .   .   0|1:0|1:C|T:10763   .   .   .   .   .   .   .   .   .   .   .   .
1   3697    .   T   C   .   .   .   GT:PG:PG_al:PI  .   0|0:0|0:T|T:11398   .   0|1:0|1:T|C:8039    0|1:0|1:T|C:8202    0|1:0|1:T|C:4616    0|1:0|1:T|C:3938    .   0|1:0|1:T|C:5102    .   .   0|1:0|1:T|C:17340   1|0:1|0:C|T:1   .   .   .


The GT field is empty at POS 141 for SAMPLE ms01e, but in that situation it should by default pick a REF allele. Isn't it? And, it only is the problem when doing -H 2 not -H 1.

Thanks,

ADD REPLYlink modified 16 months ago • written 16 months ago by kirannbishwa011.1k

My guess is that at position 141, Alternate allele is supported by only one sample. Imputation might be considering all samples or certain percentage. Probably that is the reason it is failing. To cross check, try to fill up record 141 with dummy data and re run the imputation code.

ADD REPLYlink modified 16 months ago • written 16 months ago by cpad011212k

@cpad : that is already tested and works. But, I don't want to add a dummy GT in there - which on large data is lots of work. Since, I am trying to create a personal genome here, an empty GT = "." for any sample of interest should be automatically treated as reference allele.

I believe this is just a bug in the program or a feature that was not fixed on final software.

ADD REPLYlink written 16 months ago by kirannbishwa011.1k

an empty GT = "." for any sample of interest should be automatically treated as reference allele

This is not stated out in the documentation. You could try to include only sites where there is a genotype information.

bcftools consensus -f lyrata_chr01-short.fasta phasedVCF-short.vcf.gz -s ms01e -H 2 -i 'GT!~"\."'> ms01e-right.fa

fin swimmer

ADD REPLYlink written 16 months ago by finswimmer12k

@ finswimmer: I am still getting exactly the same error message.

ADD REPLYlink written 16 months ago by kirannbishwa011.1k

@fin swimmer: I have been trying to fix this issue by playing with -i and -e using the manual. But, still cannot find the fix.

ADD REPLYlink written 16 months ago by kirannbishwa011.1k

Hm,

it seems that -i and -e doesn't look only on the specified sample. What maybe work is first filter sites with no genotype for the sample using bcftools view and use then bcftools consensus.

$ bcftools view -s ms01e phasedVCF-short.vcf.gz -U|bcftools consensus -f lyrata_chr01-short.fasta - -H 2

According to your question here, this could work either:

$ bcftools +fixploidy phasedVCF-short.vcf.gz -- -f 2|bcftools +missing2ref - -- -p|bcftools consensus -f lyrata_chr01-short.fasta - -s ms01e -H 2

fin wimmer

ADD REPLYlink modified 16 months ago • written 16 months ago by finswimmer12k

I will try that tomorrow. I tried to take a different approach to this problem, but another issue came up. I raised a github issue. If you can shed light on the problem. Thanks for beginning helpful !

ADD REPLYlink written 16 months ago by kirannbishwa011.1k

What version of bcftools are you running, can you try with the latest github version? I believe it should work, the tests in contain cases with missing genotypes.

ADD REPLYlink written 16 months ago by pd3340

Meanwhile I'm pretty sure that the ploidy is the problem. It could be that just fixing this is enough, and there is no need to replace it with 0|0.

fin swimmer

ADD REPLYlink written 16 months ago by finswimmer12k

@fin swimmer: Did you had chance to look at this issue ?

ADD REPLYlink written 16 months ago by kirannbishwa011.1k
0
gravatar for finswimmer
16 months ago by
finswimmer12k
Germany
finswimmer12k wrote:

For documentation:

The discussion went on in the issue tracker of bcftools, because the OP guess that this is a bug. And he/she was right. This fix is already done.

Fixing the ploidy was also a working work-around. But it needs extra steps for compressing and indexing the modified vcf as bcftools consensus don't like direct streaming here.

fin swimmer

ADD COMMENTlink written 16 months ago by finswimmer12k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 764 users visited in the last hour