Question: bcftools consensus not working when using -H flag
0
gravatar for kirannbishwa01
9 days ago by
United States
kirannbishwa01790 wrote:

I am trying to get both the left and right reference genome using a sample name in phase VCF file.

This works:

bcftools consensus -f lyrata_chr01-short.fasta phasedVCF-short.vcf.gz -s ms01e -H 1 > ms01e-left.fa

This doens't work though: when I am trying to make the new Ref Sequence by imputing genotype from the right haplotype.

bcftools consensus -f lyrata_chr01-short.fasta phasedVCF-short.vcf.gz -s ms01e -H 2 > ms01e-right.fa
Broken VCF, too few alts at 1:141

What is the issue ?

ADD COMMENTlink modified 4 days ago by finswimmer2.1k • written 9 days ago by kirannbishwa01790

Hello kirannbishwa01,

could you show us the vcf entry of the position 1:141?

fin swimmer

ADD REPLYlink written 9 days ago by finswimmer2.1k

@finswimmer:

Here is my VCF data:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  MA605   MA611   MA622   MA625   MA629   Ncm8    Sp154   Sp164   Sp21    Sp3 Sp76    SpNor33 ms01e   ms02g   ms03g   ms04h
1   141 .   C   T   .   .   .   GT:PG:PG_al:PI  .   .   .   0|1:0|1:C|T:10763   .   .   .   .   .   .   .   .   .   .   .   .
1   3697    .   T   C   .   .   .   GT:PG:PG_al:PI  .   0|0:0|0:T|T:11398   .   0|1:0|1:T|C:8039    0|1:0|1:T|C:8202    0|1:0|1:T|C:4616    0|1:0|1:T|C:3938    .   0|1:0|1:T|C:5102    .   .   0|1:0|1:T|C:17340   1|0:1|0:C|T:1   .   .   .


The GT field is empty at POS 141 for SAMPLE ms01e, but in that situation it should by default pick a REF allele. Isn't it? And, it only is the problem when doing -H 2 not -H 1.

Thanks,

ADD REPLYlink modified 8 days ago • written 8 days ago by kirannbishwa01790

My guess is that at position 141, Alternate allele is supported by only one sample. Imputation might be considering all samples or certain percentage. Probably that is the reason it is failing. To cross check, try to fill up record 141 with dummy data and re run the imputation code.

ADD REPLYlink modified 8 days ago • written 8 days ago by cpad01125.3k

@cpad : that is already tested and works. But, I don't want to add a dummy GT in there - which on large data is lots of work. Since, I am trying to create a personal genome here, an empty GT = "." for any sample of interest should be automatically treated as reference allele.

I believe this is just a bug in the program or a feature that was not fixed on final software.

ADD REPLYlink written 8 days ago by kirannbishwa01790

an empty GT = "." for any sample of interest should be automatically treated as reference allele

This is not stated out in the documentation. You could try to include only sites where there is a genotype information.

bcftools consensus -f lyrata_chr01-short.fasta phasedVCF-short.vcf.gz -s ms01e -H 2 -i 'GT!~"\."'> ms01e-right.fa

fin swimmer

ADD REPLYlink written 8 days ago by finswimmer2.1k

@ finswimmer: I am still getting exactly the same error message.

ADD REPLYlink written 8 days ago by kirannbishwa01790

@fin swimmer: I have been trying to fix this issue by playing with -i and -e using the manual. But, still cannot find the fix.

ADD REPLYlink written 8 days ago by kirannbishwa01790

Hm,

it seems that -i and -e doesn't look only on the specified sample. What maybe work is first filter sites with no genotype for the sample using bcftools view and use then bcftools consensus.

$ bcftools view -s ms01e phasedVCF-short.vcf.gz -U|bcftools consensus -f lyrata_chr01-short.fasta - -H 2

According to your question here, this could work either:

$ bcftools +fixploidy phasedVCF-short.vcf.gz -- -f 2|bcftools +missing2ref - -- -p|bcftools consensus -f lyrata_chr01-short.fasta - -s ms01e -H 2

fin wimmer

ADD REPLYlink modified 8 days ago • written 8 days ago by finswimmer2.1k

I will try that tomorrow. I tried to take a different approach to this problem, but another issue came up. I raised a github issue. If you can shed light on the problem. Thanks for beginning helpful !

ADD REPLYlink written 8 days ago by kirannbishwa01790

What version of bcftools are you running, can you try with the latest github version? I believe it should work, the tests in contain cases with missing genotypes.

ADD REPLYlink written 8 days ago by pd3330

Meanwhile I'm pretty sure that the ploidy is the problem. It could be that just fixing this is enough, and there is no need to replace it with 0|0.

fin swimmer

ADD REPLYlink written 7 days ago by finswimmer2.1k

@fin swimmer: Did you had chance to look at this issue ?

ADD REPLYlink written 8 days ago by kirannbishwa01790
0
gravatar for finswimmer
4 days ago by
finswimmer2.1k
Germany
finswimmer2.1k wrote:

For documentation:

The discussion went on in the issue tracker of bcftools, because the OP guess that this is a bug. And he/she was right. This fix is already done.

Fixing the ploidy was also a working work-around. But it needs extra steps for compressing and indexing the modified vcf as bcftools consensus don't like direct streaming here.

fin swimmer

ADD COMMENTlink written 4 days ago by finswimmer2.1k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 655 users visited in the last hour