bcftools consensus not working when using -H flag
1
0
Entering edit mode
3.1 years ago
kirannbishwa01 ★ 1.3k

I am trying to get both the left and right reference genome using a sample name in phase VCF file.

This works:

bcftools consensus -f lyrata_chr01-short.fasta phasedVCF-short.vcf.gz -s ms01e -H 1 > ms01e-left.fa


This doens't work though: when I am trying to make the new Ref Sequence by imputing genotype from the right haplotype.

bcftools consensus -f lyrata_chr01-short.fasta phasedVCF-short.vcf.gz -s ms01e -H 2 > ms01e-right.fa
Broken VCF, too few alts at 1:141


What is the issue ?

bcftools vcf software error reference genome • 1.3k views
0
Entering edit mode

Hello kirannbishwa01,

could you show us the vcf entry of the position 1:141?

fin swimmer

0
Entering edit mode

@finswimmer:

Here is my VCF data:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  MA605   MA611   MA622   MA625   MA629   Ncm8    Sp154   Sp164   Sp21    Sp3 Sp76    SpNor33 ms01e   ms02g   ms03g   ms04h
1   141 .   C   T   .   .   .   GT:PG:PG_al:PI  .   .   .   0|1:0|1:C|T:10763   .   .   .   .   .   .   .   .   .   .   .   .
1   3697    .   T   C   .   .   .   GT:PG:PG_al:PI  .   0|0:0|0:T|T:11398   .   0|1:0|1:T|C:8039    0|1:0|1:T|C:8202    0|1:0|1:T|C:4616    0|1:0|1:T|C:3938    .   0|1:0|1:T|C:5102    .   .   0|1:0|1:T|C:17340   1|0:1|0:C|T:1   .   .   .


The GT field is empty at POS 141 for SAMPLE ms01e, but in that situation it should by default pick a REF allele. Isn't it? And, it only is the problem when doing -H 2 not -H 1.

Thanks,

0
Entering edit mode

My guess is that at position 141, Alternate allele is supported by only one sample. Imputation might be considering all samples or certain percentage. Probably that is the reason it is failing. To cross check, try to fill up record 141 with dummy data and re run the imputation code.

0
Entering edit mode

@cpad : that is already tested and works. But, I don't want to add a dummy GT in there - which on large data is lots of work. Since, I am trying to create a personal genome here, an empty GT = "." for any sample of interest should be automatically treated as reference allele.

I believe this is just a bug in the program or a feature that was not fixed on final software.

0
Entering edit mode

an empty GT = "." for any sample of interest should be automatically treated as reference allele

This is not stated out in the documentation. You could try to include only sites where there is a genotype information.

bcftools consensus -f lyrata_chr01-short.fasta phasedVCF-short.vcf.gz -s ms01e -H 2 -i 'GT!~"\."'> ms01e-right.fa


fin swimmer

0
Entering edit mode

@ finswimmer: I am still getting exactly the same error message.

0
Entering edit mode

@fin swimmer: I have been trying to fix this issue by playing with -i and -e using the manual. But, still cannot find the fix.

0
Entering edit mode

Hm,

it seems that -i and -e doesn't look only on the specified sample. What maybe work is first filter sites with no genotype for the sample using bcftools view and use then bcftools consensus.

$bcftools view -s ms01e phasedVCF-short.vcf.gz -U|bcftools consensus -f lyrata_chr01-short.fasta - -H 2  According to your question here, this could work either: $ bcftools +fixploidy phasedVCF-short.vcf.gz -- -f 2|bcftools +missing2ref - -- -p|bcftools consensus -f lyrata_chr01-short.fasta - -s ms01e -H 2


fin wimmer

0
Entering edit mode

I will try that tomorrow. I tried to take a different approach to this problem, but another issue came up. I raised a github issue. If you can shed light on the problem. Thanks for beginning helpful !

0
Entering edit mode

What version of bcftools are you running, can you try with the latest github version? I believe it should work, the tests in contain cases with missing genotypes.

0
Entering edit mode

Meanwhile I'm pretty sure that the ploidy is the problem. It could be that just fixing this is enough, and there is no need to replace it with 0|0.

fin swimmer

0
Entering edit mode

@fin swimmer: Did you had chance to look at this issue ?

0
Entering edit mode
3.1 years ago

For documentation:

The discussion went on in the issue tracker of bcftools, because the OP guess that this is a bug. And he/she was right. This fix is already done.

Fixing the ploidy was also a working work-around. But it needs extra steps for compressing and indexing the modified vcf as bcftools consensus don't like direct streaming here.

fin swimmer