Question: how to merge gvcf using bcftools
0
gravatar for evelyn
4 weeks ago by
evelyn60
evelyn60 wrote:

Hello Everyone,

I created gvcf files using bcftools:

bcftools mpileup -Ov --gvcf 0 -f ref.fa example.sorted.bam | bcftools call -m --gvcf 0 -o example.vcf

I excluded variants other than SNPs using:

bcftools view --exclude-types indels,mnps,bnd,other example.vcf -o example_1.vcf

Then I want to merge gvcf files using bcftools:

bcftools merge --file-list sample.txt -g ref.fa -O v -o merge.vcf

But it is not working. I am getting an error like Failed to merge alleles. I am not sure if I am using correct commands. Thank you so much for your help and time!

snp • 234 views
ADD COMMENTlink written 4 weeks ago by evelyn60

Failed to merge alleles.

Is this the whole error message? If not, please post also the rest.

Thanks a lot.

fin swimmer

ADD REPLYlink written 4 weeks ago by finswimmer12k

Hello,

I got:

The REF prefixes differ: AGT vs AGAAAGAATGAAAGAATG (3,18)
Failed to merge alleles at ch01:8381 in example_2.vcf.gz

I checked the block with this position in individual gvcf file:

ch01    8277    .   TGT .   .   .   END=8430;MinDP=67   GT:DP   0/0:67

In IGV, this position has the following information,

Total count: 103
A: 96 (93%, 44+, 52- )
C: 0
G: 0
T: 7 (7%, 1+, 6- )
N: 0
ADD REPLYlink modified 4 weeks ago • written 4 weeks ago by evelyn60

Hello again,

could you please try to run bcftools norm on all your individual gvcf files before merging?

bcftools norm -f genome.fa input.gvcf > output.gvcf

You could also run a check if the REF alleles are set correct:

bcftools norm -f genome.fa -c e input.gvcf

fin swimmer

ADD REPLYlink written 29 days ago by finswimmer12k

Hello, Thank you! I used your suggestion and it got merged without error. However, I am still trying to understand the call symbols for each sample after merging. Here is a snippet of the merged file:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  sample_1    sample_2    sample_3
ch01    778 .   A   .   .   .   END=794;MinDP=0;AN=0    GT:DP   ./.:.   ./.:0   ./.:0
ch01    3358    .   C   .   .   .   END=3434;MinDP=0;AN=2   GT:DP   ./.:0   ./.:0   0/0:1
ch01    3435    .   A   G   225 . VDB=0.42042;SGB=-0.692831;MQSB=1;MQ0F=0;MQ=60;RPB=0.4;MQB=1;BQB=0.6;ICB=1;HOB=0.5;DP=36;DP4=0,2,11,18;MinDP=0;AN=4;AC=3   GT:DP:PL    ./.:0:. 1/1:24:255,72,0 0/1:7:183,0,48
ch01    3636    .   A   .   .   .   END=3651;MinDP=0;AN=4   GT:DP   ./.:0   0/0:12  0/0:1

However, I am not able to interpret the symbols here for each sample call. For example, at position 778, for sample_1, there are no reads in IGV so it gave ./.:. but for sample_2, there are reads and calls same as the reference and it gives ./.:0. I could not find any document to understand these symbols so that I can make a better sense of the merged file.

Additionally, for POS 3435, sample_1 has 38 reads same as reference but it calls ./.:0:. I will appreciate your help. Thank you!

ADD REPLYlink modified 29 days ago • written 29 days ago by evelyn60

Hello evelyn ,

could you please post the vcf lines of the individual gvcf's overlapping the pos 778?

Thanks.

fin swimmer

ADD REPLYlink written 28 days ago by finswimmer12k

Hello finswimmer,

Here are the individual lines including POS 778:

ch01    954 .   A   .   .   .   END=3254;MinDP=0    GT:DP   ./.:0
ch01    778 .   A   .   .   .   END=830;MinDP=0 GT:DP   ./.:0
ch01    778 .   A   .   .   .   END=2330;MinDP=0    GT:DP   ./.:0
ADD REPLYlink modified 28 days ago • written 28 days ago by evelyn60
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1415 users visited in the last hour