Question: bcftools and VCFTools consensus functionality both not working unless there are INDELS in VCF input
0
gravatar for rc16955
3 months ago by
rc1695560
rc1695560 wrote:

Hi all,

This is a slightly strange issue and I'm sure that there's something simple I'm doing wrong, but if anyone could help me see what it is I'd be thrilled as this has been a very annoying issue for me!

Here is the situation. I have a big multi-sample VCF file with every variant found by sequencing. I also have a filtered version of this file which has low-coverage SNPs, all INDELs, and all variants that are only unique to the reference genome removed.

I would like to make a fasta files for each sample based off the filtered multi-sample VCF file. My first plan was to extract each individual from the filtered VCF file using VCFTools, then generate a fasta file vcf-consensus:

vcftools --vcf ../Main_NoInvariants.vcf --indv CL273_8_sorted.bam --positions positions.txt --recode --out CL273_8
bgzip -c CL273_8.recode.vcf > CL273_8.vcf.gz
tabix -p CL273_8.vcf.gz
cat reference.fa | bcftools consensus CL273_8.vcf.gz > CL273_8.fa

However, I soon realised that this could be done much more quickly in bcftools:

bcftools consensus -i -s CL273_8_sorted.bam -f reference.fa Main_NoInvariants.vcf.gz > CL237_8_1.fa

But... Neither of these worked. Both produced fasta files but they were identical to the reference (and I have confirmed that there are indeed differences between this sample and the reference in the filtered VCF file). No error message is produced in either case. I've also tried extracting individuals with vcftools using the --recode-INFO-all, to preserve info, but it didn't make any difference.

What's weird though is that both methods DO work when I use the original, unfiltered VCF file. From a lot of fiddling about I've realised that the problem seems to be the lack of INDELs in the filtered VCF file. I know this because both methods will work on virtually any VCF file, then as soon as I remove the INDELs, it stops working.

I've tried two ways to remove INDELs; first just using VCFTools --remove-indels and second using sed to delete any line with "INDEL" in it. No good.

So I'm completely at a loss now. Any help much appreciated.

Thank you!

vcftools bcftools fasta vcf • 223 views
ADD COMMENTlink written 3 months ago by rc1695560

Hello rc16955 ,

  • Is the sample name really CL273_8_sorted.bam (with .bam)?
  • Can you please show some example lines of your vcf file including the complete header?

fin swimmer

ADD REPLYlink written 3 months ago by finswimmer12k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2102 users visited in the last hour