VCF to fasta incorporating heterozygous sites
2.5 years ago
Sarah ▴ 60

Hello, I am trying to generate a consensus fasta file for one sample from an unphased VCF. I have been using bcftools consensus, which works well, but I am running into problems with how heterozygous sites are handled. I am not able to adequately phase the data, so I would like to randomly select one allele at each heterozygous site for the consensus. bcftools has options to use IUPAC ambiguity codes, to always select the reference allele, or to always select the alternate allele, but each of these would bias my downstream analyses.

Is there a program that can either phase a VCF randomly, or that can generate a consensus fasta while randomly selecting one allele per heterozygous site?

Thank you!

(PS this is my bcftools consensus command):

bcftools consensus --fasta-ref reference.fasta --sample SampleName -M N -a N -H 1 MyVCF.vcf.gz

I would explore writing a simple text-transformation tool that modifies the genotype in the VCF file at each heterozygous site, replacing 0/1 with either 0/0 or 1/1 at random.
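A minimal sketch of such a tool in Python, assuming a plain-text, tab-delimited VCF with a FORMAT column that contains GT. The function names (`resolve_het_line`, `randomize_hets`) and the `seed` parameter are hypothetical, chosen for illustration only. Other per-sample fields (AD, PL, and so on) are left untouched, so they will no longer match the rewritten genotype.

```python
import random


def resolve_het_line(line, sample_idx, rng):
    """Return a VCF data line with the sample's heterozygous GT made homozygous."""
    fields = line.rstrip("\n").split("\t")
    fmt = fields[8].split(":")
    if "GT" not in fmt:
        return "\t".join(fields)
    gt_pos = fmt.index("GT")
    col = 9 + sample_idx
    sample = fields[col].split(":")
    gt = sample[gt_pos]
    sep = "|" if "|" in gt else "/"
    alleles = gt.split(sep)
    # Only touch diploid, non-missing genotypes whose two alleles differ.
    if len(alleles) == 2 and "." not in alleles and alleles[0] != alleles[1]:
        pick = rng.choice(alleles)
        sample[gt_pos] = f"{pick}/{pick}"
        fields[col] = ":".join(sample)
    return "\t".join(fields)


def randomize_hets(lines, sample_name, seed=None):
    """Yield VCF lines, picking one allele at random at each heterozygous site."""
    rng = random.Random(seed)  # pass a seed to make the choice reproducible
    sample_idx = None
    for line in lines:
        if line.startswith("##"):
            yield line.rstrip("\n")
        elif line.startswith("#CHROM"):
            header = line.rstrip("\n").split("\t")
            sample_idx = header[9:].index(sample_name)
            yield "\t".join(header)
        else:
            yield resolve_het_line(line, sample_idx, rng)
```

To use it, the input could be read with `gzip.open("MyVCF.vcf.gz", "rt")` and the yielded lines written to a new file, which then needs to be bgzip-compressed and indexed again before it is passed to bcftools consensus.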

2.5 years ago
Sarah ▴ 60

If anyone runs into the same task: I ended up using bcftools consensus with the --haplotype I option to write heterozygous sites as IUPAC ambiguity codes, then used seqtk randbase to replace each IUPAC code with one of its two possible alleles at random. This works as long as the consensus contains no IUPAC codes other than those at the heterozygous sites, which was true in my case.
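For anyone who wants to see what the second step amounts to, or needs a seeded and reproducible version of it, here is a rough Python sketch of resolving two-base IUPAC codes at random. It is not seqtk's implementation; the names `resolve_iupac` and `resolve_fasta` are hypothetical, and it assumes that three-base codes and N should be left unchanged.

```python
import random

# Two-base IUPAC ambiguity codes and the bases each one stands for.
TWO_BASE_IUPAC = {
    "R": "AG", "Y": "CT", "S": "CG",
    "W": "AT", "K": "GT", "M": "AC",
}


def resolve_iupac(seq, rng):
    """Replace each two-base IUPAC code in seq with one of its bases at random."""
    out = []
    for base in seq:
        options = TWO_BASE_IUPAC.get(base.upper())
        if options is None:
            out.append(base)  # A, C, G, T, N and other symbols pass through
        else:
            pick = rng.choice(options)
            out.append(pick if base.isupper() else pick.lower())
    return "".join(out)


def resolve_fasta(lines, seed=None):
    """Yield FASTA lines with ambiguity codes resolved; headers are unchanged."""
    rng = random.Random(seed)  # pass a seed to make the choice reproducible
    for line in lines:
        line = line.rstrip("\n")
        yield line if line.startswith(">") else resolve_iupac(line, rng)
```

The same caveat applies here: any two-base IUPAC code already present in the reference sequence is resolved as well, not only the ones written at heterozygous sites.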


Thanks for following up with a solution.
