IUAPC codes in consensus sequence ? samtools mpileup / bcftools call / bcftools consensus
1
1
Entering edit mode
7.1 years ago
Kevin D ▴ 30

Hello everyone!

I mapped Forward and Reverse reads from one plant genome against a reference fasta sequence. As expected, there are varaints (SNP and indels). I am interested in retrieving the consensus sequence for downstream analysis. The reads come from a diploid plant genome and the gene of interest is likely in single copy. However, my plant genome is clearly not 100% homozygous. So, in the consensus sequence I should see homozygous variants and heterozygous variants coded with IUAPC letters. I ran this codes and everything worked well:

samtools mpileup -d8000 -A -uf ref.fasta sample.sorted.bam > sample.mpileup
bcftools call -mv -Oz -o sample.calls.vcf.gz sample.mpileup
tabix sample.calls.vcf.gz
cat ref.fasta | bcftools consensus sample.calls.vcf.gz > sample.consensus.fasta

The only problem is that I have no heterozygous site in the consensus sequence despite I clearly see some in the raw mapping alignment. How to get heterozygous site codes with IUAPC in the consensus sequence ?

Kevin

Ps: I tried the -i option in bcftools consensus but then all variant sites were codes heterozygous. Also the coverage is quite OK (20x)

bcftools consensus IUAPC • 2.6k views
ADD COMMENT
0
Entering edit mode
7.1 years ago

as far as I can see there is a --iupac-codes option: in https://github.com/samtools/bcftools/blob/master/consensus.c#L626

 fprintf(stderr, "    -i, --iupac-codes          output variants in the form of IUPAC ambiguity codes\n");
ADD COMMENT
1
Entering edit mode

Yes, that is this option I mentionned I my post scriptum but when I turn it on, then all the variant positions appear heterozygous. In fact that is not true: I checked the mapping output and some variant are homozygous and others are heterozygous between reads, that is because they reflect two different alleles... I think that the reason why I got all variant heterozygous is beacause the comparison is made between reads from my plant genome and the reference and not just between my reads despite they sometimes carry two alleles (diploid plant, highly heterozygous, single copy gene) in a 1:1 proportion

ADD REPLY
0
Entering edit mode

Did you find a solution to this? Please share if you did.

ADD REPLY

Login before adding your answer.

Traffic: 1955 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6