Question: IUAPC codes in consensus sequence ? samtools mpileup / bcftools call / bcftools consensus
gravatar for Kevin D
3.9 years ago by
Kevin D30
INRA France
Kevin D30 wrote:

Hello everyone!

I mapped Forward and Reverse reads from one plant genome against a reference fasta sequence. As expected, there are varaints (SNP and indels). I am interested in retrieving the consensus sequence for downstream analysis. The reads come from a diploid plant genome and the gene of interest is likely in single copy. However, my plant genome is clearly not 100% homozygous. So, in the consensus sequence I should see homozygous variants and heterozygous variants coded with IUAPC letters. I ran this codes and everything worked well:

samtools mpileup -d8000 -A -uf ref.fasta sample.sorted.bam > sample.mpileup
bcftools call -mv -Oz -o sample.calls.vcf.gz sample.mpileup
tabix sample.calls.vcf.gz
cat ref.fasta | bcftools consensus sample.calls.vcf.gz > sample.consensus.fasta

The only problem is that I have no heterozygous site in the consensus sequence despite I clearly see some in the raw mapping alignment. How to get heterozygous site codes with IUAPC in the consensus sequence ?


Ps: I tried the -i option in bcftools consensus but then all variant sites were codes heterozygous. Also the coverage is quite OK (20x)

iuapc bcftools consensus • 1.6k views
ADD COMMENTlink modified 3.9 years ago by Pierre Lindenbaum133k • written 3.9 years ago by Kevin D30
gravatar for Pierre Lindenbaum
3.9 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum133k wrote:

as far as I can see there is a --iupac-codes option: in

 fprintf(stderr, "    -i, --iupac-codes          output variants in the form of IUPAC ambiguity codes\n");
ADD COMMENTlink written 3.9 years ago by Pierre Lindenbaum133k

Yes, that is this option I mentionned I my post scriptum but when I turn it on, then all the variant positions appear heterozygous. In fact that is not true: I checked the mapping output and some variant are homozygous and others are heterozygous between reads, that is because they reflect two different alleles... I think that the reason why I got all variant heterozygous is beacause the comparison is made between reads from my plant genome and the reference and not just between my reads despite they sometimes carry two alleles (diploid plant, highly heterozygous, single copy gene) in a 1:1 proportion

ADD REPLYlink written 3.9 years ago by Kevin D30

Did you find a solution to this? Please share if you did.

ADD REPLYlink written 2.3 years ago by deepti1rao30
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1926 users visited in the last hour