Question: IUAPC codes in consensus sequence ? samtools mpileup / bcftools call / bcftools consensus
0
gravatar for Kevin D
8 months ago by
Kevin D10
INRA France
Kevin D10 wrote:

Hello everyone!

I mapped Forward and Reverse reads from one plant genome against a reference fasta sequence. As expected, there are varaints (SNP and indels). I am interested in retrieving the consensus sequence for downstream analysis. The reads come from a diploid plant genome and the gene of interest is likely in single copy. However, my plant genome is clearly not 100% homozygous. So, in the consensus sequence I should see homozygous variants and heterozygous variants coded with IUAPC letters. I ran this codes and everything worked well:

samtools mpileup -d8000 -A -uf ref.fasta sample.sorted.bam > sample.mpileup
bcftools call -mv -Oz -o sample.calls.vcf.gz sample.mpileup
tabix sample.calls.vcf.gz
cat ref.fasta | bcftools consensus sample.calls.vcf.gz > sample.consensus.fasta

The only problem is that I have no heterozygous site in the consensus sequence despite I clearly see some in the raw mapping alignment. How to get heterozygous site codes with IUAPC in the consensus sequence ?

Kevin

Ps: I tried the -i option in bcftools consensus but then all variant sites were codes heterozygous. Also the coverage is quite OK (20x)

iuapc bcftools consensus • 342 views
ADD COMMENTlink modified 8 months ago by Pierre Lindenbaum100k • written 8 months ago by Kevin D10
0
gravatar for Pierre Lindenbaum
8 months ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum100k wrote:

as far as I can see there is a --iupac-codes option: in https://github.com/samtools/bcftools/blob/master/consensus.c#L626

 fprintf(stderr, "    -i, --iupac-codes          output variants in the form of IUPAC ambiguity codes\n");
ADD COMMENTlink written 8 months ago by Pierre Lindenbaum100k

Yes, that is this option I mentionned I my post scriptum but when I turn it on, then all the variant positions appear heterozygous. In fact that is not true: I checked the mapping output and some variant are homozygous and others are heterozygous between reads, that is because they reflect two different alleles... I think that the reason why I got all variant heterozygous is beacause the comparison is made between reads from my plant genome and the reference and not just between my reads despite they sometimes carry two alleles (diploid plant, highly heterozygous, single copy gene) in a 1:1 proportion

ADD REPLYlink written 8 months ago by Kevin D10
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1061 users visited in the last hour