Question

Samtools consensus sequence

2

Entering edit mode

8.2 years ago

Sissi ▴ 60

Hi guys!

It's probably the same question over and over but I can't find the solutions.

I mapped some contigs on a Plant reference genome with BWA, sorted the file with samtools and got the coverage by typing:

samtools depth my.bam > qry-depth>
wc -l qry-depth

The results of this command / sequence legth * 100 to have the % genome covered.

Then, I wanted to extract the consensus sequence from the bam file:

samtools mpileup -uf reference.fasta my.bam | bcftools call -mv -Oz -o calls.vcf.gz>

tabix calls.vcf.gz> cat reference.fasta | bcftools consensus calls.vcf.gz > con.fasta

And here I am. Now I have a problem and a general question.

For two files I got this error message after the first mpileup commad:
```
The QS annotation not present at gi|33590464|gb|AY244516.1|:2123 (NCBI header)
```
So, I can't get the consensus.
For other samples, I know the contigs don't cover the entire genome, so how does sometools export the consensus? By keeping in the not covered region the reference sequence? Is there a way to visualize this?

I hope I'm clear and not repetitive,

Thanks for your help :)

snp sequence • 9.2k views

ADD COMMENT • link updated 21 months ago by Ram 43k • written 8.2 years ago by Sissi ▴ 60

0

Entering edit mode

Thank you!

Yes, I realized that I changed the fasta file after mapping and that was the mistake ;)

ADD REPLY • link updated 5.6 years ago by Ram 43k • written 8.1 years ago by Sissi ▴ 60

Istvan Albert · Answer 1 · 2016-03-01

4

Entering edit mode

8.1 years ago

stolarek.ir ▴ 700

samtools mpileup -uf dna.fa sorted.bam | bcftools view -cg - | perl vcfutils.pl vcf2fq -d 2 > cns.fq

samtools 1.9, bcftools 1.9. Make sure you use the same reference fasta to which you mapped your BAM.

Not covered positions represented as N with this pipeline

ADD COMMENT • link updated 8.1 years ago by Istvan Albert 100k • written 8.1 years ago by stolarek.ir ▴ 700

0

Entering edit mode

But if the reads that have covered the gaps region in the reference genome, how can it output the consensus sequence that replace the gap with the covered reads? Thanks!

ADD REPLY • link 7.0 years ago by ant_genome • 0

0

Entering edit mode

I absolutely don't understand your question, but you might be asking how it reports consensus sequeunce where there is no coverage. It reports it based on the reference file, and it is a downside of the pipeline. In the next step you should mask the output and replace those 0 coverage positions with N

ADD REPLY • link 7.0 years ago by stolarek.ir ▴ 700