Unexpected characters in consensus FASTA generated by mpileup
4.9 years ago
gokberk ▴ 90

Hi all,

I'm using samtools mpileup on Linux (compiler: Ubuntu 4.8.4, samtools version 1.6-5-gfe1a2e9) to generate a consensus FASTA with the following command:

samtools mpileup -uf human_nanogv2.fa --bam-list bam_list | bcftools call -c | vcfutils.pl vcf2fq > consensus.fa

This command produces a consensus FASTA, but it contains characters other than A, T, C and G, such as M, R, Y, W and S. A sample of the generated consensus sequence is below:

AAGAMACAGTCTCGGGCCGGGCGTGGTGG
CTCACGCCTGTAATCCCAGCACTTTGGGAGGCCGAGGCGGGCGGATYRCCTGAGGTCAGG
AGTTCGAGACCAGCCTGGSCAACAYGGTGAAACCCCCATCTCTACTAAAATACAAAAAAT
TAGCTGGGCGTGGTGGCATGCGCCTGTAGTCCCAGCTACTCGGGAGGCTGAGGCAGGAGA
ATTGCTTGAACCCGGGAGGYGGAGGTCAGTGAGCTGAGATTGCACCACTGCACTCCAGCC
TGGGCGACAGAGCGAGACTTCTGTCTCAAAAAGAAAAAAAAAGAAGATGCTTATCATGGG
CCGGGCGCAGTGGCTCACACCTGTAATCCCAGCACTTTGGGAGGCCGAGGCAGGCGGATC
ACCTGAGGTCAGGAGTTCAAGACCAGCCTGGCCAACATAGTGAAACCCTGTCTCTACTAA
AAATACAAGAAAATTAGCTGGGCATGGTGGCRCGTGCCTGTAGTCCCAGCTACTTGGGAG
GCTGAGGCAGGAGAATCACTTGAACCCAGGAGGTGGAGGTTGCAGTGAGCCGAGATTGCG
CCACTGCACTCCAGCNTGGGCAACAGAGTGAGACTCTGTCTCAGAAAAAAAAAAAAAAAA
AAAAAAAAAGATGCCTATGGCCGGGCGAAGTGTCTCACACCTGTAATCCCAGCATTTTGG
GAGGCCAAGGCGGCTAGATCACTTGAGGTCAGGAGTTCAAGACCAGCCTGGCCAACATGG
TGAAACACTGTCTCTACTAAAAATACAAAGAATTAGCTAGGCATGGTAGCGGGTGCCTGT
AATCACACCTACTCAGGAAGCTGAGGNNNNNNNNTCTTTTTTTCTTTTTTTTTTGAGACA
GAGTTTTGCTCTTGTTGCCCAGGCTRGAGTGCARTGGCRYGATCTTGGCTCACYGCAACC
TCCRCCTCCCRGGTTCAAGTGATTCTCCTGCCTCAGCCTCCCRAGTAGCTGGGATTACAG
GCATGYGCCACCACGCCCRGCTAATTTTGTATTTTTAGTAGAGACGGGGTTTCWCCATGT
TGGYCAGGCTGGTCTYGAACTCCTGACCTCAGGTGATCCACCYRCCTCRGCCTC

I was wondering whether this is a problem with mpileup and my FASTA is corrupted, or whether these letters indicate indels or something similar. However, I could not find any documentation about these letters online. I would be grateful for any help.

Best regards, Gökberk

4.9 years ago

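These are standard IUPAC codes for ambiguous bases, and they come from the vcf2fq step of your pipeline rather than from a corrupted file. vcf2fq writes one at each position called as heterozygous: for example, R means A or G, Y means C or T, M means A or C, W means A or T, and S means C or G. N means the base could not be determined.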
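
If you want to see which ambiguity codes occur in a sequence and what each one stands for, a small script along these lines may help. This is only a minimal sketch: the function name `ambiguity_report` is made up for illustration, and it works on a plain sequence string rather than parsing the vcf2fq output file.

```python
from collections import Counter

# Standard IUPAC nucleotide ambiguity codes and the bases each one covers.
IUPAC = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def ambiguity_report(seq):
    """Count each IUPAC ambiguity code in seq and list the bases it may represent.

    Hypothetical helper for illustration; plain A, C, G and T are ignored.
    """
    counts = Counter(base for base in seq.upper() if base in IUPAC)
    return {code: (n, IUPAC[code]) for code, n in sorted(counts.items())}


# First line of the sample consensus from the question.
print(ambiguity_report("AAGAMACAGTCTCGGGCCGGGCGTGGTGG"))
```

Each entry in the result maps a code to how many times it appears and the bases it can stand for, so a heterozygous A/C site shows up under M.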

A potential solution to avoid these can be found here: Generate consensus sequence from BAM without ambiguity codes
