Question: Samtools mpileup: read base column symbols
0
gravatar for Marvin
4.4 years ago by
Marvin150
Australia
Marvin150 wrote:

I have checked the documenation here:

http://samtools.sourceforge.net/pileup.shtml

Here is an output of column 5 (read base column):

samtools view -h bamfile.bam | samtools mpileup -B -f reference.fasta - | awk '{split($5,chars,""); for(i=1;i<=length($5);i++) print chars[i]}' | sort | uniq -c

12864 $
 527 *
 590 +
 236 -
59664 .
 634 1
 130 2
  53 3
   6 4
   3 5
   3 =
6205 A
4793 C
12863 F
4787 G
   7 M
  15 N
6692 T
12863 ^

The F occurs because the mapping quality is always F (thus I always see "^F" where a read starts). I find it a little strange that the mapping quality is F for every read but that's actually not my question

[maybe it's because I chose -B ... need to continue reading what that BAQ computation is about ... I noticed that I lose information in the mpileup output if I don't use -B, that's why I use it]

I wonder about these symbols as they are not mentioned in the docs (at least with respect to column 5): * =

The = symbol occurs in only 1 line whereas the * symbols occurs in multiple lines. The * symbols also occurs in that line where = occurs, so here's an example showing both:

gi_identifier    378981    T    9    .$*+1=$*+1A.*+1=$*+1=$*+1M$*+1M$G+1T    ]]]JJJ]]J

What do these symbols mean?

ADD COMMENTlink written 4.4 years ago by Marvin150
1

Make sure to refer to the correct documentation: http://www.htslib.org/doc/samtools.html

ADD REPLYlink written 4.4 years ago by Istvan Albert ♦♦ 81k

Oh and I am also wondering about that "+1M" ... ?

ADD REPLYlink written 4.4 years ago by Marvin150

It does look weird to me as well. What aligner produced this bam file? Maybe post the sam header.

ADD REPLYlink written 4.4 years ago by dariober10k

The bam file in the command is the result of a Geneious function that lets you extract the SNP columns of an alignment/bam:

https://support.geneious.com/entries/22583802-Exporting-snp-data-for-phylogenetic-analysis

When I use the same command on the original bam file, I only get expected/documented symbols.

I guess I'll just exclude Geneious from the pipeline and identify the SNP columns myself.

ADD REPLYlink modified 4.4 years ago • written 4.4 years ago by Marvin150
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 901 users visited in the last hour