Question: Samtools Mpileup Output
9.7 years ago by
Sam40 wrote:


I have looked all over the web and cannot find definitions for the <, >, and ~ symbols in mpileup output. For example:

chr6 31506624 T 78 >>><><,$cCccccccccCCcCCCCCCCccCcCCCcCccccCccCCCcCccccccccccCCCCcCCcCcCcCCcccC^~c^~C

I appreciate your help!

samtools sam mpileup • 13k views
modified 9.6 years ago by Nina380 • written 9.7 years ago by Sam40

Since this is still unanswered, I would suggest emailing samtools support or the samtools author (Heng Li).

written 9.6 years ago by Doctoroots790
9.5 years ago by
Vancouver, BC, Canada
Nina380 wrote:

The symbols "<" and ">" were added to column 5 a few releases ago. They mean that this position is "covered" by a large gap, i.e. we're inside an "N" element (a reference skip) in the CIGAR string of this read; ">" marks a read on the forward strand and "<" a read on the reverse strand. Note that reads that have a gap at this position still contribute to the total coverage reported in column 4.

For completeness I should also mention that "*" is very similar, but in this case it means the position is covered by a small gap (i.e. a "D" element in the CIGAR string).

Also, as described in the link Drio provided, "^" is always followed by another symbol. This indicates that we are at the start of a read. If you subtract 33 from the ASCII value of the symbol that follows "^", it gives you the mapping quality of the read whose first base covers this position. The "~" in your example is such a symbol: its ASCII value is 126, so "^~" marks the start of a read with mapping quality 93.
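The decoding described above can be sketched in a few lines of Python. The function name `read_start_mapq` is made up for this illustration; the only thing it relies on is the "ASCII value minus 33" rule from the answer.

```python
def read_start_mapq(marker: str) -> int:
    """Decode the mapping quality from a two-character read-start marker such as '^~'."""
    if len(marker) != 2 or marker[0] != "^":
        raise ValueError(f"expected '^' plus one character, got {marker!r}")
    # The character after '^' encodes the mapping quality as ASCII value minus 33.
    return ord(marker[1]) - 33


print(read_start_mapq("^~"))  # 93
```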

In a similar vein, "$" means that one of the reads covers this position with its last base.

In your example, if you ignore "$" and the two instances of "^~" you will find that you have 78 characters remaining in col 5, which matches the coverage depth reported in col 4.
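If you want to do this check programmatically, a rough Python sketch follows. It assumes the column 5 string contains no insertion or deletion annotations (the "+2AG" / "-1T" style), as is the case in the example line; those would need extra handling. The function name `count_read_entries` is hypothetical.

```python
import re


def count_read_entries(bases: str) -> int:
    """Count per-read symbols in mpileup column 5, ignoring read start/end markers.

    Assumes there are no indel annotations in the string.
    """
    # Remove each '^' together with the mapping-quality character after it.
    # This has to happen first, because that character can itself be '$'.
    cleaned = re.sub(r"\^.", "", bases)
    # Remove read-end markers.
    cleaned = cleaned.replace("$", "")
    return len(cleaned)


print(count_read_entries(">>,$cC^~c^~C"))  # 7
```

The result should equal the depth in column 4 for the same line.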

I learned about this because, for the analysis I do, we don't want gaps to contribute to the coverage depth. Here's part of an awk command that we use to adjust the coverage depth to exclude gaps. It first strips each "^" together with the mapping-quality character that follows it, because that character can itself be "<", ">", or "*":

{l=$4; if($5~/>/ || $5~/</ || $5~/\*/) {gsub(/\^./,"",$5); l-=split($5,a,"<")-1; l-=split($5,a,">")-1; l-=split($5,a,"*")-1}}
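For anyone who would rather do this outside awk, here is a Python sketch of the same idea. It is not the command we run, just an equivalent under the same rules; the function name `depth_without_gaps` is made up for the example.

```python
import re


def depth_without_gaps(depth: int, bases: str) -> int:
    """Subtract reads that show a gap ('<', '>' or '*') from the column 4 depth."""
    # Drop '^' plus the mapping-quality character, which may itself be '<', '>' or '*'.
    cleaned = re.sub(r"\^.", "", bases)
    # Count the remaining gap symbols; each one belongs to one read.
    gaps = sum(cleaned.count(symbol) for symbol in "<>*")
    return depth - gaps


print(depth_without_gaps(7, ">>,$cC^~c^~C"))  # 5
```

Here `depth` is column 4 and `bases` is column 5 of one mpileup line.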
written 9.5 years ago by Nina380
9.7 years ago by
United States
Drio920 wrote:

The columns are, in order: chromosome, coordinate, reference base, number of reads covering the position, alleles seen at that position, and the base quality for each base.

Details here.

The ASCII value of each character in the last column, minus 33, gives you the base quality.
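As a rough illustration of the column layout and the quality decoding, here is a Python sketch. It assumes a tab-separated line with all six columns present; the names `PileupRecord`, `parse_pileup_line` and `base_qualities` are invented for this example and are not part of samtools.

```python
from typing import NamedTuple


class PileupRecord(NamedTuple):
    chrom: str
    pos: int
    ref: str
    depth: int
    bases: str
    quals: str


def parse_pileup_line(line: str) -> PileupRecord:
    """Split one tab-separated mpileup line into its first six columns."""
    chrom, pos, ref, depth, bases, quals = line.rstrip("\n").split("\t")[:6]
    return PileupRecord(chrom, int(pos), ref, int(depth), bases, quals)


def base_qualities(record: PileupRecord) -> list:
    """Decode column 6: the ASCII value of each character minus 33."""
    return [ord(ch) - 33 for ch in record.quals]


record = parse_pileup_line("chr1\t100\tA\t3\t.,^~.\tII!\n")
print(record.depth, base_qualities(record))  # 3 [40, 40, 0]
```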

modified 9.7 years ago • written 9.7 years ago by Drio920

Hi Drio, thanks for the response. Could you be more specific about the <, >, and ~ characters?

written 9.7 years ago by Sam40

Edited my answer per your request. Please check the link again. The original link was incorrect. All the details are there.

written 9.7 years ago by Drio920

Quality score encoding:

modified 16 months ago by _r_am32k • written 9.6 years ago by Rm8.0k