Samtools Mpileup Output
11.1 years ago
Sam

Hi,

I have looked all over the web and cannot find the definitions of the <, >, and ~ symbols in mpileup output. For example:

chr6 31506624 T 78 >>><><,$cCccccccccCCcCCCCCCCccCcCCCcCccccCccCCCcCccccccccccCCCCcCCcCcCcCCcccC^~c^~C

I appreciate your help!

Since this is still unanswered, I would suggest mailing samtools support (samtools-help@lists.sf.net) or the samtools author (Heng Li).

11.0 years ago
Nina

The symbols "<" and ">" were added to column 5 a few releases ago. They mean that this position is "covered" by a large gap (i.e. we're inside an "N" element in the CIGAR of this read). Note that reads with a gap at this position still contribute to the total coverage reported in column 4.

For completeness I should also mention that "*" is very similar, but in this case it means the position is covered by a small gap (i.e. a "D" element in the CIGAR).

Also, as described in the link Drio provided, "^" is always followed by another symbol. This indicates that we are at the start of a read. If you subtract 33 from the ASCII value of the symbol that follows "^", you get the mapping quality of the read whose first base covers this position. So each "^~" in your example marks the start of a read with mapping quality 126 - 33 = 93 ("~" is ASCII 126). In a similar vein, "$" means one of the reads covers this position with its last base.
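A minimal Python sketch of reading column 5 with the symbol meanings described above. The function name parse_bases is hypothetical and not part of samtools. It also skips indel notation (e.g. "+2AC" or "-1g"), which is standard mpileup syntax but is not discussed in this thread, so that inserted or deleted sequence is not mistaken for extra reads.

```python
import re

# Indel notation: "+" or "-", a length, then that many bases.
INDEL = re.compile(r"[+-](\d+)")


def parse_bases(col5):
    """Hypothetical helper: split an mpileup read-base column (column 5).

    Returns (symbols, read_starts, read_ends, start_mapqs), where symbols has
    one entry per read covering the position (bases, ".", ",", "*", "<", ">").
    """
    symbols = []
    starts = ends = 0
    mapqs = []
    i = 0
    while i < len(col5):
        ch = col5[i]
        if ch == "^":
            # Start of a read; the next character is mapping quality + 33.
            mapqs.append(ord(col5[i + 1]) - 33)
            starts += 1
            i += 2
        elif ch == "$":
            # End of a read; this is a marker, not a base of its own.
            ends += 1
            i += 1
        elif ch in "+-":
            # Skip the length digits and the inserted/deleted bases.
            m = INDEL.match(col5, i)
            i = m.end() + int(m.group(1))
        else:
            symbols.append(ch)
            i += 1
    return symbols, starts, ends, mapqs
```

As Nina notes, len(symbols) should then match the depth reported in column 4.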

In your example, if you ignore "$" and the two instances of "^~", you will find that 78 characters remain in column 5, which matches the coverage depth reported in column 4. I learned about this because in my analysis we don't want gaps to contribute to the coverage depth. Here's part of an awk command that we use to adjust the coverage depth to exclude gaps:

{l=$4; if ($5 ~ />/ || $5 ~ /</ || $5 ~ /\*/) {gsub(/\^./, ""); l -= split($5, a, "<") - 1; l -= split($5, a, ">") - 1; l -= split($5, a, "*") - 1}}
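The same idea as the awk fragment, written as a small Python sketch; depth_without_gaps is a hypothetical name. It removes every "^X" pair before counting, because the mapping-quality character X can itself be "<", ">" or "*" (MAPQ 27, 29 or 9) and would otherwise be miscounted as a gap. That is presumably why the awk version runs gsub(/\^./, "") first.

```python
import re


def depth_without_gaps(depth, col5):
    """Subtract reference skips ("<", ">") and deletions ("*") from column 4."""
    # Drop read-start markers together with their mapping-quality character.
    stripped = re.sub(r"\^.", "", col5)
    gaps = stripped.count("<") + stripped.count(">") + stripped.count("*")
    return depth - gaps
```

For the question's line, the six "<"/">" symbols bring the depth from 78 down to 72.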

11.1 years ago
Drio

Columns: chromosome, coordinate, reference base, number of reads covering the position, bases seen at that position, base quality for each base.

Details here.
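A minimal Python sketch that splits one mpileup line into the six columns listed above; PileupLine and parse_pileup_line are hypothetical names. It splits on any whitespace so it works both on real tab-separated output and on the space-separated line pasted in the question, which has no quality column.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PileupLine:
    chrom: str             # chromosome
    pos: int               # 1-based coordinate
    ref: str               # reference base
    depth: int             # number of reads covering the position
    bases: str             # read bases (column 5)
    quals: Optional[str]   # base qualities (column 6), None if missing


def parse_pileup_line(line):
    """Split one mpileup line into its columns (fields contain no whitespace)."""
    fields = line.split()
    quals = fields[5] if len(fields) > 5 else None
    return PileupLine(fields[0], int(fields[1]), fields[2],
                      int(fields[3]), fields[4], quals)
```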

The ASCII value of each character in the last column, minus 33, gives you the base quality.
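The ASCII-minus-33 rule as a tiny Python sketch; decode_quals and error_probability are hypothetical helper names. The second function applies the standard Phred relationship Q = -10 log10(P) to turn a quality score into the probability that the base call is wrong.

```python
def decode_quals(col6):
    """Phred base qualities from mpileup column 6: ASCII value minus 33."""
    return [ord(ch) - 33 for ch in col6]


def error_probability(q):
    """Phred scale: Q = -10 * log10(P), so P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)
```

For example, decode_quals("I#~") gives [40, 2, 93], and a quality of 40 corresponds to an error probability of 1 in 10,000.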

Hi Drio, thanks for the response. Could you be more specific about the <, >, and ~ characters?