Question: Samtools Mpileup Output
Sam40 wrote:

Hi,

I have looked all over the web and cannot seem to find definitions of the <, >, and ~ symbols in mpileup output. For example:

chr6 31506624 T 78 >>><><,$cCccccccccCCcCCCCCCCccCcCCCcCccccCccCCCcCccccccccccCCCCcCCcCcCcCCcccC^~c^~C

I appreciate your help!

samtools sam mpileup • 11k views
modified 8.1 years ago by Nina340 • written 8.1 years ago by Sam40

Since this is still unanswered, I would suggest mailing samtools support (samtools-help@lists.sf.net) or the samtools author (Heng Li).

written 8.1 years ago by Doctoroots780
Vancouver, BC, Canada
Nina340 wrote:

The symbols "<" and ">" were added to column 5 a few releases ago. They mean that this position is "covered" by a large gap (i.e. we are inside an "N" element in the CIGAR string of this read). Note that reads that have a gap at this position still contribute to the total coverage reported in column 4.

For completeness I should also mention that "*" is very similar, but in this case it means the position is covered by a small gap (i.e. a "D" element in the CIGAR string).

Also, as described in the link Drio provided, "^" is always followed by another symbol. It indicates that we are at the start of a read. If you subtract 33 from the ASCII value of the symbol that follows "^", you get the mapping quality of the read whose first base covers this position. For "~" (ASCII 126) that gives 126 - 33 = 93.

In a similar vein, "$" means one of the reads covers this position with its last base.

In your example, if you ignore "$" and the two instances of "^~" you will find that you have 78 characters remaining in col 5, which matches the coverage depth reported in col 4.
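To make the bookkeeping above concrete, here is a minimal Python sketch that walks through a column-5 string, skips the "^X" and "$" markers, and keeps one symbol per read. The function name parse_pileup_bases is made up for illustration, and the handling of "+"/"-" indel runs (sign, length, then that many bases) is my reading of the pileup format rather than something covered in this thread.

```python
import re

def parse_pileup_bases(bases):
    """Split an mpileup column-5 string into per-read symbols and markers.

    Hypothetical helper for illustration; assumes a well-formed string.
    """
    symbols = []      # one entry per read covering the position
    start_mapqs = []  # mapping qualities decoded from "^X" markers
    read_ends = 0     # number of "$" markers seen
    i = 0
    while i < len(bases):
        ch = bases[i]
        if ch == "^":
            # "^" is followed by one character encoding mapping quality + 33
            start_mapqs.append(ord(bases[i + 1]) - 33)
            i += 2
        elif ch == "$":
            read_ends += 1
            i += 1
        elif ch in "+-":
            # Assumed indel layout: sign, decimal length, then that many bases
            length = re.match(r"\d+", bases[i + 1:]).group()
            i += 1 + len(length) + int(length)
        else:
            symbols.append(ch)
            i += 1
    return symbols, start_mapqs, read_ends

symbols, mapqs, ends = parse_pileup_bases(">><,$c^~C")
print(len(symbols), mapqs, ends)
```

The length of symbols is the number you would compare against column 4.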

I learned about this because, for the analysis I do, we don't want gaps to contribute to the coverage depth. Here's part of an awk command that we use to adjust the coverage depth to exclude gaps (l ends up holding the gap-free depth; the rest of the action is omitted):

{l=$4; if($5~/>/ || $5~/</ || $5~/\*/) {gsub(/\^./,"",$5); l-=split($5,a,"<")-1; l-=split($5,a,">")-1; l-=split($5,a,"*")-1}
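For anyone who would rather not finish the awk fragment, the same adjustment can be sketched in Python. This is only an illustration of the idea, not the command we actually run: gap_free_depth is a hypothetical name, and it assumes whitespace-separated pileup lines with depth in column 4 and read symbols in column 5.

```python
import re

def gap_free_depth(pileup_line):
    """Return the column-4 depth minus reads that show a gap (<, > or *)."""
    fields = pileup_line.split()
    depth, bases = int(fields[3]), fields[4]
    # Drop "^X" read-start markers first: the mapping-quality character X
    # can itself be "<", ">" or "*" and must not be counted as a gap.
    bases = re.sub(r"\^.", "", bases)
    gaps = sum(bases.count(symbol) for symbol in "<>*")
    return depth - gaps

print(gap_free_depth("chr1 100 A 5 >,.*^<. IIIII"))
```

Removing the "^X" markers before counting is the same reason the awk version calls gsub first.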
written 8.0 years ago by Nina340
United States
Drio910 wrote:

chromosome coordinate ref_value num_of_Reads_covering_position alleles_seen_at_that_position base_quality_per_each_base

Details here.

The ASCII value of each character in the last column, minus 33, gives you the base quality of the corresponding read base.
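As a quick illustration of the "minus 33" rule, here is a short Python sketch. The function name decode_phred33 is made up for this example, and it assumes the Phred+33 encoding described above.

```python
def decode_phred33(quality_string):
    """Convert a Phred+33 quality string into a list of integer scores."""
    return [ord(ch) - 33 for ch in quality_string]

print(decode_phred33("!I~"))  # "!" is ASCII 33, "I" is 73, "~" is 126
```

The same subtraction applies to the single character that follows "^" in column 5, except that there it encodes a mapping quality rather than a base quality.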

modified 8.1 years ago • written 8.1 years ago by Drio910

Hi Drio, thanks for the response. Could you be more specific about the <, >, and ~ characters?

written 8.1 years ago by Sam40

Edited my answer per your request. Please check the link again. The original link was incorrect. All the details are there.

written 8.1 years ago by Drio910

Quality score encoding: http://en.wikipedia.org/wiki/FASTQ_format#Encoding

written 8.0 years ago by Rm7.9k