Question: Mpileup Output And Quality Scores
NextGenSeek wrote (6.9 years ago):

I am looking at mpileup output from RNA-Seq data, specifically at a location that looks like this.

<<<<<<<<<<<<<<<<<<<<<<<<>>>>>><<<>><>><>>>><<<<<       
G>BB#GHJCCD#@#5#E;F##ICFEBIHDBDD;BB?IJGGGHJC##?EC

My understanding is that the ">" and "<" symbols mean that the current location is within an intron and that the reads span different exons in two directions. (I also could not find any reference to the ">" symbol in the Samtools documentation.)
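
For reference, here is a minimal Python sketch of how the read-bases column can be split into one symbol per read. It skips the read-start marker (`^` plus one mapping-quality character), the read-end marker (`$`) and indel annotations such as `+2AG` or `-1c`, then counts the reference-skip symbols. Treating `>` as a forward-strand skip and `<` as a reverse-strand skip is my understanding of the samtools convention, not something stated in this thread, and the function names are hypothetical.

```python
import re

def tokenize_bases(bases: str) -> list:
    """Split an mpileup read-bases column into one symbol per read."""
    symbols = []
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":
            # Read start: '^' is followed by one character encoding the read's
            # mapping quality; skip both so that character is not taken as a base.
            i += 2
        elif c == "$":
            # Read-end marker; it annotates the preceding symbol.
            i += 1
        elif c in "+-":
            # Indel annotation: '+' or '-', a length, then that many bases.
            m = re.match(r"[+-](\d+)", bases[i:])
            i += len(m.group(0)) + int(m.group(1))
        else:
            # '.', ',', ACGTN/acgtn, '*', '>' or '<': one symbol per read.
            symbols.append(c)
            i += 1
    return symbols

def count_reference_skips(bases: str) -> dict:
    """Count reference skips ('>' assumed forward strand, '<' assumed reverse)."""
    symbols = tokenize_bases(bases)
    return {
        "forward": symbols.count(">"),
        "reverse": symbols.count("<"),
        "reads": len(symbols),
    }

print(count_reference_skips("^I.,>$<+2AG.-1c,"))  # {'forward': 1, 'reverse': 1, 'reads': 6}
```

Skipping the character after `^` matters because a mapping quality can itself be encoded as `>` or `<`.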

What I do not understand is the meaning of the quality scores for these reads. I thought that no reads are mapped at these intronic locations, so there should not be any quality scores either.

Am I completely missing something? Thanks in advance for any help.

matted (Boston, United States) wrote (6.9 years ago):

The > and < symbols are reference-skip symbols and do not (directly) have any particular exon/intron interpretation. They are described in the samtools manual, in the paragraph starting "In the pileup format...", which also describes the quality score encoding. The question titled "Some help understanding with mpileup output" also discusses the mpileup format.
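
As a quick illustration of that encoding: each character in the quality column is an ASCII code with an offset of 33 (Phred+33), and a Phred score Q corresponds to an error probability of 10^(-Q/10). A minimal sketch (the function names are hypothetical), applied to the first few quality characters from the question:

```python
def decode_phred33(quals: str) -> list:
    """Convert a Phred+33 quality string into integer Phred scores."""
    return [ord(ch) - 33 for ch in quals]

def error_probability(q: int) -> float:
    """Probability that a call with Phred score q is wrong: 10^(-q/10)."""
    return 10 ** (-q / 10)

scores = decode_phred33("G>BB#")
print(scores)  # [38, 29, 33, 33, 2]
print([round(error_probability(q), 4) for q in scores])
```

Note that the quality string can itself contain `>` and `<` characters (here `>` is Q29), so they should not be confused with the reference-skip symbols in the bases column.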


Thanks for pointing me to the "reference skip" definition. I still have not fully understood the "quality score" aspect of some reference skips. Here is my question. I am looking at mpileup output from RNA-Seq data from one sample, and the pileup output looks something like this:

chr1 3203517 T 30 <<<<<<<<>>>>>><><>>>>>><><><<< IIIIIIHIE@HFGHIFIIIGHHIIDIHHGD
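
A minimal Python sketch (the class and function names are hypothetical) of how a line like this splits into its six default columns: chromosome, 1-based position, reference base, depth, read bases and qualities. It also checks that the quality string has one character per read counted in the depth column. Real mpileup output is tab-separated; the sketch splits on any whitespace by default only because the line above is space-separated.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PileupSite:
    chrom: str
    pos: int      # 1-based coordinate
    ref: str      # reference base
    depth: int    # number of reads counted at this position
    bases: str    # read-bases column ('>'/'<' mark reference skips)
    quals: str    # Phred+33 quality characters, one per read

def parse_pileup_line(line: str, sep: Optional[str] = None) -> PileupSite:
    # sep=None splits on any whitespace (fine for the example above);
    # pass sep='\t' for real mpileup files so empty columns are preserved.
    fields = line.rstrip("\n").split(sep)
    chrom, pos, ref, depth, bases, quals = fields[:6]
    return PileupSite(chrom, int(pos), ref, int(depth), bases, quals)

site = parse_pileup_line(
    "chr1 3203517 T 30 <<<<<<<<>>>>>><><>>>>>><><><<< IIIIIIHIE@HFGHIFIIIGHHIIDIHHGD"
)
print(site.pos, site.depth, len(site.quals) == site.depth)
```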

I also looked at the location in IGV and found that it is intronic. All the reads that map across the location cover the two adjacent exons. Here is a toy example of the scenario, showing three reads that span two exons:

                       |<- location of interest
               EXON1-------EXON2
  R1              AT-------TAG
  R2            ATAT-------TA
  R3              AT-------TAGA
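
In the BAM file, each of these spliced reads carries an `N` (skipped region) operation in its CIGAR string over the intron, and that is what mpileup reports as a reference skip at the location of interest. A minimal sketch that walks a CIGAR string to find which operation covers a given reference position. The coordinates (columns counted from 0 in the diagram) and the CIGAR strings for R1 to R3 are my hypothetical reading of the toy example, not real data.

```python
import re
from typing import Optional

# A CIGAR string is a run of <length><operation> pairs, e.g. "2M7N3M".
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that consume reference positions (per the SAM specification).
CONSUMES_REF = set("MDN=X")

def op_at(pos: int, start: int, cigar: str) -> Optional[str]:
    """Return the CIGAR operation covering reference position `pos` for a read
    whose alignment starts at `start`, or None if the read does not span `pos`."""
    ref = start
    for length, op in CIGAR_OP.findall(cigar):
        if op not in CONSUMES_REF:
            continue  # I, S, H and P do not advance along the reference
        n = int(length)
        if ref <= pos < ref + n:
            return op
        ref += n
    return None

# Hypothetical coordinates read off the toy diagram (columns counted from 0).
reads = {"R1": (18, "2M7N3M"), "R2": (16, "4M7N2M"), "R3": (18, "2M7N4M")}
location = 23
for name, (start, cigar) in reads.items():
    print(name, op_at(location, start, cigar))  # each prints N: a skip, no read base
```

All three reads span the location, but none of them has a read base there, only a skipped region.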

Basically, no real bases are at the location, but mpileup gives quality scores for the "bases".

Does mpileup come up with random quality scores just to keep the format of mpileup intact?

written 6.9 years ago by NextGenSeek

The qualities are mapping qualities, which are a property (measurement) of the read, not of the base. I guess you're thinking that they're base qualities. It's like the difference between the -q (minimum mapping quality) and -Q (minimum base quality) flags of samtools mpileup.
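
To tell the two kinds of quality apart in practice, samtools mpileup can be asked for both: its `-s` (`--output-MQ`) option appends a column of mapping qualities after the default base-quality column. A minimal sketch of decoding both, assuming the line has exactly the six default columns plus the optional `-s` column (other output options add further columns, which this sketch does not handle). The function names and the example line are made up for illustration.

```python
from typing import List, Optional, Tuple

def decode_phred33(quals: str) -> List[int]:
    """Phred+33 ASCII characters -> integer scores."""
    return [ord(ch) - 33 for ch in quals]

def split_qualities(line: str, sep: Optional[str] = None) -> Tuple[List[int], Optional[List[int]]]:
    """Return (base qualities, mapping qualities or None) for one mpileup line.

    Column 6 holds base qualities; a 7th column is assumed to be the mapping
    qualities added by `samtools mpileup -s`.
    """
    fields = line.rstrip("\n").split(sep)
    base_q = decode_phred33(fields[5])
    map_q = decode_phred33(fields[6]) if len(fields) > 6 else None
    return base_q, map_q

# Made-up line: two spliced reads, base-quality column "I#", MAPQ column "<]".
print(split_qualities("chr1 100 T 2 >< I# <]"))  # ([40, 2], [27, 60])
```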

written 6.9 years ago by matted

The manual describes each line as "consisting of chromosome name, coordinate, reference base, read bases, read qualities and alignment mapping qualities". So it lists read qualities as well, separately from mapping qualities. @matted, if you don't intend to help, why the hell are you answering?

written 5.8 years ago by cpcantalapiedra

Neither your answer nor the samtools manual paragraph is enough to answer the question.

written 5.8 years ago by cpcantalapiedra