Question: Parsing Base String From Samtools Mpileup
0
gravatar for Noushin N
4.3 years ago by
Noushin N390
Baltimore, MD
Noushin N390 wrote:

Hello everyone,

I am trying to construct B-allele frequency plots from NGS data; and for that I am using samtools mpileup output at positions of interest.

Unfortunately, I am not able to establish a one-to-one relationship between the bases in base string column [column 4], and the base quality string [column 5] for lines with indels.

Here is an example from samtools sourceforge page:

seq2 156 A 11  .$......+2AG.+2AG.+2AGGG    <975;:<<<<<

According to columns 4 and 6, there are 11 reads aligned to this position. But when trying to parse the base string column, I get

  1. 9 reads aligned to reference base (A)
  2. 3 reads with an insertion of AG
  3. 2 reads with base G

Can anyone please explain what is going on?

Thank you in advance!

samtools mpileup • 3.5k views
ADD COMMENTlink modified 16 months ago by kartong880 • written 4.3 years ago by Noushin N390
2

The only possible explanation I can think of is the fact that ".+2AG" accounts for an insertion of "AG" bases, as opposed to "+2AG". This way, the numbers match up.

.|.|.|.|.|.|.+2AG|.+2AG|.+2AG|G|G

ADD REPLYlink modified 4.3 years ago • written 4.3 years ago by Noushin N390
1

I think you are correct. I've always interpreted insertions as something that's between the current base and the next base. So that means there is still a match to the current base.

ADD REPLYlink written 4.3 years ago by Damian Kao14k

Thank you for the information and the pointer to your script. I have also come across the following example in one of my mpileups:

chr3    128988392       G       110
c$CCCCCC.CCcccC,Cc.,C,C...c.C.C...cccCcCcCC..cC..,...CCCCCcccC.CCC.C,CCCC,CCcCccC.cC,C,c.CccCcC,CCc.CCcCC,+1ccc+1c.c^],

Here, there is an instance of ",+1c" and one of "+1c". What is your interpretation for each of them?

Thanks again

ADD REPLYlink modified 4.3 years ago • written 4.3 years ago by Noushin N390
1

",+1c" is a match on the opposite strand and a single insertion of c between the current base and next base. "+1c" should probably be "c+1c" meaning there is a read with a mismatch c in the current position and a single insertion of c between the current base and next base.

ADD REPLYlink written 4.3 years ago by Damian Kao14k
2

I wrote a script for this a while back. I am not sure if it works 100% correctly though. You should definitely spot check it: http://blog.nextgenetics.net/?e=56

ADD REPLYlink modified 4.3 years ago • written 4.3 years ago by Damian Kao14k
0
gravatar for Noushin N
3.7 years ago by
Noushin N390
Baltimore, MD
Noushin N390 wrote:

Since no completely correct answer was posted to this question, I would like to point out that Damian's script looks pretty functional. After spot checking, it looks to me that it does not catch the ',' or '.' bases preceding an insertion/deletion. Thus, it will overestimate the number of reference bases at such sites.

I wrote a quick parser, which corrects the issue above and is accessible at: https://github.com/noushin6/NGSTools/blob/master/baseParser.py

I have spot checked it quite a bit; but feel free to do so yourself and use at your own risk.

ADD COMMENTlink modified 3.7 years ago • written 3.7 years ago by Noushin N390
0
gravatar for konstantinkul
3.3 years ago by
Russian Federation
konstantinkul50 wrote:

@ Noushin N Thanks a lot

ADD COMMENTlink modified 3.3 years ago • written 3.3 years ago by konstantinkul50
0
gravatar for kartong88
16 months ago by
kartong880
Singapore
kartong880 wrote:

The "+2AGGG" in the last part of the pileup string should be interpreted as "+2AG", "G", "G". The "+2" indicates that only the following two nucleotides are part of the insertion. The last 2 "G" characters indicate two variant "G" nucleotides and are not part of the insertion string. If you view it this way, everything matches up.

Edit: Noticed this has been addressed in the comments section by author of original post

ADD COMMENTlink modified 16 months ago • written 16 months ago by kartong880
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 645 users visited in the last hour