Question: Samtools Mpileup And Overlapping Paired-End Reads
2
gravatar for alpha2zee
5.5 years ago by
alpha2zee100
alpha2zee100 wrote:

I am using samtools mpileup to generate a pileup for paired-end RNA sequencing data. I am curious about how samtools handles pair mates whose read mappings overlap. With regard to the simple example below, for positions 7-11, are both pair mates enumerated?

position     1    6    11   16
reference    ATGCATGCATGCATGC
pair mate 1  ATGCATGCATG
pair mate 2'       GCATGCATGC    (reverse complemented)

I am parsing the pileup output to quantify RNA editing at some positions and do not want to count an 'RNA molecule' twice just because the read pair mates 'overlapped.'

Thanks.

paired-end pileup rna-seq • 5.5k views
ADD COMMENTlink modified 18 months ago by Owen S.340 • written 5.5 years ago by alpha2zee100
1

If I have to guess then I think that the overlapping portion or bases will be counted twice.

ADD REPLYlink written 5.5 years ago by Ashutosh Pandey11k

You might drop your comment as an answer, as I think you are correct.

ADD REPLYlink written 5.5 years ago by Sean Davis25k

cross posted on seqanswer: http://seqanswers.com/forums/showthread.php?t=36151

ADD REPLYlink written 5.5 years ago by Pierre Lindenbaum120k

Sorry, is cross-posting a no-no? I am trying to reach more people and plan to post here a link to any conclusive answer I get elsewhere.

ADD REPLYlink written 5.5 years ago by alpha2zee100

see "Avoid cross-posting" in How to ask Good Questions on Technical and Scientific Forums

ADD REPLYlink written 5.5 years ago by Pierre Lindenbaum120k
3
gravatar for Owen S.
18 months ago by
Owen S.340
Oakland CA
Owen S.340 wrote:

NOTE! This answer from Ashutosh Pandey is now misleading. At least for anyone using samtools >= 1.0 (which was released ~6 months after the answer was posted).

Since 1.0 (released Aug 15, 2014), we have the (somewhat confusingly named) --ignore-overlaps option:

-x, --ignore-overlaps  Disable read-pair overlap detection.

With this option, samtools mpileup 1.x behaves like it did in 0.19, but the default behavior is to NOT DOUBLE COUNT overlapping paired end reads.** It has been this way for the past 3.5 years.

You can try it yourself:

$ samtools
Program: samtools (Tools for alignments in the SAM format)
Version: 1.0 (using htslib 1.0)

$ cat test1.sam
@HD     VN:1.3  SO:coordinate
@SQ     SN:chr1 LN:248956422
seqid   163     chr1    39415   0       9M      =       39418   12      ATGTGTTTT       AAEE6EEEE
seqid   83      chr1    39418   0       9M      =       39415   -12     TGTTTTGCA       E<EAEEA<<

$ samtools mpileup test1.sam
[mpileup] 1 samples in 1 input files
<mpileup> Set max per-file depth to 8000
chr1    39415   N       1       ^!A     A
chr1    39416   N       1       T       A
chr1    39417   N       1       G       E
chr1    39418   N       1       T       i
chr1    39419   N       1       G       Q
chr1    39420   N       1       T       i
chr1    39421   N       1       T       e
chr1    39422   N       1       T       i
chr1    39423   N       1       T$      i
chr1    39424   N       1       g       A
chr1    39425   N       1       c       <
chr1    39426   N       1       a$      <

$ samtools mpileup -x test2.bam
[mpileup] 1 samples in 1 input files
<mpileup> Set max per-file depth to 8000
chr1    39415   N       1       ^!A     A
chr1    39416   N       1       T       A
chr1    39417   N       1       G       E
chr1    39418   N       2       T^!t    EE
chr1    39419   N       2       Gg      6<
chr1    39420   N       2       Tt      EE
chr1    39421   N       2       Tt      EA
chr1    39422   N       2       Tt      EE
chr1    39423   N       2       T$t     EE
chr1    39424   N       1       g       A
chr1    39425   N       1       c       <
chr1    39426   N       1       a$      <

Note that for this to work right, the paired read names must match (which they do for Illumina data).

ADD COMMENTlink written 18 months ago by Owen S.340
1
gravatar for Ashutosh Pandey
5.5 years ago by
Philadelphia
Ashutosh Pandey11k wrote:

The overlapping bases in the paired end reads will be counted for both reads. In the above example, they will be counted twice.

ADD COMMENTlink written 5.5 years ago by Ashutosh Pandey11k
1

Thanks. I had a similar response to my cross-post, at http://seqanswers.com/forums/showpost.php?p=123101&postcount=4. The poster suggested clipping overlapping mates with, among other tools, bamUtil/clipOverlap.

ADD REPLYlink written 5.5 years ago by alpha2zee100
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1161 users visited in the last hour