Question: SAM format paired-end alignment handling
darxsys (Croatia) wrote 4.0 years ago:

This is probably a beginner question, but I am not sure how to go about this. I am using bowtie2 to align RNA-Seq Illumina paired-end yeast reads to transcripts reconstructed from those reads using Trinity. 

Bowtie2 produces alignments in the SAM format. I've read the SAM specification in detail, but I still don't know how to read the alignments of these paired-end reads so that mates belonging to the same alignment are grouped together. For example, the SAM file could contain the record for the first mate, then records for other reads, and only then the record for the second mate. How do I know that this second mate record belongs to the same alignment as the first mate record? This matters because some reads map well to multiple positions, and if the records for corresponding mates are not written one directly after the other (according to the SAM paper, they don't have to be), I have no way to distinguish between the different alignments of the same read and its mates. I hope this is clear enough.

rna-seq sam-format paired-end • 3.2k views
Devon Ryan (Freiburg, Germany) wrote 4.0 years ago:

Have a look at the 7th and 8th columns. The 7th column (RNEXT) gives the reference contig of the mate. The 8th column (PNEXT) gives the mapping position of the mate. Those two combined should allow you to properly match mates. I should note that it's possible to have multiple mappings for only one of the reads in a pair, in which case only one of them will have multiple entries (and their PNEXT and RNEXT values will all be the same).

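The matching strategy described in the answer can be sketched in a few lines of Python. This is only an illustration under some assumptions, not a tested tool: it assumes plain-text SAM input, groups records by QNAME (column 1, which both mates of a pair share), uses the FLAG bits 0x40 and 0x80 to tell first and second mates apart, and treats `=` in RNEXT as "same reference as RNAME". The function names (`parse_record`, `points_to`, `pair_mates`) are made up for this example. Two records are paired if either one's RNEXT/PNEXT points at the other's RNAME/POS, which also covers the case where only one mate has multiple entries.

```python
from collections import defaultdict

# FLAG bits from the SAM specification.
UNMAPPED = 0x4
FIRST_IN_PAIR = 0x40
SECOND_IN_PAIR = 0x80


def parse_record(line):
    """Pull out the SAM fields needed for mate matching."""
    fields = line.rstrip("\n").split("\t")
    rname = fields[2]
    # In RNEXT, "=" is shorthand for "same reference as RNAME".
    rnext = rname if fields[6] == "=" else fields[6]
    return {
        "qname": fields[0],
        "flag": int(fields[1]),
        "rname": rname,
        "pos": int(fields[3]),
        "rnext": rnext,
        "pnext": int(fields[7]),
    }


def points_to(a, b):
    """True if record a's RNEXT/PNEXT give record b's RNAME/POS."""
    return (a["rnext"], a["pnext"]) == (b["rname"], b["pos"])


def pair_mates(lines):
    """Return (first_mate, second_mate) record pairs, in any file order."""
    by_name = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        rec = parse_record(line)
        if rec["flag"] & UNMAPPED:
            continue  # an unmapped record has no alignment to pair
        by_name[rec["qname"]].append(rec)

    pairs = []
    for records in by_name.values():
        firsts = [r for r in records if r["flag"] & FIRST_IN_PAIR]
        seconds = [r for r in records if r["flag"] & SECOND_IN_PAIR]
        for a in firsts:
            for b in seconds:
                # Accept the pair if either record points at the other.
                if points_to(a, b) or points_to(b, a):
                    pairs.append((a, b))
    return pairs
```

Calling `pair_mates(open("alignments.sam"))` (the file name is a placeholder) would return the paired records regardless of how far apart the mates are in the file. For large files, sorting by read name first would avoid holding everything in memory.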

Thank you. What do you mean when you say that only one of the reads can have multiple mappings? What does that look like?

By multiple mappings, I meant a case where the paired-end read (both mates, concordantly) maps equally well, or similarly well, to different positions on the same transcript. Is that not possible? If not, how can only one mate have multiple alignments, and what happens to the other one?

written 4.0 years ago by darxsys

I didn't say that only one mate in a pair can have multiple mappings. Rather, I said that such cases are possible. Typically this will happen when you have one read mapping to a unique sequence and the other to a repetitive element. The case you mentioned will also happen and you'll have to use the strategy I outlined in my answer to pair things properly.

written 4.0 years ago by Devon Ryan
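To see which pairs fall into the "only one mate has multiple mappings" case discussed in this thread, one option is to count the mapped records per mate for each read name. The sketch below is only an illustration: the function names are invented for this example, and it assumes that every mapped record for a read carries either the 0x40 (first mate) or 0x80 (second mate) FLAG bit.

```python
from collections import defaultdict


def count_mate_alignments(lines):
    """Map QNAME -> [first-mate record count, second-mate record count]."""
    counts = defaultdict(lambda: [0, 0])
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        qname, flag = fields[0], int(fields[1])
        if flag & 0x4:
            continue  # unmapped records are not alignments
        if flag & 0x40:
            counts[qname][0] += 1
        elif flag & 0x80:
            counts[qname][1] += 1
    return dict(counts)


def one_sided_multimappers(lines):
    """QNAMEs where one mate has a single record and the other has several."""
    return sorted(
        qname
        for qname, (n_first, n_second) in count_mate_alignments(lines).items()
        if min(n_first, n_second) == 1 and max(n_first, n_second) > 1
    )
```

Read names returned by `one_sided_multimappers` are the ones where a single record for one mate has to be matched against several records for the other, so their RNEXT/PNEXT values deserve a closer look.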