MatePosition & InsertSize in Bamtools
9.2 years ago
vimartin ▴ 60

Hi,

I'm currently using BamTools to parse a (paired-end) RNA-seq BAM file to detect possible chimeras.

I'm focusing on read pairs where the two mates map to different genes, but I'm having a hard time understanding some of the BamTools::BamAlignment members.

For example, it is easy to get the starting position of the mate read using "al.MateRefID" for the chromosome and "al.MatePosition" for the coordinate, but I still lack its end position. For standard reads (both mates belonging to the same gene) I can get the mate end position using "al.Position + al.InsertSize", but is that still true for chimeric reads? If not, how is InsertSize computed, and how can I get this mate end position?

Many thanks in advance.

Chimera Bamtools RNA-Seq • 2.8k views
9.2 years ago

The insert size is specified by the aligner, and for RNA-seq, where the paired ends don't overlap or where they span a splicing event or another type of junction (e.g., fusion boundaries), it may be unreliable. The end position of the mate isn't stored in the BAM format, so you need to find the mate and then get its end. I'm not familiar with the BamTools API, so I can't say whether it provides a convenient method for this, though I wouldn't be surprised if it doesn't.
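To make "find the mate and then get its end" concrete, here is a minimal sketch that does not depend on BamTools. The `CigarOp` and `Alignment` structs and all function names are hypothetical stand-ins introduced for illustration; they only mirror the kind of fields an alignment record carries (name, position, CIGAR). The end is computed by adding the reference-consuming CIGAR operations to the 0-based start, and mates are matched by read name.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for one CIGAR operation (type plus length).
struct CigarOp {
    char type;        // one of M, I, D, N, S, H, P, =, X
    uint32_t length;  // number of bases covered by the operation
};

// Hypothetical minimal alignment record; only the fields used here.
struct Alignment {
    std::string name;            // read name, shared by both mates
    bool isFirstMate;            // true for mate 1, false for mate 2
    int32_t refId;               // reference (chromosome) index
    int32_t position;            // 0-based leftmost reference coordinate
    std::vector<CigarOp> cigar;  // CIGAR operations in order
};

// Reference bases spanned: M, D, N, = and X advance along the reference.
int32_t referenceSpan(const std::vector<CigarOp>& cigar) {
    int32_t span = 0;
    for (const CigarOp& op : cigar) {
        switch (op.type) {
            case 'M': case 'D': case 'N': case '=': case 'X':
                span += static_cast<int32_t>(op.length);
                break;
            default:  // I, S, H, P do not advance along the reference
                break;
        }
    }
    return span;
}

// Half-open end coordinate (0-based, exclusive) of an alignment.
int32_t endPosition(const Alignment& al) {
    return al.position + referenceSpan(al.cigar);
}

using EndIndex = std::unordered_map<std::string, int32_t>;

// Key that distinguishes the two mates of one read pair.
std::string mateKey(const std::string& name, bool isFirstMate) {
    return name + (isFirstMate ? "/1" : "/2");
}

// First pass: remember the end position of every alignment.
EndIndex buildEndIndex(const std::vector<Alignment>& alignments) {
    EndIndex index;
    for (const Alignment& al : alignments) {
        index[mateKey(al.name, al.isFirstMate)] = endPosition(al);
    }
    return index;
}

// Second pass: look up the end of the mate of `al`; false if it was not seen.
bool findMateEnd(const EndIndex& index, const Alignment& al, int32_t& mateEnd) {
    auto it = index.find(mateKey(al.name, !al.isFirstMate));
    if (it == index.end()) return false;
    mateEnd = it->second;
    return true;
}
```

This assumes one alignment per mate; secondary or supplementary alignments would overwrite each other in the index and need to be filtered out first. Keeping every read in memory is also only practical for modest files, so restricting the index to candidate chimeric pairs is probably wiser.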


Thanks for your answer. To be clear, when you say that the insert size might be unreliable if the reads span a (normal) splicing event, do you mean that it is unreliable for determining the actual insert size of the RNA fragment that was read, or that we can't even use it to get the genomic end position of the mate?


For chimeric alignments you're not even guaranteed to have an insert size (e.g., if the genes are on different chromosomes). Even if they're on the same chromosome, you're left with the question of whether the insert size is relative to the genomic coordinates output in the BAM file or the transcript coordinates that might have been used during the mapping. In practice, it's probably relative to the genomic coordinates, so adding the insert size should give you the end position of the mate. However, there's no guarantee in the BAM spec (currently) of this.

Assuming you're only using one aligner, just spot-check a few alignments to confirm that the insert size represents the length of the genomic template, rather than the expected length of the transcriptome template.
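The spot check described above could be sketched as follows. This assumes one common reading of the spec: the template length is the distance from the leftmost mapped base to the rightmost mapped base in genomic coordinates, positive for the leftmost mate, negative for the rightmost, and 0 when the mates are on different references. `MateSpan` and both function names are hypothetical, and an individual aligner may well follow a different convention.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical summary of one mapped mate: reference index plus a half-open
// [start, end) interval in 0-based genomic coordinates.
struct MateSpan {
    int32_t refId;
    int32_t start;
    int32_t end;
};

// Expected signed template length for `self`, given its mate, under the
// assumed convention (leftmost base to rightmost base, genomic coordinates).
int32_t expectedTemplateLength(const MateSpan& self, const MateSpan& mate) {
    if (self.refId != mate.refId) return 0;  // assumed: undefined across references
    const int32_t leftmost = std::min(self.start, mate.start);
    const int32_t rightmost = std::max(self.end, mate.end);
    const int32_t length = rightmost - leftmost;
    // Simplification: when both mates start at the same base, `self` is
    // treated as the leftmost one, so both mates would get a positive sign.
    return (self.start <= mate.start) ? length : -length;
}

// True if the aligner-reported insert size matches the genomic expectation.
bool insertSizeLooksGenomic(int32_t reportedInsertSize,
                            const MateSpan& self, const MateSpan& mate) {
    return reportedInsertSize == expectedTemplateLength(self, mate);
}
```

Under this convention, start plus insert size equals the mate's (exclusive) end only when called on the leftmost mate and when the mate reaches furthest to the right; for the rightmost mate the value is negative. A mismatch on spliced pairs would suggest the aligner reports something other than the genomic span.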


OK, I thought (and at least it seems true for the file I'm parsing) that it was always in terms of genomic coordinates. Is there a complete specification of the BAM/SAM format that is more precise than this one? I cannot find the definition of the BAM InsertSize in it.

Thanks again.


It should be relative to genomic coordinates, but some aligners don't adhere to the spec that well. The PDF file you linked to is the SAM/BAM spec, so there's nothing more precise out there. It's actually being revised at the moment to clear up some of the many ambiguities that have arisen over the years. In the spec, the insert size is referred to as the "template length" (TLEN).


Thanks a lot for all your quick and precise answers :-).
