Question: MatePosition & InsertSize in Bamtools
vimartin wrote:

Hi,

I'm currently using BamTools to parse a (paired-end) RNA-seq BAM file to detect possible chimeras.

I'm focusing on read pairs where the two mates map to different genes, but I have a hard time understanding some of the BamTools::BamAlignment members.

For example, it is easy to get the start position of the mate using "al.MateRefID" for the chromosome and "al.MatePosition" for the coordinate, but I still lack its end position. For standard reads (both mates in the same gene) I can get the mate's end position using "al.Position + al.InsertSize", but does this still hold for chimeric reads? If not, how is InsertSize computed, and how can I get the mate's end position?

Many thanks in advance.

 

rna-seq bamtools chimera • 1.3k views
modified 4.1 years ago by Devon Ryan • written 4.1 years ago by vimartin
Devon Ryan wrote:

The insert size is specified by the aligner, and for RNA-seq, where the paired ends may not overlap or may span a splicing event or another type of junction (e.g., fusion boundaries), it may be unreliable. The end position of the mate isn't stored in the BAM format, so you need to find the mate and then get its end. I'm not familiar with the BamTools API, so I can't say whether it provides a convenient method for this, though I wouldn't be surprised if it doesn't.
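Once the mate record has been found, its end follows from its own start position and CIGAR string. The block below is a minimal, self-contained sketch of that calculation, not BamTools code: the `CigarOp` struct and the `alignmentEnd` function are hypothetical stand-ins written for illustration, and coordinates are assumed to be 0-based with a half-open end. BamTools itself appears to offer `BamAlignment::GetEndPosition()` for this purpose, but check the header of your installed version before relying on it.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one CIGAR operation (type character plus length).
struct CigarOp {
    char type;
    std::uint32_t length;
};

// Returns the 0-based, half-open end coordinate of an alignment on the reference.
// Only operations that consume reference bases advance the coordinate.
std::int32_t alignmentEnd(std::int32_t position, const std::vector<CigarOp>& cigar) {
    assert(position >= 0);  // an unmapped read has no meaningful end position
    std::int32_t end = position;
    for (const CigarOp& op : cigar) {
        switch (op.type) {
            case 'M': case 'D': case 'N': case '=': case 'X':
                end += static_cast<std::int32_t>(op.length);
                break;
            default:  // I, S, H and P do not consume reference bases
                break;
        }
    }
    return end;
}
```

Note that a spliced read (an `N` operation) ends far downstream of its start plus the read length, which is why the end has to come from the mate's own CIGAR rather than from the read length alone.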

written 4.1 years ago by Devon Ryan

Thanks for your answer. To be clear, when you say that the insert size might be unreliable if the reads span a (normal) splicing event, do you mean that it is unreliable for determining the actual insert size of the RNA fragment that was read, or that we still can't use it to get the genomic end position of the mate?

   

written 4.1 years ago by vimartin

For chimeric alignments you're not even guaranteed to have an insert size (e.g., if the genes are on different chromosomes). Even if they're on the same chromosome, you're left with the question of whether the insert size is relative to the genomic coordinates output in the BAM file or the transcript coordinates that might have been used during the mapping. In practice, it's probably relative to the genomic coordinates, so adding the insert size should give you the end position of the mate. However, there's no guarantee in the BAM spec (currently) of this.

Assuming you're only using one aligner, just spot check a few alignments to confirm that the insert size represents the length of the genomic template, rather than the expected length of the transcriptome template.
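The spot check can be sketched roughly as follows. This is an illustration under stated assumptions rather than BamTools code: `MateSpan`, `genomicTemplateSpan` and `insertSizeIsGenomic` are hypothetical names, coordinates are 0-based with half-open ends, and the genomic template is taken to run from the leftmost mapped base of the pair to the rightmost. Only the absolute value of the stored insert size is compared, since its sign depends on which mate is leftmost.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical minimal view of one mate: reference ID, 0-based start,
// and 0-based half-open end (for example, computed from its CIGAR).
struct MateSpan {
    std::int32_t refId;
    std::int32_t start;
    std::int32_t end;
};

// Genomic distance from the leftmost to the rightmost mapped base of the pair.
// Returns 0 when the mates lie on different references (no insert size defined).
std::int32_t genomicTemplateSpan(const MateSpan& a, const MateSpan& b) {
    if (a.refId != b.refId) return 0;
    assert(a.start < a.end && b.start < b.end);  // both mates must be mapped
    return std::max(a.end, b.end) - std::min(a.start, b.start);
}

// True when the stored insert size matches the genomic span of the pair.
bool insertSizeIsGenomic(const MateSpan& a, const MateSpan& b,
                         std::int32_t storedInsertSize) {
    return std::abs(storedInsertSize) == genomicTemplateSpan(a, b);
}
```

If the check passes for a handful of spliced pairs, then for the leftmost mate of a pair on one chromosome, its start plus the (positive) insert size should land on the end of its mate. For pairs on different chromosomes the expected value is 0, so the mate's end has to come from the mate record itself.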

written 4.1 years ago by Devon Ryan

OK, I thought (at least it seems true for the file I'm parsing) that it was always in terms of genomic coordinates. Is there a complete specification of the BAM/SAM format that is more precise than this one, http://samtools.github.io/hts-specs/SAMv1.pdf, in which I cannot find the definition of the BAM format's insert size?

Thanks again.

modified 4.1 years ago • written 4.1 years ago by vimartin

It should be relative to genomic coordinates, but some aligners don't adhere to the spec. that well. The pdf file you linked to is the SAM/BAM spec, so there's nothing more precise out there. It's actually getting revised at the moment to clear up some of the many ambiguities that have arisen over the years. In the spec., the insert size is referred to as the "template length" (TLEN).

written 4.1 years ago by Devon Ryan

Thanks a lot for all your quick and precise answers :-).

written 4.1 years ago by vimartin