Question: Splice-aware insert size for RNA-seq
3.7 years ago by
United States
shuelga wrote:

Usually for DNA we map PE data to the genome and then use Picard's CollectInsertSizeMetrics to get an insert size distribution. For RNA PE data we map to the genome with STAR and then want to look at insert size. I don't think CollectInsertSizeMetrics will work here, since it will treat the entire intron of a spliced read as part of the insert, artificially inflating the insert size. Is there a splice-aware tool that will calculate the actual insert size after mapping to the genome? I realize I could also map to the transcriptome, where CollectInsertSizeMetrics should work, but I'm wondering if there is an alternative to doing both mappings.


Tags: rna-seq, alignment, genome

You can filter spliced reads (those with an N operation in the CIGAR string) out of your BAM file and use the remaining non-spliced reads to calculate the insert size. See the post "Samtools Filter Reads Cigar Field".

Comment written 3.7 years ago by Ashutosh Pandey
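
The filtering idea in the comment above could be sketched roughly as follows. This is only an illustration, not the method from the linked post: it assumes SAM text lines as input (for example from `samtools view -h`), and the function names `is_spliced` and `unspliced_insert_sizes` are made up for this example. Note that a pair with no N in either mate can still have an intron in the unsequenced gap between the mates, so some inflated values may remain.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def is_spliced(cigar):
    """Return True if the CIGAR contains an N (skipped region, i.e. an intron)."""
    return any(op == "N" for _, op in CIGAR_OP.findall(cigar))


def unspliced_insert_sizes(sam_lines):
    """Collect TLEN for properly paired fragments in which neither mate is spliced.

    Hypothetical helper: expects an iterable of SAM text lines.
    """
    spliced_names = set()
    tlen_by_name = {}
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        cigar, tlen = fields[5], int(fields[8])
        # Skip secondary (0x100) and supplementary (0x800) alignments,
        # and anything that is not flagged as properly paired (0x2).
        if flag & 0x900 or not flag & 0x2:
            continue
        if is_spliced(cigar):
            spliced_names.add(name)
        elif tlen > 0:
            # Record each fragment once, from the leftmost mate (positive TLEN).
            tlen_by_name[name] = tlen
    # Drop fragments where either mate turned out to be spliced.
    return [t for n, t in tlen_by_name.items() if n not in spliced_names]
```

The resulting list can then be summarized however you like (median, histogram, and so on).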
3.7 years ago by
Walnut Creek, USA
Brian Bushnell wrote:

BBMap will correctly calculate the insert size of spliced reads (and output the sizes as a histogram with the "ihist=file" flag). However, the result is only correct for pairs in which the splice site is seen within a read; if the intron lies in the unsequenced middle of the fragment, it cannot be detected and the insert size will be overestimated.
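
To illustrate the general idea of a splice-aware insert size (this is a minimal sketch under stated assumptions, not BBMap's actual implementation): take the genomic span covered by the pair and subtract the introns reported as N operations in either mate's CIGAR. The function names are hypothetical, positions are 1-based SAM POS values, and soft clips are ignored. As noted above, an intron in the unsequenced gap is invisible to this calculation.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def end_and_introns(pos, cigar):
    """Return (reference end, intron intervals) for one alignment.

    Intervals are half-open [start, end) in reference coordinates.
    """
    ref = pos
    introns = []
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            introns.append((ref, ref + length))
            ref += length
        elif op in "MD=X":  # operations that consume the reference
            ref += length
    return ref, introns


def splice_aware_insert_size(pos1, cigar1, pos2, cigar2):
    """Genomic span of the pair minus the introns seen within either read."""
    end1, introns1 = end_and_introns(pos1, cigar1)
    end2, introns2 = end_and_introns(pos2, cigar2)
    span = max(end1, end2) - min(pos1, pos2)

    # Merge overlapping introns so a junction covered by both mates
    # is subtracted only once.
    intron_total = 0
    cur_start = cur_end = None
    for start, end in sorted(introns1 + introns2):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                intron_total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        intron_total += cur_end - cur_start
    return span - intron_total
```

For an unspliced pair this reduces to the ordinary genomic span, so the same function can be applied to every properly paired fragment.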

You can also generate an alignment-free insert-size histogram with BBMerge, if the inserts are short enough so that the reads overlap.  Again, this uses the "ihist" flag.

Both are part of BBTools.

3.7 years ago by
geek_y wrote:

Aligning to the transcriptome would be the best way to calculate the fragment length.

But RNA-seq protocols, in general, have no gel-cutting or size-selection steps. Hence, for DNA-seq (PE or MP) the insert sizes follow an approximately normal distribution, whereas in RNA-seq the distribution is skewed.
