Question

Splice-aware insert size for RNA-seq

9.4 years ago

shuelga

Usually for DNA we map PE data to the genome and then use Picard's CollectInsertSizeMetrics to get an insert size distribution. For RNA PE data we map to the genome with STAR and then want to look at the insert size. I don't think CollectInsertSizeMetrics will work here: for a spliced pair it will count the entire intron as part of the insert, artificially inflating the insert size. Is there a splice-aware tool that calculates the actual insert size after mapping to the genome? I realize I could also map to the transcriptome, where CollectInsertSizeMetrics should work, but I'm wondering if there is an alternative to doing both mappings.
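To make the problem concrete, here is a minimal Python sketch of a genome-based insert size, computed roughly the way the SAM TLEN field is defined (leftmost to rightmost mapped base of the pair). The function names and the example pair are hypothetical, and this is not Picard's code; it only shows that an N (skipped region, i.e. intron) operation in the CIGAR string ends up counted as part of the insert.

```python
import re

# CIGAR operations that consume reference bases (M, D, N, =, X).
REF_CONSUMING = set("MDN=X")

def reference_length(cigar: str) -> int:
    """Reference bases spanned by one alignment, introns (N) included."""
    return sum(int(n) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
               if op in REF_CONSUMING)

def genomic_insert_size(pos1, cigar1, pos2, cigar2):
    """Outer distance of a pair on the genome (1-based positions, same chromosome).

    Leftmost start to rightmost end: any intron inside the fragment is counted.
    """
    start = min(pos1, pos2)
    end = max(pos1 + reference_length(cigar1) - 1,
              pos2 + reference_length(cigar2) - 1)
    return end - start + 1

# Hypothetical pair: mate 1 is spliced across a 1000 bp intron.
print(genomic_insert_size(100, "30M1000N20M", 1200, "50M"))  # → 1150
```

Only 150 of those 1150 bases (50 + 50 read bases plus a 50 bp gap) belong to the transcript; the other 1000 are the intron.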

alignment genome RNA-Seq

You can filter out spliced reads (those with an N operation in the CIGAR string) from your BAM file and use the unspliced reads to calculate the insert size. See the post: Samtools Filter Reads Cigar Field
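A minimal sketch of that filter in Python, assuming plain SAM text lines (for example the output of `samtools view`) and STAR-style TLEN values. The function name and the toy records are hypothetical. It keeps primary, properly paired alignments, drops any pair where either mate's CIGAR contains N, and counts |TLEN| once per pair.

```python
from collections import Counter, defaultdict

def unspliced_insert_histogram(sam_lines):
    """Histogram of |TLEN| for pairs where neither mate contains an N (intron) op."""
    mates = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        f = line.rstrip("\n").split("\t")
        qname, flag, cigar, tlen = f[0], int(f[1]), f[5], int(f[8])
        # Skip unmapped (0x4), secondary (0x100), supplementary (0x800),
        # and anything not flagged as properly paired (0x2).
        if flag & 0x4 or flag & 0x100 or flag & 0x800 or not flag & 0x2:
            continue
        mates[qname].append((cigar, tlen))
    hist = Counter()
    for records in mates.values():
        if len(records) != 2:
            continue
        if any("N" in cigar for cigar, _ in records):
            continue  # at least one mate is spliced: drop the whole pair
        hist[abs(records[0][1])] += 1
    return hist

# Toy records (SEQ/QUAL left as "*"); r2 is spliced and gets dropped.
sam_lines = [
    "@HD\tVN:1.6",
    "r1\t99\tchr1\t100\t255\t50M\t=\t200\t150\t*\t*",
    "r1\t147\tchr1\t200\t255\t50M\t=\t100\t-150\t*\t*",
    "r2\t99\tchr1\t100\t255\t30M1000N20M\t=\t1200\t1150\t*\t*",
    "r2\t147\tchr1\t1200\t255\t50M\t=\t100\t-1150\t*\t*",
]
print(unspliced_insert_histogram(sam_lines))  # → Counter({150: 1})
```

Two caveats: pairs whose intron falls between the mates contain no N and still pass with an inflated TLEN, and longer fragments are more likely to cross a junction, so the unspliced subset may under-represent them.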

written 9.4 years ago by Ashutosh Pandey

Ram · Answer 1 · 2014-12-05

BBMap will correctly calculate the insert size of spliced reads (and output it as a histogram with the "ihist=file" flag). However, this is only correct when the splice site is seen within a read, not when the intron lies in the unsequenced middle of the fragment, between the two reads.
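The idea can be sketched in Python: take the genomic outer distance and subtract the introns observed in either read's CIGAR, counting an intron shared by overlapping mates only once. This is an illustration under the assumption of a proper pair on one chromosome, not BBMap's actual implementation, and all names are hypothetical. The last call shows the limitation described above: an intron between the mates is invisible.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = "MDN=X"  # ops that advance along the reference

def intron_intervals(pos, cigar):
    """Half-open reference intervals [start, end) skipped by N ops."""
    intervals, ref = [], pos
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            intervals.append((ref, ref + length))
        if op in REF_CONSUMING:
            ref += length
    return intervals

def ref_end(pos, cigar):
    """Last reference base covered by the alignment (1-based, inclusive)."""
    return pos + sum(int(n) for n, op in CIGAR_OP.findall(cigar)
                     if op in REF_CONSUMING) - 1

def merged_length(intervals):
    """Total length of the union of half-open intervals."""
    total, cur_start, cur_end = 0, None, None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total

def splice_corrected_insert(pos1, cigar1, pos2, cigar2):
    genomic = max(ref_end(pos1, cigar1), ref_end(pos2, cigar2)) - min(pos1, pos2) + 1
    # Subtract introns seen inside either read; a shared intron counts once.
    seen = intron_intervals(pos1, cigar1) + intron_intervals(pos2, cigar2)
    return genomic - merged_length(seen)

print(splice_corrected_insert(100, "30M1000N20M", 1200, "50M"))    # → 150
print(splice_corrected_insert(100, "30M1000N20M", 110, "20M1000N30M"))  # → 60
print(splice_corrected_insert(100, "50M", 1200, "50M"))  # → 1150, intron (if any) not visible
```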

You can also generate an alignment-free insert-size histogram with BBMerge, provided the inserts are short enough that the reads overlap. Again, this uses the "ihist" flag.
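The principle behind overlap-based insert sizes: if read 1 and the reverse complement of read 2 overlap by `ov` bases, the insert is `len(r1) + len(r2) - ov`. Below is a deliberately naive Python sketch assuming exact matches, a hypothetical `min_overlap` threshold, and inserts at least as long as each read. Real mergers such as BBMerge tolerate sequencing errors, use quality scores, and handle adapter read-through, none of which this does.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]

def overlap_insert_size(r1, r2, min_overlap=10):
    """Insert size from an exact suffix(r1) == prefix(revcomp(r2)) overlap, or None."""
    r2rc = revcomp(r2)
    # Try the longest possible overlap first.
    for ov in range(min(len(r1), len(r2rc)), min_overlap - 1, -1):
        if r1[-ov:] == r2rc[:ov]:
            return len(r1) + len(r2rc) - ov
    return None

# Toy 30 bp fragment read from both ends with 20 bp reads.
fragment = "AACCGGTTACGATCGTAGCTAGGCATCGAT"
r1 = fragment[:20]
r2 = revcomp(fragment[10:])
print(overlap_insert_size(r1, r2))  # → 30
```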

Both are part of BBTools.

score 0 · Answer 2 · 2014-12-05

Aligning to the transcriptome would be the best way to calculate the fragment length.
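Why the transcriptome works: distances measured in transcript coordinates exclude introns, including an intron that lies entirely between the mates. A small Python sketch, assuming a hypothetical plus-strand transcript whose exons are given as sorted, 1-based inclusive genomic intervals; the names and coordinates are illustrative only.

```python
def genome_to_transcript(pos, exons):
    """Map a 1-based genomic position to a 1-based transcript position.

    exons: sorted (start, end) genomic intervals, 1-based inclusive, plus strand.
    Returns None if the position is not exonic.
    """
    offset = 0
    for start, end in exons:
        if start <= pos <= end:
            return offset + (pos - start) + 1
        offset += end - start + 1
    return None

def transcript_insert_size(frag_start, frag_end, exons):
    """Insert size between the fragment's outer genomic ends, in transcript space."""
    s = genome_to_transcript(frag_start, exons)
    e = genome_to_transcript(frag_end, exons)
    if s is None or e is None:
        return None
    return e - s + 1

# Two hypothetical exons; mates 100..149 and 1200..1249 are both unspliced,
# so the 1000 bp intron sits in the unsequenced gap.
exons = [(100, 149), (1150, 1249)]
print(transcript_insert_size(100, 1249, exons))  # → 150 (genomic span is 1150)
```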

However, RNA-seq protocols generally have no gel-cutting or size-selection step. For DNA-seq (PE or MP), the insert sizes therefore normally form a roughly normal distribution, whereas in RNA-seq the distribution is skewed.
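One simple way to check how skewed a given insert-size distribution is: compare the mean with the median and compute the moment skewness (positive means a long right tail). A minimal Python sketch; the function name is hypothetical and the numbers are toy values, not real data.

```python
import statistics

def summarize_insert_sizes(sizes):
    """Return (mean, median, population moment skewness) of insert sizes."""
    mean = statistics.fmean(sizes)
    median = statistics.median(sizes)
    sd = statistics.pstdev(sizes)
    skew = sum((x - mean) ** 3 for x in sizes) / (len(sizes) * sd ** 3)
    return mean, median, skew

print(summarize_insert_sizes([150, 200, 250]))            # symmetric: skew 0
print(summarize_insert_sizes([100, 110, 120, 130, 400]))  # long right tail: skew > 0
```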