Question: Can anyone explain the weird Insert Size HIstogram
gravatar for SMILE
19 months ago by
SMILE90 wrote:

I'm aligning some mouse RNA-Seq data (paired, 2x101) against the genome using STAR. After that, I analysed the insert size with Picard. Here are the two kinds of Insert Size Histograms. Which the first one seems so weird, can anyone explain to me what reasons can lead to the first Histogram?

(The data with the first Histogram has low alignment rate, after some changes of parameters , the alignment rate is better but it has high percentage of reads mapped to multiple loci. I am looking for the reasons...)

Thanks a lot!

enter image description here enter image description here

sequencing rna-seq alignment • 943 views
ADD COMMENTlink modified 19 months ago by Ian5.4k • written 19 months ago by SMILE90

Were these samples ribo-depleted? If so have you checked to see the efficiency of that process? Use the rDNA repeat of mouse and align your data to it. Other thing to check is to see if you have optical/PCR duplicates. clumpify from BBMap suite would be the tool of choice there.

ADD REPLYlink modified 19 months ago • written 19 months ago by genomax65k

I have to ask the person who did the experiment to make sure whether is ribo-depleted or polyA captured. I have depleted the rRNA using sortmerna and mapped the data to mouse rRNA, and got 0% alignment. You mean the peak of small insert sizes in the first histogram can be caused by optical/PCR duplicates?

ADD REPLYlink written 19 months ago by SMILE90

Last time something like your observation was seen was in: Insert size in RNA-Seq library

Good to know that you have eliminated the possibility of presence of rDNA. Have you looked at some of those small inserts (left over after trimming)? What do they blast to? Is there is a possibility that the trimming missed some adapters?

ADD REPLYlink modified 19 months ago • written 19 months ago by genomax65k

Do you know how to extract those small inserts?

ADD REPLYlink written 19 months ago by SMILE90

Assuming your reads are just the inserts left after trimming you can use from BBMap suite to grab them in a new file. in=trimmed.fq.gz out=short.fq.gz maxlen=30 (adjust the length number as needed, use in1= in2= out1= out2= if you have PE reads).

ADD REPLYlink modified 19 months ago • written 19 months ago by genomax65k

Maybe I have to mention that to do comparison I also do alignment without any trimming, the firgure I got looked simmilar. Since without trimming the read length are all 101, I still have the small inserts. So how do the inserts calculate? I don't think I can use the maxlen=30 to extract the small inserts.

ADD REPLYlink written 19 months ago by SMILE90

The inserts are calculated based on alignment in BBMap. I assume it is the same for STAR.

You should be able to get the small inserts by trimming using in=your_reads.fq.gz out=small_inserts.fq.gz ktrim=r k=23 ref=/path_to/adapters.fa (included in BBMap) maxlength=30.

ADD REPLYlink written 19 months ago by genomax65k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1637 users visited in the last hour