Question: How to know intron lengths
jmramos.bio wrote (4.3 years ago):


I have an RNA-seq experiment and I would like to use STAR as the aligner. In an RNA-seq course I took, we were told that passing the length of the longest intron in the genome as a parameter will save time. But they forgot to tell us how to obtain this information, and we forgot to ask. Could you tell me how I can do that?

Thank you! J

Tags: rna-seq, star, intron

I don't know how providing the longest intron length will help the aligner, but there are a lot of things I don't know. Either way, you can find some nice transcriptome summary statistics from

Reply written 4.3 years ago (modified 3.7 years ago) by spvensko.

If you know the sequenced genome, you can write a script that takes your annotation file (GFF) as input and looks for the longest intron.
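A minimal Python sketch of such a script, assuming a tab-separated GFF3 file in which every exon (or CDS) row names its transcript in a `Parent=` attribute and coordinates are 1-based and inclusive. It groups the blocks by transcript, sorts them, and measures the gap between consecutive blocks. The function name `longest_intron` is hypothetical, not part of any existing tool.

```python
from collections import defaultdict


def longest_intron(gff_lines, feature_type="exon"):
    """Return (transcript_id, intron_length) for the longest intron found.

    Assumes GFF3: 9 tab-separated columns, 1-based inclusive coordinates,
    and a Parent=<transcript> attribute on each exon/CDS row.
    Returns (None, 0) if no transcript has two or more blocks.
    """
    blocks = defaultdict(list)  # transcript ID -> list of (start, end)
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments/directives and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != feature_type:
            continue
        attrs = dict(kv.split("=", 1) for kv in fields[8].split(";") if "=" in kv)
        parent = attrs.get("Parent")
        if parent is None:
            continue
        # GFF3 allows several comma-separated parents on one feature.
        for p in parent.split(","):
            blocks[p].append((int(fields[3]), int(fields[4])))

    best = (None, 0)
    for tx, spans in blocks.items():
        spans.sort()
        for (_, prev_end), (next_start, _) in zip(spans, spans[1:]):
            # Number of bases strictly between two consecutive blocks.
            length = next_start - prev_end - 1
            if length > best[1]:
                best = (tx, length)
    return best
```

For example, `with open("annotation.gff") as fh: print(longest_intron(fh))` (the file name is a placeholder); pass `"CDS"` as the second argument to measure gaps between coding blocks instead of exons.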

Reply written 4.3 years ago by glihm.
Medhat wrote (4.3 years ago):

You can use this script as follows:

intron-length.awk TYPE=CDS yourGffFile.gff

It will report the minimum intron length, the maximum intron length, and the maximum sum of intron lengths among all mRNA features.
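To make the three reported numbers concrete, here is a small Python sketch that computes them from per-transcript intron lengths. It is not the awk script itself; the input dictionary and the name `intron_summary` are hypothetical, and the lengths are assumed to have been computed already from exon or CDS coordinates.

```python
def intron_summary(introns_by_transcript):
    """Return (min intron, max intron, max per-transcript intron total).

    introns_by_transcript: dict mapping a transcript ID to the list of its
    intron lengths (hypothetical input format, for illustration only).
    """
    all_lengths = [n for lengths in introns_by_transcript.values() for n in lengths]
    if not all_lengths:
        raise ValueError("no introns found")
    # The third statistic sums the introns of each transcript separately
    # and keeps the largest of those sums.
    max_total = max(sum(lengths) for lengths in introns_by_transcript.values())
    return min(all_lengths), max(all_lengths), max_total
```

The distinction matters because a transcript with several moderate introns can have a larger total than the transcript holding the single longest intron.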


I am trying to use the above script but I get an error. Please suggest how I can resolve it.

[root@psgl genome]# awk intron-length.awk TYPE=CDS Rs_1.0.Gene.LFY.gff >intron_statistics
awk: cmd. line:1: intron-length.awk
awk: cmd. line:1: ^ syntax error

Reply written 4.0 years ago by Bioinfonext.

cat file.gff
RUS05596 Ver1.2.2 CDS 2580 2690 . + 0 ID=Rs462540.1.cds3;Parent=Rs462540.1
RUS05596 Ver1.2.2 three_prime_UTR 2691 2973 . + . ID=Rs462540.1.utr3;Parent=Rs462540.1

Rs216420 RUS05606 1938 2500 +

RUS05606 Ver1.2.2 gene 1770 2753 . + . ID=Rs216420;Name=Rs216420
RUS05606 Ver1.2.2 promoter 270 1769 . + . Note=promoter region
RUS05606 Ver1.2.2 mRNA 1770 2753 . + . ID=Rs216420.1;Parent=Rs216420;Product=Unknown protein
RUS05606 Ver1.2.2 protein 1938 2500 . + . ID=Rs216420.1.protein1;Name=Rs216420.1;Derives_from=Rs216420.1;Product=Unknown protein
RUS05606 Ver1.2.2 exon 1770 1996 . + . ID=Rs216420.1.exon1;Parent=Rs216420.1
RUS05606 Ver1.2.2 five_prime_UTR 1770 1937 . + . ID=Rs216420.1.utr5;Parent=Rs216420.1
RUS05606 Ver1.2.2 CDS 1938 1996 . + 0 ID=Rs216420.1.cds1;Parent=Rs216420.1
RUS05606 Ver1.2.2 exon 2197 2753 . + . ID=Rs216420.1.exon2;Parent=Rs216420.1
RUS05606 Ver1.2.2 CDS 2197 2500 . + 2 ID=Rs216420.1.cds2;Parent=Rs216420.1
RUS05606 Ver1.2.2 three_prime_UTR 2501 2753 . + . ID=Rs216420.1.utr3;Parent=Rs216420.1
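As a worked example on the Rs216420.1 record above: GFF coordinates are 1-based and inclusive, so the intron consists of the bases strictly between the end of exon 1 and the start of exon 2. Writing $e_1$ for the end of exon 1 and $s_2$ for the start of exon 2:

```latex
L_{\text{intron}} = s_2 - e_1 - 1 = 2197 - 1996 - 1 = 200
```

The CDS rows give the same gap here (cds1 ends at 1996, cds2 starts at 2197), so this transcript has a single 200 bp intron whether exons or CDS features are used.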

Reply written 4.0 years ago by Bioinfonext.
i.sudbery (Sheffield, UK) wrote (4.3 years ago):

Specifying the maximum intron length helps because it limits the search space for the "other end" of a read when it is being aligned to the genome. If the second half of your read maps several Mb away, it is unlikely that this represents a valid, biologically relevant splice junction; it is more probably the result of a misalignment. In that case, it makes no sense to spend time searching megabases away for the mapping position of the second half of a split read.

It is also the case that some reference genomes contain gene models with unreasonably long introns, often because two genes have been merged together (i.e. one half of the junction is in one gene and the other half is in a different gene, usually a different member of the same protein family).
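One way to catch such merged gene models before trusting the annotation's maximum is to list the longest annotated introns and check them by eye, for example in a genome browser. A minimal Python sketch, assuming intron lengths have already been computed per transcript; the dictionary input and the name `longest_introns` are hypothetical.

```python
import heapq


def longest_introns(introns_by_transcript, n=10):
    """Return the n longest introns as (length, transcript_id, intron_index),
    longest first, so suspicious gene models can be inspected manually.

    introns_by_transcript: dict mapping transcript ID -> list of intron
    lengths in transcript order (hypothetical input format).
    """
    candidates = (
        (length, tx, i)
        for tx, lengths in introns_by_transcript.items()
        for i, length in enumerate(lengths)
    )
    # nlargest avoids sorting every intron in the genome.
    return heapq.nlargest(n, candidates)
```

If the top entries belong to transcripts whose two halves look like separate genes, those values are better ignored when choosing a maximum intron length.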

A little knowledge about your genome of interest can help here. In humans we use 2 Mb as our maximum intron length, because there is a gene with an intron that long that we are pretty confident is real (I don't remember which one right now).

Otherwise you could trust the reference annotation and use the method outlined by Medhat.
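Whichever value you choose, it is passed to STAR on the command line. A Python sketch that builds the argument list, assuming paired-end reads; the helper name `star_args`, the genome directory and the read file names are placeholders. `--alignIntronMax` caps the intron size, and `--alignMatesGapMax` caps the gap between mates, which can itself span introns, so this sketch sets both to the same value. In STAR a value of 0 for either means "use the default window", so the sketch requires an explicit positive cap.

```python
def star_args(genome_dir, reads, max_intron, threads=4):
    """Build a STAR command line with an explicit intron/mate-gap cap.

    max_intron: e.g. 2_000_000 for human, or the largest trustworthy
    intron length taken from the annotation.
    """
    if max_intron <= 0:
        # 0 would make STAR fall back to its default search window.
        raise ValueError("max_intron must be a positive integer")
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", *reads,
        "--alignIntronMax", str(max_intron),
        # The unsequenced gap between mates may contain introns too.
        "--alignMatesGapMax", str(max_intron),
    ]
```

The list can be run with `subprocess.run(star_args("star_index", ["r1.fq", "r2.fq"], 2_000_000), check=True)`.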
