Question: choosing a suitable max intron size for STAR (plant alignment)
3.3 years ago, Biogeek wrote:

Hey guys,

Any tips? I know the STAR aligner is optimised for mammalian genomes. I have a plant reference genome with a GFF3 file that annotates exons, CDS and UTRs, but not introns. Additionally, the genome is only available as scaffolds, not chromosomes. The default STAR settings allow a maximum intron length of around 500,000 nt, which is huge. Can anyone suggest a suitable maximum intron value (--alignIntronMax), or point me to literature on the matter? It seems this goes unreported in alignment methodology most of the time.


star plants intron size • 2.1k views
modified 2.3 years ago by claudiorivero920 • written 3.3 years ago by Biogeek

If you map some reads with BBMap, you can produce a histogram of indel lengths with the "indelhist" flag and use that to inform your decision. The distribution varies by plant species. For example:

bbmap.sh in=reads.fq ref=genome.fa maxindel=500000 indelhist=ihist.txt reads=1m

That will just map the first million reads and stop.

If you already have a mapped sam/bam file, you can instead generate the indel length histogram with Reformat:

reformat.sh in=mapped.sam indelhist=ihist.txt
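
To turn such a histogram into a candidate cutoff, one option is to take a high quantile of the length distribution rather than the single largest value. The sketch below is only an illustration: the function names are made up, and the file layout it expects (tab-separated rows, '#' header lines, length in the first column and deletion counts in the second) is an assumption that should be checked against the actual ihist.txt. It also assumes that, for RNA-seq reads mapped to a genome, long deletions mostly correspond to introns.

```python
def read_length_histogram(handle, count_column=1):
    """Read {length: count} from a tab-separated histogram file.

    ASSUMED layout (verify against your own file): header lines start
    with '#', column 0 holds the length, and `count_column` holds the
    count of interest (here assumed to be deletions).
    """
    hist = {}
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        length, count = int(fields[0]), int(fields[count_column])
        if count > 0:
            hist[length] = hist.get(length, 0) + count
    return hist


def length_at_quantile(hist, quantile=0.999):
    """Smallest length L such that at least `quantile` of events are <= L."""
    total = sum(hist.values())
    if total == 0:
        raise ValueError("histogram is empty")
    running = 0
    for length in sorted(hist):
        running += hist[length]
        if running >= quantile * total:
            return length
    return max(hist)
```

With a real file this could be used as `length_at_quantile(read_length_histogram(open("ihist.txt")))`. A high quantile keeps one or two spurious very long gaps from inflating the setting; the 0.999 default is an arbitrary starting point, not a recommendation.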
written 3.3 years ago by Brian Bushnell

Thanks Brian, seems like a handy little tool :-)

written 3.3 years ago by Biogeek

I'm assuming around 10,000 would be appropriate?

written 3.3 years ago by Biogeek

Hi, we have done RNA-seq analysis on plant genomes. To optimize the parameters, we set the minimum and maximum intron lengths to 60 and 6000, following what was described for splicing in Arabidopsis (Márquez et al., 2010).
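
Since the GFF3 in the question lists exons but not introns, the annotated intron lengths can also be derived from the gaps between consecutive exons of each transcript and compared against bounds like these. This is a minimal sketch, not a tested pipeline: the function name is hypothetical, and it assumes every exon row carries a Parent attribute pointing to its transcript and that exons within a transcript do not overlap.

```python
from collections import defaultdict


def intron_lengths_from_gff3(lines):
    """Return intron lengths implied by gaps between exons of each transcript.

    ASSUMPTIONS: standard 9-column GFF3, exon rows have a Parent=
    attribute, coordinates are 1-based and inclusive.
    """
    exons = defaultdict(list)  # (seqid, parent) -> [(start, end), ...]
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "exon":
            continue
        attrs = dict(
            item.strip().split("=", 1)
            for item in fields[8].split(";")
            if "=" in item
        )
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                key = (fields[0], parent)
                exons[key].append((int(fields[3]), int(fields[4])))

    lengths = []
    for coords in exons.values():
        coords.sort()
        for (_, prev_end), (next_start, _) in zip(coords, coords[1:]):
            gap = next_start - prev_end - 1  # bases strictly between exons
            if gap > 0:
                lengths.append(gap)
    return lengths
```

The smallest and largest values (or high and low quantiles) of the returned list give a species-specific starting point for STAR's --alignIntronMin and --alignIntronMax. Annotated introns only reflect known gene models, so some margin above the largest annotated intron is probably sensible.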

written 2.3 years ago by claudiorivero920