I aligned short-read RNA-seq data to a plant genome with STAR (version 2.7.3a) and found that some of the mapped reads were split into two fragments separated by an abnormally long intron. I then looked up the setting for maximum intron length (--alignIntronMax) and found that a value of 10,000 has been used for A. thaliana. However, if I set --alignIntronMax 10000, some reads fail to map because of the intron length limit. I thought about running the alignment with several different --alignIntronMax values and integrating the results, but that takes a lot of time.
I have searched for this issue with different keywords and only found a discussion posted 5 years ago (Do you have abnormally long introns in your RNA Seq alignment? How do you get rid of them?). I am interested in the strategy proposed there by h.mon, which uses different --alignIntronMax values in successive STAR alignments, as quoted below.
“You could start with a small --alignIntronMax, then remove the reads where both pairs mapped from your fastq. With the reduced dataset, map again increasing --alignIntronMax. Wash, rinse, repeat until satisfied.”
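To make the question concrete, here is a rough sketch of how I understand the quoted strategy. It only builds the STAR command lines for each round and does not run them. The helper names (star_round_cmd, iterative_plan), the output directory, and the intron limits in the example are all hypothetical placeholders, not recommended values. I am also assuming uncompressed paired-end FASTQ input, and using --outReadsUnmapped Fastx as a stand-in for "remove the reads where both pairs mapped": as far as I understand, STAR then writes the pairs that were not fully mapped to Unmapped.out.mate1 and Unmapped.out.mate2, which is close to, but maybe not exactly, what h.mon describes.

```python
import shlex


def star_round_cmd(genome_dir, mate1, mate2, intron_max, out_prefix, threads=8):
    """Build one STAR command (as an argument list) for a given --alignIntronMax."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", str(genome_dir),
        "--readFilesIn", str(mate1), str(mate2),
        "--alignIntronMax", str(intron_max),
        "--outSAMtype", "BAM", "Unsorted",
        # Assumption: leftover pairs are written to <prefix>Unmapped.out.mate1/2
        "--outReadsUnmapped", "Fastx",
        "--outFileNamePrefix", str(out_prefix),
    ]


def iterative_plan(genome_dir, mate1, mate2, intron_limits, out_dir="star_rounds"):
    """Chain rounds: each round maps the leftover reads of the previous one."""
    cmds = []
    for i, limit in enumerate(intron_limits, start=1):
        prefix = f"{out_dir}/round{i}_intron{limit}_"
        cmds.append(star_round_cmd(genome_dir, mate1, mate2, limit, prefix))
        # The next round reads the unmapped output of this round.
        mate1 = prefix + "Unmapped.out.mate1"
        mate2 = prefix + "Unmapped.out.mate2"
    return cmds


if __name__ == "__main__":
    # Placeholder limits for illustration only; choosing them is my question.
    for cmd in iterative_plan("genome_index", "reads_1.fastq", "reads_2.fastq",
                              [1000, 10000, 50000]):
        print(shlex.join(cmd))
```

Each printed command would have to be run in order (the output directory must exist first), and the per-round BAM files merged afterwards.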
I wonder whether there are real examples of this approach in published research, and how --alignIntronMax should be set for each round of mapping. Or how do you deal with this problem? Any suggestions are welcome.