Why does the STAR aligner limit alignments in soft-masked regions (lowercase letters in the reference genome)?
2.0 years ago
MarVi

Hello All,

I have been trying to figure out the best STAR parameters for finding all possible alignments of an RNA read. I noticed that STAR seems to restrict alignments in soft-masked regions, or fails to find them there. When I tried GSNAP, it placed the same read at several different positions in the genome. Although the STAR and GSNAP alignments partly overlap, there are some interesting positions that STAR does not report, even though aligning there would require no mismatches or soft-clipping.

I use the same parameters for both aligners: a very high maximum number of multimapping loci and a minimum match length for reporting an alignment. The other parameters are mostly left at their defaults. I prefer STAR because it finds more alignments than GSNAP, but in some cases it does not report positions I am interested in.
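One way to pin down exactly which positions STAR misses is to collect, for each read, the set of (reference, position) loci each aligner reports and take the difference. Below is a minimal sketch, assuming both aligners write plain-text SAM; `loci_by_read` and `missing_loci` are hypothetical helper names, and the toy records are made up for illustration, not output from either tool.

```python
from collections import defaultdict

def loci_by_read(sam_lines):
    """Map each read name to the set of (RNAME, POS) loci it was aligned to.

    Header lines ('@...') and unmapped records (FLAG bit 0x4) are skipped.
    Only the first four SAM columns are used: QNAME, FLAG, RNAME, POS.
    Strand and mate information are ignored in this sketch.
    """
    loci = defaultdict(set)
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag, rname, pos = fields[0], int(fields[1]), fields[2], int(fields[3])
        if flag & 0x4:  # segment unmapped
            continue
        loci[qname].add((rname, pos))
    return dict(loci)

def missing_loci(found_by_a, found_by_b):
    """Per read, the loci reported by aligner A but not by aligner B."""
    missing = {}
    for read, sites in found_by_a.items():
        diff = sites - found_by_b.get(read, set())
        if diff:
            missing[read] = diff
    return missing

# Hypothetical toy records (11 mandatory SAM columns each), not real output.
gsnap_sam = [
    "@HD\tVN:1.6",
    "read1\t0\tchr1\t100\t255\t50M\t*\t0\t0\t*\t*",
    "read1\t256\tchr5\t2000\t0\t50M\t*\t0\t0\t*\t*",
]
star_sam = [
    "read1\t0\tchr1\t100\t255\t50M\t*\t0\t0\t*\t*",
]

missing = missing_loci(loci_by_read(gsnap_sam), loci_by_read(star_sam))
print(missing)  # {'read1': {('chr5', 2000)}}
```

Comparing exact start positions is deliberately strict; soft-clipping or indels near the read ends can shift POS by a few bases between aligners, so a real comparison may want a small positional tolerance.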


You may find this earlier thread of interest: "Hard vs soft masked references while aligning with STAR".

There is an old thread in which Alex mentions (post #2) that soft-masked regions are not treated any differently by STAR.


Thanks for your reply! So, in conclusion, STAR isn't a good aligner for reads from repetitive regions. I couldn't find any explanation of why this happens. I tested both hard- and soft-masked references, and the results were the same. It seems that STAR simply doesn't align to repetitive regions of the genome. So I either have to choose another aligner or combine STAR with another aligner that handles the repetitive regions.
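To make the hard- vs soft-masking comparison concrete: soft masking only changes letter case, whereas hard masking replaces the masked bases with N. If an aligner treats lowercase and uppercase bases the same, as the earlier thread reports for STAR, a soft-masked reference is effectively the unmasked genome, while a hard-masked one removes the repeat sequence entirely. A minimal sketch in plain Python; the helper names and the toy sequence are hypothetical.

```python
def unmask(seq: str) -> str:
    """Soft-masked -> unmasked: lowercase repeat bases become ordinary uppercase bases."""
    return seq.upper()

def hard_mask(seq: str) -> str:
    """Soft-masked -> hard-masked: every lowercase (repeat) base becomes N."""
    return "".join("N" if base.islower() else base for base in seq)

def soft_masked_fraction(seq: str) -> float:
    """Fraction of bases that are soft-masked (lowercase)."""
    if not seq:
        return 0.0
    return sum(base.islower() for base in seq) / len(seq)

# Hypothetical toy sequence: a lowercase "repeat" between uppercase flanks.
toy = "ACGTACGTacgtacgtacgtACGT"
print(unmask(toy))                 # ACGTACGTACGTACGTACGTACGT
print(hard_mask(toy))              # ACGTACGTNNNNNNNNNNNNACGT
print(soft_masked_fraction(toy))   # 0.5
```

Under that assumption, getting identical results from both references suggests the missing loci are being filtered for some reason other than letter case.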

23 months ago

STAR doesn't specifically limit alignments in soft-masked regions; more generally, it puts limits on multimapping. You can adjust these with options such as --outFilterMultimapNmax and --winAnchorMultimapNmax, as suggested for tools like TEtranscripts, which seem to deal with datasets similar to yours.
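As a sketch of how these limits could be raised in a STAR run, the snippet below only assembles and prints a command line with Python's `shlex`; it does not run STAR. The genome directory, read file, output prefix, thread count and the value 100 are placeholders to adapt, not values taken from this thread, and `star_multimap_command` is a hypothetical helper. The anchor limit is written here under the name STAR's manual documents, `--winAnchorMultimapNmax`.

```python
import shlex

def star_multimap_command(genome_dir, reads, prefix, max_loci=100, threads=8):
    """Build a STAR alignment command with raised multimapping limits.

    Paths and numbers are placeholders. Options used:
      --outFilterMultimapNmax : reads mapping to more loci than this get no
                                alignments reported (default 10)
      --winAnchorMultimapNmax : max number of loci an alignment anchor may
                                map to (default 50)
    """
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", *reads,
        "--outFileNamePrefix", prefix,
        "--outFilterMultimapNmax", str(max_loci),
        "--winAnchorMultimapNmax", str(max_loci),
        "--outSAMtype", "BAM", "Unsorted",
    ]

cmd = star_multimap_command("star_index/", ["sample_R1.fastq"], "sample_")
print(shlex.join(cmd))
```

Raising only `--outFilterMultimapNmax` may not be enough for highly repetitive loci, because the anchor limit applies earlier, during seed search. After a run, the "% of reads mapped to too many loci" line in STAR's `Log.final.out` shows how many reads are still being dropped by the multimapping filter.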