Question

STAR alignment to small fasta file failing

3 months ago

campbio • 0

Hi,

I am working with some plant RNA-seq data and trying to find the most abundant rRNA contaminant sequences.

I previously mapped my data to the ncRNA fasta (from Ensembl) and got plenty of reads mapping to chloroplast (Pt) rRNA.

However, I am trying to separately map the data to chromosomal, Pt, and mitochondrial (Mt) rRNA to avoid losing reads to multimapping. To do this I took only the Pt rRNA sequences from the ncRNA fasta that I successfully used for mapping before, and put them into a Pt_rRNA.fa. I then tried mapping the data to this fasta, and got 0 reads mapped.
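The subsetting step described above can be sketched as follows. This is only a minimal illustration, not the script actually used: the header tokens `:Pt:` and `gene_biotype:rRNA` are assumptions about how an Ensembl-style ncRNA header is laid out, and the records in `demo` are made up. Check the real headers before relying on any of it.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def select_records(lines: Iterable[str], required_tokens: List[str]) -> List[Tuple[str, str]]:
    """Keep records whose header contains every token in required_tokens."""
    return [
        (header, seq)
        for header, seq in read_fasta(lines)
        if all(token in header for token in required_tokens)
    ]


# Hypothetical records; real headers and sequences will differ.
demo = [
    ">tx1 ncrna chromosome:X:Pt:1:10:1 gene_biotype:rRNA",
    "ACGTACGTAC",
    ">tx2 ncrna chromosome:X:Mt:1:8:1 gene_biotype:rRNA",
    "GGGGCCCC",
    ">tx3 ncrna chromosome:X:Pt:20:27:1 gene_biotype:tRNA",
    "TTTTAAAA",
]

# Assumed tokens: ":Pt:" for the chloroplast, "gene_biotype:rRNA" for rRNA genes.
pt_rrna = select_records(demo, [":Pt:", "gene_biotype:rRNA"])
for header, seq in pt_rrna:
    print(f">{header}\n{seq}")
```

Printing the selected headers is a quick way to confirm that the subset contains the sequences that collected reads in the earlier ncRNA mapping.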

Any ideas why this is happening? I've double checked the fasta format and it really seems correct. I'm using STAR for the alignment and adjusted --genomeSAindexNbases accordingly for the small fasta index generation.

I also did this for the Mt rRNA and it seems to have worked (as in some reads mapped), so I'm really scratching my head over this.

Any help or suggestions would be greatly appreciated.


ADD COMMENT • link 3 months ago by campbio • 0

What is the total length of the fasta reference sequences?

ADD REPLY • link 3 months ago by GenoMax 154k

There are 8 reference sequences in the fasta, total about 9,000 bases. I'm using STAR because it's what I've been using for these data and it's been working fine thus far. Would using something like bowtie be better for a small fasta? I don't believe splicing should be an issue with the Pt rRNAs so it should work.
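For a reference this small, the STAR manual suggests scaling `--genomeSAindexNbases` down to min(14, log2(GenomeLength)/2 - 1). A minimal sketch of that arithmetic is below. The function name is made up for illustration, and rounding down to an integer and flooring at 1 are assumptions on my part.

```python
import math


def sa_index_nbases(genome_length: int) -> int:
    """Suggested --genomeSAindexNbases: min(14, log2(GenomeLength)/2 - 1).

    Rounding down and never returning less than 1 are assumptions made here.
    """
    if genome_length < 1:
        raise ValueError("genome_length must be a positive number of bases")
    value = math.log2(genome_length) / 2 - 1
    return max(1, min(14, int(math.floor(value))))


# About 9,000 bases in total, as in the Pt rRNA fasta described above.
print(sa_index_nbases(9_000))  # 5
```

So a value of about 5 would be consistent with the manual for roughly 9,000 bases. If the index was built with a value in that range, the parameter is probably not what caused the 0 mapped reads.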

ADD REPLY • link 3 months ago by campbio • 0

I suggest that you give another NGS aligner (bbmap, bwa-mem2, bowtie, etc.) a try. STAR is splice-aware and likely has default options that look for splicing, which may be why the reads are not mapping at all.

ADD REPLY • link 3 months ago by GenoMax 154k

I tried to mess with the STAR arguments to ignore splicing, but nothing seemed to work. Bowtie seems to have worked well though, thanks!

ADD REPLY • link 3 months ago by campbio • 0

score 0 · Answer 1 · 2025-07-30

3 months ago

swbarnes2 15k

"However, I am trying to separately map the data to chromosomal, Pt, and mitochondrial (Mt) rRNA to avoid losing reads to multimapping."

I'm not sure that is wise. If there really are sequences that appear in multiple places such that you can't tell where exactly a read with that sequence originated, you need to know that. And what you do not want is for aligners to force a read to map somewhere it does not quite fit because you didn't give it the reference where it really does fit.

ADD COMMENT • link 3 months ago by swbarnes2 15k

I would normally agree completely. In this case, however, I just need to know which rRNA sequences are most common in my data (not where they came from) so that I can design antisense oligos to remove them with beads from future libraries. That's all this analysis is for.
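Ranking the rRNA sequences by abundance can be done by tallying primary alignments per reference name from the aligner's SAM output. The sketch below is one way to do that, under stated assumptions: the reference names and reads in `demo_sam` are hypothetical, and counting only primary alignments (skipping SAM flags 0x4, 0x100 and 0x800) is a choice made here, not something taken from the analysis above.

```python
from collections import Counter
from typing import Iterable


def count_mapped_by_reference(sam_lines: Iterable[str]) -> Counter:
    """Count primary mapped alignments per reference name (SAM column 3)."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        flag, rname = int(fields[1]), fields[2]
        # 0x4 = unmapped, 0x100 = secondary, 0x800 = supplementary
        if flag & (0x4 | 0x100 | 0x800) or rname == "*":
            continue
        counts[rname] += 1
    return counts


def sam_record(qname: str, flag: int, rname: str) -> str:
    """Build a minimal 11-column SAM line for the demo (hypothetical data)."""
    return "\t".join([qname, str(flag), rname, "1", "60", "4M", "*", "0", "0", "ACGT", "IIII"])


demo_sam = [
    "@SQ\tSN:Pt_rRNA_23S\tLN:2800",
    sam_record("r1", 0, "Pt_rRNA_23S"),
    sam_record("r2", 0, "Pt_rRNA_23S"),
    sam_record("r3", 4, "*"),              # unmapped read
    sam_record("r4", 256, "Pt_rRNA_16S"),  # secondary alignment, skipped
    sam_record("r5", 16, "Pt_rRNA_16S"),   # reverse-strand primary alignment
]

for name, n in count_mapped_by_reference(demo_sam).most_common():
    print(name, n)
```

With paired-end data each mate is counted separately, and a multimapping read is credited only to the reference of its primary alignment, so near-identical rRNA copies may split the counts between them.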

ADD REPLY • link 3 months ago by campbio • 0