Hi,
I am working with some plant RNAseq data and trying to find the most abundant rRNA contaminant sequences.
I previously mapped my data to the ncRNA FASTA (from Ensembl) and got plenty of reads mapping to chloroplast (Pt) rRNA.
However, I am now trying to map the data separately to chromosomal, Pt, and mitochondrial (Mt) rRNA to avoid losing reads to multimapping. To do this, I took only the Pt rRNA sequences from the ncRNA FASTA that I had successfully used for mapping before and put them into a new file, Pt_rRNA.fa. I then tried mapping the data to this FASTA and got 0 reads mapped.
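For reference, the extraction step described here could be sketched roughly as follows. This is only an illustration, not the exact procedure I used: it assumes Ensembl-style headers that carry a location field such as `chromosome:TAIR10:Pt:...` and a `gene_biotype:rRNA` field, and the function name `filter_fasta` and the example records are made up.

```python
def filter_fasta(lines, seq_region="Pt", biotype="rRNA"):
    """Yield the lines of FASTA records whose header matches region and biotype.

    Assumption: headers look like
    >ID ncrna chromosome:ASSEMBLY:REGION:start:end:strand gene:... gene_biotype:...
    """
    keep = False
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            # The third colon-separated part of the location field is the region name.
            region_ok = any(
                len(parts) >= 3 and parts[2] == seq_region
                for parts in (field.split(":") for field in fields)
            )
            biotype_ok = f"gene_biotype:{biotype}" in fields
            keep = region_ok and biotype_ok
        if keep:
            yield line


# Tiny made-up example (not real sequences).
example = [
    ">A.1 ncrna chromosome:TAIR10:Pt:1:4:1 gene:A gene_biotype:rRNA\n",
    "ACGT\n",
    ">B.1 ncrna chromosome:TAIR10:Mt:1:4:1 gene:B gene_biotype:rRNA\n",
    "GGGG\n",
    ">C.1 ncrna chromosome:TAIR10:Pt:1:4:1 gene:C gene_biotype:tRNA\n",
    "TTTT\n",
]
kept = list(filter_fasta(example))
print("".join(kept), end="")
```

On a real file, the same generator can be fed an open file handle and its output written with `writelines` to Pt_rRNA.fa. Checking that the number of `>` lines in the output matches what you expect is a quick sanity test.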
Any ideas why this is happening? I've double-checked the FASTA format and it really seems correct. I'm using STAR for the alignment and adjusted --genomeSAindexNbases accordingly when generating the index for the small FASTA.
I also did this for the Mt rRNA and it seems to have worked (as in some reads mapped), so I'm really scratching my head over this.
Any help or suggestions would be greatly appreciated.
What is the length of fasta reference sequences?
There are 8 reference sequences in the FASTA, totalling about 9,000 bases. I'm using STAR because it's what I've been using for these data and it's been working fine thus far. Would something like Bowtie be better for a small FASTA? I don't believe splicing should be an issue with the Pt rRNAs, so it should work.
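For anyone checking the index setting on a reference this small: the STAR manual recommends scaling --genomeSAindexNbases down to min(14, log2(GenomeLength)/2 - 1) for small genomes. A minimal sketch of that arithmetic is below; the function names are made up for illustration, and the 9,000 figure is the approximate total length mentioned above.

```python
import math


def fasta_total_length(lines):
    """Sum the sequence lengths over all records in an iterable of FASTA lines."""
    return sum(len(line.strip()) for line in lines if not line.startswith(">"))


def suggested_sa_index_nbases(genome_length):
    """Apply the STAR manual's rule: min(14, log2(GenomeLength)/2 - 1), rounded down."""
    if genome_length < 1:
        raise ValueError("genome_length must be positive")
    value = int(math.log2(genome_length) / 2 - 1)
    # Clamp to at least 1 so very short references still give a usable value.
    return max(1, min(14, value))


print(suggested_sa_index_nbases(9000))  # -> 5
```

So a reference of roughly 9,000 bases works out to a value of about 5.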
I suggest giving another NGS aligner (BBMap, bwa-mem2, Bowtie, etc.) a try. STAR is splice-aware and likely has default options that look for splicing, which may be why the reads are not mapping at all.
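As a rough sketch of what that could look like, assuming Bowtie 2 and single-end reads: the snippet below only assembles and prints the two commands (index build, then alignment) rather than running them. The file names other than Pt_rRNA.fa are placeholders, and the helper `bowtie2_commands` is made up for illustration.

```python
import shlex


def bowtie2_commands(reference_fasta, index_prefix, reads_fastq, out_sam, threads=1):
    """Return (build, align) argument lists for Bowtie 2, single-end reads assumed."""
    build = ["bowtie2-build", reference_fasta, index_prefix]
    align = [
        "bowtie2",
        "-p", str(threads),   # number of alignment threads
        "-x", index_prefix,   # index prefix produced by bowtie2-build
        "-U", reads_fastq,    # unpaired reads; paired data would use -1/-2 instead
        "-S", out_sam,        # SAM output file
    ]
    return build, align


# Placeholder file names; adjust to your own data.
build_cmd, align_cmd = bowtie2_commands(
    "Pt_rRNA.fa", "Pt_rRNA", "reads.fastq.gz", "Pt_rRNA.sam"
)
print(shlex.join(build_cmd))
print(shlex.join(align_cmd))
```

The printed lines can be pasted into a shell, or the lists passed to `subprocess.run` once the paths are correct. Bowtie 2 is not splice-aware, which should be fine for rRNA.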
I tried to mess with the STAR arguments to ignore splicing, but nothing seemed to work. Bowtie seems to have worked well though, thanks!