Hello! I have 86 bp single-end reads from an Illumina NextSeq 500. Library preparation was carried out with the TruSeq Stranded Total RNA (Ribo-Zero) kit on RNA extracted from mouse embryos. I've mapped the reads to the mm10 reference genome (chr1-19, chrX, chrY, chrM) with subjunc, the junction-mapping aligner from the Rsubread package, using default settings. The mapping rate is only ~50%, both with raw and with quality-trimmed reads. I'd be glad to hear your ideas as to why.
The FastQC report for my raw reads is available here, if you would like to inspect it: https://drive.google.com/file/d/0B0NZ5u2nKR2qeG14Q25WSXFXNjQ/view?usp=sharing
The report is what one would expect from Illumina sequencing, I think. The slightly overrepresented sequences (1.4% in total) are small nuclear RNAs according to a BLAST search. I tried fastq_quality_trimmer from the FASTX-Toolkit to trim low-quality bases from the 3' end (quality threshold set to 20). According to the FastQC report, some of the bases in the middle of the reads are of poorer quality (<20, lower whiskers of the boxplots). Could this be affecting the mapping? Should I use more stringent trimming, or even filter whole reads based on their overall sequence quality?
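
To make the trimming-versus-filtering question concrete, here is a minimal Python sketch of the two options as I understand them. It is not the FASTX-Toolkit's actual code: the function names and the 0.8 fraction are made up for illustration, and Phred+33 quality encoding is assumed.

```python
def phred_scores(quality_line, offset=33):
    """Convert a FASTQ quality string to Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_line]


def trim_3prime(seq, qual, threshold=20, offset=33):
    """Remove bases from the 3' end until one has quality >= threshold.

    Low-quality bases in the middle of the read are left untouched.
    """
    scores = phred_scores(qual, offset)
    end = len(scores)
    while end > 0 and scores[end - 1] < threshold:
        end -= 1
    return seq[:end], qual[:end]


def passes_filter(qual, threshold=20, min_fraction=0.8, offset=33):
    """Keep a read only if at least min_fraction of its bases reach threshold.

    min_fraction=0.8 is an arbitrary illustrative value, not a recommendation.
    """
    scores = phred_scores(qual, offset)
    if not scores:
        return False  # an empty (fully trimmed) read is discarded
    good = sum(score >= threshold for score in scores)
    return good / len(scores) >= min_fraction


if __name__ == "__main__":
    # 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
    seq, qual = trim_3prime("ACGTACGTAC", "IIII#IIII#")
    print(seq, qual)            # the internal '#' survives 3' trimming
    print(passes_filter(qual))  # whole-read filter applied after trimming
```

With 3' trimming alone, a low-quality base in the middle of a read is kept, so only the whole-read filter would act on the positions I am asking about.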
Thanks in advance.