Hi All, I am analyzing an RNA-seq data set and the alignment results I have been getting are really baffling me. I have tried a long list of conditions and parameters, but none seem to improve my alignment rates significantly. Here are some details for my sample data:
- Data came from total RNA extracted from tumor samples, prepared with the Nugen Ovation Single Cell RNA-seq kit. We received ~80 million 2 x 100 bp paired-end reads.
- I get an alignment rate of roughly 40-50% with TopHat2 across the different parameters I tried.
FastQC suggests high duplication rates. The quality seems OK (no red flags except a drop in quality to ~20 at the 3' end of reads). I have used TopHat2 for all my alignments using default settings. I have tried the following conditions:
- Trimming 8 bp from the forward reads (as suggested by the Nugen library prep kit), trimming low-quality bases (quality < 20) at the read ends, and using different trimming/clipping tools (fastx, fastq-mcf from ea-utils, trim-galore), discarding reads shorter than 20 bases after adapter/quality trimming.
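To make the trimming steps concrete, here is a minimal Python sketch of the logic for a single forward read. It assumes Phred+33 quality encoding and trims low-quality bases from the 3' end only. The function name `trim_read` and its parameters are hypothetical, and this is not how fastx, fastq-mcf, or trim-galore are implemented; it only illustrates the order of operations (fixed 5' clip, quality trim, length filter).

```python
def trim_read(seq, qual, clip5=8, min_q=20, min_len=20, phred_offset=33):
    """Clip a fixed 5' prefix, trim low-quality 3' bases, drop short reads.

    Returns (seq, qual), or None if the trimmed read is shorter than min_len.
    Assumes Phred+33 encoded quality strings (an assumption, not from the kit docs).
    """
    # Remove the first clip5 bases from the 5' end.
    seq, qual = seq[clip5:], qual[clip5:]

    # Walk back from the 3' end while the base quality is below min_q.
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - phred_offset < min_q:
        end -= 1
    seq, qual = seq[:end], qual[:end]

    # Discard reads that are too short after trimming.
    if len(seq) < min_len:
        return None
    return seq, qual


if __name__ == "__main__":
    # Toy read: 8 bp to clip, 25 good bases, 5 low-quality bases at the 3' end.
    read = "A" * 8 + "C" * 25 + "G" * 5
    quals = "I" * 33 + "#" * 5
    print(trim_read(read, quals))
```

Running the length filter last matters: a read can pass before trimming and fall below 20 bases afterwards.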
I have also tried different --library-type settings for TopHat and changed the -r option to reflect my fragment size. I suspect that my RNA prep has a significant rRNA fraction, and removing the reads that map to rRNA might improve the alignment rate.
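On the -r option: in TopHat it is the expected inner distance between mates (--mate-inner-dist), that is, the fragment length minus both read lengths, so it shifts when reads are trimmed. A small sketch of that arithmetic follows. The 300 bp fragment length is a made-up value for illustration, not a measurement from this library, and `mate_inner_dist` is a hypothetical helper name.

```python
def mate_inner_dist(fragment_len, read1_len, read2_len):
    """Expected inner distance between mates: fragment minus both reads.

    Can be negative when the two mates overlap.
    """
    return fragment_len - read1_len - read2_len


if __name__ == "__main__":
    fragment_len = 300  # hypothetical mean fragment length, for illustration only

    # Untrimmed 2 x 100 bp reads.
    print(mate_inner_dist(fragment_len, 100, 100))

    # Forward read shortened by the 8 bp 5' clip.
    print(mate_inner_dist(fragment_len, 100 - 8, 100))
```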
I would appreciate any suggestions for improving my alignment. TopHat is my preferred aligner, as I have been using it for years on other datasets and it performs fairly robustly. However, I would be open to switching to other aligners if needed.
Thanks a lot for your help.