My RNA-seq samples have a low alignment rate with TopHat2, as shown below:
tophat2 -p 1 --report-secondary-alignments --no-convert-bam -G MSU/MSU.gff -o Os1_thout MSU/MSU Os1_R1_output_paired.fastq Os1_R2_output_paired.fastq

Left reads:
          Input     :  13644823
           Mapped   :   3075519 (22.5% of input)
            of these:    262852 ( 8.5%) have multiple alignments (187 have >20)
Right reads:
          Input     :  13644823
           Mapped   :   3093432 (22.7% of input)
            of these:    269827 ( 8.7%) have multiple alignments (164 have >20)
22.6% overall read alignment rate.

Aligned pairs:   3044687
     of these:    197245 ( 6.5%) have multiple alignments
          and:     15454 ( 0.5%) are discordant alignments
22.2% concordant pair alignment rate.
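As a sanity check, the rates in the TopHat2 summary can be recomputed from its raw counts. This is a small Python sketch; the concordant-rate formula (aligned pairs minus discordant pairs, divided by input pairs) is an assumption inferred from the numbers, not taken from TopHat2's documentation.

```python
def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place, matching the summary's format."""
    return round(100.0 * part / whole, 1)


# Counts copied from the TopHat2 summary for sample Os1.
input_pairs = 13_644_823       # pairs given to TopHat2 (each mate file has this many reads)
left_mapped = 3_075_519
right_mapped = 3_093_432
aligned_pairs = 3_044_687
discordant_pairs = 15_454

left_rate = pct(left_mapped, input_pairs)                          # 22.5
right_rate = pct(right_mapped, input_pairs)                        # 22.7
overall_rate = pct(left_mapped + right_mapped, 2 * input_pairs)    # 22.6
aligned_pair_rate = pct(aligned_pairs, input_pairs)                # 22.3
# Assumed formula: discordant pairs are excluded from the concordant rate.
concordant_rate = pct(aligned_pairs - discordant_pairs, input_pairs)  # 22.2
unmapped_reads = 2 * input_pairs - left_mapped - right_mapped      # 21120695

print(left_rate, right_rate, overall_rate, aligned_pair_rate, concordant_rate, unmapped_reads)
```

Only the version that subtracts discordant pairs reproduces the reported 22.2%, which is why the sketch assumes it. Either way, about 21.1 million of the 27.3 million trimmed reads do not align.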
Sequencing was performed on an Illumina HiSeq 2500 platform (paired-end, 2 × 100 bp). I used FastQC to check sequencing quality and Trimmomatic to remove adapters and low-quality bases; the FastQC results before and after trimming are attached.
java -jar /home/willian/softwares/Trimmomatic-0.36/trimmomatic-0.36.jar PE -phred33 Os1_S27_L004_R1_001.fastq Os1_S27_L004_R2_001.fastq Os1_R1_output_paired.fastq Os1_R1_output_unpaired.fastq Os1_R2_output_paired.fastq Os1_R2_output_unpaired.fastq ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:70
#Os1 Input Read Pairs: 15171431 Both Surviving: 13644823 (89.94%) Forward Only Surviving: 1287637 (8.49%) Reverse Only Surviving: 87906 (0.58%) Dropped: 151065 (1.00%)
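The quality-trimming steps in the Trimmomatic command (LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:70) and the pair categories in its summary can be illustrated with a simplified Python model. This is a sketch under stated assumptions, not Trimmomatic's code: ILLUMINACLIP adapter removal is left out, the sliding-window step simply cuts at the start of the first 4-base window whose mean quality is below 15 (Trimmomatic may treat the bases inside that window differently), and the function names are hypothetical.

```python
from typing import List, Optional

# Hypothetical helpers: a simplified model of the Trimmomatic steps used above,
# not Trimmomatic's own implementation (ILLUMINACLIP is not modelled).


def phred33(qual: str) -> List[int]:
    """Decode a Phred+33 quality string (the -phred33 option) to integer scores."""
    return [ord(c) - 33 for c in qual]


def trim_read(qual: str, leading: int = 3, trailing: int = 3,
              window: int = 4, min_avg: float = 15, min_len: int = 70) -> Optional[int]:
    """Return the trimmed read length, or None if the read is dropped."""
    q = phred33(qual)
    start, end = 0, len(q)
    # LEADING:3 - remove bases from the 5' end while quality is below 3.
    while start < end and q[start] < leading:
        start += 1
    # TRAILING:3 - remove bases from the 3' end while quality is below 3.
    while end > start and q[end - 1] < trailing:
        end -= 1
    # SLIDINGWINDOW:4:15 - scan 5' to 3' and cut at the first window
    # whose mean quality is below 15 (simplified cut position).
    for i in range(start, end - window + 1):
        if sum(q[i:i + window]) / window < min_avg:
            end = i
            break
    kept = end - start
    # MINLEN:70 - drop reads that are now shorter than 70 bases.
    return kept if kept >= min_len else None


def classify_pair(qual1: str, qual2: str) -> str:
    """Map a read pair to one of the categories in Trimmomatic's PE summary."""
    ok1 = trim_read(qual1) is not None
    ok2 = trim_read(qual2) is not None
    if ok1 and ok2:
        return "Both Surviving"
    if ok1:
        return "Forward Only Surviving"
    if ok2:
        return "Reverse Only Surviving"
    return "Dropped"


good = "I" * 100                 # 100 bases at Q40
bad_tail = "I" * 60 + "#" * 40   # last 40 bases at Q2
print(trim_read(good), trim_read(bad_tail), classify_pair(good, bad_tail))
```

In this model, a mate whose last 40 bases are Q2 is cut to 60 bases and then fails MINLEN:70, so its pair counts as "Forward Only Surviving". Only "Both Surviving" pairs are written to the *_output_paired.fastq files that were passed to TopHat2; surviving single mates go to the *_output_unpaired.fastq files.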
One problem I see with these samples is their low RIN (6.0), compared with samples from another rice cultivar (RIN 8.0) that align well. In FastQC, the main difference between the RIN 6 and RIN 8 samples is in the Overrepresented Sequences module.
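To see which sequences drive that difference, the most frequent exact read sequences can be listed directly from the trimmed FASTQ files. The sketch below is a rough stand-in for FastQC's Overrepresented Sequences module, not its actual algorithm; the helper names are hypothetical, and the default 0.1% cut-off is chosen to mirror FastQC's warning threshold.

```python
from collections import Counter
from itertools import islice
from typing import Iterable, Iterator, List, Optional, Tuple


def read_sequences(path: str) -> Iterator[str]:
    """Yield the sequence line (2nd line) of each 4-line FASTQ record."""
    with open(path) as handle:
        for line_no, line in enumerate(handle):
            if line_no % 4 == 1:
                yield line.strip()


def overrepresented(seqs: Iterable[str], min_fraction: float = 0.001,
                    max_reads: Optional[int] = None) -> List[Tuple[str, int, float]]:
    """Return (sequence, count, fraction) for exact sequences above min_fraction.

    Only the first max_reads sequences are counted (all of them if None),
    and results are ordered from most to least frequent.
    """
    counts = Counter(islice(seqs, max_reads))
    total = sum(counts.values())
    return [(seq, n, n / total)
            for seq, n in counts.most_common()
            if n / total > min_fraction]
```

For example, `overrepresented(read_sequences("Os1_R1_output_paired.fastq"), max_reads=1_000_000)` would list sequences above 0.1% among the first million trimmed R1 reads; running the same call on a RIN 8 sample allows a direct comparison, and the listed sequences can then be searched against a sequence database to identify them.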
I have already relaxed TopHat's mismatch settings and also tried another aligner (HPG Aligner), but the alignment rate increased only slightly.
What could I do to increase the alignment rate?