I'm currently working on RNA-seq data from A. thaliana, and I have questions about quality and GC content. I understand it's normal for RNA-seq data to have a higher GC% than the genome itself, since coding sequences are usually GC-biased. However, the A. thaliana genome has a GC content of 36% and my samples go up to 51-53%. Isn't that a bit too much?
I'm asking because, although the sequencing quality looked OK in the FastQC reports, I get a very low mapping rate, around 10-20% of reads. Only one sample maps at over 60%, and that one has a GC content of 44%.
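In case it helps to cross-check the GC figures outside FastQC, here is a minimal sketch of computing a mean per-read GC% from a FASTQ file. It assumes an uncompressed FASTQ with exactly four lines per record; the function names `gc_percent` and `mean_gc_from_fastq` are my own for illustration, and the result may differ slightly from the %GC that FastQC reports.

```python
import io


def gc_percent(seq):
    """Return the GC percentage of one sequence, ignoring non-ACGT bases such as N."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / called


def mean_gc_from_fastq(handle):
    """Mean per-read GC% over a FASTQ stream (assumes 4 lines per record)."""
    total = 0.0
    n_reads = 0
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the second line of each record is the sequence
            total += gc_percent(line.strip())
            n_reads += 1
    return total / n_reads if n_reads else 0.0


# Toy input: one read that is all G/C and one that is all A/T.
toy = io.StringIO("@r1\nGGCC\n+\nIIII\n@r2\nATAT\n+\nIIII\n")
print(mean_gc_from_fastq(toy))  # 50.0
```

For a real file, the same function can be called on an opened handle, e.g. `with open("sample.fastq") as fh: print(mean_gc_from_fastq(fh))`, where `sample.fastq` is a placeholder name.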
I tried mapping with bowtie2 and subread-align, both with default parameters (meaning 0 mismatches allowed in the seed for bowtie2, and up to 3 mismatches per alignment for subread-align).
I'm a bit confused here. Does anyone have an idea?
I tried aligning to the TAIR10 assembly instead of Araport11, and now I get >90% mapping for every sample! I'm still confused, but at least it works...