I downloaded data from the SRA database, and FastQC reports many overrepresented sequences with no hits. I BLASTed some of these sequences and they match rRNA and mtDNA. The per-sequence GC content looks odd because of these contaminants. Should I trim them out before alignment, or should I ignore them? I believe they will not align to the reference genome, will they?
RNA-seq data is expected to show duplication when a gene is highly expressed. FastQC reports "no hits" only because it searches a small database of known artifacts. Chances are this is a real spliced gene that FastQC doesn't know about.
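If you want to pull out the most frequent reads yourself so you can BLAST them, here is a minimal sketch. It assumes a plain, unwrapped 4-line-per-record FASTQ; `top_sequences` and the toy reads are made up for illustration, and this is not how FastQC itself computes its table.

```python
from collections import Counter
import io

def top_sequences(fastq_handle, n=5):
    """Return the n most frequent read sequences as (sequence, count, fraction)."""
    counts = Counter()
    total = 0
    for i, line in enumerate(fastq_handle):
        if i % 4 == 1:  # second line of each 4-line record is the sequence
            counts[line.strip()] += 1
            total += 1
    # most_common() is empty when no reads were seen, so no division by zero
    return [(seq, c, c / total) for seq, c in counts.most_common(n)]

# Toy input standing in for a real FASTQ file (hypothetical reads)
toy_fastq = "@a\nACGT\n+\nIIII\n@b\nACGT\n+\nIIII\n@c\nGGGG\n+\nIIII\n"
top = top_sequences(io.StringIO(toy_fastq), n=1)
print(top)
```

For a real file you would pass an open file handle instead of the `io.StringIO` object. On a large run it is probably enough to do this on a subsample of the reads.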
If they are rRNA, they should align to the genome. You could actually map the reads to rRNA sequences first and keep only the unmapped reads. Check the quality of those unmapped reads and whether they still contain overrepresented sequences. If they look OK, you can then map the unmapped reads to the genome.
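The filtering step could look roughly like the sketch below. It assumes you already have the IDs of the reads that mapped to rRNA (for example, taken from your aligner's output) and a plain, unwrapped 4-line FASTQ. The names `read_fastq` and `drop_reads` and the toy reads are hypothetical, not part of any tool.

```python
import io

def read_fastq(handle):
    """Yield (header, seq, plus, qual) from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline()
        if not header:          # end of file
            return
        if not header.strip():  # skip stray blank lines between records
            continue
        seq = handle.readline()
        plus = handle.readline()
        qual = handle.readline()
        yield tuple(s.rstrip("\n") for s in (header, seq, plus, qual))

def drop_reads(handle, ids_to_drop):
    """Keep only records whose read ID is NOT in ids_to_drop."""
    kept = []
    for header, seq, plus, qual in read_fastq(handle):
        read_id = header[1:].split()[0]  # ID is the text after '@', up to first space
        if read_id not in ids_to_drop:
            kept.append((header, seq, plus, qual))
    return kept

# Toy input; pretend r2 mapped to rRNA (hypothetical IDs)
toy_fastq = "@r1 extra\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n@r3\nTTAA\n+\nIIII\n"
kept = drop_reads(io.StringIO(toy_fastq), {"r2"})
print([header for header, *_ in kept])
```

Many aligners can write the unaligned reads out directly, which would make a script like this unnecessary, so check your aligner's documentation first. For paired-end data, both mates would need to be dropped together.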