I tried to map raw paired-end RNA-Seq reads (FASTQ) to a reference fungal genome using STAR. My Log.final.out file looks like this:
Number of input reads | 25763319
Average input read length | 202
UNIQUE READS:
Uniquely mapped reads number | 6701254
Uniquely mapped reads % | 26.01%
Average mapped length | 200.52
Number of splices: Total | 1409119
Number of splices: Annotated (sjdb) | 1326594
Number of splices: GT/AG | 1377189
Number of splices: GC/AG | 11322
Number of splices: AT/AC | 423
Number of splices: Non-canonical | 20185
Mismatch rate per base, % | 0.21%
Deletion rate per base | 0.00%
Deletion average length | 1.44
Insertion rate per base | 0.00%
Insertion average length | 1.18
MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 76397
% of reads mapped to multiple loci | 0.30%
Number of reads mapped to too many loci | 58765
% of reads mapped to too many loci | 0.23%
UNMAPPED READS:
Number of reads unmapped: too many mismatches | 0
% of reads unmapped: too many mismatches | 0.00%
Number of reads unmapped: too short | 18780473
% of reads unmapped: too short | 72.89%
Number of reads unmapped: other | 147037
% of reads unmapped: other | 0.57%
CHIMERIC READS:
Number of chimeric reads | 0
% of chimeric reads | 0.00%
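To see at a glance where the reads went, the counts can be ranked by their share of the input. The following is only a minimal sketch: it assumes each statistic is a "name | value" pair, as in the log above, and the function name parse_star_log is made up for illustration. The excerpt holds a few counts copied from the log.

```python
def parse_star_log(text):
    """Collect 'name | value' pairs; section headers have no '|' and are skipped."""
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue
        key, value = line.split("|", 1)
        stats[key.strip()] = value.strip()
    return stats

# A few count lines copied from the log above (percent lines left out on purpose).
log_excerpt = """\
Number of input reads | 25763319
UNIQUE READS:
Uniquely mapped reads number | 6701254
Number of reads mapped to multiple loci | 76397
Number of reads mapped to too many loci | 58765
Number of reads unmapped: too short | 18780473
Number of reads unmapped: other | 147037
"""

stats = parse_star_log(log_excerpt)
total = int(stats["Number of input reads"])

# Share of the input reads that falls into each category, in percent.
shares = {
    key: 100.0 * int(value) / total
    for key, value in stats.items()
    if key != "Number of input reads"
}

# Print the categories from largest to smallest.
for key, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{key}: {share:.2f}%")

largest = max(shares, key=shares.get)
```

With these numbers, the largest category is "Number of reads unmapped: too short", so that is the class of reads to investigate first.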
Why did only 26.01% of the reads map uniquely to the reference genome? Could this be due to adapter contamination (the service provider did not supply the adapter sequences)? If so, how can I identify and remove the adapters?
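As a rough first check for adapter contamination, one could count how many reads contain the start of a known adapter. The sketch below assumes an Illumina TruSeq-style library, whose adapters begin with AGATCGGAAGAGC. That is a guess, since the provider did not name the kit, and the function names and the file name in the usage comment are placeholders.

```python
import gzip
from itertools import islice

# Assumption: Illumina TruSeq-style adapters, which begin with this sequence.
# Replace it if the service provider names a different kit.
ADAPTER_PREFIX = "AGATCGGAAGAGC"

def adapter_fraction(fastq_lines, adapter=ADAPTER_PREFIX, max_reads=100_000):
    """Return the fraction of the first max_reads reads that contain `adapter`."""
    total = hits = 0
    # A FASTQ record is 4 lines long; the sequence is the 2nd line of each record.
    for seq in islice(fastq_lines, 1, 4 * max_reads, 4):
        total += 1
        if adapter in seq:
            hits += 1
    return hits / total if total else 0.0

def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)

# Usage (hypothetical file name):
# with open_fastq("sample_R1.fastq.gz") as handle:
#     print(f"{100 * adapter_fraction(handle):.2f}% of sampled reads contain the adapter prefix")
```

A high fraction would suggest read-through into the adapter. Dedicated tools such as FastQC (detection) and Cutadapt or Trimmomatic (removal) do this more thoroughly, so this is only a sanity check.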
I am a beginner in RNA-Seq analysis, and any suggestions would be very helpful for my work.