Using the STAR aligner, I am getting a very low mapping percentage (5-10%) for my single-cell RNA-seq data. The majority of my reads (>90%) are reported as unmapped because they are "too short". My current parameters are:
STAR --genomeDir --outFilterScoreMinOverLread 0.3 --outFilterMatchNminOverLread 0.3 --outReadsUnmapped Fastx --outSAMstrandField intronMotif --readFilesCommand zcat --readFilesIn *.fq.gz --runThreadN 6
I am also trimming the reads with Trim Galore as follows:
trim_galore $R2_file --trim-n -a AAAAAAAA -clip_R1 9 -o $dir_name
Is there any hypothesis for why I am getting such a low percentage of mapped reads? I am particularly interested in assessing contamination. Is there good software for quickly checking whether my samples could be contaminated? I have no good idea what they could be contaminated with.
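For context, the kind of quick check I have in mind is sketched below. It is only a rough first look, not a proper contamination screen. It summarises read lengths after trimming and lists the most frequent sequences among the unmapped reads, which could then be pasted into BLAST by hand. The helper names (`read_fastq`, `length_histogram`, `top_sequences`) and the default of 100000 reads are illustrative assumptions, not part of STAR or Trim Galore.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and default values are illustrative assumptions.

# Print FASTQ records from a plain or gzip-compressed file.
read_fastq() {
    case "$1" in
        *.gz) gzip -dc "$1" ;;
        *)    cat "$1" ;;
    esac
}

# Read-length histogram (count, then length) for the first N reads.
length_histogram() {
    local fastq=$1 n_reads=${2:-100000}
    read_fastq "$fastq" | head -n $((n_reads * 4)) \
        | awk 'NR % 4 == 2 { print length($0) }' \
        | sort -n | uniq -c
}

# Most frequent sequences (count, then sequence) among the first N reads.
top_sequences() {
    local fastq=$1 n_reads=${2:-100000} n_top=${3:-10}
    read_fastq "$fastq" | head -n $((n_reads * 4)) \
        | awk 'NR % 4 == 2' \
        | sort | uniq -c | sort -rn | head -n "$n_top"
}
```

With `--outReadsUnmapped Fastx` and no `--outFileNamePrefix`, STAR writes the unmapped reads to `Unmapped.out.mate1`, so usage would look like `length_histogram trimmed.fq.gz` (where `trimmed.fq.gz` is a placeholder for the Trim Galore output) and `top_sequences Unmapped.out.mate1 100000 10`.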