I have simulated "error-free" reads from the transcriptome using
randomreads.sh from the BBMap suite, but when I map them back to the genome
with STAR, only 88% of the reads map uniquely and around 11% are unmapped.
The command for simulation is:

    randomreads.sh \
        ref=cglab_transcriptome.fa \
        out1=reads1.fastq \
        build=1 \
        length=100 \
        reads=500000 \
        replacenoref=t \
        simplenames=t \
        seed=-1 \
        paired=f \
        metagenome=f \
        addpairnum=t \
        snprate=0 \
        midq=35 \
        maxq=39 \
        minq=30 \
        maxsnps=0 \
        maxinss=0 \
        maxdels=0 \
        maxsubs=0 \
        maxns=0
So I assume it should produce ideal reads without SNPs/SNVs. However, that is not the case: when I align the unmapped reads against their corresponding transcripts, I see differences.
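To put a number on this, here is a minimal sketch of how one could check whether each simulated read is an exact substring of some transcript on either strand. It uses only the Python standard library. The function names and the command-line usage are my own illustration, not part of BBMap or STAR, and it assumes plain uncompressed FASTA and 4-line FASTQ input. The search is brute force, so it is best run on a subset such as the unmapped reads.

```python
import sys

# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def parse_fasta(lines):
    """Return {name: sequence} from an iterable of FASTA lines."""
    seqs, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks).upper()
            # Keep only the first word of the header as the name.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks).upper()
    return seqs


def parse_fastq(lines):
    """Yield (name, sequence) from 4-line FASTQ records."""
    lines = [line.rstrip("\n") for line in lines]
    for i in range(0, len(lines) - 3, 4):
        yield lines[i][1:], lines[i + 1].upper()


def find_inexact_reads(reads, transcripts):
    """Return names of reads with no exact match on either strand."""
    inexact = []
    for name, seq in reads:
        rc = reverse_complement(seq)
        if not any(seq in t or rc in t for t in transcripts.values()):
            inexact.append(name)
    return inexact


# Hypothetical usage: python check_reads.py transcriptome.fa reads.fastq
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as fa, open(sys.argv[2]) as fq:
        transcripts = parse_fasta(fa)
        bad = find_inexact_reads(parse_fastq(fq), transcripts)
    print(f"{len(bad)} reads have no exact match in the transcriptome")
```

If this reports zero reads, the simulated reads are exact copies of the transcript sequences and the mapping loss has to come from somewhere else. If it reports a non-zero count, those reads really do differ from the reference they were drawn from.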
Any suggestions as to what's going on? Am I missing any parameter of