Entering edit mode
5.7 years ago
grant.hovhannisyan
★
2.6k
I have simulated "error-free" reads from the transcriptome using randomreads.sh
from BBMap suit, but when I map them back to the genome using STAR
I get just 88% of unique mapping, and around 11% of unmapped reads.
The command for simulation is
randomreads.sh \
ref=cglab_transcriptome.fa \
out1=reads1.fastq \
build=1 \
length=100 \
reads=500000 \
replacenoref=t \
simplenames=t \
seed=-1 \
paired=f \
metagenome=f \
addpairnum=t \
snprate=0 \
midq=35 \
maxq=39 \
minq=30 \
maxsnps=0 \
maxinss=0 \
maxdels=0 \
maxsubs=0 \
maxns=0
So I assume that it should produce ideal reads without snps/snvs. However, I see that its not the case when I align the non mapped read against the corresponding transcript.
Any suggestions what's going on? Do I miss any parameter of randomreads.sh
?
You're simulating based on the transcriptome and aligning it back to the genome, right? Maybe the two references don't share a 100% overlap?
Good point, but I get the transcriptome from the genome using
gffread
.All aligners have some fuzziness built into them. They all depend on an initial match that gets extended out.
Have you tried to use
bbmap.sh
to align the reads? Some of them could also be multi-mapping and depending on how manySTAR
reports by default some alignments may get left out.Re multimappings, so I wouldn't have been surprised to see mulitmappers, but they constitute to just <1%. 10% are not mapped because of short alignments (in STAR terminology). And I don't have experience with BBmap to be honest. Is it comparable with STAR in sense of speed and accuracy?
Proof is in the pudding. Try it out yourself :-)
indeed, but old friends and old wine are the best:)
Try
bbmap.sh
out. You would not regret it.