Hello all,
I have RNA-Seq data from human cancer samples. Ultimately, I want to single out reads that do not align properly to the human reference (i.e., reads that are unaligned, discordant, split-mapped, etc.) for downstream analysis. I could do this by mapping to the genome with STAR, which is what we typically use for alignment. However, I want to find the quickest/least expensive way. The idea is to first align to the transcriptome and keep only the unaligned/ambiguous reads for further analysis. I can then align those to the genome with STAR; my guess is that this would be cheaper/faster than aligning all reads to the genome to begin with. Please correct me if this is obviously wrong.
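To make "do not align properly" concrete, here is a minimal sketch of the filtering step, assuming the transcriptome aligner writes standard SAM records. It uses only the FLAG bits defined in the SAM specification. The function names are hypothetical, and the rule itself (unmapped, mate unmapped, not a proper pair, or supplementary) is just one possible reading of "unaligned/discordant/split-mapped". Not every aligner emits supplementary records, and what counts as a proper pair depends on the aligner's settings.

```python
# SAM FLAG bits, as defined in the SAM specification.
PAIRED = 0x1
PROPER_PAIR = 0x2
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
SUPPLEMENTARY = 0x800


def is_problem_alignment(flag: int) -> bool:
    """Return True if a SAM FLAG marks a record to keep for re-alignment.

    Assumed rule (illustrative only): keep records that are unmapped,
    supplementary (split/chimeric), or paired but with an unmapped mate
    or without the proper-pair bit.
    """
    if flag & UNMAPPED:
        return True
    if flag & SUPPLEMENTARY:
        return True
    if flag & PAIRED:
        if flag & MATE_UNMAPPED:
            return True
        if not flag & PROPER_PAIR:
            return True
    return False


def problem_read_names(sam_lines):
    """Collect the names of reads with at least one problem record.

    `sam_lines` is any iterable of SAM text lines, e.g. an open file.
    Header lines (starting with '@') and blank lines are skipped.
    """
    names = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # not a valid alignment record
        if is_problem_alignment(int(fields[1])):
            names.add(fields[0])
    return names
```

The resulting set of read names could then be used to pull the matching reads out of the original FASTQ files before the second-pass genome alignment. Collecting by read name means that both mates are kept even if only one of them is flagged.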
Now my question is: what is the best approach for the initial transcriptome mapping? Based on comparing several tools (STAR, Bowtie2, Hisat2, kallisto), I think using Bowtie2 to map reads to human transcripts is the best option. STAR is generally not used for mapping to the transcriptome and has higher memory usage. Kallisto's pseudo-alignment is fast, but it wouldn't give me what I want, which is the unaligned reads back. Hisat2 seems comparable to Bowtie2, but in this case I don't need the alignment to be splice-aware, and it seems like Hisat2 is not as well maintained. Please let me know if you have done something similar or have any thoughts on this. Thanks!
Thanks for your response! I understand the cost difference may not be much... I am considering comparing this approach against simply aligning to the genome with STAR, to see whether the improvement is more than negligible. Interesting, I didn't realize salmon writes out the unmapped read names, thanks. I'll look into salmon/bbmap.