Hi guys,
I have RNA-seq data (paired-end reads) from several organ samples from 2 species: Jaculus jaculus and Jaculus hirtipes;
I mapped all samples (w/ hisat2) to Jaculus jaculus genome:
For J. jaculus I got mapping scores of around >96% and for J. hirtipes around 80%.
Then I had interest in getting the unmapped reads, so I used samtools and did:
$ samtools view -b -f 4 0x4 filename.sam > filename.unmapped.bam
in hope to BLAST these reads, I converted them to FASTA FILES with:
$ samtools fasta imput.unmapped.bam > output.unmapped.fasta
I tried to create a blast conda env with the mt and nr database and restricting it to taxid 9989 (rodents)
But it is too big database, I get server problems, and it would take me forever to do all my unmapped reads organ samples...
I lately read : https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6323668/ ; where the authors do assembly and mapping back of the unmapped RNA reads using trinity and then they re-mapped them using Bowtie2 and obtained read counts with Samtools
Could someone help me getting a better way to blast my unmapped reads for Jaculus ? or point me a place where I can learn ? Thanks
You could take a few reads and blast them at NCBI to see what kind of things you get. Considering that you got good alignments it may be best to not worry too much about the reads that did not map. There is probably a reason they did not map in first place. The "return on the investment" in terms of time you will put in for these unmapped reads may not be worth it at the end of the day.
Thank for your answer, I will try that and see if I am missing out on something :) have a great day