My data consists of 3 RNA-seq datasets and a non-model genome. I ran Trinity on each dataset to produce the transcripts, and afterward I wanted to align the transcripts to the genome so that they are correctly mapped (and, in the end, used as experimental evidence for annotation). Unfortunately, GMAP's RAM use kept growing without limit, and even after I tried to reduce its memory usage, the program couldn't align any read.
I would like to know if there is an alternative that could map the transcript data to the genome. I've seen STAR and BLAT mentioned, but I don't know whether they can be used in the same way I intended to use GMAP.
I really appreciate any help that can be provided.
If you tell us which version of GMAP you are using and the commands you used to build the index and run the alignment, someone may be able to help you with the issue.
To align transcripts to a genome, you can use Exonerate, Spaln2, BLAT, Splign, sim4, and possibly others.
First of all, thanks for your quick answer. This is what GMAP reports for its version:
GMAP: Genomic Mapping and Alignment Program Part of GMAP package, version 2017-03-17
And these are the commands I used:
I lowered the k-mer size from the default of 15 to reduce RAM usage (chapter 4). I have some experience with Exonerate, since I used it to align homologous proteins to the genome, but I wasn't sure it could also be used for transcripts, as the model for that is named "est2genome".
If you are mapping a Trinity assembly, shouldn't the output format be samse?
As far as I understand, samse stands for single-end, while this dataset was obtained using paired-end sequencing. That's why I used sampe (paired) instead of samse (single).
Your reads were paired-end, but once you assembled them with Trinity, the resulting transcripts are effectively "single-end". GMAP may be trying to map your transcripts in pairs.
I understand your point, but unfortunately the change gives the same result:
Did you get aligned reads with other software? There is a suggested set of parameters for PacBio reads here; it may be helpful for you.
Did you check whether your transcriptome is indeed D. buzzatii? BLAST a few hundred reads (or diamond blastx the whole transcriptome) and check the taxonomy distribution with KronaTools.
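To pull out "a few hundred" sequences for that BLAST check, something like the following Python sketch could work. It is only an illustration: the function names, the file names in the usage note, and the default of 300 sequences are my own assumptions, not anything from Trinity or BLAST, and it loads the whole FASTA into memory, which should be fine for a typical transcriptome assembly.

```python
import random


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def sample_fasta(in_path, out_path, n=300, seed=1):
    """Write up to n randomly chosen records to out_path; return how many were written."""
    records = list(read_fasta(in_path))
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    chosen = rng.sample(records, min(n, len(records)))
    with open(out_path, "w") as out:
        for header, seq in chosen:
            out.write(f">{header}\n{seq}\n")
    return len(chosen)
```

For example, sample_fasta("Trinity.fasta", "subset.fasta", n=300) would write a random subset that you can then BLAST (both file names are placeholders).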
It seems that, in the end, nothing was really wrong; there were just several reads that couldn't be mapped, which I suppose is to be expected with numbers like these. Even so, if something goes wrong when I work with the data later, I will check this step again.
Thanks a lot for your help.
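For anyone who wants to put a number on "several reads that couldn't be mapped", here is a minimal Python sketch that counts mapped and unmapped query sequences in a SAM file. It assumes the aligner writes unaligned sequences to the SAM output with flag bit 0x4 set (the SAM convention for "unmapped"); if your run suppresses failed alignments, the unmapped count will simply be zero. The function name is hypothetical.

```python
def count_sam(path):
    """Return (mapped, unmapped) counts of distinct query names in a SAM file."""
    mapped, unmapped = set(), set()
    with open(path) as handle:
        for line in handle:
            # Skip header lines (they start with "@") and blank lines.
            if line.startswith("@") or not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            name, flag = fields[0], int(fields[1])
            if flag & 0x4:  # bit 0x4 set: this record is unmapped
                unmapped.add(name)
            else:
                mapped.add(name)
    # A query with at least one aligned record counts as mapped.
    unmapped -= mapped
    return len(mapped), len(unmapped)
```

Calling count_sam on your output file gives the two counts, so you can see whether the unmapped fraction is small or whether essentially nothing aligned.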
Hi, I have the same problem. I am using PacBio data and ended up with a consensus FASTA. I ran this command:
gmap -D <gmap_db_location> -d hg38 -f samse -n 0 -t 12 -z sense_force hq_isoforms.fasta > hq_isoforms.fasta.sam
But I get many messages like "No paths found for c322894/1/1246", and the SAM file is empty.
GMAP version 2018-05-30
I really appreciate any help that can be provided.
Use minimap2.
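A rough sketch of what that could look like, written as a small Python wrapper so the options are spelled out. The helper names, defaults and file names are my own and purely illustrative. The minimap2 options used are the spliced-alignment preset (-ax splice, or -ax splice:hq for high-quality full-length isoforms such as PacBio Iso-Seq consensus), -uf to consider only the transcript strand when looking for splice sites, and -t for threads. Please check them against the minimap2 manual for your installed version.

```python
import subprocess


def minimap2_splice_cmd(genome_fa, transcripts_fa, threads=4, hq=False, sense_only=False):
    """Build a minimap2 command line for spliced transcript-to-genome alignment."""
    preset = "splice:hq" if hq else "splice"
    cmd = ["minimap2", "-ax", preset, "-t", str(threads)]
    if sense_only:
        cmd.append("-uf")  # only look for splice sites on the transcript strand
    cmd += [genome_fa, transcripts_fa]
    return cmd


def run_to_sam(cmd, sam_path):
    """Run the command and write its standard output (SAM) to sam_path."""
    with open(sam_path, "w") as out:
        subprocess.run(cmd, stdout=out, check=True)
```

For the Iso-Seq case above, minimap2_splice_cmd("hg38.fa", "hq_isoforms.fasta", threads=12, hq=True, sense_only=True) builds the command, and run_to_sam(cmd, "hq_isoforms.sam") runs it, assuming minimap2 is on your PATH and hg38.fa is your genome FASTA (a placeholder name). For a Trinity assembly, the plain splice preset without -uf is probably the safer starting point, since assembled transcripts are not guaranteed to be in sense orientation.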