Question: Alternative to GMAP for mapping Trinity transcripts to genome
Tye Kahn wrote:

My data consist of 3 RNA-seq datasets and a non-model genome. I ran Trinity on each dataset to assemble transcripts, and I then wanted to align those transcripts to the genome so that they can be used as experimental evidence for annotation. Unfortunately, GMAP's RAM use kept growing without limit, and even after I tried to reduce the memory usage, the program could not align any sequence.

I would like to know if there is an alternative that could map the transcripts to the genome. I've seen STAR and BLAT mentioned, but I don't know whether they can be used in the same way as I intended to use GMAP.

I really appreciate any help that can be provided.

modified 17 months ago by 8713229420 • written 3.6 years ago by Tye Kahn

If you tell us which version of GMAP you used and the commands you ran to build the index and run the alignment, someone may be able to help with the issue.

To align transcripts to a genome, you can use Exonerate, Spaln2, BLAT, Splign, sim4, possibly others.

modified 17 months ago • written 3.6 years ago by h.mon

First of all, thanks for your quick answer. This is the version that GMAP reports:

GMAP: Genomic Mapping and Alignment Program
Part of GMAP package, version 2017-03-17

And these are the commands I used:

gmap_build -d genome7 -D . -k 13 ../../data/dbuzzatii_mask.fasta

gmap -n 0 -D . -d genome7 ../../out/out_trinity_dat3.3/Trinity.fasta -f sampe > ../../out/trinity_gmap.sam

I lowered the k-mer size from the default of 15 to 13 in order to decrease the RAM usage (chapter 4). I have some experience with Exonerate, since I used it to align homologous proteins to the genome, but I wasn't sure it could also be used for transcripts, because the relevant model is named "est2genome".

modified 3.6 years ago • written 3.6 years ago by Tye Kahn
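
On the Exonerate question: est2genome is Exonerate's model for aligning spliced EST or cDNA sequences to genomic DNA (it models introns in the target), so assembled transcripts are a reasonable input. A minimal sketch, assuming Exonerate is installed; the function name and the file names in the usage line are hypothetical, and Exonerate can be slow on a whole transcriptome:

```shell
# Align transcripts to a genome with Exonerate's est2genome model.
# $1 = transcript FASTA (query), $2 = genome FASTA (target), $3 = output file
# Writes GFF on the target coordinates and keeps only the best hit per query.
align_est2genome() {
    exonerate --model est2genome \
        --query "$1" --target "$2" \
        --showtargetgff yes --showalignment no \
        --bestn 1 > "$3"
}
```

Usage would look like `align_est2genome Trinity.fasta genome.fasta trinity_exonerate.gff`. Keeping one hit per transcript is only a starting point; raise `--bestn` if you expect paralogs.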

If you are mapping a Trinity assembly, shouldn't the output format be samse?

written 3.6 years ago by h.mon

As far as I understand, samse is meant for single-end data, while this dataset was sequenced as paired-end. That is why I used sampe (paired) rather than samse (single).

written 3.6 years ago by Tye Kahn

Your reads were paired-end, but once you assembled them with Trinity, the resulting transcripts are effectively "single-end". GMAP may be trying to map your transcripts in pairs.

written 3.6 years ago by h.mon

I understand your point, but unfortunately the change gives the same result:

gmap_build --db=genome8 --dir=/home/jochoteco/local/gmap-2017-03-17 --kmer=13 /home/jochoteco/data/dbuzzatii_smask.fasta 
gmap -n 0 --dir=/home/jochoteco/local/gmap-2017-03-17 --db=genome8 /home/jochoteco/out/out_trinity_dat3.3/Trinity.fasta --format=samse > trinity_gmap.sam
Starting alignment
No paths found for TRINITY_DN22114_c0_g1_i1
No paths found for TRINITY_DN22104_c0_g1_i1
No paths found for TRINITY_DN22113_c0_g1_i1

modified 3.6 years ago • written 3.6 years ago by Tye Kahn
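
To judge how widespread those "No paths found" messages are, one could compare how many distinct transcripts triggered them with the total number of transcripts in the FASTA. A rough sketch, assuming GMAP's stderr was saved to a file (the name gmap.log and the function names are hypothetical; the log could be captured by adding `2> gmap.log` to the gmap command):

```shell
# Count transcripts in a FASTA file (one header line per sequence).
count_transcripts() {
    grep -c '^>' "$1"
}

# Count distinct query names reported as "No paths found".
# Assumes $1 is a saved copy of GMAP's stderr.
count_unaligned() {
    grep '^No paths found for ' "$1" | sort -u | wc -l | tr -d ' '
}

# Print a one-line summary: $1 = transcript FASTA, $2 = GMAP log.
summarize_unaligned() {
    total=$(count_transcripts "$1")
    missing=$(count_unaligned "$2")
    awk -v t="$total" -v m="$missing" 'BEGIN {
        pct = (t > 0) ? 100 * m / t : 0
        printf "%d of %d transcripts without a path (%.1f%%)\n", m, t, pct
    }'
}
```

Running `summarize_unaligned Trinity.fasta gmap.log` shows whether the messages cover a handful of transcripts or nearly all of them, which points to quite different problems.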

Did you get any aligned sequences with other software? There is a suggestion of parameters for PacBio reads here; it may be helpful for you.

Did you check if your transcriptome is indeed D. buzzatii? Blast a few hundred reads (or diamond blastx the whole transcriptome) and check the taxonomy distribution with KronaTools.

written 3.6 years ago by h.mon
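
For the "blast a few hundred" check, a small subset of the assembly is enough. A minimal sketch that pulls the first N records out of a FASTA file (the function name and the default of 200 are arbitrary choices for illustration; taking the first records rather than a random sample is a simplification):

```shell
# Print the first N records of a FASTA file.
# $1 = FASTA file, $2 = number of records (defaults to 200)
fasta_head() {
    awk -v n="${2:-200}" '
        /^>/     { seen++ }    # each header starts a new record
        seen > n { exit }      # stop before printing record n+1
                 { print }
    ' "$1"
}
```

Something like `fasta_head Trinity.fasta 300 > subset.fasta` gives a file that can be submitted to BLAST and summarized by taxonomy.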

It seems that, in the end, nothing was really wrong: several sequences simply couldn't be mapped, which I suppose is to be expected with this many of them. Even so, if something goes wrong when I work with the data later, I will check this step again.

Thanks a lot for your help.

written 3.6 years ago by Tye Kahn

Hi, I have the same problem. I am using PacBio data and ended up with a consensus FASTA. I ran this command:

gmap -D <gmap_db_location> -d hg38 -f samse -n 0 -t 12 -z sense_force hq_isoforms.fasta > hq_isoforms.fasta.sam

But I get many messages like "No paths found for c322894/1/1246", and the SAM file is empty.

GMAP version 2018-05-30

I really appreciate any help that can be provided.

modified 17 months ago • written 17 months ago by 8713229420

Use minimap2.

written 17 months ago by WouterDeCoster
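
For readers who want to try that suggestion, a minimal sketch of a spliced alignment with minimap2, assuming it is installed. The wrapper function and the output file names are hypothetical; `-ax splice` is minimap2's general spliced-alignment preset with SAM output, and its documentation lists `-ax splice:hq -uf` for final PacBio Iso-Seq or traditional cDNA sequences:

```shell
# Align transcripts or consensus isoforms to a genome with minimap2.
# $1 = genome FASTA, $2 = transcript FASTA, $3 = output SAM
# Any further arguments replace the default "-ax splice" options.
map_transcripts() {
    genome=$1
    transcripts=$2
    out=$3
    shift 3
    [ "$#" -gt 0 ] || set -- -ax splice
    minimap2 "$@" "$genome" "$transcripts" > "$out"
}
```

For the Trinity case this could be `map_transcripts dbuzzatii_smask.fasta Trinity.fasta trinity_mm2.sam`, and for the PacBio isoforms `map_transcripts hg38.fa hq_isoforms.fasta hq_isoforms.sam -ax splice:hq -uf` (hg38.fa stands in for wherever the genome FASTA lives).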