Question

Alternative to GMAP for mapping Trinity transcripts to genome

0

7.0 years ago

Tye Kahn ▴ 20

My data consist of three RNA-seq datasets and a non-model genome. I ran Trinity on each dataset to assemble the transcripts, and afterward I wanted to align the transcripts to the genome so that they are correctly mapped (and in the end used as experimental evidence for annotation). Unfortunately, GMAP's RAM use kept growing without limit, and even after I tried to reduce the memory usage, the program couldn't align a single transcript.

I would like to know if there's any alternative that could map the transcript data to the genome. I've seen that STAR and BLAT are sometimes mentioned, but I don't know whether they can be used in the same way as I intended with GMAP.

I really appreciate any help that can be provided.

RNA-Seq Trinity GMAP transcripts mapping • 5.1k views

ADD COMMENT • link updated 4.9 years ago by 871322942 • 0 • written 7.0 years ago by Tye Kahn ▴ 20

1

If you tell us which version of GMAP you are using and the commands you used to build the index and run the alignment, someone may be able to help you with the issue.

To align transcripts to a genome, you can use Exonerate, Spaln2, BLAT, Splign, sim4, possibly others.

ADD REPLY • link 4.9 years ago by h.mon 35k

0

First of all, thanks for your quick answer. This is the GMAP version that the program reports:

GMAP: Genomic Mapping and Alignment Program
Part of GMAP package, version 2017-03-17

And these are the commands I used:

gmap_build -d genome7 -D . -k 13 ../../data/dbuzzatii_mask.fasta

gmap -n 0 -D . -d genome7 ../../out/out_trinity_dat3.3/Trinity.fasta -f sampe > ../../out/trinity_gmap.sam

I reduced the k-mer size from the default of 15 to 13 in order to decrease the RAM usage (chapter 4). I have a bit of experience with Exonerate, as I used it to align homologous proteins to the genome, but I wasn't sure whether it could also be used for transcripts, as the model used for that is named "est2genome".

ADD REPLY • link 7.0 years ago by Tye Kahn ▴ 20

0

If you are mapping a Trinity assembly, shouldn't the output format be samse?

ADD REPLY • link 7.0 years ago by h.mon 35k

0

As far as I understand, samse stands for single-end, while this dataset was obtained using paired-end sequencing. That's why I used sampe (paired) rather than samse (single).

ADD REPLY • link 7.0 years ago by Tye Kahn ▴ 20

1

Your reads were paired-end, but once you assembled them with Trinity, the resulting transcripts are effectively "single-end". GMAP may be trying to map your transcripts in pairs.

ADD REPLY • link 7.0 years ago by h.mon 35k

0

I understand your point, but unfortunately the change gives the same result:

gmap_build --db=genome8 --dir=/home/jochoteco/local/gmap-2017-03-17 --kmer=13 /home/jochoteco/data/dbuzzatii_smask.fasta 
gmap -n 0 --dir=/home/jochoteco/local/gmap-2017-03-17 --db=genome8 /home/jochoteco/out/out_trinity_dat3.3/Trinity.fasta --format=samse > trinity_gmap.sam
Starting alignment
No paths found for TRINITY_DN22114_c0_g1_i1
No paths found for TRINITY_DN22104_c0_g1_i1
No paths found for TRINITY_DN22113_c0_g1_i1

ADD REPLY • link 7.0 years ago by Tye Kahn ▴ 20
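
To judge how widespread the "No paths found" messages are, it can help to compare their number with the number of sequences in the FASTA file. The following is a minimal sketch, assuming the messages (which appear on the terminal even though the SAM output is redirected) were saved to a log file, for instance by appending `2> gmap.log` to the gmap command. The function names and the log file name are hypothetical.

```shell
# Count FASTA records (header lines starting with ">").
count_fasta_records() {
    grep -c '^>' "$1"
}

# Count "No paths found" messages in a saved GMAP log.
count_no_paths() {
    grep -c '^No paths found for ' "$1"
}

# Example with hypothetical file names:
#   echo "$(count_no_paths gmap.log) of $(count_fasta_records Trinity.fasta) transcripts had no path"
```

If only a small fraction of the transcripts is affected, the run may be fine overall; if nearly all of them are, the index or the input is more likely to be the problem.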

0

Did you get aligned reads with other software? There is a suggestion of parameters for PacBio reads here; it may be helpful for you.

Did you check if your transcriptome is indeed D. buzzatii? Blast a few hundred reads (or diamond blastx the whole transcriptome) and check the taxonomy distribution with KronaTools.

ADD REPLY • link 7.0 years ago by h.mon 35k
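
For the suggested taxonomy check, a few hundred sequences first have to be pulled out of the assembly. A minimal sketch of one way to do that is shown here: it keeps every STEP-th record of a FASTA file, which is a simplification (a random sample would be an alternative). The function name and file names are hypothetical.

```shell
# Print every STEP-th record (1st, 1+STEP-th, ...) of a FASTA file.
# Works for multi-line records: sequence lines follow the fate of
# the most recent header line.
subsample_fasta() {
    awk -v step="$2" '
        /^>/ { n++; keep = ((n - 1) % step == 0) }
        keep
    ' "$1"
}

# Example with hypothetical file names:
#   subsample_fasta Trinity.fasta 500 > Trinity.subsample.fasta
```

The resulting subset can then be searched with BLAST, and the hits summarised by taxonomy.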

0

It seems that, in the end, nothing was really wrong; there were simply several reads that couldn't be mapped, which is to be expected with such numbers, I suppose. Even so, if something goes wrong when I work with the data, I will check this process again.

Thanks a lot for your help.

ADD REPLY • link 7.0 years ago by Tye Kahn ▴ 20
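
Whether "several" unmapped sequences is a small or a large share can be checked directly on the SAM file. This is a rough sketch, assuming the SAM output contains a record with the unmapped flag (0x4) for each query that could not be placed; the function name and file name are hypothetical.

```shell
# Summarise a SAM file by FLAG: count records with and without the
# "unmapped" bit (0x4). Secondary (0x100) and supplementary (0x800)
# records are skipped so they do not inflate the mapped count.
sam_mapping_summary() {
    awk -F '\t' '
        /^@/ { next }                              # header lines
        int($2 / 256) % 2 == 1 { next }            # secondary alignment
        int($2 / 2048) % 2 == 1 { next }           # supplementary alignment
        int($2 / 4) % 2 == 1 { unmapped++; next }  # unmapped query
        { mapped++ }
        END { printf "mapped=%d unmapped=%d\n", mapped, unmapped }
    ' "$1"
}

# Example with a hypothetical file name:
#   sam_mapping_summary trinity_gmap.sam
```

The bit tests use integer division rather than a bitwise operator so that the sketch does not depend on a particular awk implementation.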

0

Hi, I have the same problem. I used PacBio data and ended up with a consensus FASTA file. I used this command:

gmap -D <gmap_db_location> -d hg38 -f samse -n 0 -t 12 -z sense_force hq_isoforms.fasta > hq_isoforms.fasta.sam

But there are many messages like "No paths found for c322894/1/1246", and the SAM file is empty.

GMAP version 2018-05-30

I really appreciate any help that can be provided.

ADD REPLY • link 4.9 years ago by 871322942 • 0

0

Use minimap2.

ADD REPLY • link 4.9 years ago by WouterDeCoster 47k
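
As an illustration of the minimap2 suggestion, a spliced alignment of transcripts to a genome might look like the sketch below. The wrapper function, the `DRY_RUN` variable, and the file names are hypothetical; `-a` (SAM output) and `-x splice` (spliced alignment preset) are minimap2 options.

```shell
# Align transcripts to a genome with minimap2 in spliced mode.
# With DRY_RUN=1 the command is only printed, so it can be
# inspected before anything is run.
map_transcripts_minimap2() {
    genome=$1; transcripts=$2; out_sam=$3
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "minimap2 -ax splice $genome $transcripts > $out_sam"
    else
        minimap2 -ax splice "$genome" "$transcripts" > "$out_sam"
    fi
}

# Example with hypothetical file names:
#   DRY_RUN=1 map_transcripts_minimap2 genome.fasta Trinity.fasta trinity_mm2.sam
```

For high-quality PacBio Iso-Seq consensus sequences, the minimap2 documentation suggests `-ax splice:hq -uf` instead of `-ax splice`; which preset suits a given dataset is worth checking against that documentation.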