Gmap aligner taking too long

0

6.5 years ago

KVC_bioinfo ▴ 590

Hello all,

I am running GMAP to align nanopore reads to the human genome. The query file is approximately 2 GB. The alignment is taking extremely long: after more than 12 hours, the output .sam file is still only 130 MB.

I am using the following command:

path/to/gmapl -D /path/to/dir/ONT -d ONT /path/to/sample/fasta -t 4 -n 0 -f samse > /path/to/output/output.sam

Am I missing anything here?

Could someone please help me? Thanks.

gmap aligner • 2.4k views

ADD COMMENT • link 6.5 years ago by KVC_bioinfo ▴ 590

2

Asking why program X is taking too long has no good answer as long as the program is still running and producing output. Since you are likely running this for the first time, you also have no past results to compare against. It could be your data, your hardware, and/or options that you are missing or misusing (for example, the mmap case).

ADD REPLY • link 6.5 years ago by GenoMax 142k

0

I was trying to understand whether something in my command is missing or wrong, which might be why it takes so long. I thought someone who has already used GMAP might be able to spot it. About mmap: I do not understand what that is, but it seems like it might slow the process.

So all I am trying to do here is understand whether I am missing anything. Thank you.

ADD REPLY • link 6.5 years ago by KVC_bioinfo ▴ 590

1

Have you tried minimap2? Fast and can do spliced alignment.

ADD REPLY • link 6.5 years ago by WouterDeCoster 47k

0

Not yet. I need GMAP results because this is part of a comparison of aligners.

ADD REPLY • link 6.5 years ago by KVC_bioinfo ▴ 590

0

Can anyone who has used the GMAP aligner before recognize anything wrong in my command?

I am still struggling with it.

ADD REPLY • link 6.5 years ago by KVC_bioinfo ▴ 590

2

Have you tried just splitting up your input file of reads into 20, 50, or 100 subsets and submitting them as 20, 50, or 100 jobs to a cluster?

ADD REPLY • link 6.5 years ago by Philipp Bayer 8.4k
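
The splitting step suggested above can be sketched in shell. This is only a minimal sketch, assuming the reads are in plain FASTA format; `split_fasta` and the `_NNN.fasta` chunk naming are hypothetical names chosen for illustration. Records are dealt out round-robin so that the chunks end up with similar read counts.

```shell
# Split a FASTA file into N chunks, distributing records round-robin.
# split_fasta is a hypothetical helper; the <prefix>_NNN.fasta naming is arbitrary.
split_fasta() {
    input="$1"      # FASTA file of reads
    n_chunks="$2"   # number of output files to create
    prefix="$3"     # output prefix, e.g. chunks/part
    awk -v n="$n_chunks" -v prefix="$prefix" '
        # A header line starts a new record: pick the next chunk file in turn.
        /^>/ { out = sprintf("%s_%03d.fasta", prefix, rec % n); rec++ }
        # Write every line of the current record to its chunk
        # (lines before the first header are skipped).
        out != "" { print > out }
    ' "$input"
}
```

For example, `split_fasta /path/to/sample/fasta 20 chunks/part` would write `chunks/part_000.fasta` through `chunks/part_019.fasta` (the `chunks` directory must already exist). Each chunk can then be passed to the same gmapl command as a separate job; how the jobs are submitted depends on the cluster scheduler. Some awk implementations may limit how many files can be open at once, so very large chunk counts might need a different approach.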

0

I did not try that. But I made a file with a subset of the original, containing a few thousand reads, to check whether it works, and that is also taking forever.

ADD REPLY • link 6.5 years ago by KVC_bioinfo ▴ 590

0

But I will try doing it the way you suggested.

ADD REPLY • link 6.5 years ago by KVC_bioinfo ▴ 590

0

I tried splitting it; it is still taking extremely long.

ADD REPLY • link 6.5 years ago by KVC_bioinfo ▴ 590

0

Hello, I tried doing that. It still took forever, and I had to kill the job. I assume GMAP does not work well with 1D reads.

ADD REPLY • link 6.5 years ago by KVC_bioinfo ▴ 590

0

@WouterDeCoster: I am currently trying minimap. Thank you for the suggestion.

ADD REPLY • link 6.5 years ago by KVC_bioinfo ▴ 590

0

I just came across this in the manual. I am not sure what mmap and allocate are. It mentions "If mmap not available and allocate not chosen, then will use fileio (very slow)". Is that what is happening here?

Computation options

-B, --batch=INT        Batch mode (default = 2)
                       Mode     Offsets       Positions       Genome
                         0      see note      mmap            mmap
                         1      see note      mmap & preload  mmap
            (default)    2      see note      mmap & preload  mmap & preload
                         3      see note      allocate        mmap & preload
                         4      see note      allocate        allocate
                         5      expand        allocate        allocate
                       Note: For a single sequence, all data structures use mmap
                       If mmap not available and allocate not chosen, then will use fileio (very slow)
                       Note about --batch and offsets: Expansion of offsets can be controlled
                       independently by the --expand-offsets flag.  The --batch=5 option is equivalent
                       to --batch=4 plus --expand-offsets=1

ADD REPLY • link 6.5 years ago by KVC_bioinfo ▴ 590
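
To make the batch table above concrete: the mode is selected with `-B`/`--batch` on the GMAP command line. Below is a minimal sketch that reuses the placeholder paths from the original command; `gmap_cmd` is a hypothetical helper that only prints the command, so it can be inspected before anything is run. Going by the table, `--batch=4` allocates both positions and genome in memory instead of relying on mmap. Whether that speeds up this particular run is an assumption that would need testing, and it presumably requires enough RAM to hold the index.

```shell
# Print the gmapl command from the question with an explicit --batch mode.
# gmap_cmd is a hypothetical helper; the paths are the placeholders used in the question.
gmap_cmd() {
    batch_mode="$1"   # 0-5, as listed in the manual's batch table
    reads="$2"        # FASTA file of reads
    printf 'path/to/gmapl -D /path/to/dir/ONT -d ONT --batch=%s -t 4 -n 0 -f samse %s\n' \
        "$batch_mode" "$reads"
}

# Show the command for batch mode 4 without executing it.
gmap_cmd 4 /path/to/sample/fasta
```

Once the printed command looks right, it can be run with its output redirected to a .sam file as in the original command.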