GMAP aligner taking too long
3.4 years ago
KVC_bioinfo ▴ 530

Hello all,

I am running GMAP to align nanopore reads to the human genome. The query file is approximately 2 GB. The alignment is taking extremely long: after more than 12 hours, the output .sam file is only 130 MB.

I am using the following command:

path/to/gmapl -D /path/to/dir/ONT -d ONT /path/to/sample/fasta -t 4 -n 0 -f samse > /path/to/output/output.sam

Am I missing anything here?

Could someone please help me? Thanks.

gmap aligner • 1.4k views

Asking why program X is taking too long has no good answer as long as the program is still running and producing output. Since you are likely running this for the first time, you have no past results to compare against either. It could be your data, your hardware, and/or options that you may be missing or misusing (the mmap case, for example).
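One way to get a reference point while the job is still running is to compare how many reads have reached the SAM file with how many are in the query. The sketch below is only a rough estimate. It assumes every processed read produces at least one SAM record, and the function names are made up for illustration.

```shell
#!/bin/sh
# Rough progress check for a running alignment (illustrative helpers,
# not part of GMAP).

# Count FASTA records: every record starts with a '>' header line.
count_fasta_reads() {
    grep -c '^>' "$1"
}

# Count distinct read names in a SAM file, skipping '@' header lines.
# A read with several records (e.g. chimeric parts) is counted once.
count_sam_reads() {
    grep -v '^@' "$1" | cut -f 1 | sort -u | wc -l | tr -d ' '
}

# Print the integer percentage of query reads that appear in the SAM file.
progress_percent() {
    total=$(count_fasta_reads "$1")
    done_reads=$(count_sam_reads "$2")
    if [ "$total" -eq 0 ]; then
        echo 0
    else
        echo $(( 100 * done_reads / total ))
    fi
}

# Usage (paths taken from the question):
#   progress_percent /path/to/sample/fasta /path/to/output/output.sam
```

Running it twice a few minutes apart gives a reads-per-minute figure, and from that a projected total run time.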


I was trying to understand whether something in my command is missing or wrong and might be making it take longer. I thought someone who has already used GMAP might be able to recognize it. About mmap: I did not understand what that is, but it seems like it might slow the process.

So all I am trying to do here is understand whether I am missing anything. Thank you.


Have you tried minimap2? Fast and can do spliced alignment.


Not yet. I need GMAP results because this is part of a comparison of aligners.


Can anyone who has used the GMAP aligner before recognize anything wrong in my command?

I am still struggling with it.


Have you tried just splitting up your input file of reads into 20, 50, 100 subsets and submitting 20, 50, 100 jobs to a cluster?
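A minimal sketch of the splitting step, assuming the reads are in FASTA format. The function name `split_fasta` and the `chunk` prefix are hypothetical, and job submission is left out because it depends on the scheduler.

```shell
#!/bin/sh
# split_fasta INPUT N PREFIX
# Distribute FASTA records round-robin into N files named
# PREFIX.1.fa ... PREFIX.N.fa. Multi-line sequences stay with their header.
split_fasta() {
    awk -v n="$2" -v prefix="$3" '
        # A header line starts a new record: pick the next output file.
        /^>/ { out = prefix "." (count % n + 1) ".fa"; count++ }
        # Write every line of the current record to that file.
        out != "" { print > out }
    ' "$1"
}

# Usage: split into 20 chunks, then run the command from the question
# once per chunk (or submit each as a separate cluster job):
#   split_fasta /path/to/sample/fasta 20 chunk
#   for f in chunk.*.fa; do
#       path/to/gmapl -D /path/to/dir/ONT -d ONT "$f" -t 4 -n 0 -f samse > "${f%.fa}.sam"
#   done
```

Round-robin assignment keeps the chunks close to equal in read count without a first pass to count records. Some awk implementations limit the number of simultaneously open files, so very large N may need gawk.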


I did not try that. But I made a subset of the original file with a few thousand reads to check whether it works, and that is also taking forever.


But I will try it the way you suggested.


I tried splitting it, but it is still taking extremely long.


Hello, I tried doing that. It still took forever, and I had to kill the job. I assume GMAP does not work well with 1D reads.


@WouterDeCoste: I am currently trying minimap. Thank you for the suggestion.


I just came across this in the manual. I am not sure what mmap and allocate are. It mentions "If mmap not available and allocate not chosen, then will use fileio (very slow)". Is that what is happening here?

Computation options

  -B, --batch=INT    Batch mode (default = 2)

                     Mode          Offsets    Positions        Genome
                     0             see note   mmap             mmap
                     1             see note   mmap & preload   mmap
                     2 (default)   see note   mmap & preload   mmap & preload
                     3             see note   allocate         mmap & preload
                     4             see note   allocate         allocate
                     5             expand     allocate         allocate

                     Note: For a single sequence, all data structures use mmap
                     If mmap not available and allocate not chosen, then will use fileio (very slow)

                     Note about --batch and offsets: Expansion of offsets can be controlled
                     independently by the --expand-offsets flag.  The --batch=5 option is equivalent
                     to --batch=4 plus --expand-offsets=1
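
For context, mmap maps the index files into the process's address space so that pages are read from disk on demand, while allocate reads the files fully into RAM up front. A rough sketch of choosing between the two follows. The rule of thumb in `suggest_batch_mode` is an assumption for illustration, not something the GMAP manual states, and the function name is made up.

```shell
#!/bin/sh
# suggest_batch_mode INDEX_KB AVAIL_KB
# Hypothetical rule of thumb (not from the GMAP manual): if the whole
# index fits in available RAM, suggest --batch=4 (allocate everything);
# otherwise stay with the default --batch=2 (mmap with preload).
suggest_batch_mode() {
    if [ "$2" -gt "$1" ]; then
        echo 4
    else
        echo 2
    fi
}

# Usage on Linux, assuming the index lives under the -D directory
# from the question:
#   index_kb=$(du -sk /path/to/dir/ONT | cut -f 1)
#   avail_kb=$(awk '/MemAvailable/ { print $2 }' /proc/meminfo)
#   suggest_batch_mode "$index_kb" "$avail_kb"
```

If the suggested mode differs from the default, it can be passed to the original command with `-B`, as listed in the help text above.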