6.5 years ago
KVC_bioinfo
Hello all,
I am running GMAP on nanopore sequences, aligning them to the human genome. The query file is approximately 2 GB. The alignment is taking extremely long: it has been more than 12 hours and the output .sam file is only 130 MB so far.
I am using the following command:
path/to/gmapl -D /path/to/dir/ONT -d ONT /path/to/sample/fasta -t 4 -n 0 -f samse > /path/to/output/output.sam
Am I missing anything here?
Could someone please help me? Thanks.
Asking why program X is taking too long has no good answers as long as the program is still running/producing output. Since you are likely running this for the first time, you have no reference (past results) to compare against either. It could be your data, your hardware, and/or any options that you may be missing or misusing (mmap case).
I was trying to understand whether something in my command is missing or wrong, which might be why it takes longer. I thought someone who has already used it might be able to recognize it. About mmap: I did not understand what that is, but it seems like it might slow the process.
So all I am trying to do here is understand whether I am missing anything. Thank you.
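One way to get a rough reference point while the job is still running is to compare how many reads already appear in the output .sam file with how many are in the input FASTA. The sketch below is only an illustration: the function names are made up for this example, and it assumes that every processed read produces at least one SAM line, that reads are processed at a roughly constant rate (nanopore read lengths vary a lot, so this is crude), and that the last line of a file still being written may be incomplete.

```python
def count_fasta_records(fasta_path):
    """Count FASTA records by counting header lines that start with '>'."""
    with open(fasta_path) as fasta:
        return sum(1 for line in fasta if line.startswith(">"))


def count_sam_queries(sam_path):
    """Count distinct read names (QNAME, first column) in a SAM file.

    Header lines start with '@' and are skipped. Counting distinct names
    avoids double-counting reads that have more than one alignment line.
    """
    names = set()
    with open(sam_path) as sam:
        for line in sam:
            if line.startswith("@") or not line.strip():
                continue
            names.add(line.split("\t", 1)[0])
    return len(names)


def estimate_remaining_hours(fasta_path, sam_path, elapsed_hours):
    """Extrapolate remaining run time from reads done so far (assumes a constant rate)."""
    total = count_fasta_records(fasta_path)
    done = count_sam_queries(sam_path)
    if done == 0:
        return None  # nothing written yet, no basis for an estimate
    reads_per_hour = done / elapsed_hours
    return (total - done) / reads_per_hour
```

Calling `estimate_remaining_hours("/path/to/sample/fasta", "/path/to/output/output.sam", 12)` would give a ballpark figure for the run described in the question. It says nothing about why the run is slow, only whether it is on track to finish in a reasonable time.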
Have you tried minimap2? Fast and can do spliced alignment.
Not yet. I need gmap results because this is part of the comparison for aligners.
Can anyone who has used the GMAP aligner before recognize anything wrong in my command?
I am still struggling with it.
Have you tried just splitting up your input file of reads into 20, 50, 100 subsets and submitting 20, 50, 100 jobs to a cluster?
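The splitting suggested here could be sketched roughly as follows. This is a minimal illustration and not a tested pipeline: the function name `split_fasta` and the `chunk_000.fasta` naming scheme are made up for this example, and it assumes a plain (uncompressed) FASTA file in which every record starts with a `>` header line.

```python
from pathlib import Path


def split_fasta(fasta_path, n_chunks, out_dir):
    """Distribute FASTA records round-robin into n_chunks files and return their paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: chunk_000.fasta, chunk_001.fasta, ...
    paths = [out_dir / f"chunk_{i:03d}.fasta" for i in range(n_chunks)]
    handles = [path.open("w") for path in paths]
    try:
        record_index = -1  # no record seen yet
        with open(fasta_path) as fasta:
            for line in fasta:
                if line.startswith(">"):
                    record_index += 1  # a header line starts a new record
                if record_index >= 0:  # ignore anything before the first header
                    handles[record_index % n_chunks].write(line)
    finally:
        for handle in handles:
            handle.close()
    return paths
```

Each resulting chunk can then be passed to the same alignment command as its own cluster job. Round-robin assignment keeps the chunks similar in read count, though not necessarily in total bases, since nanopore read lengths vary widely.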
I did not try that. But I made a file with a subset of the original, a few thousand reads, to check whether it works, and that is also taking forever.
But I will try doing the way you suggested.
I tried splitting it; it is still taking extremely long.
Hello, I tried doing that. It still took forever and I had to kill the job. I assume GMAP does not work well with 1D reads.
@WouterDeCoste: I am currently trying minimap. Thank you for the suggestion.
I just came across this in the manual. I am not sure what mmap and allocate are. It mentions "If mmap not available and allocate not chosen, then will use fileio (very slow)". Is that what is happening here?
Computation options
-B, --batch=INT Batch mode (default = 2)