4 months ago
Nikesh
Hi,
I have a protein sequence file (about 14.9 GB) in FASTA format. Each sequence has an ORF ID in the header line. I want to find the KEGG Orthology (KO) IDs that match these ORFs.
Can someone please suggest a tool or workflow that can handle large files and help me map ORF IDs to KO IDs?
Thanks in advance!
Hi, I tried to work with this, but some errors are occurring. Do you have code or any source material to work on this? Mensur Dlakic
I don't think anyone can help you when the only feedback you provide is "there are some errors occurring." If I told you that I tried to build a house but there were some problems, would you be able to offer any advice to me?
What I do know is that when I installed all the dependencies outlined on that GitHub page and provided correct input files, everything worked. An educated guess is that you didn't do one or the other.
Mensur Dlakic
Hi, I set up the environment on HCC, and my FASTA file contains 98 sequences. This is my SLURM script; I've tried running it with different time limits, without success.
It keeps giving the following error. I also tried changing the CPU allocation.
“slurmstepd: error: * JOB 10654468 ON c2023 CANCELLED AT 2025-06-10T21:46:45 DUE TO TIME LIMIT *”
What should I do? What could be the issue?
You are asking for one minute less than 6 hours in your SLURM request, so the job is getting killed once that limit is reached. Ask for more time with
--time=1-0
(this would be one day).
What GenoMax said. I suggest you inquire about the SLURM time limit and set it to the maximum value allowed. This would be 6 days:
Also, why not ask for more than 8 CPUs?
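To sanity-check the time requests discussed above, here is a minimal Python sketch that converts a SLURM `--time` string into seconds. It assumes the formats listed in the sbatch documentation (`minutes`, `minutes:seconds`, `hours:minutes:seconds`, `days-hours`, `days-hours:minutes`, `days-hours:minutes:seconds`). The helper name `slurm_time_to_seconds` is hypothetical and not part of SLURM, and `5:59:00` is only an assumed spelling of "one minute less than 6 hours", since the original script is not shown here.

```python
def slurm_time_to_seconds(spec: str) -> int:
    """Convert a SLURM --time string to seconds.

    Hypothetical helper for illustration; not part of SLURM itself.
    """
    spec = spec.strip()
    if "-" in spec:
        # Day-prefixed forms: days-hours[:minutes[:seconds]]
        day_part, rest = spec.split("-", 1)
        days = int(day_part)
        fields = [int(x) for x in rest.split(":")]
        if len(fields) > 3:
            raise ValueError(f"unrecognised time spec: {spec!r}")
        # Pad missing minutes/seconds with zeros
        hours, minutes, seconds = (fields + [0, 0])[:3]
    else:
        days = 0
        fields = [int(x) for x in spec.split(":")]
        if len(fields) == 1:
            # A bare number means minutes
            hours, minutes, seconds = 0, fields[0], 0
        elif len(fields) == 2:
            # minutes:seconds
            hours, (minutes, seconds) = 0, fields
        elif len(fields) == 3:
            # hours:minutes:seconds
            hours, minutes, seconds = fields
        else:
            raise ValueError(f"unrecognised time spec: {spec!r}")
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


if __name__ == "__main__":
    # Assumed original request, the one-day suggestion, and the six-day suggestion
    for spec in ("5:59:00", "1-0", "6-0"):
        print(spec, "->", slurm_time_to_seconds(spec), "seconds")
```

Whether a six-day request is accepted depends on the partition's configured limit, so it is worth checking that with the cluster's documentation or support before submitting.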