Question

How can I find KO IDs for ORF sequences in a large FASTA file?

6 months ago

Nikesh • 0

Hi,

I have a protein sequence file (about 14.9 GB) in FASTA format. Each sequence has an ORF ID in the header line. I want to find the KEGG Orthology (KO) IDs that match these ORFs.

Can someone please suggest a tool or workflow that can handle large files and help me map ORF IDs to KO IDs?

Thanks in advance!

KEGG ORF • 1.0k views

ADD COMMENT • link updated 4 months ago by Mensur Dlakic ★ 30k • written 6 months ago by Nikesh • 0

GenoMax · Answer 1 · 2025-05-01

6 months ago

Mensur Dlakic ★ 30k

There is a tool made exactly for that purpose:

https://github.com/takaram/kofam_scan
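
For an input as large as the 14.9 GB file in the question, one option is to split the FASTA into chunks and annotate each chunk as a separate job. This is only a sketch of a general approach, not something taken from the KofamScan documentation; the function name `split_fasta`, the chunk size, and the file names are hypothetical.

```shell
#!/bin/bash
# split_fasta INPUT RECORDS_PER_CHUNK PREFIX
# Writes PREFIX_0001.faa, PREFIX_0002.faa, ... with at most
# RECORDS_PER_CHUNK sequences in each file. (Hypothetical helper.)
split_fasta() {
  local input=$1 per_chunk=$2 prefix=$3
  awk -v n="$per_chunk" -v prefix="$prefix" '
    /^>/ {
      # Start a new chunk file every n header lines.
      if (count % n == 0) {
        if (out != "") close(out)
        out = sprintf("%s_%04d.faa", prefix, ++chunk)
      }
      count++
    }
    # Lines before the first header are skipped.
    out != "" { print > out }
  ' "$input"
}

# Example call (file name and chunk size are placeholders):
# split_fasta proteins.faa 100000 chunk
```

Each chunk can then be passed to exec_annotation in place of the full file, and the per-chunk outputs concatenated afterwards.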

ADD COMMENT • link 6 months ago by Mensur Dlakic ★ 30k

Hi Mensur Dlakic, I tried to work with this, but some errors are occurring. Do you have code or any source material I could use to work on this?

ADD REPLY • link 6 months ago by Nikesh • 0

I don't think anyone can help you when the only feedback you provide is "there are some errors occurring." If I told you that I tried to build a house but there were some problems, would you be able to offer any advice to me?

What I do know is that when I installed all the dependencies outlined on that GitHub page and provided correct input files, everything worked. An educated guess is that you didn't do one or the other.

ADD REPLY • link 6 months ago by Mensur Dlakic ★ 30k

Mensur Dlakic

Hi, I set up the environment on HCC, and my FASTA file contains 98 sequences. This is my SLURM script; I have tried running it with different time limits, without success.

#!/bin/bash
#SBATCH --job-name=kofamscan
#SBATCH --output=kofamscan.out
#SBATCH --error=kofamscan.err
#SBATCH --time=5:59:00
#SBATCH --mem=32G
#SBATCH --cpus-per-task=8


source ~/miniconda3/etc/profile.d/conda.sh
conda activate kofamscan_env

./exec_annotation \
  -o kofam_output.txt \
  -f detail-tsv \
  -p profiles/ \
  -k ko_list \
  --cpu 8 \
  test.faa

It keeps giving the following error. I also tried changing the CPU allocation.

“slurmstepd: error: * JOB 10654468 ON c2023 CANCELLED AT 2025-06-10T21:46:45 DUE TO TIME LIMIT *”

What should I do? What could be the issue?

ADD REPLY • link updated 4 months ago by GenoMax 154k • written 4 months ago by Nikesh • 0

“JOB 10654468 ON c2023 CANCELLED AT 2025-06-10T21:46:45 DUE TO TIME LIMIT”

You are asking for one minute less than 6 hours in your SLURM request, so the job is killed once that limit is reached. Ask for more time, for example --time=1-0 (one day).

ADD REPLY • link 4 months ago by GenoMax 154k

What GenoMax said. I suggest you find out the maximum SLURM time limit allowed on your cluster and request that. For example, if the maximum is 6 days:

#SBATCH --time=6-00:00:00

Also, why not ask for more than 8 CPUs?
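
One way to look up the time limit and CPU count yourself is to ask SLURM with sinfo. The sketch below wraps that in a function; the name `show_partition_limits` is made up for illustration, and the partitions and limits it lists depend entirely on your cluster.

```shell
#!/bin/bash
# Print each partition with its maximum wall time and CPUs per node.
# (Illustrative wrapper; only the sinfo call itself is a SLURM command.)
show_partition_limits() {
  if ! command -v sinfo >/dev/null 2>&1; then
    echo "sinfo not found; run this on a SLURM login node" >&2
    return 1
  fi
  # %P = partition name, %l = time limit (days-hours:minutes:seconds),
  # %c = number of CPUs per node
  sinfo -o "%P %l %c"
}
```

Calling `show_partition_limits` on a login node lists the limits, which you can then use to choose the --time and --cpus-per-task values in the job script.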

ADD REPLY • link 4 months ago by Mensur Dlakic ★ 30k