Question

Best way to provide sequences to Local ColabFold without overloading the MMseqs2 server

0

4 months ago

myoui3122010 ▴ 30

I have about 100 queries like the one below and am trying to run AlphaFold-Multimer via Local ColabFold:

>P01375_Q9VJ83
RSSSRTPSDKPVAHVVANPQAEGQLQWLNRRANALLANGVELRDNQLVVPSEGLYLIYSQVLFKGQGCPSTHVLLTHTISRIAVSYQTKVNLLSAIKSPCQRETPEGAEAKPWYEPIYLGGVFQLEKGDRLSAEINRPDYLDFAESGQVYFGIIAL:
RSSSRTPSDKPVAHVVANPQAEGQLQWLNRRANALLANGVELRDNQLVVPSEGLYLIYSQVLFKGQGCPSTHVLLTHTISRIAVSYQTKVNLLSAIKSPCQRETPEGAEAKPWYEPIYLGGVFQLEKGDRLSAEINRPDYLDFAESGQVYFGIIAL:
RSSSRTPSDKPVAHVVANPQAEGQLQWLNRRANALLANGVELRDNQLVVPSEGLYLIYSQVLFKGQGCPSTHVLLTHTISRIAVSYQTKVNLLSAIKSPCQRETPEGAEAKPWYEPIYLGGVFQLEKGDRLSAEINRPDYLDFAESGQVYFGIIAL:
RGTRCGEILCNISQYCSPFDLHCKPCADACNATSHNYQPDECKKDCQFYL:
RGTRCGEILCNISQYCSPFDLHCKPCADACNATSHNYQPDECKKDCQFYL:
RGTRCGEILCNISQYCSPFDLHCKPCADACNATSHNYQPDECKKDCQFYL

Questions

1. Should I provide each sequence pair as a separate FASTA file, or is it fine to include multiple queries in a single FASTA file?
2. If I include multiple queries in a single FASTA file, will MSA generation run only once for all queries, or will it be computed separately for each?

I would appreciate insights from those experienced with AlphaFold Multimer and MSA behavior in Local ColabFold. Thank you!

Local ColabFold • 1.3k views

ADD COMMENT • link updated 4 months ago by dthorbur ★ 3.0k • written 4 months ago by myoui3122010 ▴ 30

score 1 · Answer 1 · 2025-02-14

1

4 months ago

dthorbur ★ 3.0k

1. Should I provide each sequence pair as a separate FASTA file, or is it fine to include multiple queries in a single FASTA file?

You can run colabfold_search on an input directory containing a single FASTA file with thousands of sequences in it. It will also iterate over all the FASTA files in a directory, if that suits you better. It's down to preference.
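If you already have one FASTA file per query and would rather hand over a single file, something like the following sketch can merge them. It is plain Python and does not call ColabFold; the function name `merge_fasta_dir` and the list of accepted file extensions are assumptions made for illustration.

```python
from pathlib import Path

# Assumed set of FASTA extensions; adjust to whatever your files use.
FASTA_SUFFIXES = {".fasta", ".fa", ".faa"}


def merge_fasta_dir(in_dir, out_path):
    """Concatenate every FASTA file in in_dir into out_path.

    Returns the number of records (header lines) written.
    """
    out_path = Path(out_path)
    n_records = 0
    lines_out = []
    for path in sorted(Path(in_dir).iterdir()):
        if not path.is_file() or path.suffix.lower() not in FASTA_SUFFIXES:
            continue
        if path.resolve() == out_path.resolve():
            continue  # never read the output file back in
        for line in path.read_text().splitlines():
            line = line.strip()
            if not line:
                continue  # drop blank lines between records
            if line.startswith(">"):
                n_records += 1
            lines_out.append(line)
    out_path.write_text("\n".join(lines_out) + "\n")
    return n_records
```

Calling `merge_fasta_dir("queries/", "all_queries.fasta")` (both paths hypothetical) would leave the per-query files untouched and write one combined file next to them.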

2. If I include multiple queries in a single FASTA file, will MSA generation run only once for all queries, or will it be computed separately for each?

Regardless of how you input them, you will get an .a3m file for each sequence found in the FASTA file(s) in your input directory. If you are using an older version of ColabFold, the file names will be numeric (e.g., 1.a3m, 2.a3m), but if you are using a more recent version they will inherit the sequence name from the FASTA header.
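To check that every query ended up with an MSA, a small script along these lines may help. It is only a sketch: it assumes that named outputs are called exactly `<first word of the header>.a3m`, which may not hold if your ColabFold version rewrites unusual characters in names, and for numeric outputs it can only compare counts. The function names are hypothetical.

```python
from pathlib import Path


def read_headers(fasta_path):
    """Return the first word of every FASTA header in the file."""
    headers = []
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                parts = line[1:].split()
                headers.append(parts[0] if parts else "")
    return headers


def check_msas(headers, msa_dir):
    """Compare query headers against the .a3m files found in msa_dir."""
    stems = {p.stem for p in Path(msa_dir).glob("*.a3m")}
    if stems and all(s.isdigit() for s in stems):
        # Numeric naming (older versions): names carry no query
        # information, so only the number of files can be compared.
        n_missing = max(0, len(headers) - len(stems))
        return {"naming": "numeric", "n_missing": n_missing, "missing_names": []}
    # Named outputs (assumption: file stem equals the header's first word).
    missing = [h for h in headers if h not in stems]
    return {"naming": "named", "n_missing": len(missing), "missing_names": missing}
```

For example, `check_msas(read_headers("all_queries.fasta"), "msas/")` (hypothetical paths) returns a small dictionary listing which queries still lack an alignment.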

0

Sorry, I should've been more specific. I'm not sure I can run colabfold_search locally, since it's very resource-intensive. From your response, I take it that the form of the input doesn't matter. So would it be okay to submit 20 queries (each a trimer) to the ColabFold MMseqs2 server?

ADD REPLY • link 4 months ago by myoui3122010 ▴ 30

1

Ha, yeah, that was not clear in your question. To run colabfold_search locally, you have to be able to read the entire UniRef30 database (plus any others you're using) into memory. We have a machine with 192 GB of RAM, and our custom databases barely fit. There are other database-loading options for local colabfold_search, but as far as I remember they don't alleviate the problem much.
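A quick way to sanity-check this before committing to a local search is to compare the on-disk size of the database directory with the machine's RAM. The sketch below does only that. Treating on-disk size as a proxy for the memory needed is an assumption, as is the 10% headroom; `total_ram_bytes` relies on `os.sysconf` names that are available on Linux, and all function names are hypothetical.

```python
import os
from pathlib import Path


def dir_size_bytes(db_dir):
    """Total size of all regular files under db_dir, in bytes."""
    return sum(p.stat().st_size for p in Path(db_dir).rglob("*") if p.is_file())


def total_ram_bytes():
    """Physical RAM in bytes (Linux; uses POSIX sysconf values)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def fits_in_ram(db_bytes, ram_bytes, headroom=0.9):
    """True if the databases fit within a fraction (headroom) of RAM.

    The headroom leaves space for the OS and the search itself;
    0.9 is an arbitrary illustrative default, not a recommendation.
    """
    return db_bytes <= ram_bytes * headroom
```

On a Linux machine, `fits_in_ram(dir_size_bytes("/path/to/db_folder"), total_ram_bytes())` gives a rough yes/no answer; the database path here is a placeholder.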

I've not really used the MSA servers much, but I suspect 20 query trimers should be fine.

ADD REPLY • link 4 months ago by dthorbur ★ 3.0k

0

Thanks, we do have a server with a TB of memory, but I wanted to know my options so that I don't get my IP blocked. I currently have around 100 sequences, and I thought I could maybe submit 20 per day. Anyway, thanks a lot.

ADD REPLY • link 4 months ago by myoui3122010 ▴ 30

1

I suspect you'll have better luck submitting multiple sequences in a single request rather than spamming the server with lots of single-sequence jobs. Also, 100 sequences is not that many, so sending them all in a single request should be fine. Colleagues of mine have processed thousands of sequences on the MSA server without consequence.
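For what it's worth, putting all the queries into one input file could look roughly like this. The sketch writes each complex as a single record with its chains joined by `:`, as in the query posted in the question. The helper names and the short placeholder sequences are made up for illustration, and how ColabFold groups the resulting server requests is outside what this code controls.

```python
def format_complex(name, chains):
    """Return one FASTA record with the chains joined by ':'."""
    if not chains:
        raise ValueError(f"query {name!r} has no chains")
    return f">{name}\n" + ":".join(chains) + "\n"


def write_queries(queries, out_path):
    """Write a {name: [chain, ...]} mapping to a single FASTA file.

    Returns the number of queries written.
    """
    with open(out_path, "w") as handle:
        for name, chains in queries.items():
            handle.write(format_complex(name, chains))
    return len(queries)


# Placeholder sequences, NOT real proteins: three copies of one
# chain plus three copies of its partner, mirroring the question.
example = {"pairA": ["MKT"] * 3 + ["GSH"] * 3}
```

`write_queries(example, "all_queries.fasta")` would produce a single file that can then be given to ColabFold as its input.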

ADD REPLY • link 4 months ago by dthorbur ★ 3.0k