I have about 100 queries like the one given below and am trying to run AlphaFold Multimer via Local ColabFold:
>P01375_Q9VJ83
RSSSRTPSDKPVAHVVANPQAEGQLQWLNRRANALLANGVELRDNQLVVPSEGLYLIYSQVLFKGQGCPSTHVLLTHTISRIAVSYQTKVNLLSAIKSPCQRETPEGAEAKPWYEPIYLGGVFQLEKGDRLSAEINRPDYLDFAESGQVYFGIIAL:
RSSSRTPSDKPVAHVVANPQAEGQLQWLNRRANALLANGVELRDNQLVVPSEGLYLIYSQVLFKGQGCPSTHVLLTHTISRIAVSYQTKVNLLSAIKSPCQRETPEGAEAKPWYEPIYLGGVFQLEKGDRLSAEINRPDYLDFAESGQVYFGIIAL:
RSSSRTPSDKPVAHVVANPQAEGQLQWLNRRANALLANGVELRDNQLVVPSEGLYLIYSQVLFKGQGCPSTHVLLTHTISRIAVSYQTKVNLLSAIKSPCQRETPEGAEAKPWYEPIYLGGVFQLEKGDRLSAEINRPDYLDFAESGQVYFGIIAL:
RGTRCGEILCNISQYCSPFDLHCKPCADACNATSHNYQPDECKKDCQFYL:
RGTRCGEILCNISQYCSPFDLHCKPCADACNATSHNYQPDECKKDCQFYL:
RGTRCGEILCNISQYCSPFDLHCKPCADACNATSHNYQPDECKKDCQFYL
Questions
Should I provide each sequence pair as a separate FASTA file, or is it fine to include multiple queries in a single FASTA file?
If I include multiple queries in a single FASTA file, will MSA generation run only once for all queries, or will it be computed separately for each?
I would appreciate insights from those experienced with AlphaFold Multimer and MSA behavior in Local ColabFold. Thank you!
Sorry, I should've been more specific. I'm currently unsure whether I can run colabfold_search, since it's very resource-intensive. But from your response I take it that the form of the input doesn't matter. So would it be okay to submit 20 queries (each a trimer) to the ColabFold MMseqs2 server?
Ha, yeah, that was not clear in your question. You have to be able to read the entire UniRef30 database (plus any others you're using) into memory. We have a machine with 192 GB of RAM, and our custom databases barely fit. There are other DB loading options for local colabfold_search, but as far as I recall they don't alleviate the problem that much. I've not really used the MSA servers much, but I suspect 20 query trimers should be fine.
Thanks, we do have a server with a TB of memory, but I wanted to know the options so as not to get my IP blocked. I currently have around 100 sequences, and I thought maybe I could do 20 per day. But anyway, thanks a lot.
I suspect you'll have better luck submitting multiple sequences in a single request rather than spamming the server with lots of single-sequence jobs. Also, 100 sequences is not that many, so sending them all in a single request should be fine. Colleagues of mine have processed thousands of sequences on the MSA server without consequence.
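If it helps, here is a minimal sketch of how all the complexes could be collected into one multi-query FASTA, using the same layout as the example in the question (one header per complex, chains joined with ":"). The function names and the short placeholder sequences are made up for illustration; they are not part of ColabFold.

```python
from typing import Iterable, List, Tuple

# One complex = (query name, [(chain sequence, number of copies), ...]).
# This structure is an assumption for the sketch, not a ColabFold type.
Complex = Tuple[str, List[Tuple[str, int]]]


def to_colabfold_record(name: str, chains: List[Tuple[str, int]]) -> str:
    """Return one FASTA record with all chain copies joined by ':'."""
    expanded = [seq for seq, copies in chains for _ in range(copies)]
    return f">{name}\n{':'.join(expanded)}\n"


def build_fasta(complexes: Iterable[Complex]) -> str:
    """Concatenate one record per complex into a single FASTA string."""
    return "".join(to_colabfold_record(name, chains) for name, chains in complexes)


if __name__ == "__main__":
    # Placeholder sequences, far shorter than real proteins.
    demo = [
        ("complexA", [("ACDEFG", 3), ("HIKLMN", 3)]),
        ("complexB", [("PQRSTV", 2)]),
    ]
    print(build_fasta(demo), end="")
```

The resulting text can be saved as a single .fasta file and passed as the input, so every complex goes out from one file rather than from 100 separate ones.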