Netsurfp2 output is empty when running on multiple sequences
2.6 years ago

I am attempting to use the netsurfp2 software to predict solvent accessibility of protein residues. The software appears to work when submitting jobs to their server, but I need to run it for many proteins locally. The alignment steps appear to work correctly: the expected intermediate files from MMseqs2 are created and are well-formed. However, the result file that is supposed to contain the netsurfp2 predictions is empty when I run the program on multiple sequences. When I run it on a single sequence at a time the output is created successfully, but this is prohibitively slow for high-throughput applications. I reached out to the developers but haven't heard back, so I thought I'd check here to see if anyone has had success running netsurfp2 locally.

prediction bug netsurfp2 software protein • 799 views
2.6 years ago
Mensur Dlakic ★ 28k

It sounds like you are having success running netsurfp2 locally. I don't think the program is meant for predicting anything from a multiple sequence set simultaneously. Rather, I think when you submit multiple sequences the server will internally split them into individual files and do the predictions. This is to say that it will work for you once you split the sequences and run them in parallel, assuming your computer has the memory and CPUs to do so.
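To make the splitting step concrete, here is a minimal sketch that breaks a multi-FASTA input into one file per sequence. The function names `split_fasta` and `write_single_fastas` and the `seq_00000.fasta` naming scheme are illustrative choices, not part of netsurfp2; each resulting file would then be passed to whatever local netsurfp2 command already works for a single sequence.

```python
from pathlib import Path


def split_fasta(text):
    """Parse multi-FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def write_single_fastas(records, out_dir):
    """Write each record to its own FASTA file (hypothetical naming scheme)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, (header, seq) in enumerate(records):
        path = out_dir / f"seq_{index:05d}.fasta"
        path.write_text(f">{header}\n{seq}\n")
        paths.append(path)
    return paths
```

The per-sequence files can then be processed independently, for example with a job scheduler or a process pool sized to the available memory and CPUs.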

As to being prohibitively slow: all applications that build profiles from individual sequences are relatively slow, as they have to do a sequence search against large databases, find homologs, align them and build a profile. We should all be grateful that HHblits and MMseqs2 are much faster than BLAST, as it used to be even slower.

There is a way out of it, assuming that your sequences are related. I suggest you limit your predictions to one or several representatives per group. If you predict solvent accessibility for sequence A, and sequence B is 80-90% identical to it, it is safe to assume that the same prediction is valid for sequence B.
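As a rough sketch of the representative approach, the snippet below assumes the sequences have already been clustered by some external tool that writes a two-column, tab-separated table of `representative<TAB>member` (MMseqs2 clustering produces a table of this shape, but any clustering tool would do). The function names and the `predictions` dictionary are hypothetical. It only copies a representative's prediction to its cluster members; if members differ in length from the representative, per-residue values would still need to be mapped through an alignment.

```python
from collections import defaultdict


def read_clusters(tsv_text):
    """Map each representative ID to the list of member IDs in its cluster."""
    clusters = defaultdict(list)
    for line in tsv_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Assumed layout: representative in column 1, member in column 2.
        rep, member = line.split("\t")[:2]
        clusters[rep].append(member)
    return dict(clusters)


def propagate(predictions, clusters):
    """Assign each member the prediction made for its representative."""
    result = {}
    for rep, members in clusters.items():
        if rep not in predictions:
            continue  # representative not predicted yet
        for member in members:
            result[member] = predictions[rep]
    return result
```

With this layout, only the representatives need to go through the slow profile-building step.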


I think you're right. It was confusing because the MSA steps work correctly for input of multiple sequences, but the netsurf prediction step probably does not. It's unfortunate that there isn't an option to provide the MSA directly. It seems that they have that functionality on their server, which does accept multiple sequences and doesn't appear to run the MSA step for each individually. I will probably have to update their script to bypass the MSA step and go directly to prediction in order to run it in high-throughput. Thanks for your help!
