Question: Proteins that cannot form biofilm?
15 months ago, nafizh wrote:

I am trying to build a machine learning training set of bacterial protein sequences, split into those involved in biofilm formation and those that are not. I collected the positive sequences from the Gene Ontology (GO) website, but I am not sure which sequences to include as negatives.

Can anyone point me to resources for protein sequences that are known not to be capable of forming biofilms?

modified 15 months ago by Elisabeth Gasteiger • written 15 months ago by nafizh

What do you mean by proteins that form biofilms? Are you trying to find out what the major protein components of a biofilm are? Because bacteria, not proteins, form biofilms.

Reply written 15 months ago by jrj.healey

Essentially, yes, I am trying to detect proteins that are indispensable in forming biofilms. So, I need a negative set of protein sequences which definitely don't have that function.

Reply written 15 months ago by nafizh

I think you need to be very careful how you define 'indispensable'. dnaA, for example, is obviously not a biofilm-producing gene, but if an organism lacked it you wouldn't get a biofilm, because the organism would be non-viable (it's a required housekeeping gene).

If it's sufficient that the negatives don't have a primary functional role in biofilm formation, then you could use standard, so-called housekeeping genes as negatives. These are easy to find in the literature, as they're commonly used as reference genes (internal controls) in RT-PCR experiments, e.g. dnaA, gyrB, rpoA, etc.
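As a rough illustration of that suggestion, here is a minimal Python sketch of how a labelled set might be assembled once positive and housekeeping sequences have been collected as FASTA. The function names (`read_fasta`, `build_training_set`) are hypothetical, and the records are toy placeholders, not real protein sequences. It drops any candidate negative whose sequence already appears among the positives, so the two classes never overlap.

```python
from io import StringIO

def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)

def build_training_set(positive_handle, negative_handle):
    """Return (header, sequence, label) rows; label 1 = positive, 0 = negative.

    Duplicate sequences are kept only once, and positives take priority,
    so a sequence can never carry both labels.
    """
    rows, seen = [], set()
    for label, handle in ((1, positive_handle), (0, negative_handle)):
        for header, seq in read_fasta(handle):
            if seq not in seen:
                seen.add(seq)
                rows.append((header, seq, label))
    return rows

# Toy placeholder records (assumption: NOT real protein sequences).
POSITIVES = ">pos1 placeholder biofilm-annotated protein\nMKTAYIAK\nQRQISF\n>pos2 placeholder\nMSHHWGYG\n"
NEGATIVES = ">neg1 placeholder housekeeping protein\nMSLSLWQQ\n>neg2 same sequence as pos2\nMSHHWGYG\n"

training_set = build_training_set(StringIO(POSITIVES), StringIO(NEGATIVES))
for header, seq, label in training_set:
    print(label, header, seq)
```

Exact-match deduplication is only the simplest possible check; near-identical homologues across the two classes would need a separate similarity filter.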

Reply written 15 months ago by jrj.healey

15 months ago, Elisabeth Gasteiger wrote:

I cannot answer about the best GO terms to use, and do not know how consistently they are applied to proteins that actually form biofilm.

However, from the UniProt point of view, I'd like to alert you to the fact that negative queries should be used with extreme caution: the absence of an annotation does not mean the absence of a function (a true negative). Lack of annotation may simply be a false negative, caused by incompleteness either in the experiment-derived knowledge of a particular protein's function or in the representation of that knowledge as annotations, i.e. an entry may not be up to date and therefore does not have the positive annotation (yet).
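One way to respect this caveat in a training pipeline is to keep three labels instead of two, so that poorly annotated entries are set aside as unlabelled instead of being treated as negatives. The sketch below is only an illustration under stated assumptions: the `Entry` class and `label_entry` function are hypothetical, the GO ID used for biofilm formation should be checked against the GO browser, and the rule "reviewed and annotated with other functions" is a heuristic for a candidate negative, not proof of a true negative.

```python
from dataclasses import dataclass, field

# Assumption: GO ID for "biofilm formation"; verify before use and
# extend with any child terms you consider positive.
BIOFILM_TERMS = {"GO:0042710"}

@dataclass
class Entry:
    accession: str                                # placeholder accession
    go_terms: set = field(default_factory=set)    # GO IDs annotated to the entry
    reviewed: bool = False                        # e.g. manually curated entry

def label_entry(entry, positive_terms=BIOFILM_TERMS):
    """Return 'positive', 'candidate_negative' or 'unlabeled' for one entry."""
    if entry.go_terms & positive_terms:
        return "positive"
    # Heuristic (assumption): a curated entry with other annotated functions
    # is a more plausible negative than an entry with no annotation at all.
    if entry.reviewed and entry.go_terms:
        return "candidate_negative"
    return "unlabeled"

entries = [
    Entry("X1", {"GO:0042710"}, reviewed=True),
    Entry("X2", {"GO:0006260"}, reviewed=True),   # DNA replication only
    Entry("X3", set(), reviewed=False),           # nothing known
]
labels = {e.accession: label_entry(e) for e in entries}
print(labels)
```

Entries labelled `unlabeled` can then be excluded from training, or handled with a positive-unlabelled learning approach, so they are not silently counted as non-biofilm proteins.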


written 15 months ago by Elisabeth Gasteiger