How Do You Start Writing An Algorithm? Help Needed!
6
7
Entering edit mode
14.1 years ago
Hazel ▴ 70

Hi,

I am a student currently doing a project about protein structure comparison and I am suppose to write a Java program to do the above.

The algorithm that I am using would be the 3D rigid body superposition algorithm which includes, translation, rotation & calculating the RMSD score.

java structure algorithm • 97k views
0
Entering edit mode

Thank you everybody for your time and answers!! It has really helped me! I'm currently doing the write-the-program bit right now.. it's so quite hard!

10
Entering edit mode
14.0 years ago
Neilfws 49k

An algorithm is simply a numerical recipe which takes a value (or set of values) as input, performs some mathematical operation and returns transformed values as output. In programming, this means writing what is variously called a function, method or subroutine, depending on the programming language.

I can't tell you exactly how to write your algorithm, but here are some general guidelines.

1. Think in terms of input, process and output. Identify what each of these are. For example, input might be "two sets of xyz coordinates, A and B", process is "translate and rotate A's xyz until it best fits B's xyz" and output is "return the new xyz coordinates and the RMSD".
2. Get your input files in the appropriate format.
3. Do your research. This sounds like an algorithm that has been described before, so locate that description. It could be in a textbook, a published paper or some code on the web. Good web searching skills are important - use appropriate keywords and search terms.
4. Figure out how to turn the algorithm into code. This is the difficult part. Of course, it requires that you understand the programming language (Java in your case) sufficiently well to express "concepts in words/equations" as "operations in code". Make sure that you have a good programming reference manual to hand.
5. Ensure that you know what the expected output should be and write tests to check for it. For example, there are probably many tools to calculate RMSD, so you can figure out the expected value for the proteins that you are using as input.

You can build your confidence by starting small, rather than trying to do everything at once. For example "return the square root of a number" is an algorithm, albeit a very simple one. Try to implement something like that in Java, make sure you understand what is going on and then really, anything more complex is just a case of building on those foundation skills.

5
Entering edit mode
14.0 years ago

The first thing you should know is that you will have to write a lot. I mean, you'll have to take notes on a real paper and with a real pencil, instead of keeping everything on your mind or writing with a computer. I personally like to split an A4 paper into 8 units (making 8 A7 papers) and write every step or concept about the program on a single paper. For example, you can write a general description of the algorithm, the use cases, and a description of every function on each A7 paper. Then, it will be easier to determine the order of each function and create a pipeline, you will have to just re-arrange the order of the A7 papers on a desktop.

Second, before you start writing anything, think about how you are going to validate your results. Think of some general cases of structures that will be easy to impose, and also think about the structures that will give you specific problems. It is important that you define a nice set of test cases before you start programming, because it will make it a lot easier and in the end it will save time.

Third, talk about your algorithm with somebody else. Don't do the "I'm a scientist and I have to do everything by myself" error that many people do. The better you are able to explain your work to somebody else, the easier it will be to make it. You should always take in mind the Bohem's curve, which says that the cost of fixing an error increases exponentially on time, so if you discuss with somebody else about your work and by doing so you discover an error, it will save you time.

4
Entering edit mode
14.0 years ago

Like with any new field, you start by reading up. One possible source of info would be Jmol which has, AFAIK, a method to align two protein structures. Actually, it could be done with the CDK without much trouble too. One possible route to address this would be:

1. find an Open Source tool X that allows you to do protein alignment (e.g. those I mentioned)
2. discover how to perform the alignment with tool X on a use case
3. when you know what part of the tool X does it, look up the source code that implements that functionality
4. use that as starting point to see how tool X implemented the algorithm
5. try to understand the steps and choices and assumptions they made (an implementation is seldomly an exact copy of the algorithm)
7. compare the results with the other implementations
3
Entering edit mode
14.0 years ago
Tom Walsh ▴ 550

One approach to doing this is to use the Kabsch algorithm:

http://en.wikipedia.org/wiki/Kabsch_algorithm

Also take a look at Biojava's SVDSuperimposer class:

http://www.biojava.org/docs/api/org/biojava/bio/structure/SVDSuperimposer.html

1
Entering edit mode
14.0 years ago

There are already excellent answers for the question. Here is my 2 cents.

It always helped me when I prepared a flow-chart before designing a new algorithm. I know its a bit old-school, but it will definitely help you to focus on your problem in a granular fashion in terms of input output options etc. It will be a good to start with a flow-chart based on Neil's answers.

Structural alignment is a mature topic in structural bioinformatics, several algorithms and associated tools are available. As explained by other here, this is a well defined problem and already algorithms are available, but it is important to understand the available literature if you plan to publish a manuscript or plan to develop a better algorithm. It will be good to refer to some of the common and recent structural superposition programs in the Structural Alignment page at Wikipedia. Some of my favorites are STAMP, DALI and MinRMS

1
Entering edit mode
14.0 years ago
User 5267 ▴ 110

This is a universal issue that comes up when learning mathematics or programming. In very simple cases like this, the simple answer is to start from the end and imagine what you would need to easily calculate that, then work backwards. For instance, if you need to calculate the RMSD of two structures, what do you need to make it obvious? How about an array of all the coordinates of each atom in each structure, lined up so each position in each array corresponds to the homologous residue? Then imagine the step back from that which you could calculate easily, and so forth.

This is the simple answer, but there are actually a couple of very good books that try to teach this very issue. The classic is Polya's "How to Solve It" in mathematics, but you'll have to get that from your library. There are two focused on programming, both available for free online. The more recent is How to Design Programs. The older is Thinking Forth. They're in Scheme and Forth, respectively, but both languages are far simpler and more powerful than Java, so shouldn't give you any real trouble. Scheme, in particular, is the lingua franca of computer science, so you need to know it. If Forth is too foreign, there's a prequel, Starting Forth, to help you with that.