Question

Criteria That Befit A Good Bioinformatics Phd Research

8

12.8 years ago

Hranjeev ★ 1.5k

I'm in the planning stage for a PhD and would like to get Biostar members' opinions on the criteria that make a good PhD research project. The university I'm planning to attend does not seem to be well developed in the bioinformatics arena (though studies are fully research-based, with no classes). So I'm pretty much on my own to chalk out the bioinformatics part of my studies.

What are the criteria that befit a good bioinformatics PhD research project?

On a general note, I would like to hear this answered from a real-world problem or an example of a PhD. I would like to know how it gets broken down, from proposal through research all the way to the bioinformatics core. And what criteria make it a PhD rather than an MSc, by your definition?

Any pointers would be helpful too.

phd subjective • 7.7k views

updated 9.7 years ago by Vanceed ▴ 30 • written 12.8 years ago by Hranjeev ★ 1.5k

1

Very tough question - and one with a different answer for each person. You may want to have a look at this "10 simple rules" advice piece that touches on this topic.

updated 4.6 years ago by Ram 43k • written 12.8 years ago by Casey Bergman 18k

0

I just clicked it and it did not work, so I made a direct link in this comment: Ten Simple Rules for Graduate Students

10.9 years ago by Medhat 9.7k

0

It would be great to hear the different perspectives and scenarios in an organized way, don't you think? For instance, Will's answer below really showed his vast experience and his thinking in tackling research projects in his field. It is also very interesting to learn how each researcher's approach differs.

12.8 years ago by Hranjeev ★ 1.5k

score 23 · Answer 1 · 2011-07-06

There are quite a few things which go into creating a PhD-level project. I would try to answer these overall questions first, since they will dictate things later down the road:

1. Are you going to be making your own data (sequencing, microarray, etc.)? If you are, then plan LOTS of time for this step ... take your most outlandishly long expectation and then DOUBLE it; that's probably how long it will take to get the data.
2. Are you going to create a new analysis technique or apply an old (or underused) technique to a new area?
3. Are you going to do an entire _in silico_ experiment/analysis?
4. Are you going to create a novel resource (tool or database)?

If you choose option #1 then be prepared to allocate a lot of money, time and resources to gathering data. The advantage here would be that you can control the data quality, source, type, etc. to be exactly what you need to answer the biological question.

If you choose option #2 (which is what I did btw) then you have an advantage of using public data for your analysis ... which speeds up and cheapens the process. The HUGE disadvantage is that the data you have was not gathered to answer the question you are asking (otherwise the original authors would've published it already). So you have to do lots of fudging, merging, massaging, etc. to get what you want.

If you choose option #3 then you never need to worry about gathering data or analyzing it properly (since your simulation by definition gathers the correct data in the correct format). The difficulty comes in convincing other people that your simulation properly reflects the biological problem you're trying to answer. This is often a difficult proposition ... especially if you are in a biology department (if you're in a comp-sci or math dept then this is much easier).

If you choose option #4 then you need to find a niche that hasn't been filled by an existing database/tool. Most of the low-hanging fruit has been picked, but I'm sure you can still find things to do. This is the option that I know the least about btw.

I've advised (or co-advised) students through options 1, 2, and 3, and each had its ups and downs ... I can't say that any one option is the "easiest" or "hardest".

For real world examples (same categories as above) ... I'm going to propose 4 methods for answering the following biological question:

There is a lot of interest in finding acute diseases which lead to chronic problems ... For example HPV infection leads to cervical cancer, H. pylori infection leads to stomach cancer, and I'm sure there are plenty of others. It would be interesting to find the reason for these links and propose new acute/chronic disease links.

Using method 1: You could take samples from cervical cancer patients with & without a history of HPV. Then perform methylation studies (or some other study) and look for differences between the two groups. You would have to do an in-depth lit. search to determine which tests would be reasonable (I don't have much knowledge in this area).

Using method 2: It would be interesting to combine datasets from microarray, epidemiology, PPI, methylation studies, etc. A cursory search of GEO shows plenty of microarray data for testing this hypothesis. This would also let you test for new links in a way that method 1 would not allow.

Using method 3: There are numerous _in silico_ tumor generation models. You could modify one of them to incorporate "transformation from infection" and see if the results better match the observed progression. Or you could simulate which mutations are likely to occur from viral infections and how likely they are to induce cancerous cells.

Using method 4: You could build a text-classification system and try to use literature articles to guess links between diseases ... perhaps by checking co-mentions ... this one I'm a little fuzzy on.
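As a rough illustration of the co-mention idea in method 4, here is a minimal Python sketch. It assumes the abstracts are already available as plain strings and that a hand-made list of disease terms is supplied; the function name `count_comentions`, the term list, and the toy abstracts are all hypothetical. Matching is a naive case-insensitive substring search, which a real system would likely replace with proper entity recognition.

```python
from collections import Counter
from itertools import combinations


def count_comentions(abstracts, terms):
    """Count how many abstracts mention each pair of terms together.

    Matching is a naive case-insensitive substring search (an assumption
    made for brevity, not a recommended text-mining method).
    """
    pair_counts = Counter()
    for text in abstracts:
        lowered = text.lower()
        # Terms found in this abstract, sorted so each pair has one canonical order.
        found = sorted(t for t in terms if t.lower() in lowered)
        for pair in combinations(found, 2):
            pair_counts[pair] += 1
    return pair_counts


# Toy, made-up abstracts for illustration only (not real literature).
abstracts = [
    "HPV infection is associated with cervical cancer.",
    "H. pylori infection and stomach cancer risk.",
    "Persistent HPV and cervical cancer screening.",
]
terms = ["HPV", "cervical cancer", "H. pylori", "stomach cancer"]

counts = count_comentions(abstracts, terms)
for pair, n in counts.most_common():
    print(pair, n)
```

Pairs that turn up together often could then be treated as candidate links for manual review; raw counts alone say nothing about causation.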

Hopefully these give you a rough idea of things that might make good PhD projects. As for size/scope of a project ... in my lab, when I was coming up, your thesis was three first-author manuscripts stapled together, but I think this varies from lab to lab and project to project.

The other important thing to note is whether you're in a "biology heavy" or "biology light" department. I work for a university where we have separate bioinformatics teams in the Biology, Physics/Math, Information-Systems, and Engineering (where I am) departments. The projects/papers/theses from each of these departments look VERY different from each other. In the Physics department the projects have about 1 paragraph worth of biology and the rest is math/comp-sci. In the Biology department it's the reverse. The ISIS group has mostly database-style projects where biology is just the "subject" ... the analysis could be applied to virtually anything with very little change in concept. In my department we do about 50/50 between biology/comp-sci in our papers. I suggest you try to get a feel for what your department/advisor is looking for so you know where you need to focus.

Hope that helps,

Will

PS. Feel free to use the thesis ideas ... although I can't vouch for whether they will be fruitful ;)

score 0 · Answer 2 · 2014-07-30

0

9.7 years ago

Vanceed ▴ 30

Great post. I'm currently doing my PhD in Public Health and am in the planning stages of my dissertation work in bioinformatics. Greatly appreciated!
