There are quite a few things that go into creating a PhD-level project. I would try to answer these overall questions first, since they will dictate things later down the road:
1. Are you going to be making your own data (sequencing, microarray, etc.)? If you are, then plan LOTS of time for this step ... take your most outlandishly long expectation and then DOUBLE it; that's probably how long it will take to get the data.
2. Are you going to create a new analysis technique or apply an old (or underused) technique to a new area?
3. Are you going to do an entire _in silico_ experiment/analysis?
4. Are you going to create a novel resource (tool or database)?
If you choose option #1 then be prepared to allocate a lot of money, time and resources to gathering data. The advantage here would be that you can control the data quality, source, type, etc. to be exactly what you need to answer the biological question.
If you choose option #2 (which is what I did btw) then you have an advantage of using public data for your analysis ... which speeds up and cheapens the process. The HUGE disadvantage is that the data you have was not gathered to answer the question you are asking (otherwise the original authors would've published it already). So you have to do lots of fudging, merging, massaging, etc. to get what you want.
If you choose option #3 then you never need to worry about gathering data or analyzing properly (since your simulation by definition gathers the correct data in the correct format). The difficulty comes in convincing other people that your simulation properly reflects the biological problem you're trying to answer. This is often a difficult proposition ... especially if you are in a biology department (if you're in a comp-sci or math dept then this is much easier).
If you choose option #4 then you need to find a niche that hasn't been filled by an existing database/tool. Most of the low hanging fruit has been picked but I'm sure you can still find things to do. This is the option that I know the least about btw.
I've advised (or co-advised) students through options 1, 2, and 3. And each had their ups and downs ... I can't say that any one option is the "easiest" or "hardest".
For real world examples (same categories as above) ... I'm going to propose 4 methods for answering the following biological question:
There is a lot of interest in finding acute diseases which lead to chronic problems ... For example HPV infection leads to cervical cancer, H. pylori infection leads to stomach cancer, and I'm sure there are plenty of others. It would be interesting to find the reason for these links and propose new acute/chronic disease links.
Using method 1: You could take samples from cervical cancer patients with & without a history of HPV. Then perform methylation studies (or some other study) and look for differences between the two groups. You would have to do an in-depth lit. search to determine which tests would be reasonable (I don't have much knowledge in this area).
Using method 2: It would be interesting to combine datasets from microarray, epidemiology, PPI, methylation studies, etc. A cursory search of GEO shows plenty of microarray data for testing this hypothesis. This would also let you test for new links in a way that method 1 would not allow.
Using method 3: There are numerous _in silico_ tumor generation models. You could modify one of them to incorporate "transformation from infection" and see if the results better match the observed progression. Or you could simulate which mutations are likely to occur from viral infections and how likely they are to induce cancerous cells.
Using method 4: You could build a text-classification system and try to use literature articles to guess links between diseases ... perhaps by checking co-mentions ... this one I'm a little fuzzy on.
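To make the co-mention idea a bit more concrete, here is a minimal sketch in Python. It is only an illustration under loud assumptions: the term list, the function names, and the toy abstracts are all made up for this example, and plain substring matching stands in for the real named-entity recognition a serious system would need. It just counts how often two disease terms show up in the same abstract.

```python
from collections import Counter
from itertools import combinations

# Hypothetical vocabulary: lowercase search term -> canonical disease name.
# A real project would need a curated ontology, not a hand-typed dict.
DISEASE_TERMS = {
    "hpv": "HPV infection",
    "cervical cancer": "cervical cancer",
    "h. pylori": "H. pylori infection",
    "stomach cancer": "stomach cancer",
}

# Made-up toy sentences, NOT real literature; they only exercise the code.
TOY_ABSTRACTS = [
    "HPV is detected in many cervical cancer biopsies.",
    "Persistent HPV infection often precedes cervical cancer.",
    "H. pylori colonization is associated with stomach cancer.",
    "Cervical cancer screening programs reduce mortality.",
]


def diseases_in(text):
    """Return the canonical names of all known terms found in the text."""
    lowered = text.lower()
    return {name for term, name in DISEASE_TERMS.items() if term in lowered}


def co_mention_counts(abstracts):
    """Count, for each pair of diseases, the abstracts mentioning both."""
    counts = Counter()
    for abstract in abstracts:
        # Sort so each pair is always stored in the same order.
        found = sorted(diseases_in(abstract))
        for pair in combinations(found, 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    for pair, n in co_mention_counts(TOY_ABSTRACTS).most_common():
        print(pair, n)
```

Raw co-mention counts are only a starting point. Pairs that co-occur more often than their individual frequencies would predict are the ones worth a closer look, and known links like HPV/cervical cancer make a handy sanity check before hunting for new ones.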
Hopefully these give you a rough idea of things that might make good PhD projects. As for the size/scope of a project ... in the lab where I came up, your thesis was three first-author manuscripts stapled together, but I think this varies from lab to lab and project to project.
The other important thing to note is whether you're in a "biology heavy" or "biology light" department. I work for a university where we have separate bioinformatics teams in the Biology, Physics/Math, Information-Systems, and Engineering (where I am) departments. The projects/papers/theses from each of these departments look VERY different from each other. In the Physics department the projects have about 1 paragraph's worth of biology and the rest is math/comp-sci. In the Biology department it's the reverse. The ISIS group has mostly database-style projects where biology is just the "subject" ... the analysis could be applied to virtually anything with very little change in concept. In my department we do about 50/50 between biology/comp-sci in our papers. I suggest you try to get a feel for what your department/advisor is looking for so you know where you need to focus.
Hope that helps,
PS. Feel free to use the thesis ideas ... although I can't vouch for whether they will be fruitful ;)