Question

Forum: Why Are There No Proteomics Questions on BioStar?

Asked 13.7 years ago by Jdnavarro

I've been following BioStar for a while now and I've noticed the conspicuous lack of any proteomics questions.

For many years I've heard within the field that the main bottleneck to progress in proteomics is informatics. So why, in spite of proteomics being heavily funded, is there so little interest in proteomics informatics?

Potential factors that can contribute:

(I'm not claiming that all of these factors apply; I'm just offering ideas.)

Proteomics informaticians are rewarded only for publications, not for the usability of their software. Other bioinformatics fields went through this stage. Usability of proteomics software is still not a priority; there is no interest in real software, only in proofs of concept.
There are many proteomics informaticians, but they are disinclined to share knowledge because it took them so long to learn the field's idiosyncrasies, and they want to keep their exclusivity.
Proteomics data are inherently much more complex than genomics/pathway data, and the reward for mastering them doesn't justify the effort. Very few bioinformaticians want to get into it.
Current proteomics software is awful. It's impossible to innovate using the current software as a foundation. PIs/experimentalists/employers always demand that new software be built on top of this crappy software; there is no
Mass spec vendors, with their proprietary formats, make it impossible to build anything on top of their software.
There are no proteomics informaticians because employers don't know how to recruit and manage bioinformaticians.
Proteomics is highly politicized. The moment an outside bioinformatician shows an alternative approach, he/she is driven off by the proteomics community, which sees the newcomer as a threat. All the funding goes to people who maintain the status quo.
There are many proteomics informaticians, but they are too stressed to spend time on BioStar. If they are that stressed, it is because there is too much work for very few people, so this isn't really a separate factor; the real question is why there are so few of them.

Have you experienced any of these factors? Why is there no interest in proteomics informatics? How do you see proteomics informatics from the outside?

Update

Daniel Standage pointed out the lack of publicly accessible data compared to genomics.

Tags: proteomics, biostars

Updated 17 months ago by Ram; written 13.7 years ago by Jdnavarro

I asked a proteomics question regarding proprietary vendor formats just the other day. I would not assume that a lack of questions here equates to a lack of interest elsewhere. There could be many reasons why we see fewer proteomics queries.

Reply by Neilfws, 13.7 years ago

I think you have posed and answered the question simultaneously. I certainly see many of the bullet points you have raised.

Reply by Alastair Kerr, 13.7 years ago

I only see 2 questions under the proteomics tag and can't find yours. I know there can be many reasons, but it puzzles me that genomics has sites like BioStar whereas proteomics has nothing like it.

Reply by Jdnavarro, 13.7 years ago

Fantastic summary. I experience some of the factors, though not all of them. That, however, is what I love about this field: there is so much work to be done, and so much low-hanging fruit from the IT perspective.

Reply by Roman Zenka, 13.1 years ago

Ram · Answer 1 · 2010-11-11

A full answer would require too long a post, but a few points:

The mass-spectrometry-based proteomics market is relatively small, and proprietary formats have hindered development of non-instrument-specific software. But this is changing: the folks at ProteoWizard have a tool to convert almost any proprietary format to one of the open formats (as long as you are doing it on a computer with the vendor software installed).

Industry jumped into proteomics around 2000, and it was too early: the methods, instruments, and software were not ready. But companies are taking another look now, and this will help drive proteomics software development.

There is excellent open-source proteomics software, TheGPM being one of the better examples, as well as the aforementioned ProteoWizard.

There is a LOT of freely available proteomics data out there: some 13+ TB of raw and processed data on Tranche (some is protected, but increasingly it is open), and a trove on GPMdb.org, made very useful by some custom tools.

Also, there have not been any major proteomics characterization efforts. But this, too, is changing (http://grants.nih.gov/grants/guide/rfa-files/RFA-CA-10-016.html), and this effort will generate a large amount of data.

Overall, proteomics is still at a relatively early stage, roughly where genomics was circa 1990-95.

score 7 · Answer 2 · 2010-11-11

Getting started in proteomics is hard compared to genomics. I started out in LC-MS, then LC-MS/MS, and ended up in RNA-Seq on next-gen sequencers. Understanding genomic and transcriptomic studies does not require much knowledge of chemistry. You need to understand cellular processes to interpret experiments and a little probability/statistics for sequence analysis, and then you're off. Getting good requires more knowledge, but it's pretty easy for a newbie to start messing around.

Proteomics is tougher because the data are much more dependent on chemical phenomena. There is the chromatographic separation that has to be understood. In particular for MS/MS the fragmentation is very much a product of molecular orbital bond dissociation energies, which gets into some pretty awful quantum physics, physical chemistry and statistical mechanics. There are ways of abstracting this into something easier to compute (which is what everyone in proteomics has done), but that abstraction introduces some errors and bias that propagate through all the downstream analysis. The equivalent steps in genomics seem to be much better understood and most of the bias removed (though it still shows up in the error rates of next-gen sequencers).
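To make the "abstraction" concrete: the simplest model used in peptide identification reduces MS/MS fragmentation to the masses of b-ions (N-terminal fragments) and y-ions (C-terminal fragments) obtained by cleaving each backbone amide bond. The Python sketch below is an illustration, not code from this thread; it assumes singly charged ions, unmodified residues, and standard monoisotopic masses, and the function name `fragment_ions` is hypothetical. It deliberately ignores intensities, neutral losses, and the bond-energy chemistry discussed above, which is exactly the kind of simplification that can introduce the errors and bias described.

```python
# Monoisotopic residue masses in daltons for the 20 standard amino acids
# (assumption: unmodified, e.g. no carbamidomethyl Cys, no oxidized Met).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # mass of a proton, Da
WATER = 18.010565   # monoisotopic mass of H2O, Da


def fragment_ions(peptide: str) -> tuple[list[float], list[float]]:
    """Return singly charged b1..b(n-1) and y1..y(n-1) m/z values.

    Simplified model: every backbone amide bond is treated as a possible
    cleavage site; intensities, neutral losses, and the chemistry that
    decides which fragments actually appear are ignored.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide.upper()]
    b_ions, y_ions = [], []
    prefix = 0.0
    for m in masses[:-1]:            # N-terminal fragments: b1, b2, ...
        prefix += m
        b_ions.append(prefix + PROTON)
    suffix = 0.0
    for m in reversed(masses[1:]):   # C-terminal fragments: y1, y2, ...
        suffix += m
        y_ions.append(suffix + WATER + PROTON)
    return b_ions, y_ions


b, y = fragment_ions("PEPTIDE")
for i, (b_mz, y_mz) in enumerate(zip(b, y), start=1):
    print(f"b{i} {b_mz:9.4f}   y{i} {y_mz:9.4f}")
```

Because b_i and y_(n-i) together cover the whole peptide, their masses always sum to the neutral peptide mass plus two protons, which is a handy sanity check on any implementation of this model.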

Overall, proteomics is getting better and becoming more accessible as some of these core problems are worked through. It would be nice to see more community building in proteomics. I wonder how much of this, though, comes from the fact that most of what's on BioStar is related to genomics/transcriptomics, so proteomics people don't stumble across it, so they don't post to it, so they don't realize it's an option. I wouldn't be here if it weren't for my cross-over work in transcriptomics. Just my 2 cents.

score 6 · Answer 3 · 2010-11-11

Answered 13.7 years ago by User 59

I know my colleague Simon would normally answer this, as he was recruited to our institution specifically to deal with proteomics and protein informatics. The simple fact of the matter is that, at our place of work at least, very little high-throughput work is being done. Dealing with piecemeal protein informatics is within the grasp of most bioinformaticians, but I think the number of people producing and needing to analyse vast quantities of high-throughput proteomics data is vanishingly small compared to sequence-based genomics, or even microarray-based transcriptomics.


I think nobody in bioinformatics expects to find proteomics informaticians already trained and available on the market. AFAIK most proteomics labs hire bioinformaticians and train them on proteomics data. Even so, there are not that many proteomics informaticians.

Do you think it's a problem of recruitment or that non-proteomics bioinformaticians are not interested in getting trained in proteomics?

Reply by Jdnavarro, 13.7 years ago

score 5 · Answer 4 · 2010-11-11

Answered 13.7 years ago by Daniel Standage

Interest may be a part of the issue, but it's not the heart of the issue in my opinion. One of the reasons bioinformatics is so badly needed is because it's not just J. Craig Venter who is creating "genomics"-scale data nowadays. Any Dr. Joe Schmoe with a couple thousand dollars can get his hands on gigabytes of raw sequence data, thanks to Illumina, Roche/454, et al. I think this is great, but it has had a huge impact on the approach many biologists take to their research.

Advances in high-throughput proteomics have not been able to keep up with their nucleotide counterparts. There are probably many reasons for this, and we need not assume that lack of interest is the primary one. But, as Giovanni said, it's a vicious cycle. People have more nucleotide data, so they spend more time developing nucleotide informatics, so industry focuses more on nucleotide-based omics platforms, so we get even more/better nucleotide data, and so on and so forth.


I take this answer to mean a lack of publicly accessible data. I forgot to add that factor.

Reply by Jdnavarro, 13.7 years ago

I think there is a lot of data in the field, but there is a reluctance to share it. Why that is, is another story.

Reply by Jdnavarro, 13.7 years ago

I'm not sure it is a lack of shared data! I don't even think it is a reluctance to share data; rather, it is more about which data we share. In that respect, the nucleotide side of things is way ahead of the proteomics field.

Reply by Julian, 13.7 years ago

Not to mention the proprietary formats and the messy open ones.

Reply by Paulo Nuin, 13.7 years ago

score 4 · Answer 5 · 2010-11-11

I think D Swan is spot on: there just isn't that much ongoing work at most institutions. I work on proteomics and sit right next to two groups that do large-scale experiments, and we still spend only 1/4 of our effort on proteomics. The rest goes to genomics and to integrating past proteomics experiments with genomic data.

score 3 · Answer 6 · 2010-11-11

In my experience, people working on proteomics are more often computer scientists than researchers with a biological background. A few years ago I gave a talk on bioinformatics at a Python programming conference, and most of the people who approached me afterwards were computer scientists working on proteomics. Maybe this is because, as you say in your post, proteomics requires better programming skills, and a biological background may be less important when developing software to read the results of protein fingerprinting. So it may be that this website is frequented more by bioinformaticians with a background in biology.

Moreover, it is a vicious cycle.

Some time ago I was the moderator of an Italian web forum on science. Up to a certain point, we had very few questions about cancer therapies; then one day some people asked a few questions on the topic. The way our forum was indexed by Google changed drastically within a few days: we ranked first for 'cancer' in Italian, all the Google ads changed and started advertising cancer therapies, and a lot of new users came asking about the same topic. After some discussion, since none of the moderators was a physician and we could not judge the quality of the answers given (beware: on the Internet there are a lot of people selling false therapies), we had to forbid direct medical questions in the forum, and after a week the situation went back to how it was before.

So few people working on proteomics are aware of this website, while it is frequented by many experts in other fields; therefore there are few proteomics questions here. In turn, Google and the other search engines have not ranked this website highly for proteomics-related queries, so few people interested in proteomics see it; and so on.

If you want more proteomics-related questions here, just ask a few yourself, with accurate titles, and hope they get indexed correctly by the search engines. It may work very quickly.

Ram · Answer 7 · 2010-11-11

Answered 13.7 years ago by Bio_X2Y

I think a lot of questions that people have in relation to proteomics software have already been addressed on other sites.

For example, the Seattle Proteome Center has a Google group for discussing its tools. It currently has approx. 9000 threads, so it is a very good resource for asking proteomics questions. Many questions aren't specifically related to the SPC tools; it's basically a place where the whole proteomics software ecosystem is discussed.

Updated 4.9 years ago by Ram; written 13.7 years ago by Bio_X2Y

I followed that group some time ago but found it too specific to TPP. I'll give it another try. Thanks for reminding me of it.

Reply by Jdnavarro, 13.7 years ago

Ram · Answer 8 · 2014-12-09

There is no benchmarking community comparing alternative methods on a common dataset. For DNA and RNA analysis there are projects such as SEQC and the DREAM Challenges. It's also hard to develop improved methods when few details are available on how existing methods, such as ProteinPilot and Mascot, work. The only way is to demonstrate improved identification and quantitation, which requires a dataset built from known proteins at known dilutions. That's something the proteomics community has never generated.
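A benchmark of the kind described would score each method against the known composition of the sample. The Python sketch below is a minimal illustration under stated assumptions: the sample contains only known proteins (so any other reported ID counts as a false positive), and the expected fold change of each protein between two dilutions is known. The function names and protein IDs are hypothetical placeholders, not part of SEQC, DREAM, or any proteomics tool.

```python
import math


def identification_metrics(reported: set[str], spiked: set[str]) -> dict[str, float]:
    """Score reported protein IDs against a known (hypothetical) spike-in set.

    Assumes the sample contains ONLY the spiked proteins, so anything
    reported outside that set is counted as a false positive.
    """
    true_pos = len(reported & spiked)
    false_pos = len(reported - spiked)
    precision = true_pos / len(reported) if reported else 0.0
    recall = true_pos / len(spiked) if spiked else 0.0
    return {"true_pos": true_pos, "false_pos": false_pos,
            "precision": precision, "recall": recall}


def quantitation_errors(observed_ratio: dict[str, float],
                        expected_ratio: dict[str, float]) -> dict[str, float]:
    """Per-protein log2(observed / expected) fold-change error.

    0.0 means the known dilution was recovered exactly; proteins the
    method failed to quantify are skipped (they already hurt recall).
    """
    return {p: math.log2(observed_ratio[p] / expected_ratio[p])
            for p in expected_ratio if p in observed_ratio}


# Toy inputs, invented purely to show the call pattern.
metrics = identification_metrics({"P1", "P2", "X9"}, {"P1", "P2", "P3"})
errors = quantitation_errors({"P1": 2.0, "P2": 8.0},
                             {"P1": 2.0, "P2": 4.0, "P3": 8.0})
print(metrics)
print(errors)
```

Working on a log2 scale keeps over- and under-estimation symmetric (a 2x overestimate and a 2x underestimate give +1 and -1), which makes errors comparable across proteins at different dilutions.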

There are also no good review articles describing the statistical and computational challenges. A PubMed search for proteomics, filtered to review articles, shows plenty of reviews from the chemistry and biophysics fields, but almost none from bioinformatics.