Depth Of Knowledge Required To Become Expert In Ngs Bioinformatics?

12.1 years ago

Ngsnewbie ▴ 380

I assume this question has been asked on this forum before, but my aim here is simply to find out how deeply I should learn things to become an expert in NGS bioinformatics. Bioinformatics is an interdisciplinary subject. People entering this domain come from biological, mathematical, or computer science backgrounds.

E.g., for Linux: should I learn just the basic commands, or go up to admin level?

Biology: just the basic concepts, or go into detail?

Is there any need to learn Java if you know Perl/Python?

Database management systems: are SQL queries sufficient, or do I need to learn schema design, relational algebra, etc.?

And the list grows with R, MATLAB, statistical concepts, etc.

There are a lot of computer engineers, mathematicians/statisticians, and biologists, but what qualities/skills does an expert in bioinformatics have, and up to what level?

next-gen-sequencing • 4.9k views

updated 14 months ago by Ram 43k • written 12.1 years ago by Ngsnewbie ▴ 380

Biology is the most important thing. If you do not understand the biological question behind the technique, you're worthless. Now, good programming skills (Perl, Python, Java, R, ...) are essential for a bioinformatician.

12.1 years ago by Nicolas Rosewick 10k

Ram · Answer 1 · 2012-03-29

Hi,

I think that to reach an expert level in processing and analysing NGS data you would need:

- Good knowledge of a Unix environment. Knowing basic commands might not be enough. Being efficient with command-line tools such as grep, sed, awk, ... can really make your life easier for basic operations on huge files.
- Biology is very important. The data you are working with are biological data. For the downstream analyses, but also for the quality checks (you should understand, for example, why finding bacterial DNA is not a good thing when sequencing mammalian genomes), it is better to know the biological question your project is addressing as well as the biology of the studied organism(s). The extent of biological knowledge will clearly be something that distinguishes an expert from someone who just runs bowtie/TopHat/fastqc/Cufflinks without looking at the data.
- You should probably be efficient with one scripting language such as Perl, Python or Ruby. Knowledge of a fast language such as C/C++ can sometimes be useful but is not necessary.
- In my opinion, knowledge of SQL is not a priority. NGS data is rarely stored in an SQL-type database. Nonetheless, it can be useful later on to store processed data (e.g. RPKM/FPKM) or to query your data through a web interface, for example. It is therefore not mandatory but could potentially be useful.
- Statistics are also very important for downstream analyses. Knowing how to analyze your data accurately is of high importance given that you are looking at huge datasets. For some specific projects, advanced methods such as data mining might be needed. In general, if you are not a statistician, you just learn the method you need once you are faced with a problem you cannot solve. For this purpose R or Matlab can of course be of great help. They also help with data analysis and quality control, by the way.
- Know the technology. As well as knowing the biological questions that motivated the sequencing experiment, knowing the sequencing technology is of great importance for detecting and assessing potential bugs and biases. For example, I recently had to quantify the contamination by adaptor sequence in a HiSeq2000 library. I thus had to understand in detail how the library was made and sequenced to really address this question.
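
To make the SQL point above a bit more concrete, here is a minimal sketch of storing processed expression values in a database and querying them back. It assumes the usual RPKM definition (reads × 10^9 / (gene length in bp × total mapped reads)) and uses Python's built-in sqlite3 module; the gene names, counts, table name and the threshold of 15 are all made up for illustration.

```python
import sqlite3

def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)

# Hypothetical input: gene name -> (mapped reads, gene length in bp)
counts = {"geneA": (500, 2000), "geneB": (100, 1000)}
total_mapped = 10_000_000  # hypothetical library size

# In-memory database, so nothing is written to disk in this sketch
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE expression (gene TEXT PRIMARY KEY, rpkm REAL)")
conn.executemany(
    "INSERT INTO expression VALUES (?, ?)",
    [(gene, rpkm(n, length, total_mapped)) for gene, (n, length) in counts.items()],
)
conn.commit()

# Query back the genes above an arbitrary expression threshold
rows = conn.execute(
    "SELECT gene, rpkm FROM expression WHERE rpkm > ? ORDER BY gene", (15,)
).fetchall()
print(rows)
```

With these toy numbers, only geneA passes the threshold. For real projects the counts would come from your aligner or quantification tool rather than a hand-written dictionary.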

EDIT: added the last bullet point about knowledge of the technology and fixed a sentence.
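
As a rough illustration of the adaptor-contamination check mentioned in the last bullet, the sketch below counts how many reads in a FASTQ stream contain the start of an adaptor. It is not the method actually used on that library: it assumes plain 4-line FASTQ records, exact matching only (no mismatches), and the adaptor sequence, the `min_match` seed length and the toy reads are invented for the example.

```python
import io

def adapter_fraction(fastq_handle, adapter, min_match=12):
    """Return (reads_with_adapter, total_reads) for a FASTQ stream.

    A read counts as contaminated if it contains the first `min_match`
    bases of `adapter` as an exact substring (no mismatches allowed).
    """
    seed = adapter[:min_match]
    total = hits = 0
    for i, line in enumerate(fastq_handle):
        if i % 4 == 1:  # second line of each 4-line record is the sequence
            total += 1
            if seed in line:
                hits += 1
    return hits, total

# Made-up adaptor and reads, for illustration only
ADAPTER = "ACGTACGTACGTTTGACA"
toy_fastq = io.StringIO(
    "@r1\nGGGGCCCCAAAATTTT\n+\nIIIIIIIIIIIIIIII\n"
    "@r2\nGGGGACGTACGTACGT\n+\nIIIIIIIIIIIIIIII\n"
    "@r3\nTTTTGGGGCCCCAAAA\n+\nIIIIIIIIIIIIIIII\n"
)
hits, total = adapter_fraction(toy_fastq, ADAPTER)
print(f"{hits}/{total} reads contain the adaptor seed")
```

On a real file you would pass an open file handle instead of the `io.StringIO` object. An exact-match count like this only gives a lower bound, since sequencing errors and adaptors truncated at the read end are missed, which is exactly where knowing how the library was built helps.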

Ram · Answer 2 · 2012-03-29

This question does not have a single answer; it depends. I'd suggest a focus on a specific biological question or set of questions related to a specific dataset. From there, one can potentially answer the question of what needs to be done to answer those specific questions.

To answer your question more directly, bioinformatics is not just "how do we do it", but also includes questions related to the best way to do it (implying algorithm design, statistics), how to do it efficiently (computer science), how we support the infrastructure (IT), and why we do it (biological knowledge, hypothesis generation, and interpretation). In the setting of team science, one effective bioinformatician may only need to meet the first criterion (how do we do it), another may need to be a computer scientist and software developer, and yet another may need all of the above.