Forum: 1st-year grad student, plant biology lab, new to bioinformatics -- where to begin?
mkhoury (10 months ago) wrote:

Hi all,

I'm a first-year graduate student finishing up a rotation in a plant biology lab, and I've pretty much decided to join this lab for my thesis work. I'm really interested in getting involved in bioinformatics. I did a little as an undergrad about three years ago, but since I haven't used what I learned in that lab, I've lost a lot of my chops. This is also my first time working with plants (Arabidopsis thaliana, to be exact), so I'm not too familiar with the bioinformatics tools in my field.

Any suggestions for how to get started? I've recently installed Python and PyCharm and started learning some basic commands, but I really want to practice on real data (and if it's from my lab, even better). The problem is, I don't even know where to start. I've heard from other lab members about microarray datasets that are available, but they're always in the form of Excel spreadsheets, which suggests to me that they've already been processed. One of the differentially expressed gene datasets, for example, is a huge table of signal log ratios (apparently this is an easier way to convey gene expression levels). So how do I get raw data?

And any advice on how to begin asking questions about my field? One question I'd like to dig deeper into is which transcription factors act upstream of a particular transcription factor I'm focusing on now (called nam). I'm not even sure if this is a good question! But I figured I'd get started somewhere.

Thank you to those who took the time to read this post. I'm excited to hear your advice and get started on my journey to becoming a bioinformatician!

khorms (10 months ago) wrote:

There are a number of resources you can use, depending on what you want to do.

First, if you just want to get more comfortable with coding, there are a number of websites with exercises, such as Rosalind and LeetCode.

Then, if you want to get your hands dirty with real data, I think the best thing for a beginner is to pick a large dataset that has been pre-processed in a reproducible way, start from the raw data, and try to reproduce the main conclusions yourself. Depending on the field you are working in, you might pick different data types. However, if you don't have a preference, I would suggest starting with RNA-seq analysis, since it's a pretty standardized field with several programs that everyone uses. Some other fields (somatic mutation analysis, for example) are far less standardized, so they will be harder to learn at first. This is a great review of standard practices for RNA-seq analysis.

An example of such a resource with lots of pre-processed data is ARCHS4 (link, paper): they scraped lots of published RNA-seq datasets from the Gene Expression Omnibus (GEO) and processed them. So what you do is pick your favorite gene/organism, find a dataset related to it in ARCHS4, go to the original paper, download the raw data from GEO, and try to reproduce the conclusions from the paper yourself. This way you will quickly get a sense of how the commonly used tools work. You can also start from Xena or a number of other resources.
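As a rough illustration of the "download from GEO" step, here is a minimal Python sketch that builds the directory URL for a GEO series accession on NCBI's public download server. The accession GSE12345 is only a placeholder (not a recommended dataset), and the helper name geo_series_dir is made up for this example. It assumes GEO's usual layout, where series are grouped into folders by replacing the last three digits of the accession with "nnn". Nothing is downloaded; the function only builds the address.

```python
GEO_SERIES_ROOT = "https://ftp.ncbi.nlm.nih.gov/geo/series"


def geo_series_dir(accession):
    """Build the download-directory URL for a GEO series accession.

    Hypothetical helper for illustration; assumes the 'GSExxnnn' folder
    grouping used on the GEO download server.
    """
    acc = accession.strip().upper()
    digits = acc[3:]
    if not (acc.startswith("GSE") and digits.isdigit()):
        raise ValueError(f"not a GEO series accession: {accession!r}")
    # Drop the last three digits and append 'nnn' to get the parent folder,
    # e.g. GSE12345 -> GSE12nnn, GSE999 -> GSEnnn.
    stub = "GSE" + digits[:-3] + "nnn"
    return f"{GEO_SERIES_ROOT}/{stub}/{acc}/"


if __name__ == "__main__":
    # Placeholder accession; replace with the one cited in the paper.
    print(geo_series_dir("GSE12345"))
```

Note that for RNA-seq the raw reads themselves are generally stored in the Sequence Read Archive (SRA) and linked from the GEO record, so the GEO directory is mostly useful for processed tables and supplementary files.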

If you get lost along the way, there are a number of classic tutorials that you can refer to, for example Michael Love's.

Mensur Dlakic (10 months ago) wrote:

There are all kinds of tutorials out there. My suggestion is to start with peer-reviewed protocols, where everything is explained in a way that is approved by reviewers. I like Nature Protocols, and you may find interesting content in Nature Methods.

For example, this is what to enter in PubMed to search for rna-seq in Nature Protocols:

nat protoc[jo] rna-seq

This is the search result. Enter any other keyword or phrase, such as CRISPR, in place of rna-seq to search for other topics. These articles list literally all the steps required to reproduce an experiment, starting from where to download the data, how to unpack and process it, and what commands to enter. Once you learn how to use a tool on a general dataset, it should not be a problem to find and analyze the data that is specific to your area of research.
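If you would rather run the same search from Python, a minimal sketch could look like the following. It only builds the query string and a request URL for NCBI's E-utilities esearch endpoint and does not contact the server. The function names build_pubmed_query and build_esearch_url and the default retmax of 20 are choices made for this example, not anything from the posts above.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_query(keyword, journal="nat protoc"):
    """Combine a journal filter (the [jo] field tag) with a keyword."""
    return f"{journal}[jo] {keyword}"


def build_esearch_url(keyword, journal="nat protoc", retmax=20):
    """Return an esearch URL for the query; nothing is fetched here."""
    params = {
        "db": "pubmed",
        "term": build_pubmed_query(keyword, journal),
        "retmax": retmax,      # illustrative cap on the number of IDs returned
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Swap the keyword to search for other topics in the same journal.
    for kw in ("rna-seq", "CRISPR"):
        print(build_pubmed_query(kw))
        print(build_esearch_url(kw))
```

Opening one of the printed URLs in a browser should return the matching PubMed IDs; if you script many requests, check NCBI's usage guidelines first.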

Current Protocols is yet another option.
