Blog: A list of Bioinformatics projects for volunteers
3.8 years ago, Zhilong Jia (1.5k) wrote:

Here is a list of bioinformatics projects for volunteers. I have read several posts suggesting that many people interested in computational biology would like to participate in a bioinformatics project, and I assume they want ideas for where to start. Comments and answers are welcome; I will use them to update the list from time to time.

# Mainly for people who can program (R, Python, Java, etc.)

Mentored Projects from Bioconductor

ROSALIND contains bioinformatics problems (by @TriS) [ref: a Biostar post]

DREAM Challenges pose fundamental questions about systems biology and translational medicine.


# Mainly for people in Medicine or Biology

Stargeo aims to annotate samples from GEO (for example, as control or as a particular disease), enabling powerful meta-analysis of that disease. (Knowledge of medicine or biology is necessary, especially knowledge of diseases.)

BD2K-LINCS-DCIC Crowdsourcing Portal includes crowdsourcing projects (lots of microtasks and megatasks) related to drugs, genes and diseases, mainly in the Library of Integrated Cellular Signatures (LINCS) and also in GEO.



1. Khare, Ritu, et al. "Crowdsourcing in biomedicine: challenges and opportunities." Briefings in Bioinformatics (2015): bbv021.


In my opinion, one of the main issues for both sides, apart from knowledge, is continuity of contribution. Working for a few days or weeks and then giving up is probably suitable, to an extent, for crowdsourcing projects.

Update (24 March, 2016):

Open Innovation Pavilion: one of its focuses is translational medicine or translational bioinformatics. It has teamed up with InnoCentive to offer its readers the opportunity to participate in research and development challenges. As a Solver, you can apply your expertise to important problems, stretch your creative boundaries, and win cash awards.

Update (4 June, 2016):

NCI up for a challenge

project blog volunteers • 4.7k views
modified 3.5 years ago • written 3.8 years ago by Zhilong Jia (1.5k)

Thank you for these wonderful resources.

written 3.8 years ago by Veera (90)

Hello, thanks for this nice post. However, one thing probably worth mentioning about Stargeo is that they seem to define expression quite incorrectly (at least in their fundamentals video).

written 3.7 years ago by Anima Mundi (2.5k)

Thanks, @Anima. Could you detail the issue? I am actually involved in this project, and I'm not sure what your point is. Each disease signature is a comparison between the disease samples and control samples.

written 3.7 years ago by Zhilong Jia (1.5k)

Gladly. I should specify that this is merely a matter of jargon. Still, I think it is important to use biological terms accurately, in order to avoid confusion.

For instance,

00:18 - "when RNA is used to make protein, it is said to be expressed"

00:13 - "when more protein is made, RNA is said to have a greater expression, and when less protein is made, RNA is said to have a smaller expression"

RNA is not expressed. Genes are, and when their RNA is transcribed, they are said to be expressed regardless of whether they are protein-coding or not. Also, gene expression levels depend solely on the amount of RNA produced.

Another more subtle point is the use you make of the term "expression pattern". You seem to refer to it as if it indicates a particular state of a transcriptome for a given biological sample, but it is actually defined as the spatio-temporal location of the RNA for a given gene through the body of a certain organism. There is no such thing as an expression pattern of a disease.

Of course this is all meant to be constructive criticism, hope it helps!

written 3.7 years ago by Anima Mundi (2.5k)

"Merely a matter of jargon" indeed. That "RNA is not expressed" is also a matter of jargon. Large intergenic noncoding RNAs (lincRNAs) have been identified. Is that intergenic RNA not expressed by definition? As the concept of biology changes with our increased understanding of the genome, so does the "jargon" used to describe it.

That the term "expression pattern" subtly indicates a "spatio-temporal location of the RNA" is also striking to me. I would argue most don't share in that jargon. Given the overarching context of the Gene Expression Omnibus (GEO) in the video, I think it's quite obvious that "expression pattern" refers to gene expression patterns. See this NCBI tutorial on how to find "expression patterns" in the context of GEO:

In any case, a medical student without any bioinformatics or technical experience made the video. That is the exact audience STARGEO is targeted towards. In context, I think the video is quite clear in conveying what we are trying to do. Please let me know if you disagree. But we will tighten our technical prose in the next iteration. Thx. :)

written 3.7 years ago by Dexter Hadley (10)

Hi Dexter, expression is a prerogative of DNA: once a lincRNA is recognized as of some functional meaning, the genomic region that produces it is a gene by definition; when that lincRNA is found in a cell, it is its DNA template that is being expressed. Talking of RNA expression is at least improper.

You truncated my definition of expression pattern: I defined it as the "spatio-temporal location of the RNA for a given gene through the body of a certain organism", so I was talking in the first place of gene expression pattern. This definition (expression pattern being an attribute of genes) was opposed to the use of the term that is made in the video: e.g. 00:46 - "our expression pattern is not always constant; when we get a disease, our expression pattern changes in such a way that is characteristic of that disease".

In my opinion, an inexperienced author is not a good choice for a tutorial, especially when newbies are the target audience: beginners are of course more prone to get confused or misled, and are understandably less capable of critical analysis of the message provided.

written 3.7 years ago by Anima Mundi (2.5k)

@Anima Thank you. Based on the central dogma proposed by Crick, I think you are right on some points. RNAs are transcribed from genes, while protein is translated from mRNA.

Gene expression is the process by which information from a gene is used in the synthesis of a functional gene product, either RNA or protein (from Wikipedia), as also pointed out by @Dexter. Generally, "gene expression levels" means the amount of expressed RNA measured by microarray or RNA-Seq (as you pointed out), partly because proteomics data are poorly available.

Also, the mistake concerns the video, not the analyses in Stargeo themselves. Based on this discussion, our team will make an improved video that clearly explains what Stargeo does. Thank you for your comments.

modified 3.7 years ago • written 3.7 years ago by Zhilong Jia (1.5k)

While gene expression may ultimately be aimed at the production of a polypeptide, I think you would agree that it hinges on the process of transcription; in fact, as you say, gene expression levels are purely a measure of RNA quantities.

I am happy this discussion was helpful and I wish you and Dexter all the best for the future of Stargeo ;).

written 3.7 years ago by Anima Mundi (2.5k)

3.8 years ago, Charles Plessy (2.7k) wrote:

Everybody is welcome to join the Debian Med project! We package bioinformatics tools in Debian, but we also increasingly work on metadata, regression tests, etc.

written 3.8 years ago by Charles Plessy (2.7k)

3.5 years ago, roma (120) wrote:

This list is a great idea, but I wish there were more projects.

For Bioconductor mentored projects, the link does not work for me (404); this one works:; however, I am not sure if the list is still relevant. E.g. one of the projects listed there is marked with

  • Status: imminent (January 2013)

From what I understand, Rosalind is a collection of learning exercises, as opposed to something a volunteer could contribute to.

DREAM Challenges looks very interesting, though.

written 3.5 years ago by roma (120)

I'm not sure what the status of the Bioconductor mentored projects is now. The link has been updated. For people wanting to get into bioinformatics, Rosalind is a good project. Thank you.

written 3.5 years ago by Zhilong Jia (1.5k)