Forum: To the Biologists out there: What bioinf tool are you missing?
7.7 years ago
mschmid ▴ 180

I have a degree in Biology and Comp. Science.

Now, for fun and to "learn by doing", I would like to develop a bioinformatics tool of simple to medium complexity (in Python, Perl or C).

So if you can tell me what tool you are missing, and I think I am able to hack it together, I would like to take on such a project.

Let's see what comes up!

education career • 2.8k views
7.7 years ago

Here are some on my wish list:

  1. A way to subsample SRA data without having to download it all: for example get 1 million random reads from each of the following runs.
  2. A sane replacement for Entrez Direct: http://www.ncbi.nlm.nih.gov/books/NBK179288/ (the current version will ignore invalid parameters and provides little to no help on what the valid values could be)
  3. Simple, standalone command line replacements for R tools like deseq, edgeR that operate directly on files rather than the complex R objects that these tools typically require.
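For item 1, the sampling half of the problem can be sketched with reservoir sampling, which keeps a uniform random subset of k reads while streaming through the data once. This is only a minimal sketch: it assumes plain four-line FASTQ records arriving on a text stream, and it does not by itself avoid downloading the run. The function names are made up for illustration.

```python
import random
from typing import Iterable, Iterator, List, Optional, TextIO, Tuple

# One FASTQ record: header, sequence, separator, quality (assumed unwrapped).
FastqRecord = Tuple[str, str, str, str]


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield 4-line FASTQ records from a text stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of stream
        if not header.strip():
            continue  # tolerate stray blank lines between records
        seq = handle.readline()
        plus = handle.readline()
        qual = handle.readline()
        if not qual:
            raise ValueError("truncated FASTQ record")
        yield (header.rstrip("\n"), seq.rstrip("\n"),
               plus.rstrip("\n"), qual.rstrip("\n"))


def reservoir_sample(records: Iterable[FastqRecord], k: int,
                     seed: Optional[int] = None) -> List[FastqRecord]:
    """Return k records chosen uniformly at random in a single pass.

    If the stream holds fewer than k records, all of them are returned.
    """
    rng = random.Random(seed)
    reservoir: List[FastqRecord] = []
    for i, record in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            reservoir.append(record)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = record
    return reservoir
```

Memory use depends on k rather than on the size of the run, so the same function works on a stream piped from whatever tool fetches the reads.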

I was also annoyed by number 3, so I wrote an Rscript that takes command-line arguments (count table, sample info file) for that purpose. It performs sanity checks on the input data and then runs a differential expression analysis with either DESeq2, edgeR or limma, writing the results to tab-separated files and saving all potentially useful plots as image files.

For example, run it as: DEA.R counttable.txt sampleInfoFile.txt deseq2
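The kind of input sanity check described above could look something like the following. This is a hedged sketch in Python, not the logic of DEA.R itself: the file layout (gene IDs in the first column of the count table, a column named "sample" in the sample info file) and all function names are assumptions made for illustration.

```python
import csv
from typing import Iterable, List


def count_table_samples(lines: Iterable[str]) -> List[str]:
    """Return sample names from the header row of a tab-separated count table.

    Assumes the first column holds gene IDs, so it is skipped.
    """
    header = next(csv.reader(lines, delimiter="\t"))
    return header[1:]


def sample_info_names(lines: Iterable[str], column: str = "sample") -> List[str]:
    """Return the sample names listed in a tab-separated sample info file.

    Assumes a header row containing a column called `column`.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    return [row[column] for row in reader]


def check_samples(count_samples: List[str], info_samples: List[str]) -> List[str]:
    """Return a list of human-readable problems; empty means the inputs agree."""
    problems: List[str] = []
    for label, names in (("count table", count_samples),
                         ("sample info", info_samples)):
        dupes = sorted({n for n in names if names.count(n) > 1})
        if dupes:
            problems.append(f"duplicate samples in {label}: {', '.join(dupes)}")
    missing_info = sorted(set(count_samples) - set(info_samples))
    if missing_info:
        problems.append(
            f"in count table but not in sample info: {', '.join(missing_info)}")
    missing_counts = sorted(set(info_samples) - set(count_samples))
    if missing_counts:
        problems.append(
            f"in sample info but not in count table: {', '.join(missing_counts)}")
    return problems
```

Both readers accept any iterable of lines, so an open file handle can be passed directly. A wrapper script would stop with the reported problems before handing anything to the differential expression step.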


For additional references, here's our Rscript (part of the whole pipeline) for doing a similar DESeq2 analysis.


Trinity also comes with a number of handy utilities, including ones for differential expression: https://github.com/trinityrnaseq/trinityrnaseq/wiki/Trinity-Differential-Expression


Hmm looks like there are some parts I can still add to mine!


I'll have to look at yours and see if there are ideas we can borrow from you :)


Should we consider #3 done? :-)

Share the code!


I'm pretty new to using git, so I hope I didn't make mistakes, and I wouldn't mind your feedback. If all went well, my script can be found at https://github.com/wdecoster/DEA.R

Note that it will need some modifications to perfectly match your files... Let me know what you think!


Create a new post for command line scripts for DE :-)

Don't keep this buried in this long thread where it may not be found by others easily.

@Devon can add his version to your new post.


Would you mind sharing that? I've been wanting to write up something similar, but no use re-inventing the wheel if I can modify yours a bit to do the same thing.

EDIT: genomax2 and I wrote this at the exact same time, so jinx, you owe me a Dr. Pepper!


I definitely wouldn't mind sharing (and I will), but this would be the first piece of code I have ever shared, which makes me nervous ;-) In addition, I should get some comments/advice from people with more experience in gene expression analysis. I'll put it online and let you know, probably tomorrow.


"Simple, standalone command line replacements for R tools like deseq, edgeR that operate directly on files rather than the complex R objects that these tools typically require."

Make that request*2.


"complex R objects" is putting it nicely...


YES. Why does it seem like R projects just assume everyone will be using R? And for things like deseq, the inputs are quite simple, so why no command-line love?


and make that request * 3.

7.7 years ago
GenoMax 141k

Why not work with a lab/group locally to see if you are able to find a challenging project (for them, perhaps not for you)? Direct interaction would allow you to learn more quickly (without the character limit of a post here), and things can move much faster when everyone is in the same room.

There is plenty of general purpose software out there and it may be difficult to find a project by a solicitation on Biostars. While not new, here is a similar question that was asked on Biostars a while ago: Which New Bioinformatics Related Tool Would You Appreciate The Most?

7.7 years ago
bbmisraccb ▴ 70

Convert ALL or MOST R packages into good, push-button, easy-to-use GUI tools for biologists: high-throughput software, web servers, and programs that run on Windows. We are in dire need of those.

Open source statistical packages.

Anything related to Metabolomics data analysis from various mass-spec platforms would be useful.

Connecting metabolites with proteins/genes.

Integration of multiple -omics datasets would be highly in demand too.

Easy, and universal "data imputation" tools for all -omics data sets would be useful.

Thoughts galore, but tools are limited. :(

7.7 years ago
Farbod ★ 3.4k

Hi, and thank you for your efforts.

Would you please design a new BLAST algorithm that can finish, in just one day, a BLAST run that currently takes 10 months on a server?


What on earth are you blasting?


Every sequence on earth against those from mars :-)


If it is blastx that you want to replace, then I have good news: this tool exists. PAUDA is reported to be about 10,000x faster than blastx:

http://ab.inf.uni-tuebingen.de/software/pauda/

The output is BLAST-compatible. Amusingly, it is dubbed "a poor man's blastx".


DIAMOND is 20,000x faster than blast (for short reads), or so the authors claim.


Indeed, that looks even better. I need to add this to my list of tools.


Dear Istvan Albert,

Can this program also be used to run a blastx search against a local NCBI nr database?


It looks like you would need to build a PAUDA index from the nr data, but in a way, yes.


Actually, you're in luck. Other options exist. See previous discussion here: Faster BLAST alternative

7.7 years ago
Whoknows ▴ 960

Hi,

I think that, apart from a few websites such as the Broad Institute's GenePattern and the Galaxy suite for NGS analysis, there are no complete bioinformatics tools that let biologists perform their analyses without suffering!

My idea is to offer bioinformatics tools on an open-access web server where any user could run their analyses without prior knowledge of Linux, programming, and so on.

Currently, many biologists have to learn NGS analysis, which is time-consuming. Consider the GEO2R tool, though: anybody can use it for microarray profile analysis without knowing R. Also, many labs do not have access to a suitable server with enough hardware resources, which is vital for NGS analysis.

So I think integrating the current tools could be a goal for those who want to broaden the reach of bioinformatics.

Thanks.


In other words, it'd be nice if someone added more Galaxy tools or workflows, since Galaxy already fits everything you've described.


Yes, you are right. But I mean all aspects of bioinformatics, not just NGS: other areas such as protein analysis, structural analysis, and so on.


You can do protein analysis in Galaxy; I'm not sure about structural analysis (I've never needed it). It's only a matter of whether someone already has written, or wants to write, a wrapper around a tool or visualization. Aside from that, the sky is the limit.
