Like in everything else we do, the 80%-20% rule works great in bioinformatics. What is your 80% of the time wasted on? I can suggest:
- converting accession numbers between databases
- parsing output
Asking people to clarify their question.
Pointing people towards the free online help/tutorial/documentation that already answers their question.
Arguing with my lab mates about variant calling pipelines.
replacing files, versioning, pruning, waiting for stuff to finish running
I always try to race the program. If I can write a faster version before it finishes running, or add substantial-enough improvements to make the output of the original obsolete, I kill it and start over. That puts a bound on how slow my programs can be - they never take longer to run than they do to write :)
Comparing results from different methods takes a lot of time too.
Installing packages and the relevant version of perl and python X.X.X.X.... without admin privileges :)
Oh, yeah, this is what kills me the most. Our cluster is not connected to the internet for security reasons, so imagine what it is like for R or perl when you are installing a package that has a lot of dependencies. Also, a package that works only with recent R versions makes me sick :)
https://github.com/pengchy/RScript/blob/master/download_r_packages.R
Maybe this script will help you.
This script is used to 1) download the latest source packages to local; 2) update local source packages; 3) download dependencies of one specified package
Forgetting to use the -w flag in grep, and therefore having to repeat the whole analysis!!!
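For anyone who has not been bitten by this yet, here is a minimal sketch of the difference the -w flag makes. The gene IDs are made up for illustration; the point is only that a plain pattern also matches longer IDs that contain it.

```shell
# Hypothetical ID list; these names are invented for the example.
ids=$(printf 'gene1\ngene10\ngene11\n')

# Without -w, the pattern matches as a substring, so every line counts.
substring_hits=$(printf '%s\n' "$ids" | grep -c 'gene1')

# With -w, the match must be bounded by non-word characters or line edges.
whole_word_hits=$(printf '%s\n' "$ids" | grep -c -w 'gene1')

echo "substring: $substring_hits, whole word: $whole_word_hits"
```

Note that -w treats only letters, digits and underscores as word characters, so an ID followed by a dot and a version suffix would still match; depending on your identifiers you may need an anchored pattern instead.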
Bookkeeping, debugging, keeping pace with version changes of the tools
Troubleshooting for several hours when I install packages that cannot run just because of an outdated python or R version (and that without having system admin privileges)
The following is an example of R code. It contains a small error which, when hidden in a >200-line script, causes an error message that is incomprehensible to most human beings. Can you find it?
myarray = c(1,2,3,4,5,)
Most of my time is wasted fixing silly R syntax errors like this. I really wish Python had more libraries for working with large datasets and plotting :-/.
Actually I find that R has a wider range of options for reading large files than python. You have the rhdf5 library from bioconductor, the sqldf library that allows you to read csv files into a database instead of keeping them in memory, and much more. I have used PyTables with great results for reading HDF5 in python, but I just find that R has more options.
Rerunning the analysis after finding a bug in the scripts, a wrong parameter, or new input data.
Searching for data and their quality check.
Searching for hours and hours just to find out that such data does not exist yet. Or that, although an Excel sheet listing families does exist in 1000 HGP, they have not been sequenced yet. Trying to find out whether the reads are genomic or RNA-Seq, who published them, whether the sample is male or female, and so on..
Converting between formats.
Installing new software.
Waiting for results. I am no good at doing 5 things at the same time :-)
Figuring out which tool to use. Lately, for analyzing miRNA data. Any pipeline suggestions?
A waste of time comes with a loud indicator: Just see how you feel after an activity. Are you energized or drained by it? Are you happy or resentful for it? You know the feeling when you’ve wasted your time. I find all of these a giant waste of my time, but if you don’t, well that’s why we have the comment section, so tell me what you think.
Making sure all the analyses I rushed off in an excited whirl are properly written up and documented in reports/notebooks, where I should have done them in the first place. Making sure all the labels are rotated correctly, the titles are in plain English, the colors are distinguishable to color-blind people, etc.
Managing branches across a codebase with 15 collaborators across three forks and tens of branches.
In here, answering others' questions. :)
functional interpretation of protein lists from comparative proteomics studies
Meddling with incorrectly annotated genes/proteins.
Trying to stay organized.
Debugging my scripts because a new dataset broke them in an unanticipated way--and just when I thought they were robust to all errors :(
Figuring out how a dataset file had been created (i.e. "Is this the right file? How was it filtered?").
Reinventing the wheel. Like, all the time. Why use clusterProfiler with an up-to-date GO-annotation, when you can just write your own client for PANTHER-ORA? Oh, PANTHER does not report genesets back. Why not just write a custom GO-slim for semantic extraction of cognate genes? I guess you only learn by making mistakes.
Trying to understand formats that have
=> The day you are confident about how to use it, the format will change (be updated) or be deprecated, and you start over
Trying to understand tools that have
=> The day you are confident about how to use it, the tool will change (be updated) or be deprecated, and you start over
In the past it was a lot about installing tools, dealing with dependencies to keep previously installed tools working, compilation, etc. Nowadays this task is routine, but from time to time we still find a tool that is a pain to install.
"Asking people to clarify their question" This has to be number one for me.
After some more thought, the process is more akin to helping them formulate their own questions...
I spend 80% of my time actually reading documentation. But I'm one of a very small number of people who actually do that I think...
What do you mean? Could you please clarify?