Question

Forum:What Are The Most Common Stupid Mistakes In Bioinformatics?

146

14.3 years ago

Jeremy Leipzig 23k

While I of course never make stupid mistakes...ahem...I have many "friends" who:

forget to check both strands
generate random genomic sites without avoiding masked (NNN) gaps
confuse genome freezes and even species

but I'm sure there are some other very common pitfalls that are unique to bioinformatics programming. What are your favorites?

software • 81k views

ADD COMMENT • link updated 2.1 years ago by Raony Guimarães ★ 1.5k • written 14.3 years ago by Jeremy Leipzig 23k

24

Staying in the office all day

ADD REPLY • link 10.0 years ago by russhh 5.8k

8

good way to boost reputation.

ADD REPLY • link 13.2 years ago by Raygozak ★ 1.4k

3

meta: should this Q be community-wiki?

ADD REPLY • link 14.3 years ago by Chris Miller 22k

1

What do you mean by "generate random genomic sites without avoiding masked (NNN) gaps"? Could you explain in more detail? I don't understand.

ADD REPLY • link 13.1 years ago by Zhilong Jia ★ 2.2k

1

see this?

http://genome.ucsc.edu/cgi-bin/das/hg19/dna?segment=chr1:1,10000

you wouldn't want to sample from it
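
To make the point concrete, here is a minimal Python sketch of rejection sampling that skips masked bases. The function name and the toy sequence are made up for illustration; a real genome would need per-chromosome handling and probably a faster approach based on gap intervals.

```python
import random

def sample_unmasked_sites(sequence, n_sites, rng=None, max_tries=100_000):
    """Draw 0-based positions uniformly at random, rejecting masked (N) bases."""
    rng = rng or random.Random()
    sites = []
    tries = 0
    while len(sites) < n_sites:
        tries += 1
        if tries > max_tries:
            raise ValueError("too few unmasked bases to sample from")
        pos = rng.randrange(len(sequence))
        if sequence[pos] not in "Nn":
            sites.append(pos)
    return sites

# Toy "chromosome": a gap of Ns followed by real sequence (hypothetical data).
toy = "N" * 50 + "ACGTACGTAC"
picked = sample_unmasked_sites(toy, 5, rng=random.Random(0))
```

Without the `not in "Nn"` check, most of the sampled sites in this toy example would land in the gap.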

ADD REPLY • link 13.1 years ago by Jeremy Leipzig 23k

Ram · Answer 1 · 2011-04-04

110

14.3 years ago

biobot 0.0.77.a.1099 6.2k

Invent a new weakly defined, internally redundant, ambiguous, bulky fruit salad of a data format. Again.

ADD COMMENT • link updated 5.1 years ago by Ram 45k • written 14.3 years ago by biobot 0.0.77.a.1099 6.2k

15

This still makes me laugh every time I read it...and cry a little inside too.

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 13.2 years ago by Daniel Standage 4.1k

0

I 2nd that-- Still makes me laugh every time I read it!

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 10.8 years ago by madkitty ▴ 690

13

awesome: "bulky fruit salad of a data format".

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 14.3 years ago by brentp 24k

1

It's funny because it's true :D

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 9.7 years ago by ebrown1955 ▴ 320

0

We should work on a standard .. I keep saying that but no one would hear me anyway

Laughs & Tears ~!

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 13.2 years ago by madkitty ▴ 690

10

http://xkcd.com/927/

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 13.2 years ago by Daniel Standage 4.1k

2

It made me laugh when I read the MIQE guidelines not long after this comic came out:

The nomenclature describing the fractional PCR cycle used for quantification is inconsistent, with threshold cycle (Ct), crossing point (Cp), and take-off point (TOP) currently used in the literature ... we propose the use of quantification cycle (Cq)

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 13.1 years ago by dominic ▴ 20

Ram · Answer 2 · 2011-04-04

78

14.3 years ago

brentp 24k

I truncated many fasta files this way when trying to see which headers it contained:

grep > some.fasta

I also see a lot of off-by-one errors due to switching between formats

BED is 0-based
GFF/GTF are 1-based

and switching between languages:

Python and nearly every other modern language use 0-based indexing
R is 1-based (as is Lua)

ADD COMMENT • link updated 5.1 years ago by Ram 45k • written 14.3 years ago by brentp 24k

9

Not only is the bed format 0-based, it's also "half-open", meaning the start position is inclusive, but the end position is not.

So if your region starts at position 100 and ends at 101 using standard 1-based coordinates with both start and end inclusive (i.e. it's two bases long), when you convert it to 0-based half-open coords for BED format the region now starts at 99 but it still ends at 101!
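
The conversion described above can be sketched in a few lines of Python. The function names are invented for this example and are not taken from any library.

```python
def one_based_to_bed(start, end):
    """1-based, end-inclusive (GFF/GTF style) -> 0-based, half-open (BED style)."""
    return start - 1, end

def bed_to_one_based(start, end):
    """0-based, half-open (BED style) -> 1-based, end-inclusive."""
    return start + 1, end

def bed_length(start, end):
    """Length of a half-open interval is simply end - start."""
    return end - start

# The two-base region from the example: 100..101 in 1-based inclusive coords.
bed = one_based_to_bed(100, 101)
```

Only the start moves; the end stays the same, and the length of a half-open interval needs no `+ 1`.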

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 14.3 years ago by Nina ▴ 400

0

Yeah, the people who conceived of that were utterly brilliant usability specialists.

ADD REPLY • link 5.1 years ago by colindaven 7.7k

0

YES, THAT IS an interesting feature when, as a bioinformatician, you're working with C arrays, you want to define an empty interval, an insertion point, etc.

ADD REPLY • link 5.1 years ago by Pierre Lindenbaum 166k

0

You should also have told the correct command:

grep '>' some.fasta

And yes, of course, I did this before as well :D

ADD REPLY • link 2.1 years ago by Raony Guimarães ★ 1.5k

Ram · Answer 3 · 2011-04-04

54

14.3 years ago

Fred Fleche 4.3k

Storing gene annotation in an Excel file and finding out that some HUGO gene names have been mangled by Excel: SEPT9 becomes sept-9. Conclusion: do not use the .xls format to store your data.
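
A rough way to spot this kind of damage after the fact is to flag values in the gene-name column that look like dates. The Python sketch below is only a heuristic: it assumes the mangled values look like "9-Sep" or "sept-9" (English month abbreviations), which depends on the spreadsheet's locale and settings, and the names in it are hypothetical.

```python
import re

# Assumed shapes of date-mangled symbols: "9-Sep", "Sep-9", "sept-9", "1-Mar", ...
MONTH = r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sept?|Oct|Nov|Dec)"
DATE_LIKE = re.compile(
    r"^(?:\d{1,2}-" + MONTH + r"|" + MONTH + r"-\d{1,2})$", re.IGNORECASE
)

def looks_date_mangled(value):
    """True if a gene-symbol cell looks like a spreadsheet date rather than a symbol."""
    return bool(DATE_LIKE.match(value.strip()))

symbols = ["SEPT9", "9-Sep", "sept-9", "MARCH1", "1-Mar", "DEC1", "TP53"]
suspicious = [s for s in symbols if looks_date_mangled(s)]
```

Anything flagged still needs a manual look, since the original symbol cannot always be recovered from the date.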

Hearing people make this eternal mistake: "Hey, these two sequences are 50% homologs"

ADD COMMENT • link updated 5.1 years ago by Ram 45k • written 14.3 years ago by Fred Fleche 4.3k

11

http://www.biomedcentral.com/1471-2105/5/80

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 14.3 years ago by Simon Cockell 7.4k

3

This is a popular one; dec1 is another well-known example. But you can actually tell Excel not to do that autocorrection. Since you most often get the data from biologists who may have handled it in Excel already, it is better to use another ID column rather than the gene name column if one is available (you often receive both anyway). These errors can even occur in databases that you download data from or that are used for annotation, so it is good to check.

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 14.3 years ago by Chris Evelo 10k

2

"Hey these two sequences are 50% homologs"... I know. Whereas those are 45% homologs only ;-)

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 12.6 years ago by Manu Prestat 4.1k

1

The MARCH genes have tripped me up in the past. Using Excel/Calc etc is fine as long as gene name column is set to 'text' during import.

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 14.2 years ago by Ian 6.1k

0

@Simon Thanks for the publication

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 14.3 years ago by Fred Fleche 4.3k

0

Uh oh. Looks like something was lost here in the migration.

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 13.2 years ago by Daniel Standage 4.1k

Ram · Answer 4 · 2011-04-04

48

14.3 years ago

Casbon ★ 3.3k

Reminds me of that old joke...

There are only two hard things in computer science: naming things, cache invalidation and off-by-one errors

ADD COMMENT • link updated 5.1 years ago by Ram 45k • written 14.3 years ago by Casbon ★ 3.3k

Ram · Answer 5 · 2011-04-04

38

14.3 years ago

Jeremy Leipzig 23k

I feel like a lot of "stupid mistakes" revolve around betrayed trust and false assumptions

For example:

Trusting that a downloaded file is actually fully downloaded
Trusting that an aligner will accept a list of query files instead of just taking the first and ignoring the rest (quiz: which ones am I talking about?)
Assuming that the quality scores in a FASTQ file are from a great Sanger-encoded run instead of a very poor Illumina-1.3 run
Assuming chr1 is followed by chr2 not chr10
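
The last item is easy to demonstrate. A plain string sort puts chr10 before chr2; a "natural" sort key that compares the numeric part as an integer gives the order most people expect. This is a minimal Python sketch with a made-up helper name, not the ordering any particular tool uses (many tools simply follow the order in the reference or BAM header).

```python
import re

def chrom_sort_key(name):
    """Split a name such as 'chr10' into text and integer parts for sorting."""
    parts = re.split(r"(\d+)", name)
    # Tag each part so integers are only ever compared with integers.
    return [(0, int(p)) if p.isdigit() else (1, p) for p in parts if p]

names = ["chr10", "chr2", "chrX", "chr1", "chrM"]
plain = sorted(names)                        # lexicographic: chr1, chr10, chr2, ...
natural = sorted(names, key=chrom_sort_key)  # numeric-aware: chr1, chr2, chr10, ...
```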

ADD COMMENT • link updated 5.1 years ago by Ram 45k • written 14.3 years ago by Jeremy Leipzig 23k

11

I like the last item. :)

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 14.1 years ago by Yuri ★ 1.7k

0

Re #2 - I think I couldn't get Trimmomatic to take all the adapters I gave it for removal from my fastq files. I then switched to bbduk (great tool!) to remove my adapters; it never skipped anything in my list.

ADD REPLY • link 2.6 years ago by jena ▴ 330

Ram · Answer 6 · 2011-04-06

35

14.3 years ago

Ketil 4.2k

If you forgive an attempt to be somewhat provocative, my two favorite mistakes are:

1. Letting academics build software

Academics need to publish papers, and one easy way to do that is to implement an algorithm, demonstrate that it works (more or less), and type it up in a manuscript. BT,DT. But robust and useful software requires a bit more than that, as evidenced by the sad state of affairs in typical bioinformatics software (I think I've managed to crash every de novo assembler I've tried, for instance. Not to mention countless hours spent trying - often in vain - to get software to compile and run). Unfortunately, you don't get a lot of academic credit for improved installation procedures, testing, software manuals, or especially, debugging of complicated errors. It is much better and more productive to move on to the next publishable implementation.

2. Letting academics build infrastructure

Same argument as above, really. Academics are eager to apply to construct research infrastructures, but of course they aren't all that interested in doing old and boring stuff. So although today's needs might be satisfied by a $300 FTP server, they will usually start conjecturing about tomorrow's needs instead, and embark on ambitious, blue sky stuff that might result in papers, but not in actually useful tools. And even if you get a useful database or web application up and running (and published), there is little incentive to update or improve it, and it is usually left to bitrot, while the authors go off in search of the next publication.

ADD COMMENT • link updated 5.1 years ago by Ram 45k • written 14.3 years ago by Ketil 4.2k

13

Yeah, I don't know why it is so hard for me to remember all the great bioinformatics software that has come from industry, like uhh Eland, or the great standards that have come from industry, like Phred-64 FASTQ.

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 14.3 years ago by Jeremy Leipzig 23k

6

To be clear, it's not a problem with academics themselves (after all, I'm one), just that the incentives are all wrong...

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 14.3 years ago by Ketil 4.2k

5

I am fine with point 2, but I have to disagree with 1. Your de novo assembler example is actually not a good one. De novo assembly is very complicated and highly data dependent. I doubt any assembler works for every data set, whether developed by academia or by professional programmers.

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 13.2 years ago by lh3 33k

2

Out of the (relatively few) tools I have experience with, bowtie/tophat/cufflinks and also fastqc are the exceptions in terms of documentation, UI, maintenance, non-brittleness.

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 13.2 years ago by bw. ▴ 260

1

I always wonder if they have ever checked the program/code that comes with the paper. In one paper, they hardcoded the input file in the code, which made me waste a whole afternoon figuring out what the hell was wrong with it.

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 14.2 years ago by Tg ▴ 320

0

@Jeremy: I'm not so sure industry is much better, and it's possible that academia is the democracy of development - worst, except for the others. Also, a lot of industry software consists of add-ons, designed to sell something else. FWIW, Newbler seems to be one of the better assemblers out there, and CLC is at least half-decent as an analysis platform for non-informaticians.

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 14.2 years ago by Ketil 4.2k

0

What about citations of your paper describing the software? The more people are able to use your software, the more cited it will be. That sounds like an incentive to me, and I try to see it that way when I write code (but I don't publish much code, it's mostly quite ad-hoc stuff).

ADD REPLY • link 2.6 years ago by jena ▴ 330

0

What are you talking about/responding to?

ADD REPLY • link 2.6 years ago by Ram 45k

0

To the answer by Ketil?

ADD REPLY • link 2.6 years ago by jena ▴ 330

0

What point of their answer specifically? Your post is tangential to their post at best and an irrelevant take at worst.

ADD REPLY • link 2.6 years ago by Ram 45k

0

Ketil repeatedly says in the post that academics don't have an incentive to improve their tools, as it's more beneficial to move on to other publications. I say the incentive is citations of your method. Not to mention I regularly see academic software where they care about the things he says academics don't care about. So why would somebody say there are no incentives for improvements, when I can clearly see them and point them out to you?

ADD REPLY • link 2.6 years ago by jena ▴ 330

0

You do have a point. However, citations are quite rare. One common problem with bioinformatics tools is that people use but seldom cite them properly. Imagine something like bioawk - it would be day to day usage for quite a few people, but it's not seen as something cite-worthy. Good tools have this habit of melding into the background that they don't stand out. Journals are changing this practice now though, with initiatives like STAR methods.

ADD REPLY • link 2.6 years ago by Ram 45k

0

The problem with citing bioawk is that there is no publication to cite. Other tools may not have such problem. But it is true that even I fail to cite some packages I use, usually those that are not required to replicate some results, but are rather just a convenience.

STAR methods looks like a good initiative, too bad I will never send anything to Cell Press, on principle. They care about money above everything else - they are the only publisher that asked me to pay for a COVID-19 paper, at the peak of the pandemic no less.

ADD REPLY • link 2.5 years ago by jena ▴ 330

Ram · Answer 7 · 2011-04-05

off-by-one errors
regex errors
parsing a complex alignment/file format incorrectly (e.g. BLAST or GenBank, probably the original rationale for developing BioPerl)
failing to account for strand
failing to revcomp sequences
failing to account for the last element in a file (because of an improper loop condition or no EOL character on last line)
failing to account for OS dependent line breaks
using the wrong assembly/annotation/release
using the wrong genome coordinate system
using the wrong file (multiple versions, version skew)
failing to account for nested/intercalated annotation features (e.g. genes)
assuming all jobs have completed on a cluster
deleting files
not randomizing your data properly
improper use of statistical tests
not documenting methods fully (to check and correct all of the above)
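
Several items on this list (strand, revcomp, coordinate system) tend to bite at the same moment: pulling a feature's sequence out of a genome. A minimal Python sketch follows, assuming 1-based inclusive coordinates as in GFF; the function names and the toy genome are invented for the example.

```python
# Complement table covering upper/lower case and N; other IUPAC codes are left as-is.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]

def feature_sequence(genome, start, end, strand):
    """Extract a feature given 1-based, end-inclusive coordinates and a strand."""
    plus = genome[start - 1:end]  # convert to Python's 0-based half-open slice
    if strand == "+":
        return plus
    if strand == "-":
        return reverse_complement(plus)
    raise ValueError(f"unexpected strand: {strand!r}")

toy_genome = "AAACCCGGGTTT"
```

Raising on an unknown strand value, rather than silently treating it as "+", is deliberate: it addresses the "merrily continue on bad data" problem at the same time.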

Ram · Answer 8 · 2011-06-09

28

14.1 years ago

Zhaorong ★ 1.4k

One mistake not unique to bioinformatics is: while editing one source file, compiling and running another.

ADD COMMENT • link updated 5.1 years ago by Ram 45k • written 14.1 years ago by Zhaorong ★ 1.4k

9

;) then hours debugging the wrong file

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 14.1 years ago by toni ★ 2.2k

3

ouch... brings back bad memories...

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 13.1 years ago by kajendiran56 ▴ 120

1

oops.. I did that several times.

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 10.8 years ago by bagdevi.mishra ▴ 110

Ram · Answer 9 · 2011-06-12

27

14.1 years ago

Shellfishgene ▴ 310

Using grep to find sequence (or other) IDs without using the -w switch: grep 'seq12' will also find seq121, seq122 and so on.

ADD COMMENT • link updated 5.1 years ago by Ram 45k • written 14.1 years ago by Shellfishgene ▴ 310

4

Yea, until you make the assumption that -w actually works only on whitespace.

printf "foo-choo" | grep -Fw -e "foo"

returns foo-choo. Hate that.
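
The same three behaviours can be reproduced in Python, which may help when the ID lookup moves from the shell into a script. This is only a sketch on made-up header lines: the regex `\b` word boundary behaves roughly like `grep -w` here, and comparing the first whitespace-delimited token is the stricter test most people actually want for sequence IDs.

```python
import re

lines = ["seq12", "seq121", "seq12-b", "seq12 description text"]

# Substring match (like plain grep): also hits seq121.
substring_hits = [l for l in lines if "seq12" in l]

# Word-boundary match (roughly like grep -w): drops seq121 but still hits seq12-b,
# because "-" is not a word character.
word_hits = [l for l in lines if re.search(r"\bseq12\b", l)]

# Exact match on the first whitespace-delimited token: only the intended ID.
exact_hits = [l for l in lines if l.split()[0] == "seq12"]
```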

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 12.2 years ago by sjneph ▴ 690

0

indeed helpful

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 9.8 years ago by Gjain 5.8k

2

This tip is helpful indeed!

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 14.0 years ago by Zhaorong ★ 1.4k

Ram · Answer 10 · 2011-04-04

26

14.3 years ago

Istvan Albert 102k

IMHO being off by one is the emperor of all bioinformatics mistakes - it rules them all - and probably causes tens of millions of dollars in wasted effort

ADD COMMENT • link updated 5.1 years ago by Ram 45k • written 14.3 years ago by Istvan Albert 102k

Ram · Answer 11 · 2011-04-04

26

14.3 years ago

Pierre Lindenbaum 166k

trying to solve any problem with BioPerl :-)

but the

'+1' error
and the grep > some.fasta

are my favorite mistakes.

ADD COMMENT • link updated 5.1 years ago by Ram 45k • written 14.3 years ago by Pierre Lindenbaum 166k

12

s/Bioperl/perl/ ;) - haters gonna hate!

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 14.3 years ago by Casbon ★ 3.3k

1

why do you hate BioPerl :(?

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 12.2 years ago by anin.gregory ▴ 110

7

My top reason is that BioPerl is inefficient due to its OOP layer.

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 12.2 years ago by lh3 33k

Ram · Answer 12 · 2012-04-22

19

13.2 years ago

Philipp Bayer 8.8k

Re-inventing the wheel. So often did I have to debug (or just replace) a bad implementation of a fasta-parser when BioPython/BioPerl have perfect implementations, I don't understand why no-one bothers to use them. 10 minutes in Google can save you 2 days of work and other people a week of work (you save 2 days of programming, they save a week of understanding your program to find the bug)

ADD COMMENT • link updated 5.1 years ago by Ram 45k • written 13.2 years ago by Philipp Bayer 8.8k

4

I agree too. Sometimes it's good for practicing purposes to keep on writing simple code, though.

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 10.0 years ago by Manu Prestat 4.1k

2

As the nim library docs say, "The great thing about re-inventing the wheel is that you can get a round one."

My main reason for reinventing the wheel is that I want to use a much more powerful and general language: Python instead of R. Of course, if the stuff I needed was already in Python/Pandas it would be a different thing entirely.

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 9.8 years ago by Endre Bakken Stovner ▴ 970

0

Exactly! I have yet to find a tool that correctly counts heterozygous sites in a VCF. I had to write my own tool instead.

ADD REPLY • link 2.6 years ago by jena ▴ 330

1

I fully agree, re-inventing the wheel is so tempting. We are way too eager to write a few lines of code each time. Plus, because you may have convinced yourself that you can resolve the code in 15 minutes, you don't bother about writing any documentation. In short, there is a very large tendency to re-invent the wheel... many, many times!

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 13.2 years ago by Javier Herrero ▴ 300

2

Using other people's code is all fun and games, until you realise that your package has 106 dependencies, and you only use 1 function from each. Each of those dependencies has its own dependencies, depends on a particular version of gcc (but not the same one as the others), doesn't play nice with some common setup used on other people's systems...

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 7.8 years ago by i.sudbery 21k

Ram · Answer 13 · 2011-04-11

18

14.3 years ago

Rayna ▴ 280

Trying to open microarray or, worse, NGS data files with Excel or Word...

ADD COMMENT • link updated 5.1 years ago by Ram 45k • written 14.3 years ago by Rayna ▴ 280

Ram · Answer 14 · 2011-06-13

17

14.1 years ago

Zhidkov ▴ 600

Forget to do 'dos2unix' and then spend a lot of time trying to figure out why there is no OUTPUT
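
For anyone wondering why Windows line endings produce empty output: the stray carriage return sticks to the last field of every line, so string comparisons and lookups on that field silently fail. A small Python sketch on made-up data, with hypothetical helper names:

```python
def has_crlf(data: bytes) -> bool:
    """True if the data contains Windows-style (CRLF) line endings."""
    return b"\r\n" in data

def to_unix_newlines(data: bytes) -> bytes:
    """Rough equivalent of dos2unix: replace CRLF with LF."""
    return data.replace(b"\r\n", b"\n")

windows_text = b"gene1\t10\r\ngene2\t20\r\n"

# Splitting on "\n" alone leaves "\r" glued to the last column.
fields = windows_text.split(b"\n")[0].split(b"\t")  # [b"gene1", b"10\r"]

clean = to_unix_newlines(windows_text)
```

Checking `has_crlf` on input files at the start of a script is a cheap guard against losing that hour again.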

ADD COMMENT • link updated 5.1 years ago by Ram 45k • written 14.1 years ago by Zhidkov ▴ 600

1

Classic. This one tricked me 3 times over the course of two years, spending one hour each time to figure out what the h... is going on.

ADD REPLY • link updated 5.6 years ago by Ram 45k • written 11.2 years ago by Christian ★ 3.1k

Ram · Answer 15 · 2011-06-08

16

14.1 years ago

Andres Pinzon ▴ 160

Well, I have a couple:

Running a batch BLAST job and forgetting to add the -o something.out option, then switching off the monitor and coming back the next day to see a bunch of characters in my terminal
Running tar -zxvf without checking the tar file first: I have decompressed thousands of files into my current directory, assuming they came in their own folder.
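
The second mistake can be guarded against by listing the archive before extracting it. Here is a minimal Python sketch using the standard `tarfile` module; the helper names are made up, and "safe" here only means "everything sits under one top-level entry".

```python
import tarfile

def top_level_entries(member_names):
    """Return the set of first path components among archive member names."""
    tops = set()
    for name in member_names:
        parts = [p for p in name.split("/") if p not in ("", ".")]
        if parts:
            tops.add(parts[0])
    return tops

def extracts_into_single_dir(member_names):
    """True if all members share one top-level entry."""
    return len(top_level_entries(member_names)) == 1

def check_archive(path):
    """Inspect a tar archive without extracting anything."""
    with tarfile.open(path) as archive:
        return extracts_into_single_dir(archive.getnames())
```

If `check_archive` returns False, extracting into a freshly created directory avoids the mess.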

ADD COMMENT • link updated 5.1 years ago by Ram 45k • written 14.1 years ago by Andres Pinzon ▴ 160

1

+1 for 2) mostly happens with downloaded software!

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 14.1 years ago by Naga ▴ 450

0

Forget the tar problem, just use atool

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 12.7 years ago by Ryan Thompson ★ 3.7k

Ram · Answer 16 · 2012-11-29

13

12.6 years ago

Manu Prestat 4.1k

I gave my Amazon EC2 password to someone in my group who wanted to run something quickly (estimated cost: $2). I received the bill 2 months later: $156. This person forgot to shut down the instance. This story is 8 months old and I'm still waiting for my reimbursement... Conclusion: don't trust colleagues!

ADD COMMENT • link updated 5.1 years ago by Ram 45k • written 12.6 years ago by Manu Prestat 4.1k

Ram · Answer 17 · 2011-04-04

I'll offer this one, which is a bit on the general side: Deletion of data that appear to serve no relevance from the computational side, but which have importance to the biology/biologist. Often, this arises from a lack of clear communication between the two individuals/teams as to what everything means, what it exactly means and why it is relevant to the process being developed.

Ram · Answer 18 · 2012-04-23

11

13.2 years ago

Zev.Kronenberg 12k

Not keeping an adequate notebook.

ADD COMMENT • link updated 5.1 years ago by Ram 45k • written 13.2 years ago by Zev.Kronenberg 12k

2

What is a "notebook"?

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 9.7 years ago by ebrown1955 ▴ 320

0

documentation

ADD REPLY • link 2.6 years ago by jena ▴ 330

Ram · Answer 19 · 2011-04-07

Having manual components in an analysis pipeline (editing data sets, running scripts manually)
Not dealing with error conditions at all. This is one thing that I really noticed when I started with bioinformatics: code that would just merrily continue when it hit incorrect data and output gibberish, or fail far away from the bad data. A debugging nightmare.
Not testing edge and corner cases for input data
Assuming that your input data is sane; I've run into all sorts of inconsistency issues with public data sets (e.g. protein domains at positions off the end of the protein). These are usually fixed promptly if you complain, but you've got to find them first.
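
As an illustration of failing close to the bad data, here is a small Python sketch of the protein-domain check mentioned above. The function name and the choice of 1-based inclusive coordinates are assumptions made for the example.

```python
def check_domain(protein_length, domain_start, domain_end):
    """Validate 1-based, inclusive domain coordinates against the protein length.

    Returns the domain length, or raises ValueError at the point of bad data.
    """
    if not 1 <= domain_start <= domain_end:
        raise ValueError(f"bad domain interval {domain_start}-{domain_end}")
    if domain_end > protein_length:
        raise ValueError(
            f"domain ends at {domain_end} but protein has only "
            f"{protein_length} residues"
        )
    return domain_end - domain_start + 1
```

A check like this costs a few lines and turns gibberish output three steps later into an error message that names the offending record.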

Ram · Answer 20 · 2011-04-04

8

14.3 years ago

Daniel Standage 4.1k

I often encounter problems related to the fact that computer scientists index their arrays starting with 0, while biologists index their sequences starting with 1. Simple concept that drives the noobs mad and even trips up more experienced scientists every once in a while.

ADD COMMENT • link updated 5.1 years ago by Ram 45k • written 14.3 years ago by Daniel Standage 4.1k

Ram · Answer 21 · 2011-04-04

8

14.3 years ago

Chris Evelo 10k

Doing pathway statistics or gene set enrichment statistics and then presenting the list of gene sets as a valuable result, instead of using those statistics just as a means to decide which pathways need to be evaluated.

(This is bad for many reasons: for instance, because the statistical contribution of a key regulatory gene in a pathway is equal to that of 1 out of 7 iso-enzymes that catalyze a non-relevant side reaction, because the significance of a pathway changes when you add a few non-relevant genes, and also because we have many overlapping pathways.)

Another typical mistake is to solve problems that nobody has.

ADD COMMENT • link updated 5.1 years ago by Ram 45k • written 14.3 years ago by Chris Evelo 10k

2

these sound like poor judgments (e.g. Clinton-Lewinsky), not stupid mistakes (e.g. dangling chad)

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 14.3 years ago by Jeremy Leipzig 23k

2

I would define a stupid mistake as falling prey to a trivial but catastrophic pitfall; an error in judgment is more due to a fundamental lack of understanding or willful ignorance

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 14.3 years ago by Jeremy Leipzig 23k

1

No, I think it is actually wrong to publish a list of pathways without further judgement. I think not doing the judgement is a mistake. But I have to admit that I don't really understand your examples. So maybe my English is not good enough to understand the finesses of the difference between poor judgement and stupid mistakes.

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 14.3 years ago by Chris Evelo 10k

0

In that case you are right, these would be judgement errors.

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 14.3 years ago by Chris Evelo 10k

Ram · Answer 22 · 2011-05-06

8

14.2 years ago

Vince Buffalo ▴ 470

One mistake: not checking whether the 0x4 bit in the bitflag column of a SAM (or BAM) file is set, which indicates the entry is unmapped. RNAME, CIGAR, and POS may be set to something non-null (an actual string!) but these are not meaningful if the 0x4 flag says the read is unmapped.
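
A minimal Python sketch of that check on raw SAM lines follows. The helper names and the two example records are made up; in practice a library such as pysam or `samtools view -F 4` would do the filtering.

```python
UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped

def is_unmapped(flag):
    """True if the 0x4 bit is set in the SAM FLAG value."""
    return bool(flag & UNMAPPED)

def mapped_position(sam_line):
    """Return (RNAME, POS) for a mapped record, or None if the 0x4 bit is set."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    if is_unmapped(flag):
        return None  # RNAME/POS may be filled in but are not an alignment
    return fields[2], int(fields[3])

# Hypothetical records: a mapped read, and an unmapped read (flag 69 = 1 + 4 + 64)
# that still carries an RNAME and POS.
mapped = "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII"
unmapped_with_coords = "r2\t69\tchr1\t100\t0\t*\t=\t100\t0\tACGT\tIIII"
```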

ADD COMMENT • link updated 5.1 years ago by Ram 45k • written 14.2 years ago by Vince Buffalo ▴ 470

0

I've stepped in that bear trap. Learning to trust the bit flags is an important lesson!

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 10.6 years ago by Zev.Kronenberg 12k

Ram · Answer 23 · 2012-04-20

8

13.2 years ago

Rm 8.3k

using rm -rf * .fasta in an unintended directory; especially if within the home directory...

ADD COMMENT • link updated 5.1 years ago by Ram 45k • written 13.2 years ago by Rm 8.3k

1

then do not use the recursive switch (-r) to delete files within the same directory :-P

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 13.2 years ago by toni ★ 2.2k

1

yes the space between * and .fa has bitten me as well. maybe there is an idiot guard against that somewhere?

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 13.2 years ago by Jeremy Leipzig 23k

2

So this is a (very) late reply, but in case it's still helpful or someone comes across this question like I did, rm -ir will ask before deleting files. Maybe a little annoying to type y a hundred times, but better to do that than lose all your data to a mistyped glob IMHO.

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 11.0 years ago by Duncan Murdock ▴ 20

9

From my .bashrc:

alias cp='cp -i'
alias rm='rm -I'

Has saved me quite a bit of headache. The capital I prompts only when you remove more than three files - good when you accidentally type rm some_directory/ * (notice the space)

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 11.0 years ago by Philipp Bayer 8.8k

0

Ooh, I never knew about -I, thanks!

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 11.0 years ago by Duncan Murdock ▴ 20

1

I was about to add this one myself. It's bitten me a couple of times.

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 13.2 years ago by Travis ★ 2.9k

1

:) I did the same stupid thing many times!! I lost weeks of work with one click!

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 13.0 years ago by Mchimich ▴ 320

1

I too have had that moment of dread when I realized I typed rm * /folder versus rm /folder/*! Check out some of the solutions on this forum page, specifically trash-cli. You can set up a trash folder so after deleting files they are not completely gone and can be restored if needed. You would have to manually empty the trash folder or set up a cron job to do so on a regular basis, but this may help circumvent the nightmares listed here!

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 10.0 years ago by alolex ▴ 960

0

I was just deleting some unnecessary files from a dir and managed to have a space and an asterisk at the end of the rm command. As soon as I realized what was happening I hit ctrl-c, but important files without backups were already gone. Oh well, it will only take like 2-3 weeks to reproduce them. Also time to edit .bashrc following Philipp's post..

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 10.1 years ago by 5heikki 11k

1

Once I did something very similar: I deleted all files and subdirectories in a directory that I thought I had in duplicate. Shortly after, I realized I was inside a symbolically linked directory and was deleting the original data....

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 10.1 years ago by Christian ★ 3.1k

1

Note that the really stupid mistake here is a bit hidden "important files without backups"

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 10.0 years ago by Chris Evelo 10k

Ram · Answer 24 · 2011-04-21

7

14.2 years ago

Ian 6.1k

Masking out sequence in a FASTA file (e.g. s/TAAT/NNNN/ig) where the sequence is formatted, i.e. split onto multiple lines.

This will miss TAAT that is split over the end of one line and the start of the next!
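
One way around this is to join the sequence lines before substituting and re-wrap afterwards. The Python sketch below handles a single-record FASTA string only and uses made-up names; for real files a proper FASTA parser is the safer choice.

```python
import re

def mask_motif(fasta_text, motif, width=60):
    """Mask a motif with Ns in a single-record FASTA string, ignoring line breaks."""
    lines = fasta_text.strip().splitlines()
    header, seq = lines[0], "".join(lines[1:])
    masked = re.sub(re.escape(motif), "N" * len(motif), seq, flags=re.IGNORECASE)
    wrapped = [masked[i:i + width] for i in range(0, len(masked), width)]
    return "\n".join([header] + wrapped) + "\n"

# Toy record where TAAT straddles the line break (ACGTA / ATCCC).
record = ">toy\nACGTA\nATCCC\n"

# Substituting on the wrapped text misses the motif entirely.
line_by_line = re.sub("TAAT", "NNNN", record, flags=re.IGNORECASE)

# Joining first finds it, then re-wraps to the original width.
joined_first = mask_motif(record, "TAAT", width=5)
```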

The classic mistake (also mentioned above by Casey) is not being aware that the genome assembly affects coordinates.

ADD COMMENT • link updated 5.1 years ago by Ram 45k • written 14.2 years ago by Ian 6.1k

Ram · Answer 25 · 2011-04-04

6

14.3 years ago

Blunders ★ 1.1k

What kills me the most is the hand editing of data sets.

If you're reading this and do it, please stop -- and start using automated builds -- with clear documentation.

ADD COMMENT • link updated 5.1 years ago by Ram 45k • written 14.3 years ago by Blunders ★ 1.1k

Ram · Answer 26 · 2011-04-05

6

14.3 years ago

Andrewjgrimm ▴ 460

Having separate files for each sequence.

Of a 454 run.

ADD COMMENT • link updated 5.1 years ago by Ram 45k • written 14.3 years ago by Andrewjgrimm ▴ 460

1

I was using Ubuntu Linux. It "handled" it, just slowly.

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 14.2 years ago by Andrewjgrimm ▴ 460

0

Entering edit mode

Or rather, using a file system that can't handle a few million files in a directory?

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 14.3 years ago by Ketil 4.2k

Ram · Answer 27 · 2012-06-11

I wouldn't say it's stupid, but I think a very common mistake is to not correct for batch effects in high-throughput data.

Batch effects can (best-case) hide the real effect that you're looking for, or (worst-case) make it look like your variable of interest is contributing to your findings when it's actually an artifact.

Leek + Irizarry et al. have a sobering review on this here.

Ram · Answer 28 · 2011-04-04

5

Entering edit mode

14.3 years ago

Thaman ★ 3.3k

Generating a multiple sequence alignment directly from a FASTA or other file and asking: why are there no gaps or deletions in the MSA viewer?

ADD COMMENT • link updated 5.1 years ago by Ram 45k • written 14.3 years ago by Thaman ★ 3.3k

Ram · Answer 29 · 2011-04-05

5

Entering edit mode

14.3 years ago

hadasa ★ 1.0k

Using excel to sort or manage your csv records.
Using multiple alignments of highly diverse sequences or, worse, recombining sequences, and inferring evolutionary history based on the resulting tree.

ADD COMMENT • link updated 5.1 years ago by Ram 45k • written 14.3 years ago by hadasa ★ 1.0k

Ram · Answer 30 · 2012-01-16

5

Entering edit mode

13.5 years ago

T S ▴ 50

developing algorithms and software around one single piece of low quality data with no prior knowledge while being ignorant about the entire problem.

ADD COMMENT • link updated 5.1 years ago by Ram 45k • written 13.5 years ago by T S ▴ 50

1

Entering edit mode

Sounds like a first year of a PhD project :D

ADD REPLY • link 2.6 years ago by jena ▴ 330

Ram · Answer 31 · 2012-01-16

5

Entering edit mode

13.5 years ago

Pascal ★ 1.5k

Easy one: you wait for 4 hours downloading a big DNA file (e.g. bam file) and you mistakenly delete it when trying to move it with a good old rm.

ADD COMMENT • link updated 5.1 years ago by Ram 45k • written 13.5 years ago by Pascal ★ 1.5k

Ram · Answer 32 · 2013-04-26

5

Entering edit mode

12.2 years ago

girlwithglasses ▴ 280

Creating a new set of (public) identifiers for your database for entities that already have identifiers in a widely-used public db.

Having unstable identifiers that are publicly available.

ADD COMMENT • link updated 5.1 years ago by Ram 45k • written 12.2 years ago by girlwithglasses ▴ 280

1

Entering edit mode

Our paper, out this week, is a decent round up of how to avoid identifier-specific issues with your data and that of others. http://dx.doi.org/10.1371/journal.pbio.2001414

ADD REPLY • link 8.0 years ago by mcmurry.julie ▴ 50

Ram · Answer 33 · 2011-06-16

4

Entering edit mode

14.1 years ago

Jeremy Leipzig 23k

tacking on another command line argument without looking through the rest of them

novoalign -a ATCTCGTATGCCGTCTTCTGCTTG -d genome.ndx -F ILMFQ -f query.fq -a -m -l 17 -h 60 -t 65 -o sam -o FullNW

the first adapter argument (-a ATCTCGTATGCCGTCTTCTGCTTG) is negated by the empty second one
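A rough sketch of a sanity check for this, in Python: tokenize the command and report any flag that appears more than once. The helper name `repeated_flags` is made up, and the check is deliberately naive (it treats anything starting with `-` as a flag, so negative numbers would confuse it). A repeat is not necessarily an error, since some tools accept the same flag several times, but it is worth a second look.

```python
import shlex
from collections import Counter

def repeated_flags(command):
    """Return {flag: count} for every flag given more than once."""
    tokens = shlex.split(command)[1:]  # drop the program name
    counts = Counter(t for t in tokens if t.startswith("-"))
    return {flag: n for flag, n in counts.items() if n > 1}

cmd = ("novoalign -a ATCTCGTATGCCGTCTTCTGCTTG -d genome.ndx -F ILMFQ "
       "-f query.fq -a -m -l 17 -h 60 -t 65 -o sam -o FullNW")
print(repeated_flags(cmd))  # {'-a': 2, '-o': 2}
```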

ADD COMMENT • link updated 5.1 years ago by Ram 45k • written 14.1 years ago by Jeremy Leipzig 23k

Ram · Answer 34 · 2012-04-20

Running the bwa/GATK pipeline with a corrupt/incompletely generated bwa index of hg19. Everything still aligned, but one of 2 mates would have its strand set incorrectly. Other than the insert size distribution, everything seemed normal, until the TableRecalibration step downshifted all quality scores significantly and then UnifiedGenotyper called 0 SNPs. 1st time I've seen a problem with step 1 of a pipeline not become obvious until step 5+.

Ram · Answer 35 · 2012-05-04

4

Entering edit mode

13.2 years ago

Niek De Klein ★ 2.6k

Scripting an hour to do something you could have done in half an hour manually, and then never needing to repeat it again.

ADD COMMENT • link updated 5.1 years ago by Ram 45k • written 13.2 years ago by Niek De Klein ★ 2.6k

9

Entering edit mode

I tend to see the opposite: not spending the time up-front to do it right and having to continue to do it manually/semi-manually ad nauseam

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 13.2 years ago by brentp 24k

2

Entering edit mode

but that one is in here already

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 13.1 years ago by Niek De Klein ★ 2.6k

1

Entering edit mode

well, at the very least you may learn something that saves you an hour on a different project in the future

ADD REPLY • link 8.3 years ago by cmdcolin ★ 4.2k

Ram · Answer 36 · 2012-07-31

4

Entering edit mode

13.0 years ago

Leonor Palmeira 3.9k

I've just made one, which cost me a good headache trying to figure out the biology underlying my strange results!

expecting SAM's POS field to be the leftmost position of my mapped read on the '+' strand, and the rightmost position on the '-' strand

Note to self : "Read the manual..."

ADD COMMENT • link updated 5.1 years ago by Ram 45k • written 13.0 years ago by Leonor Palmeira 3.9k

2

Entering edit mode

not to mention how much work it is to actually get the rightmost coordinate.

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 13.0 years ago by Istvan Albert 102k

1

Entering edit mode

Hah! I just reverse complemented the reference genome, and redid the alignment. Admittedly, this was for 454 data.

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 12.7 years ago by Ketil 4.2k

0

Entering edit mode

Hehe! That's the best idea ever! This thread keeps on giving.

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 12.7 years ago by Istvan Albert 102k

0

Entering edit mode

But you should be careful. Doing that will misplace the indel position in a microsatellite.

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 12.7 years ago by lh3 33k

0

Entering edit mode

If I understand you correctly, you are saying that this will inflate the number of variants, since many have ambiguous positions? Interesting - do aligners generally guarantee that such ambiguous variants are consistently placed for forward and reverse reads?

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 12.7 years ago by Ketil 4.2k

2

Entering edit mode

BWA always places the indel at the beginning of a microsatellite. If you align the read to the rev-complemented ref, the indel will be at the end. Many indel callers assume the bwa behavior, though there are also tools to left-align indels.

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 12.7 years ago by lh3 33k

0

Entering edit mode

Isn't this just POS+length(SEQ)? I'm having doubts now...

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 13.0 years ago by Leonor Palmeira 3.9k

2

Entering edit mode

that only applies if the sequence contains only matches or mismatches, this means edit strings that are composed of a number followed by M (like 76M) . For all other alignments you will need to parse the CIGAR string and build the end coordinate from the start + numbers in the edit string.
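To make the CIGAR arithmetic concrete, here is a small sketch in Python. It assumes the usual SAM conventions: POS is 1-based and leftmost, and only the operations M, D, N, = and X consume reference bases. The function name `reference_end` is my own.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance along the reference

def reference_end(pos, cigar):
    """Return the 1-based, inclusive rightmost reference coordinate."""
    if cigar == "*":
        raise ValueError("CIGAR is unavailable for this record")
    ref_len = sum(int(n) for n, op in CIGAR_OP.findall(cigar)
                  if op in REF_CONSUMING)
    return pos + ref_len - 1

print(reference_end(100, "76M"))         # 175
print(reference_end(100, "5S30M2D41M"))  # 172
print(reference_end(100, "30M3I43M"))    # 172
```

Note that even for a plain `76M` read, POS + length(SEQ) lands one base past the last aligned position, and soft-clipped or inserted bases make length(SEQ) overcount further.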

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 13.0 years ago by Istvan Albert 102k

0

Entering edit mode

Phew... I'm glad I only have matches and mismatches, so I fall in the easy category :-) Thanks a lot for adding this information, this can be a big trap!

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 13.0 years ago by Leonor Palmeira 3.9k

0

Entering edit mode

I have a perl parser that will change the read length based on the CIGAR if you ever want / need it.

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 13.0 years ago by Zev.Kronenberg 12k

0

Entering edit mode

Thanks a lot! I'll keep that in mind! Or maybe you can share it here as a Tool? or as an answer to this thread : Mapping Reads With Bwa And Bowtie ?

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 13.0 years ago by Leonor Palmeira 3.9k

Ram · Answer 37 · 2012-08-01

4

Entering edit mode

13.0 years ago

JC 13k

I made one a few months ago. I launched a heavy process in a pay-per-use cluster, it was running for one week. I thought, 6 pennies/hr cannot be too much money. I received a bill for $832 usd. I'm not using this cluster again unless I estimate the total cost of the process.

edit: the price is per core

ADD COMMENT • link updated 5.1 years ago by Ram 45k • written 13.0 years ago by JC 13k

0

Entering edit mode

By my count, 6 pennies per hour is $1.44 a day or about $10 a week. How did you get $832?

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 12.7 years ago by Ryan Thompson ★ 3.7k

0

Entering edit mode

maybe its price per CPU and depending upon the size of cluster, BAM!!!

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 12.7 years ago by Sukhi Singh 11k

0

Entering edit mode

I used ALL the cores ...

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 12.7 years ago by JC 13k

0

Entering edit mode

Ah well, 6 pennies per CPU-hour is a little different, isn't it? :)
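The arithmetic is worth doing before hitting submit. A back-of-the-envelope sketch in Python, assuming a flat rate per core-hour and exactly one week of wall time (the thread does not state the core count, so the last line only shows what the bill would imply under those assumptions):

```python
def estimated_cost(rate_per_core_hour, cores, hours):
    """Flat-rate estimate: rate x cores x hours."""
    return rate_per_core_hour * cores * hours

week_hours = 7 * 24
one_core = estimated_cost(0.06, 1, week_hours)
print(f"one core for a week: ${one_core:.2f}")            # $10.08
print(f"cores implied by an $832 bill: {832 / one_core:.1f}")  # 82.5
```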

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 12.7 years ago by Ryan Thompson ★ 3.7k

0

Entering edit mode

yes, but it was not clear in the service description from our local provider

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 12.7 years ago by JC 13k

Ram · Answer 38 · 2015-07-27

Possibly: implementing methods that magically generate p-values from non-replicated RNA-seq experiments, possibly as a result of pressure from 'experimentalists'. I really would like to know the history behind their implementation (were they forced by reviewers, or by other groups?). Now we have to explain why those p-values are bogus, and why so few significantly differentially expressed genes are detected in a non-replicated analysis.

Ram · Answer 39 · 2015-08-18

4

Entering edit mode

9.9 years ago

tiago211287 ★ 1.5k

When using scp, I did this:

scp -r somename@somehost:$home ~/

instead of $home, should use ~/

this created a file named $home in my other account.

used rm $home

and kabum!

I should have paid attention and looked up how to remove this kind of file first.

ADD COMMENT • link updated 5.7 years ago by Ram 45k • written 9.9 years ago by tiago211287 ★ 1.5k

score 4 · Answer 40 · 2016-04-07

4

Entering edit mode

9.3 years ago

Nicola Casiraghi ▴ 500

I spent hours implementing a parallel R script to fully exploit the 64 cores and 250 GB RAM of the lab server. Then, really proud of myself, I ran it on my 4-core, 8 GB RAM desktop PC. Boom.

ADD COMMENT • link 9.3 years ago by Nicola Casiraghi ▴ 500

0

Entering edit mode

I remember once I put a comma in the wrong bracket in some matrix assignment on genotype data and after running the code my laptop froze so hard I had to shut it down by holding the power button. I remember the laptop needed two reboots to get back to normal. After going through the code to see what happened, I realized I assigned a matrix to each cell of itself (or something like that), basically creating an object somewhere around 250 GB in size. My laptop had 8 GB of RAM and about the same of swap (older Ubuntu) and it ran out of both within an eye-blink :D

ADD REPLY • link 2.6 years ago by jena ▴ 330

Ram · Answer 41 · 2011-04-07

3

Entering edit mode

14.3 years ago

Eric Normandeau 11k

Losing an hour to learn that some files saved on a Mac have a strange 'beginning of file' like character...

Normalize file standards already! x_o

ADD COMMENT • link updated 5.1 years ago by Ram 45k • written 14.3 years ago by Eric Normandeau 11k

Ram · Answer 42 · 2012-04-20

3

Entering edit mode

13.2 years ago

Sukhi Singh 11k

My common one cat aVeryBigFile.whatever &

ADD COMMENT • link updated 5.1 years ago by Ram 45k • written 13.2 years ago by Sukhi Singh 11k

6

Entering edit mode

can't you just fg then ctrl-c?

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 13.2 years ago by brentp 24k

2

Entering edit mode

@brentp, @Daniel cool guys, now this mistake can be rectified :)

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 13.2 years ago by Sukhi Singh 11k

2

Entering edit mode

if you don't have other cat job running, you can also type killall cat. Even if you don't see your input, it should work.

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 12.2 years ago by Manu Prestat 4.1k

1

Entering edit mode

just kill -9 $PID from another session, no?

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 13.2 years ago by Simon Cockell 7.4k

1

Entering edit mode

Good thought but, on a server (I should have told earlier, I commit this on server sessions), your process id's in one login are not known in other login/session.

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 13.2 years ago by Sukhi Singh 11k

7

Entering edit mode

I'm pretty sure that ps aux | grep cat from another session will give you the PID of any running cat process on the server.

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 13.2 years ago by Daniel Standage 4.1k

Ram · Answer 43 · 2015-09-12

I had another good one recently. I was executing an untested bash script for generation of output from two input files on our cluster. I just let it run over night. When I checked in the morning, it had generated about 40 TB worth of output (expectation was about 20 MB). There was a tiny spelling mistake that led to an infinite loop. Oops. I was lucky to check it when I did because there was still a few TB space left so at least other jobs didn't get killed because of it..

Ram · Answer 44 · 2015-09-12

3

Entering edit mode

9.8 years ago

Antonio R. Franco ★ 5.2k

I am not joking,

how about forgetting to have a full loaded battery for your wireless keyboard and/or mouse near you, and the current one inside is running out of power?

ADD COMMENT • link updated 5.1 years ago by Ram 45k • written 9.8 years ago by Antonio R. Franco ★ 5.2k

1

Entering edit mode

It depends, how long did it take to notice you were typing and nothing happened?

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 9.7 years ago by h.mon 35k

2

Entering edit mode

sometimes a long time.. :-))

ADD REPLY • link updated 5.6 years ago by Ram 45k • written 9.7 years ago by Antonio R. Franco ★ 5.2k

Ram · Answer 45 · 2015-10-13

I have spent hours, on repeated occasions, looking for a mysterious error in a perl script that in the end was simply a = instead of a == within an IF statement.

Another recurrent mistake: not documenting what I did and what those scripts do in the belief that everything is so intuitive, organised, simple and natural that it won't be necessary. Then, sometime after, I have to spend hours trying to guess what all that mess was.

score 3 · Answer 46 · 2017-09-21

3

Entering edit mode

7.8 years ago

ATpoint 88k

rm foo *

instead of

rm foo*

and there goes all the content.
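One habit that helps is previewing what a pattern expands to before deleting anything. A small sketch in Python (the helper `preview` and the file names are invented for the demo); the stray space turns one pattern `foo*` into two arguments, `foo` and `*`, and the second one matches everything:

```python
import glob
import os
import tempfile

def preview(pattern, directory="."):
    """List the names a glob pattern would match, without deleting anything."""
    hits = glob.glob(os.path.join(directory, pattern))
    return sorted(os.path.basename(p) for p in hits)

with tempfile.TemporaryDirectory() as tmp:
    for name in ("foo1.txt", "foo2.txt", "precious.bam"):
        open(os.path.join(tmp, name), "w").close()
    print(preview("foo*", tmp))  # ['foo1.txt', 'foo2.txt']
    print(preview("*", tmp))     # ['foo1.txt', 'foo2.txt', 'precious.bam']
```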

ADD COMMENT • link 7.8 years ago by ATpoint 88k

0

Entering edit mode

-i

is your friend here

ADD REPLY • link 7.8 years ago by Gjain 5.8k

Ram · Answer 47 · 2021-05-06

3

Entering edit mode

4.2 years ago

4galaxy77 2.9k

I'm sure someone has mentioned this, but running something like:

bcftools view -S something.txt in.bcf > in.bcf

Goodbye in.bcf, I hardly knew ye.

Also memory management for Java programs on an HPC (beagle I'm looking at you). I love spending hours trying to guess exactly how much memory the JVM is going to use and toggling -Xmx/-Xms flags.

ADD COMMENT • link updated 4.2 years ago by Ram 45k • written 4.2 years ago by 4galaxy77 2.9k

1

Entering edit mode

I feel your pain. A handy way to avoid this is to set -o noclobber, which prevents you from redirecting into existing files. See: https://mywiki.wooledge.org/NoClobber
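The same refuse-to-overwrite idea exists outside the shell. As a sketch, Python's exclusive-creation mode (`"x"`) raises an error instead of truncating an existing file; the helper name `write_new` and the dummy contents are just for the demo:

```python
import os
import tempfile

def write_new(path, data):
    """Write bytes to `path`, refusing to overwrite an existing file."""
    with open(path, "xb") as handle:  # "x" = exclusive creation
        handle.write(data)

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "in.bcf")
    write_new(target, b"original")
    try:
        write_new(target, b"filtered")
    except FileExistsError:
        print("refused to overwrite", os.path.basename(target))
    with open(target, "rb") as handle:
        print(handle.read())  # b'original'
```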

ADD REPLY • link 4.2 years ago by Ram 45k

0

Entering edit mode

thanks for the tip! definitely will try that out

ADD REPLY • link 4.2 years ago by 4galaxy77 2.9k

Ram · Answer 48 · 2011-04-04

2

Entering edit mode

14.3 years ago

Andra Waagmeester 3.2k

What about building an interface not aimed at a community of (fellow) Biologists?

ADD COMMENT • link updated 5.1 years ago by Ram 45k • written 14.3 years ago by Andra Waagmeester 3.2k

2

Entering edit mode

that might be stupid and it might be a mistake but it isn't a "stupid mistake". see comments under Chris Evelo's post.

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 14.3 years ago by Jeremy Leipzig 23k

1

Entering edit mode

Then I will conclude with the same comment as Chris.

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 14.3 years ago by Andra Waagmeester 3.2k

Ram · Answer 49 · 2011-04-07

2

Entering edit mode

14.3 years ago

Russh ★ 1.2k

s/foo/bar/g

without the g at the end

as I just proved in the field

ADD COMMENT • link updated 5.1 years ago by Ram 45k • written 14.3 years ago by Russh ★ 1.2k

1

Entering edit mode

I have an opposite mistake. I accidentally replace a string in the whole document and something I didn't want to replace gets replaced. Now I visually select the area where I want to replace...

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 14.1 years ago by Sequencegeek ▴ 740

Ram · Answer 50 · 2011-06-13

2

Entering edit mode

14.1 years ago

Gww ★ 2.7k

Making claims without experimental validation. Especially involving studies utilizing multiplexed technologies such as microarrays and high-throughput sequencing.

ADD COMMENT • link updated 5.1 years ago by Ram 45k • written 14.1 years ago by Gww ★ 2.7k

Ram · Answer 51 · 2012-06-11

2

Entering edit mode

13.1 years ago

Rm 8.3k

Just read this article: "How Not To Be A Bioinformatician" Thought it would be interesting to post here....

ADD COMMENT • link updated 5.1 years ago by Ram 45k • written 13.1 years ago by Rm 8.3k

2

Entering edit mode

Then this might fit as well: "How to be a bioinformatician"

ADD REPLY • link updated 5.1 years ago by Ram 45k • written 11.2 years ago by Christian ★ 3.1k

Ram · Answer 52 · 2012-06-21

Some really great comments here, nice to know that such things happen to all genii ;). I have to say my most painful moments relate to my assumption that data obtained elsewhere is correct in every way. I also remember early in my career, using PDB files and realising that sometimes, chains are represented more than once, thus when manually checking calculations involving atomic coordinates, being utterly perplexed and wanting to break my computer. Oh the joys of Bioinformatics.

Ram · Answer 53 · 2012-10-25

2

Entering edit mode

12.7 years ago

Ryan Thompson ★ 3.7k

Using a statistical test on data that does not satisfy the assumptions of that test.

For example, finding differentially-expressed genes by doing ANOVA on log2(FPKM).

ADD COMMENT • link updated 5.1 years ago by Ram 45k • written 12.7 years ago by Ryan Thompson ★ 3.7k

Ram · Answer 54 · 2012-11-04

2

Entering edit mode

12.7 years ago

Ryan Thompson ★ 3.7k

Assuming that the gene IDs in "knownGenes.gtf" from UCSC are actually gene IDs. Instead they just put the transcript ID as the gene ID.

This just caused me a bit of pain when doing read counting at the gene level. Basically, any constitutive exon in a gene with multiple splice forms was ignored because all the reads in that exon were treated as ambiguous.
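A quick check along these lines would have saved me the pain. This is only a sketch: it assumes standard 9-column GTF with `key "value";` attributes, and the IDs in the demo lines are made up rather than taken from a real UCSC file.

```python
import re

ATTR = re.compile(r'(\w+) "([^"]*)"')

def gene_ids_are_transcript_ids(gtf_lines):
    """True if every record has gene_id identical to transcript_id."""
    total = same = 0
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        attrs = dict(ATTR.findall(fields[8]))
        if "gene_id" in attrs and "transcript_id" in attrs:
            total += 1
            same += attrs["gene_id"] == attrs["transcript_id"]
    return total > 0 and same == total

# Hypothetical records: two splice forms whose gene_id is just the transcript ID
suspicious = [
    'chr1\tdemo\texon\t100\t200\t.\t+\t.\tgene_id "tx0001"; transcript_id "tx0001";',
    'chr1\tdemo\texon\t100\t200\t.\t+\t.\tgene_id "tx0002"; transcript_id "tx0002";',
]
proper = [
    'chr1\tdemo\texon\t100\t200\t.\t+\t.\tgene_id "geneA"; transcript_id "tx0001";',
    'chr1\tdemo\texon\t100\t200\t.\t+\t.\tgene_id "geneA"; transcript_id "tx0002";',
]
print(gene_ids_are_transcript_ids(suspicious))  # True
print(gene_ids_are_transcript_ids(proper))      # False
```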

ADD COMMENT • link updated 5.1 years ago by Ram 45k • written 12.7 years ago by Ryan Thompson ★ 3.7k

Ram · Answer 55 · 2012-11-16

2

Entering edit mode

12.7 years ago

axelwilhelm ▴ 120

Using the same output (> oh.shit) for multiple commands.

ADD COMMENT • link updated 5.1 years ago by Ram 45k • written 12.7 years ago by axelwilhelm ▴ 120

Ram · Answer 56 · 2013-04-30

2

Entering edit mode

12.2 years ago

Leszek 4.2k

I found myself guilty of iterating through a loop and storing data, let's say, every 100 iterations... but not storing the very last bit of the data (i.e. lines 10001 to 10026) at the very end.
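For reference, the pattern with the missing step put back, as a minimal Python sketch (`write_in_batches` and `sink` are illustrative names; `sink` stands in for whatever stores a batch):

```python
def write_in_batches(records, batch_size, sink):
    """Pass records to `sink` in batches, including the final partial batch."""
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            sink(batch)
            batch = []
    if batch:  # the step that is easy to forget: flush the leftovers
        sink(batch)

stored = []
write_in_batches(range(1, 10027), 100, stored.extend)
print(len(stored))  # 10026
```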

ADD COMMENT • link updated 5.1 years ago by Ram 45k • written 12.2 years ago by Leszek 4.2k

Ram · Answer 57 · 2014-05-16

These are just a few, but thought worth sharing:

Not having a copy of VERY IMPORTANT file as backup.
Forgetting to save parameters used, while running a tool and again start hunting them after few days.
Irrelevant folder/file names, especially not having extensions like fasta or fastq at the end of file.
Not having a consolidated folder/place, where all scripts and programs lie.
Upgrading OS without IT support or prior knowledge, and completely losing the GUI, which needs reinstall of OS again.
using UNIQ command in linux, without SORTING the data.
Not removing trailing and leading space of variable, before matching it.
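On the uniq point: it only collapses adjacent duplicates, which is easy to demonstrate. Python's `itertools.groupby` behaves the same way, so this sketch uses it as a stand-in (the helper name `uniq` and the sample values are mine):

```python
from itertools import groupby

def uniq(items):
    """Collapse runs of adjacent equal items, like uniq does."""
    return [key for key, _ in groupby(items)]

reads = ["chr2", "chr1", "chr2", "chr2", "chr1"]
print(uniq(reads))          # ['chr2', 'chr1', 'chr2', 'chr1']
print(uniq(sorted(reads)))  # ['chr1', 'chr2']
```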

Ram · Answer 58 · 2015-09-14

2

Entering edit mode

9.8 years ago

Lesley Sitter ▴ 610

How about writing a tool and being convinced it works perfectly, so you start testing it on a complete dataset instead of testing it first on a subset and finding out after it ran for an hour or so that you made a tiny mistake somewhere. Sooo much time wasted that I'll never get back :P

ADD COMMENT • link updated 5.1 years ago by Ram 45k • written 9.8 years ago by Lesley Sitter ▴ 610

Ram · Answer 59 · 2015-10-14

2

Entering edit mode

9.8 years ago

elia.brodsky ▴ 340

Spending a few months to find an interesting correlation in data and then presenting to the lab that sequenced to find out they changed coverage on the last few samples!

ADD COMMENT • link updated 5.1 years ago by Ram 45k • written 9.8 years ago by elia.brodsky ▴ 340

score 2 · Answer 60 · 2016-04-04

2

Entering edit mode

9.3 years ago

5heikki 11k

This one was really good, embarking on sudo yum update when there was lots of stuff to update and swap space was very low. Ended up with a situation much like this. Took me good four hours before I saw my desktop again.

ADD COMMENT • link 9.3 years ago by 5heikki 11k

score 2 · Answer 61 · 2016-04-07

2

Entering edit mode

9.3 years ago

5heikki 11k

Another fun one is when you develop a pipeline with a small test set thinking speed over all and then you increase your test set size and realize that you're creating TBs of temp data and utilizing hundreds of GBs of RAM :)

ADD COMMENT • link 9.3 years ago by 5heikki 11k

score 2 · Answer 62 · 2016-04-08

2

Entering edit mode

9.3 years ago

T ▴ 40

Chromosome Identifiers - UCSC "chr1" - Ensembl "1"

Produces very confusing outputs ...
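A tiny sketch of converting between the two styles in Python. It only covers the primary chromosomes plus the mitochondrion (chrM in UCSC, MT in Ensembl); unplaced scaffolds and alternate contigs are named differently and would need a proper lookup table, so treat this as an assumption-laden shortcut.

```python
def to_ensembl(name):
    """UCSC-style name -> Ensembl-style name (primary chromosomes only)."""
    if name == "chrM":
        return "MT"
    return name[3:] if name.startswith("chr") else name

def to_ucsc(name):
    """Ensembl-style name -> UCSC-style name (primary chromosomes only)."""
    if name == "MT":
        return "chrM"
    return name if name.startswith("chr") else "chr" + name

print([to_ensembl(c) for c in ["chr1", "chrX", "chrM"]])  # ['1', 'X', 'MT']
print([to_ucsc(c) for c in ["1", "X", "MT"]])             # ['chr1', 'chrX', 'chrM']
```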

ADD COMMENT • link 9.3 years ago by T ▴ 40

score 2 · Answer 63 · 2016-10-26

2

Entering edit mode

8.7 years ago

ssv.bio ▴ 200

Overestimating future me when I was younger, or present me cussing younger me :). Commenting is important, as old me now understands.

ADD COMMENT • link 8.7 years ago by ssv.bio ▴ 200

score 2 · Answer 64 · 2017-09-21

Since no one mentioned.

The uppercase/lowercase error.

Sometimes, you need to distinguish between "atgc" and "ATGC". But in most cases, they mean the same thing. So always convert any string you met to uppercase if you do not need to distinguish.

I feel like I spent my whole Ph.D. debugging this. Hope it helps others.
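A two-line illustration in Python of how the case mismatch silently drops hits (the helper `count_motif` is a made-up name). One caveat: lowercase is often used to mark soft-masked repeats, so only normalize when that distinction does not matter for the analysis.

```python
def count_motif(seq, motif):
    """Count non-overlapping motif occurrences, ignoring case."""
    return seq.upper().count(motif.upper())

seq = "atgcATGC"
print(seq.count("ATGC"))         # 1
print(count_motif(seq, "ATGC"))  # 2
```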

score 2 · Answer 65 · 2017-09-21

Running statistical analyses with no understanding of the tests you're using, why you're using them, when it's appropriate to use them, how your data needs to be formatted, normalized, or scaled to use them... but hey, you made a volcano plot... so it must be publication quality. right? read the damn paper...

score 2 · Answer 66 · 2017-09-21

2

Entering edit mode

7.8 years ago

mforde84 ★ 1.4k

drawing biological insights into a dataset solely from GO enrichment analysis

ADD COMMENT • link 7.8 years ago by mforde84 ★ 1.4k

score 2 · Answer 67 · 2017-09-21

2

Entering edit mode

7.8 years ago

mforde84 ★ 1.4k

using sudo, dd, parted or any combination thereof ... like a moron.

ADD COMMENT • link 7.8 years ago by mforde84 ★ 1.4k

Ram · Answer 68 · 2013-05-08

1

Entering edit mode

12.2 years ago

Manu Prestat 4.1k

A double mistake combo, 1 - use tar to compress a single file and, 2 - inverting the command arguments

tar cvfz file file.tgz

instead of

tar cvfz file.tgz file

Bye bye file!

It happened to me so many times, that I was considering doing an imagery brain check up.

ADD COMMENT • link updated 5.1 years ago by Ram 45k • written 12.2 years ago by Manu Prestat 4.1k

Ram · Answer 69 · 2014-05-16

1

Entering edit mode

11.2 years ago

Christian ★ 3.1k

Forgetting that human chromosomes are named differently by UCSC and Ensembl (UCSC names have the 'chr' prefix)
Assuming the alphabetical order of human chromosome names is "chr1", "chr2", "chr3", ... when in fact it is "chr1", "chr10", "chr11", ...
Forgetting that there is a fifth possible character in a DNA sequence: N (for unknown)
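For the ordering point, a sketch of a natural sort key in Python. The name `chrom_key` is mine, and putting non-numeric names (X, Y, M) after the numbered ones in alphabetical order is just one convention, not a standard.

```python
def chrom_key(name):
    """Sort key: numbered chromosomes numerically, everything else after."""
    core = name[3:] if name.startswith("chr") else name
    if core.isdigit():
        return (0, int(core), "")
    return (1, 0, core)

names = ["chr10", "chr2", "chrX", "chr1", "chr11", "chrM"]
print(sorted(names))
# ['chr1', 'chr10', 'chr11', 'chr2', 'chrM', 'chrX']
print(sorted(names, key=chrom_key))
# ['chr1', 'chr2', 'chr10', 'chr11', 'chrM', 'chrX']
```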

ADD COMMENT • link updated 5.1 years ago by Ram 45k • written 11.2 years ago by Christian ★ 3.1k

0

Entering edit mode

You also occasionally see IUPAC codes pop up in fasta sequences.
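A cheap way to find out what is actually in a sequence before assuming a four-letter alphabet, sketched in Python (the helper name `non_acgt` and the toy sequence are for illustration only):

```python
from collections import Counter

def non_acgt(seq):
    """Count every symbol that is not A, C, G or T (case-insensitive)."""
    return dict(Counter(base for base in seq.upper() if base not in "ACGT"))

print(non_acgt("ACGTNNRYacgtn"))  # {'N': 3, 'R': 1, 'Y': 1}
```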

ADD REPLY • link updated 5.5 years ago by Ram 45k • written 11.2 years ago by Chris Miller 22k

Ram · Answer 70 · 2014-11-23

1

Entering edit mode

10.6 years ago

Bioinformatics_NewComer ▴ 330

Forgot to use -n in numerical sorting of positions in a bed file.
Not using -k flag in sort for a specific column, instead sorting all possible stuff in it.
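Both mistakes come down to comparing whole lines as text. A sketch of the difference in Python, assuming tab-separated BED lines with at least three columns (`sort_bed` is a hypothetical helper, and it still orders chromosome names as plain text):

```python
def sort_bed(lines):
    """Sort BED lines by chromosome, then numerically by start and end."""
    def key(line):
        chrom, start, end = line.split("\t")[:3]
        return (chrom, int(start), int(end))
    return sorted(lines, key=key)

bed = ["chr1\t1000\t1100", "chr1\t200\t300", "chr1\t30\t90"]
print(sorted(bed))    # text order: 1000, 200, 30
print(sort_bed(bed))  # numeric order: 30, 200, 1000
```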

ADD COMMENT • link updated 5.1 years ago by Ram 45k • written 10.6 years ago by Bioinformatics_NewComer ▴ 330

Ram · Answer 71 · 2015-09-14

1

Entering edit mode

9.8 years ago

always_learning ★ 1.2k

Mixing up hg18, hg19 and, yes, the new one, GRCh38, too !! :)

ADD COMMENT • link updated 5.1 years ago by Ram 45k • written 9.8 years ago by always_learning ★ 1.2k

score 1 · Answer 72 · 2016-10-26

1

Entering edit mode

8.7 years ago

confusedious ▴ 490

A very general one:

Using packages without trying to understand what they actually do.

A more specific one:

Assume that the human mitochondrial reference sequence is the ancestral sequence.

ADD COMMENT • link 8.7 years ago by confusedious ▴ 490

score 1 · Answer 73 · 2017-03-22

1

Entering edit mode

8.3 years ago

Venkatesh Chellappa ▴ 10

Opening a .clc file in terminal only to find out it was a binary! #facepalm

ADD COMMENT • link 8.3 years ago by Venkatesh Chellappa ▴ 10

score 1 · Answer 74 · 2022-06-11

I have a bit of a different take:

The most pervasive and damaging problem afflicting bioinformatics studies is the failure to curate covariate information, resulting in problems of separation and/or complete separation.

This problem is particularly sinister because this frequently occurs long before the bioinformatician is consulted on the project, meaning the project already has unnecessary limitations before the data scientist/statistician/what-have-you receives the data.

At minimum, this will limit statistical power, in certain cases, it means there is a confound that can't be mitigated without meta-analytics or extreme ends. In my experience this affects, oh I don't know, >70% of labs, including world-class, well-known labs, who don't retain a statistician.

score 0 · Answer 75 · 2017-09-21

0

Entering edit mode

7.8 years ago

mforde84 ★ 1.4k

my personal favorite is irreparably breaking your xsession, or breaking a mount point so you can't boot past initramfs

ADD COMMENT • link 7.8 years ago by mforde84 ★ 1.4k

score 0 · Answer 76 · 2022-06-12

0

Entering edit mode

3.1 years ago

ahmad mousavi ▴ 800

Creating a new result and forgetting to remove the past one, for example: result1, result2, result 30May, result20june, result_final, result_final_final !!!
Keeping the zipped version after unzipping the files
Keeping SAM files after converting them to BAM
Creating multiple BAM files in GATK and forgetting to remove them
Changing core Linux configuration while installing software !!! Crashing Linux and having to reinstall it !!
Forgetting to save the pipelines used in Bash
Forgetting to call dev.off() in R code
Using rm * without carefully checking the file list

ADD COMMENT • link 3.1 years ago by ahmad mousavi ▴ 800

1

Entering edit mode

Points 1 and 2 are not really mistakes, I never delete older versions. Instead, I create files with timestamps. Unless you know you'll never need the older version (and still have code/documentation to create the older version if necessary), deleting it is not a smart thing to do. I'd rather delete unzipped files after using them than delete the zip archive.

Points 1-4 are all about space saving. Of these, only #3 is a mistake in all scenarios, as no information is lost by deleting the SAM file. #4 is probably specific to your way of using GATK.

ADD REPLY • link 3.1 years ago by Ram 45k

score 0 · Answer 77 · 2022-11-22

0

Entering edit mode

2.6 years ago

LauferVA 4.8k

having someone in your group inadvertently save data (rather than code/readmes) to github. Have fun getting that off!

ADD COMMENT • link 2.6 years ago by LauferVA 4.8k