Question: How to avoid conversion of gene symbols to date format in Excel
7
gravatar for Mike
2.8 years ago by
Mike1.3k
UK
Mike1.3k wrote:

Hi All,

Apologies for the technical/irrelevant question.

Some gene names start with APR/MARC/SEPT* etc default converted into date format.

How to avoid this automatics conversion of gene symbols to date in Excel.

Thanks..

gene • 4.2k views
ADD COMMENTlink modified 2.8 years ago by Pranavathiyani G240 • written 2.8 years ago by Mike1.3k
2

See also https://nsaunders.wordpress.com/2012/10/22/gene-name-errors-and-excel-lessons-not-learned/ in Neil's blog.

ADD REPLYlink modified 2.8 years ago • written 2.8 years ago by Michael Dondrup46k
2

Well avoiding excel would work. Look here: "Gene name errors are widespread in the scientific literature"

ADD REPLYlink modified 2.8 years ago • written 2.8 years ago by dago2.5k

I know this is not the answer to the question that you are asking, but I agree with the other guys. Should you work with genes on a regular basis or will with with data in general in the future, consider putting the effort in to learn R/Matlab/Python - at least one of them and at least a bit. It's worth it. (If this is just, let's say, a student project and you know that this is a one time thing, than this might be another story but .... well )

ADD REPLYlink written 2.8 years ago by LLTommy1.2k
7

I would argue that for most, Excel is unfortunately unavoidable - yes, you can personally avoid it, but as soon as you send your tab-delimited table to someone else to look at - science is collaborative after all - they will open it in Excel.

The mangling of gene names by Excel is an entirely avoidable, user-mediated problem. Use your advantage (knowing how computers work) to prevent others making this mistake, not to sanctimoniously point out their inevitable error when they make it.

ADD REPLYlink written 2.8 years ago by Simon Cockell7.3k
1

The problem has been known for at least 12 years. And I agree it can be avoided on a case by case basis, but not generally. That means, that whatever precautions are taken, eventually the same errors will be introduced, the past experience shows that this is going to happen over an over again.

The mangling of gene names in Excel is imo an archetype of bad design, where the software tries to be 'smart' in an unexpected way (even called 'Standard') upon import of text data. The easiest way would be to turn of the 'Standard' import type in Excel completely, but this is not possible, and therefore will inevitably lead to these errors.

There are enough alternatives to Excel for dealing with spreadsheets, like LibreOffice, or Google docs, or R-studio and none of them has this behavior.

ADD REPLYlink modified 2.8 years ago • written 2.8 years ago by Michael Dondrup46k
1

@ LLTommy , thanks, but this is not one time thing, I am bioinformatician, and I have lots of experience with Unix /R/Perl, so for me this is not problem, But problem is that when I send my results to my PI or someone else/Experimental people, they always open in Excel or want results in Excel.

ADD REPLYlink written 2.8 years ago by Mike1.3k
2

But problem is that when I send my results to my PI or someone else/Experimental people, they always open in Excel or want results in Excel.

Then include explicit instructions NOT to use Excel, explain that Excel will eff-up their data, and suggest an alternative tool. A lot of headaches can be avoided by anticipation of the errors that "experimental people" are likely to make. And it's our responsibility as bioinformaticists to minimize those errors, instead of compounding them by being intentionally obtuse and/or not sharing our expertise.

Excel is an unavoidable evil only if we allow it. Turn away from the dark side! :-).

ADD REPLYlink modified 2.8 years ago • written 2.8 years ago by harold.smith.tarheel4.4k
1

I admire your idealism @harold.smith.tarheel, but >10 years of experience has taught me that this will not work. You'll pry Excel from their cold, dead hands.

ADD REPLYlink written 2.8 years ago by Simon Cockell7.3k

I'm actually pretty cynical, and my experience is similar to yours. But there's a difference between using Excel out of ignorance vs intentionally disregarding instructions to the contrary. The latter users deserve to have their data mangled.

ADD REPLYlink written 2.8 years ago by harold.smith.tarheel4.4k

agree with @Simon Cockell

ADD REPLYlink modified 2.8 years ago • written 2.8 years ago by Mike1.3k

Fair enough, I just assumed that you are working with Excel yourself. And yes, if others open/send you data in that data format, you can not simply work around as I mentioned by not using Excel. So well, I don't know if there is really a solution to this or if you constantly have to work around it (telling people not to use Excel, double check the data that you receive from people to make sure Excel did not do funny things, and I don't know)

Having sad all that. Excel (or alternatives) are great programs - for what they were made for. The problem is that people just put everything in there, EVERYTHING. And that is not only the case in Biology/Bioinformatics but in many many other fields. Believe me, I know. And it causes a lot of problems there as well. This is kind of an old story, but if you have some minutes and don't know it, it's 'fun' and reminds you that we are not the only ones battling with Excel.

ADD REPLYlink written 2.8 years ago by LLTommy1.2k

Thanks all for your comments and suggestions.

Final conclusion is that "Excel is an unavoidable Evil"

ADD REPLYlink written 2.8 years ago by Mike1.3k

Hi, This is a useful video to resolve the problem.

ADD REPLYlink written 5 days ago by Morteza Hadizadeh10
13
gravatar for Simon Cockell
2.8 years ago by
Simon Cockell7.3k
Newcastle
Simon Cockell7.3k wrote:

If we assume that Excel is an unavoidable evil (and for most this is unfortunately a credible assumption)...

When importing your delimited data into Excel, be sure to set the column type appropriately. The default for all columns is 'General' - this will lead to the inevitable auto-correct mangling. Gene name columns should be set to 'Text', then the septins etc will be left unmolested.

Once mangling has occurred, correction needs to be approached carefully - the main issue with what Excel does here is that it changes the underlying data, rather than only changing things in the display layer. This means your gene names are turned into floats (which is Excel's underlying representation of dates). Of course, Excel being Excel, the float used is different depending on the platform: https://support.microsoft.com/en-us/kb/214330 (also some versions of Excel think 1900 was a leap year, others know better - this is a very deep rabbit hole...). You could conceivably write some code to handle the conversion of dates back to gene names, but it would have to consider these caveats and edge cases veeeerrrry carefully.

All in all, I agree with Michael (although accept that this is not always possible).

ADD COMMENTlink modified 2.8 years ago • written 2.8 years ago by Simon Cockell7.3k

Thanks Simon........

ADD REPLYlink written 2.8 years ago by Mike1.3k
4
gravatar for Michael Dondrup
2.8 years ago by
Bergen, Norway
Michael Dondrup46k wrote:

Avoiding use of Excel is a way (and possibly the only one) to consistently solve this problem.

ADD COMMENTlink modified 2.8 years ago • written 2.8 years ago by Michael Dondrup46k
2
gravatar for WouterDeCoster
2.8 years ago by
Belgium
WouterDeCoster40k wrote:

From what I read, no conversion happens in google spreadsheets.

ADD COMMENTlink written 2.8 years ago by WouterDeCoster40k

Google Spreadsheets is not a safe space for gene lists, says @ryanlayer.

ADD REPLYlink written 20 months ago by trutane30
1

I believe (but am not 100% certain) that Google sheets did not suffer from the auto-conversion issue when this thread was started, but now it does. The "trick", as with Excel, is to define the column variable type of course. Unfortunately, getting people to think about the nature of their data in a prevalent "point and click" culture remains challenging.

ADD REPLYlink written 20 months ago by Neilfws48k
2
gravatar for Pranavathiyani G
2.8 years ago by
Pondicherry, India
Pranavathiyani G240 wrote:

While pasting, use import from text wizard in excel and in Data Format section choose the gene column as "Text" datatype.

ADD COMMENTlink written 2.8 years ago by Pranavathiyani G240
1
gravatar for H.Hasani
2.8 years ago by
H.Hasani760
Freiburg, Germany
H.Hasani760 wrote:

Please read this paper before making any efforts and then decide for yourself!

Gene name errors are widespread in the scientific literature

ADD COMMENTlink written 2.8 years ago by H.Hasani760
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 780 users visited in the last hour