Question: How to define a column containing gene names "text" (instead of "general") when using openxlsx::write.xlsx
2
gravatar for Friederike
7 weeks ago by
Friederike5.1k
United States
Friederike5.1k wrote:

It is a well-known issue that Excel can inadvertently bungle up gene names unless the format type of the column containing the gene names is set to "text" (see e.g. here and here).

While the solution proposed in the links above works it depends on using Excel's Import Wizard and generally assumes that the files in question are simple tab- or comma-delimited files. However, I would, in fact, like to make use of the capabilities of the openxlsx R package, which allows me to generate fairly complex spreadsheet collections, which is often preferable to dumping an entire folder of individual csv/txt files on a collaborator. Does anyone know how to specify the column type when using the write.xlsx function?

This is the command that I typically use to turn a list of data.frames into a collection of spreadsheets (the names of the data.frames within the list will become the names of the individual sheets in the resulting xslx file.

list_of_data_frames <- list( first_sheet = data.frame(gene = c("a","b","c"), b = 4:6), second = data.frame(X = "a", Y = 5)) 
openxlsx::write.xlsx(list_of_data_frames, file = "filename.xlsx", rowNames = TRUE)
openxslx excel genes R formatting • 179 views
ADD COMMENTlink modified 13 days ago by Benn7.7k • written 7 weeks ago by Friederike5.1k

cross-posted on github

ADD REPLYlink written 7 weeks ago by Friederike5.1k

I think you should be able to use the writeDataTable function to enforce formatting on the data by setting class(df$Gene) <- "text". See the formatting vignette for an example.

ADD REPLYlink modified 7 weeks ago • written 7 weeks ago by RamRS24k

class(df$gene) <- "text" did not change the result. I'd like to stick to write.xlsx, which is basically a wrapper around writeDataTable if I understand correctly.

ADD REPLYlink written 7 weeks ago by Friederike5.1k

Have you tried writing paste0("'", gene) instead of just gene? Excel will treat SEPT1 as date but 'SEPT1 as text.

Try:

list_of_data_frames <- list( first_sheet = data.frame(gene = c("a","b","c"), b = 4:6), second = data.frame(X = "a", Y = 5))
list_of_data_frames$first_sheet$gene <- paste0("'", list_of_data_frames$first_sheet$gene)
openxlsx::write.xlsx(list_of_data_frames, file = "filename.xlsx", rowNames = TRUE)
ADD REPLYlink modified 7 weeks ago • written 7 weeks ago by RamRS24k

Yes, that's a work-around, I guess, one could add anything to alter those gene names, but it's annoying because I cannot easily read them back in.

That being said, I've noticed that, apparently, the gene names are not changed when I just open the file, so that would be a huge relief and would mean that I don't, in fact, actually have to worry about it. The problem arises if I try to enter another gene name in the same column in Excel, but it won't actually change the values that I dumped in the spreadsheet via write.xlsx.

ADD REPLYlink written 7 weeks ago by Friederike5.1k

Excel is not made for bioinformatics, that was clear. R can save your data into excel's "TEXT" (character string), but still excel will overrule it and will change SEPT11 to 11-Sep or 9/11/2019 (or something alike). The only way is still the wizard, so that means only text files. Unfortunately, because it would be nice when your methods would work without this error!

ADD REPLYlink modified 7 weeks ago • written 7 weeks ago by Benn7.7k

Why does Excel take the Wizard's instructions more seriously than the encoded column type?

ADD REPLYlink written 7 weeks ago by Friederike5.1k

I think because microsoft wants people not to think for them selves, but let computers do that for them. When you are someone that has a mind of its own, they will force you to use the wizard...

ADD REPLYlink written 7 weeks ago by Benn7.7k

if it would let me IMPORT xlsx files, that'd be a compromise I could live with, but it won't

ADD REPLYlink written 7 weeks ago by Friederike5.1k

Can you tell me for which function you've observed that the gene names were changed in an xlsx file that you generated via R? As you can see in my last comment to Ram's suggestion, I actually realized that Excel seems to honor whatever content I've dumped in the xlsx file via write.xlsx, but it will mercilessly change any new entries that I do via Excel. That's a behavior I can live with since I never plan on actually doing anything in Excel and I can easily accompany each shared Excel file with a note that the data format MUST be manually changed to "text" if a users plans on adding any entries to that column within Excel.

ADD REPLYlink written 7 weeks ago by Friederike5.1k

Sorry I didn't want to imply that I was using write.xlsx. I know that excel is doing unasked stuff by default, that's why I commented. I always give .txt files to collaborators, if they lazy load them without the wizard, it is not my fault if you know what I mean ;-) But seems like you have a good solution now?

ADD REPLYlink written 7 weeks ago by Benn7.7k

at least on my machine with my applications it does seem to work as I describe above, yes! (the only caveat being that the user should make sure to manually adapt the column format to ensure that nothing is going to happen upon additional manipulation of the xlsx file)

ADD REPLYlink written 7 weeks ago by Friederike5.1k
2
gravatar for Benn
13 days ago by
Benn7.7k
Netherlands
Benn7.7k wrote:

Sorry for the late reaction, but I was using xlsx library, and found it can properly print gene names in a workbook (and I recalled this discussing here).

So if you use xlsx library, this works for me.

library(xlsx)

wb <- createWorkbook()

sheet <- createSheet(wb, "Sheet1")

 rows  <- createRow(sheet, rowIndex=1:2)    

cell.1 <- createCell(rows, colIndex=1)[[1,1]]
cell.2 <- createCell(rows, colIndex=1)[[2,1]]     
setCellValue(cell.1, "Sept11")
setCellValue(cell.2, "March2")

cs1 <- CellStyle(wb, dataFormat=NULL, font=NULL)

setCellStyle(cell.1, cs1)  
setCellStyle(cell.2, cs1)

saveWorkbook(wb, file="test.xlsx")

If I open the file the gene names do not automatically change to dates!

ADD COMMENTlink written 13 days ago by Benn7.7k

Thanks for sharing! That behavior is consistent with the openxlsx solution described above, too. Now the question is: what happens when you open your test.xlsx file with Excel and add another entry (e.g. Sept11) in the gene name column?

ADD REPLYlink written 13 days ago by Friederike5.1k

If you type another notorious symbol, it will be changed to date immediately... But why would your collaborators type another gene symbol?

ADD REPLYlink written 13 days ago by Benn7.7k

Because they're not scared of altering XLS files. Anyway, as I detailed above, I'm somewhat ok with that, but, on the other hand, this behavior tells me that the format type of the gene name column is not correctly assigned by either package. Just something to be aware of.

ADD REPLYlink written 13 days ago by Friederike5.1k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2042 users visited in the last hour