Question

Converting an Rdata set to an excel sheet

0

Entering edit mode

3.1 years ago

rahman.samir33 • 0

Hi, I have a .Rdata file with ENSEMBL IDs under different categories. I would like to export this data onto an excel sheet.

There are different numbers of genes in each category.

I tried using the R package xlsx to convert the data into an excel sheet, but received the following error:

write.xlsx2(mylist,"/Users/samirrahman/Downloads/TssA_Fetal.xlsx", sheetName = "Sheet1",
+ col.names = TRUE, row.names = TRUE, append = FALSE)
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  : 
  arguments imply differing number of rows: 346, 269, 411, 1597, 498, 874, 214, 442

The different number of rows correspond to the different number of ENSEMBL IDs in each category. Can I split the Rdata into multiple files and then export the data onto multiple excel spreadsheets? I want to convert all the ENSEMBL IDs to gene names as well.

R Gene • 2.4k views

ADD COMMENT • link updated 3.1 years ago by rpolicastro 13k • written 3.1 years ago by rahman.samir33 • 0

0

Entering edit mode

Can you show an example of some of your data? dput(mylist[1:5])

Can you also give an example of what you want your excel sheet to look like?

ADD REPLY • link 3.1 years ago by rpolicastro 13k

0

Entering edit mode

Use write.xlsx2 with lapply. You may have to find a way to make adding sheet names work.

ADD REPLY • link 3.1 years ago by Ram 43k

0

Entering edit mode

file:///Users/samirrahman/Desktop/Screen%20Shot%202021-03-10%20at%203.15.21%20PM.png

Hi everyone. Thank you for the responses. I attached an image of what the file looks like on Rstudio. I am not a bioinformatician myself, so I have limited knowledge. My colleague gave me this file which basically contains lists of genes associated with different chromatin loops that have been assigned different chromatin states.

ADD REPLY • link 3.1 years ago by rahman.samir33 • 0

0

Entering edit mode

Use these directions: How to add images to a Biostars post

ADD REPLY • link 3.1 years ago by GenoMax 141k

0

Entering edit mode

I don't know if you will be able to see that image. Here is an example of what one list looks like:

mylist[["TssBiv_fetal"]]
  [1] "ENSG00000171729" "ENSG00000228549" "ENSG00000116329" "ENSG00000163873" "ENSG00000269113" "ENSG00000142700"
  [7] "ENSG00000187140" "ENSG00000143355" "ENSG00000266094" "ENSG00000196482" "ENSG00000143869" "ENSG00000115507"
 [13] "ENSG00000144355" "ENSG00000115844" "ENSG00000236651" "ENSG00000217236" "ENSG00000163364" "ENSG00000224577"
 [19] "ENSG00000163497" "ENSG00000163273" "ENSG00000144649" "ENSG00000163638" "ENSG00000239513" "ENSG00000183770"
 [25] "ENSG00000206262" "ENSG00000181804" "ENSG00000174963" "ENSG00000239381" "ENSG00000181449" "ENSG00000114315"
 [31] "ENSG00000163132" "ENSG00000215612" "ENSG00000109705" "ENSG00000197826" "ENSG00000163623" "ENSG00000138795"
 [37] "ENSG00000151615" "ENSG00000083857" "ENSG00000168621" "ENSG00000271788" "ENSG00000177721" "ENSG00000251493"
 [43] "ENSG00000113196" "ENSG00000137265" "ENSG00000096696" "ENSG00000158553" "ENSG00000204520" "ENSG00000204252"
 [49] "ENSG00000112837" "ENSG00000203809" "ENSG00000131015" "ENSG00000122584" "ENSG00000122691" "ENSG00000164651"
 [55] "ENSG00000006377" "ENSG00000128573" "ENSG00000106538" "ENSG00000218672" "ENSG00000236544" "ENSG00000164778"

ADD REPLY • link updated 3.1 years ago by GenoMax 141k • written 3.1 years ago by rahman.samir33 • 0

0

Entering edit mode

Thanks, and can you also describe what you want your resulting excel file to look like?

ADD REPLY • link 3.1 years ago by rpolicastro 13k

0

Entering edit mode

I would like to get a separate sheet for every list, with the Ensembl IDs in a single column. Do you know how I can convert the ensembl IDs to gene names?

ADD REPLY • link 3.1 years ago by rahman.samir33 • 0

0

Entering edit mode

Use biomaRt. Search the site - there are plenty of discussions on getting HGNC symbols from ENSG identifiers. Those two identifiers do not have 1:1 correspondence for all genes though, so watch out for that.

As for one sheet per list item, your list seems to be a named list. Like I mentioned in my comment, you may want to use names(mylist) for the sheet names. I think you may have to use a loop since you'd be iterating in parallel through two 1D objects: mylist and names(mylist)

ADD REPLY • link 3.1 years ago by Ram 43k

score 1 · Answer 1 · 2021-03-09

It mentions differing number of rows. This most likely means something is wrong with your "mylist" file.

I can think of two issues that could cause this. The first one, and probably most likely since you are working with some sort of external .RData file you say (which isn't really an internal data format used within an R script but is instead is the format R uses to store all data + environment into while performing the script, correct me if i'm wrong here.) So it's likely your mylist is in fact some type of multi dimensional type (i.e. a list of list, or a list of dataframes etc) containing multiple data types with differing lengths.

For example, this is some PCA data i worked on a while back... called a.pca.res

enter image description here

You can see that it's not a data.frame or list, but a 'list of 5' type. If you then look at the data itself, you can see it contains 5 individual "double" type's. If this is in fact some sort of object with different data's contained within it, the easiest way to normally grab that specifiic datatype is by calling it using a $. Most objects are made that way. So for example if i'm interested in rotation, i can call it up using a.pca.res$rotation.

enter image description here

A second reason this error comes up could be that your list or data.frame contains empty cells. R doesn't like to handle data that isn't consistent and will often throw errors if you try to do something like that. I can't easily reproduce a data frame with fixed colums but differing row lengths. But essentially it's comes down to this

enter image description here

That being said, i have had data imported into R that does actually contain inconsistent row lengths or column lengths due to empty cells. Often happens with .csv or .tsv files where a tabs or commas were missing. Normally while reading the file you would handle these by putting some (na.strings = "NA") while doing read.table or whatever you are doing.

Hope this helps, if not let me know and provide an example of your data or script so we can look closer at it, as this is not an issue of R or NCBI, but of your script, to which only you are privy.

Cheers, Lesley

score 1 · Answer 2 · 2021-03-15

I'm going to use an OrgDb from AnnotationHub here to generate some example data and to later convert gene IDs to gene names, but you can use biomaRt too.

library("AnnotationHub")

# Download the human OrgDb.
ah <- AnnotationHub()
orgdb <- query(ah, c("homo sapiens", "orgdb"))[[1]]

# Generate some example data.
samples <- LETTERS[1:2]
names(samples) <- samples
samples <- lapply(samples, function(x) sample(keys(orgdb, keytype="ENSEMBL"), 5))

> samples
$A
[1] "ENSG00000278658" "ENSG00000254055" "ENSG00000051382" "ENSG00000277483"
[5] "ENSG00000200755"

$B
[1] "ENSG00000259040" "ENSG00000052126" "ENSG00000231133" "ENSG00000265182"
[5] "ENSG00000164346"

Once you have the OrgDb you can map the IDs to names, and then use writexl::write_xlsx to save it to an excel file with each named list element being a tab.

# Map gene names/symbols to the IDs.
samples <- lapply(samples, function(x) select(orgdb, keys=x, columns="SYMBOL", keytype="ENSEMBL"))

> samples
$A
          ENSEMBL       SYMBOL
1 ENSG00000278658      MIR6826
2 ENSG00000254055 LOC105375851
3 ENSG00000051382       PIK3CB
4 ENSG00000277483    RN7SL321P
5 ENSG00000200755     RNA5SP68

$B
          ENSEMBL         SYMBOL
1 ENSG00000259040 BLOC1S5-TXNDC5
2 ENSG00000052126        PLEKHA5
3 ENSG00000231133          HAR1B
4 ENSG00000265182        SRP72P1
5 ENSG00000164346           NSA2

# Save named list to excel file with multiple sheets.
writexl::write_xlsx(samples, "file.xlsx")