Question: How to stop R creating syntactically correct names
emmapead2 wrote (3.7 years ago):

Hello,

So I have a list of metabolites and peak intensities which I'm bringing into R for analysis, and then I need to bring it out again. Unfortunately, R changes the names of my metabolites so that they are syntactically valid,

e.g. L-Leucine is changed to L.Leucine

This poses a significant hurdle: if I'm to compare against other resources for my analysis, the metabolite won't be recognised because of the changed name. The problem is further amplified when the metabolite names get more complicated. I've tried looking for patterns to change the names back using awk etc., but there's no definitive pattern and I have LOTS of data.

Is there a way to stop R doing this or any method to undo the changes when I bring the data out of R?

(I've already tried check.names = F, and this does not work)

Tags: R, metabolomics

check.names = FALSE works for me. Also, this is probably better asked on Stack Overflow.

written 3.7 years ago by fanli.gcb
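
A minimal sketch of what check.names does when reading with base R's read.csv. The inline CSV text and the metabolite names in it are made up for illustration and are not the asker's data.

```r
# Hypothetical peak table; the metabolite names are illustrative only.
csv_text <- paste(
  "sample,L-Leucine,3-Hydroxybutyric acid",
  "S1,10.5,3.2",
  "S2,11.1,2.9",
  sep = "\n"
)

# Default behaviour: column names are passed through make.names().
mangled <- read.csv(text = csv_text)
names(mangled)

# With check.names = FALSE the header is kept exactly as written.
# Spelling out FALSE avoids surprises if a variable called F exists.
intact <- read.csv(text = csv_text, check.names = FALSE)
names(intact)

# Non-syntactic names need [[ ]] or backticks to be accessed.
intact[["L-Leucine"]]
intact$`3-Hydroxybutyric acid`
```

One thing worth checking if the names still change after the read step: data.frame() also defaults to check.names = TRUE, so rebuilding a data frame from these columns later in a script can alter the names again.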

I may try there, but thought I'd post on here just in case anyone has done this kind of analysis with metabolomics data specifically and may have advice.

written 3.7 years ago by emmapead2

Regarding there being no definitive pattern, the rules are given on the make.names help page (?make.names).

written 3.7 years ago by russhh
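
To illustrate the make.names rules, here is a small sketch with made-up names. It suggests why the pattern is well defined in the forward direction but cannot be undone in general: several different inputs can map to the same output.

```r
# Illustrative names only; the rules are those described in ?make.names.
original <- c("L-Leucine", "L Leucine", "L.Leucine", "3-Hydroxybutyric acid")

# Invalid characters become "." and an "X" is prepended where the
# result would otherwise not start with a letter or a dot.
safe <- make.names(original)
safe

# Three distinct inputs collapse to one output, so there is no
# general inverse that recovers the original spelling.
length(unique(original))
length(unique(safe))
```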

I've tried this already, but make.names does not like replicates: it will add ".1" on the end. This is still an issue, as I'm comparing multiple sets which may have the same metabolites, and comparing e.g. L.Leucine.1 in one file to L.Leucine.1 in another does not necessarily mean they are identified as the same. To add another lovely layer of confusion to it all, I can't just remove the ".1", as some of the metabolites have ".1" in their name.

written 3.7 years ago by emmapead2
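
For the replicate case, a hedged sketch using base R's read.csv and an invented header with a repeated name. With check.names = FALSE no ".1" suffix is added at read time, so a name that genuinely ends in ".1" stays distinguishable from a replicate.

```r
# Hypothetical header with a replicated metabolite; names are illustrative.
csv_text <- paste(
  "sample,L-Leucine,L-Leucine,Compound 2.1",
  "S1,10.5,9.8,3.2",
  "S2,11.1,10.2,2.9",
  sep = "\n"
)

peaks <- read.csv(text = csv_text, check.names = FALSE)
names(peaks)  # the duplicate is kept as-is

# Name-based access only returns the first match, so locate the
# replicates by position and extract each one with [[ ]].
leucine_cols <- which(names(peaks) == "L-Leucine")
leucine_values <- lapply(leucine_cols, function(i) peaks[[i]])
leucine_values
```

Duplicate column names are fragile in later processing, so it may be safer to keep the replicate columns addressed by position, or to reshape to one row per measurement with the metabolite name held in an ordinary column.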

For completeness, could you please add your R code in your question?

written 3.6 years ago by Egon Willighagen

ablanchetcohen (Canada) wrote (3.7 years ago):

When reading the information from a file, check.names = FALSE works for me, with either read.table or fread. If the names have to be converted to syntactically valid names during further processing, I just store the names after reading them, and then restore them right before outputting, at the end of the processing. Although you could try to convert the syntactically valid names back to the original names using a reverse function, it is much simpler to just store the original names when you first read them. Restoring them before outputting them is then trivial.

written 3.7 years ago by ablanchetcohen
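
The store-and-restore strategy described in this answer might look roughly like the sketch below. The data, the name_map lookup and the log2 step are assumptions made for illustration; restoring through a lookup rather than by position is a design choice so that reordered or dropped columns still get the right label back.

```r
# Hypothetical input; metabolite names are illustrative only.
csv_text <- paste(
  "sample,L-Leucine,3-Hydroxybutyric acid",
  "S1,10.5,3.2",
  "S2,11.1,2.9",
  sep = "\n"
)
peaks <- read.csv(text = csv_text, check.names = FALSE)

# 1. Store the original names next to their syntactic versions.
original_names <- names(peaks)
safe_names <- make.names(original_names, unique = TRUE)
name_map <- setNames(original_names, safe_names)

# 2. Work with the safe names for as long as the analysis needs them.
names(peaks) <- safe_names
peaks$L.Leucine <- log2(peaks$L.Leucine)  # stand-in for real processing
peaks <- peaks[, c("sample", "X3.Hydroxybutyric.acid", "L.Leucine")]

# 3. Restore the original names by looking up each safe name.
names(peaks) <- unname(name_map[names(peaks)])

# 4. Write out; write.csv() does not alter the column names.
out_file <- tempfile(fileext = ".csv")
write.csv(peaks, out_file, row.names = FALSE)
header <- readLines(out_file, n = 1)
header
```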

Ahhhhhh, so I think my problem has been that I'm actually reading it from Excel (nobody shout at me, I was given the data like this :D), so it stores the names of my metabolites as row names, which really need to be unique for anything further to work. So I guess I just store my metabolite names in a data frame, allow R to change the row names so it understands them, and then cbind it all back together before I write it out? Is this what you mean?

written 3.7 years ago by emmapead2

I don't believe row names need to respect the R syntax for variable names. For the column names in Excel files, some packages like openxlsx do come with the option check.names = FALSE, so it makes no difference whether you read the data from an Excel file or a text file. If, for some reason, the row names are being transformed, you can also store the row names as a column, in which case any string will be accepted. For further downstream processing where you would use the row names as variable names, just use the same simple strategy: store the names before processing, and restore them before outputting.

written 3.7 years ago by ablanchetcohen
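
A minimal sketch of keeping the metabolite names in a column rather than as row names, assuming the Excel sheet has been exported or read into a table shaped like the invented one below. The row_id column and its "m001" format are hypothetical helpers, not something from the thread.

```r
# Hypothetical table with metabolites in rows; names are illustrative.
csv_text <- paste(
  "metabolite,S1,S2",
  "L-Leucine,10.5,11.1",
  "L-Leucine,9.8,10.2",
  "3-Hydroxybutyric acid,3.2,2.9",
  sep = "\n"
)

# Read the names as an ordinary character column, not as row names,
# so replicates and non-syntactic strings are stored untouched.
peaks <- read.csv(text = csv_text, check.names = FALSE,
                  stringsAsFactors = FALSE)

# Give each row a unique, syntactic ID for anything that needs one.
peaks$row_id <- sprintf("m%03d", seq_len(nrow(peaks)))
rownames(peaks) <- peaks$row_id

# The original name travels with the data and can be looked up by ID.
peaks["m002", "metabolite"]
```

Data frame row names have to be unique, so replicated metabolites cannot be used as row names directly; keeping them in a column avoids that restriction and means nothing needs to be reconstructed before writing the table out.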

Thanks very much, I'll give it a go. It's just strange how check.names = F does not work at all, regardless of which package I use to bring in the file, especially if there are replicates. That is a major issue, as I need to keep them but also be able to trace back their corresponding values; I've described why it's an issue in one of my comments above.

written 3.7 years ago by emmapead2