Why am I unable to load my data from a tab separated file into R?
7
1
Entering edit mode
8.0 years ago
Mo ▴ 920

Hello,

I don't know why I am getting some errors during my analysis

I uploaded an example of my data in

I use the following command in R to load my data

data <- read.delim("path to your file /example.txt", header=FALSE)


However, when I inspect the data with summary() or head() it looks alright, but I cannot analyse it, because commands complain about "all numeric variables". For example, if you try to get the range of the example data, you will get such an error.

How do you normally import/load microarray data (in txt format, where each row represents a probe and each column a sample)?

Thanks

microarray programming R • 11k views
1
Entering edit mode

Please provide the file via an external link; it takes quite long to load, and to scroll all the way down your table.

0
Entering edit mode

After reading several comments from you below, it seems that reading the help for range() might be useful to you. In particular, it tells you that range works on any numeric or character objects. Data frames are not numeric though they can contain numbers. Also, range(), like almost all aggregation functions in R, will return NA when there is any NA in the data unless told to do otherwise.
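A minimal sketch of both failure modes, using made-up values in place of the real file:

```r
# Toy data frame mimicking the layout: one probe ID column plus numeric columns
df <- data.frame(probe = c("200645_at", "200690_at"),
                 M1 = c(0.0446, -0.0165),
                 M2 = c(0.0744, NA))

# range(df) fails because the 'probe' column is not numeric;
# on the numeric columns it works, but any NA propagates unless na.rm=TRUE
range(df[, -1])               # NA NA
range(df[, -1], na.rm = TRUE) # -0.0165  0.0744
```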

3
Entering edit mode
8.0 years ago

Works perfectly for me:

> dat <- read.delim(file="Downloads/gist2c69ab500bfa94d0268a-ac4cd3d5b0d0764c2faae0e3fb0db8a39d75bb22/example.txt", row.names=1)  # note the row.names=1

M1      M2      M3      M4      M5      M6      M7      M8      M9     M10     M11     M12
200645_at    0.0446  0.0744 -0.0340  0.0173  0.2280  0.0070 -0.0250  0.0644 -0.0253 -0.1230 -0.6251  0.0210
200690_at   -0.0165  0.1121 -0.0959  0.0000 -0.4595 -0.0282 -0.1617 -0.0482 -0.2611  0.0223 -0.6129  0.1961
200691_s_at  0.0554 -0.0689 -0.0852  0.0702  0.0823  0.0361 -0.0306 -0.0076 -0.0340 -0.0198 -0.1823 -0.0681
200692_s_at  0.0000 -0.0505 -0.0508 -0.0159 -0.3041 -0.0684 -0.0644 -0.0175  0.0503  0.0546 -0.2141 -0.0216
200693_at    0.0608  0.0601  0.0115  0.0744 -0.0232 -0.1095 -0.0416 -0.0499 -0.0515  0.0303 -0.1153  0.0824
200694_s_at  0.0424  0.0957  0.0758 -0.0387 -0.0517 -0.0207  0.0328 -0.1392  0.0140 -0.1476  0.1382  0.0113
M13     M14     M15     M16     M17     M18
200645_at    0.1095  0.1527  0.0261 -0.2107 -0.0196 -0.2316
200690_at    0.2119  0.0122 -0.5495  0.1518 -0.2409  0.1610
200691_s_at  0.1219 -0.1615 -0.0729 -0.0696  0.0042  0.1239
200692_s_at  0.0440 -0.0811  0.0964  0.0211 -0.0325  0.1810
200693_at   -0.0036  0.0575  0.0427  0.1104 -0.0216  0.0278
200694_s_at  0.2247  0.1489  0.0196  0.0883 -0.1848  0.1989

> range(dat)
[1] -20.091  25.652

2
Entering edit mode

Thanks Michael for such a valuable comment - very sharp and to the point!

That is certainly the answer I was looking for. I used something like the below and it works just fine!

mydata <- read.delim(file="path to the data.txt", header=TRUE, row.names=1)

str(mydata) # to see the structure of my data
head(mydata, n=1) # to check the first line of my data
tail(mydata, n=1) # to check the last line of my data

0
Entering edit mode

OP pasted >1500 lines and called it a sample - meaning a subset; the original is 40K lines. I think somewhere along the line the datatype gets messed up.

0
Entering edit mode

Well, then I think he would be wasting our time with incomplete information...

1
Entering edit mode

I am not wasting anybody's time! The data is not publicly available, but the portion I posted represents the entire data structure!

0
Entering edit mode

Seemingly, if the code works, your data was representative, and I take it back :D In general it is tricky to debug against an incomplete reference dataset. I understand that some part of your data is private; however, this can end up in 'works for me' -> 'doesn't work' cycles, where each person is talking about a different dataset. It is very important to give the people trying to help as much information as possible, or they might get frustrated.

2
Entering edit mode
8.0 years ago

I think you want:

data <- read.delim("path to your file /example.txt", header=TRUE)

Note header=TRUE. With header=FALSE, the header row is read as data, so all the columns are set to factor by default (character as of R 4.0); you probably want all but the first column to be numeric.
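A quick sketch of why this matters, with an in-memory stand-in for the file (values made up):

```r
txt <- "probe\tM1\tM2
200645_at\t0.0446\t0.0744
200690_at\t-0.0165\t0.1121"

# header=FALSE swallows the header row as data, so no column can be numeric
wrong <- read.delim(text = txt, header = FALSE)
right <- read.delim(text = txt, header = TRUE)

sapply(wrong, is.numeric)  # all FALSE
sapply(right, is.numeric)  # M1 and M2 TRUE
```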

(By the way, for the future, reporting the exact error message and the command that generated it would help.)

Dario

0
Entering edit mode

Hi Dario,

Thanks for your comment, but I still get the following error when I set the header to TRUE:

> range.raw <- range(example)
Error in FUN(X[[1L]], ...) :
only defined on a data frame with all numeric variables

1
Entering edit mode
8.0 years ago

read.delim is just a wrapper for read.table(); it's often easier to just use read.table() and let it infer separators, column types, etc. In particular,

df <- read.table('/path/to/example.txt', header=TRUE)

works fine for me on your data.

0
Entering edit mode

Same here - I still get the error:

range.raw <- range(example)
Error in FUN(X[[1L]], ...) :
only defined on a data frame with all numeric variables


What I did was to keep the probe IDs in a separate vector, as follows:

rprobes <- example[,1]


then I tried to get the data matrix by using the following:

data <- data.matrix(example[,2:ncol(example)])


it seems that data.matrix changes the values of my data
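That is exactly what data.matrix() does when a column has been read as a factor (the pre-R 4.0 default for strings): it substitutes the internal integer codes for the printed values. A toy illustration with made-up numbers:

```r
# A numeric column accidentally parsed as factor
x <- factor(c("0.0446", "-0.0165", "0.0744"))

data.matrix(data.frame(x))     # integer codes 2, 1, 3 - not the data!
as.numeric(as.character(x))    # the safe way back to the real numbers
```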

2
Entering edit mode

Well, the first column certainly doesn't hold 'all numeric variables'. Shouldn't you set row.names=1 or something like that for read.table? Those error messages more often than not tell you what the problem is.

1
Entering edit mode

But your data file doesn't have all numeric variables, so of course your data frame doesn't have all numeric variables. What are you planning to do with the probe IDs?

If it's alright to just ignore them, strip them off:

df2 <- df[,-c(1)]


and proceed using df2; or pass off the data to (say) range with something like

range(df[,-c(1)])
[1] -20.091  25.652


Otherwise, if you want to (say) strip off the _at, _x_at, and _s_at suffixes and treat the rest as a number (I have no idea if that's OK; will the remaining IDs be unique? Should they be?), you can do that easily enough too:

df[,1] <- as.numeric(gsub("_.*","",as.character(df[,1])))

0
Entering edit mode
8.0 years ago
Ram 37k

Looks like your data might not be delimited properly (space-delimited with a variable number of spaces between columns). You might either want to check that out or explore how to treat consecutive delimiters as one.
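If the file really is space-padded rather than strictly tab-separated, read.table's default sep="" already treats any run of whitespace as a single delimiter. A sketch with hypothetical content:

```r
txt <- "probe   M1\t  M2
200645_at  0.0446   0.0744
200690_at  -0.0165  0.1121"

# sep="" (the default) collapses runs of spaces/tabs into one separator
df <- read.table(text = txt, header = TRUE)
sapply(df, is.numeric)  # M1 and M2 numeric despite the ragged spacing
```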

0
Entering edit mode

If I run cat -A <file>, it looks well formatted.

0
Entering edit mode

So, all tabs?

0
Entering edit mode

It looks like that

0
Entering edit mode

Hi Ram, that might be the problem, but I am working on it and so far could not find where the problem is.

1
Entering edit mode

You said this gist is just the sample, right? If you could give us the actual file, we can figure out what the problem is.

0
Entering edit mode
8.0 years ago
dago ★ 2.7k

You might want to try setting

read.delim(..., stringsAsFactors = FALSE)

0
Entering edit mode

Actually, stringsAsFactors=FALSE is accepted by read.delim too (it is passed through to read.table).

However, this also does not help

0
Entering edit mode
8.0 years ago
Manvendra Singh ★ 2.2k
read.table("file", header=TRUE, stringsAsFactors=FALSE, sep="\t", dec=".")


should work

0
Entering edit mode

Thanks for your comment, but unfortunately it does not work.

0
Entering edit mode
Dude, then you should try importing your file into an Excel sheet; if it imports, copy and paste from Excel to Notepad and then read it into R.
0
Entering edit mode

I would not do that. Imagine you have over 40000 probes and 1500 samples - would you personally copy and paste that into Notepad?

0
Entering edit mode
Yes, at least the first 10 rows, to see what exactly the problem is.
0
Entering edit mode

Excel is not what I'd recommend, but bulk copy-paste is super easy: Ctrl+Shift+arrow extends the selection to the last row or column of the range in use in the direction you choose.

0
Entering edit mode

For sure it is easy if and only if your data is small. Copying and pasting your files over and over is definitely not a good way to work, because of systematic error!

I personally avoid such things, but if you are working with 20 samples and 100 variables, more or less, it would be convenient.

0
Entering edit mode

I'd probably use it to ensure data-type consistency. I avoid Excel as much as possible; UNIX is much better with large files.

0
Entering edit mode
8.0 years ago
TriS ★ 4.6k

This worked for me

df <- read.table("test.txt", header=T, row.names = 1)
df <- apply(df, 2, function(x) sapply(x, as.numeric))
range(df)

> range(df)
[1] -20.091  25.652


The key is to apply as.numeric to each element, via sapply, to make every column numeric.
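For reference, apply() first coerces the data frame to a matrix, so if any column is character the whole matrix becomes character, and as.numeric() then parses the strings back into numbers. A toy version, with made-up values:

```r
# Columns read as character (e.g. after a botched import)
df <- data.frame(M1 = c("0.0446", "-0.0165"),
                 M2 = c("0.0744", "0.1121"),
                 stringsAsFactors = FALSE)

# apply() hands each (character) column to sapply, which converts elementwise
num <- apply(df, 2, function(x) sapply(x, as.numeric))
range(num)  # -0.0165  0.1121
```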

1
Entering edit mode

Thanks for your comment, however, it does not work for me.

When I ran the following command on my data, I got an error, which suggests there are some blank fields:

df <- read.table("test.txt", header=T, row.names = 1)
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
line 1 did not have 1085 elements


I solved the problem by adding fill=TRUE.

However, I am sure I have 1500 columns, yet the imported data ended up with 1800 columns.

Then I thought the other command might solve the problem:

df <- apply(df, 2, function(x) sapply(x, as.numeric))


But it did not, and of course the range was NA NA. Any clue where the problem might be?
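One way to locate the ragged lines before reaching for fill=TRUE is count.fields(), which reports how many separated fields each line has (toy input standing in for the real file):

```r
txt <- "a\tb\tc
1\t2\t3
4\t5"  # this last line is missing a field

n <- count.fields(textConnection(txt), sep = "\t")
n                   # 3 3 2
which(n != max(n))  # line 3 is the ragged one
```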

0
Entering edit mode

Add sep="\t" to the read.table() call or use read.delim(), but it looks like I am running a little late and you already got an answer :).

0
Entering edit mode

Hello TriS,

Thanks for your comment. Yes, I have got the answer: my mistake was using header=FALSE, and I also did not use row.names=1 when importing my file with the read.delim function.