Why am I unable to load my data from a tab separated file into R?
7
1
6.2 years ago
Mo ▴ 920

Hello, 

I don't know why I am getting errors during my analysis.

I uploaded an example of my data here:

https://gist.github.com/anonymous/2c69ab500bfa94d0268a

I use the following command in R to load my data 

data <- read.delim("path to your file /example.txt", header=FALSE)

However, when I look at the data with summary, head, or other commands, it seems all right. But I cannot analyse it, because I get an error like "all numeric variables". For example, if you try to get the range of the example data, you will get this error.

How do you normally import a microarray data set in txt format (each row represents a probe and each column a sample)?

Thanks 

 

microarray R programming • 8.3k views
ADD COMMENT
1

Please provide the file as an external link; it takes quite long to load and to scroll all the way down your table.

ADD REPLY
0

After reading several comments from you below, it seems that reading the help for range() might be useful to you.  In particular, it tells you that range works on any *numeric* or *character* objects.  Data frames are not *numeric* though they can contain numbers.  Also, range(), like almost all aggregation functions in R, will return NA when there is any NA in the data unless told to do otherwise.
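
Both points can be seen in a minimal sketch on a made-up data frame. The column names `probe`, `M1`, `M2` and all values are invented for illustration and are not taken from the posted file:

```r
# Toy data frame: one character column plus two numeric columns (values invented).
toy <- data.frame(probe = c("p1", "p2", "p3"),
                  M1 = c(0.5, NA, -1.2),
                  M2 = c(2.0, 0.1, 0.3),
                  stringsAsFactors = FALSE)

# range() on the whole data frame fails because 'probe' is not numeric.
res <- try(range(toy), silent = TRUE)
inherits(res, "try-error")

# On the numeric columns only, a single NA makes the result NA ...
with_na <- range(toy[, c("M1", "M2")])

# ... unless NAs are dropped explicitly.
without_na <- range(toy[, c("M1", "M2")], na.rm = TRUE)
```

Whether dropping NAs is appropriate depends on why they are there, so it is worth checking where they come from first.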

ADD REPLY
3
6.2 years ago

Works perfectly for me:

> dat <- read.delim(file="Downloads/gist2c69ab500bfa94d0268a-ac4cd3d5b0d0764c2faae0e3fb0db8a39d75bb22/example.txt", row.names=1)
# your mistake was to set header=FALSE, and to omit 
# row.names=1 

> head(dat)
                 M1      M2      M3      M4      M5      M6      M7      M8      M9     M10     M11     M12
200645_at    0.0446  0.0744 -0.0340  0.0173  0.2280  0.0070 -0.0250  0.0644 -0.0253 -0.1230 -0.6251  0.0210
200690_at   -0.0165  0.1121 -0.0959  0.0000 -0.4595 -0.0282 -0.1617 -0.0482 -0.2611  0.0223 -0.6129  0.1961
200691_s_at  0.0554 -0.0689 -0.0852  0.0702  0.0823  0.0361 -0.0306 -0.0076 -0.0340 -0.0198 -0.1823 -0.0681
200692_s_at  0.0000 -0.0505 -0.0508 -0.0159 -0.3041 -0.0684 -0.0644 -0.0175  0.0503  0.0546 -0.2141 -0.0216
200693_at    0.0608  0.0601  0.0115  0.0744 -0.0232 -0.1095 -0.0416 -0.0499 -0.0515  0.0303 -0.1153  0.0824
200694_s_at  0.0424  0.0957  0.0758 -0.0387 -0.0517 -0.0207  0.0328 -0.1392  0.0140 -0.1476  0.1382  0.0113
                M13     M14     M15     M16     M17     M18
200645_at    0.1095  0.1527  0.0261 -0.2107 -0.0196 -0.2316
200690_at    0.2119  0.0122 -0.5495  0.1518 -0.2409  0.1610
200691_s_at  0.1219 -0.1615 -0.0729 -0.0696  0.0042  0.1239
200692_s_at  0.0440 -0.0811  0.0964  0.0211 -0.0325  0.1810
200693_at   -0.0036  0.0575  0.0427  0.1104 -0.0216  0.0278
200694_s_at  0.2247  0.1489  0.0196  0.0883 -0.1848  0.1989

> range(dat)
[1] -20.091  25.652
ADD COMMENT
2

Thanks, Michael, for such a valuable comment, sharp and to the point!

That is certainly the answer I was looking for. I used something like the code below and it works just fine!

mydata <- read.delim(file="path to the data.txt", header=TRUE, row.names=1)

str(mydata) # to see the structure of my data
head(mydata, n=1) # to check the first line of my data
tail(mydata, n=1) # to check the last line of my data
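
With many columns, the output of str() gets long. A possible shortcut, sketched here on an invented two-column table (the names and values are made up, and `bad_cols` is a hypothetical helper variable), is to list only the columns that were not read as numeric:

```r
# Toy stand-in for the expression table (values and names invented).
mydata <- data.frame(M1 = c(0.04, -0.02),
                     M2 = c("0.07", "n/a"),
                     row.names = c("200645_at", "200690_at"),
                     stringsAsFactors = FALSE)

# TRUE for every column that was read as numeric.
is_num <- vapply(mydata, is.numeric, logical(1))

# Names of the columns that would make range() fail.
bad_cols <- names(mydata)[!is_num]
bad_cols
```

An empty result would suggest that every column came in as numeric.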

 

ADD REPLY
0

OP pasted >1500 lines and called it a sample, and it really is just a sample: the original is 40K lines. I think somewhere down the line the datatype gets messed up.

ADD REPLY
0

Well, then I think he would be wasting our time with incomplete information...

ADD REPLY
1

I am not wasting anybody's time! The data is not publicly available, and the portion I posted represents the entire data structure!

 

ADD REPLY
0

Seemingly, if the code works, your data was representative, and I take it back :D In general it is tricky to debug given an incomplete reference dataset. I understand that some part of your data is private, however this can end up in 'works for me' -> 'doesn't work' ... cycles, where each person is talking about a different data-set. It is very important to help the people trying to help with as much information as possible, or people might get frustrated.

ADD REPLY
2
6.2 years ago

I think you want:

data <- read.delim("path to your file /example.txt", header=TRUE)

Note header=TRUE. With header=FALSE, the header line is read as a data row, so every column contains text and is read as a factor (the default for text columns in R versions before 4.0). You probably want all but the first column to be numeric.

(By the way, in the future, reporting the exact error message and the command that generated it would help.)
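
The effect of the header argument can be checked on a small invented file. This is only a sketch: the file contents are made up, and `stringsAsFactors = FALSE` is set explicitly so the result does not depend on the R version:

```r
# Write a tiny tab-separated file with a header line (contents invented).
tmp <- tempfile(fileext = ".txt")
writeLines(c("ID\tM1\tM2",
             "200645_at\t0.0446\t0.0744",
             "200690_at\t-0.0165\t0.1121"), tmp)

# header=FALSE: the header line becomes a data row, so every column holds text.
no_header <- read.delim(tmp, header = FALSE, stringsAsFactors = FALSE)
sapply(no_header, class)

# header=TRUE (the read.delim default): M1 and M2 are parsed as numbers.
with_header <- read.delim(tmp, header = TRUE, stringsAsFactors = FALSE)
sapply(with_header, class)
```

The ID column stays text in both cases, which is why it still has to be moved out of the data (for example with row.names=1) before calling range().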

Dario

ADD COMMENT
0

Hi Dario, 

Thanks for your comment, but I still get the following error when I set header to TRUE:

> range.raw <- range(example)
Error in FUN(X[[1L]], ...) : 
  only defined on a data frame with all numeric variables

 

 

ADD REPLY
1
6.2 years ago

read.delim is just a wrapper for read.table(); it's often easier to just use read.table() and let it infer separations, column types, etc.  In particular,

df <- read.table('/path/to/example.txt.txt',header=TRUE)

works fine for me on your data.

 

ADD COMMENT
0

Same here, I still get the error:

range.raw <- range(example)
Error in FUN(X[[1L]], ...) : 
  only defined on a data frame with all numeric variables

What I did was to keep the probe IDs in a separate variable, as follows:

rprobes <- example[,1] 

Then I tried to get the data matrix using the following function:
data <- data.matrix(example[,2:ncol(example)])

It seems that data.matrix changes the values of my data.
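
One possible explanation, assuming the columns were read as factors: data.matrix() replaces a factor by its internal integer codes rather than by the numbers its labels spell out. A small sketch with invented values (not the actual data):

```r
# A factor whose labels mostly look like numbers (values invented).
f <- factor(c("0.0446", "-0.0165", "M1"))
df <- data.frame(x = f)

# data.matrix() returns the internal level codes, not the printed values.
codes <- data.matrix(df)[, "x"]
codes

# Converting through character recovers the numbers; the stray label becomes NA.
vals <- suppressWarnings(as.numeric(as.character(f)))
vals
```

If the values only look changed after data.matrix(), checking sapply(example, class) first would show whether factors are involved at all.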

ADD REPLY
2

Well, the first column certainly doesn't hold 'all numeric variables'. Shouldn't you define row.names=1 or something like that for read.table? Those error messages more often than not tell what the problem is..

ADD REPLY
1

But your data file doesn't have all numeric variables, so of course your data frame doesn't have all numeric variables.  What are you planning to do with the probe IDs?

If it's alright to just ignore them,  just strip them off

df2 <- df[,-c(1)]

and proceed using df2; or pass off the data to (say) range with something like

range(df[,-c(1)])

[1] -20.091  25.652

Otherwise, if you want to (say) strip off the _at, _x_at, and _s_at suffixes and treat the rest as a number (I have no idea if that's ok; will the remaining IDs be unique? Should they be?), you can do that easily enough too:

df[,1] <- as.numeric(gsub("_.*","",as.character(df[,1])))

 

 

ADD REPLY
0
6.2 years ago
Ram 32k

Looks like your data might not be delimited properly (space-delimited with random spaces between columns). You might either wanna check that out or explore how to treat consecutive delimiters as one.
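
For reference, a sketch of how read.table handles consecutive delimiters, using an invented space-padded file (not the poster's data): with sep="" any run of whitespace counts as one delimiter, while sep="\t" splits on tabs only.

```r
# One header line and one data row, padded with irregular runs of spaces (invented).
tmp <- tempfile(fileext = ".txt")
writeLines(c("ID   M1  M2",
             "200645_at   0.0446      0.0744"), tmp)

# sep="" (the read.table default) treats any run of whitespace as one delimiter.
ws <- read.table(tmp, header = TRUE, sep = "")
dim(ws)

# sep="\t" finds no tabs here, so each line ends up in a single column.
tab <- read.table(tmp, header = TRUE, sep = "\t")
dim(tab)
```

Comparing the two column counts on the real file would be one way to tell which delimiter it actually uses.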

ADD COMMENT
0

if I `cat -A <file>` it looks well formatted.
 

ADD REPLY
0

So, all tabs?

ADD REPLY
0

It looks like that

ADD REPLY
0

Hi Ram, that might be the problem. I am working on it, but so far I could not find where the problem is.

ADD REPLY
1

You said this gist is just the sample, right? If you could give us the actual file, we can figure out what the problem is.

ADD REPLY
0
6.2 years ago
dago ★ 2.7k

You might want to try to set

read.delim(..., stringsAsFactors = FALSE)

 

ADD COMMENT
0

I think stringsAsFactors = FALSE is for data.table and not read.delim

However, this also does not help

ADD REPLY
0
6.2 years ago
Manvendra Singh ★ 2.1k
read.table("file", header=TRUE, stringsAsFactors=FALSE, sep="\t", dec=".")

should work

ADD COMMENT
0

Thanks for your comment, but unfortunately it does not work.

ADD REPLY
0
Dude, then you should try importing your file into an Excel sheet. If it imports, copy and paste from Excel to Notepad and then read it in R.
ADD REPLY
0

I would not do that. Imagine you have over 40000 probes and 1500 samples: would you personally copy and paste them into Notepad?

 

ADD REPLY
0
Yes, at least the first 10 rows, to see what exactly the problem is.
ADD REPLY
0

Excel is not what I'd recommend, but bulk copy-paste is super easy. Ctrl+Shift+direction will select up to the last row or column of the range in use in the direction you choose, and you can then copy the selection.

ADD REPLY
0

For sure it is easy, if and only if your data is small. It is definitely not a good idea to copy and paste your files over and over, because of the risk of systematic errors!

I personally avoid such things, but for sure, if you are working with 20 samples and 100 variables, more or less, it would be convenient to do such things.

ADD REPLY
0

I'd probably use it to ensure data type consistency. I avoid Excel as much as possible. UNIX is much much better with large files.

ADD REPLY
0
6.2 years ago
TriS ★ 4.3k

this worked for me

df <- read.table("test.txt", header=T, row.names = 1)
df <- apply(df, 2, function(x) sapply(x, as.numeric))
range(df)

> range(df)
-20.091  25.652

 

The key is to apply as.numeric to every value (via sapply within each column) so that the result is a numeric matrix.

ADD COMMENT
1

Thanks for your comment; however, it does not work for me.

When I ran the following command on my data, I got an error, which suggests there are some blank fields:

df <- read.table("test.txt", header=T, row.names = 1)

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
  line 1 did not have 1085 elements

I solved the problem by adding "fill =TRUE"

However, I am sure I have 1500 columns, but the data came out with 1800 columns.

Then I said OK, maybe the problem can be solved by the other command:

df <- apply(df, 2, function(x) sapply(x, as.numeric))

But it was not, and of course the range was NA NA. Any clue where the problem might be?
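
An error such as "line 1 did not have 1085 elements" usually means that the lines do not all split into the same number of fields. One way to look into this, sketched on an invented file (the contents and the variable names `n_fields` and `odd_lines` are made up), is count.fields() from the utils package:

```r
# Tab-separated file in which the last row is missing a field (invented).
tmp <- tempfile(fileext = ".txt")
writeLines(c("ID\tM1\tM2",
             "200645_at\t0.0446\t0.0744",
             "200690_at\t-0.0165"), tmp)

# Number of fields found on each line when splitting on tabs.
n_fields <- count.fields(tmp, sep = "\t")
table(n_fields)

# Line numbers whose field count differs from the header line.
odd_lines <- which(n_fields != n_fields[1])
odd_lines
```

On the real file, running this once with sep="\t" and once with sep="" might show whether the delimiter guess is what makes the column count come out wrong.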

ADD REPLY
0

Add sep="\t" to read.table(), or use read.delim()... but it looks like I am running a little late and you already got an answer :).

ADD REPLY
0

Hello TriS, 

Thanks for your comment. Yes, I have got the answer. My mistake was using header=FALSE and not using row.names=1 when importing my file with the read.delim function.

ADD REPLY
