Reading a tab separated file
18 months ago
nazmuss221 • 0

I have a file like the one shown below. When I try to read it with read.delim:

dataset4 <- read.delim("./Documents/time_series/cbe.dat",sep = "/t", stringsAsFactors = FALSE)


it does not work and gives me the following error:

> invalid value for 'sep': must be a byte


But the dataset is clearly separated with /t. What am I doing wrong here?

I have also tried read.table. It reads the dataset, but the values are read as character rather than numeric.

The dataset looks like this:

"choc"/t"beer"/t"elec"
"1451"/t"96.3"/t"1497"
"2037"/t"84.4"/t"1463"
"2477"/t"91.2"/t"1648"
"2785"/t"81.9"/t"1595"
"2994"/t"80.5"/t"1777"
"2681"/t"70.4"/t"1824"
"3098"/t"74.8"/t"1994"
"2708"/t"75.9"/t"1835"
"2517"/t"86.3"/t"1787"
"2445"/t"98.7"/t"1699"
"2087"/t"100.9"/t"1633"
"1801"/t"113.8"/t"1645"
"1216"/t"89.8"/t"1597"
"2173"/t"84.4"/t"1577"
"2286"/t"87.2"/t"1709"
"3121"/t"85.6"/t"1756"
"3458"/t"72"/t"1936"
"3511"/t"69.2"/t"2052"
"3524"/t"77.5"/t"2105"
"2767"/t"78.1"/t"2016"

r • 546 views
"\t"


Maybe you should try \/t


No, that does not work either. With "\t" the dataset now looks like this:

choc.tbeer.telec
1 1451/t96.3/t1497
2 2037/t84.4/t1463
3 2477/t91.2/t1648
4 2785/t81.9/t1595
5 2994/t80.5/t1777
6 2681/t70.4/t1824


/t is not a valid character. How did this even happen? Did someone hand-craft this mangled dataset?


My guess is that someone mis-typed /t instead of \t. Meant to use a tab, ended up using "/t"


Looks like someone intended to write a tab-separated file with \t and actually wrote a "custom" file format with /t. I would correct the file rather than trying to read it in as is.
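
If editing the file on disk is not convenient, the same correction can be made in memory before parsing. The following is only a minimal sketch: the `raw_lines` vector is a hypothetical stand-in for the first lines of cbe.dat, and it assumes that "/t" never occurs inside a real data value.

```r
# Hypothetical sample lines standing in for the contents of cbe.dat.
# For the real file, something like
#   raw_lines <- readLines("./Documents/time_series/cbe.dat")
# would be used instead.
raw_lines <- c(
  "\"choc\"/t\"beer\"/t\"elec\"",
  "\"1451\"/t\"96.3\"/t\"1497\"",
  "\"2037\"/t\"84.4\"/t\"1463\""
)

# Replace the literal two-character sequence "/t" with a real tab.
# fixed = TRUE treats the pattern as plain text, not as a regular expression.
fixed_lines <- gsub("/t", "\t", raw_lines, fixed = TRUE)

# Parse the corrected text. The surrounding quotes are stripped during
# reading, and the columns are then converted to numeric types.
dataset4 <- read.delim(text = fixed_lines, sep = "\t", stringsAsFactors = FALSE)
str(dataset4)
```

Because the quotes are removed before type conversion, the quoted numbers should come out as numeric columns rather than character.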


I replaced the forward slashes with backslashes. Now, after running the read.delim command from the question, the head of the data frame looks like this:

head(dataset4)
choc.tbeer.telec
1 1451\\t96.3\\t1497
2 2037\\t84.4\\t1463
3 2477\\t91.2\\t1648
4 2785\\t81.9\\t1595
5 2994\\t80.5\\t1777
6 2681\\t70.4\\t1824


With read.delim you do not need to specify the separator, because a tab is the default. Try this:

dataset4 <- read.delim("./Documents/time_series/cbe.dat", colClasses = rep("numeric",3))


OP's delimiter is not a tab. It's two characters - forward-slash followed by t. Across all platforms, that is two distinct characters, not an escape sequence.
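
A quick way to see the difference in R is to count characters. This is just an illustrative sketch; the variable names are made up for the example. It also suggests why read.delim rejects sep = "/t": the error message asks for a single byte, and "/t" is two.

```r
# "/t" is a forward slash followed by the letter t: two characters.
n_slash_t <- nchar("/t")

# "\t" is an escape sequence that R turns into one tab character.
n_tab <- nchar("\t")

# The tab is code point 9, so these two strings are the same.
same_as_tab <- identical("\t", intToUtf8(9L))

print(c(slash_t = n_slash_t, tab = n_tab))
print(same_as_tab)
```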


Should your file be separated by \\t?

You could try changing the separator to "\t". Try sed -i 's/\\t/\t/g' your_file. Run this on a copy of your file first to test it!


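Can you try this hack with the OP's text?

# Split on the letter "t", then strip every character that cannot be part of a number
df <- read.csv("~/Desktop/test.txt", sep = "t", strip.white = TRUE)
data.frame(apply(df, 2, function(x) gsub("[^0-9.-]", "", x)))


The output would be something like this: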

 > data.frame(apply(df,2, function (x) gsub("[^0-9.-]","", x)))
choc. beer. elec
1   1451  96.3 1497
2   2037  84.4 1463


Or split on the slash :(

df = read.delim("~/test.tsv", sep ="/")
df = apply(gsub("^t", "", as.matrix(df)), 2, as.numeric)
colnames(df) = gsub("^t","",colnames(df))


Try the following. It worked.

library(magrittr)
readr::read_delim("inputfile.txt" ,delim = "\"/t\"")  %>%
dplyr::select("choc","beer" ,"elec")

# A tibble: 20 x 3
choc  beer  elec
<dbl> <dbl> <dbl>
1  1451  96.3  1497
2  2037  84.4  1463
3  2477  91.2  1648
4  2785  81.9  1595
5  2994  80.5  1777
6  2681  70.4  1824
7  3098  74.8  1994
8  2708  75.9  1835
9  2517  86.3  1787
10  2445  98.7  1699
11  2087 101.   1633
12  1801 114.   1645
13  1216  89.8  1597
14  2173  84.4  1577
15  2286  87.2  1709
16  3121  85.6  1756
17  3458  72    1936
18  3511  69.2  2052
19  3524  77.5  2105
20  2767  78.1  2016


Created on 2020-04-29 by the reprex package (v0.3.0)

18 months ago
malteherold ▴ 60

As many answers suggested, your file is not actually tab-separated: its fields are separated by the literal characters "/t". You can either read the file as is with some trickery (see above) or with your own function, but the best option is to replace the "/t" sequences with real tabs. How to do that depends on your platform. When editing the file, replace "/t" with actual tab characters, not with the literal text "\t"; use a tool that interprets "\t" as a tab (for example, the find-and-replace function in gedit).

With sed you have to be a bit careful:

https://unix.stackexchange.com/questions/145299/simple-sed-replacement-of-tabs-mysteriously-failing

This would work:

sed -i "s/\/t/$(printf '\t')/g" ~/test.txt


Initially I tried to run this as a system call from R, but I gave up pretty quickly...
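
The same replacement can also be done in plain R without calling sed at all. This is only a sketch under some assumptions: `fix_slash_t` is a hypothetical helper name, the temporary files stand in for ~/test.txt, and "/t" is assumed never to appear inside a real value.

```r
# Hypothetical helper: copy `infile` to `outfile`, turning every literal
# "/t" into a real tab. The original file is left untouched.
fix_slash_t <- function(infile, outfile) {
  lines <- readLines(infile)
  writeLines(gsub("/t", "\t", lines, fixed = TRUE), outfile)
  invisible(outfile)
}

# Demonstration on a temporary file that mimics the broken format.
broken <- tempfile(fileext = ".txt")
writeLines(
  c("\"choc\"/t\"beer\"/t\"elec\"", "\"1451\"/t\"96.3\"/t\"1497\""),
  broken
)

fixed <- fix_slash_t(broken, tempfile(fileext = ".tsv"))

# The corrected copy is now an ordinary tab-separated file.
df <- read.delim(fixed)
str(df)
```

Writing to a new file rather than overwriting the input keeps the original available in case the replacement turns out to be wrong.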
