Error in reading tab delimited file
2.5 years ago
arshad1292 ▴ 100

I have two tab-delimited files: (1) a leafdata file with read count values and (2) a metadata file with sample information. They look like this:

Data file:

genus              sample1  sample2  sample3  sample4  ...  sample206
Massilistercora    26       419      16       2974     ...  159
Aminipila          104      59       183      2594     ...  209
Mogibacterium      502      971      591      218      ...  2974
Flintibacter       418      0        981      2397     ...  264
.
.

Metadata file:

samplename    group      timepoint   gender
sample1       case       A           M
sample2       control    B           F
sample3       control    A           F
.
.
.
sample206     case       E           M

I loaded the data into R as below:

testdata <- read.table("leafdata.txt", sep = "\t", header = TRUE, check.names = FALSE)

Then I checked the dimensions:

dim(testdata)
2874 207

However, when I loaded the metadata in the same way:

leafmetadata <- read.table("metadata.txt", sep = "\t", header = TRUE, check.names = FALSE)

the dimensions were different:

dim(leafmetadata)
206 4 

My question is: why do I get 206 for the metadata but 207 for the leafdata, even though the number of samples is the same in both files? This mismatch is causing errors in further analysis. Am I reading the files incorrectly in R?

I would really appreciate it if someone could help me solve this issue. Many thanks in advance!

R tab-delimited • 943 views

There is no error. Please go through your data before posting this kind of query. Hint: "genus"
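
To spell out the hint: dim() reports rows first and columns second, so 207 is the number of columns in the data file (the "genus" label column plus 206 samples), while 206 is the number of rows in the metadata (one per sample). The sketch below illustrates this on a three-sample toy table; the `toy` string and the `counts` name are made up for illustration, and using `row.names = 1` is just one possible way to keep the label column out of the column count.

```r
# Toy stand-in for leafdata.txt: one label column plus three sample columns
toy <- "genus\tsample1\tsample2\tsample3\nAminipila\t104\t59\t183\nFlintibacter\t418\t0\t981"

testdata <- read.table(text = toy, sep = "\t", header = TRUE, check.names = FALSE)
dim(testdata)    # 2 rows, 4 columns: "genus" + 3 samples

# Reading the first column as row names leaves only the sample columns
counts <- read.table(text = toy, sep = "\t", header = TRUE,
                     check.names = FALSE, row.names = 1)
ncol(counts)     # 3, one column per sample
colnames(counts)
```

With the real files, the same reasoning gives 207 - 1 = 206 sample columns, which matches the 206 metadata rows.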


Thanks for pointing that out. I also tried removing "genus", but I still get the same error.


You need dos2unix or unix2dos, depending on which system you are using, and the system on which the file was created.
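
Before converting anything, it may help to check whether the file contains Windows-style line endings at all. A minimal sketch, assuming a hypothetical helper `has_cr()` (not part of any package) that looks for carriage-return bytes in the raw file contents; the two temporary files exist only for the demonstration:

```r
# Hypothetical helper: TRUE if the file contains carriage-return bytes (0x0D)
has_cr <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  any(bytes == as.raw(0x0d))
}

# Demonstration on two temporary files, one with LF and one with CRLF endings
unix_file <- tempfile(fileext = ".txt")
dos_file  <- tempfile(fileext = ".txt")
writeBin(charToRaw("samplename\tgroup\nsample1\tcase\n"), unix_file)
writeBin(charToRaw("samplename\tgroup\r\nsample1\tcase\r\n"), dos_file)

has_cr(unix_file)
has_cr(dos_file)
```

Running `has_cr("metadata.txt")` and `has_cr("leafdata.txt")` would then show whether line endings differ between the two files.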


Can you run the following code in R (any or all of the lines) and post the output here?

> length(names(testdata)[!names(testdata) %in% "genus"])
> isTRUE(all.equal(leafmetadata$samplename, names(testdata)[!names(testdata) %in% "genus"]))
> identical(leafmetadata$samplename, names(testdata)[!names(testdata) %in% "genus"])

This assumes the data was loaded as follows:

testdata <- read.table("leafdata.txt", sep = "\t", header = TRUE, check.names = FALSE)
leafmetadata <- read.table("metadata.txt", sep = "\t", header = TRUE, check.names = FALSE)
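
If those checks return FALSE, a sketch like the one below may help locate the mismatch. It uses small made-up tables in place of the real files, and `sample_cols`, `meta_names` and `aligned` are illustrative names. Note that in R versions before 4.0, read.table() turns strings into factors by default, so the sample names are converted with as.character() before comparing.

```r
# Toy stand-ins for leafdata.txt and metadata.txt (metadata deliberately out of order)
testdata <- read.table(
  text = "genus\tsample1\tsample2\tsample3\nAminipila\t104\t59\t183",
  sep = "\t", header = TRUE, check.names = FALSE
)
leafmetadata <- read.table(
  text = "samplename\tgroup\nsample2\tcontrol\nsample1\tcase\nsample3\tcontrol",
  sep = "\t", header = TRUE, check.names = FALSE
)

sample_cols <- setdiff(names(testdata), "genus")        # sample columns only
meta_names  <- as.character(leafmetadata$samplename)    # plain character vector

setdiff(sample_cols, meta_names)    # samples in the data but not in the metadata
setdiff(meta_names, sample_cols)    # samples in the metadata but not in the data
identical(sample_cols, meta_names)  # same names in the same order?

# Reorder the metadata rows to follow the column order of the data
aligned <- leafmetadata[match(sample_cols, meta_names), ]
identical(as.character(aligned$samplename), sample_cols)
```

Both setdiff() calls returning empty vectors means the two files describe the same samples; the final reordering only matters when the order differs.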