Question: Read a text file into a list
Assa Yeroslaviz1.2k wrote:

Hi,

I have a text file that I would like to read into a list structure in R.

The file looks something like this (it could be described as a list of data frames):

[[1]]
                   NAME  MEM.SHIP
FBgn0037415 FBgn0037415 0.8035441
FBgn0010812 FBgn0010812 0.6579683
FBgn0265351 FBgn0265351 0.6443309
...
[[3]]
                   NAME  MEM.SHIP
FBgn0037227 FBgn0037227 0.9997242
FBgn0040682 FBgn0040682 0.9997242
...
[[9]]
                   NAME  MEM.SHIP
FBgn0026620 FBgn0026620 0.5241095
FBgn0263619 FBgn0263619 0.5420427
FBgn0263353 FBgn0263353 0.9812295
FBgn0037424 FBgn0037424 0.9793901
FBgn0037428 FBgn0037428 0.9779420
FBgn0037430 FBgn0037430 0.9540148
FBgn0004777 FBgn0004777 0.8962534
FBgn0004778 FBgn0004778 0.9810570
...

In the end, I would like a list structure like this:

> str(INPUT)
List of 3
 $ : Factor w/ 223 levels "FBgn*****",..: 194 129 222 213 42 130 45 131 132 133 ...
 $ : Factor w/ 210 levels "FBgn*****",..: 185 109 110 146 171 175 111 17 112 209 ...
 $ : Factor w/ 343 levels "FBgn*****",..: 27 296 326 228 229 263 19 39 230 26

I am reading the file in with scan, but I just get one character vector with all the elements together. Is there a way to split the text file into a list on the pattern [[.*]] and then extract only the first column from each data frame?

thanks in advance

Assa

list R • 804 views
ADD COMMENTlink modified 2.4 years ago by zx87547.9k • written 2.4 years ago by Assa Yeroslaviz1.2k

Do you have any reason for not using a data.frame? That's a more straightforward data-container imho.

ADD REPLYlink written 2.4 years ago by WouterDeCoster40k

Yes, I know, and I wish I could. I can't change the input files; this is how I got them. I think this is a list structure that was exported to a text file.

ADD REPLYlink written 2.4 years ago by Assa Yeroslaviz1.2k

Hello Assa Yeroslaviz!

We believe that this post does not fit the main topic of this site.

Not a bioinformatics question. Please ask on Stack Overflow.

For this reason we have closed your question. This allows us to keep the site focused on the topics that the community can help with.

If you disagree please tell us why in a reply below, we'll be happy to talk about it.

Cheers!

ADD REPLYlink written 2.4 years ago by RamRS23k
PoGibas4.8k wrote:

This is the ugliest solution ever. I hope the output is what the OP had in mind.
If anyone knows how to improve this code, please help.

    pathInput <- "FILE"
    library(data.table)
    library(tidyverse)

    # Read in
    d <- pathInput %>%
        read_delim(delim = "[[") %>%
        setDT()
    # This has to be done on a separate object;
    # I don't know why
    d2 <- as.data.table(d[, X1]) %>%
        .[!is.na(V1)] %>%
        .[, col1 := sapply(strsplit(V1, " "), "[[", 1)] %>%
        .[, col2 := sapply(strsplit(V1, " "), "[[", 2)] %>%
        .[, col3 := sapply(strsplit(V1, " "), "[[", 3)]
    # Rows where a new table starts (the header lines)
    foo <- grep("NAME", d2$V1)
    # Write each table to a list element
    res <- list()
    for (i in seq_along(foo)) {
        if (i < length(foo)) {
            # rows between this header and the next one
            res[[i]] <- d2[(foo[i] + 1):(foo[i + 1] - 1), .(col1, col2, col3)]
        } else {
            # last table: rows from this header to the end
            res[[i]] <- d2[(foo[i] + 1):nrow(d2), .(col1, col2, col3)]
        }
    }

res returns:

[[1]]
          col1        col2      col3
1: FBgn0037415 FBgn0037415 0.8035441
2: FBgn0010812 FBgn0010812 0.6579683
3: FBgn0265351 FBgn0265351 0.6443309

[[2]]
          col1        col2      col3
1: FBgn0037227 FBgn0037227 0.9997242
2: FBgn0040682 FBgn0040682 0.9997242

[[3]]
          col1        col2      col3
1: FBgn0026620 FBgn0026620 0.5241095
2: FBgn0263619 FBgn0263619 0.5420427
3: FBgn0263353 FBgn0263353 0.9812295
4: FBgn0037424 FBgn0037424 0.9793901
5: FBgn0037428 FBgn0037428 0.9779420
6: FBgn0037430 FBgn0037430 0.9540148
7: FBgn0004777 FBgn0004777 0.8962534
8: FBgn0004778 FBgn0004778 0.9810570

From here you can create your tidy tables.
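
If only the first column of each table is needed, as in the str(INPUT) output in the question, one possible last step is sketched below. It assumes `res` is a list of tables that each have a `col1` column, as above. The small `res` built here is a made-up stand-in using plain data frames so that the snippet runs on its own, and `INPUT` is just the name borrowed from the question.

```r
# Stand-in for `res` (assumption: plain data frames instead of data.tables)
res <- list(
  data.frame(col1 = c("FBgn0037415", "FBgn0010812", "FBgn0265351"),
             col3 = c("0.8035441", "0.6579683", "0.6443309"),
             stringsAsFactors = FALSE),
  data.frame(col1 = c("FBgn0037227", "FBgn0040682"),
             col3 = c("0.9997242", "0.9997242"),
             stringsAsFactors = FALSE)
)

# Keep only the gene IDs of each table, one factor per list element
INPUT <- lapply(res, function(tbl) factor(tbl[["col1"]]))
str(INPUT)
```

`tbl[["col1"]]` is used because it returns a plain vector for both data frames and data.tables.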

ADD COMMENTlink modified 2.4 years ago • written 2.4 years ago by PoGibas4.8k

This looks very nice (and tidy :D). Thanks.

ADD REPLYlink written 2.4 years ago by Assa Yeroslaviz1.2k

I still don't know whether this is what you wanted to do.

ADD REPLYlink written 2.4 years ago by PoGibas4.8k

Me neither, but I will try it.

ADD REPLYlink written 2.4 years ago by Assa Yeroslaviz1.2k
zx87547.9k wrote:

Another, cleaner version:

# read as lines; each line becomes one character string
x <- readLines("myFile.txt")

# split on "[["
x <- split(x, cumsum(grepl("[[", x, fixed = TRUE)))

# tidy up
clean <-
  lapply(x, function(i){
    # get list id
    listID <- as.numeric(gsub("\\D+", "", i[1]))
    # column names
    header <- unlist(strsplit(gsub("\\s+", " ", trimws(i[2])), " "))
    # split on " " and convert to dataframe
    res <- as.data.frame(do.call(rbind, strsplit(tail(i, -2), " ")))[, 2:3]
    # add name and list id
    colnames(res) <- header
    res$ListID <- listID
    res
  })
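
To get from here to the list of factors shown in the question, one option is to keep only the NAME column of each block. The sketch below is self-contained: the `txt` vector is a made-up stand-in for `readLines("myFile.txt")`, and `blocks`, `rows`, `ids` and `INPUT` are names chosen for illustration. It assumes every block starts with a `[[n]]` line followed by a header line, and that blank lines or elided `...` lines may occur and should be dropped.

```r
# Stand-in for readLines("myFile.txt") (assumption: same layout as the question)
txt <- c(
  "[[1]]",
  "                   NAME  MEM.SHIP",
  "FBgn0037415 FBgn0037415 0.8035441",
  "FBgn0010812 FBgn0010812 0.6579683",
  "",
  "[[3]]",
  "                   NAME  MEM.SHIP",
  "FBgn0037227 FBgn0037227 0.9997242",
  "FBgn0040682 FBgn0040682 0.9997242"
)

# Start a new group at every line beginning with "[["
blocks <- split(txt, cumsum(grepl("^\\[\\[", txt)))

INPUT <- lapply(blocks, function(block) {
  rows <- trimws(block[-(1:2)])               # drop the [[n]] line and the header
  rows <- rows[nzchar(rows) & rows != "..."]  # drop blank and elided lines
  ids  <- vapply(strsplit(rows, "\\s+"), `[`, character(1), 1)
  factor(ids)                                 # first column only, as a factor
})

# Name each element after its original list index ("1", "3", ...)
names(INPUT) <- vapply(blocks, function(block) gsub("\\D+", "", block[1]),
                       character(1), USE.NAMES = FALSE)
str(INPUT)
```

Naming the elements after the original indices keeps the link to the `[[1]]`, `[[3]]`, `[[9]]` labels in the file; use `unname(INPUT)` if an unnamed list is preferred.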
ADD COMMENTlink written 2.4 years ago by zx87547.9k
ivivek_ngs4.8k wrote:

There are plenty of ways to do that, I believe, depending on how you want to do it. Take a look at the similar solutions on Stack Overflow.

Link1

Link2

Link3

You can come up with a solution from there. However, this is not a bioinformatics question tbh, so try Stack Overflow for such queries.

ADD COMMENTlink written 2.4 years ago by ivivek_ngs4.8k

Thanks for the links. I had seen them all before I posted this question here. I tried them, but couldn't get what I was trying to do. I know there are supposedly plenty of ways to achieve it; I just can't do it myself.

IMHO this is still bioinformatics-related: it relates to R, it works on biological data processed by bioinformatics tools, and it is related to my work. But I can post this question on Stack Overflow. I'll try it there. Thanks.

ADD REPLYlink written 2.4 years ago by Assa Yeroslaviz1.2k

Hmm, it does fall on the border, but this is more CS than actual bioinformatics. Anyway, I'll reopen the question.

ADD REPLYlink written 2.4 years ago by RamRS23k
VHahaut1.1k wrote:

Can you not solve that using the split command? Something like this:

    # Read the data (the text is whitespace-separated, so the default sep is used):
    a <- list(read.table(text = "rows NAME MEM.SHIP
        FBgn0026620 FBgn0026620 0.5241095
        FBgn0263619 FBgn0263619 0.5420427
        FBgn0263353 FBgn0263353 0.9812295
        FBgn0037424 FBgn0037424 0.9793901
        FBgn0037428 FBgn0037428 0.9779420
        FBgn0037430 FBgn0037430 0.9540148", header = TRUE),
        read.table(text = "rows NAME MEM.SHIP
        FBgn0037415 FBgn0037415 0.8035441
        FBgn0037430 FBgn0037430 0.6579683
        FBgn0265351 FBgn0265351 0.6443309", header = TRUE))

    # Combine the lists:
    b <- do.call(rbind, a)

    # Extract MEM.SHIP info

    sapply(split(b, b$NAME), function(x) x["MEM.SHIP"])

    # Which give you this:

        $FBgn0026620.MEM.SHIP
        [1] 0.5241095

        $FBgn0037424.MEM.SHIP
        [1] 0.9793901

        $FBgn0037428.MEM.SHIP
        [1] 0.977942

        $FBgn0037430.MEM.SHIP
        [1] 0.9540148 0.6579683
ADD COMMENTlink modified 2.4 years ago • written 2.4 years ago by VHahaut1.1k

Thanks for the advice, but as I mentioned above, this is not what I want to achieve. I would like to keep each group as one vector in a list of vectors.

ADD REPLYlink modified 2.4 years ago • written 2.4 years ago by Assa Yeroslaviz1.2k

Maybe I still don't understand, but have you simply tried running str() on my output?

str(sapply(split(do.call(rbind, a), do.call(rbind, a)$NAME), function(x) x["MEM.SHIP"]))

List of 8
 $ FBgn0026620.MEM.SHIP: num 0.524
 $ FBgn0037424.MEM.SHIP: num 0.979
 $ FBgn0037428.MEM.SHIP: num 0.978
 $ FBgn0037430.MEM.SHIP: num [1:2] 0.954 0.658

and if you want some factors instead of numerical values:

str(sapply(sapply(split(do.call(rbind, a), do.call(rbind, a)$NAME), function(x) x["MEM.SHIP"]), function(y) as.factor(y)))
List of 8
 $ FBgn0026620.MEM.SHIP: Factor w/ 1 level "0.5241095": 1
 $ FBgn0037424.MEM.SHIP: Factor w/ 1 level "0.9793901": 1
ADD REPLYlink written 2.4 years ago by VHahaut1.1k