Please explain a little more. Do you have thousands of files with just one line each that you want to combine? If so, a simple cat command will do the trick.
Thank you so much for your reply, and I am sorry for the confusing question. I already have the data gene by gene, and I need to create a matrix from each row of data. I have thousands of genes, so I need to create a matrix file for each one. I split my file so that I can create a matrix file for each gene.
> library(tidyverse)
> library(splitstackshape) # provides cSplit()
> d %>%
+ rownames_to_column("genes") %>%
+ pivot_longer(-genes,names_to ="k", values_to ="v") %>%
+ mutate(k=str_split_fixed(k,"-",2)[,1],
+ k=as.integer(k)) %>%
+ group_by(genes) %>%
+ group_map(~{
+ tibble(complete(k = 1:max(k), fill = list(vol = 0),data=.x) %>%
+ group_by(k) %>%
+ mutate(v=paste(v, collapse =",")) %>%
+ distinct() %>%
+ ungroup() %>%
+ cSplit (.,"v")
+ )
+ })
[[1]]
# A tibble: 4 x 4
      k    v_1    v_2    v_3
  <int>  <dbl>  <dbl>  <dbl>
1     1     NA     NA     NA
2     2  0.514     NA     NA
3     3  0.535  0.436     NA
4     4  0.53   0.388  0.418

[[2]]
# A tibble: 4 x 4
      k    v_1    v_2    v_3
  <int>  <dbl>  <dbl>  <dbl>
1     1     NA     NA     NA
2     2  0.111     NA     NA
3     3  0.222  0.333     NA
4     4  0.444  0.555  0.666
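None of the grouping operations in dplyr (and probably the rest of the tidyverse) keep the group names on the list they return, which is a pain: the result above is indexed as [[1]] and [[2]] instead of by gene. Here is another solution that retains the names: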
> d %>%
+ rownames_to_column("genes") %>%
+ pivot_longer(-genes, names_to ="k", values_to ="v") %>%
+ mutate(k = str_split_fixed(k, "-", 2)[, 1],
+ k = as.integer(k)) %>%
+ group_by(genes) %>%
+ complete(k = 1:max(k), fill = list(vol = 0)) %>%
+ group_by(genes, k) %>%
+ mutate(v = paste(v, collapse =",")) %>%
+ distinct() %>%
+ ungroup() %>%
+ cSplit(.,"v") %>%
+ split(.$genes) %>%
+ map(., ~ (data=.x %>% select(-genes)))
$GeneA
   k   v_1   v_2   v_3
1: 1    NA    NA    NA
2: 2 0.514    NA    NA
3: 3 0.535 0.436    NA
4: 4 0.530 0.388 0.418

$GeneB
   k   v_1   v_2   v_3
1: 1    NA    NA    NA
2: 2 0.111    NA    NA
3: 3 0.222 0.333    NA
4: 4 0.444 0.555 0.666
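If you would rather stay with group_map(), one possible workaround is to attach the names afterwards from group_keys(), which lists the groups in the same order that group_map() visits them. The following is only a minimal sketch: the small `long` table is made-up illustration data, not the OP's file.

```r
library(dplyr)

# Made-up long-format data, for illustration only
long <- tibble(
  genes = c("GeneA", "GeneA", "GeneB"),
  k     = c(2L, 3L, 2L),
  v     = c(0.514, 0.535, 0.111)
)

grouped <- group_by(long, genes)

# group_map() returns an unnamed list, one element per group
res <- group_map(grouped, ~ .x)

# group_keys() has one row per group, in the same order as the list
res <- setNames(res, group_keys(grouped)$genes)

names(res)
```

After this, res$GeneA and res$GeneB can be used just like the output of split().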
Your expected output has a problem with the number of columns. Since the OP's data has at most 3 values per row, 4 columns in the output happened to be okay. If there are more than 3 values in a row (e.g. 3.7, 3.8, 4.10), the expected number of columns (4 in this case) would be incorrect. The number of columns should be equal to the number of values, IMO.
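One way to read the point above is that the matrix size should come from the data rather than from a fixed count. Here is a minimal base R sketch of that idea, assuming the labels have the form "row-col" as in the OP's header. The helper name `labels_to_matrix` is hypothetical, and sizing the matrix by the largest indices seen is my assumption about what is wanted.

```r
# Hypothetical helper: place values into a matrix addressed by "row-col" labels.
# The matrix is sized from the largest row and column index found in the labels.
labels_to_matrix <- function(labels, values) {
  parts <- strsplit(labels, "-", fixed = TRUE)
  rows <- as.integer(vapply(parts, `[`, character(1), 1))
  cols <- as.integer(vapply(parts, `[`, character(1), 2))
  m <- matrix(NA_real_, nrow = max(rows), ncol = max(cols))
  # index with a two-column matrix of (row, col) positions
  m[cbind(rows, cols)] <- values
  m
}

labels <- c("2-1", "3-1", "3-2", "4-1", "4-2", "4-3")
gene_a <- labels_to_matrix(labels, c(0.514, 0.535, 0.436, 0.530, 0.388, 0.418))
dim(gene_a)
```

With these labels the largest row index is 4 and the largest column index is 3, so the result has 4 rows and 3 columns, with no all-NA fourth column.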
# example input for two genes, assuming every row has the same number of columns
d <- read.table(text ="
2-1 3-1 3-2 4-1 4-2 4-3
GeneA 0.514 0.535 0.436 0.530 0.388 0.418
GeneB 0.111 0.222 0.333 0.444 0.555 0.666",
check.names = FALSE)
library(data.table)

# keep the gene names
d$Gene <- rownames(d)
setDT(d)

# reshape wide-to-long
d <- melt(d, id.vars = "Gene")

# split on "-"; type.convert = TRUE makes c1 and c2 integers, so that max()
# compares them as numbers (as strings, "9" would sort above "10")
d[, c("c1", "c2") := tstrsplit(variable, split = "-", fixed = TRUE, type.convert = TRUE)]

# apply factor levels for the long-to-wide reshape with "fill"
d[, c("c1", "c2") := lapply(.SD, factor, levels = 1:max(c(c1, c2))), .SDcols = c("c1", "c2")]
d <- dcast(d, Gene + c1 ~ c2, value.var = "value", drop = FALSE)

# split on gene names and convert to matrix
lapply(split(d[, -(1:2)], d$Gene), as.matrix)
# $GeneA
#          1     2     3  4
# [1,]    NA    NA    NA NA
# [2,] 0.514    NA    NA NA
# [3,] 0.535 0.436    NA NA
# [4,] 0.530 0.388 0.418 NA
#
# $GeneB
#          1     2     3  4
# [1,]    NA    NA    NA NA
# [2,] 0.111    NA    NA NA
# [3,] 0.222 0.333    NA NA
# [4,] 0.444 0.555 0.666 NA
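Since the OP asked for one matrix file per gene, a named list of matrices like the one above could then be written out in a loop. This is only a sketch under assumptions: the list `mats` is typed in by hand here to keep the example self-contained, and the output directory, the ".txt" extension and the tab separator are arbitrary choices of mine.

```r
# Hand-typed stand-in for the named list of per-gene matrices
mats <- list(
  GeneA = matrix(c(NA, 0.514, 0.535, 0.530,
                   NA, NA,    0.436, 0.388,
                   NA, NA,    NA,    0.418), nrow = 4),
  GeneB = matrix(c(NA, 0.111, 0.222, 0.444,
                   NA, NA,    0.333, 0.555,
                   NA, NA,    NA,    0.666), nrow = 4)
)

# Assumed output location: a folder inside the session's temporary directory
out_dir <- file.path(tempdir(), "gene_matrices")
dir.create(out_dir, showWarnings = FALSE)

# One tab-separated file per gene, named after the list element
out_files <- file.path(out_dir, paste0(names(mats), ".txt"))
for (i in seq_along(mats)) {
  write.table(mats[[i]], file = out_files[i], sep = "\t",
              quote = FALSE, row.names = FALSE, col.names = FALSE)
}
```

Because the file names come from names(mats), this only works with a solution that keeps the gene names on the list.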
Thank you so much!