Question

Add filename as column headers in file

3.7 years ago

smrutimayipanda ▴ 20

I have a combined file consisting of a gene column and the respective logFC values from each input file. I used this R code to create the combined file:

library(data.table)
file_list <- list.files(pattern = "*.txt" )
import_files <- lapply(file_list,read.table,stringsAsFactors =FALSE)
rbinded_files <- na.omit(rbindlist(import_files,idcol="file"))
merged_data = dcast(rbinded_files,V2 ~file,fun=max, na.rm=TRUE)

But I didn't get the column headers, which should be the filenames. For example, my filename is output_GSE89333.txt, and I want only GSE89333_logFC to be written in the column header.

ABAT     NA     1.674985738
ABCC5  0.452140288776222   NA

This is my file. I want the first column header to be Genenames and the other column headers to be their respective filenames. How can I add this to my existing code? Please advise. Thanks in advance.

R • 2.8k views

ADD COMMENT • link updated 3.7 years ago by rpolicastro 13k • written 3.7 years ago by smrutimayipanda ▴ 20

Can you add an example of what the inside of one of the files looks like, and an example of what you want the output to be?

ADD REPLY • link 3.7 years ago by rpolicastro 13k

The file looks like:

Gene.symbol           LogFC
ABAT                         NA

And I want:

Gene.symbol           LogFC_filename1    LogFC_filename2
ABAT                         NA                           1.674985738

ADD REPLY • link updated 3.7 years ago by Ram 43k • written 3.7 years ago by smrutimayipanda ▴ 20

Ram · Answer 1 · 2020-08-02

3

3.7 years ago

rpolicastro 13k

Here's a mostly data.table solution with some tidyverse thrown in for convenience.

library("data.table")
library("tidyverse")

file_list <- list.files(pattern="\\.txt$")

imported_files <- lapply(file_list, function(x) {
  DT <- fread(x)
  new_colname <- x %>%
    basename %>%
    str_replace_all(c("^output_"="", "\\.txt$"="")) %>%
    str_c("LogFC_", .)
  setnames(DT, old="LogFC", new=new_colname)
  return(DT)
})

merged_data <- reduce(imported_files, merge, by="Gene.symbol", all=TRUE)

ADD COMMENT • link 3.7 years ago by rpolicastro 13k

I got this error:

Error in setnames(DT, old = "LogFC", new = new_colname) : 
  Items of 'old' not found in column names: [LogFC]. Consider skip_absent=TRUE.

What should I do?

ADD REPLY • link updated 3.7 years ago by Ram 43k • written 3.7 years ago by smrutimayipanda ▴ 20

It's telling you that there is no column named "LogFC" in the file, which you said there was in your reply. Change that to whatever the column name actually is.
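One way to see what the column is actually called is to read a single file and print its header. The following is a minimal sketch, assuming the files are tab-delimited and that the column differs from "LogFC" only in letter case; the temporary file and the name `LogFC_example` are made up for illustration.

```r
library(data.table)

# Hypothetical example file; the real files may look different.
tmp <- tempfile(fileext = ".txt")
writeLines(c("Gene.symbol\tlogFC", "ABAT\t1.674985738"), tmp)

DT <- fread(tmp)
names(DT)  # inspect the actual column names

# Find the fold-change column regardless of letter case, then rename it.
fc_col <- grep("^logfc$", names(DT), ignore.case = TRUE, value = TRUE)
setnames(DT, old = fc_col, new = "LogFC_example")
```

If `grep()` returns nothing, the column has a different name altogether and `names(DT)` shows what to use instead.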

ADD REPLY • link 3.7 years ago by rpolicastro 13k

Thank you, sir, it's working, but it's giving me only the common genes, not all genes. Is there another command I can use instead of merge and reduce?

ADD REPLY • link 3.7 years ago by smrutimayipanda ▴ 20

I edited the answer so it should retain all genes. The trick is adding the argument all=TRUE to the reduce function (which then passes it to the merge function).
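To illustrate the difference, here is a small sketch with two made-up tables standing in for imported files (the gene values and column names `LogFC_A` and `LogFC_B` are invented for the example). By default `merge` keeps only the genes present in both tables, while `all=TRUE` keeps every gene and fills the gaps with `NA`.

```r
library(data.table)

# Made-up tables standing in for two imported files.
a <- data.table(Gene.symbol = c("ABAT", "ABCC5"), LogFC_A = c(1.67, 0.45))
b <- data.table(Gene.symbol = c("ABAT", "ACTB"),  LogFC_B = c(0.90, -0.20))

inner <- merge(a, b, by = "Gene.symbol")              # default: shared genes only
outer <- merge(a, b, by = "Gene.symbol", all = TRUE)  # all genes, NA where missing

inner$Gene.symbol
outer$Gene.symbol
```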

ADD REPLY • link 3.7 years ago by rpolicastro 13k

Yeah, it's working now. Thank you so much for your kindness.

ADD REPLY • link 3.7 years ago by smrutimayipanda ▴ 20

Hey rpolicastro sir, I have a limit of 5 posts, so I am writing here. Please reply to this.

I am getting an error when using this code for multiple text files.

library("data.table")
library("tidyverse")

file_list <- list.files(pattern="\\.tsv$")

imported_files <- lapply(file_list, function(x) {
  DT <- fread(x)
  new_colname <- x %>%
    basename %>%
    str_replace_all(c("output_"="", "\\.tsv"="")) %>%
    str_c("LogFC_", .)
  setnames(DT, old="logFC", new=new_colname)
  return(DT)
})

merged_data <- reduce(imported_files, merge, by = "Gene.symbol", all = TRUE)

But I am getting this error :

Error in merge.data.table(out, elt, ...) : 
  x has some duplicated column name(s): ID.x,ID.y. Please remove or rename the duplicate(s) and try again.
In addition: Warning message:
In merge.data.table(out, elt, ...) :
  column names 'ID.x', 'ID.y' are duplicated in the result

I want to add the filenames as column headers in this file. This is my text file format:

ID       Gene.symbol  logFC
8039748  A1BG         -0.019
7933640  A1CF         0.0685
7960947  A2M          0.223
7953775  A2ML1        0.0767
7914643  A3GALT2      0.098

The filename is GSE107570.txt_filtered. All the files are renamed this way and have the same format. How can I add the filenames as column headers? How should I modify this code?

ADD REPLY • link updated 3.7 years ago by Ram 43k • written 3.7 years ago by smrutimayipanda ▴ 20

You should post it as a new question and provide details on what the files look like and what you want at the end.

ADD REPLY • link 3.7 years ago by rpolicastro 13k
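For readers who hit the same duplicated-column error: it is probably caused by the ID column being present in every file, so each merge on Gene.symbol alone carries along another copy of it. A possible workaround, sketched below under assumptions, is to read only the Gene.symbol and logFC columns before merging. The two example files are made up in the posted format, the `.txt_filtered` suffix is taken from the filename mentioned in the post, and the sketch assumes each gene symbol appears at most once per file. Base R `Reduce` is used here in place of `reduce` only to keep the example free of extra packages.

```r
library(data.table)

# Made-up files in the posted format (ID, Gene.symbol, logFC).
dir <- tempfile()
dir.create(dir)
fwrite(data.table(ID = c(8039748, 7933640), Gene.symbol = c("A1BG", "A1CF"),
                  logFC = c(-0.019, 0.0685)),
       file.path(dir, "GSE107570.txt_filtered"), sep = "\t")
fwrite(data.table(ID = c(8039748, 7960947), Gene.symbol = c("A1BG", "A2M"),
                  logFC = c(0.5, 0.223)),
       file.path(dir, "GSE89333.txt_filtered"), sep = "\t")

file_list <- list.files(dir, pattern = "\\.txt_filtered$", full.names = TRUE)

imported_files <- lapply(file_list, function(x) {
  # Read only the two columns needed, so ID never reaches merge().
  DT <- fread(x, select = c("Gene.symbol", "logFC"))
  new_colname <- paste0("LogFC_", sub("\\.txt_filtered$", "", basename(x)))
  setnames(DT, old = "logFC", new = new_colname)
  DT
})

# Full outer join across all files, keyed on the gene symbol.
merged_data <- Reduce(
  function(x, y) merge(x, y, by = "Gene.symbol", all = TRUE),
  imported_files
)
```

If the ID column should be kept, merging on both columns with `by = c("ID", "Gene.symbol")` is another option, provided the IDs are consistent across files.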