Question: Add filename as column headers in file
smrutimayipanda wrote, 12 days ago:

I have a combined file consisting of a gene column and the respective logFC values from several input files. I used this R code to build the combined file:

library(data.table)
file_list <- list.files(pattern = "*.txt" )
import_files <- lapply(file_list,read.table,stringsAsFactors =FALSE)
rbinded_files <- na.omit(rbindlist(import_files,idcol="file"))
merged_data = dcast(rbinded_files,V2 ~file,fun=max, na.rm=TRUE)

But I didn't get the column headers, which should be the filenames. For example, my filename is output_GSE89333.txt, and I want only GSE89333_logFC to be written as the column header.

ABAT     NA     1.674985738
ABCC5  0.452140288776222   NA

This is my file. I want the first column header to be Genenames and the other column headers to be their respective filenames. How can I add this to my existing code? Please advise. Thanks in advance.

modified 12 days ago by rpolicastro • written 12 days ago by smrutimayipanda

Can you add an example of what the inside of one of the files looks like, and an example of what you want the output to be?

written 12 days ago by rpolicastro

The file looks like:

Gene.symbol           LogFC
ABAT                         NA

And I want:

Gene.symbol           LogFC_filename1    LogFC_filename2
ABAT                         NA                           1.674985738
modified 12 days ago by RamRS • written 12 days ago by smrutimayipanda
rpolicastro wrote, 12 days ago:

Here's a mostly data.table solution with some tidyverse thrown in for convenience.

library("data.table")
library("tidyverse")

file_list <- list.files(pattern="\\.txt$")

imported_files <- lapply(file_list, function(x) {
  DT <- fread(x)
  new_colname <- x %>%
    basename %>%
    str_replace_all(c("^output_"="", "\\.txt$"="")) %>%
    str_c("LogFC_", .)
  setnames(DT, old="LogFC", new=new_colname)
  return(DT)
})

merged_data <- reduce(imported_files, merge, by="Gene.symbol", all=TRUE)
modified 12 days ago • written 12 days ago by rpolicastro
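The answer above produces headers of the form LogFC_GSE89333, whereas the question asked for GSE89333_logFC. A minimal base-R sketch of that naming step alone, assuming the files follow the output_<accession>.txt pattern from the question (the file names and the helper make_colname are hypothetical, for illustration only):

```r
# Hypothetical file names following the pattern described in the question.
file_names <- c("output_GSE89333.txt", "output_GSE107570.txt")

# Hypothetical helper: strip the "output_" prefix and ".txt" suffix,
# then append "_logFC" so the accession comes first.
make_colname <- function(path) {
  stem <- sub("\\.txt$", "", sub("^output_", "", basename(path)))
  paste0(stem, "_logFC")
}

new_names <- vapply(file_names, make_colname, character(1), USE.NAMES = FALSE)
print(new_names)
```

The resulting string could be passed as the new= argument of setnames() in place of new_colname.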

I got this error:

Error in setnames(DT, old = "LogFC", new = new_colname) : 
  Items of 'old' not found in column names: [LogFC]. Consider skip_absent=TRUE.

What should I do?

modified 12 days ago by RamRS • written 12 days ago by smrutimayipanda

It's telling you that there is no column named "LogFC" in the file, which you said there was in your reply. Change that to whatever the column name actually is.
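One way to check this is to print the column names and match the log fold-change column without regard to capitalisation. The sketch below uses a toy table in place of a real file, and the accession in the new column name is only an example:

```r
library(data.table)

# Toy table standing in for one imported file; note the lowercase "l" in "logFC".
DT <- data.table(Gene.symbol = c("ABAT", "ABCC5"), logFC = c(NA, 0.452))

# Inspect the actual column names first.
print(names(DT))

# Find the log fold-change column regardless of capitalisation.
fc_col <- grep("^logfc$", names(DT), ignore.case = TRUE, value = TRUE)
stopifnot(length(fc_col) == 1)

# Rename it by reference; "LogFC_GSE89333" is an example name.
setnames(DT, old = fc_col, new = "LogFC_GSE89333")
print(names(DT))
```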

written 12 days ago by rpolicastro

Thank you, sir, it's working, but it's giving me only the common genes, not all genes. Is there another command I could use instead of merge and reduce?

ADD REPLYlink written 12 days ago by smrutimayipanda10

I edited the answer so it should retain all genes. The trick is adding the argument all=TRUE to the reduce function (which then passes it to the merge function).
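As a small illustration of the difference, here is a sketch on two toy tables (the gene values are made up). It uses base R's Reduce with an explicit wrapper instead of purrr's reduce, but the merge arguments are the same:

```r
library(data.table)

# Two toy tables that share only one gene (ABAT).
a <- data.table(Gene.symbol = c("ABAT", "ABCC5"), LogFC_A = c(1.2, 0.45))
b <- data.table(Gene.symbol = c("ABAT", "A1BG"),  LogFC_B = c(1.67, -0.02))

# Default merge: inner join, keeps only genes present in both tables.
inner <- Reduce(function(x, y) merge(x, y, by = "Gene.symbol"), list(a, b))

# all = TRUE: full outer join, genes missing from one table get NA there.
outer <- Reduce(function(x, y) merge(x, y, by = "Gene.symbol", all = TRUE), list(a, b))

print(inner)
print(outer)
```

The inner result has one row (ABAT), while the outer result has three, with NA wherever a gene is absent from one of the tables.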

modified 12 days ago • written 12 days ago by rpolicastro

Yes, it's working now. Thank you so much for your kindness.

written 12 days ago by smrutimayipanda

Hey rpolicastro, sir, I am limited to 5 posts, so I am writing here. Please reply to this.

I am getting an error when using this code for multiple text files.

library("data.table")
library("tidyverse")

file_list <- list.files(pattern="\\.tsv$")

imported_files <- lapply(file_list, function(x) {
  DT <- fread(x)
  new_colname <- x %>%
    basename %>%
    str_replace_all(c("output_"="", "\\.tsv"="")) %>%
    str_c("LogFC_", .)
  setnames(DT, old="logFC", new=new_colname)
  return(DT)
})

merged_data <- reduce(imported_files, merge, by = "Gene.symbol", all = TRUE)

But I am getting this error:

Error in merge.data.table(out, elt, ...) : 
  x has some duplicated column name(s): ID.x,ID.y. Please remove or rename the duplicate(s) and try again.
In addition: Warning message:
In merge.data.table(out, elt, ...) :
  column names 'ID.x', 'ID.y' are duplicated in the result

I want to add filenames as column header in this file. This is my text file format:

ID        Gene.symbol   logFC
8039748   A1BG          -0.019
7933640   A1CF          0.0685
7960947   A2M           0.223
7953775   A2ML1         0.0767
7914643   A3GALT2       0.098

The filename is GSE107570.txt_filtered. All the files are renamed this way, and the format is the same. How can I add the filenames as column headers? How should I modify this code?

modified 7 days ago by RamRS • written 10 days ago by smrutimayipanda

You should post it as a new question and provide details on what the files look like and what you want at the end.

written 10 days ago by rpolicastro
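For reference, the duplicated-column error above most likely comes from the ID column: it is present in every file but is not part of the merge key, so repeated merges keep adding ID.x and ID.y suffixes until they collide. One possible workaround, sketched here on toy tables rather than the real files, is to keep only Gene.symbol and logFC before merging. The dataset labels and the helper make_toy are hypothetical:

```r
library(data.table)

# Hypothetical helper building a toy stand-in for one imported file,
# with the same three columns as in the example above.
make_toy <- function(shift) {
  data.table(ID = c(8039748L, 7933640L),
             Gene.symbol = c("A1BG", "A1CF"),
             logFC = c(-0.019, 0.0685) + shift)
}

# Four toy tables; the list names are hypothetical dataset labels.
tables <- list(GSE_A = make_toy(0), GSE_B = make_toy(1),
               GSE_C = make_toy(2), GSE_D = make_toy(3))

# Keep only the key and the value column, and name the value after the dataset.
trimmed <- lapply(names(tables), function(label) {
  DT <- tables[[label]][, .(Gene.symbol, logFC)]
  setnames(DT, old = "logFC", new = paste0("LogFC_", label))
  DT
})

# Full outer join across all tables on the gene symbol.
merged <- Reduce(function(x, y) merge(x, y, by = "Gene.symbol", all = TRUE), trimmed)
print(names(merged))
```

If a gene symbol maps to several IDs within one file, the merge will multiply rows for that gene; this sketch does not handle that case.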