Question

Renaming one column in multiple CSV files at once, summing rows that share a name within each file, and merging the files into a single data frame

3.4 years ago

pramach1 ▴ 40

This is a follow up post from my previous post.

# After setting the working directory to the folder that holds the 92 CSV files, list the file names.
fnames <- list.files()

# Read each file into its own data frame; the list now holds 92 data frames.
myfiles <- lapply(fnames, read.delim)

Now that the list holds all 92 CSV files as separate data frames, I extract and keep only the columns I need. I need 2 columns, named "name" and "fraction_total_reads".

lst1 <- lapply(myfiles, "[", c("name", "fraction_total_reads"))

Now each data frame in the list has only 2 columns, "name" and "fraction_total_reads", but they all have different numbers of rows. Before I save these files, I want to rename the column fraction_total_reads in each data frame to its corresponding sample name.

For example, in lst1[[1]] the column fraction_total_reads should be renamed to P_A_1, in lst1[[2]] to P_A_2, and so on.

lst2 <- rename(lst1[[x]], c("fraction_total_reads" = "P_A_1", "fraction_total_reads" = "P_A_2"...so on till all 92 samples are named)

Would a correct version of the above code work? The above didn't work.

Once I have renamed the columns in each data frame within the list, I want to merge them all by the "name" column. Remember that they have the same number of columns but different numbers of rows. Thank you for any help with this.

R • 3.0k views

ADD COMMENT • link updated 3.4 years ago by gabrielafg ▴ 60 • written 3.4 years ago by pramach1 ▴ 40

Can you provide an example of the first few rows of two data.frames in your list, and an example of what the output you want should look like?

ADD REPLY • link 3.4 years ago by rpolicastro 13k

Sure. Here are the first 2 rows. This is what I have:

Data frame 1

name P_A_1
Diplosphaera 1.00E-05
Opitutus 0.00725

Data frame 2

name P_A_2
Anaerosporobacter 6.00E-05
Blautia 0.00016

This is what I want

name P_A_1 P_A_2
Diplosphaera 1.00E-05 0
Opitutus 0.00725 0
Anaerosporobacter 0 6.00E-05
Blautia 0 0.00016

Usually I do this with the following code:

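library(plyr)  # provides ddply() and numcolwise()

df91 <- read.csv("df91.csv")
df92 <- read.csv("df92.csv")

# Sum the numeric columns across rows that share the same name
df91 <- ddply(df91, "name", numcolwise(sum))
df92 <- ddply(df92, "name", numcolwise(sum))

# Full outer join on "name", keeping the rows of both data frames
merged1 <- Reduce(function(x, y) merge(x, y, by = "name", all.x = TRUE, all.y = TRUE),
                  list(df91, df92))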

This gives me the desired output for two files. For the list that holds all 92 CSV files, I don't know how to do it.

ADD REPLY • link 3.4 years ago by pramach1 ▴ 40

Sorry, but I found the question a bit confusing. I don't understand what you mean by "adding same rows within each columns in multiple csv files".

As for the merging, do you want to create a single data frame with all the P_A_X columns? What about the name columns?

ADD REPLY • link 3.4 years ago by gabrielafg ▴ 60

Yes. Sorry about the confusion. Here is what I have:

name P_A_1
Cephaloticoccus 0.0098
Alterococcus 0.00018
**uncultured 0.00016
uncultured 1.00E-05**
Diplosphaera 1.00E-05
Opitutus 0.00725

Two of the rows here are named uncultured. I want to sum the values in column 2 across the uncultured rows so that they collapse into a single row.

If it's just a single data frame (one CSV file) that I am working on, I always use

df92<- read.csv("df92.csv")

df92 <- ddply(df92, "name", numcolwise(sum))

The output for this code will be

name P_A_1
Cephaloticoccus 0.0098
Alterococcus 0.00018
**uncultured 1.70E-04**
Diplosphaera 1.00E-05
Opitutus 0.00725

This works for a single data frame (one CSV file) but not on the list. How can I do this for every CSV file in the list? Within each file, if several rows have the same name, I want to add up their values in the second column and show them as one row with that name.
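The per-file summing described here could be applied to every element of a list with lapply(). As a minimal sketch, the block below swaps in base R's aggregate() for ddply() (so it is not the code used in this thread), and lst_demo is a hypothetical stand-in for lst1; its first data frame reuses the values shown above and its second is made up for illustration.

```r
# Hypothetical stand-in for lst1: one data frame per sample
lst_demo <- list(
  data.frame(name  = c("Cephaloticoccus", "uncultured", "uncultured"),
             P_A_1 = c(0.0098, 0.00016, 0.00001)),
  # Made-up second sample, only to show that each element is handled separately
  data.frame(name  = c("Blautia", "Blautia"),
             P_A_2 = c(0.00016, 0.00006))
)

# Within each data frame, sum the value column over rows that share a name
collapsed <- lapply(lst_demo, function(d) {
  aggregate(. ~ name, data = d, FUN = sum)
})

print(collapsed[[1]])
```

Each element of collapsed then has one row per distinct name, which is the shape the later merge step expects.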

ADD REPLY • link 3.4 years ago by pramach1 ▴ 40

Thanks for the clarification! I edited my previous answer with an additional suggestion. Let me know how it goes.

ADD REPLY • link 3.4 years ago by gabrielafg ▴ 60

score 1 · Accepted Answer · 2020-11-17

3.4 years ago

gabrielafg ▴ 60

For the rename question, you could try the code below. I haven't tested it so it's just a suggestion.

for (x in 1:92){
    names(lst1[[x]])[names(lst1[[x]]) == "fraction_total_reads"] = paste0("P_A_", x)
}
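The loop above numbers the columns by list position. If the sample names are instead encoded in the file names, they could be taken from there. The sketch below assumes exactly that (sample name = file name without its extension); fnames_demo and lst_demo are hypothetical stand-ins for fnames and lst1.

```r
# Hypothetical file names; in the thread these would come from list.files()
fnames_demo <- c("P_A_1.csv", "P_A_2.csv")

# Hypothetical stand-ins for the two-column data frames in lst1
lst_demo <- list(
  data.frame(name = "Opitutus", fraction_total_reads = 0.00725),
  data.frame(name = "Blautia",  fraction_total_reads = 0.00016)
)

# Assumption: the sample name is the file name with the extension stripped
sample_names <- tools::file_path_sans_ext(basename(fnames_demo))

# Pair each data frame with its sample name and rename the value column
renamed <- Map(function(d, s) {
  names(d)[names(d) == "fraction_total_reads"] <- s
  d
}, lst_demo, sample_names)

print(names(renamed[[2]]))
```

This relies on the list being in the same order as the file names, which holds when the list was built with lapply() over those same file names.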

[edit] Based on your additional explanation, for the merge you could try:

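library(plyr)  # provides ddply() and numcolwise()

# Start from an empty data frame that has only the key column
final_df = data.frame(name = character())

for (df in 1:92){
    # Sum rows that share a name, then outer-join onto the running result
    lst1[[df]] = ddply(lst1[[df]], "name", numcolwise(sum))
    final_df = merge(x = final_df, y = lst1[[df]], by = "name", all = TRUE)
}

# Names missing from a sample come back as NA after the merge; set them to 0
final_df[is.na(final_df)] = 0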

There may be some smarter solution but I hope this works!
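One possible alternative, offered only as a sketch, folds the merge over the list with Reduce() instead of growing final_df in a loop. The toy list dfs is a hypothetical stand-in for the already renamed and summed lst1, built from the two example data frames posted earlier in the thread.

```r
# Hypothetical stand-in for the renamed, per-name-summed list
dfs <- list(
  data.frame(name  = c("Diplosphaera", "Opitutus"),
             P_A_1 = c(1.00e-05, 0.00725)),
  data.frame(name  = c("Anaerosporobacter", "Blautia"),
             P_A_2 = c(6.00e-05, 0.00016))
)

# Full outer join of all data frames on "name", two at a time
merged <- Reduce(function(x, y) merge(x, y, by = "name", all = TRUE), dfs)

# Names absent from a sample are NA after the join; treat them as 0
merged[is.na(merged)] <- 0

print(merged)
```

The result has one row per name seen in any sample and one column per sample, with rows sorted by name because merge() sorts on the key by default.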

ADD COMMENT • link 3.4 years ago by gabrielafg ▴ 60

This works. Thank you.

ADD REPLY • link 3.4 years ago by pramach1 ▴ 40

Works perfectly! Thank you.

ADD REPLY • link 3.4 years ago by pramach1 ▴ 40