renaming one column in multiple csv files at once and adding same rows within each columns in multiple csv files and merging them into single frame
1
0
Entering edit mode
3.4 years ago
pramach1 ▴ 40

This is a follow up post from my previous post.

#after setting up the working directory that has multiple csv files (92 files), is all read into a single list but as different data frames.
fnames <- list.files() 

#reading it into separate data frames within the list. Now the list will have 92 csv file as separate df. 
myfiles = lapply(fnames, read.delim)

Now that the list has all 92 csv files as separate data frame, we are extracting and keeping the columns we need. We need 2 columns named "name", and "fraction_total_reads".

lst1 <- lapply(myfiles, "[", c("name", "fraction_total_reads"))

Now each csv within the list has only 2 columns by "name" and "fraction_total_reads". But they all have different number of rows. Before I save these files...I want to rename the column fraction_total_reads, in each csv file to their corresponding sample name.

For example: within lst1... df[[1]] fraction_total_reads had to be renamed as P_A_1. so on and so forth.

lst2 <- rename(lst1[[x]], c("fraction_total_reads" = "P_A_1", "fraction_total_reads" = "P_A_2"...so on till all 92 samples ar named)

Would a correct version of the above code work? The above didn't work.

Once I rename then columns in each data frame within the list, I want to merge them by the column name. Remember they have same number of columns, but different number of rows. Thank you for any help regarding this.

R • 3.0k views
ADD COMMENT
0
Entering edit mode

Can you provide an example of the first few rows of two data.frames in your list, and an example of what the output you want should look like?

ADD REPLY
0
Entering edit mode

Sure. Here is the first 2 rows. This is what I have...

Data frame 1

  1. name P_A_1
  2. Diplosphaera 1.00E-05
  3. Opitutus 0.00725

Dataframe 2

  • name P_A_2
  • Anaerosporobacter 6.00E-05
  • Blautia 0.00016

This is what I want

  • name P_A_1 P_A_2
  • 2.Diplosphaera 1.00E-05 0
  • 3.Opitutus 0.00725 0
  • 4.Anaerosporobacter 0 6.00E-05
  • 5.Blautia 0 0.00016

usually I do this by this code

df91<- read.csv("df91.csv")
df92<- read.csv("df92.csv")

df91 <- ddply(df91, "name", numcolwise(sum))
df92 <- ddply(df92, "name", numcolwise(sum))

merged1 <- Reduce(function(x,y) merge(x, y, by = "name", all.x = TRUE, all.y = TRUE),
             list(df91, df92)

I will get the desired output. In the list of files that has 92 csv files...I don't know how to do it.

ADD REPLY
0
Entering edit mode

Sorry, but I found the question a bit confusing. I don't understand what you mean with "adding same rows within each columns in multiple csv files".

About the merging, you want to create a single dataframe with all the P_A_X columns? What about the name columns?

ADD REPLY
0
Entering edit mode

Yes. Sorry about the confusion. Here is what I have

  1. name P_A_1
  2. Cephaloticoccus 0.0098
  3. Alterococcus 0.00018
  4. **uncultured 0.00016
  5. uncultured 1.00E-05**
  6. Diplosphaera 1.00E-05
  7. Opitutus 0.00725

I have 2 rows, that are here named as uncultured. I want to add the column 2 just for the uncultured rows.

If its just one single data frame (one csv) I am working on , I always use

df92<- read.csv("df92.csv")

df92 <- ddply(df92, "name", numcolwise(sum))

The output for this code will be

  1. name P_A_1
  2. Cephaloticoccus 0.0098
  3. Alterococcus 0.00018
  4. **uncultured 1.70E-04
  5. Diplosphaera 1.00E-05
  6. Opitutus 0.00725

It works for a single df (csv file) and not on the list. How to do this on a multiple csv files part of the list. within each csv file, if they have same row names, then add the values in the second column and display them as one row with that name.

ADD REPLY
0
Entering edit mode

Thanks for the clarification! I edited my previous answer with an additional suggestion. Let me know how it goes.

ADD REPLY
1
Entering edit mode
3.4 years ago
gabrielafg ▴ 60

For the rename question, you could try the code below. I haven't tested it so it's just a suggestion.

for (x in 1:92){
    names(lst1[[x]])[names(lst1[[x]]) == "fraction_total_reads"] = paste0("P_A_", x)
}

[edit] Based on your additional explanation, for the merge you could try:

final_df = data.frame(name = character())

for (df in 1:92){
    lst1[[df]] = ddply(lst1[[df]], "name", numcolwise(sum))
    final_df = merge(x = final_df, y = lst1[[df]], by = "name", all = TRUE)
}

final_df[is.na(final_df)] = 0

There may be some smarter solution but I hope this works!

ADD COMMENT
0
Entering edit mode

This works. Thank you.

ADD REPLY
0
Entering edit mode

works perfect! Thank you.

ADD REPLY

Login before adding your answer.

Traffic: 1619 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6