Question: renaming one column in multiple csv files at once and adding same rows within each columns in multiple csv files and merging them into single frame
0
gravatar for pramach1
17 days ago by
pramach10
pramach10 wrote:

This is a follow up post from my previous post.

#after setting up the working directory that has multiple csv files (92 files), is all read into a single list but as different data frames.
fnames <- list.files() 

#reading it into separate data frames within the list. Now the list will have 92 csv file as separate df. 
myfiles = lapply(fnames, read.delim)

Now that the list has all 92 csv files as separate data frame, we are extracting and keeping the columns we need. We need 2 columns named "name", and "fraction_total_reads".

lst1 <- lapply(myfiles, "[", c("name", "fraction_total_reads"))

Now each csv within the list has only 2 columns by "name" and "fraction_total_reads". But they all have different number of rows. Before I save these files...I want to rename the column fraction_total_reads, in each csv file to their corresponding sample name.

For example: within lst1... df[[1]] fraction_total_reads had to be renamed as P_A_1. so on and so forth.

lst2 <- rename(lst1[[x]], c("fraction_total_reads" = "P_A_1", "fraction_total_reads" = "P_A_2"...so on till all 92 samples ar named)

Would a correct version of the above code work? The above didn't work.

Once I rename then columns in each data frame within the list, I want to merge them by the column name. Remember they have same number of columns, but different number of rows. Thank you for any help regarding this.

R • 200 views
ADD COMMENTlink modified 17 days ago by gabrielafg10 • written 17 days ago by pramach10

Can you provide an example of the first few rows of two data.frames in your list, and an example of what the output you want should look like?

ADD REPLYlink written 17 days ago by rpolicastro2.4k

Sure. Here is the first 2 rows. This is what I have...

Data frame 1

  1. name P_A_1
  2. Diplosphaera 1.00E-05
  3. Opitutus 0.00725

Dataframe 2

  • name P_A_2
  • Anaerosporobacter 6.00E-05
  • Blautia 0.00016

This is what I want

  • name P_A_1 P_A_2
  • 2.Diplosphaera 1.00E-05 0
  • 3.Opitutus 0.00725 0
  • 4.Anaerosporobacter 0 6.00E-05
  • 5.Blautia 0 0.00016

usually I do this by this code

df91<- read.csv("df91.csv")
df92<- read.csv("df92.csv")

df91 <- ddply(df91, "name", numcolwise(sum))
df92 <- ddply(df92, "name", numcolwise(sum))

merged1 <- Reduce(function(x,y) merge(x, y, by = "name", all.x = TRUE, all.y = TRUE),
             list(df91, df92)

I will get the desired output. In the list of files that has 92 csv files...I don't know how to do it.

ADD REPLYlink modified 17 days ago • written 17 days ago by pramach10

Sorry, but I found the question a bit confusing. I don't understand what you mean with "adding same rows within each columns in multiple csv files".

About the merging, you want to create a single dataframe with all the P_A_X columns? What about the name columns?

ADD REPLYlink written 17 days ago by gabrielafg10

Yes. Sorry about the confusion. Here is what I have

  1. name P_A_1
  2. Cephaloticoccus 0.0098
  3. Alterococcus 0.00018
  4. **uncultured 0.00016
  5. uncultured 1.00E-05**
  6. Diplosphaera 1.00E-05
  7. Opitutus 0.00725

I have 2 rows, that are here named as uncultured. I want to add the column 2 just for the uncultured rows.

If its just one single data frame (one csv) I am working on , I always use

df92<- read.csv("df92.csv")

df92 <- ddply(df92, "name", numcolwise(sum))

The output for this code will be

  1. name P_A_1
  2. Cephaloticoccus 0.0098
  3. Alterococcus 0.00018
  4. **uncultured 1.70E-04
  5. Diplosphaera 1.00E-05
  6. Opitutus 0.00725

It works for a single df (csv file) and not on the list. How to do this on a multiple csv files part of the list. within each csv file, if they have same row names, then add the values in the second column and display them as one row with that name.

ADD REPLYlink modified 17 days ago • written 17 days ago by pramach10

Thanks for the clarification! I edited my previous answer with an additional suggestion. Let me know how it goes.

ADD REPLYlink written 17 days ago by gabrielafg10
1
gravatar for gabrielafg
17 days ago by
gabrielafg10
gabrielafg10 wrote:

For the rename question, you could try the code below. I haven't tested it so it's just a suggestion.

for (x in 1:92){
    names(lst1[[x]])[names(lst1[[x]]) == "fraction_total_reads"] = paste0("P_A_", x)
}

[edit] Based on your additional explanation, for the merge you could try:

final_df = data.frame(name = character())

for (df in 1:92){
    lst1[[df]] = ddply(lst1[[df]], "name", numcolwise(sum))
    final_df = merge(x = final_df, y = lst1[[df]], by = "name", all = TRUE)
}

final_df[is.na(final_df)] = 0

There may be some smarter solution but I hope this works!

ADD COMMENTlink modified 17 days ago • written 17 days ago by gabrielafg10

This works. Thank you.

ADD REPLYlink written 17 days ago by pramach10

works perfect! Thank you.

ADD REPLYlink written 16 days ago by pramach10
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1304 users visited in the last hour