Question

how to merge multiple files based on column header

0

Entering edit mode

5 months ago

Bioinfonext ▴ 460

Hi, I would like to merge three files based on column header Sample_ID and file should only have those samples that are common in all these three file. Could you please help with this.

in.sites <- read.table("CpG.csv", header=T, sep=",", as.is=T, na.strings="NA")
    in.sites[c(1:3), c(1:3)]
        Sample_ID Mean_M_from_TRUEinvar_per_person n_TRUEinvar_CpGs_in_split
    1 NSS.1.0093                        -3.857742                     77095
    2 NSS.1.0095                        -3.761795                     77095
    3 NSS.1.0096                        -3.715694                     77095
     pheno <- read.table("Phe_121023.csv", header=T, sep=",", as.is=T, na.strings="NA")
    pheno[c(1:3), c(1:3)]
        Sample_ID         BeacChip.ID   Sentrix_ID
    1 NSS.1.0093 200772280026_R05C01 200772280026
    2 NSS.1.0095 200772280026_R07C01 200772280026
    3 NSS.1.0096 200772280026_R08C01 200772280026
     PCs <- read.table("Control_probe_PCs_all_preprocessed.txt", header=T, sep="\t", as.is=T, na.strings="NA")
    PCs[c(1:3), c(1:3)]
        Sample_ID       PC1      PC2
    1 NSS.1.0093 -25382.95 22243.17
    2 NSS.1.0095 -29640.00 27610.33
    3 NSS.1.0096 -41261.36 30188.37

Many thanks

R statistics biostatistics • 568 views

ADD COMMENT • link updated 5 months ago by iraun 6.2k • written 5 months ago by Bioinfonext ▴ 460

Ram · Answer 1 · 2023-11-02

1

Entering edit mode

5 months ago

bk11 ★ 2.4k

A simple way to do will be like this-

tmp=merge(in.sites, pheno, by="Sample_ID")
final=merge(tmp, PCs, by="Sample_ID"

write.table(final, file="your_merged_commonData.txt", sep="\t")

ADD COMMENT • link updated 5 months ago by Ram 43k • written 5 months ago by bk11 ★ 2.4k

Ram · Answer 2 · 2023-11-02

1

Entering edit mode

5 months ago

zau saa ▴ 120

library(dplyr)
tmp <- inner_join(in.sites, pheno, by = "Sample_ID")
merged <- inner_join(tmp, PCs, by = "Sample_ID")

ADD COMMENT • link updated 5 months ago by Ram 43k • written 5 months ago by zau saa ▴ 120

0

Entering edit mode

my answer is ok now. I misunderstand the question before.

ADD REPLY • link 5 months ago by zau saa ▴ 120

Ram · Answer 3 · 2023-11-02

0

Entering edit mode

5 months ago

iraun 6.2k

Not tested, but maybe something like this might do the job.

list_of_frames <- list(df1, df2, df3)

# Extract common Sample_IDs
common_samples <- Reduce(intersect, lapply(list_of_frames, function(df) df$Sample_ID))

# Keep only rows with common Sample_ID
filtered_frames <- lapply(list_of_frames, function(df) df[df$Sample_ID %in% common_samples, ])

# Merge the filtered dataframes based on Sample_ID
merged_data <- Reduce(function(x, y) merge(x, y, by = "Sample_ID"), filtered_frames)

ADD COMMENT • link updated 5 months ago by Ram 43k • written 5 months ago by iraun 6.2k

0

Entering edit mode

what about function df?

ADD REPLY • link 5 months ago by Bioinfonext ▴ 460

0

Entering edit mode

What about it? It is just an anonymous function (similar to lambda in python), df is the name of the input argument, could be x or whichever other name you prefer. It is of course possible to extract the function, and then apply it with lapply. E.g:

extract_sample_id <- function(df) df$Sample_ID
common_samples <- Reduce(intersect, lapply(list_of_frames, extract_sample_id))

ADD REPLY • link 5 months ago by iraun 6.2k