how to merge two files without duplicating same column
3 months ago
Bioinfonext ▴ 460

Dear all,

I am using the commands below to merge two files on a shared column (Sample_ID), but the merge also duplicates the other columns that the files have in common:

file1 <- read.table("CpG.csv", header=TRUE, sep=",", as.is=TRUE, na.strings="NA")
file1[c(1:3), c(1:3)]
   Sample_ID       PC1      PC2
1 NSS.1.0093 -25382.95 22243.17
2 NSS.1.0095 -29640.00 27610.33
3 NSS.1.0096 -41261.36 30188.37

file2 <- read.table("Phe_121023.csv", header=TRUE, sep=",", as.is=TRUE, na.strings="NA")
file2[c(1:3), c(1:3)]
   Sample_ID         BeacChip.ID   Sentrix_ID
1 NSS.1.0093 200772280026_R05C01 200772280026
2 NSS.1.0095 200772280026_R07C01 200772280026
3 NSS.1.0096 200772280026_R08C01 200772280026

PCs <- read.table("Control_probe_PCs_all_preprocessed.txt", header=TRUE, sep="\t", as.is=TRUE, na.strings="NA")
PCs[c(1:3), c(1:3)]
   Sample_ID       PC1      PC2
1 NSS.1.0093 -25382.95 22243.17
2 NSS.1.0095 -29640.00 27610.33
3 NSS.1.0096 -41261.36 30188.37

tmp <- merge(file1, file2, by="Sample_ID")

write.table(tmp, file="your_merged_commonData.txt", sep="\t")
R statistics biostatistics • 349 views
3 months ago

Yes, indeed: when both data.frames contain a column with the same name that is not used as the merge key, merge() keeps both copies and appends the suffixes .x and .y to their names.

R cannot guess which of the duplicated columns you would like to keep. Either exclude them before merging or drop them afterwards. For example, this drops the columns ending in .y and strips the .x suffix from the remaining ones:

tmp <- tmp[,!grepl("\\.y$",colnames(tmp))]
colnames(tmp) <- gsub("\\.x$","",colnames(tmp))
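
The other option mentioned above, excluding the shared columns before merging, could look roughly like the sketch below. The toy data frames and the helper names shared and file2_unique are made up for illustration; only Sample_ID as the merge key comes from the question.

```r
# Toy stand-ins for file1 and file2 (values are invented for illustration)
file1 <- data.frame(Sample_ID = c("S1", "S2", "S3"),
                    PC1       = c(1.0, 2.0, 3.0),
                    Batch     = c("a", "b", "c"))
file2 <- data.frame(Sample_ID  = c("S1", "S2", "S4"),
                    Sentrix_ID = c(11, 12, 14),
                    Batch      = c("a", "b", "d"))

# Columns present in both data frames, excluding the merge key
shared <- setdiff(intersect(colnames(file1), colnames(file2)), "Sample_ID")

# Drop the shared columns from file2 so merge() has nothing to suffix;
# drop = FALSE keeps a data.frame even if only one column is left
file2_unique <- file2[, !(colnames(file2) %in% shared), drop = FALSE]

tmp <- merge(file1, file2_unique, by = "Sample_ID")
colnames(tmp)
```

This keeps the copy of each shared column from file1. If the two copies can differ, it is worth comparing them before discarding one.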
3 months ago
DBScan ▴ 300

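You can use a join function from the dplyr package, like this:

library(dplyr)
inner_join(file1, PCs)

This keeps only the samples that occur in both data frames. When no by argument is given, inner_join() joins on all columns the two data frames share, so those columns are not duplicated.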
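
As a rough illustration of how the choice of join key affects duplicated columns in dplyr, here is a small sketch. The toy data frames and the names natural and keyed are invented for this example and are not the poster's data.

```r
library(dplyr)

# Toy stand-ins (values are invented); PC1 is present in both data frames
file1 <- data.frame(Sample_ID = c("S1", "S2", "S3"),
                    PC1       = c(1, 2, 3))
file2 <- data.frame(Sample_ID  = c("S1", "S2", "S4"),
                    PC1        = c(1, 2, 4),
                    Sentrix_ID = c(11, 12, 14))

# No `by`: every shared column (Sample_ID and PC1) is used as a key,
# so each appears once in the result; dplyr prints a message naming the keys
natural <- inner_join(file1, file2)

# Explicit key: PC1 is no longer a key, so both copies are kept with suffixes
keyed <- inner_join(file1, file2, by = "Sample_ID")

colnames(natural)
colnames(keyed)
```

Joining on every shared column means rows must match on all of them. With numeric columns such as PC1, values that differ slightly between the two files will fail to match, so naming the key explicitly and dropping the extra copy is often the safer choice.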
