Question: Merging two files based on the identifier column (gene symbols)
11 months ago, mohammedtoufiq91 wrote:

Hi,

I have two *.csv files that share one column, the gene symbols, but otherwise have different column headers: one contains the gene symbols with expression data (samples), and the other contains the gene symbols with phenotypic data/attributes. I would like to merge the two files by matching on the gene symbol column and save all the data in one file for further analysis. How can this be done?

Thank you,

Toufiq

modified 11 months ago by Jean-Karim Heriche • written 11 months ago by mohammedtoufiq91

Have you read the help page of the merge function?

?merge
written 11 months ago by Benn

Thank you so much. @Benn

modified 11 months ago • written 11 months ago by mohammedtoufiq91

Cross-posted: https://support.bioconductor.org/p/124514

written 11 months ago by ATpoint
11 months ago, Jean-Karim Heriche (EMBL Heidelberg, Germany) wrote:

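This can be done in the terminal with the join utility. Both files must first be sorted on the gene symbol column, and because join splits fields on whitespace by default, comma-separated files need the separator set with -t, e.g. join -t, -a1 -a2 file1.csv file2.csv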

The -a option is used to keep unpairable lines from the corresponding file, i.e. in case a gene symbol is in one file but not the other.
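
As a minimal sketch of the sort-then-join approach, the commands below build two small made-up files and merge them. The file names (expr.csv, pheno.csv, merged.csv), the column names and the values are all hypothetical and only serve the illustration; the sketch also assumes the gene symbol is the first column of both files and that no field contains a quoted comma. The -e and -o options go beyond what the answer above mentions: they are used here so that unpaired genes get NA in the missing columns instead of shifting the remaining fields to the left.

```shell
# Hypothetical example inputs (names and values are made up for illustration).
cat > expr.csv <<'EOF'
gene,sample1,sample2
TP53,5.1,6.3
BRCA1,2.0,2.4
EGFR,7.7,8.1
EOF

cat > pheno.csv <<'EOF'
gene,pathway
BRCA1,DNA repair
TP53,apoptosis
MYC,proliferation
EOF

# join needs both inputs sorted on the join field, so drop the header line
# and sort each file on its first comma-separated column.
# LC_ALL=C makes sort and join agree on the collation order.
tail -n +2 expr.csv  | LC_ALL=C sort -t, -k1,1 > expr.sorted
tail -n +2 pheno.csv | LC_ALL=C sort -t, -k1,1 > pheno.sorted

# Write a combined header, then append the joined rows.
#   -t,        use a comma as the field separator
#   -a1 -a2    also print unpaired lines from file 1 and file 2
#   -o ...     output layout: join field, then the data columns of each file
#   -e NA      fill fields that are missing for unpaired lines with NA
{
  printf 'gene,sample1,sample2,pathway\n'
  LC_ALL=C join -t, -a1 -a2 -e NA -o 0,1.2,1.3,2.2 expr.sorted pheno.sorted
} > merged.csv

cat merged.csv
```

With these inputs, EGFR (only in the expression file) ends up with NA in the pathway column and MYC (only in the phenotype file) with NA in both sample columns. The -o list has to be adapted to the real number of columns in each file.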
