I have two files, and something peculiar happened.
My original file is a list of selected genes, like this:
A1BG
A2M
A2MP1
I also have a second file with gene synonyms, which looks like this:
A1BG A1B;ABG;GAB;HYST2477
A2M A2MD;CPAMD5
If I read the second file into R and call summary(), the output for the synonyms column lists each gene individually, like so:
symbol synonyms
A1BG: 1 A1B: 7
A2M: 1 TRNAL_CAA: 2
This basically means that, in the second file, R can tell that ';' is a separator in the second column.
But when I append the information from the synonyms file to the first file and call summary() on the resulting file, I get this:
symbol synonyms
A1BG: 1 A1B;ABG;GAB;HYST2477 :1
A2M: 1 A2MD;CPAMD5: 1
I read both files like this:
synonyms file:
df <- read.csv('homo_sapiens_synonyms.csv', header=TRUE, sep='\t')
joined file:
df <- read.csv('synonyms.csv', header=TRUE, sep='\t')
Why doesn't R separate the values in the synonyms column of the joined file?
This question seems to be a duplicate of this: Colapse column values to multiple rows for further analysis.
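For reference, read.csv() with sep='\t' splits only on tabs, so a ';'-delimited synonyms field is kept as a single string. If one row per synonym is wanted, the split has to be done explicitly. The following is a minimal base R sketch of that idea, not the method from the linked question; the data frame is typed inline from the example rows above, and the names `parts` and `long` are made up for illustration.

```r
# Assumption: example rows copied from the question; in practice this
# data frame would come from read.csv() on the synonyms file.
synonyms <- data.frame(
  symbol   = c("A1BG", "A2M"),
  synonyms = c("A1B;ABG;GAB;HYST2477", "A2MD;CPAMD5"),
  stringsAsFactors = FALSE
)

# Split each synonyms string on the literal ';' character.
parts <- strsplit(synonyms$synonyms, ";", fixed = TRUE)

# Repeat each symbol once per synonym and flatten the list of synonyms.
long <- data.frame(
  symbol  = rep(synonyms$symbol, lengths(parts)),
  synonym = unlist(parts),
  stringsAsFactors = FALSE
)

print(long)
```

Calling summary() or table() on `long$synonym` then counts each synonym separately.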
I think this is because you are overwriting the contents of df with the contents of the second file, not appending to it. To append, you have to store the contents of both files in separate data frames.
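A minimal sketch of that suggestion, assuming both files are tab-separated and share a `symbol` column (the gene list having a header line is an assumption). The inline `text =` strings stand in for the real files, and the names `genes`, `synonyms`, and `joined` are hypothetical.

```r
# Assumption: inline text stands in for the gene list file; with a real
# file, pass its path as the first argument instead of text = ...
genes <- read.csv(
  text = "symbol\nA1BG\nA2M\nA2MP1",
  header = TRUE, sep = "\t", stringsAsFactors = FALSE
)

# Assumption: inline text stands in for 'homo_sapiens_synonyms.csv'.
synonyms <- read.csv(
  text = "symbol\tsynonyms\nA1BG\tA1B;ABG;GAB;HYST2477\nA2M\tA2MD;CPAMD5",
  header = TRUE, sep = "\t", stringsAsFactors = FALSE
)

# Keep every gene from the list; genes without synonyms get NA.
joined <- merge(genes, synonyms, by = "symbol", all.x = TRUE)

print(joined)
```

Because each file lives in its own data frame, reading the second file no longer replaces the first, and merge() attaches the synonyms by matching on `symbol`.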