Separate column values for summary

7.5 years ago

Jack ▴ 120

So I have 2 files and something peculiar occurred.

My original file is a list of selected genes, like this:

A1BG
A2M
A2MP1

And I have a second file with gene synonyms which is like this:

A1BG   A1B;ABG;GAB;HYST2477
A2M    A2MD;CPAMD5

So, if I read the second file into R and run summary(), the output for the synonyms column lists each synonym individually, like so:

symbol     synonyms
A1BG: 1    A1B: 7
A2M: 1     TRNAL_CAA: 2

I take this to mean that, for the second file, R recognizes ';' as a separator within the second column.

But when I append the information from the synonyms file to the first file and run summary() on the resulting file, I get this:

symbol     synonyms
A1BG: 1    A1B;ABG;GAB;HYST2477: 1
A2M: 1     A2MD;CPAMD5: 1

I read both files like this:

synonyms file:

df <- read.csv('homo_sapiens_synonyms.csv', header=TRUE, sep='\t')

joined file:

df <- read.csv('synonyms.csv', header=TRUE, sep='\t')

Why doesn't R separate the values in the synonyms column of the joined file?

R summary csv • 1.4k views

updated 5.7 years ago by Biostar 20 • written 7.5 years ago by Jack ▴ 120

7.5 years ago by ddiez ★ 2.0k

I think this is because you are overwriting the contents of df with the contents of the second file rather than appending to it.

To append, you have to store the contents of each file in a separate data frame.
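
As a rough sketch of that suggestion, the snippet below reads each file into its own data frame and then joins them on the gene symbol. The temporary files, the `symbol` and `synonyms` header names, and the variable names `genes`, `synonyms`, `joined` and `synonym_list` are assumptions made for illustration; the real files may be laid out differently. Note also that summary() only counts the distinct values in a column and does not split strings on ';', so if individual synonyms are needed, the split has to be done explicitly, for example with strsplit().

```r
# Hypothetical stand-ins for the two files described in the question.
# A header line is assumed in both; adjust if the real files have none.
genes_file <- tempfile(fileext = ".txt")
synonyms_file <- tempfile(fileext = ".csv")
writeLines(c("symbol", "A1BG", "A2M", "A2MP1"), genes_file)
writeLines(c("symbol\tsynonyms",
             "A1BG\tA1B;ABG;GAB;HYST2477",
             "A2M\tA2MD;CPAMD5"), synonyms_file)

# Read each file into its own data frame instead of reusing `df`.
genes <- read.csv(genes_file, header = TRUE, sep = "\t",
                  stringsAsFactors = FALSE)
synonyms <- read.csv(synonyms_file, header = TRUE, sep = "\t",
                     stringsAsFactors = FALSE)

# Left join: keep every selected gene and attach synonyms where available.
joined <- merge(genes, synonyms, by = "symbol", all.x = TRUE)

# Split the ';'-delimited strings explicitly; genes without synonyms stay NA.
synonym_list <- strsplit(joined$synonyms, ";", fixed = TRUE)
names(synonym_list) <- joined$symbol

print(joined)
print(synonym_list[["A1BG"]])
```

Here merge() with all.x = TRUE keeps genes such as A2MP1 that have no entry in the synonyms file, leaving their synonyms value as NA.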

5.7 years ago by Nitin Narwade ★ 1.6k
