convert rows to columns in r
4.6 years ago
APJ ▴ 40

Hi,

I have a tibble that looks like this:

 head(TPM_a0)
# A tibble: 6 x 3
  depmap_id  gene_name expression
  <chr>      <chr>          <dbl>
1 ACH-000956 TSPAN6          2.65
2 ACH-000429 TSPAN6          3.85
3 ACH-000857 TSPAN6          5.63
4 ACH-000783 TSPAN6          2.25
5 ACH-000963 TSPAN6          5.11
6 ACH-000812 TSPAN6          4.81

I would like to convert it to a data frame where each row is a gene_name and each column is a depmap_id. I tried the spread() function from tidyr:

TPM_a2 <- TPM_a0 %>% spread(depmap_id, expression)

But I ended up with the following error. Any ideas?

Error: Each row of output must be identified by a unique combination of keys.
Keys are shared for 64932 rows:
* 917895, 1262407
* 509207, 566047
* 1202487, 1208311, 1230683
* 1044847, 1050811, 1052435, 1052519, 1208703, 1211419
* 202075, 869539
* 293075, 1460703
* 264907, 1588831
* 503411, 569127
* 1568195, 1618959

The error indicates that you have duplicate depmap_ids for the same gene. For example, you probably have something like this:

ACH-000840 TP53  4.75
ACH-000840 TP53  3.23

So when you try to spread your data frame, it does not know which value to put for gene TP53 and depmap_id ACH-000840. Each combination of gene_name and depmap_id must be unique. Check the rows listed in your error message to find out which combinations are duplicated.
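To make that check concrete, here is a minimal sketch, assuming dplyr and a reasonably recent tidyr (1.1 or later, where values_fn accepts a single function) are installed. The toy tibble is invented for illustration, and averaging the duplicates is only one possible choice; whether a mean, max, or first value is appropriate depends on why the duplicates exist.

```r
library(dplyr)
library(tidyr)

# Toy data (invented for illustration) with one duplicated gene/sample pair
toy <- tibble(
  depmap_id  = c("ACH-000840", "ACH-000840", "ACH-000956", "ACH-000956"),
  gene_name  = c("TP53", "TP53", "TP53", "TSPAN6"),
  expression = c(4.75, 3.23, 2.10, 2.65)
)

# List every gene/sample combination that occurs more than once
dups <- toy %>%
  count(gene_name, depmap_id) %>%
  filter(n > 1)

# Reshape to wide, collapsing duplicates by averaging (an assumption)
wide <- toy %>%
  pivot_wider(
    id_cols     = gene_name,
    names_from  = depmap_id,
    values_from = expression,
    values_fn   = mean
  )
```

On the real data, replacing toy with TPM_a0 should show in dups which combinations need attention before reshaping.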

4.6 years ago

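A base R solution would be something like this (converting the tibble to a plain data frame first, since reshape() can misbehave on tibbles):

reshape(as.data.frame(TPM_a0), idvar = "gene_name", timevar = "depmap_id", direction = "wide")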
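Note that, as far as I know, reshape() does not fail on duplicated gene/sample combinations; it keeps the first match and emits a warning, which may not be what you want. A minimal base R sketch that collapses duplicates explicitly before reshaping follows. The toy data frame is invented for illustration, and using the mean is an assumption.

```r
# Toy data (invented for illustration) with one duplicated gene/sample pair
toy <- data.frame(
  depmap_id  = c("ACH-000840", "ACH-000840", "ACH-000956", "ACH-000956"),
  gene_name  = c("TP53", "TP53", "TP53", "TSPAN6"),
  expression = c(4.75, 3.23, 2.10, 2.65)
)

# Average duplicated combinations first (an assumption; pick what suits the data)
agg <- aggregate(expression ~ gene_name + depmap_id, data = toy, FUN = mean)

# reshape() names the new columns "expression.<depmap_id>"
wide <- reshape(agg, idvar = "gene_name", timevar = "depmap_id",
                direction = "wide")
```

The resulting columns carry an "expression." prefix, which can be stripped with sub() on names(wide) if plain depmap_id column names are preferred.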