Question: How do I create and populate new sample columns based on values (sample barcodes) from an existing column?
5 weeks ago by
andresllucena0 wrote:

I have a data frame as follows:

miRNA_region  barcode                      read_count
MIMAT0000062  TCGA-05-4244-01A-01T-1108-13      14492
MIMAT0000063  TCGA-05-4244-01A-01T-1108-13       8767
MIMAT0000064  TCGA-05-4244-01A-01T-1108-13        610
MIMAT0000065  TCGA-05-4244-01A-01T-1108-13        750
MIMAT0000066  TCGA-05-4244-01A-01T-1108-13        804
MIMAT0000067  TCGA-05-4244-01A-01T-1108-13       4748
MIMAT0000062  TCGA-05-4384-01A-01T-1754-13     505712
MIMAT0000063  TCGA-05-4384-01A-01T-1754-13     121127
MIMAT0000064  TCGA-05-4384-01A-01T-1754-13      12833
MIMAT0000065  TCGA-05-4384-01A-01T-1754-13       1455
MIMAT0000067  TCGA-05-4384-01A-01T-1754-13      15284

Each barcode corresponds to a different sample. I need to convert the values in the "barcode" column into new columns and get something like:

miRNA_region    TCGA-05-4244-01A-01T-1108-13    TCGA-05-4384-01A-01T-1754-13
MIMAT0000062    14492                           505712
MIMAT0000063    8767                            121127
MIMAT0000064    610                             12833
MIMAT0000065    750                             1455
MIMAT0000066    804                             15284
tcga R • 171 views
5 weeks ago by
Bloomington, IN
rpolicastro3.9k wrote:

There are values in your example output that don't appear in your example data, but I'll assume you just want to convert your data from long to wide format.

example data

df <- structure(list(miRNA_region = c("MIMAT0000062", "MIMAT0000063", 
"MIMAT0000062", "MIMAT0000063"), barcode = c("TCGA-05-4244-01A-01T-1108-13", 
"TCGA-05-4244-01A-01T-1108-13", "TCGA-05-4384-01A-01T-1754-13", 
"TCGA-05-4384-01A-01T-1754-13"), read_count = c(14492, 8767, 
505712, 121127)), class = "data.frame", row.names = c(NA, -4L))

> df
  miRNA_region                      barcode read_count
1 MIMAT0000062 TCGA-05-4244-01A-01T-1108-13      14492
2 MIMAT0000063 TCGA-05-4244-01A-01T-1108-13       8767
3 MIMAT0000062 TCGA-05-4384-01A-01T-1754-13     505712
4 MIMAT0000063 TCGA-05-4384-01A-01T-1754-13     121127

You can do this with pivot_wider from tidyr (part of the tidyverse). See vignette("pivot") for more information.

library(tidyr)
df_wide <- pivot_wider(df, names_from=barcode, values_from=read_count)

> df_wide
# A tibble: 2 x 3
  miRNA_region `TCGA-05-4244-01A-01T-1108-13` `TCGA-05-4384-01A-01T-1754-13`
  <chr>                                 <dbl>                          <dbl>
1 MIMAT0000062                          14492                         505712
2 MIMAT0000063                           8767                         121127
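
In the data from the question, MIMAT0000066 has no row for the second sample, so that cell comes out as NA after pivoting. A minimal sketch of how this could be handled follows; the object names df_long, wide_na and wide_zero are made up for illustration, and filling with 0 assumes that a missing row really means zero reads, which may not hold for your data.

```r
library(tidyr)

# Three rows taken from the question: MIMAT0000066 is absent from the second sample
df_long <- data.frame(
  miRNA_region = c("MIMAT0000065", "MIMAT0000066", "MIMAT0000065"),
  barcode = c("TCGA-05-4244-01A-01T-1108-13",
              "TCGA-05-4244-01A-01T-1108-13",
              "TCGA-05-4384-01A-01T-1754-13"),
  read_count = c(750, 804, 1455)
)

# Default behaviour: the missing miRNA/sample combination becomes NA
wide_na <- pivot_wider(df_long, names_from = barcode, values_from = read_count)

# Assumption: a missing row means zero reads, so fill the gap with 0
wide_zero <- pivot_wider(df_long, names_from = barcode, values_from = read_count,
                         values_fill = list(read_count = 0))
```

Whether NA or 0 is the better choice depends on how the counts were generated, so check that before filling.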

If you have a lot of data, it's quicker and more memory-efficient to do this with dcast from data.table (convert the data frame to a data.table first). See vignette("datatable-reshape") for more information.

library(data.table)
df_wide <- dcast(as.data.table(df), miRNA_region ~ barcode, value.var="read_count")

> df_wide
   miRNA_region TCGA-05-4244-01A-01T-1108-13 TCGA-05-4384-01A-01T-1754-13
1: MIMAT0000062                        14492                       505712
2: MIMAT0000063                         8767                       121127
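
dcast also leaves NA where a miRNA is missing from a sample, as MIMAT0000066 is for the second sample in the question. A small sketch using its fill argument is shown here; dt_long and dt_wide are hypothetical names, and using 0 assumes that a missing row means no reads were observed.

```r
library(data.table)

# Three rows taken from the question: MIMAT0000066 is absent from the second sample
dt_long <- data.table(
  miRNA_region = c("MIMAT0000065", "MIMAT0000066", "MIMAT0000065"),
  barcode = c("TCGA-05-4244-01A-01T-1108-13",
              "TCGA-05-4244-01A-01T-1108-13",
              "TCGA-05-4384-01A-01T-1754-13"),
  read_count = c(750, 804, 1455)
)

# fill = 0 replaces missing miRNA/sample combinations (assumed to be zero counts)
dt_wide <- dcast(dt_long, miRNA_region ~ barcode,
                 value.var = "read_count", fill = 0)
```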

I'm sorry, I will correct that. I used values from samples after the first one trying to improve the example. Thank you so much.

P.S.: The function pivot_wider seems to be from the tidyr package, not dplyr.

written 5 weeks ago by andresllucena0

P.S.: The function pivot_wider seems to be from the tidyr package, not dplyr.

Nice catch, I edited my post with the correction.

written 5 weeks ago by rpolicastro3.9k