Trouble manipulating dataset
5.2 years ago
2405592M ▴ 140

Hi Guys, I'm having trouble with some data manipulation. I have the following table where I have the raw counts from an RNA-seq experiment. I'm trying to group all my counts by tRNA wobble position.

Isodecoder Anticodon Wobble Loci Fragment_type AAV_Ctrl1 AAV_Ctrl2 AAV_Ctrl3 AAV_Cre1 AAV_Cre2 AAV_Cre3
Ala       AGC      A    1   wholecounts         8         3         0       35       58       61
Ala       AGC      A   10   wholecounts         0         0         0        0        2        0
Ala       AGC      A   12   wholecounts         0         0         0        0        0        0
Ala       AGC      A    2   wholecounts      1228       839       766     1115     1525      784
Ala       AGC      A    3   wholecounts       286       125       120      504      387      380
Ala       AGC      A    4   wholecounts       675       541       353      328      452      367


I'm essentially trying to sum counts by Isodecoder and wobble position (i.e., for Ala, I'd have one value each for wobble A, T, C and G in each of my conditions, Ctrl1-3 and Cre1-3). I tried dplyr with no luck. THANKS IN ADVANCE!!!

RNA-Seq R • 858 views
5.2 years ago
shawn.w.foley ★ 1.3k

You can use aggregate through base R to accomplish this if you're just trying to sum by Isodecoder.

> df.agg <- aggregate(df[,-seq(5)],list(df$Isodecoder),sum)
> df.agg
  Group.1 AAV_Ctrl1 AAV_Ctrl2 AAV_Ctrl3 AAV_Cre1 AAV_Cre2 AAV_Cre3
1     Ala      2197      1508      1239     1982     2424     1592

The above command makes a new data frame called df.agg by taking all of the counts in df (removing the first five columns), aggregating them based on the values in df$Isodecoder, and applying the sum function.
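
If you also want the sums split by wobble position within each isodecoder, as the question describes, aggregate accepts more than one grouping vector. Here is a minimal sketch, assuming the table has been read into a data frame called df with the column names shown in the question. The names count_cols and df.wobble are just ones picked for illustration, and the six example rows are rebuilt inline so the snippet runs on its own.

```r
# Rebuild the six example rows from the question (all Ala / AGC / wobble A).
df <- read.table(header = TRUE, text = "
Isodecoder Anticodon Wobble Loci Fragment_type AAV_Ctrl1 AAV_Ctrl2 AAV_Ctrl3 AAV_Cre1 AAV_Cre2 AAV_Cre3
Ala AGC A 1  wholecounts 8    3   0   35   58   61
Ala AGC A 10 wholecounts 0    0   0   0    2    0
Ala AGC A 12 wholecounts 0    0   0   0    0    0
Ala AGC A 2  wholecounts 1228 839 766 1115 1525 784
Ala AGC A 3  wholecounts 286  125 120 504  387  380
Ala AGC A 4  wholecounts 675  541 353 328  452  367
")

# Pick the count columns by name rather than by position
# (assumes every sample column starts with "AAV_").
count_cols <- grep("^AAV_", names(df), value = TRUE)

# Sum the counts within each Isodecoder x Wobble combination.
# Naming the list elements gives readable column names instead of Group.1, Group.2.
df.wobble <- aggregate(df[, count_cols],
                       by = list(Isodecoder = df$Isodecoder, Wobble = df$Wobble),
                       FUN = sum)
df.wobble
```

With only the six rows shown in the question, this returns a single Ala / A row with the same totals as df.agg above. On the full table you should get one row per Isodecoder and wobble base combination that actually occurs in the data.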


Perfect! Aggregate was the function I needed! Cheers!