Creating a column that groups data from another column
Bioinfo, 2.5 years ago

Hello, I have data that look like this:

    group   a   b   c   d
    AO1    10  10  14  15
    AO1     8   8  17  15
    AO1    12  15  17  20
    A02     8   5   2   8
    A02    18   9  12   5
    A02     8   5   2   9
    A03    14  15  10  10
    A03     9   5   3  16
    A11    12   5   8   8
    A11     2   3  12   4
    A12    12  17  10   2
    A12     9   7   5   8
    A13    18  10  15   9
    A13     2   1   9   4
    A21     9  12   1   7
    A21    10   4  15  13
    A22    17   8   5   8
    A22     9  10   6  17
    A23     7  10   1   3
    A23    17   4  15  12

I want to create another column, using R, that groups the information in the group column.

The output I want is:

    group  supgrp   a   b   c   d
    AO1    A0      10  10  14  15
    AO1    A0       8   8  17  15
    AO1    A0      12  15  17  20
    A02    A0       8   5   2   8
    A02    A0      18   9  12   5
    A02    A0       8   5   2   9
    A03    A0      14  15  10  10
    A03    A0       9   5   3  16
    A11    A1      12   5   8   8
    A11    A1       2   3  12   4
    A12    A1      12  17  10   2
    A12    A1       9   7   5   8
    A13    A1      18  10  15   9
    A13    A1       2   1   9   4
    A21    A2       9  12   1   7
    A21    A2      10   4  15  13
    A22    A2      17   8   5   8
    A22    A2       9  10   6  17
    A23    A2       7  10   1   3
    A23    A2      17   4  15  12

Thank you very much


awk:

$ awk -v OFS="\t" 'NR==1 {print $1, "subgrp", $2,$3,$4,$5}; NR>1 {print $1, substr($1,1,2),$2,$3,$4,$5}' test.txt

This takes the first two characters of each group name, so it works up to subgroup 9, assuming every name starts with a single letter (A-Z) followed by a single digit (0-9).
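For comparison, the same awk rule (keep the first two characters and put the new column in second position) can be written in base R. This is a minimal sketch; the small inline data frame is a hypothetical subset of the question's table, with A01 spelled with a zero.

```r
# Hypothetical subset of the question's data (A01 written with a zero)
df <- data.frame(
  group = c("A01", "A02", "A11", "A23"),
  a = c(10, 8, 12, 7),
  b = c(10, 5, 5, 10),
  stringsAsFactors = FALSE
)

# Like awk's substr($1, 1, 2): keep the first two characters of each name
df$subgrp <- substr(df$group, 1, 2)

# Move subgrp next to group, matching the layout of the desired output
df <- df[, c("group", "subgrp", setdiff(names(df), c("group", "subgrp")))]
print(df)
```

As with the awk version, this silently gives wrong subgroups once a name has more than one leading letter or a two-digit subgroup.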

R:

> df <- read.table("test.txt", header = TRUE, row.names = NULL)
> df$subgrp <- stringr::str_extract(df$group, "^[A-Z]+[0-9]")
> df
   group  a  b  c  d subgrp
1    A01 10 10 14 15     A0
2    A01  8  8 17 15     A0
3    A01 12 15 17 20     A0
4    A02  8  5  2  8     A0
5    A02 18  9 12  5     A0
6    A02  8  5  2  9     A0
7    A03 14 15 10 10     A0
8    A03  9  5  3 16     A0
9    A11 12  5  8  8     A1
10   A11  2  3 12  4     A1
11   A12 12 17 10  2     A1
12   A12  9  7  5  8     A1
13   A13 18 10 15  9     A1
14   A13  2  1  9  4     A1
15   A21  9 12  1  7     A2
16   A21 10  4 15 13     A2
17   A22 17  8  5  8     A2
18   A22  9 10  6 17     A2
19   A23  7 10  1  3     A2
20   A23 17  4 15 12     A2

This also works up to subgroup 9, assuming each name starts with one or more uppercase letters (A-Z) followed by a single digit (0-9).
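If you would rather avoid the stringr dependency, base R's sub() with a capture group applies the same pattern. A sketch with hypothetical input names; note that, unlike str_extract(), which returns NA when nothing matches, sub() leaves a non-matching name unchanged.

```r
# Hypothetical group names, including a multi-letter and a three-digit one
groups <- c("A01", "A13", "AB12", "A123")

# Capture the leading uppercase letters plus the first digit, drop the rest
subgrp <- sub("^([A-Z]+[0-9]).*$", "\\1", groups)
print(subgrp)
```

The last name shows the limitation stated above: "A123" becomes "A1", because only one digit is kept.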

@OP: why is A0 written as AO (with the letter O) in the first group, instead of A0 (with a zero) like the rest of the entries?
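The difference matters for both approaches above: taking the first two characters returns "AO", while the [A-Z]+ pattern also matches the letter O and returns the whole name "AO1". A minimal base-R sketch, assuming AO is simply a typo for A0 that should be corrected before grouping (the input vector is hypothetical):

```r
groups <- c("AO1", "A01", "A12")

# First-two-characters rule, as in the awk answer
by_prefix <- substr(groups, 1, 2)                    # "AO" "A0" "A1"

# Same pattern as the R answer: [A-Z]+ also consumes the letter O,
# so "AO1" comes back unchanged
by_regex <- sub("^([A-Z]+[0-9]).*$", "\\1", groups)  # "AO1" "A0" "A1"

# Assumption: "AO" is a typo for "A0", so normalise it first
cleaned <- sub("^AO", "A0", groups)
subgrp <- sub("^([A-Z]+[0-9]).*$", "\\1", cleaned)   # "A0" "A0" "A1"
print(subgrp)
```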


Hello, thank you for your answer. Actually, the names in my data are more complicated; I just gave a simple example to explain the issue. I want to do this in R.
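Without knowing the real naming scheme, one hedged option is to define the subgroup as the group name with its final character removed, which reproduces the A0/A1/A2 column for the example data. A base-R sketch; the longer names below are hypothetical and the rule is an assumption about your data:

```r
# Hypothetical, more complex group names (not from the original post)
groups <- c("A01", "A23", "liver_B12", "ctrl-X7")

# Assumption: the subgroup is everything except the last character
subgrp <- sub(".$", "", groups)
print(subgrp)
```

If the trailing part can be longer than one character, the pattern has to be adapted to whatever actually separates the group from the subgroup in your names.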
