Split delimited text into a new row
2
0
Entering edit mode
3.6 years ago
AP ▴ 80

Hello everyone,

I have a tab delimitted file like this with three columns,

FUN_004018-T1   359 GO:0016491|GO:0046872|GO:0055114
FUN_003797-T1   570 GO:0000287|GO:0030976
FUN_003797-T1   570 GO:0030976
FUN_003797-T1   570 GO:0016831

I want to arrange this file for further analysis like below:

FUN_004018-T1   359 GO:0016491
FUN_004018-T1   359  GO:0046872
FUN_004018-T1   359  GO:0055114
FUN_003797-T1   570 GO:0000287
FUN_003797-T1     570  GO:0030976
FUN_003797-T1   570 GO:0030976
FUN_003797-T1   570 GO:0016831

Could you please help on how I can achieve this. Thank you

sed awk grep • 807 views
ADD COMMENT
1
Entering edit mode

Use awk. Split $3 by | and print $1 $2 $3_split[i] for each component of $3.

Please use the above outline to write code yourself.

Or use R and tidyr::separate_rows - that would be a lot easier. See: https://tidyr.tidyverse.org/reference/separate_rows.html

ADD REPLY
3
Entering edit mode
3.6 years ago
 awk '{N=split($3,a,/\|/);for(i=1;i<=N;i++) print $1,$2 ,a[i]}' input.txt
ADD COMMENT
4
Entering edit mode
3.6 years ago
Ram 43k

Here's a tidyverse solution, just because:

tmp_df <- read.table(text="FUN_004018-T1 359 GO:0016491|GO:0046872|GO:0055114
FUN_003797-T1 570 GO:0000287|GO:0030976
FUN_003797-T1 570 GO:0030976
FUN_003797-T1 570 GO:0016831", sep=" ", header=FALSE, stringsAsFactors = FALSE, col.names = c('col1', 'col2','col3'))

tmp_df
           col1 col2                             col3
1 FUN_004018-T1  359 GO:0016491|GO:0046872|GO:0055114
2 FUN_003797-T1  570            GO:0000287|GO:0030976
3 FUN_003797-T1  570                       GO:0030976
4 FUN_003797-T1  570                       GO:0016831

tidyr::separate_rows(tmp_df, col3, sep = "[|]")
# A tibble: 7 x 3
  col1           col2 col3      
  <chr>         <int> <chr>     
1 FUN_004018-T1   359 GO:0016491
2 FUN_004018-T1   359 GO:0046872
3 FUN_004018-T1   359 GO:0055114
4 FUN_003797-T1   570 GO:0000287
5 FUN_003797-T1   570 GO:0030976
6 FUN_003797-T1   570 GO:0030976
7 FUN_003797-T1   570 GO:0016831
ADD COMMENT
0
Entering edit mode

It's always a GREAT sight when I see somebody post a Tidyverse solution. :D

ADD REPLY

Login before adding your answer.

Traffic: 2004 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6