How to write a script to split GO terms, one per line
4.4 years ago
Anny ▴ 20

Hi all,

I have a file whose first column contains an ID and whose second column contains the annotated Gene Ontology (GO) terms, as follows:

CPIW_00004002-RA GO:0005515

CPIW_00004002-RA GO:0010997|GO:0097027|GO:1904668

CPIW_00004003-RA GO:0003824|GO:0008152

CPIW_00004003-RA GO:0003987|GO:0016208|GO:0019427

CPIW_00004004-RA GO:0006506|GO:0016021|GO:0016758

CPIW_00004005-RA GO:0004360|GO:1901137

CPIW_00004005-RA GO:0097367|GO:1901135

CPIW_00004006-RA GO:0005515

CPIW_00004007-RA GO:0016787

CPIW_00004016-RA GO:0003824|GO:0046872

I want to split them so that each line has one ID and one GO term, like this:

CPIW_00004002-RA GO:0005515

CPIW_00004002-RA GO:0010997

CPIW_00004002-RA GO:0097027

CPIW_00004002-RA GO:1904668

CPIW_00004003-RA GO:0003824

CPIW_00004003-RA GO:0008152

How can I write a script to do this?

Thanks!

Alexie

linux perl python file format • 1.0k views

This is a programming question, not a bioinformatics one. Ask on StackOverflow.

4.4 years ago
 awk '{n=split($2,a,/\|/); for(i=1;i<=n;++i) print $1,a[i];}' input.txt
4.4 years ago
st.ph.n ★ 2.6k
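#!/usr/bin/env python3
import sys

# Usage: script.py input.txt (columns are assumed to be tab-separated)
with open(sys.argv[1]) as f:
    for line in f:
        if not line.strip():
            continue  # skip blank lines
        gene_id, go_terms = line.rstrip('\n').split('\t')[:2]
        # print one "ID<TAB>GO term" line per term
        for term in go_terms.split('|'):
            print(gene_id, term, sep='\t')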
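
If the columns turn out to be separated by spaces rather than tabs (the sample as posted does not make this clear), a whitespace-tolerant variant may be more convenient. The following is only a sketch under that assumption; the function name `split_go_terms` is made up for illustration and does not come from the thread.

```python
import io


def split_go_terms(lines):
    """Yield (gene_id, go_term) pairs, one pair per GO term."""
    for line in lines:
        fields = line.split()  # split on any run of spaces or tabs
        if len(fields) < 2:
            continue  # skip blank or incomplete lines
        gene_id, go_terms = fields[0], fields[1]
        for term in go_terms.split("|"):
            yield gene_id, term


if __name__ == "__main__":
    # Two records taken from the question, with a blank line in between
    sample = io.StringIO(
        "CPIW_00004002-RA GO:0005515\n"
        "\n"
        "CPIW_00004002-RA GO:0010997|GO:0097027|GO:1904668\n"
    )
    for gene_id, term in split_go_terms(sample):
        print(gene_id, term, sep="\t")
```

To run it on a real file, pass an open file handle instead of the `io.StringIO` sample, for example `split_go_terms(open("input.txt"))`.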
4.4 years ago
Chirag Parsania ★ 1.9k

Using R

library("psych") ## read data from clipboard 
dat <- read.clipboard(header = F)
dat <- apply(dat,2,as.character)
out <- apply(dat,1,function(elem){
        geneId <- elem[1]
        goIds <- elem[2]
        splitted <- unlist(strsplit(goIds, '\\|'))
        return(cbind(geneID=rep(geneId,length(splitted)), splitted))
})

do.call("rbind",out)

EDIT: A second alternative, in only a few lines

library("psych") ## read data from clipboard 
dat <- read.clipboard(header = F)
out <- lapply(as.character(dat$V2),function(elem){unlist(strsplit(elem,"\\|"))})
cbind(rep(as.character(dat$V1),lengths(out)), unlist(out))