Splitting NCBI Accession ID
6.9 years ago
Promi ▴ 10

Hi,

I have a file 'ids.txt' containing IDs like this:

gi|78062356|ref|YP_372264.1|
gi|206563435|ref|YP_002234198.1|
gi|402568881|ref|YP_006618225.1|
gi|54024439|ref|YP_118681.1|
gi|146275970|ref|YP_001166130.1|

How can I split them into two columns, one with the GI numbers and the other with the ref accessions, as shown below?

78062356     YP_372264.1
206563435    YP_002234198.1
402568881    YP_006618225.1
54024439     YP_118681.1
146275970    YP_001166130.1

Preferably in R or Python.

blast ncbi accessionid R Python • 1.4k views
6.9 years ago
Sej Modha 5.3k

Python version:

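# Print the GI number and the accession as two tab-separated columns.
with open('ids.txt') as f1:
    for line in f1:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        old = line.split('|')  # ['gi', '78062356', 'ref', 'YP_372264.1', '']
        gi = old[1]
        acc = old[3]
        print(gi + '\t' + acc)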
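
If the two columns should end up in a file rather than on the screen, a minimal sketch along the same lines could look like the following. The names parse_id and split_ids and the output file name ids_split.tsv are placeholders invented for illustration, and the code assumes every non-blank line has the form gi|<number>|ref|<accession>|.

```python
def parse_id(line):
    """Return (gi, accession) from a line like 'gi|78062356|ref|YP_372264.1|'."""
    fields = line.strip().split("|")
    # fields looks like ['gi', '78062356', 'ref', 'YP_372264.1', '']
    return fields[1], fields[3]


def split_ids(in_path, out_path):
    """Write one 'gi<TAB>accession' row per non-blank input line."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            gi, acc = parse_id(line)
            dst.write(f"{gi}\t{acc}\n")


# Example call (hypothetical output file name):
# split_ids("ids.txt", "ids_split.tsv")
```

Keeping the parsing in its own small function makes it easy to check on a single ID before running it over the whole file.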
6.9 years ago
h.mon 35k
cut -d'|' -f2,4 ids.txt | tr '|' '\t'

Worked perfectly. Thanks!

6.9 years ago
GenoMax 141k

Preferably in R or Python.

When you specify a requirement like this, you should also say whether this is an assignment question. If it is not, this can easily be done in the shell (awk -F '|' '{print $2"\t"$4}' your_file > new_file).


Worked perfectly! Thanks!

Sorry, I didn't understand what you meant about the assignment question.


People sometimes ask for solutions in a specific language if they are looking for answers to assignment/homework questions.

6.9 years ago
Sej Modha 5.3k

Simple sed solution:

sed -e 's/^gi|//;s/|[a-z]*|/\t/;s/|$//' inputfile.txt
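
Since the question asked for Python, the same pattern-based idea can also be sketched with the re module. This is only an illustration: the names ID_PATTERN and to_columns are made up here, and the pattern assumes IDs of the form gi|<digits>|<lowercase tag>|<accession>| as in the question.

```python
import re

# gi|<digits>|<db tag>|<accession>| ; the db tag (e.g. 'ref') is matched but not captured,
# and the trailing '|' is optional.
ID_PATTERN = re.compile(r"^gi\|(\d+)\|[a-z]+\|([^|]+)\|?$")


def to_columns(line):
    """Return 'gi<TAB>accession' for one ID line, or None if it does not match."""
    match = ID_PATTERN.match(line.strip())
    if match is None:
        return None
    return "\t".join(match.groups())


print(to_columns("gi|78062356|ref|YP_372264.1|"))
```

Unlike plain splitting on '|', this returns None for lines that do not look like a GI-style ID, so malformed lines can be spotted instead of raising an IndexError.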