Removing probe IDs that match more than one AGI ID
8.2 years ago
zizigolu ★ 4.3k

Hi

I have this list of gene IDs

Affy ID         AGI      
244901_at       ATMG00640      
244902_at       ATMG00650      
244903_at       ATMG00660      
244904_at       ATMG00670      
244905_at       ATMG00680      
244906_at       ATMG00690      
244907_at       ATMG00710      
244908_at       ATMG00720      
244909_at       ATMG00740;AT2G07686      
244910_s_at     ATMG00750;AT2G07686      
244911_at       ATMG00820      
244912_at       AT2G07783;ATMG00830      
244913_at       ATMG00840;AT2G07682      
244914_at       ATMG00850;AT2G07682      
244915_s_at     ATMG00860;AT2G07682      
244916_at       ATMG00880;ATMG00870;AT2G07682      
244917_at       ATMG00880;ATMG00870;AT2G07682      
244918_at       ATMG00890      
244919_at       AT2G07768;ATMG00960

As you can see, some _at IDs match more than one AGI ID. How can I remove those _at IDs (the ones that match more than one AGI ID) from my list?

Thank you

gene R • 1.3k views

Do you only want to remove the duplicate ATM IDs and keep the _at ID? The solution below would remove the whole line for that _at ID.


Thank you. I want to remove every _at ID that matches more than one AGI ID.

8.2 years ago
Alopex • 0

Does it need to be done in R?

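In Excel: =FIND(";",B2) returns a number for rows whose AGI cell contains a semicolon and a #VALUE! error for the rest, so you can filter on that helper column and delete the rows that show a number.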

Python:

# Copy only the lines without a ';', i.e. probes mapped to a single AGI ID
with open('inputfile.csv', 'r') as r, open('outputfile.csv', 'w') as w:
    for line in r:
        if ';' not in line:
            w.write(line)
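
The loop above drops any line that contains a semicolon anywhere. If other columns could also contain ';', a slightly stricter variant checks only the AGI field. This is a minimal sketch, assuming the file is whitespace-separated as in the question and that the AGI IDs are the last field on each line; the function name keep_single_agi is just an illustrative choice.

```python
def keep_single_agi(lines):
    """Return the lines whose last field (assumed to be the AGI column)
    holds a single ID, i.e. contains no ';' separator."""
    kept = []
    for line in lines:
        fields = line.split()  # split on any run of whitespace
        if not fields:
            continue  # drop blank lines
        if ';' not in fields[-1]:
            kept.append(line)
    return kept


# A few rows copied from the question, including the header line
sample = [
    "Affy ID         AGI",
    "244908_at       ATMG00720",
    "244909_at       ATMG00740;AT2G07686",
    "244918_at       ATMG00890",
    "244919_at       AT2G07768;ATMG00960",
]

for row in keep_single_agi(sample):
    print(row)
```

With the sample rows, only the header, 244908_at and 244918_at are printed. To run it on a file, pass the open file object (or its list of lines) to keep_single_agi and write the returned lines to the output file.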