Question

What do duplicated interactions look like in HPRD data?

0

6.9 years ago

Chaimaa ▴ 260

Hi all, I have downloaded HPRD Release 9, which has 39,240 interactions, and I want to delete the self-interactions and duplicated interactions programmatically, but I don't know what duplicated interactions look like in this data.

By self-interactions I mean entries like this: FES FES. But what do duplicated interactions look like? I would appreciate any help!

HPRD • 1.3k views

ADD COMMENT • link 6.9 years ago by Chaimaa ▴ 260

score 1 · Answer 1 · 2017-05-19

1

6.9 years ago

Jean-Karim Heriche 27k

The way to go is to map all identifiers used by HPRD to the same annotated reference genome, e.g. EnsEMBL, so that each protein has the same ID throughout the data, then look for multiple occurrences of ID1-ID2 and ID2-ID1. Note that HPRD data is >7 years old and some of the identifiers used may be obsolete.
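
As a rough illustration of the "ID1-ID2 and ID2-ID1" check, here is a minimal Python sketch. It assumes the identifiers have already been mapped to one consistent namespace and that the interactions are available as simple (ID1, ID2) pairs; the function name `clean_interactions` and the sample pairs are made up for this example and are not taken from HPRD.

```python
def clean_interactions(pairs):
    """Drop self-interactions and collapse A-B / B-A duplicates.

    Assumes every protein is already represented by a single, consistent ID.
    """
    seen = set()
    unique = []
    for a, b in pairs:
        if a == b:
            # Self-interaction such as FES-FES: skip it.
            continue
        key = frozenset((a, b))  # unordered pair, so A-B equals B-A
        if key in seen:
            continue
        seen.add(key)
        unique.append((a, b))  # keep the first orientation encountered
    return unique


# Illustrative pairs only, not real HPRD records.
pairs = [("FES", "FES"), ("A", "B"), ("B", "A"), ("A", "B"), ("A", "C")]
print(clean_interactions(pairs))
```

Using a `frozenset` as the key treats the interaction as undirected; if direction or experiment type matters for your question, the key would need to include that information.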

ADD COMMENT • link 6.9 years ago by Jean-Karim Heriche 27k

0

@Jean-Karim Heriche Does HPRD Release 9 contain duplicated interactions of this form, like {'A' 'B'} and {'B' 'A'}?

ADD REPLY • link 6.9 years ago by Chaimaa ▴ 260

0

I don't remember and I don't have the data anymore. Anyway, my scripts were always set up to remove duplicates unless I cared about the distinction (e.g. different types of experiments). You don't say what you're trying to do, but if you want a more comprehensive set of human protein interactions, I would suggest using iRefIndex.

ADD REPLY • link 6.9 years ago by Jean-Karim Heriche 27k

0

I have HPRD Release 9 as a text file and I want to remove the duplicated interactions from it.

ADD REPLY • link 6.9 years ago by Chaimaa ▴ 260

0

I understood that you have HPRD data and want to remove duplicates. I already answered this: just write your data processing script in such a way that if there are duplicates, it deals with them in the way you want. If you just want to know whether or not there are duplicates, just write a simple script to find out. By "what you're trying to do", I was referring to what biological question you're trying to answer and wondering whether HPRD is the best data set for this.
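
A "simple script to find out" could look something like the sketch below. It assumes a tab-delimited text file with the two interactor IDs in known columns; the column positions (`col_a`, `col_b`), the helper names and the sample lines are assumptions for illustration, so check them against the actual file layout before relying on the result.

```python
from collections import Counter


def read_pairs(lines, col_a=0, col_b=1, sep="\t"):
    """Yield (ID1, ID2) from delimited lines.

    col_a and col_b are hypothetical column positions; adjust to the real file.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split(sep)
        yield fields[col_a], fields[col_b]


def duplicate_report(pairs):
    """Return {(ID1, ID2): count} for unordered pairs seen more than once."""
    counts = Counter(tuple(sorted(pair)) for pair in pairs)
    return {pair: n for pair, n in counts.items() if n > 1}


# Made-up lines standing in for the downloaded text file.
sample = ["A\tB\n", "B\tA\n", "A\tC\n", "FES\tFES\n", "A\tB\n"]
report = duplicate_report(read_pairs(sample))
print(report)
```

For a real file, the list `sample` would be replaced by an open file handle. An empty report would mean no pair occurs more than once in either orientation.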

ADD REPLY • link 6.9 years ago by Jean-Karim Heriche 27k

0

I want to build a network by linking the different lists of genes I found, based on any human data (it must be human data). Can iRefIndex do this job? Is it human data?

ADD REPLY • link 6.9 years ago by Chaimaa ▴ 260

1

iRefIndex is a compilation of several protein-protein interaction databases and so includes human data. Read the paper to understand how it's done. To get human data only, just filter on the taxon ID in the relevant columns. So if you need to look for interactions involving genes in your lists, then you're better off using iRefIndex (or any other compilation of multiple data sources) than just a single (outdated) data source.

To access the iRefIndex data, you can also use the iRefR package for R and there's a plug-in for Cytoscape 2.8. Finally there's also a web interface at iRefWeb.
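
The taxon filter mentioned above might be sketched in Python as follows. The NCBI taxonomy ID for human is 9606; everything else here is an assumption for illustration: the column names `taxa`, `taxb`, `uidA` and `uidB`, the `taxid:9606(Homo sapiens)` field format, and the sample rows. Check the header of the file you download and adjust accordingly.

```python
import csv
import io

HUMAN_TAXID = "9606"  # NCBI taxonomy ID for Homo sapiens


def is_human(tax_field):
    """Accept values such as 'taxid:9606' or 'taxid:9606(Homo sapiens)'."""
    value = tax_field.split(":", 1)[-1]
    return value.split("(", 1)[0].strip() == HUMAN_TAXID


def human_only(handle, tax_col_a="taxa", tax_col_b="taxb"):
    """Keep rows where both interactors are human.

    tax_col_a and tax_col_b are assumed column names, not confirmed ones.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        row for row in reader
        if is_human(row[tax_col_a]) and is_human(row[tax_col_b])
    ]


# Made-up rows standing in for a tab-delimited interaction file.
sample = io.StringIO(
    "uidA\tuidB\ttaxa\ttaxb\n"
    "P1\tP2\ttaxid:9606(Homo sapiens)\ttaxid:9606(Homo sapiens)\n"
    "P3\tP4\ttaxid:10090(Mus musculus)\ttaxid:9606(Homo sapiens)\n"
)
rows = human_only(sample)
print([(r["uidA"], r["uidB"]) for r in rows])
```

Requiring both interactors to be human drops cross-species pairs; relax the condition if those are of interest.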

ADD REPLY • link 6.9 years ago by Jean-Karim Heriche 27k