MIPS data access
9.2 years ago
deepthi_vr • 0

I am working on a protein-protein interaction (PPI) network clustering project. To test the results, I want to download a protein-protein interaction data set from MIPS (Munich Information Center for Protein Sequences). I've downloaded data from http://mips.helmholtz-muenchen.de/proj/ppi/.

But it is in PSI-MI format. Can anyone help me extract the data set from this format? Also, how can I get the MIPS benchmark set? The old links come back as "not found". Please help me.

MIPS-PPI-dataset • 3.2k views
9.2 years ago

The format is XML, so you could parse it with whatever tool you normally use for parsing XML. Here is some Perl code I've been using:

use strict;
use warnings;
use XML::Simple;

my $MIPS_file = $ARGV[0] or die "Usage: $0 <MIPS PSI-MI XML file>\n";
my $xml = XML::Simple->new();
# ForceArray returns an array ref even when a list holds a single element.
my $data = $xml->XMLin(
    $MIPS_file,
    ForceArray => ['interaction', 'proteinParticipant'],
);
my $intList = $data->{'entry'}->{'interactionList'}->{'interaction'};
foreach my $int (@{$intList}) {
    # Detection method of the experiment (available here, not printed below).
    my $experiment_type = $int->{'experimentList'}->{'experimentDescription'}->{'interactionDetection'}->{'names'}->{'shortLabel'};
    my $partList = $int->{'participantList'}->{'proteinParticipant'};
    my ($p1, $p2);
    foreach my $protPart (@{$partList}) {
        my $interactor = $protPart->{'proteinInteractor'};
        # Select human proteins (NCBI taxonomy ID 9606).
        next unless ($interactor->{'organism'}->{'ncbiTaxId'} // '') eq '9606';
        if (!$p1) {
            $p1 = $interactor->{'xref'}->{'primaryRef'}->{'id'};
        }
        else {
            $p2 = $interactor->{'xref'}->{'primaryRef'}->{'id'};
        }
    }
    # Print the pair tab-separated; skip interactions without two human proteins.
    print "$p1\t$p2\n" if defined $p1 && defined $p2;
}

I am not sure which benchmark you're referring to. There used to be a file of protein complexes that many people used as a reference, but I don't think it is a good dataset for benchmarking protein interaction clustering algorithms because large housekeeping complexes (e.g. ribosome, polymerases) are overrepresented. As an alternative, you could use Reactome, which also has annotated protein complexes. Keep in mind, though, that what some biologists view as one complex others will see as two, so you should choose a reference dataset that matches the level of granularity you want to achieve with your clustering.


Many of the clustering papers I've consulted use the MIPS data set, but the site they reference for the MIPS data is no longer reachable. So I downloaded data from http://mips.helmholtz-muenchen.de/proj/ppi/, although I don't know whether it is the correct data. I specifically need the PPI data set for Saccharomyces cerevisiae. Can you please help me with that?


The link you got points to mammalian protein data. You won't find yeast proteins in there. I think the MIPS data are not maintained anymore. I suggest you try an up-to-date, well-maintained database like IntAct. You can download the S. cerevisiae interactions from their FTP site.


Thank you very much for that information. I have to compare my results with existing papers, and most of them use MIPS, DIP or BioGRID data, which is why I went for those. I've downloaded the yeast data set from DIP; the problem is that it has DIP IDs but not the common names or ORF names. I'm planning to compare my results against the CYC2008 benchmark, which uses common names/ORF names for the proteins in its complexes. Could you please help me with converting DIP IDs to common names? Thank you once again for your help.


The files available in the download section of DIP should already have gene symbols and gene names. They also contain RefSeq and UniProt IDs, so you could collect these and use them as input to Ensembl's BioMart to get other names or identifiers not present in the files.


The recently dated download doesn't have gene symbols. It has DIP IDs and RefSeq/UniProt IDs (but not for every interaction). The only identifier every interaction has is the DIP ID.


Are you sure you have the right file? Here is what the first S. cerevisiae interactions in the MITAB file look like (first two columns only):

ID interactor A    ID interactor B
DIP-328N    DIP-232N|uniprotkb:Q07812
DIP-1048N|refseq:NP_002871|uniprotkb:P04049    DIP-1043N|refseq:NP_000624|uniprotkb:P10415
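
The UniProt accessions in these two columns can be collected with a few lines of Perl before passing them to a mapping service such as BioMart. This is only a minimal sketch: it assumes the columns are tab-separated and that the `uniprotkb:` prefix looks as in the excerpt above, and the helper names `xref_from_field` and `collect_uniprot_ids` are made up for illustration.

```perl
use strict;
use warnings;

# Return the accession stored under a database prefix in one MITAB ID field,
# e.g. "DIP-232N|uniprotkb:Q07812" with "uniprotkb" gives "Q07812".
# Returns undef when the field has no entry for that database.
sub xref_from_field {
    my ($field, $db) = @_;
    for my $part (split /\|/, $field) {
        return $1 if $part =~ /^\Q$db\E:(.+)$/;
    }
    return;
}

# Collect the unique UniProt accessions found in the first two columns
# of tab-separated MITAB lines, skipping the header line.
sub collect_uniprot_ids {
    my @lines = @_;
    my %seen;
    for my $line (@lines) {
        chomp $line;
        next if $line =~ /^#?ID interactor/;    # header line
        my @cols = split /\t/, $line;
        for my $field (grep { defined } @cols[0, 1]) {
            my $acc = xref_from_field($field, 'uniprotkb');
            $seen{$acc} = 1 if defined $acc;
        }
    }
    return sort keys %seen;
}

# Sample rows copied from the excerpt above (tab-separated here).
my @sample = (
    "ID interactor A\tID interactor B",
    "DIP-328N\tDIP-232N|uniprotkb:Q07812",
    "DIP-1048N|refseq:NP_002871|uniprotkb:P04049\tDIP-1043N|refseq:NP_000624|uniprotkb:P10415",
);
print "$_\n" for collect_uniprot_ids(@sample);
```

Interactors such as DIP-328N, which carry only a DIP ID, yield nothing here, so they would still need the names from the XML file.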

The PSI-MI (XML) file has gene names e.g.:

<interactor id="671">
<names>
<shortLabel>ILV6</shortLabel>
<fullName>Acetolactate synthase small subunit, mitochondrial precursor</fullName>
</names>
<xref>
<primaryRef db="dip" dbAc="MI:0465" id="DIP-671N" refType="identity" refTypeAc="MI:0356"/>
<secondaryRef db="refseq" dbAc="MI:0481" id="NP_009918" refType="identity" refTypeAc="MI:0356"/>
<secondaryRef db="uniprot knowledge base" dbAc="MI:0486" id="P25605" refType="identity" refTypeAc="MI:0356"/>
<secondaryRef db="entrez gene/locuslink" dbAc="MI:0477" id="850348" refType="gene product" refTypeAc="MI:0251"/>
</xref>
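
A lookup table from DIP ID to gene name can be built from these `<interactor>` records. The following is a rough sketch rather than a tested DIP parser: it assumes the records sit under `<entrySet>/<entry>/<interactorList>` and that the gene name is the `<shortLabel>`, as in the excerpt above. The function name `dip_to_gene_name` and the trimmed sample document are illustrative only.

```perl
use strict;
use warnings;
use XML::Simple;

# Build a hash ref mapping DIP identifiers (e.g. "DIP-671N") to the
# <shortLabel> of each <interactor>. Accepts a file name or an XML string.
sub dip_to_gene_name {
    my ($xml_source) = @_;
    my $data = XMLin(
        $xml_source,
        ForceArray => ['entry', 'interactor'],    # always get array refs
        KeyAttr    => [],                         # do not fold lists on "id"
    );
    my %name_for;
    for my $entry (@{ $data->{entry} || [] }) {
        for my $interactor (@{ $entry->{interactorList}{interactor} || [] }) {
            my $dip_id = $interactor->{xref}{primaryRef}{id};
            my $label  = $interactor->{names}{shortLabel};
            $name_for{$dip_id} = $label if defined $dip_id && defined $label;
        }
    }
    return \%name_for;
}

# Trimmed sample based on the excerpt above; the wrapper elements are assumed.
my $sample_xml = <<'XML';
<entrySet>
  <entry>
    <interactorList>
      <interactor id="671">
        <names>
          <shortLabel>ILV6</shortLabel>
          <fullName>Acetolactate synthase small subunit, mitochondrial precursor</fullName>
        </names>
        <xref>
          <primaryRef db="dip" dbAc="MI:0465" id="DIP-671N" refType="identity" refTypeAc="MI:0356"/>
          <secondaryRef db="uniprot knowledge base" dbAc="MI:0486" id="P25605" refType="identity" refTypeAc="MI:0356"/>
        </xref>
      </interactor>
    </interactorList>
  </entry>
</entrySet>
XML

my $name_for = dip_to_gene_name($sample_xml);
print "$_\t$name_for->{$_}\n" for sort keys %$name_for;
```

The resulting table could then be used to translate the DIP IDs in the tab file into the gene names that a benchmark such as CYC2008 expects.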

I am using the tab files, which don't have gene names. I will try the PSI-MI file now. Thank you very much.


I tried parsing the PSI-MI file with a Perl script, but it isn't working. Could you please give me a script that parses only the DIP PPI interactions? Thank you.


I tried this code, but I'm not getting any output. The argument is the DIP file, in MIF25 format.

use strict;
use warnings;
use XML::Simple;

my $DIP_file = $ARGV[0];
my $xml = XML::Simple->new();
my $data = $xml->XMLin("$DIP_file");
my $intList = $data->{'entrySet'}->{'entry'}->{'interactorList'}->{'interactor'};
print $intList;
foreach my $int (@{$intList})
 {
   print $int->{'names'}->{'shortLabel'}->text;
}

Could you please help me?


Interactions are in the <interactionList> section. Also, XML::Simple drops the root element (<entrySet>) by default, so the path starts at 'entry'. Try something like:

my $intList = $data->{'entry'}->{'interactionList'}->{'interaction'};
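
Putting the pieces together for a MIF25 file, a sketch could look like the one below. It rests on assumptions that are not confirmed in this thread: that the file uses the compact PSI-MI 2.5 layout, where each `<interaction>` lists `<participant>` elements whose `<interactorRef>` points at the numeric `id` of an `<interactor>`. If the `<interaction>` elements carry an `id` attribute, XML::Simple's default folding turns the list into a hash, which is why `KeyAttr => []` is set. The function name `interaction_partners` and the second interactor in the sample (PARTNER1, id 999) are invented for the example.

```perl
use strict;
use warnings;
use XML::Simple;

# Return one array ref of gene names per interaction.
sub interaction_partners {
    my ($xml_source) = @_;    # file name or XML string
    my $data = XMLin(
        $xml_source,
        ForceArray => [qw(entry interactor interaction participant)],
        KeyAttr    => [],     # keep lists as arrays, do not key them on "id"
    );
    my @interactions;
    for my $entry (@{ $data->{entry} || [] }) {
        # Map each numeric interactor id to its <shortLabel>.
        my %label_for;
        for my $interactor (@{ $entry->{interactorList}{interactor} || [] }) {
            $label_for{ $interactor->{id} } = $interactor->{names}{shortLabel};
        }
        for my $int (@{ $entry->{interactionList}{interaction} || [] }) {
            my @names;
            for my $part (@{ $int->{participantList}{participant} || [] }) {
                my $ref = $part->{interactorRef};
                next unless defined $ref;
                # Fall back to the raw id when no label is known.
                push @names, $label_for{$ref} // "id:$ref";
            }
            push @interactions, \@names;
        }
    }
    return @interactions;
}

# Invented two-protein sample in the assumed compact layout.
my $mif25_sample = <<'XML';
<entrySet>
  <entry>
    <interactorList>
      <interactor id="671"><names><shortLabel>ILV6</shortLabel></names></interactor>
      <interactor id="999"><names><shortLabel>PARTNER1</shortLabel></names></interactor>
    </interactorList>
    <interactionList>
      <interaction id="1">
        <participantList>
          <participant id="10"><interactorRef>671</interactorRef></participant>
          <participant id="11"><interactorRef>999</interactorRef></participant>
        </participantList>
      </interaction>
    </interactionList>
  </entry>
</entrySet>
XML

print join("\t", @$_), "\n" for interaction_partners($mif25_sample);
```

Note that `shortLabel` comes back from XML::Simple as a plain string, so it is printed directly; there is no `->text` method to call on it.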