Question: Could this function that maps gene pairs to STRING Database be more efficient?
Asked 12 months ago by arronar (Austria):

Hello.

I'm trying to map PPI information from STRING DB onto microarray data that have been converted into an adjacency matrix using the bicor() function from the WGCNA package.

To do that, I check each pair of genes and send requests to STRING's REST API.

Here is the actual code of my function:

mapGenes2PPI <- function( adjacency_matrix )
{
  require(jsonlite)

  genes_vector <- row.names( adjacency_matrix )

  # Create a mask matrix that will contain 0 when two genes do not interact and 1 when they do
  mask_matrix <- matrix( ncol = ncol(adjacency_matrix) , nrow = nrow(adjacency_matrix) )
  colnames(mask_matrix) <- colnames(adjacency_matrix)
  row.names(mask_matrix) <- row.names(adjacency_matrix)

  for ( gene_index in 1:length(genes_vector) )
  { 
    gene_name_one <- genes_vector[ gene_index ] 
    print( paste0("http://string-db.org/api/json/resolve?species=9606&format=only-ids&identifier=",gene_name_one) ) 
    gene_id_one <-  fromJSON( paste0("http://string-db.org/api/json/resolve?species=9606&format=only-ids&identifier=",gene_name_one) )

    for ( column_index in 1:ncol(adjacency_matrix) )
    {
      gene_name_two <- colnames(adjacency_matrix)[ column_index ]
      print( paste0("http://string-db.org/api/json/resolve?species=9606&format=only-ids&identifier=",gene_name_two) )
      gene_id_two <- fromJSON( paste0("http://string-db.org/api/json/resolve?species=9606&format=only-ids&identifier=",gene_name_two) )

      # Because one gene could have one or more STRING id, we create a table with all possible combinations
      # between the ids of gene_one_id and gene_two_id
      combs <- expand.grid(gene_id_one,gene_id_two)

      if( gene_name_one == gene_name_two )
        next()
      else
      {
        # For each combination on combs matrix
        for ( row in 1:nrow(combs) )
        {
          print( paste0("[+] Checking ", gene_name_one," [", gene_index, "]" , " ( ", combs[row,1], " ) with ", gene_name_two, " [", column_index, "]", " ( ", combs[row,2], " ) ") )

          print( paste0("http://string-db.org/api/json/interactionsList?identifiers=", combs[row,1], "%0D", combs[row,2] ) ) 
          result <- fromJSON( paste0("http://string-db.org/api/json/interactionsList?identifiers=", combs[row,1], "%0D", combs[row,2] ) )

          if( length(result) != 0 )
          {
            print( "Genes interact with each other" )
            print( result )
            mask_matrix[ gene_index , column_index ] <- 1
            break
          }
          else
            mask_matrix[ gene_index , column_index ] <- 0

          # Add some delay
          Sys.sleep(1)
        }

      }
    }
  }

  mask_matrix
}

This function returns a mask_matrix with the same dimensions as the adjacency matrix, holding 1 where an interaction exists and 0 otherwise (the diagonal, where a gene is paired with itself, is skipped and stays NA).
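
Since an interaction between A and B is the same as one between B and A, the mask should be symmetric, and each gene's STRING ids only need to be resolved once. Below is a minimal sketch of that idea, not a tested replacement for the function above. It assumes the adjacency matrix is square with identical row and column names. `build_mask`, `resolve_ids` and `ids_interact` are hypothetical names: the last two stand in for the two STRING requests and are replaced by made-up toy data here, so the sketch runs without network access.

```r
# Sketch only: resolve_ids() and ids_interact() are hypothetical stand-ins
# for the "resolve" and "interactionsList" requests used in mapGenes2PPI().
build_mask <- function(genes, resolve_ids, ids_interact) {
  n <- length(genes)
  # Diagonal entries are never assigned and stay NA, as in the original function.
  mask <- matrix(NA_integer_, nrow = n, ncol = n, dimnames = list(genes, genes))

  # Resolve every gene exactly once instead of once per pair.
  ids <- lapply(genes, resolve_ids)

  # Visit each unordered pair (i < j) once and mirror the result.
  for (i in seq_len(max(n - 1, 0))) {
    for (j in (i + 1):n) {
      hit <- FALSE
      # A gene may map to several ids: stop at the first interacting combination.
      for (a in ids[[i]]) {
        for (b in ids[[j]]) {
          if (ids_interact(a, b)) {
            hit <- TRUE
            break
          }
        }
        if (hit) break
      }
      mask[i, j] <- mask[j, i] <- as.integer(hit)
    }
  }
  mask
}

# Toy data (made up): geneB has two ids, and only id1-id2b interact.
toy_ids   <- list(geneA = "id1", geneB = c("id2", "id2b"), geneC = "id3")
toy_edges <- c("id1|id2b")

mask <- build_mask(
  names(toy_ids),
  resolve_ids  = function(g) toy_ids[[g]],
  ids_interact = function(a, b) paste(sort(c(a, b)), collapse = "|") %in% toy_edges
)
print(mask)
```

This roughly halves the number of pair checks and cuts the resolve requests from one per pair to one per gene, but the number of pair checks still grows quadratically with the number of genes.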

The problem is that it has to check a 20,000 x 20,000 matrix, which is enormous and takes a very long time. Do you think there is a better, more efficient way to do such a calculation?
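
For a rough sense of scale, the arithmetic can be sketched as follows. It assumes about one second per checked pair, which is what Sys.sleep(1) alone costs for every pair without a hit. Network latency and genes with several STRING ids are ignored, so the real figure is likely higher. The variable names are only illustrative.

```r
n_genes <- 20000

# Pairs visited by the nested loops (the diagonal is skipped via next).
ordered_pairs <- n_genes * (n_genes - 1)

# Pairs left if A-B and B-A are treated as the same pair.
unordered_pairs <- choose(n_genes, 2)

# Assumption: roughly 1 second per checked pair.
seconds_per_pair <- 1
seconds_per_year <- 60 * 60 * 24 * 365
years_ordered    <- ordered_pairs   * seconds_per_pair / seconds_per_year
years_unordered  <- unordered_pairs * seconds_per_pair / seconds_per_year

# The inner loop re-resolves gene_name_two on every iteration,
# on top of one resolve request per row for gene_name_one.
resolve_calls_now  <- n_genes + n_genes * n_genes
resolve_calls_once <- n_genes

cat(sprintf("%.1f years for all ordered pairs, %.1f years for unordered pairs\n",
            years_ordered, years_unordered))
cat(sprintf("resolve requests: %.0f now vs %.0f if each gene is resolved once\n",
            resolve_calls_now, resolve_calls_once))
```

Under these assumptions the pairwise approach needs on the order of a decade of wall-clock time, which suggests that any per-pair request scheme is impractical at this size.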
