Could this function that maps gene pairs to the STRING database be made more efficient?
6.1 years ago
arronar ▴ 280

Hello.

I'm trying to map PPI information from STRING DB onto microarray data that have been converted into an adjacency matrix using the bicor() function from the WGCNA package.

To do that, I check each pair of genes and send requests to STRING's REST API.

Here you can see the actual code of my function:

mapGenes2PPI <- function( adjacency_matrix )
{
  require(jsonlite)

  genes_vector <- row.names( adjacency_matrix )

  # Create a mask matrix that will contain 0 when two genes do not interact and 1 when they do
  mask_matrix <- matrix( ncol = ncol(adjacency_matrix) , nrow = nrow(adjacency_matrix) )
  colnames(mask_matrix) <- colnames(adjacency_matrix)
  row.names(mask_matrix) <- row.names(adjacency_matrix)

  for ( gene_index in seq_along(genes_vector) )
  { 
    gene_name_one <- genes_vector[ gene_index ] 
    print( paste0("http://string-db.org/api/json/resolve?species=9606&format=only-ids&identifier=",gene_name_one) ) 
    gene_id_one <-  fromJSON( paste0("http://string-db.org/api/json/resolve?species=9606&format=only-ids&identifier=",gene_name_one) )

    for ( column_index in seq_len(ncol(adjacency_matrix)) )
    {
      gene_name_two <- colnames(adjacency_matrix)[ column_index ]
      print( paste0("http://string-db.org/api/json/resolve?species=9606&format=only-ids&identifier=",gene_name_two) )
      gene_id_two <- fromJSON( paste0("http://string-db.org/api/json/resolve?species=9606&format=only-ids&identifier=",gene_name_two) )

      # Because one gene can have more than one STRING id, we build a table with all possible
      # combinations of the ids of gene one and the ids of gene two
      combs <- expand.grid(gene_id_one,gene_id_two)

      if( gene_name_one == gene_name_two )
        next
      else
      {
        # For each combination (row) in the combs table
        for ( row in seq_len(nrow(combs)) )
        {
          print( paste0("[+] Checking ", gene_name_one," [", gene_index, "]" , " ( ", combs[row,1], " ) with ", gene_name_two, " [", column_index, "]", " ( ", combs[row,2], " ) ") )

          print( paste0("http://string-db.org/api/json/interactionsList?identifiers=", combs[row,1], "%0D", combs[row,2] ) ) 
          result <- fromJSON( paste0("http://string-db.org/api/json/interactionsList?identifiers=", combs[row,1], "%0D", combs[row,2] ) )

          if( length(result) != 0 )
          {
            print( "Genes interact with each other" )
            print( result )
            mask_matrix[ gene_index , column_index ] <- 1
            break
          }
          else
            mask_matrix[ gene_index , column_index ] <- 0

          # Add some delay
          Sys.sleep(1)
        }

      }
    }
  }

  mask_matrix
}

This function returns a mask_matrix that has the same dimensions as the adjacency matrix, with a value of 1 where an interaction exists and 0 otherwise.
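For comparison, a mask of this kind could in principle be filled without any per-pair request, if the interactions were already available locally as a two-column table of gene names. The sketch below only illustrates that idea under that assumption: mask_from_edges and the edges table are hypothetical names, not part of the function above or of any STRING package, and how such a table would be obtained is not shown.

```r
# Hypothetical helper: build a symmetric 0/1 mask from an edge list.
# `edges` is assumed to be a data frame whose first two columns hold gene names.
mask_from_edges <- function(gene_names, edges) {
  n <- length(gene_names)
  mask <- matrix(0L, nrow = n, ncol = n,
                 dimnames = list(gene_names, gene_names))

  # Translate gene names into row/column positions (NA if a gene is not in the matrix)
  i <- match(edges[[1]], gene_names)
  j <- match(edges[[2]], gene_names)

  # Keep only edges where both genes are present and are not the same gene
  keep <- !is.na(i) & !is.na(j) & i != j

  # Matrix indexing sets all interacting cells at once, in both directions
  mask[cbind(i[keep], j[keep])] <- 1L
  mask[cbind(j[keep], i[keep])] <- 1L
  mask
}

# Toy data, made up for illustration only
toy_genes <- c("A", "B", "C")
toy_edges <- data.frame(gene_one = c("A", "C"), gene_two = c("B", "X"),
                        stringsAsFactors = FALSE)
toy_mask <- mask_from_edges(toy_genes, toy_edges)
print(toy_mask)
```

In this sketch the diagonal is set to 0, and edges that mention a gene missing from the matrix (like "X" in the toy data) are simply ignored.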

The problem is that it has to check a 20,000 × 20,000 matrix, which is enormous, so it takes a very long time. Do you think there is a better, more efficient way to do this calculation?
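To put a rough number on the workload, here is a back-of-the-envelope estimate in R. It assumes about one second per pair check (the Sys.sleep(1) delay in the function) and ignores network latency and genes with several STRING ids, so treat it as an order-of-magnitude figure only. The "symmetric" variant additionally assumes that an interaction between A and B is the same as one between B and A, so that only one triangle of the matrix would need checking.

```r
n_genes <- 20000  # matrix dimension mentioned above

# Requests made by the function as written
resolve_requests_current <- n_genes + n_genes * n_genes  # one per row gene, plus one per cell
pair_checks_current      <- n_genes * (n_genes - 1)      # every off-diagonal cell, at least once

# If each name were resolved only once and only one triangle were checked
resolve_requests_once <- n_genes
pair_checks_symmetric <- choose(n_genes, 2)

# Rough runtime at an assumed one second per pair check
seconds_per_year <- 60 * 60 * 24 * 365
years_current    <- pair_checks_current / seconds_per_year
years_symmetric  <- pair_checks_symmetric / seconds_per_year

print(c(current = years_current, symmetric = years_symmetric))
```

Even the halved figure still amounts to years of requests, which is why I suspect that checking pairs one at a time is the wrong design for a matrix of this size.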

microarray annotation R STRINGDb