Question

Could this function that maps gene pairs to STRING Database be more efficient?

6.1 years ago

arronar ▴ 280

Hello.

I'm trying to map protein-protein interaction (PPI) information from the STRING database onto microarray data that have been converted into an adjacency matrix using the bicor() function from the WGCNA package.

To do that, I check each pair of genes by sending requests to STRING's REST API.

Here you can see the actual code of my function:

mapGenes2PPI <- function( adjacency_matrix )
{
  require(jsonlite)

  genes_vector <- row.names( adjacency_matrix )

  # Create a mask matrix that will contain either 0 when not interacting or 1 when interaction happens
  mask_matrix <- matrix( ncol = ncol(adjacency_matrix) , nrow = nrow(adjacency_matrix) )
  colnames(mask_matrix) <- colnames(adjacency_matrix)
  row.names(mask_matrix) <- row.names(adjacency_matrix)

  for ( gene_index in 1:length(genes_vector) )
  { 
    gene_name_one <- genes_vector[ gene_index ] 
    print( paste0("http://string-db.org/api/json/resolve?species=9606&format=only-ids&identifier=",gene_name_one) ) 
    gene_id_one <-  fromJSON( paste0("http://string-db.org/api/json/resolve?species=9606&format=only-ids&identifier=",gene_name_one) )

    for ( column_index in 1:ncol(adjacency_matrix) )
    {
      gene_name_two <- colnames(adjacency_matrix)[ column_index ]
      print( paste0("http://string-db.org/api/json/resolve?species=9606&format=only-ids&identifier=",gene_name_two) )
      gene_id_two <- fromJSON( paste0("http://string-db.org/api/json/resolve?species=9606&format=only-ids&identifier=",gene_name_two) )

      # Because one gene could have one or more STRING id, we create a table with all possible combinations
      # between the ids of gene_one_id and gene_two_id
      combs <- expand.grid(gene_id_one,gene_id_two)

      if( gene_name_one == gene_name_two )
        next()
      else
      {
        # For each combination on combs matrix
        for ( row in 1:nrow(combs) )
        {
          print( paste0("[+] Checking ", gene_name_one," [", gene_index, "]" , " ( ", combs[row,1], " ) with ", gene_name_two, " [", column_index, "]", " ( ", combs[row,2], " ) ") )

          print( paste0("http://string-db.org/api/json/interactionsList?identifiers=", combs[row,1], "%0D", combs[row,2] ) ) 
          result <- fromJSON( paste0("http://string-db.org/api/json/interactionsList?identifiers=", combs[row,1], "%0D", combs[row,2] ) )

          if( length(result) != 0 )
          {
            print( "Genes interact with each other" )
            print( result )
            mask_matrix[ gene_index , column_index ] <- 1
            break
          }
          else
            mask_matrix[ gene_index , column_index ] <- 0

          # Add some delay
          Sys.sleep(1)
        }

      }
    }
  }

  mask_matrix
}

This function returns a mask_matrix with the same dimensions as the adjacency matrix, holding 1 where an interaction exists and 0 otherwise (self-pairs on the diagonal are skipped and stay NA).
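For comparison, a mask with the same meaning could be filled in one vectorized step if the interacting pairs were already available as a table, rather than tested one request at a time. The sketch below is only an illustration under that assumption: `build_mask_from_edges`, the `gene_a`/`gene_b` column names, and the toy data are all hypothetical, and how such a table would be obtained from STRING is not shown. It also assumes the row and column names of the adjacency matrix are the same gene list, and it sets the diagonal to 0 instead of leaving it NA.

```r
# Hypothetical helper: build the 0/1 mask from a table of interacting pairs
# instead of testing each pair with a separate request.
build_mask_from_edges <- function(genes, edges) {
  n <- length(genes)
  mask <- matrix(0L, nrow = n, ncol = n, dimnames = list(genes, genes))

  # Map gene names to matrix positions; genes not in the matrix give NA
  i <- match(edges$gene_a, genes)
  j <- match(edges$gene_b, genes)
  keep <- !is.na(i) & !is.na(j) & i != j

  # Two-column matrix indexing sets all interacting cells at once
  mask[cbind(i[keep], j[keep])] <- 1L
  mask[cbind(j[keep], i[keep])] <- 1L  # assumption: interactions are symmetric
  mask
}

# Toy data, invented for illustration only
genes <- c("TP53", "MDM2", "BRCA1", "EGFR")
edges <- data.frame(gene_a = c("TP53", "BRCA1", "TP53"),
                    gene_b = c("MDM2", "TP53", "NOT_IN_MATRIX"),
                    stringsAsFactors = FALSE)
mask <- build_mask_from_edges(genes, edges)
```

The cost of this step depends on the number of known interactions, not on the number of possible gene pairs, which is the main reason it might scale better than a pairwise loop.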

The problem is that mapGenes2PPI has to check a 20,000 × 20,000 matrix, which is enormous, so it takes a very long time. Do you think there is a better, more efficient way to do such a calculation?
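To put a rough number on that cost, the arithmetic can be written out as below. This is a back-of-the-envelope estimate, not a measurement: it assumes nearly all pairs do not interact (so the one-second delay is reached about once per pair) and it ignores network latency and genes with several STRING ids, both of which would only increase the total. All variable names are made up for this estimate.

```r
n_genes <- 20000
delay_seconds <- 1  # the Sys.sleep(1) in the innermost loop

# Off-diagonal cells visited by the two nested loops
n_pairs_all <- n_genes * (n_genes - 1)

# Unordered pairs, if interactions are assumed to be symmetric
n_pairs_unique <- n_genes * (n_genes - 1) / 2

# Lower bound on run time from the delay alone, in years
seconds_per_year <- 60 * 60 * 24 * 365
years_all <- n_pairs_all * delay_seconds / seconds_per_year
years_unique <- n_pairs_unique * delay_seconds / seconds_per_year

# Identifier lookups: the second gene is resolved again for every cell,
# whereas resolving each gene once and reusing the result needs one per gene
lookups_as_written <- n_genes + n_genes * n_genes
lookups_cached <- n_genes
```

Under these assumptions the delay alone adds up to more than a decade for the full matrix, and checking only one triangle would halve it but not change the order of magnitude.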

microarray annotation R STRINGDb • 1.1k views
