ReactomePA: error when preparing the input file
2.8 years ago
camillab.

Hi, I am relatively new to R, so apologies if the code/question is not in the right format! I will try to improve! I am trying to perform enrichment analysis with ReactomePA (R package) on a small list of genes (477), and I have a problem organizing the dataset. As far as I understood, the input file should contain only two columns: Entrez ID (column 1) and fold change (column 2). I converted the Ensembl IDs with the online BioMart tool and then created a new file with ID and FC. My dataset:

# A tibble: 6 x 2
Entrezgene_ID log2fc
<chr>          <dbl>
1 14             -1.02
2 80755          -1.45
3 60496          -1.17
4 6059           -1.48
5 10061          -1.35
6 10006          -1.51


Then I tried to follow this code:

#load packages
library(org.Hs.eg.db)
library(DOSE)
library(ReactomePA)

## feature 1: numeric vector
geneList <- d[,2]

## feature 2: named vector
names(geneList) <- as.character(d[,1])

## feature 3: decreasing order
geneList<- sort(geneList, decreasing = TRUE)
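The three "feature" comments in the code above describe the shape geneList needs: a plain numeric vector, named by Entrez ID, sorted in decreasing order. A minimal sketch of a check for those three properties, using a hypothetical helper `check_gene_list()` (not part of DOSE or ReactomePA) and the first three rows of the dataset shown above typed in by hand:

```r
# Hypothetical helper (not part of DOSE/ReactomePA): returns TRUE only if
# x satisfies the three "features" listed in the code above.
check_gene_list <- function(x) {
  is_numeric_vector <- is.numeric(x) && is.null(dim(x))  # feature 1 (a tibble fails here)
  is_named <- !is.null(names(x)) && !anyNA(names(x))     # feature 2
  is_decreasing <- !is.unsorted(rev(x))                  # feature 3
  is_numeric_vector && is_named && is_decreasing
}

# First three rows of the dataset above, entered by hand
fc <- c("14" = -1.02, "80755" = -1.45, "60496" = -1.17)

check_gene_list(fc)                           # FALSE: not sorted yet
check_gene_list(sort(fc, decreasing = TRUE))  # TRUE: sort() keeps the names
```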


But when I try to name the vector, I get a list of Entrez IDs separated by commas and no FC (geneList: 477 obs, 1 variable c("14", "80755", ... and so on)). I was expecting to find them in rows next to the fold change; am I wrong? And of course, when I try to sort it in decreasing order ("feature 3"), I get this error, because I basically have a list of quoted numbers that are not associated with any values:

Error: Can't subset columns that don't exist.
x Locations 141, 373, 119, 229, 230, etc. don't exist.
i There are only 1 column.

Thank you very much for your help!

Camilla
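A probable explanation for the "Locations 141, 373, ... don't exist" message: on a tibble, single-bracket indexing with one vector of positions, tbl[i], selects columns, not rows. Sorting a one-column tibble ends up indexing it with row positions, so tibble complains that columns 141, 373, ... don't exist. A small sketch with a toy one-column tibble (three of the fold changes above, standing in for d[, 2]):

```r
library(tibble)

# Toy one-column tibble standing in for d[, 2]
tb <- tibble(log2fc = c(-1.02, -1.45, -1.17))

# tb[c(2, 3)] asks for COLUMNS 2 and 3, which do not exist -> error
err <- tryCatch(tb[c(2, 3)], error = function(e) conditionMessage(e))
print(err)  # exact wording depends on the tibble version

# Indexing the underlying vector selects rows, as intended
tb[["log2fc"]][c(2, 3)]
```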

2.8 years ago
russhh

Unlike a data.frame, when you subset a tibble using tbl[, col] syntax you always get a tibble back. For a data.frame, extracting a single column this way returns a vector. What you've done is extract geneList as an N x 1 tibble and then try to set the names on that.
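A quick way to see the difference is to build the same three rows (taken from the question's dataset) as both a data.frame and a tibble and compare what [, 2] returns. The drop = FALSE line is only there to show the base-R equivalent of the tibble behaviour:

```r
library(tibble)

df <- data.frame(
  Entrezgene_ID = c("14", "80755", "60496"),
  log2fc = c(-1.02, -1.45, -1.17),
  stringsAsFactors = FALSE
)
tb <- as_tibble(df)

# data.frame: a single column selected with [, j] is dropped to a vector
class(df[, 2])                 # "numeric"

# tibble: [, j] never drops, so you get an N x 1 tibble back
class(tb[, 2])                 # "tbl_df" "tbl" "data.frame"

# Base-R equivalent of the tibble behaviour
class(df[, 2, drop = FALSE])   # "data.frame"
```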

To extract a vector from a tibble, use genes <- tbl[[col]] syntax, and then use names(genes) <- tbl[[other_col]].
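Applied to the column names from the question (with three of its rows as toy data), the two [[ steps give a named numeric vector:

```r
library(tibble)

d <- tibble(
  Entrezgene_ID = c("14", "80755", "60496"),
  log2fc = c(-1.02, -1.45, -1.17)
)

# [[ pulls a single column out as a plain vector
genes <- d[["log2fc"]]                # numeric vector
names(genes) <- d[["Entrezgene_ID"]]  # Entrez IDs become the names

genes
```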

Plus, if you already have tibble loaded, you can use its deframe function to do this in one step: https://stackoverflow.com/a/56479548/1845650 ; genes <- tibble::deframe(d). That only works for two-column data frames: the first column becomes the vector names and the second column becomes the vector contents.
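A short sketch comparing deframe() with the two-step [[ approach on a toy version of d (three rows from the question); both should give the same named numeric vector:

```r
library(tibble)

d <- tibble(
  Entrezgene_ID = c("14", "80755", "60496"),
  log2fc = c(-1.02, -1.45, -1.17)
)

# deframe(): first column -> names, second column -> values
genes <- deframe(d)

# Same result as the two-step [[ approach
manual <- d[["log2fc"]]
names(manual) <- d[["Entrezgene_ID"]]

all.equal(genes, manual)  # TRUE
```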

There are several other ways of doing this mentioned in that SO thread, dplyr::pull for example.
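For example, dplyr::pull() can take a second column for the names through its name argument (this argument assumes dplyr 1.0.0 or later), and base R's setNames() does the same without extra packages. A sketch on the same toy rows:

```r
d <- tibble::tibble(
  Entrezgene_ID = c("14", "80755", "60496"),
  log2fc = c(-1.02, -1.45, -1.17)
)

# pull() extracts one column as a vector; name = uses another column for names
# (the `name` argument needs dplyr >= 1.0.0)
genes_pull <- dplyr::pull(d, log2fc, name = Entrezgene_ID)

# Base-R alternative without extra packages
genes_base <- setNames(d[["log2fc"]], d[["Entrezgene_ID"]])

all.equal(genes_pull, genes_base)  # TRUE
```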

So if I use

genes <- tibble::deframe(d)

I should already obtain a vector with my values and their associated names?

Hi! Thank you! So the code should be:

#load packages
library(org.Hs.eg.db)
library(DOSE)
library(ReactomePA)

## feature 1: numeric vector
genes <- d[[,2]]

## feature 2: named vector
names(genes) <- d[[,1]]

## feature 3: decreasing order
geneLIST <- sort(geneList, decreasing = TRUE)



I ask because it gives me this error when I use [[ ]]:

Error: Subscript can't be missing for tibbles in [[.

And if I do:

## feature 1: numeric vector
genes <- d[,2]

## feature 2: named vector
names(genes) <- d[,1]


I get the same result (and error) as before.

No, the code should be: genes <- d[[2]]; names(genes) <- d[[1]]
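Putting that fix back into the original three "feature" steps gives something like the sketch below, run here on the six rows shown in the question as a stand-in for the full 477-gene tibble. Note it uses one variable name (geneList) throughout:

```r
library(tibble)

# The six rows shown in the question, standing in for the real `d`
d <- tibble(
  Entrezgene_ID = c("14", "80755", "60496", "6059", "10061", "10006"),
  log2fc = c(-1.02, -1.45, -1.17, -1.48, -1.35, -1.51)
)

## feature 1: numeric vector
geneList <- d[[2]]

## feature 2: named vector
names(geneList) <- d[[1]]

## feature 3: decreasing order
geneList <- sort(geneList, decreasing = TRUE)

head(geneList)
```

As far as I know, this named, sorted vector is the form ReactomePA's gsePathway() expects for its geneList argument, while enrichPathway() takes just the gene IDs (names(geneList)); check the ReactomePA documentation for the other arguments.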

On a tibble or data.frame, the [[ function extracts a column as a vector: `[[`(my_df, column_index). It takes the data frame and a column index as arguments, so when it's used as an operator it should look like my_df[[column_index]] (not my_df[[, column_index]]).
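A small sketch of those forms on a toy tibble: the operator and function forms of [[ are the same call (the function name needs backticks), two subscripts pick out a single cell, and leaving the row subscript empty is what triggers the "Subscript can't be missing" error:

```r
library(tibble)

d <- tibble(
  Entrezgene_ID = c("14", "80755", "60496"),
  log2fc = c(-1.02, -1.45, -1.17)
)

# Operator form and function form of [[ are the same call
identical(d[[2]], `[[`(d, 2))  # TRUE

# With two subscripts, [[ picks one cell (row 1, column 2)
d[[1, 2]]                      # -1.02

# Leaving the row subscript empty is an error for tibbles
res <- tryCatch(d[[, 2]], error = function(e) "error")
res                            # "error"
```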