Question: reactomePA error input file
10 weeks ago by
camillab.10
London
camillab.10 wrote:

Hi, I am relatively new to R, so apologies if the code/question is not in the right format! I will try to improve! I am trying to perform enrichment analysis with ReactomePA (R package) on a smaller list of genes (477), and I have a problem organizing the dataset. As far as I understand, the input should contain only two columns: Entrez ID (column 1) and fold change (column 2). I converted the Ensembl IDs with the BioMart online tool, and then created a new file with ID and FC. My dataset:

# A tibble: 6 x 2
  Entrezgene_ID log2fc
  <chr>          <dbl>
1 14             -1.02
2 80755          -1.45
3 60496          -1.17
4 6059           -1.48
5 10061          -1.35
6 10006          -1.51

Then I tried to follow this code:

#load packages
library(org.Hs.eg.db)
library(DOSE)
library(ReactomePA)

## feature 1: numeric vector
geneList <- d[,2]

## feature 2: named vector
names(geneList) <- as.character(d[,1])

## feature 3: decreasing order
geneList <- sort(geneList, decreasing = TRUE)
head(geneList)

But when I try to name the vector, I get a list of Entrez IDs separated by commas and no fold changes (geneList: 477 obs. of 1 variable, c("14", "80755", ... and so on). I was expecting to find them in rows next to the fold changes. Am I wrong? And of course, when I try to sort in decreasing order ("feature 3"), I get this error, because I basically have a list of quoted numbers that are not associated with any values:

Error: Can't subset columns that don't exist.
x Locations 141, 373, 119, 229, 230, etc. don't exist.
i There are only 1 column.

Thank you very much for your help!

Camilla

vector error rna-seq reactomepa R • 168 views
modified 10 weeks ago by russhh5.5k • written 10 weeks ago by camillab.10
10 weeks ago by
russhh5.5k
UK, U. Glasgow
russhh5.5k wrote:

Unlike a data.frame, a tibble subset with tbl[, col] syntax always returns a tibble. For a data.frame, extracting a single column this way would return a vector. What you've done is extract geneList as an N x 1 tibble and then try to set the names on that.

To extract a vector from a tibble, use genes <- tbl[[col]] syntax, and then use names(genes) <- tbl[[other_col]]
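
A minimal sketch of the difference, using the six rows shown in the question as toy data (the object names `d`, `df` and `genes` are only illustrative, and the tibble package is assumed to be installed):

```r
library(tibble)

# Toy data: the six rows shown in the question
d <- tibble(
  Entrezgene_ID = c("14", "80755", "60496", "6059", "10061", "10006"),
  log2fc        = c(-1.02, -1.45, -1.17, -1.48, -1.35, -1.51)
)
df <- as.data.frame(d)

is.numeric(df[, 2])   # data.frame: a single column drops to a plain vector
is_tibble(d[, 2])     # tibble: the result is still a one-column tibble

# Extract plain vectors with [[ ]] and attach the IDs as names
genes <- d[[2]]
names(genes) <- d[[1]]
head(sort(genes, decreasing = TRUE))
```

Because `genes` is now an ordinary named numeric vector, `sort()` reorders the values and carries the names along with them.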

written 10 weeks ago by russhh5.5k

Plus, if you already have tibble loaded, you can use its deframe function to do this in one step: https://stackoverflow.com/a/56479548/1845650 ; genes <- tibble::deframe(d) . That only works for two-column data frames: the first column becomes the vector names and the second column becomes the vector contents.
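
As a rough illustration on a three-row toy tibble (the data and object names here are made up for the example, not taken from the real 477-gene file):

```r
library(tibble)

# Toy two-column tibble: IDs first, fold changes second
d <- tibble(
  Entrezgene_ID = c("14", "80755", "60496"),
  log2fc        = c(-1.02, -1.45, -1.17)
)

# First column becomes the names, second column the values
genes <- deframe(d)

# Sorting a named vector keeps each name attached to its value
genes <- sort(genes, decreasing = TRUE)
genes
```

The result is a plain named numeric vector, so no separate `names(genes) <- ...` step is needed.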

written 10 weeks ago by russhh5.5k

That SO thread mentions several other ways of doing this, dplyr::pull for example.
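
A sketch of the dplyr::pull route, assuming dplyr 1.0.0 or later (where pull accepts a `name` argument) and using toy data with the column names from the question:

```r
library(dplyr)

# Toy data with the same column names as in the question
d <- tibble::tibble(
  Entrezgene_ID = c("14", "80755", "60496"),
  log2fc        = c(-1.02, -1.45, -1.17)
)

# Pull the fold changes as a vector, named by the Entrez ID column
genes <- pull(d, log2fc, name = Entrezgene_ID)
genes
```

Unlike deframe, this selects columns by name, so it also works when the tibble has more than two columns.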

written 10 weeks ago by russhh5.5k

So if I use

genes <- tibble::deframe(d)

I should directly get a vector of my values with their associated "names"?

written 10 weeks ago by camillab.10

Hi! Thank you! So should the code be:

#load packages
library(org.Hs.eg.db)
library(DOSE)
library(ReactomePA)

## feature 1: numeric vector
genes <- d[[,2]]

## feature 2: named vector
names(genes) <- d[[,1]]

## feature 3: decreasing order
geneLIST <- sort(geneList, decreasing = TRUE)
head(geneList)

x <- enrichPathway(gene=geneLIST,pvalueCutoff=0.05, readable=T)

I ask because I get this error when I use [[ ]]:

Error: Subscript can't be missing for tibbles in [[.

And if I do:

## feature 1: numeric vector
genes <- d[,2]

## feature 2: named vector
names(genes) <- d[,1]

I get the same result (and error) as before.

modified 10 weeks ago • written 10 weeks ago by camillab.10

No, the code should be genes <- d[[2]]; names(genes) <- d[[1]]

written 10 weeks ago by russhh5.5k

On a tibble or data.frame, the [[ function extracts a column as a vector: `[[`(my_df, column_index). It takes the data.frame and a column index as arguments, so when it's used as an operator it should look like my_df[[column_index]] (not my_df[[, column_index]]).
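
A small base-R sketch of this, using a toy data.frame (a tibble behaves the same way with `[[`; the data and object names are illustrative only):

```r
# Toy data.frame with the same column names as in the question
d <- data.frame(
  Entrezgene_ID = c("14", "80755", "60496"),
  log2fc        = c(-1.02, -1.45, -1.17),
  stringsAsFactors = FALSE
)

# Operator form and function-call form extract the same vector
same <- identical(d[[2]], `[[`(d, 2))

# Columns can also be selected by name instead of position
genes <- d[["log2fc"]]
names(genes) <- d[["Entrezgene_ID"]]

# Named numeric vector sorted in decreasing order
geneList <- sort(genes, decreasing = TRUE)
head(geneList)
```

Selecting by column name rather than position keeps the code working if the column order of `d` ever changes.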

modified 10 weeks ago • written 10 weeks ago by russhh5.5k