Question: How to combine count matrix with mouse gene information from Ensembl using biomaRt in R
20 days ago, lamia_203 wrote:

Hi,

I am trying to add a column of gene IDs next to my column of gene symbols. normalised_mouse3 is my count matrix for RNA-seq analysis. I've used this code:

library('biomaRt')
normalised_mouse_biomart <- read.delim("normalised_mouse3.txt")
mart <- useDataset("mmusculus_gene_ensembl", useMart("ensembl"))
Genes <- normalised_mouse_biomart$V2
ensLookup <- gsub("\\.[0-9]*$", "", Genes)
G_list <- getBM(filters= "ensembl_gene_id", 
                attributes= c("ensembl_gene_id" , "mgi_symbol"),
                values= ensLookup,
                mart= mart)
mouse <- cbind(G_list$mgi_symbol, normalised_mouse_biomart)

G_list contains the correct list of gene IDs and gene symbols, but I get an error when combining the gene IDs with the count matrix:

Error in data.frame(..., check.names = FALSE) : 
  arguments imply differing number of rows: 54439, 54446

I don't understand how 7 gene names are missing from the G_list when I've given the correct number of gene symbols.

R rna-seq biomart ensembl

Well for starters, how many genes are in ensLookup? How many unique genes in G_list? Is there any possibility that the gsub is mangling a few gene names?
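
The checks suggested above can be sketched in base R as follows. This is only an illustration: the IDs and symbols are made-up placeholders standing in for the real ensLookup and G_list objects, and the variable names n_submitted, n_returned and unmapped are introduced here for the example.

```r
# Mock stand-ins for the objects in the question.
# The IDs and symbols below are hypothetical placeholders, not real query results.
Genes <- c("ENSMUSG0001.4", "ENSMUSG0002.15", "ENSMUSG0003.16", "ENSMUSG0004.17")

# Same version-stripping step as in the question
ensLookup <- gsub("\\.[0-9]*$", "", Genes)

# Pretend biomaRt returned only three of the four genes, in a different order
G_list <- data.frame(
  ensembl_gene_id = c("ENSMUSG0002", "ENSMUSG0001", "ENSMUSG0003"),
  mgi_symbol      = c("SymB", "SymA", "SymC"),
  stringsAsFactors = FALSE
)

# How many IDs were submitted, and how many of those are unique?
n_submitted        <- length(ensLookup)
n_unique_submitted <- length(unique(ensLookup))

# How many rows came back, and how many unique IDs do they cover?
n_returned        <- nrow(G_list)
n_unique_returned <- length(unique(G_list$ensembl_gene_id))

# Which submitted IDs have no row in the result?
unmapped <- setdiff(ensLookup, G_list$ensembl_gene_id)

# Did the gsub leave anything that still carries a version suffix?
still_versioned <- grep("\\.", ensLookup, value = TRUE)
```

Comparing n_submitted with n_returned, and inspecting unmapped, should show directly where a row-count mismatch like the one in the error message comes from.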

written 20 days ago by swbarnes2

If you want to store metadata for your genes alongside the read counts for those features in your samples, you might be better off storing your data in an edgeR::DGEList.

written 13 days ago by russhh
20 days ago, Kevin Blighe (Republic of Ireland) wrote:

Hey,

I would not do this:

cbind(G_list$mgi_symbol, normalised_mouse_biomart)

Why? biomaRt does not return results in the same order as the values you submitted. It will most likely also fail to map every gene, and it may return more than one row for a single gene. Any of these means the rows of G_list no longer line up one-to-one with the rows of your count matrix.

You should always check the contents of your objects before performing operations on them. In programming, expect the unexpected.

You can use a combination of !duplicated(), match(), and which() for the purposes of re-ordering the biomaRt return object so that it is harmonised with your input data.
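
A minimal sketch of that re-ordering, using base R only. The data here are made-up placeholders (hypothetical IDs and symbols), and keeping the first annotation for a duplicated gene is an assumption made for the example; you may prefer a different rule.

```r
# Mock count table: one row per gene (hypothetical IDs and counts)
counts <- data.frame(
  gene_id = c("ENSMUSG0001", "ENSMUSG0002", "ENSMUSG0003", "ENSMUSG0004"),
  sample1 = c(10, 0, 25, 7),
  sample2 = c(12, 3, 30, 5),
  stringsAsFactors = FALSE
)

# Mock biomaRt result: different order, one gene annotated twice, one gene missing
G_list <- data.frame(
  ensembl_gene_id = c("ENSMUSG0003", "ENSMUSG0001", "ENSMUSG0001", "ENSMUSG0002"),
  mgi_symbol      = c("SymC", "SymA", "SymA2", "SymB"),
  stringsAsFactors = FALSE
)

# 1. Keep only the first annotation per gene ID (assumption: first hit is acceptable)
G_unique <- G_list[!duplicated(G_list$ensembl_gene_id), ]

# 2. For each row of the count table, find the matching row in the annotation
#    (NA where biomaRt returned nothing for that gene)
idx <- match(counts$gene_id, G_unique$ensembl_gene_id)

# 3. Build the annotated table; the symbol vector now has one entry per count row
annotated <- data.frame(
  mgi_symbol = G_unique$mgi_symbol[idx],
  counts,
  stringsAsFactors = FALSE
)

# 4. Rows of the count table that could not be annotated
unmapped_rows <- which(is.na(idx))
```

Because idx has exactly one entry per row of counts, the symbol column always has the right length and order, with NA marking genes that were not mapped.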

Kevin
