Question

Can't merge gene names with accession numbers

0

Entering edit mode

3.9 years ago

mintdrink • 0

So I've been working on a script in RStudio to organize/analyze RNASeq data and export it in various formats (i.e., heatmap, PCA, Excel CSV).

In my code, however, my gene names aren't matching with my accession numbers -- I get the following error:

Error in as.vector(x) : no method for coercing this S4 class to a vector

when I try to run this line:

genebm <- merge.data.frame(dds, getBM(filters="external_gene_name", attributes="ensembl_gene_id"), by = "row.names")

Please help!

## RNA-seq analysis with DESeq2
## Adapted in part from Stephen Turner and Margaret Gruca

# Import & pre-process ----------------------------------------------------

library(DESeq2)
library(biomaRt)
library(pheatmap)

directory <- "C:\\Users\\Username\\Desktop\\Holster\\RNASeq\\H149ABulkCounts\\Edited\\Human\\HEK"

mart <- useDataset("hsapiens_gene_ensembl", useMart(host = "aug2017.archive.ensembl.org", biomart = "ENSEMBL_MART_ENSEMBL"))

fileNames <- list.files(directory, pattern="*.counts")
sampleNames <- sub("(e.counts)","",fileNames)
conditionNames <- c("FUS1","FUS1", "H149A", "H149A")
sampleTable <- data.frame(sampleName = sampleNames,
fileName = fileNames,
condition = conditionNames)

ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
          directory = directory,
          design = ~condition)
dds <- DESeq(ddsHTSeq)
res <- results(dds)

#QC -- Check sizeFactors, should be close to 1 (ideally) -- largely affected by depth (read counts)
sizeFactors(dds)

#dds <- dds[ rowSums(counts(dds)) > 50, ] 

dim(dds)

FPvC_counts <- (counts(dds, normalized=TRUE))[,-c(7,8,9,10)]
FPvC_data <- merge(as.data.frame(res), as.data.frame(FPvC_counts), by='row.names', sort=FALSE)
genes <- FPvC_data$Row.names
new_names <- getBM(filters= "ensembl_gene_id", attributes=c("ensembl_gene_id","external_gene_name"), values=genes,mart=mart)
out <- merge(FPvC_data,new_names,by.x="Row.names",by.y="ensembl_gene_id")
out <- out[,c(c(ncol(out),1:(ncol(out)-1)))]
out <- out[,-2]
write.table(out, file= "NS2_FUS1.txt", sep='\t')


genebm <- merge.data.frame(dds, getBM(filters="external_gene_name", attributes="ensembl_gene_id"), by = "row.names")
a <- c(genebm[3])
counts_genebm <-counts[genebm$ensembl_gene_id,]
row.names(counts_genebm) <- genebm[,"V1"]
cal_z_score <- function(x){
  (x - mean(x)) / sd(x)
}

dim(dds)
head(dds)

# Plot dispersions
png("qc-dispersions.png", 1000, 1000, pointsize=20)
plotDispEsts(dds, main="Dispersion plot")
dev.off()

# Regularized log transformation for clustering/heatmaps, etc
rld <- rlogTransformation(dds)
head(assay(rld))
hist(assay(rld))

# Colors for plots below
## Ugly:
## (mycols <- 1:length(unique(condition)))
## Use RColorBrewer, better
library(RColorBrewer)
(mycols <- brewer.pal(8, "Dark2")[1:length(unique(condition))])

# Sample distance heatmap
sampleDists <- as.matrix(dist(t(assay(rld))))
library(gplots)
png("qc-heatmap-samples.png", w=1000, h=1000, pointsize=20)
heatmap.2(as.matrix(sampleDists), key=F, trace="none",
          col=colorpanel(100, "orange", "red"),
         # ColSideColors=mycols[condition], 
         # RowSideColors=mycols[condition],
          margin=c(10, 10), main="Sample Distance Matrix")
dev.off()

# Principal components analysis
## Could do with built-in DESeq2 function:
## DESeq2::plotPCA(rld, intgroup="condition")
rld_pca <- function (rld, intgroup = "condition", ntop = 500, colors=NULL, legendpos="bottomleft", main="PCA Biplot", textcx=1, ...) {
  require(genefilter)
  require(calibrate)
  require(RColorBrewer)
  rv = rowVars(assay(rld))
  select = order(rv, decreasing = TRUE)[seq_len(min(ntop, length(rv)))]
  pca = prcomp(t(assay(rld)[select, ]))
  fac = factor(apply(as.data.frame(colData(rld)[, intgroup, drop = FALSE]), 1, paste, collapse = " : "))
  if (is.null(colors)) {
    if (nlevels(fac) >= 3) {
      colors = brewer.pal(nlevels(fac), "Paired")
    }   else {
      colors = c("black", "red")
    }
  }
  pc1var <- round(summary(pca)$importance[2,1]*100, digits=1)
  pc2var <- round(summary(pca)$importance[2,2]*100, digits=1)
  pc1lab <- paste0("PC1 (",as.character(pc1var),"%)")
  pc2lab <- paste0("PC1 (",as.character(pc2var),"%)")
  plot(PC2~PC1, data=as.data.frame(pca$x), bg=colors[fac], pch=21, xlab=pc1lab, ylab=pc2lab, main=main, ...)
  with(as.data.frame(pca$x), textxy(PC1, PC2, labs=rownames(as.data.frame(pca$x)), cex=textcx))
  legend(legendpos, legend=levels(fac), col=colors, pch=20)
  #     rldyplot(PC2 ~ PC1, groups = fac, data = as.data.frame(pca$rld),
  #            pch = 16, cerld = 2, aspect = "iso", col = colours, main = draw.key(key = list(rect = list(col = colours),
  #                                                                                         terldt = list(levels(fac)), rep = FALSE)))
}

png("qc-pca.png", 1400, 1000, pointsize=20)
rld_pca(rld, colors=mycols, intgroup="condition", xlim=c(-75, 35))
dev.off()


#condx <- "TNFaDex10"
#condy <- "Untreated10"

# Get differential expression results, specify which conditions to contrast (max=2)
res <- results(dds, contrast = c("condition", condx, condy))
table(res$padj<0.05)
## Order by adjusted p-value
res <- res[order(res$padj), ]
## Merge with normalized count data
resdata <- merge(as.data.frame(res), as.data.frame(counts(dds, normalized=TRUE)), by="row.names", sort=FALSE)
names(resdata)[1] <- "Gene"
head(resdata)
## Write results
resdata <- resdata[complete.cases(resdata), ]
write.csv(resdata, file="diffexprresultsHEK.csv")
dev.off()

RNA-Seq gene R • 1.8k views

ADD COMMENT • link updated 2.8 years ago by Istvan Albert 100k • written 3.9 years ago by mintdrink • 0

0

Entering edit mode

Get your datasets to the easiest forms to merge. Ensure they're both data.frame objects and they have a column each with matching values and compatible type. That should take care of the error.

ADD REPLY • link 3.9 years ago by Ram 43k

0

Entering edit mode

Hmm, how do I do that?

ADD REPLY • link 3.9 years ago by mintdrink • 0

0

Entering edit mode

You wish to compare res to the data.frame output of getBM(). Save the getBM() call to an object, say getbm_res. Check to see if class(getbm_res) gives you data.frame as one of the classes. Similarly, see if class(res) also has data.frame. You can even use is.data.frame(getbm_res) and is.data.frame(res) to check this. Once you're sure they're both data.frames, you can merge them.

ADD REPLY • link 3.9 years ago by Ram 43k

0

Entering edit mode

Hm, I don't think I'm doing my merge right. I tried this:

## Write results
resdata <- resdata[complete.cases(resdata), ]

#to convert gene accession number to gene name
theBM <- getBM(filters="external_gene_name", 
      attributes="ensembl_gene_id",
      values=1,
      mart=mart)
resdata <- merge.data.frame(resdata, theBM)

But the dimensions of my resulting matrix (resdata) are vastly different (before merge it's 20381 x 11, and after it's 0 x 12).

ADD REPLY • link 3.9 years ago by mintdrink • 0

0

Entering edit mode

That tells me that while you have matching column names between the two data frames, the matching columns don't have any values that can be matched. Can you show me the output to:

head(resdata)
head(theBM)

ADD REPLY • link 3.9 years ago by Ram 43k

0

Entering edit mode

> head(resdata)
               genes baseMean log2FoldChange      lfcSE      stat
1  ENSG00000261150.2 4093.856      -7.916610 0.19208846 -41.21335
2 ENSG00000164877.18 2362.584      -5.263462 0.14933304 -35.24647
3 ENSG00000120334.15 3727.905       3.323255 0.10535523  31.54334
4 ENSG00000100906.10 6269.836       2.954827 0.09756926  30.28440
5  ENSG00000182759.3 1514.207      -5.670821 0.19420807 -29.19972
6  ENSG00000124145.6 4802.562       2.793963 0.09658532  28.92741
         pvalue          padj HEK-FUS1-1.counts HEK-FUS1-2.counts
1  0.000000e+00  0.000000e+00         8260.9704         8046.9633
2 3.886087e-272 3.960117e-268         5075.5152         4134.7980
3 2.213074e-218 1.503489e-214          665.0855          690.6977
4 1.839839e-201 9.374439e-198         1513.6129         1346.5252
5 1.955277e-187 7.970100e-184         3440.1873         2499.9233
6 5.400106e-184 1.834326e-180         1262.3583         1154.7392
  HEK-H149A-1.counts HEK-H149A-2.counts
1           34.32842           33.16120
2          113.82581          126.19680
3         6450.12946         7105.70891
4        10806.22529        11412.98074
5           60.52642           56.18982
6         8302.96076         8490.18914

> head(theBM)
  hgnc_symbol
1      NIPAL3
2       LAS1L
3       ENPP4
4      SEMA3F
5      ANKIB1
6       MYH16

Thank you

ADD REPLY • link 3.9 years ago by mintdrink • 0

0

Entering edit mode

You have no matching columns.

ADD REPLY • link 3.9 years ago by Ram 43k

0

Entering edit mode

I'm trying to have the HGNC symbols replace the "genes" Ensembl ID's, is merge not the correct function?

ADD REPLY • link 3.9 years ago by mintdrink • 0

0

Entering edit mode

Hello equinox145111!

We believe that this post no longer fits on this site.

You jumped to creating a new post with more specifics: Problem with merge data while trying to convert gene names

For this reason we have closed this question. This allows us to keep the site focused on the topics that the community can help with.

If you disagree please tell us why in a reply below, we'll be happy to talk about it.

Cheers!

ADD REPLY • link 3.9 years ago by Ram 43k

0

Entering edit mode

Hello equinox145111!

We believe that this post does not fit the main topic of this site.

See for explanation: C: Problem with merge data while trying to convert gene names

For this reason we have closed your question. This allows us to keep the site focused on the topics that the community can help with.

If you disagree please tell us why in a reply below, we'll be happy to talk about it.

Cheers!

ADD REPLY • link 3.9 years ago by ATpoint 82k

score 0 · Answer 1 · 2020-06-19

0

Entering edit mode

3.9 years ago

swbarnes2 14k

You can't add gene data to dds like that. It's a complex object, and you can't coerce it into being a data frame. Maybe you could put that data in rowData somehow, but not with merge.

You can make the results file a data frame, and you can add your gene annotation data there. That's where you want it anyway; that's what you look at, not dds.

ADD COMMENT • link 3.9 years ago by swbarnes2 14k

0

Entering edit mode

Thanks so much! I'm really new to R though -- how would I do that? Again, thanks :).

ADD REPLY • link 3.9 years ago by mintdrink • 0

0

Entering edit mode

Just merge it like you merged the count data to the results.

ADD REPLY • link 3.9 years ago by swbarnes2 14k

0

Entering edit mode

I don't think I'm doing my merge right. I tried this:

## Write results
resdata <- resdata[complete.cases(resdata), ]

#to convert gene accession number to gene name
theBM <- getBM(filters="external_gene_name", 
      attributes="ensembl_gene_id",
      values=genes,
      mart=mart)
resdata <- merge.data.frame(resdata, theBM, by.x="genes",by.y="ensembl_gene_id")

But the dimensions of my resulting matrix (resdata) are vastly different (before merge it's 20381 x 11, and after it's 0 x 12). This is a bit more updated (what I've tried) than my previous comment above -- but it's still not working and I have no idea how to proceed :(. Thank you.

ADD REPLY • link 3.9 years ago by mintdrink • 0