Question: Can't merge gene names with accession numbers
0
gravatar for equinox145111
7 weeks ago by
equinox1451110 wrote:

So I've been working on a script in RStudio to organize/analyze RNASeq data and export it in various formats (i.e., heatmap, PCA, Excel CSV).

In my code, however, my gene names aren't matching with my accession numbers -- I get the following error:

Error in as.vector(x) : no method for coercing this S4 class to a vector

when I try to run this line:

genebm <- merge.data.frame(dds, getBM(filters="external_gene_name", attributes="ensembl_gene_id"), by = "row.names")

Please help!

## RNA-seq analysis with DESeq2
## Adapted in part from Stephen Turner and Margaret Gruca

# Import & pre-process ----------------------------------------------------

library(DESeq2)
library(biomaRt)
library(pheatmap)

directory <- "C:\\Users\\Neal Krishna\\Desktop\\Holster\\RNASeq\\H149ABulkCounts\\Edited\\Human\\HEK"

mart <- useDataset("hsapiens_gene_ensembl", useMart(host = "aug2017.archive.ensembl.org", biomart = "ENSEMBL_MART_ENSEMBL"))

fileNames <- list.files(directory, pattern="*.counts")
sampleNames <- sub("(e.counts)","",fileNames)
conditionNames <- c("FUS1","FUS1", "H149A", "H149A")
sampleTable <- data.frame(sampleName = sampleNames,
fileName = fileNames,
condition = conditionNames)

ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
          directory = directory,
          design = ~condition)
dds <- DESeq(ddsHTSeq)
res <- results(dds)

#QC -- Check sizeFactors, should be close to 1 (ideally) -- largely affected by depth (read counts)
sizeFactors(dds)

#dds <- dds[ rowSums(counts(dds)) > 50, ] 

dim(dds)

FPvC_counts <- (counts(dds, normalized=TRUE))[,-c(7,8,9,10)]
FPvC_data <- merge(as.data.frame(res), as.data.frame(FPvC_counts), by='row.names', sort=FALSE)
genes <- FPvC_data$Row.names
new_names <- getBM(filters= "ensembl_gene_id", attributes=c("ensembl_gene_id","external_gene_name"), values=genes,mart=mart)
out <- merge(FPvC_data,new_names,by.x="Row.names",by.y="ensembl_gene_id")
out <- out[,c(c(ncol(out),1:(ncol(out)-1)))]
out <- out[,-2]
write.table(out, file= "NealS2_FUS1.txt", sep='\t')


genebm <- merge.data.frame(dds, getBM(filters="external_gene_name", attributes="ensembl_gene_id"), by = "row.names")
a <- c(genebm[3])
counts_genebm <-counts[genebm$ensembl_gene_id,]
row.names(counts_genebm) <- genebm[,"V1"]
cal_z_score <- function(x){
  (x - mean(x)) / sd(x)
}

dim(dds)
head(dds)

# Plot dispersions
png("qc-dispersions.png", 1000, 1000, pointsize=20)
plotDispEsts(dds, main="Dispersion plot")
dev.off()

# Regularized log transformation for clustering/heatmaps, etc
rld <- rlogTransformation(dds)
head(assay(rld))
hist(assay(rld))

# Colors for plots below
## Ugly:
## (mycols <- 1:length(unique(condition)))
## Use RColorBrewer, better
library(RColorBrewer)
(mycols <- brewer.pal(8, "Dark2")[1:length(unique(condition))])

# Sample distance heatmap
sampleDists <- as.matrix(dist(t(assay(rld))))
library(gplots)
png("qc-heatmap-samples.png", w=1000, h=1000, pointsize=20)
heatmap.2(as.matrix(sampleDists), key=F, trace="none",
          col=colorpanel(100, "orange", "red"),
         # ColSideColors=mycols[condition], 
         # RowSideColors=mycols[condition],
          margin=c(10, 10), main="Sample Distance Matrix")
dev.off()

# Principal components analysis
## Could do with built-in DESeq2 function:
## DESeq2::plotPCA(rld, intgroup="condition")
rld_pca <- function (rld, intgroup = "condition", ntop = 500, colors=NULL, legendpos="bottomleft", main="PCA Biplot", textcx=1, ...) {
  require(genefilter)
  require(calibrate)
  require(RColorBrewer)
  rv = rowVars(assay(rld))
  select = order(rv, decreasing = TRUE)[seq_len(min(ntop, length(rv)))]
  pca = prcomp(t(assay(rld)[select, ]))
  fac = factor(apply(as.data.frame(colData(rld)[, intgroup, drop = FALSE]), 1, paste, collapse = " : "))
  if (is.null(colors)) {
    if (nlevels(fac) >= 3) {
      colors = brewer.pal(nlevels(fac), "Paired")
    }   else {
      colors = c("black", "red")
    }
  }
  pc1var <- round(summary(pca)$importance[2,1]*100, digits=1)
  pc2var <- round(summary(pca)$importance[2,2]*100, digits=1)
  pc1lab <- paste0("PC1 (",as.character(pc1var),"%)")
  pc2lab <- paste0("PC1 (",as.character(pc2var),"%)")
  plot(PC2~PC1, data=as.data.frame(pca$x), bg=colors[fac], pch=21, xlab=pc1lab, ylab=pc2lab, main=main, ...)
  with(as.data.frame(pca$x), textxy(PC1, PC2, labs=rownames(as.data.frame(pca$x)), cex=textcx))
  legend(legendpos, legend=levels(fac), col=colors, pch=20)
  #     rldyplot(PC2 ~ PC1, groups = fac, data = as.data.frame(pca$rld),
  #            pch = 16, cerld = 2, aspect = "iso", col = colours, main = draw.key(key = list(rect = list(col = colours),
  #                                                                                         terldt = list(levels(fac)), rep = FALSE)))
}

png("qc-pca.png", 1400, 1000, pointsize=20)
rld_pca(rld, colors=mycols, intgroup="condition", xlim=c(-75, 35))
dev.off()


#condx <- "TNFaDex10"
#condy <- "Untreated10"

# Get differential expression results, specify which conditions to contrast (max=2)
res <- results(dds, contrast = c("condition", condx, condy))
table(res$padj<0.05)
## Order by adjusted p-value
res <- res[order(res$padj), ]
## Merge with normalized count data
resdata <- merge(as.data.frame(res), as.data.frame(counts(dds, normalized=TRUE)), by="row.names", sort=FALSE)
names(resdata)[1] <- "Gene"
head(resdata)
## Write results
resdata <- resdata[complete.cases(resdata), ]
write.csv(resdata, file="diffexprresultsHEK.csv")
dev.off()
rna-seq R gene • 177 views
ADD COMMENTlink modified 7 weeks ago by swbarnes28.2k • written 7 weeks ago by equinox1451110

Get your datasets to the easiest forms to merge. Ensure they're both data.frame objects and they have a column each with matching values and compatible type. That should take care of the error.

ADD REPLYlink written 7 weeks ago by RamRS28k

Hmm, how do I do that?

ADD REPLYlink written 7 weeks ago by equinox1451110

You wish to compare res to the data.frame output of getBM(). Save the getBM() call to an object, say getbm_res. Check to see if class(getbm_res) gives you data.frame as one of the classes. Similarly, see if class(res) also has data.frame. You can even use is.data.frame(getbm_res) and is.data.frame(res) to check this. Once you're sure they're both data.frames, you can merge them.

ADD REPLYlink written 7 weeks ago by RamRS28k

Hm, I don't think I'm doing my merge right. I tried this:

## Write results
resdata <- resdata[complete.cases(resdata), ]

#to convert gene accession number to gene name
theBM <- getBM(filters="external_gene_name", 
      attributes="ensembl_gene_id",
      values=1,
      mart=mart)
resdata <- merge.data.frame(resdata, theBM)

But the dimensions of my resulting matrix (resdata) are vastly different (before merge it's 20381 x 11, and after it's 0 x 12).

ADD REPLYlink written 7 weeks ago by equinox1451110

That tells me that while you have matching column names between the two data frames, the matching columns don't have any values that can be matched. Can you show me the output to:

head(resdata)
head(theBM)
ADD REPLYlink written 7 weeks ago by RamRS28k
> head(resdata)
               genes baseMean log2FoldChange      lfcSE      stat
1  ENSG00000261150.2 4093.856      -7.916610 0.19208846 -41.21335
2 ENSG00000164877.18 2362.584      -5.263462 0.14933304 -35.24647
3 ENSG00000120334.15 3727.905       3.323255 0.10535523  31.54334
4 ENSG00000100906.10 6269.836       2.954827 0.09756926  30.28440
5  ENSG00000182759.3 1514.207      -5.670821 0.19420807 -29.19972
6  ENSG00000124145.6 4802.562       2.793963 0.09658532  28.92741
         pvalue          padj HEK-FUS1-1.counts HEK-FUS1-2.counts
1  0.000000e+00  0.000000e+00         8260.9704         8046.9633
2 3.886087e-272 3.960117e-268         5075.5152         4134.7980
3 2.213074e-218 1.503489e-214          665.0855          690.6977
4 1.839839e-201 9.374439e-198         1513.6129         1346.5252
5 1.955277e-187 7.970100e-184         3440.1873         2499.9233
6 5.400106e-184 1.834326e-180         1262.3583         1154.7392
  HEK-H149A-1.counts HEK-H149A-2.counts
1           34.32842           33.16120
2          113.82581          126.19680
3         6450.12946         7105.70891
4        10806.22529        11412.98074
5           60.52642           56.18982
6         8302.96076         8490.18914

> head(theBM)
  hgnc_symbol
1      NIPAL3
2       LAS1L
3       ENPP4
4      SEMA3F
5      ANKIB1
6       MYH16

Thank you

ADD REPLYlink written 7 weeks ago by equinox1451110

You have no matching columns.

ADD REPLYlink written 7 weeks ago by RamRS28k

I'm trying to have the HGNC symbols replace the "genes" Ensembl ID's, is merge not the correct function?

ADD REPLYlink written 7 weeks ago by equinox1451110

Hello equinox145111!

We believe that this post no longer fits on this site.

You jumped to creating a new post with more specifics: Problem with merge data while trying to convert gene names

For this reason we have closed this question. This allows us to keep the site focused on the topics that the community can help with.

If you disagree please tell us why in a reply below, we'll be happy to talk about it.

Cheers!

ADD REPLYlink modified 7 weeks ago • written 7 weeks ago by RamRS28k

Hello equinox145111!

We believe that this post does not fit the main topic of this site.

See for explanation: C: Problem with merge data while trying to convert gene names

For this reason we have closed your question. This allows us to keep the site focused on the topics that the community can help with.

If you disagree please tell us why in a reply below, we'll be happy to talk about it.

Cheers!

ADD REPLYlink modified 7 weeks ago • written 7 weeks ago by ATpoint36k
0
gravatar for swbarnes2
7 weeks ago by
swbarnes28.2k
United States
swbarnes28.2k wrote:

You can't add gene data to dds like that. It's a complex object, and you can't coerce it into being a data frame. Maybe you could put that data in rowData somehow, but not with merge.

You can make the results file a data frame, and you can add your gene annotation data there. That's where you want it anyway; that's what you look at, not dds.

ADD COMMENTlink modified 7 weeks ago • written 7 weeks ago by swbarnes28.2k

Thanks so much! I'm really new to R though -- how would I do that? Again, thanks :).

ADD REPLYlink written 7 weeks ago by equinox1451110

Just merge it like you merged the count data to the results.

ADD REPLYlink written 7 weeks ago by swbarnes28.2k

I don't think I'm doing my merge right. I tried this:

## Write results
resdata <- resdata[complete.cases(resdata), ]

#to convert gene accession number to gene name
theBM <- getBM(filters="external_gene_name", 
      attributes="ensembl_gene_id",
      values=genes,
      mart=mart)
resdata <- merge.data.frame(resdata, theBM, by.x="genes",by.y="ensembl_gene_id")

But the dimensions of my resulting matrix (resdata) are vastly different (before merge it's 20381 x 11, and after it's 0 x 12). This is a bit more updated (what I've tried) than my previous comment above -- but it's still not working and I have no idea how to proceed :(. Thank you.

ADD REPLYlink written 7 weeks ago by equinox1451110
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1552 users visited in the last hour