Very basic R question: How do I combine this dataframe with a "value"?
19 months ago
cdeantoneo31 ▴ 20

I'm a noob, so I apologize for what is probably a very basic question, but I can't quite figure out how to do what I'm trying to do correctly. I also don't think I have the vocabulary to accurately explain what it is I'm confused about, so I apologize in advance.

I have successfully replaced Ensembl IDs with gene symbols from MGI numerous times with biomaRt. However, I am struggling with this count file, which has versioned Ensembl IDs.

I can remove the version numbers easily using the following, and then I can use biomaRt to successfully convert the Ensembl IDs into symbols:

df <- read.csv("Tuveson_counts_LRT.csv", sep=",")
head(df)
                      X    baseMean log2FoldChange     lfcSE      stat    pvalue      padj significant
1  ENSMUSG00000000486.7   1.3283025    -0.78624588 1.5531561 0.4214789 0.9806809        NA        <NA>
2  ENSMUSG00000079557.4  31.1085926     0.08715468 0.3561105 2.7204579 0.6056395 0.9999994        <NA>
3 ENSMUSG00000026276.10 118.3799877    -0.02395615 0.1968759 0.5095415 0.9725655 0.9999994        <NA>
4  ENSMUSG00000032656.8   5.8821849    -0.15815182 0.7890379 0.2655061 0.9919307 0.9999994        <NA>
5  ENSMUSG00000022456.9   0.9019521    -1.93237167 2.0918497 1.4395258 0.8372970        NA        <NA>
6 ENSMUSG00000020486.11   5.8367904     0.12988447 0.7918816 0.6535026 0.9569368 0.9999994        <NA>

genes <- df$X
genes <- gsub("\\..*","", genes)
head(genes)
[1] "ENSMUSG00000000486" "ENSMUSG00000079557" "ENSMUSG00000026276" "ENSMUSG00000032656" "ENSMUSG00000022456"
[6] "ENSMUSG00000020486"

library(biomaRt)
mart <- useDataset("mmusculus_gene_ensembl", useMart("ensembl"))
G_list <- getBM(filters = "ensembl_gene_id",
                attributes = c("ensembl_gene_id", "mgi_symbol"),
                values = genes,
                mart = mart)
head(G_list)
     ensembl_gene_id mgi_symbol
1 ENSMUSG00000000028      Cdc45
2 ENSMUSG00000000058       Cav2
3 ENSMUSG00000000088      Cox5a
4 ENSMUSG00000000127        Fer
5 ENSMUSG00000000142      Axin2
6 ENSMUSG00000000148      Brat1

Usually I would use the following to merge the output from G_list and the original df, but that won't work now, since the "renamed" column is now a separate "value" (genes) rather than the column df$X, which still has the version numbers.

counts_symbol <- merge(df, G_list, by.x ="X", by.y="ensembl_gene_id")
head(counts_symbol)
[1] X              baseMean       log2FoldChange lfcSE          stat           pvalue         padj          
[8] significant    mgi_symbol    
<0 rows> (or 0-length row.names)

So how do I change the actual column X in df so that the version numbers are removed, and so the merge works correctly?

TIA!

rstudio ensembl R biomart • 662 views
19 months ago
df$X <- gsub("\\.\\d+$", "", df$X)

yup, that'll do it! tysm

but can you explain why this didn't work

genes <- df$X
genes <- gsub("\\..*","", genes)

but this did?


genes <- df$X copies the data from the X column and assigns this copy to the genes variable. Since you were operating on a copy of part of the original data.frame, and not the original data.frame itself, the original data.frame remained unchanged.

You could have gone back and modified the original data.frame by adding a third line of code to what you have above, df$X <- genes, which overwrites the old X column with the modified data saved in the genes variable.
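
To make the copy-versus-original point concrete, here is a minimal sketch on toy data. The version suffixes and baseMean values are made up for illustration, and the small G_list below is a hand-built stand-in for the getBM() result (so nothing is fetched from Ensembl); only the two ID/symbol pairs are taken from the output shown in the question.

```r
# Toy stand-in for the real results table (suffixes and baseMean are invented).
df <- data.frame(
  X = c("ENSMUSG00000000028.15", "ENSMUSG00000000058.6"),
  baseMean = c(10.5, 3.2),
  stringsAsFactors = FALSE
)

# Hand-built stand-in for the getBM() output, to avoid a network call.
G_list <- data.frame(
  ensembl_gene_id = c("ENSMUSG00000000028", "ENSMUSG00000000058"),
  mgi_symbol = c("Cdc45", "Cav2"),
  stringsAsFactors = FALSE
)

# This edits only the copy stored in `genes`; df$X keeps its version suffixes.
genes <- df$X
genes <- gsub("\\..*", "", genes)

# The keys in df$X still end in ".15" / ".6", so nothing matches.
merged_before <- merge(df, G_list, by.x = "X", by.y = "ensembl_gene_id")
print(nrow(merged_before))

# Writing the cleaned vector back into the data.frame makes the keys line up.
df$X <- genes
merged_after <- merge(df, G_list, by.x = "X", by.y = "ensembl_gene_id")
print(merged_after)
```

The one-liner from the answer, df$X <- gsub("\\.\\d+$", "", df$X), does the strip and the write-back in a single step. On IDs like these, both patterns give the same result. The second pattern is just stricter, since it only removes a trailing dot followed by digits.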
