Question

Help with adding new ID names to read count

2.1 years ago

mjs_novice • 0

Hi everyone, I have a simple question about adding a new column with a new identifier to a read count CSV sheet that only has gene names.

Currently I have a mouse RNA-seq file with each gene name and the read counts for each sample. [Image: example of the count table]

I want to use the karyoploteR package to plot my DEGs on each chromosome. Here is the general info on karyoploteR: https://bernatgel.github.io/karyoploter_tutorial/

The group has a tutorial using example fly data, and I was able to reproduce the figures they created: https://bernatgel.github.io/karyoploter_tutorial//Examples/GeneExpression/GeneExpression.html

Now I want to load my own data and run this analysis with the mouse genome. However, my starting data has gene names instead of an identifier their package can use. In their practice tutorial, they used identifiers from FlyBase.

I want to add a new column next to the gene names in my dataset with a new identifier that I can use with DESeq2 and karyoploteR. I just do not know how to:
1) Tell R to look up my genes and find the new identifier
2) Create a new column in my data table
3) Insert the new identifiers into that column, next to the correct gene name

I have tried BioMart through Ensembl online but this only allows me to convert my gene names. It will not keep my read count data together with the new names.

Thank you to anyone who takes the time to help me with this problem! Remember, I am new to RStudio, so please keep your advice accessible to new users.

gene name R column conversion • 706 views

updated 2.1 years ago by cpad0112 21k • written 2.1 years ago by mjs_novice • 0

"I have tried BioMart through Ensembl online but this only allows me to convert my gene names. It will not keep my read count data together with the new names."

1) Merge the BioMart results with the counts data frame (dplyr or merge)
2) Select the columns of interest (dplyr or base R)
3) Reorder the columns as required (dplyr or base R)
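
The three steps above can be sketched in base R as follows. This is only a minimal illustration: the two small data frames stand in for the counts CSV and the table exported from BioMart, and the column names (`gene_symbol`, `gene_id`, `description`) and the ID values are made-up placeholders, not real identifiers.

```r
# Toy counts table: one row per gene, one column per sample (made-up values).
# With real data this would come from something like read.csv() on your file.
counts <- data.frame(
  gene_symbol = c("Xkr4", "Rp1", "Sox17"),
  WT = c(2, 49, 626),
  KO = c(3, 103, 686)
)

# Stand-in for the BioMart export. The IDs are placeholders, not real IDs,
# and the row order deliberately differs from the counts table.
biomart <- data.frame(
  gene_symbol = c("Sox17", "Xkr4", "Rp1"),
  gene_id     = c("ID_Sox17", "ID_Xkr4", "ID_Rp1"),
  description = c("desc 1", "desc 2", "desc 3")
)

# 1) Merge on the shared column. all.x = TRUE keeps every row of counts,
#    even if a gene has no match in the BioMart table.
merged <- merge(counts, biomart, by = "gene_symbol", all.x = TRUE)

# 2) Select the columns of interest and 3) put them in the desired order.
result <- merged[, c("gene_id", "gene_symbol", "WT", "KO")]

print(result)
```

Because the merge matches rows by gene symbol rather than by position, the two tables do not need to be in the same order.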

PS: Never post images of the data.

2.1 years ago by cpad0112 21k

score 0 · Answer 1 · 2022-04-05

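library(dplyr)

# Toy counts table; the gene symbols are the row names
d1 <- data.frame(WT = c(2, 49, 626), KO = c(3, 103, 686), row.names = c("Xkr4", "Rp1", "Sox17"))
# Copy the row names into a regular column so they can be used as the join key
d1$gene_symbol <- rownames(d1)
# Lookup table from symbol to ID (placeholder IDs, not real Ensembl IDs)
d2 <- data.frame(gene_symbol = c("Xkr4", "Rp1", "Sox17"), gene_id = c("ENSG123", "ENSG234", "ENSG345"))
# Keep every row of d1 and add the matching gene_id
d3 <- left_join(d1, d2, by = "gene_symbol")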

Here, d3 should include both the symbol and the ID of each gene. Is this what you expected?
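
One thing worth checking after such a join: a left join keeps every row of the counts table, so a gene with no entry in the lookup table gets `NA` as its ID, and a symbol that appears more than once in the lookup table produces duplicated count rows. The sketch below shows one possible way to spot both cases. It uses base R `merge(..., all.x = TRUE)` in place of `left_join`, and the gene `GeneX` and all ID values are hypothetical.

```r
# Counts table with one gene ("GeneX", hypothetical) missing from the lookup
counts <- data.frame(
  gene_symbol = c("Xkr4", "Rp1", "Sox17", "GeneX"),
  WT = c(2, 49, 626, 10),
  KO = c(3, 103, 686, 12)
)

# Lookup table in which "Sox17" maps to two placeholder IDs
lookup <- data.frame(
  gene_symbol = c("Xkr4", "Rp1", "Sox17", "Sox17"),
  gene_id     = c("ID_A", "ID_B", "ID_C", "ID_D")
)

# Equivalent of a left join: keep all rows of counts
joined <- merge(counts, lookup, by = "gene_symbol", all.x = TRUE)

# Genes that did not receive an ID
unmatched <- as.character(joined$gene_symbol[is.na(joined$gene_id)])

# Symbols whose count row was duplicated because they matched several IDs
dup_symbols <- unique(as.character(joined$gene_symbol[duplicated(joined$gene_symbol)]))

print(unmatched)
print(dup_symbols)
```

How to handle such genes (drop them, keep the first ID, and so on) depends on the analysis, so this only reports them.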

By the way, some packages accept only symbols or only IDs as input names, while others accept both. Those usually have a parameter to set the identifier type, e.g. Entrez ID, symbol, or Ensembl ID. I don't know how karyoploteR handles this; you can check the vignette.
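
In the example above the lookup table d2 was typed by hand. One possible way to build it inside R is `mapIds()` from the Bioconductor package AnnotationDbi together with the mouse annotation package org.Mm.eg.db. The sketch below assumes both packages are installed and that Ensembl gene IDs are the identifier you need, which is an assumption and not something confirmed for karyoploteR. If the packages are missing, it falls back to an empty ID column so the script still runs.

```r
# Gene symbols to convert; with real data these would come from the counts table
symbols <- c("Xkr4", "Rp1", "Sox17")

if (requireNamespace("AnnotationDbi", quietly = TRUE) &&
    requireNamespace("org.Mm.eg.db", quietly = TRUE)) {
  # Look up one Ensembl gene ID per symbol; multiVals = "first" keeps the
  # first hit when a symbol maps to several IDs
  ensembl_ids <- AnnotationDbi::mapIds(
    org.Mm.eg.db::org.Mm.eg.db,
    keys      = symbols,
    column    = "ENSEMBL",
    keytype   = "SYMBOL",
    multiVals = "first"
  )
  lookup <- data.frame(gene_symbol = symbols, gene_id = unname(ensembl_ids))
} else {
  # Fallback so the sketch runs without Bioconductor: IDs are left as NA
  message("AnnotationDbi / org.Mm.eg.db not installed; gene_id left as NA")
  lookup <- data.frame(gene_symbol = symbols, gene_id = NA_character_)
}

print(lookup)
```

The resulting `lookup` table has the same shape as d2, so it can be joined to the counts in the same way.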