Help with adding new ID names to read count
2.1 years ago
mjs_novice • 0

Hi everyone, I have a simple question about adding a new column with a new identifier to a read count CSV sheet that currently contains only gene names.

Currently I have a mouse RNA-seq file with gene names and read counts for each sample. Image example: (image not shown)

I want to use the karyoploteR package to plot my DEGs on each chromosome. Here is the general info on karyoploteR: https://bernatgel.github.io/karyoploter_tutorial/

The authors have a tutorial that uses example fly data, and I was able to reproduce the figures they created: https://bernatgel.github.io/karyoploter_tutorial//Examples/GeneExpression/GeneExpression.html

Now I want to load my own data and run this analysis with the mouse genome. However, my starting data has gene names rather than an identifier their package can use. In their tutorial, they used identifiers from FlyBase.

I want to add a new column next to the gene names in my dataset with a new identifier that I can use with DESeq2 and karyoploteR. I just do not know how to:
1) Tell RStudio to look up my genes and find a new identifier
2) Create a new column in my data table
3) Insert the new names into the column next to the correct gene name

I have tried BioMart through Ensembl online but this only allows me to convert my gene names. It will not keep my read count data together with the new names.

Thank you to anyone who takes the time to help me with my problem! Remember, I am new to RStudio, so please keep your advice accessible to new users.

gene name R column conversion • 706 views
"I have tried BioMart through Ensembl online but this only allows me to convert my gene names. It will not keep my read count data together with the new names."
  1. Merge the BioMart results with the counts data frame (a dplyr join or base R merge)
  2. Select the columns of interest (dplyr or base R)
  3. Reorder the columns as you need them (dplyr or base R)
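
The three steps above can be sketched in base R roughly as follows. This is only a minimal illustration: the two data frames, their column names, and the IDs are made-up placeholders, and it assumes the BioMart export was saved with a gene symbol column that matches the one in the counts table.

```r
# Hypothetical counts table, as it might look after read.csv():
# the first column holds gene symbols, the rest are samples.
counts <- data.frame(gene_symbol = c("Xkr4", "Rp1", "Sox17"),
                     WT = c(2, 49, 626),
                     KO = c(3, 103, 686))

# Hypothetical BioMart export (placeholder IDs, not real accessions).
biomart <- data.frame(gene_symbol = c("Xkr4", "Rp1", "Sox17"),
                      gene_id = c("ID_A", "ID_B", "ID_C"),
                      gene_biotype = c("protein_coding", "protein_coding", "protein_coding"))

# Step 1: merge on the shared column; all.x = TRUE keeps genes without a match.
merged <- merge(counts, biomart, by = "gene_symbol", all.x = TRUE)

# Steps 2 and 3: keep only the columns of interest, in the desired order.
result <- merged[, c("gene_id", "gene_symbol", "WT", "KO")]
print(result)
```

Note that merge() sorts the rows by the key column by default, so the row order of the result may differ from the original counts table.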

PS: Never post images of the data.

2.1 years ago
tomas4482 ▴ 390
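library(dplyr)

# Example counts: one row per gene (symbols as row names), one column per sample.
d1 <- data.frame(WT = c(2, 49, 626), KO = c(3, 103, 686), row.names = c("Xkr4", "Rp1", "Sox17"))
# Copy the row names into a regular column so they can serve as the join key.
d1$gene_symbol <- rownames(d1)
# Lookup table of symbol -> ID (placeholder IDs; real mouse Ensembl gene IDs start with "ENSMUSG").
d2 <- data.frame(gene_symbol = c("Xkr4", "Rp1", "Sox17"), gene_id = c("ENSG123", "ENSG234", "ENSG345"))
# Keep every row of d1 and attach the matching gene_id from d2.
d3 <- left_join(d1, d2, by = "gene_symbol")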

Here, d3 should contain both the symbol and the ID of each gene, alongside the counts. Is that what you expected?
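
With real data, some symbols may have no match in the lookup table, so it can be worth checking for that after adding the ID column. Below is a small base R sketch of such a check, using match() instead of a join; the extra row "NotAGene" is an invented example added only to show what an unmatched symbol looks like.

```r
# Same toy data as above, plus one made-up symbol with no entry in the lookup table.
d1 <- data.frame(gene_symbol = c("Xkr4", "Rp1", "Sox17", "NotAGene"),
                 WT = c(2, 49, 626, 5),
                 KO = c(3, 103, 686, 7))
d2 <- data.frame(gene_symbol = c("Xkr4", "Rp1", "Sox17"),
                 gene_id = c("ENSG123", "ENSG234", "ENSG345"))

# match() returns, for each symbol in d1, its row position in d2 (NA if absent).
d1$gene_id <- d2$gene_id[match(d1$gene_symbol, d2$gene_symbol)]

# Symbols that did not receive an identifier.
unmatched <- d1$gene_symbol[is.na(d1$gene_id)]
print(unmatched)

# Symbols listed more than once in the lookup table would duplicate rows in a join.
dup_symbols <- d2$gene_symbol[duplicated(d2$gene_symbol)]
print(dup_symbols)
```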

BTW, some packages accept only symbols or only IDs as input names, while others accept both. In that case there is usually a parameter to specify whether the input is an Entrez ID, a symbol, or an Ensembl ID, something like that. I don't know how karyoploteR handles it. You can check the vignette.
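
For the lookup step itself, one possible route inside R (not the only one, and not something the posts above prescribe) is the Bioconductor annotation package org.Mm.eg.db together with AnnotationDbi::mapIds(). The sketch below assumes both packages are installed and that the gene names are official mouse symbols; the counts table is the same toy example as above, and the column name ensembl_id is just an illustrative choice.

```r
# Hypothetical counts table: gene symbols as row names, one column per sample.
counts <- data.frame(WT = c(2, 49, 626), KO = c(3, 103, 686),
                     row.names = c("Xkr4", "Rp1", "Sox17"))

if (requireNamespace("org.Mm.eg.db", quietly = TRUE) &&
    requireNamespace("AnnotationDbi", quietly = TRUE)) {
  # Look up one Ensembl gene ID per symbol; symbols without a match give NA.
  ensembl <- AnnotationDbi::mapIds(org.Mm.eg.db::org.Mm.eg.db,
                                   keys = rownames(counts),
                                   keytype = "SYMBOL",
                                   column = "ENSEMBL",
                                   multiVals = "first")
  # Add the IDs as a new column, aligned to the row names of the counts table.
  counts$ensembl_id <- unname(ensembl[rownames(counts)])
} else {
  message("Install org.Mm.eg.db and AnnotationDbi from Bioconductor to run the lookup.")
}

print(counts)
```

Using multiVals = "first" keeps a single ID per symbol, which is a simplification; symbols that map to several Ensembl IDs would need to be reviewed by hand.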
