Question: Adding gene names to ballgown object gexpr(bg)
16 months ago, maria.traka wrote:

Hi, I'm having an issue with adding gene names to the gene expression table of a ballgown object. texpr(bg)$gene_id has multiple entries for the same MSTRG.x gene ID (as expected, I guess, since these are different isoforms of the same gene). Unfortunately, for some genes the first matching texpr entry has "." as its gene_name instead of the actual gene name, and because match() only returns that first hit, many of my gene names end up as ".". How can I work around this? I am losing a lot of genes from all my downstream functional analyses. Can you help? Thanks, Maria

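library(ballgown)
gene_expression_ESC = gexpr(bg_ESC_89)  # gene-level FPKM matrix; rownames are gene IDs
tx_ESC <- texpr(bg_ESC_89, 'all')  # transcript table with gene_id, gene_name, t_name columns
# match() returns only the FIRST transcript row for each gene ID;
# if that row's gene_name is ".", the gene is labelled "."
indicesG <- match(rownames(gene_expression_ESC), tx_ESC$gene_id)
gene_names_F <- tx_ESC$gene_name[indicesG]
gene_names_T <- tx_ESC$t_name[indicesG]  # transcript name of that first row
gene_expression_ESC_N <- data.frame(geneNames=gene_names_F, ensIDs=gene_names_T, gene_expression_ESC)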
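The effect described above can be reproduced with a small sketch. The table below is a hypothetical stand-in for texpr(bg, 'all'): the gene IDs, gene names and transcript names are made up for illustration, and it only assumes that a novel isoform with gene_name "." is listed before the annotated one.

```r
# Hypothetical toy transcript table standing in for texpr(bg, 'all'):
# gene MSTRG.1 has a novel isoform (gene_name ".") listed before its annotated isoform.
tx <- data.frame(
  gene_id   = c("MSTRG.1", "MSTRG.1", "MSTRG.2"),
  gene_name = c(".", "GENE_A", "GENE_B"),
  t_name    = c("MSTRG.1.1", "ENST_example_1", "ENST_example_2"),
  stringsAsFactors = FALSE
)
genes <- c("MSTRG.1", "MSTRG.2")  # stands in for rownames(gexpr(bg))

# match() returns only the first row carrying each gene_id: here rows 1 and 3
idx <- match(genes, tx$gene_id)

# so MSTRG.1 inherits the "." of its novel isoform instead of "GENE_A"
naive_names <- tx$gene_name[idx]
print(naive_names)
```

The annotated row for MSTRG.1 (row 2) is never looked at, which is why the row order of the transcript table decides whether a gene gets a real name.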

Are there any genes or transcripts in the reference GTF whose names start with "."? Validate the reference GTF first. If it has no issues, you can filter out the entries whose gene name is "." from the texpr object.
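Before filtering, it can help to see how many genes are actually affected. The following is a minimal diagnostic sketch on a hypothetical toy table (with real data, something like tx <- texpr(bg, 'all') would take its place). It separates genes that have a mix of "." and named transcripts, which filtering will rescue, from genes where every transcript is ".", which will stay unnamed either way.

```r
# Hypothetical toy table; with real data use tx <- texpr(bg, 'all')
tx <- data.frame(
  gene_id   = c("MSTRG.1", "MSTRG.1", "MSTRG.2", "MSTRG.3"),
  gene_name = c(".", "GENE_A", "GENE_B", "."),
  stringsAsFactors = FALSE
)

# Number of transcripts without an assigned reference gene name
n_dot_tx <- sum(tx$gene_name == ".")

# Per gene_id: does any transcript have a real name / a "." name?
has_real_name <- tapply(tx$gene_name != ".", tx$gene_id, any)
has_dot_name  <- tapply(tx$gene_name == ".", tx$gene_id, any)

# Genes that filtering out "." rows will fix (mixed names) ...
mixed_genes   <- names(has_real_name)[has_real_name & has_dot_name]
# ... and genes that remain unnamed because all their transcripts are "."
unnamed_genes <- names(has_real_name)[!has_real_name]
```

Genes in unnamed_genes are presumably loci with no overlapping reference annotation, so filtering cannot give them a name.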

Reply written 16 months ago by cpad0112

I'm using the Ensembl Homo_sapiens.GRCh38.89.gtf downloaded from their FTP site, so it's not that. I suspect these are putative novel isoforms of known genes: because they happen to be listed before the known transcripts, match() hits them first. I have now managed a workaround where, as you suggest, I remove the "." entries from the texpr object, but it seems very convoluted to me. Anyhow, here it is:

whole_tx_table_ESC = texpr(bg_ESC_89, 'all')
A = whole_tx_table_ESC[, c("gene_id", "gene_name", "t_name")]
Bi = which(A$gene_name != ".")  # indices of transcripts whose gene_name is not "."
B = A[Bi, ]  # transcript table restricted to rows with a real gene name
# genes whose transcripts ALL have gene_name "." are absent from B and get NA here
indicesG <- match(rownames(gene_expression_ESC), B$gene_id)
GE = data.frame(geneNames = B$gene_name[indicesG], ensIDs = B$gene_id[indicesG], ensTID = B$t_name[indicesG], gene_expression_ESC)
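A possibly more compact variant of this workaround builds one name per gene ID up front and then looks genes up by ID. This is a sketch on hypothetical toy inputs, not a ballgown function: it keeps only named transcripts, collapses the distinct names of each gene ID into one comma-separated string (in case a gene ID ever carries more than one name), and leaves NA for genes with no named transcript.

```r
# Hypothetical toy inputs; with real data use texpr(bg, 'all') and gexpr(bg)
tx <- data.frame(
  gene_id   = c("MSTRG.1", "MSTRG.1", "MSTRG.1", "MSTRG.2", "MSTRG.3"),
  gene_name = c(".", "GENE_A", "GENE_A", "GENE_B", "."),
  stringsAsFactors = FALSE
)
gexp <- matrix(c(1.5, 2.0, 0.3), ncol = 1,
               dimnames = list(c("MSTRG.1", "MSTRG.2", "MSTRG.3"), "FPKM.sample1"))

# Keep only transcripts with a real gene name
named <- tx[tx$gene_name != ".", ]

# One entry per gene_id: its distinct names joined with ","
name_by_gene <- vapply(split(named$gene_name, named$gene_id),
                       function(x) paste(unique(x), collapse = ","),
                       character(1))

# Look up each expressed gene by ID; IDs without a named transcript give NA
gene_names <- unname(name_by_gene[rownames(gexp)])
annotated <- data.frame(geneNames = gene_names, gexp,
                        check.names = FALSE, stringsAsFactors = FALSE)
```

Because the lookup is by gene ID rather than by the first transcript row, the order in which transcripts appear in the table no longer matters.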

Has anyone else had the same problem? I have to say I bumped into it while I was looking for something completely different. I can't think why this would be unique to my data.

Reply written 16 months ago by maria.traka