Question: gene_Id in ENCODE gene expression table
11 weeks ago, fusion.slope (200) wrote:

Hello,

I would like to use the expression value of some genes in the ENCODE project.

I have a table, but I notice that the gene names are just numbers. Does anyone know which format this is?

Here is an example: https://ibb.co/gcefj9

Does anyone know the name of this format so that I can convert it to a gene ID?

Thanks in advance!

Tags: conversion, encode, gene • 189 views
written 11 weeks ago by fusion.slope (200)

HGNC ID, perhaps? Do you know what kinds of genes you are looking at? For example: https://www.genenames.org/cgi-bin/gene_symbol_report?hgnc_id=21175

written 11 weeks ago by Alex Reynolds (26k)

I have a list of genes. I will take some genes whose names I know, look up their gene IDs on the website you suggested, and check whether they match. Then I will use http://biodb.jp/ to convert. Thanks for the info.

written 11 weeks ago by fusion.slope (200)

The length and effective length numbers are too small for these to be full genes. It is hard to say what those numeric gene IDs are. Where did you get the file from? Do you have a link?

modified 11 weeks ago • written 11 weeks ago by genomax (59k)

For example, the TSV file here:

https://www.encodeproject.org/experiments/ENCSR000CPH/

Click on the file details.

modified 11 weeks ago • written 11 weeks ago by fusion.slope (200)

This is what the explanation legend says:

Estimated expression levels from RSEM as a tsv file. The columns are as follows:

column 1: gene_id - gene name of the gene the transcript belongs to (parent gene). If no gene information is provided, gene_id and transcript_id is the same.
column 2: transcript_id(s) - transcript name of this transcript
column 3: length - the transcript's sequence length (poly(A) tail is not counted)
column 4: effective_length - the length containing only the positions that can generate a valid fragment
column 5: expected_count - the sum of the posterior probability of each read comes from this transcript over all reads

truncated for brevity.
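
In case it helps, here is a minimal sketch of reading a table with the five columns listed above, using only the Python standard library. The two sample rows are made up for illustration (they are not copied from the ENCODE file), and `read_rsem_genes` is a hypothetical helper name; a real file has more columns and many more rows.

```python
import csv
import io

# Hypothetical rows, invented for illustration only.
SAMPLE_TSV = (
    "gene_id\ttranscript_id(s)\tlength\teffective_length\texpected_count\n"
    "10904\t10904\t93.00\t0.00\t0.00\n"
    "ENSG00000000003.10\tENST00000373020.4,ENST00000494424.1\t2206.00\t2020.50\t1500.00\n"
)

def read_rsem_genes(handle):
    """Yield one dict per row, converting the numeric columns to float."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        yield {
            "gene_id": row["gene_id"],
            # Several transcripts of one gene are comma-separated in this column.
            "transcript_ids": row["transcript_id(s)"].split(","),
            "length": float(row["length"]),
            "effective_length": float(row["effective_length"]),
            "expected_count": float(row["expected_count"]),
        }

rows = list(read_rsem_genes(io.StringIO(SAMPLE_TSV)))
for r in rows:
    print(r["gene_id"], r["length"], len(r["transcript_ids"]))
```

For a file on disk, the same function should work with `open(path, newline="")` in place of the `io.StringIO` wrapper.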

modified 11 weeks ago • written 11 weeks ago by genomax (59k)

Thanks a lot, I think Alex already answered my question :) But to confirm, I should check the RSEM output to see which gene_id reference they use.

written 11 weeks ago by fusion.slope (200)

I don't think those are HGNC IDs. They are entries that did not have a gene name. From the legend:

If no gene information is provided, gene_id and transcript_id is the same.

Further down in the file you have normal gene identifiers.
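
A rough way to separate the two kinds of rows, as a sketch only: it assumes the "normal" identifiers are Ensembl human gene IDs ("ENSG" plus 11 digits, optionally with a ".version" suffix), and it uses the rule quoted above that gene_id equals transcript_id when no gene information is available. `classify_gene_id` and the example pairs are hypothetical, not taken from the actual file.

```python
import re

# Assumption: Ensembl human gene IDs look like "ENSG" + 11 digits,
# optionally followed by ".<version>".
ENSEMBL_GENE = re.compile(r"^ENSG\d{11}(\.\d+)?$")

def classify_gene_id(gene_id, transcript_ids):
    """Return a rough label for one row, given its first two columns."""
    if ENSEMBL_GENE.match(gene_id):
        return "ensembl"
    if transcript_ids == gene_id:
        # Per the legend: no gene information, so both columns are the same.
        return "no_gene_info"
    return "other"

# Hypothetical (gene_id, transcript_id(s)) pairs for illustration.
examples = [
    ("10904", "10904"),
    ("ENSG00000000003.10", "ENST00000373020.4,ENST00000494424.1"),
]
labels = [classify_gene_id(g, t) for g, t in examples]
print(labels)
```

Rows labelled "ensembl" are the ones you could pass to an ID converter; the others have no gene name to convert to.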

modified 11 weeks ago • written 11 weeks ago by genomax (59k)

Oh, thanks a lot. I scrolled through the file a bit but did not go down far enough to see the Ensembl gene annotation! Much appreciated, genomax!

written 11 weeks ago by fusion.slope (200)

See: How to add images to a Biostars post

written 11 weeks ago by RamRS (19k)