Question: Strange Gene IDs in TCGA
jack wrote (5.6 years ago):

Hi all,

I've downloaded RNA-seq data from TCGA, and in several expression files the IDs of the first few genes look strange. Does anybody know why?

gene
?|100130426
?|100133144
?|100134869
?|10357
?|10431
?|136542
?|155060
?|26823
?|280660
?|317712
?|340602
?|388795
Tags: annotation, tcga, genomic, rna-seq
(modified 4.6 years ago by cankutcubuk • written 5.6 years ago by jack)

I want to know this as well. Will find out for you.

(written 5.6 years ago by Ryan D)

What cancer type and which files specifically?

(written 5.6 years ago by Chris Miller)

For example: sample TCGA-A6-2683-01.

(written 5.6 years ago by jack)
Ryan D (USA) wrote (5.6 years ago):

According to the description file, these should be Entrez/LocusLink gene IDs.

For instance, the first one is LOC100130426, a hypothetical locus. This may explain why many of them don't have HGNC symbols. Check out the description in the workflow.
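If you want readable labels for these rows, one option is to fall back to NCBI's "LOC" + Gene ID naming when the symbol part is "?". Below is a minimal Python sketch, assuming the field always has the form symbol|EntrezID; the function name display_label is hypothetical. The LOC label is only a placeholder: an ID that had no symbol in this TCGA annotation may have an official symbol in current NCBI Gene.

```python
def display_label(gene_field: str) -> str:
    """Return a readable label for a TCGA 'symbol|EntrezID' gene field.

    Assumption: when the symbol part is '?', fall back to the
    'LOC' + Entrez Gene ID placeholder style. This is only a label;
    the current official symbol for that ID may differ.
    """
    symbol, _, entrez_id = gene_field.partition("|")
    if symbol == "?":
        return "LOC" + entrez_id
    return symbol


for field in ["?|100130426", "?|10357", "TP53|7157"]:
    print(field, "->", display_label(field))
```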

---snip---

File: *.trimmed.annotated.gene.quantification.txt

  • gene: This is the Entrez/LocusLink gene symbol followed by the Entrez/LocusLink gene ID.
  • raw_counts: The number of reads mapping to this gene.
  • median_length_normalized: This is the total aligned bases to all transcript models associated with this gene divided by the mean transcript length.
  • RPKM: See the DESCRIPTION.txt file in the mage-tab bundle for information on how this is calculated.
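Since the gene column packs the symbol and the Entrez ID into one field separated by "|", the Entrez ID can be read straight from it while loading the file. A minimal Python sketch, assuming a tab-separated file with a single header line holding exactly the four columns listed above; the names GeneRow and read_gene_quantification are hypothetical, and the numbers in the usage example are made up.

```python
import csv
import io
from typing import Iterator, NamedTuple, Optional, TextIO


class GeneRow(NamedTuple):
    symbol: Optional[str]            # None when the file has "?" (no symbol)
    entrez_id: str                   # Entrez/LocusLink gene ID, kept as text
    raw_counts: float
    median_length_normalized: float
    rpkm: float


def read_gene_quantification(handle: TextIO) -> Iterator[GeneRow]:
    """Yield parsed rows from a tab-separated gene.quantification file.

    Assumes a header line: gene, raw_counts, median_length_normalized, RPKM.
    """
    for record in csv.DictReader(handle, delimiter="\t"):
        # Split "symbol|EntrezID" into its two parts.
        symbol, _, entrez_id = record["gene"].partition("|")
        yield GeneRow(
            symbol=None if symbol == "?" else symbol,
            entrez_id=entrez_id,
            raw_counts=float(record["raw_counts"]),
            median_length_normalized=float(record["median_length_normalized"]),
            rpkm=float(record["RPKM"]),
        )


# Usage with a tiny in-memory file (values are made up for illustration).
toy = io.StringIO(
    "gene\traw_counts\tmedian_length_normalized\tRPKM\n"
    "?|100130426\t0\t0\t0\n"
    "TP53|7157\t12\t1.5\t3.25\n"
)
for row in read_gene_quantification(toy):
    print(row.symbol, row.entrez_id, row.raw_counts)
```

Keeping the Entrez ID as a string avoids accidentally treating it as a quantity and matches how it appears in the file.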
(written 5.6 years ago by Ryan D)

Thanks for the solution, Ryan, but the links you posted are now broken. Could you please update them? Since the "TCGA Data Portal is no longer operational", where can we find the mapping from TCGA gene IDs to Entrez Gene IDs? Specifically, I'm working with the BRCA dataset and would like to get the Entrez IDs for my corresponding TCGA IDs.

(written 3.0 years ago by Luke)
cankutcubuk (Spain) wrote (4.6 years ago):

I have another question.

Some of the gene IDs have the suffix "_calculated".

What does it mean?

Example:

==> OV__bcgsc.ca__illuminahiseq_rnaseq__gene.quantification__Jul-08-2014.txt <==
Hybridization REF
gene
?|100132510_calculated
?|100134860_calculated
?|10357_calculated
?|10431_calculated

Cheers

Cankut CUBUK
Computational Genomics Program - Systems Genomics Lab
Centro de Investigación Príncipe Felipe (CIPF)
C/ Eduardo Primo Yúfera nº3
46012 Valencia, Spain
http://bioinfo.cipf.es

(modified 4.6 years ago • written 4.6 years ago by cankutcubuk)

Please post this as a new question rather than adding it as an answer to a year-old question.

(written 4.6 years ago by Devon Ryan)

OK, I will, thanks.

(written 4.6 years ago by cankutcubuk)
Powered by Biostar version 2.3.0