TCGA data query (GDCquery): "external_gene_name" values are missing
3 months ago

Hi, I got some unexpected output from a TCGA dataset. As the screenshot below shows, some of the "external_gene_name" values are missing. Could you please help me with this issue? Thank you.

[screenshot]


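library(TCGAbiolinks)

# Query HTSeq raw-count files for normal and primary tumor BRCA samples
query.seq <- GDCquery(project = "TCGA-BRCA",
                      data.category = "Transcriptome Profiling",
                      data.type = "Gene Expression Quantification",
                      sample.type = c("Solid Tissue Normal", "Primary Tumor"),
                      workflow.type = "HTSeq - Counts")

# Download the files matched by the query
GDCdownload(query.seq)

# Assemble the downloaded files into a SummarizedExperiment
seq.brca <- GDCprepare(query = query.seq, summarizedExperiment = TRUE)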
3 months ago
GenoMax 106k

ENSG00000281904 is annotated as a novel gene, which is why it has no official gene name. This gene was manually annotated by Ensembl. Others may be similar.
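If unnamed genes need to stay in the table, one common workaround is to fall back to the Ensembl ID as the label. The sketch below is only an illustration in base R: the data frame `gene_info`, its `ensembl_gene_id` column, and the new `label` column are assumptions (only `external_gene_name` and ENSG00000281904 come from this thread), and the other IDs and names are made up.

```r
# Toy annotation table. Only ENSG00000281904 appears in the thread;
# the other IDs and the gene name are invented for illustration.
gene_info <- data.frame(
  ensembl_gene_id    = c("ENSG_EXAMPLE_1", "ENSG00000281904", "ENSG_EXAMPLE_2"),
  external_gene_name = c("GENE_A", NA, ""),
  stringsAsFactors   = FALSE
)

# Treat both NA and empty strings as "no gene name available".
missing_name <- is.na(gene_info$external_gene_name) |
  gene_info$external_gene_name == ""

# Use the Ensembl ID as the label wherever the name is missing.
gene_info$label <- ifelse(missing_name,
                          gene_info$ensembl_gene_id,
                          gene_info$external_gene_name)

print(gene_info)
```

With a real object, the same logic would be applied to whatever table holds the gene annotation, so every row keeps a usable identifier.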


Thank you. So do you mean that I can ignore them in my analysis? Actually, when I used the "gencode.gene.info.v22.csv" file from TCGA, it assigned names to them (the highlighted part in the first picture attached).

[screenshot]

On the other hand, a friend of mine obtained gene names from "gencode.gene.info.v22.csv" a year ago, and they are not the same as those in the first figure; the genes appear under aliases. For example:

[screenshot]

RP11-418H16.1 = AC007389.5

CH17-132F21.5 = AC233263.6

So I'm wondering how I can get the same gene names ("AC007389.5", "AC233263.6", ...)?
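One way to approach this kind of renaming is to join on the Ensembl gene ID rather than on the gene name, since names such as the ones above can differ between annotation releases. The sketch below assumes two tables with `gene_id` and `gene_name` columns: `old_annot` standing in for the annotation that reports the clone-based names, and `new_annot` for whichever annotation carries the "AC..." names. Both tables, the column names, and the placeholder IDs are hypothetical; only the two name pairs come from this thread.

```r
# Annotation reporting the clone-based names. The IDs are placeholders,
# written with a ".<version>" suffix as versioned Ensembl IDs are.
old_annot <- data.frame(
  gene_id   = c("ENSG_PLACEHOLDER_A.1", "ENSG_PLACEHOLDER_B.2"),
  gene_name = c("RP11-418H16.1", "CH17-132F21.5"),
  stringsAsFactors = FALSE
)

# Annotation carrying the other names, keyed by unversioned ID
# and deliberately in a different row order.
new_annot <- data.frame(
  gene_id   = c("ENSG_PLACEHOLDER_B", "ENSG_PLACEHOLDER_A"),
  gene_name = c("AC233263.6", "AC007389.5"),
  stringsAsFactors = FALSE
)

# Drop the trailing ".<version>" so IDs are comparable across tables.
old_annot$gene_id_stable <- sub("\\.[0-9]+$", "", old_annot$gene_id)

# Look up each stable ID in the second table.
idx <- match(old_annot$gene_id_stable, new_annot$gene_id)
old_annot$new_name <- new_annot$gene_name[idx]

# Keep the original name where the second table has no entry.
no_match <- is.na(idx)
old_annot$new_name[no_match] <- old_annot$gene_name[no_match]

print(old_annot[, c("gene_id", "gene_name", "new_name")])
```

Which annotation release actually contains the "AC..." names is not stated in this thread, so the second table would have to come from wherever those names were obtained.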


I mean, are you realistically interested in genes like these? They probably have 0 counts across all of your samples. Unless you are specifically studying lowly expressed predicted genes, you could just filter them out.
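A minimal sketch of that filtering step, using a small made-up count matrix in base R (the object names `counts` and `filtered` and all the values are illustrative assumptions, not data from this thread):

```r
# Toy count matrix: rows are genes, columns are samples.
counts <- matrix(
  c(10, 0, 5,
     0, 0, 0,
     3, 7, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("gene_1", "gene_2", "gene_3"), c("s1", "s2", "s3"))
)

# Keep genes with at least one read in at least one sample.
keep <- rowSums(counts) > 0

# drop = FALSE keeps the result a matrix even if a single gene remains.
filtered <- counts[keep, , drop = FALSE]

print(filtered)
```

For a SummarizedExperiment, the same logical vector could be computed from the count assay and used to subset the rows of the object; a stricter threshold than zero is a matter of study design.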


Thanks again. Yes, I need them in my analysis, provided I can get gene names such as "AC007389.5" instead of "RP11-418H16.1", as I mentioned above.
