5.0 years ago
modarzi
Hi,
To analyze a TCGA data set, I need an annotation file. So, based on this link:
https://gdc.cancer.gov/about-data/data-harmonization-and-generation/gdc-reference-files
I downloaded gencode.v22.annotation.gtf, but I couldn't read this file. So my question is: can I use Homo_sapiens.GRCh38.92.gtf instead of gencode.v22.annotation.gtf?
I would appreciate it if anybody could share their comments with me.
Best Regards,
Mohammad
What do you mean by this? That file is gzip-compressed and will need to either be uncompressed or viewed with zmore or zcat.
I mean: do these two files have any structural differences? I couldn't read the gencode.v22.annotation.gtf file with the "refGenome" package, but I can read Homo_sapiens.GRCh38.92.gtf with it. So I want to know whether I can use Homo_sapiens.GRCh38.92.gtf instead of gencode.v22.annotation.gtf.
I would appreciate it if you could share your comments with me.
Best Regards,
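On the gzip point raised above: one way to check whether compression is the problem is to peek at the file without unpacking it on disk. This is only a minimal Python sketch, not what refGenome does internally. The helper names `open_maybe_gzip` and `peek_gtf` are made up for illustration, and the file name in the usage comment is an assumption about what the download is called.

```python
import gzip
from itertools import islice


def open_maybe_gzip(path):
    """Open a GTF as text, whether or not it is gzip-compressed."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == b"\x1f\x8b":  # two-byte gzip magic number
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "rt", encoding="utf-8")


def peek_gtf(path, n=5):
    """Return the first n non-comment lines, roughly like `zcat file | head`."""
    with open_maybe_gzip(path) as fh:
        records = (line.rstrip("\n") for line in fh if not line.startswith("#"))
        return list(islice(records, n))


# Hypothetical usage (the file name is an assumption):
# for line in peek_gtf("gencode.v22.annotation.gtf.gz"):
#     print(line)
```

Running it on both GTFs and comparing the first column and the attribute column side by side should show quickly whether the two files are laid out differently.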
Hello again Mohammad. I was talking to you in another thread but you never answered my question: what current gene IDs does your dataset have? Your posts imply that you have already obtained an expression 'matrix' of some sort, but what are the gene IDs in this dataset? Moreover, why do you feel that you need a GTF?
I have been analysing TCGA data for many years and I have never had to download one of these GTFs.
Thanks, Dear Dr. Blighe
Hi,
As you know, my data set comes from TCGA, and the gene labels in my downloaded file look like ENSG00000000003. In WGCNA, I have to provide an annotation file like the one in the WGCNA tutorial (GeneAnnotation.csv). So I wrote an email to Dr. Langfelder, and he answered that "you need to get a table that has, for each gene, the gene ID as used in your data, Entrez and gene symbol."
Based on his comment I tried to solve my problem, but I really don't know how I should prepare the GeneAnnotation.csv file for my data set. For this reason, I posted three questions on biostars.org to get comments. As you said, you have long experience with WGCNA and RNA-seq analysis. I would be thankful if you could share your comments with me.
Best Regards, Mohammad
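For the table described above, the gene ID and gene symbol columns can be pulled out of the "gene" records of a GTF. Below is a minimal Python sketch of that idea, not Dr. Langfelder's method and not the WGCNA tutorial's file format. The function names, the output column names and the two sample lines (with made-up coordinates) are all assumptions for illustration. As far as I know, Entrez IDs are not stored in the GTF attributes, so that column would have to come from a separate mapping.

```python
import csv
import re
import sys

# Matches quoted GTF attributes such as: gene_id "ENSG00000000003.13";
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def parse_attributes(field):
    """Turn the 9th GTF column into a dict of quoted key/value pairs."""
    return dict(ATTR_RE.findall(field))


def gene_table(lines):
    """Yield (gene ID without version suffix, gene symbol) per 'gene' record."""
    for line in lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "gene":
            continue
        attrs = parse_attributes(cols[8])
        # Drop a trailing ".13"-style version so IDs match unversioned labels.
        gene_id = attrs.get("gene_id", "").split(".")[0]
        yield gene_id, attrs.get("gene_name", "")


def write_annotation_csv(lines, out):
    """Write a two-column CSV; the header names are hypothetical."""
    writer = csv.writer(out)
    writer.writerow(["ensembl_gene_id", "gene_symbol"])
    writer.writerows(gene_table(lines))


# Made-up sample records for illustration (coordinates are not real).
SAMPLE = [
    'chr1\tHAVANA\tgene\t100\t200\t.\t+\t.\tgene_id "ENSG00000000003.13"; gene_name "TSPAN6"; level 2;\n',
    '1\tensembl\tgene\t100\t200\t.\t+\t.\tgene_id "ENSG00000000005"; gene_name "TNMD";\n',
]
write_annotation_csv(SAMPLE, sys.stdout)
```

`lines` can be any iterable of text lines, so an open file handle (for example from `gzip.open(path, "rt")`) can be passed in place of `SAMPLE`.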
@OP:
Try loading the Ensembl annotation GTF from here: ftp://ftp.ensembl.org/pub/current_gtf/homo_sapiens/
All GDC data was updated from GENCODE v22 to GENCODE v36 more than a year ago.