Problem comparing two types of annotation file
5.9 years ago
modarzi ▴ 170

Hi,

To analyze a TCGA data set, I need an annotation file. So, based on this link:

https://gdc.cancer.gov/about-data/data-harmonization-and-generation/gdc-reference-files

I downloaded gencode.v22.annotation.gtf, but based on this link I couldn’t find read this file. So my question is: can I read Homo_sapiens.GRCh38.92.gtf instead of gencode.v22.annotation.gtf?

I would appreciate it if anybody shared their comments with me.

Best Regards,

Mohammad

gencode RNA-Seq TCGA • 1.3k views

I couldn’t find read this file

What do you mean by this? That file is gzip-compressed and will need to either be uncompressed or viewed with zmore or zcat.
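If the goal is just to look at the file from a script rather than with zmore or zcat, a minimal Python sketch could look like the one below. The function name `head_gtf` is made up for illustration, and checking the first two bytes for the gzip signature is my own choice so that the same function works on both compressed and uncompressed copies.

```python
import gzip
from itertools import islice


def head_gtf(path, n=5):
    """Return the first n lines of a GTF file, gzip-compressed or not."""
    # Detect gzip by its two magic bytes instead of trusting the extension.
    with open(path, "rb") as fh:
        is_gzip = fh.read(2) == b"\x1f\x8b"
    opener = gzip.open if is_gzip else open
    # "rt" gives decoded text lines in both cases.
    with opener(path, "rt", encoding="utf-8") as fh:
        return [line.rstrip("\n") for line in islice(fh, n)]
```

Calling `head_gtf` on the downloaded file (use whatever path you saved it under) should show the `##` header comments followed by the first tab-separated records.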


I mean: do these two files have any structural differences? I couldn't read the gencode.v22.annotation.gtf file with the "refGenome" package, but I can read Homo_sapiens.GRCh38.92.gtf with the same package. So I want to know: can I use Homo_sapiens.GRCh38.92.gtf instead of gencode.v22.annotation.gtf?

I would appreciate it if you shared your comments with me.

Best Regards,
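One way to check for structural differences is to parse a record from each file and compare the fields. The sketch below does this in Python on two abbreviated example lines. The lines are illustrative, written from the typical layout of GENCODE and Ensembl GTFs (a `chr` prefix and a versioned `gene_id` in GENCODE, a separate `gene_version` key in Ensembl), so please verify against your actual files. `parse_attributes` and `summarize` are hypothetical helper names.

```python
# Abbreviated, illustrative records; real lines carry more attributes.
GENCODE_LINE = (
    'chr1\tHAVANA\tgene\t11869\t14409\t.\t+\t.\t'
    'gene_id "ENSG00000223972.5"; gene_type "transcribed_unprocessed_pseudogene"; '
    'gene_name "DDX11L1";'
)
ENSEMBL_LINE = (
    '1\thavana\tgene\t11869\t14409\t.\t+\t.\t'
    'gene_id "ENSG00000223972"; gene_version "5"; gene_name "DDX11L1"; '
    'gene_biotype "transcribed_unprocessed_pseudogene";'
)


def parse_attributes(field):
    """Parse the 9th GTF column (key "value"; pairs) into a dict.

    Simplification: repeated keys keep only the last value, and values
    containing ';' are not handled.
    """
    attrs = {}
    for item in field.strip().split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attrs[key] = value.strip('"')
    return attrs


def summarize(line):
    """Return the fields most likely to differ between the two flavours."""
    cols = line.rstrip("\n").split("\t")
    attrs = parse_attributes(cols[8])
    return {"seqname": cols[0], "gene_id": attrs["gene_id"], "keys": sorted(attrs)}


for name, line in [("GENCODE", GENCODE_LINE), ("Ensembl", ENSEMBL_LINE)]:
    print(name, summarize(line))
```

If the real files differ in the same way, a parser that expects one attribute layout may fail on the other, and gene IDs will only match across the two files after the version suffix is handled consistently.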


Hello again Mohammad. I was talking to you in another thread but you never answered my question: what current gene IDs does your dataset have? Your posts imply that you have already obtained an expression 'matrix' of some sort, but what are the gene IDs in this dataset? Moreover, why do you feel that you need a GTF?

I have been analysing TCGA data for many years and I have never had to download one of these GTFs.



Thanks, Dear Dr. Blighe

Hi,

As you know, my data set comes from TCGA, and the gene labels in my downloaded file look like, e.g., ENSG00000000003. In WGCNA, I have to provide an annotation file like the one in the WGCNA tutorial (GeneAnnotation.csv). Based on that, I wrote an email to Dr. Langfelder, and he answered that "you need to get a table that has, for each gene, the gene ID as used in your data, Entrez and gene symbol."

So, based on his comment, I tried to solve my problem, but I really don't know how I should prepare the GeneAnnotation.csv file for my data set. For this reason, I posted three questions on biostars.org to get comments. As you said, you have long experience with WGCNA and RNA-seq analysis. I would be thankful if you shared your comments with me.

Best Regards, Mohammad
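For the table described in that quote, a GTF can supply two of the three columns: the Ensembl gene ID and the gene symbol. A rough Python sketch follows. The function names and the CSV column names are my own placeholders, not the layout of the tutorial's GeneAnnotation.csv, and stripping the version suffix from `gene_id` is an assumption that only makes sense if the IDs in the expression data are unversioned. Entrez IDs are not stored in these GTFs, so that column would need a separate ID-mapping resource.

```python
import csv
import gzip


def gene_table(gtf_path):
    """Yield (gene_id, gene_symbol) for each 'gene' record in a GTF."""
    opener = gzip.open if str(gtf_path).endswith(".gz") else open
    with opener(gtf_path, "rt", encoding="utf-8") as fh:
        for line in fh:
            if line.startswith("#"):
                continue  # header comments
            cols = line.rstrip("\n").split("\t")
            if len(cols) < 9 or cols[2] != "gene":
                continue  # keep only gene-level records
            attrs = {}
            for item in cols[8].split(";"):
                key, _, value = item.strip().partition(" ")
                if key:
                    attrs[key] = value.strip('"')
            # Assumption: drop the ".N" version so IDs look like ENSG00000000003.
            gene_id = attrs.get("gene_id", "").split(".")[0]
            yield gene_id, attrs.get("gene_name", "")


def write_annotation_csv(gtf_path, csv_path):
    """Write a two-column annotation table (placeholder column names)."""
    with open(csv_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["ensembl_gene_id", "gene_symbol"])
        writer.writerows(gene_table(gtf_path))
```

The resulting CSV can then be joined to an Entrez mapping on the `ensembl_gene_id` column to get the third column the quote asks for.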


@OP:

Try loading the Ensembl annotation GTF from here: ftp://ftp.ensembl.org/pub/current_gtf/homo_sapiens/


All GDC data was updated from GENCODE v22 to GENCODE v36 more than a year ago.
