Question: Problem comparing 2 types of annotation file
4 months ago, modarzi20 wrote:

Hi,

To analyze a TCGA data set, I need an annotation file. So, based on this link:

https://gdc.cancer.gov/about-data/data-harmonization-and-generation/gdc-reference-files

I downloaded gencode.v22.annotation.gtf, but I couldn't read this file. So, my question is: can I read Homo_sapiens.GRCh38.92.gtf instead of gencode.v22.annotation.gtf?

I would appreciate it if anybody could share their comments with me.

Best Regards,

Mohammad

written 4 months ago by modarzi20

"I couldn't read this file"

What do you mean by this? That file is gzip-compressed and will need to either be uncompressed or viewed with zmore or zcat.

written 4 months ago by genomax56k
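
For anyone who would rather check the file from a script than with zmore or zcat, here is a minimal Python sketch. It assumes only the standard library; the helper names `open_gtf` and `head` are made up for this example. It looks at the first two bytes of the file to decide whether it is gzip-compressed, so it should work whether or not the download has already been uncompressed.

```python
import gzip
from itertools import islice


def open_gtf(path):
    """Open a GTF file as text, whether or not it is gzip-compressed."""
    # Check the two-byte gzip magic number instead of trusting the extension.
    with open(path, "rb") as fh:
        is_gzipped = fh.read(2) == b"\x1f\x8b"
    if is_gzipped:
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")


def head(path, n=5):
    """Return the first n lines of the file, without trailing newlines."""
    with open_gtf(path) as fh:
        return [line.rstrip("\n") for line in islice(fh, n)]
```

Calling `head("gencode.v22.annotation.gtf")` on the downloaded file should show a few header lines starting with `#`, followed by tab-separated records.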

I mean: do these 2 files have any structural differences? I couldn't read the gencode.v22.annotation.gtf file via the "refGenome" package, but I can read Homo_sapiens.GRCh38.92.gtf via the refGenome package. So, I want to know: can I use Homo_sapiens.GRCh38.92.gtf instead of gencode.v22.annotation.gtf?

I would appreciate it if you could share your comments with me.

Best Regards,

written 4 months ago by modarzi20
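
On the structural question: as far as I know, both files follow the same 9-column GTF layout, but GENCODE files typically use "chr"-prefixed chromosome names and versioned gene IDs, while Ensembl files use bare chromosome names and keep the version in a separate gene_version attribute. The Python sketch below illustrates this on two example records written in the style of each file. The records are illustrations for this sketch, not lines copied from the actual downloads, and the helper names are made up.

```python
import re

# Illustrative gene records in the style of each file (tab-separated, 9 columns).
# These are examples for this sketch, not lines copied from the real files.
GENCODE_LINE = (
    'chr1\tHAVANA\tgene\t11869\t14409\t.\t+\t.\t'
    'gene_id "ENSG00000223972.5"; gene_type "transcribed_unprocessed_pseudogene"; '
    'gene_name "DDX11L1";'
)
ENSEMBL_LINE = (
    '1\thavana\tgene\t11869\t14409\t.\t+\t.\t'
    'gene_id "ENSG00000223972"; gene_version "5"; gene_name "DDX11L1"; '
    'gene_biotype "transcribed_unprocessed_pseudogene";'
)

# Matches quoted attributes such as: gene_id "ENSG00000223972.5"
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def parse_gtf_line(line):
    """Split one GTF record into seqname, feature type and an attribute dict."""
    fields = line.rstrip("\n").split("\t")
    return {
        "seqname": fields[0],
        "feature": fields[2],
        "attributes": dict(ATTR_RE.findall(fields[8])),
    }


def strip_version(gene_id):
    """Drop a trailing '.N' version suffix from an Ensembl-style gene ID."""
    return gene_id.split(".", 1)[0]


gencode = parse_gtf_line(GENCODE_LINE)
ensembl = parse_gtf_line(ENSEMBL_LINE)
same_gene = strip_version(gencode["attributes"]["gene_id"]) == ensembl["attributes"]["gene_id"]
```

If the gene IDs in the expression data carry no version suffix, stripping the suffix as above is one way to make the two sources comparable.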

Hello again Mohammad. I was talking to you in another thread but you never answered my question: what current gene IDs does your dataset have? Your posts imply that you have already obtained an expression 'matrix' of some sort, but what are the gene IDs in this dataset? Moreover, why do you feel that you need a GTF?

I have been analysing TCGA data for many years and I have never had to download one of these GTFs.

Linked threads:

modified 4 months ago • written 4 months ago by Kevin Blighe28k

Thanks, Dear Dr. Blighe

Hi,

As you know, my data set belongs to TCGA, and the gene labels that I see in my downloaded file look like, e.g., ENSG00000000003. In WGCNA, I have to provide an annotation file like the one in the WGCNA tutorial (GeneAnnotation.csv). Based on that, I wrote an email to Dr. Langfelder, and he answered that "you need to get a table that has, for each gene, the gene ID as used in your data, Entrez and gene symbol."

So, based on his comment, I tried to solve my problem, but I really don't know how I should prepare the GeneAnnotation.csv file for my data set. For this reason, I posted 3 questions on biostars.org to get comments. As you said, you have long experience with WGCNA and RNA-seq analysis. I would be thankful if you could share your comments with me.

Best Regards, Mohammad

written 4 months ago by modarzi20
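
One possible way to start on such a table, sketched in Python under a few assumptions: the gene records of the GTF carry `gene_id` and `gene_name` attributes, the IDs in the expression data have no version suffix, and the column names `gene_id` and `gene_symbol` are placeholders chosen for this example. As far as I know, Entrez IDs are not stored in these GTF files, so that column would have to come from a separate ID-mapping resource, which is not shown here.

```python
import csv
import re

# Matches quoted attributes such as: gene_name "TSPAN6"
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def gene_table(gtf_lines):
    """Yield (gene ID without version, gene symbol) for each 'gene' record."""
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # keep gene-level records only
        attrs = dict(ATTR_RE.findall(fields[8]))
        gene_id = attrs.get("gene_id", "").split(".", 1)[0]
        yield gene_id, attrs.get("gene_name", "")


def write_annotation_csv(gtf_lines, out_handle):
    """Write the gene table as CSV and return the number of genes written."""
    writer = csv.writer(out_handle, lineterminator="\n")
    writer.writerow(["gene_id", "gene_symbol"])  # placeholder column names
    count = 0
    for row in gene_table(gtf_lines):
        writer.writerow(row)
        count += 1
    return count
```

`gtf_lines` can be any iterable of text lines, for example an open file handle on the uncompressed GTF, and `out_handle` a file opened for writing with `newline=""`.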

@OP:

Try loading the Ensembl annotation GTF from here: ftp://ftp.ensembl.org/pub/current_gtf/homo_sapiens/

written 4 months ago by cpad01129.0k