Question: Rattus norvegicus Rnor5.0 GTF annotation in GENCODE format?
piotr.grabowski (Germany) wrote, 5.0 years ago:

Dear BioStars,


I am using PARalyzer for PAR-CLIP data. The pipeline I am using requires a GENCODE-format .gtf annotation. Since there is no official GENCODE annotation for rat, has anybody tried restructuring the UCSC annotation to fit the GENCODE format?

I already tried inserting a "filler" second column to match the column order, but there are many more differences in the last column, which holds multiple key-value entries. The home-made annotation with the filler column didn't work, whereas everything works perfectly with the GENCODE human and mouse annotations.


I would be very thankful for any tips or maybe files ;)

Have a nice weekend!


Piotr

piotr.grabowski (Germany) wrote, 5.0 years ago:

I tried working with the Ensembl, UCSC and NCBI .gtf files, with no success.

A few of the important differences are:

1) Names of chromosomes (GENCODE uses "chr1" instead of "1" and "chrM" instead of "MT").

2) The second column in GENCODE format is the source of the annotation (ENSEMBL/HAVANA).

3) The 9th column (the key-value attribute pairs) also differs considerably; e.g. non-GENCODE GTFs do not contain the "level" attribute.
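The chromosome-name difference in item 1 is the easiest to script. Below is a minimal Python sketch, assuming an Ensembl-style input in which primary chromosomes are named 1..n, X, Y and MT. The function names are hypothetical, and unplaced scaffolds are deliberately left untouched because their UCSC-style names depend on the genome FASTA you align against. The source column (item 2) and the "level" attribute (item 3) are not handled here.

```python
import re

# Primary chromosome names as written in Ensembl GTFs (assumption: 1..n, X, Y).
# Anything else (e.g. unplaced scaffolds) is passed through unchanged.
_PRIMARY = re.compile(r"^(\d+|X|Y)$")


def to_ucsc_seqname(name: str) -> str:
    """Map an Ensembl-style chromosome name to the GENCODE/UCSC 'chr' style."""
    if name == "MT":
        return "chrM"  # mitochondrial genome is named chrM in GENCODE
    if _PRIMARY.match(name):
        return "chr" + name
    return name


def rename_gtf_line(line: str) -> str:
    """Rewrite column 1 of a tab-separated GTF line; comment lines pass through."""
    if line.startswith("#"):
        return line
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 GTF columns, got {len(fields)}")
    fields[0] = to_ucsc_seqname(fields[0])
    return "\t".join(fields) + "\n"
```

Applying `rename_gtf_line` to every line of the Ensembl file turns "1" into "chr1" and "MT" into "chrM" while leaving names that already carry a "chr" prefix as they are.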

It is possible to restructure the UCSC .gtf into this format and make it very similar, but if somebody already has such a file and knows it works, I would much rather use that tested .gtf than make my own and test it, reinventing the wheel...


Best,

Piotr


You want Ensembl. We make the GENCODE GTFs (for human and mouse) and we make the GTFs for all our other species in the same style.

Reply written 5.0 years ago by Emily_Ensembl

Are they exactly the same style?

I used the newest rat Ensembl GTF, but had to make some minor changes. For example, the human GENCODE GTF uses "gene_type" rather than "gene_biotype" in the key-value column, exon numbers are not quoted in the GENCODE GTF, and the chromosome names differ ("chr1" in GENCODE vs "1" in Ensembl).
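The attribute changes described here could be sketched in Python roughly as follows. This is not the script mentioned in this reply; it is a hypothetical illustration that renames gene_biotype to gene_type, renames transcript_biotype to transcript_type (an assumption by analogy, not stated in the thread), and strips the quotes from exon_number values. It does not synthesize GENCODE's "level" attribute, since that information is absent from the Ensembl file, and it leaves column 1 (chromosome names) alone.

```python
import re

# Ensembl -> GENCODE attribute key renames.
# gene_biotype -> gene_type is from the thread; transcript_biotype is assumed.
KEY_RENAMES = {
    "gene_biotype": "gene_type",
    "transcript_biotype": "transcript_type",
}

# One `key "value";` or `key value;` pair from GTF column 9.
_ATTR = re.compile(r'\s*(\S+)\s+("[^"]*"|[^;]*);')


def rewrite_attributes(col9: str) -> str:
    """Rename biotype keys and unquote exon_number in a GTF attribute column."""
    parts = []
    for key, value in _ATTR.findall(col9):
        key = KEY_RENAMES.get(key, key)
        if key == "exon_number":
            value = value.strip('"')  # GENCODE writes: exon_number 1;
        parts.append(f"{key} {value};")
    return " ".join(parts)


def convert_line(line: str) -> str:
    """Apply the attribute rewrite to one GTF line; comments pass through."""
    if line.startswith("#"):
        return line
    fields = line.rstrip("\n").split("\t")
    fields[8] = rewrite_attributes(fields[8])
    return "\t".join(fields) + "\n"


def convert_file(src_path: str, dst_path: str) -> None:
    """Stream a GTF file line by line into a converted copy."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(convert_line(line))
```

For example, `convert_file("rat_ensembl.gtf", "rat_gencode_style.gtf")` would write a converted copy (both file names are placeholders). Whether a given pipeline then accepts the file depends on which attributes it actually reads.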

In the end I wrote a Python script to make those minor changes. Everything seems to work now.

Best regards,
Piotr

Reply written 5.0 years ago by piotr.grabowski

Is this script available on GitHub? I am running into the same problem, and I would also rather not spend time reinventing the wheel. My pipeline works perfectly with GENCODE GTFs, but does not work with the Ensembl GTF for rat.

Reply written 2.6 years ago by Tom
Bert Overduin (Edinburgh Genomics, The University of Edinburgh) wrote, 5.0 years ago:

I don't know exactly how the GENCODE GTF format differs from the regular GTF format, but have you had a look at the Ensembl GTF file for rat?
