UCSC hg19 GTF (genePredToGtf OS incompatibility)
8.2 years ago
umn_bist ▴ 390

I found that the hg19 annotation GTF exported from the UCSC Table Browser does not adhere to the standard GTF format. As a result, STAR fails with a fatal error:

Fatal INPUT FILE error, no valid exon lines in the GTF file: /work/cellbiology/s167125/Documents/ucsc_hg19/ucsc.hg19.gtf
Solution: check the formatting of the GTF file. Most likely cause is the difference in chromosome naming between GTF and FASTA file.
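
One quick way to act on that hint is to compare the sequence names in the FASTA headers with column 1 of the GTF's exon lines. The script below is a minimal sketch, assuming a standard tab-separated GTF and a FASTA whose sequence name is the first token after ">"; the file and function names are hypothetical. It only approximates the condition the STAR message describes and is not STAR's own parser (which, by default, also expects a transcript_id attribute on each exon line). Reading a whole genome FASTA line by line is slow, but it only has to be done once.

```python
import sys
from collections import Counter
from typing import Iterable, List, Set, Tuple


def fasta_chromosomes(lines: Iterable[str]) -> Set[str]:
    """Return sequence names from FASTA headers (first token after '>')."""
    names: Set[str] = set()
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                names.add(parts[0])
    return names


def exon_chromosomes(lines: Iterable[str]) -> Counter:
    """Count GTF lines whose feature (column 3) is 'exon', keyed by column 1."""
    counts: Counter = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 9 and fields[2] == "exon":
            counts[fields[0]] += 1
    return counts


def compare(fasta_names: Set[str], exon_counts: Counter) -> Tuple[int, List[str]]:
    """Return (#exon lines on chromosomes present in the FASTA, GTF-only names)."""
    matched = sum(n for chrom, n in exon_counts.items() if chrom in fasta_names)
    missing = sorted(chrom for chrom in exon_counts if chrom not in fasta_names)
    return matched, missing


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_names.py genome.fa annotation.gtf
    with open(sys.argv[1]) as fa, open(sys.argv[2]) as gtf:
        matched, missing = compare(fasta_chromosomes(fa), exon_chromosomes(gtf))
    print(f"exon lines on chromosomes found in the FASTA: {matched}")
    print(f"GTF chromosome names not in the FASTA: {missing[:20]}")
```

If the first number is 0 while the GTF clearly contains exons, the names do not match. If it is 0 because there are no "exon" lines at all, the GTF itself is malformed.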

I know that I can generate a proper GTF file with the genePredToGtf utility, but it is only available for 64-bit Linux, and I only have access to a Mac. Is there an alternative way to obtain a GTF for UCSC's hg19 reference genome?

Thank you for the help.

RNA-Seq genome • 8.3k views

Is there a reason you want to use the UCSC annotation? The one from Ensembl/Gencode is almost always better (there's a reason UCSC now uses the Gencode annotation itself).

Yes, I checked the headers of my reference genome (ucsc_hg19.fa) and of its annotation GTF (ucsc_hg19.gtf); both use the 'chr' naming convention.

Digging further, I realized that UCSC does not store its gene structures as GTF; they are all kept in genePred format.

You can export the UCSC gene predictions as GTF from the Table Browser.

That is what I thought as well, but see this wiki page:

UCSC does not keep gene structures in GTF format, we use a single line format for a single gene with all the information about that gene in the single line: GenePred format.

Extracting GTF format files from the genePred format can be performed with the genePredToGtf: kent command utility.

At this time, this genePredToGtf command can provide better GTF files than available from the table browser.
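
Because a genePred record is a single tab-separated line, a bare-bones conversion to GTF exon lines can be written without the kent tools. The following is a minimal sketch, not a replacement for genePredToGtf: it assumes the standard genePred column order (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds), ignores any extra trailing columns, and sets gene_id equal to the transcript name as a hypothetical simplification, so isoforms are not grouped into genes. It emits no CDS or start/stop codon lines; as far as I know, STAR's splice-junction database only needs exon lines with gene_id and transcript_id. The function name and the `--bin` switch are hypothetical. Some tables exported with "all fields" (refGene, for example, if I recall correctly) start with a `bin` column, which `has_bin=True` drops.

```python
import sys
from typing import Iterable, Iterator


def genepred_to_gtf(lines: Iterable[str], source: str = "genePred",
                    has_bin: bool = False) -> Iterator[str]:
    """Yield one GTF exon line per exon of each genePred record.

    Assumes tab-separated genePred columns: name, chrom, strand, txStart,
    txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds (extra trailing
    columns are ignored). Set has_bin=True if a leading 'bin' column is present.
    """
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip the Table Browser header line and blanks
        fields = line.rstrip("\n").split("\t")
        if has_bin:
            fields = fields[1:]
        name, chrom, strand = fields[0], fields[1], fields[2]
        exon_count = int(fields[7])
        # exonStarts/exonEnds are comma-separated, usually with a trailing comma
        starts = [int(x) for x in fields[8].split(",") if x]
        ends = [int(x) for x in fields[9].split(",") if x]
        if len(starts) != exon_count or len(ends) != exon_count:
            raise ValueError(f"exonCount does not match exon lists for {name}")
        # Simplification (hypothetical choice): gene_id = transcript name
        attrs = f'gene_id "{name}"; transcript_id "{name}";'
        for start, end in zip(starts, ends):
            # genePred is 0-based half-open; GTF is 1-based inclusive
            yield "\t".join([chrom, source, "exon", str(start + 1), str(end),
                             ".", strand, ".", attrs])


if __name__ == "__main__" and len(sys.argv) >= 2:
    # Usage: python genepred_to_gtf.py table.txt [--bin] > annotation.gtf
    with open(sys.argv[1]) as fh:
        for gtf_line in genepred_to_gtf(fh, has_bin="--bin" in sys.argv[2:]):
            print(gtf_line)
```

The coordinate shift (start + 1, end unchanged) is the main thing that is easy to get wrong when converting by hand.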

To be honest, no. It's just something I had on hand, and I had already generated the STAR index with it. I found that Alex Dobin, the author of STAR, recommends using Gencode.

Yup, Gencode/Ensembl (they're more or less identical) are what you'll find most people (myself included) recommending.

@Devon Ryan, could you say a bit more about why the Ensembl annotation is better than UCSC's? Thanks!

It's more likely to represent the transcripts you see in your experiments.

@Devon Ryan, is that because the Ensembl team curates the annotation better?

Historically, Ensembl and UCSC used completely different methods to arrive at their annotations; for recent human and mouse annotations, at least, they should now be the same.

The error message says the most likely cause is the chromosome naming convention, so the fix could be as simple as adding or removing the "chr" prefix in the GTF or in the reference FASTA.
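
A minimal sketch of that renaming, applied to column 1 of a GTF, is below. It assumes Ensembl-style names ("1", "X", "MT") on one side and UCSC-style names ("chr1", "chrX", "chrM") on the other; the function names and the add/strip command-line switch are hypothetical. Two caveats: unplaced contigs use different naming schemes in the two sources and are not mapped here, and the hg19 chrM sequence is not identical to the Ensembl GRCh37 MT sequence, so mitochondrial coordinates may not line up exactly. If both files already use "chr" names, as reported earlier in this thread, renaming will not fix the error.

```python
import sys
from typing import Callable, Iterable, Iterator


def add_chr(name: str) -> str:
    """Ensembl-style name -> UCSC-style name (assumes MT <-> chrM)."""
    if name.startswith("chr"):
        return name
    return "chrM" if name == "MT" else "chr" + name


def strip_chr(name: str) -> str:
    """UCSC-style name -> Ensembl-style name (assumes chrM <-> MT)."""
    if name == "chrM":
        return "MT"
    return name[3:] if name.startswith("chr") else name


def rename_gtf(lines: Iterable[str], convert: Callable[[str], str]) -> Iterator[str]:
    """Rewrite column 1 of every GTF record; comment and blank lines pass through."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            yield line
            continue
        chrom, sep, rest = line.partition("\t")
        yield convert(chrom) + sep + rest


if __name__ == "__main__" and len(sys.argv) == 3 and sys.argv[1] in ("add", "strip"):
    # Usage: python rename_chroms.py add|strip annotation.gtf > renamed.gtf
    converter = add_chr if sys.argv[1] == "add" else strip_chr
    with open(sys.argv[2]) as gtf:
        sys.stdout.writelines(rename_gtf(gtf, converter))
```

Renaming the small GTF is usually cheaper than rewriting the FASTA headers, since the FASTA is also what the STAR index is built from.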

8.2 years ago
umn_bist ▴ 390

Deleted. See comment above.
