Question: gtf file for htseq count
5.6 years ago by
United States
Tawfiq10 wrote:

htseq-count needs a .gtf file, and the tutorial says we cannot use one from UCSC. Does anyone know a source for an hg19 GTF?


EDIT: Post title edited by Ashutosh

rna-seq tutorial • 8.0k views
modified 2.4 years ago by Biostar ♦♦ 20 • written 5.6 years ago by Tawfiq10
5.6 years ago by
Children's Hospital of Philadelphia, Philadelphia, PA
komal.rathi3.6k wrote:

Which tutorial? By the way, it's best to use the Ensembl GTF when running htseq-count.

modified 5.6 years ago by Ashutosh Pandey12k • written 5.6 years ago by komal.rathi3.6k
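
For readers who want to see what such a run looks like, here is a minimal sketch that only assembles an htseq-count command line from Python. The file names `sample.bam` and `annotation.gtf` and the helper `build_htseq_count_command` are hypothetical placeholders, and the option values shown (unstranded library, `exon` features, grouping by `gene_id`) are assumptions that you would need to adapt to your own data.

```python
import shlex


def build_htseq_count_command(alignment_file, gtf_file, stranded="no",
                              feature_type="exon", id_attribute="gene_id"):
    """Return an htseq-count argument list (hypothetical helper, not part of HTSeq)."""
    if stranded not in {"yes", "no", "reverse"}:
        raise ValueError("stranded must be 'yes', 'no' or 'reverse'")
    return [
        "htseq-count",
        "--format", "bam",           # input is assumed to be a BAM file
        "--stranded", stranded,      # depends on the library preparation
        "--type", feature_type,      # GTF feature type (column 3) to count over
        "--idattr", id_attribute,    # GTF attribute used to group features
        alignment_file,
        gtf_file,
    ]


# Placeholder file names; replace with your own alignment and annotation files.
cmd = build_htseq_count_command("sample.bam", "annotation.gtf")
print(shlex.join(cmd))
```

The list could be handed to `subprocess.run(cmd, stdout=...)`, since htseq-count writes its count table to standard output.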

Oh, sorry,

I meant the HTSeq 0.6.1p2 documentation,

in the answer to one of the common questions.

Thanks a lot, I got one and it works for me.

written 5.6 years ago by Tawfiq10

I have moved the comment to an answer.

written 5.6 years ago by Ashutosh Pandey12k
5.6 years ago by
EagleEye6.6k wrote:

You can also use the GTF from GENCODE (I am using it without any problems). And by the way, GTF files from any repository should work with HTSeq.

written 5.6 years ago by EagleEye6.6k

It is true that the GENCODE GTF works fine with htseq-count; I have used it as well. But I'd be cautious before saying that GTFs from other sources (especially UCSC) work as well as those from GENCODE and Ensembl. I have observed that some programs, like the Python scripts in DEXSeq and even some Cufflinks programs such as cuffcompare, work really well with Ensembl GTFs but not with GENCODE ones.

written 5.6 years ago by komal.rathi3.6k

Can you please post the errors you get with the GENCODE GTF, and mention the GENCODE version? It would be a great help for others to know about them and work around them.

modified 5.6 years ago • written 5.6 years ago by EagleEye6.6k

EagleEye Sure. Sometime soon.


Alright, so I found my own question that I posted a couple of months(?) back. I couldn't figure out what was wrong until I changed my GTF to the Ensembl one and things started chugging along. By the way, my pipeline had got stuck at the differential expression stage, in the cuffdiff program.


Quoted from DEXSeq Manual Section 2.4:

We have tested our tools chiefly with GTF files from Ensembl and hence recommend to prefer these, as files from other providers sometimes do not adhere fully to the GTF standard and cause the preprocessing to fail.

modified 7 months ago by RamRS27k • written 5.6 years ago by komal.rathi3.6k
Yes, I agree that the UCSC GTF will not work properly. Thanks for pointing it out; I should have stated that clearly.
written 5.6 years ago by EagleEye6.6k

komal.rathi and Santhilal Subhash

I just thought I'd share this:

htseq-count reports at most one hit per aligned read. If a read aligns to two different transcripts of the same gene, it is counted once, for the gene those transcripts belong to.

Whatever GTF you use, it needs to indicate which transcripts belong to the same gene: exon lines from two transcripts of the same gene should have the same gene_id but different transcript_id values.

I know that we cannot use the UCSC Table Browser GTF because it puts the same value in gene_id and transcript_id, so htseq-count treats every transcript as a separate gene and loses the reads that overlap more than one transcript of a gene (they are counted as ambiguous).

So the main thing to look for in a GTF is that gene_id and transcript_id differ; then htseq-count works as intended.

written 5.6 years ago by Manvendra Singh2.1k
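
The check described in the comment above can be sketched in a few lines of Python. This is only an illustration, assuming the usual 9-column tab-separated GTF layout with attributes written as `key "value";` and the attribute names spelled `gene_id` and `transcript_id`, as they normally appear in GTF files. The function name `summarize_gtf_ids` and the two tiny in-memory examples are made up for this sketch and are not taken from any real annotation.

```python
import re
from collections import defaultdict

# Matches attribute pairs of the form: key "value"
ATTRIBUTE_RE = re.compile(r'(\w+)\s+"([^"]*)"')


def summarize_gtf_ids(lines, feature_type="exon"):
    """Count how often gene_id equals transcript_id on lines of one feature type."""
    transcripts_per_gene = defaultdict(set)
    feature_lines = 0
    identical_ids = 0
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9 or columns[2] != feature_type:
            continue  # not a well-formed line of the requested feature type
        attributes = dict(ATTRIBUTE_RE.findall(columns[8]))
        gene_id = attributes.get("gene_id")
        transcript_id = attributes.get("transcript_id")
        if gene_id is None or transcript_id is None:
            continue
        feature_lines += 1
        if gene_id == transcript_id:
            identical_ids += 1
        transcripts_per_gene[gene_id].add(transcript_id)
    multi = sum(1 for ids in transcripts_per_gene.values() if len(ids) > 1)
    return {
        "feature_lines": feature_lines,
        "identical_ids": identical_ids,
        "genes": len(transcripts_per_gene),
        "multi_transcript_genes": multi,
    }


# Made-up example: one gene with two transcripts, distinct gene and transcript IDs.
gene_level_example = [
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
    'chr1\tsrc\texon\t150\t250\t.\t+\t.\tgene_id "G1"; transcript_id "T2";\n',
]
# Made-up example: every transcript carries its own ID in gene_id as well.
transcript_level_example = [
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "T1"; transcript_id "T1";\n',
    'chr1\tsrc\texon\t150\t250\t.\t+\t.\tgene_id "T2"; transcript_id "T2";\n',
]

print(summarize_gtf_ids(gene_level_example))
print(summarize_gtf_ids(transcript_level_example))
```

To run it on a real file, pass an open file handle, for example `summarize_gtf_ids(open("annotation.gtf"))` with your own path. If `identical_ids` is close to `feature_lines` and `multi_transcript_genes` is zero, the file probably does not group transcripts into genes in the way htseq-count expects.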

I am facing the same problem with HTSeq. I downloaded the GTF from the UCSC Genome Browser, and I am using NCBI's RefSeq (human transcriptome) as the reference. What is the best way to get a GTF file for HTSeq that matches this reference?

Thank you in advance.

written 2.7 years ago by KVC_bioinfo440