9th column of the GTF file is exon_id instead of gene_id, is this normal?
3.4 years ago
Kai_Qi ▴ 130

Hi:

For RNA-seq analysis, I downloaded the GTF from here : and used the genes.gtf for indexing the genome. But today, when I used featureCounts, I found that in this GTF the 9th column starts with exon_id:

$ head genes.gtf 
1   processed_transcript    exon    11869   12227   .   +   .   exon_id "ENSE00002234944"; exon_number "1"; gene_biotype "pseudogene"; gene_id "ENSG00000223972"; gene_name "DDX11L1"; gene_source "ensembl_havana"; transcript_id "ENST00000456328"; transcript_name "DDX11L1-002"; transcript_source "havana"; tss_id "TSS15133";
1   processed_transcript    transcript  11869   14409   .   +   .   gene_biotype "pseudogene"; gene_id "ENSG00000223972"; gene_name "DDX11L1"; gene_source "ensembl_havana"; transcript_id "ENST00000456328"; transcript_name "DDX11L1-002"; transcript_source "havana"; tss_id "TSS15133";

After reading for a while, it occurred to me that in most cases the 9th column of a GTF file should be a gene_id. Does this mean that all the work I have done using this indexed genome could be inappropriate? If not, what should I pass as GTF.attrType to featureCounts?

Thank you very much,

RNA-Seq alignment sequencing • 1.5k views

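The 9th column of a GTF file is a semicolon-separated list of key-value pairs, where each key and its value are separated by a space. There is no convention that the 9th column "is" a gene_id: it is a list of attributes, and tools such as featureCounts look an attribute up by its key, not by its position in the list. In your file the attributes happen to be listed alphabetically, so on an exon line exon_id comes first, but gene_id "ENSG00000223972" is still present further along the same line. Since the feature type is exon here, I am not surprised to see an exon_id.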
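To make the key-value point concrete, here is a minimal Python sketch that parses the 9th column of the exon line from the question into a dictionary. The helper name `parse_gtf_attributes` and the regular expression are my own illustration, not part of featureCounts or any GTF library, and I am assuming the fields in the real file are tab-separated as the GTF format expects.

```python
import re


def parse_gtf_attributes(column9: str) -> dict:
    """Parse 'key "value"; key "value";' into a dict (hypothetical helper)."""
    return dict(re.findall(r'(\S+) "([^"]*)";', column9))


# The exon line from the question, rebuilt with tab-separated fields
# (assumption: the spaces shown in the post were tabs in the file).
exon_line = "\t".join([
    "1", "processed_transcript", "exon", "11869", "12227", ".", "+", ".",
    'exon_id "ENSE00002234944"; exon_number "1"; gene_biotype "pseudogene"; '
    'gene_id "ENSG00000223972"; gene_name "DDX11L1"; '
    'gene_source "ensembl_havana"; transcript_id "ENST00000456328"; '
    'transcript_name "DDX11L1-002"; transcript_source "havana"; '
    'tss_id "TSS15133";',
])

attrs = parse_gtf_attributes(exon_line.split("\t")[8])

print(list(attrs)[0])    # first key in the column: exon_id
print(attrs["gene_id"])  # still available by name: ENSG00000223972
```

Looking a value up by key gives the same answer wherever the key sits in the list, which is why exon_id appearing first should not matter for gene-level work.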


So does this mean there is no problem using this GTF for indexing and downstream analysis (differential gene expression)? I think that if I call featureCounts like this:

featureCounts(files, annot.ext="genes.gtf", isGTFAnnotationFile=TRUE, GTF.featureType="exon", GTF.attrType="exon_id",...)

I would probably get counts per exon.

If I wanted counts per gene, maybe I should use:

`featureCounts(files, annot.ext="genes.gtf", isGTFAnnotationFile=TRUE, GTF.featureType="exon", GTF.attrType="gene_id",...)`

instead? Sorry for being naive. I feel I have only just begun to get familiar with the general process.
