3.4 years ago
Kai_Qi
Hi,
For RNA-seq analysis, I downloaded the GTF from here and used genes.gtf to index the genome. But today, when running featureCounts, I found that in this GTF the 9th column starts with exon_id:
```
$ head genes.gtf
1 processed_transcript exon 11869 12227 . + . exon_id "ENSE00002234944"; exon_number "1"; gene_biotype "pseudogene"; gene_id "ENSG00000223972"; gene_name "DDX11L1"; gene_source "ensembl_havana"; transcript_id "ENST00000456328"; transcript_name "DDX11L1-002"; transcript_source "havana"; tss_id "TSS15133";
1 processed_transcript transcript 11869 14409 . + . gene_biotype "pseudogene"; gene_id "ENSG00000223972"; gene_name "DDX11L1"; gene_source "ensembl_havana"; transcript_id "ENST00000456328"; transcript_name "DDX11L1-002"; transcript_source "havana"; tss_id "TSS15133";
```
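A GTF line is tab-separated, and its 9th column is a list of key "value" pairs. The minimal Python sketch below parses that column into a dict so any attribute can be looked up by name, whatever position it sits in. The helper name parse_attributes is hypothetical, the attribute list is abbreviated from the first line of the head output, and splitting on ";" assumes no quoted value contains a semicolon.

```python
def parse_attributes(field):
    """Parse a GTF attribute column ('key "value"; key "value"; ...') into a dict.

    Simplification: assumes no quoted value contains a semicolon.
    """
    attrs = {}
    for item in field.strip().split(";"):
        item = item.strip()
        if not item:
            continue  # skip the empty piece after the trailing ';'
        key, _, value = item.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


# Abbreviated from the first line of the `head genes.gtf` output above.
line = "\t".join([
    "1", "processed_transcript", "exon", "11869", "12227", ".", "+", ".",
    'exon_id "ENSE00002234944"; exon_number "1"; gene_biotype "pseudogene"; '
    'gene_id "ENSG00000223972"; gene_name "DDX11L1"; '
    'transcript_id "ENST00000456328";',
])

fields = line.split("\t")
attrs = parse_attributes(fields[8])
print(fields[2], list(attrs)[0], attrs["gene_id"])  # exon exon_id ENSG00000223972
```

The exon line lists exon_id first, yet gene_id is still present and retrievable by its key.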
After reading for a while, it occurred to me that in most cases the 9th column of a GTF file should be a gene_id. Does this mean that all the work I have done with this indexed genome could be invalid? If not, what should I pass as GTF.attrType in featureCounts?
Thank you very much,
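In featureCounts, the feature type (GTF.featureType in Rsubread, -t on the command line; default exon) selects which GTF rows are used, and the attribute type (GTF.attrType, -g; default gene_id) names the attribute used to group those rows. One quick way to see which attributes could serve as GTF.attrType is to list the keys that appear on every exon row. The Python sketch below does that; the helper names are hypothetical, the demo lines are abbreviated from the head output above, and it is a rough check rather than a full GTF validator.

```python
from collections import Counter


def parse_attributes(field):
    """Parse 'key "value"; key "value"; ...' into a dict (no ';' inside values)."""
    attrs = {}
    for item in field.strip().split(";"):
        key, _, value = item.strip().partition(" ")
        if key:  # skip the empty piece after the trailing ';'
            attrs[key] = value.strip().strip('"')
    return attrs


def keys_on_every_row(lines, feature_type="exon"):
    """Return the attribute keys present on every row of the given feature type."""
    key_counts = Counter()
    n_rows = 0
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != feature_type:
            continue
        n_rows += 1
        key_counts.update(parse_attributes(fields[8]).keys())
    return sorted(key for key, n in key_counts.items() if n == n_rows)


demo = [
    "1\tprocessed_transcript\texon\t11869\t12227\t.\t+\t.\t"
    'exon_id "ENSE00002234944"; gene_id "ENSG00000223972"; '
    'transcript_id "ENST00000456328";',
    "1\tprocessed_transcript\ttranscript\t11869\t14409\t.\t+\t.\t"
    'gene_id "ENSG00000223972"; transcript_id "ENST00000456328";',
]
print(keys_on_every_row(demo))  # ['exon_id', 'gene_id', 'transcript_id']
```

On the real file, passing an open handle (for example from open("genes.gtf")) scans every exon row. If gene_id appears in the result, the default GTF.attrType can find it on each exon; exon_id would also qualify, but it would group counts per exon rather than per gene.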
The 9th column of a GTF file is a ;-separated list of key-value pairs, where each key and its value are separated by a space. There is no convention that gene_id must come first in the 9th column; the 9th column simply can, and generally does, contain a gene_id. Since the feature type here is exon, I am not surprised to see an exon_id listed first. Note that this exon line still carries gene_id "ENSG00000223972" further along; the attributes just happen to be in alphabetical order.

So does this mean there is no problem using this GTF for indexing and downstream analysis (differential gene expression)? I suspect that if my featureCounts parameter is:
I would probably get exon-level counts.
If I want gene-level counts, maybe I should use:
instead? Sorry for being naive; I am only just becoming familiar with the general workflow.
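On the exons-versus-genes question, the two featureCounts settings play different roles. The feature type (GTF.featureType / -t, default exon) decides which intervals reads are matched against, and the attribute type (GTF.attrType / -g, default gene_id) decides how those intervals are grouped into the meta-features that get reported. With the defaults, reads are matched to exons but counted per gene. The toy Python sketch below imitates that grouping. It is not featureCounts' implementation: reads are single intervals, strand and multi-mapping are ignored, and all coordinates and IDs are made up. As with featureCounts' default behaviour, a read overlapping more than one group is left unassigned as ambiguous.

```python
from collections import defaultdict


def build_meta_features(exons, attr_type="gene_id"):
    """Group exon intervals by the chosen attribute, like a meta-feature."""
    groups = defaultdict(list)
    for exon in exons:
        groups[exon[attr_type]].append((exon["chrom"], exon["start"], exon["end"]))
    return dict(groups)


def count_reads(reads, groups):
    """Count reads per group; reads hitting more than one group are ambiguous."""
    counts = {name: 0 for name in groups}
    for chrom, start, end in reads:
        hits = {
            name
            for name, intervals in groups.items()
            # closed, 1-based intervals as in GTF
            if any(c == chrom and start <= e and end >= s for c, s, e in intervals)
        }
        if len(hits) == 1:
            counts[hits.pop()] += 1
    return counts


# Hypothetical exons: geneA has two exons, geneB has one.
exons = [
    {"chrom": "1", "start": 100, "end": 200, "gene_id": "geneA", "exon_id": "exA1"},
    {"chrom": "1", "start": 300, "end": 400, "gene_id": "geneA", "exon_id": "exA2"},
    {"chrom": "1", "start": 1000, "end": 1100, "gene_id": "geneB", "exon_id": "exB1"},
]
# Hypothetical reads as (chrom, start, end); the 4th spans both geneA exons.
reads = [
    ("1", 150, 180), ("1", 350, 380), ("1", 1050, 1080),
    ("1", 190, 310), ("1", 500, 550),
]

print(count_reads(reads, build_meta_features(exons)))             # {'geneA': 3, 'geneB': 1}
print(count_reads(reads, build_meta_features(exons, "exon_id")))  # {'exA1': 1, 'exA2': 1, 'exB1': 1}
```

Changing only the grouping attribute turns gene-level counts into exon-level counts: the read spanning two exons of geneA is counted once for geneA, but is ambiguous at the exon level.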