Question: Using a GFF3 file results in gene annotations with names like rna1, rna2, gene1, gene2
I am trying to annotate my MACS2 ChIP-seq peak file with HOMER using a custom GFF3 file (the RefSeq GRCh38 annotation). When I use this GFF3 file, HOMER labels features with the parent ID (e.g. rna7) rather than the transcript ID (e.g. NR_026818.1).

Is there a way I can convert my GFF3 file to a GTF file that uses the transcript ID rather than the parent ID?

I noticed the same problem when using STAR/RSEM with a GFF3 file: after alignment, the transcripts are named by their parent IDs. I have to use --amend-names in RSEM, and the transcripts then end up named RNA##_transcriptID. All I want is the transcript ID.

What is the point of the parent ID? It doesn't help to have a bunch of annotated genes named gene1, gene2, gene3.

GFF3 sample:

NC_000001.11    RefSeq  region  1   248956422   .   +   .   ID=id0;Dbxref=taxon:9606;Name=1;chromosome=1;gbkey=Src;genome=chromosome;mol_type=genomic DNA
NC_000001.11    BestRefSeq  gene    11874   14409   .   +   .   ID=gene0;Dbxref=GeneID:100287102,HGNC:HGNC:37102;Name=DDX11L1;description=DEAD/H (Asp-Glu-Ala-Asp/His) box helicase 11 like 1;gbkey=Gene;gene=DDX11L1;gene_biotype=misc_RNA;pseudo=true
NC_000001.11    BestRefSeq  transcript  11874   14409   .   +   .   ID=rna0;Parent=gene0;Dbxref=GeneID:100287102,Genbank:NR_046018.2,HGNC:HGNC:37102;Name=NR_046018.2;gbkey=misc_RNA;gene=DDX11L1;product=DEAD/H (Asp-Glu-Ala-Asp/His) box helicase 11 like 1;transcript_id=NR_046018.2
NC_000001.11    BestRefSeq  exon    11874   12227   .   +   .   ID=id1;Parent=rna0;Dbxref=GeneID:100287102,Genbank:NR_046018.2,HGNC:HGNC:37102;gbkey=misc_RNA;gene=DDX11L1;product=DEAD/H (Asp-Glu-Ala-Asp/His) box helicase 11 like 1;transcript_id=NR_046018.2
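The rnaN/geneN values are the GFF3 `ID` attributes the file uses to link features together: in the sample, the exon's `Parent=rna0` points at the transcript row, whose `transcript_id` is NR_046018.2, and that row's `Parent=gene0` points at the DDX11L1 gene row. A minimal Python sketch, assuming the RefSeq attribute names shown above (`ID`, `Name`, `transcript_id`) and a tab-separated file, builds a lookup table from those IDs back to accessions or gene symbols, which could then be used to relabel downstream output. The function names are hypothetical and not part of HOMER, STAR, or RSEM.

```python
from urllib.parse import unquote


def parse_attributes(field):
    """Parse a GFF3 column-9 string of key=value pairs separated by ';'."""
    attrs = {}
    for pair in field.strip().split(";"):
        if pair:
            key, _, value = pair.partition("=")
            attrs[key] = unquote(value)  # GFF3 percent-encodes special characters
    return attrs


def id_to_accession(lines):
    """Map GFF3 ID values (gene0, rna0, ...) to readable names.

    Transcript rows map to their transcript_id; gene rows map to Name.
    Exon and CDS rows are skipped so only gene- and transcript-level IDs are mapped.
    """
    mapping = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, ## directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] in ("exon", "CDS"):
            continue
        attrs = parse_attributes(cols[8])
        feature_id = attrs.get("ID")
        readable = attrs.get("transcript_id") or attrs.get("Name")
        if feature_id and readable:
            mapping[feature_id] = readable
    return mapping
```

On the sample rows above (with real tabs), this should map rna0 to NR_046018.2 and gene0 to DDX11L1.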
Answer:
When I convert a GFF file to GTF using gffread, everything works fine.

You can use gffread, which comes with Cufflinks and also with StringTie.
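If gffread is not at hand, the conversion can also be sketched in Python. This is an illustrative alternative, not how gffread works internally. It assumes the RefSeq layout shown in the sample: transcript-level rows carry `ID` and `transcript_id`, and exon/CDS rows point to them through `Parent`. Writing `gene_id` from the `gene` attribute (the gene symbol) is a choice made for this sketch, and exons whose Parent is a gene rather than a transcript are simply skipped. The function name is hypothetical.

```python
from urllib.parse import unquote


def parse_attributes(field):
    """Parse a GFF3 column-9 string of key=value pairs separated by ';'."""
    attrs = {}
    for pair in field.strip().split(";"):
        if pair:
            key, _, value = pair.partition("=")
            attrs[key] = unquote(value)
    return attrs


def gff3_to_gtf(lines):
    """Return GTF exon/CDS rows labelled with accessions instead of rnaN/geneN."""
    rows = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9:
            rows.append((cols, parse_attributes(cols[8])))

    # Pass 1: remember each transcript-level row's accession and gene symbol,
    # keyed by its GFF3 ID (e.g. rna0 -> ("NR_046018.2", "DDX11L1")).
    transcripts = {}
    for cols, attrs in rows:
        if cols[2] not in ("exon", "CDS") and "ID" in attrs and "transcript_id" in attrs:
            tid = attrs["transcript_id"]
            transcripts[attrs["ID"]] = (tid, attrs.get("gene", tid))

    # Pass 2: emit exon/CDS rows, resolving Parent (rnaN) to the accession.
    out = []
    for cols, attrs in rows:
        if cols[2] not in ("exon", "CDS"):
            continue
        parent = attrs.get("Parent", "").split(",")[0]  # first parent only
        if parent not in transcripts:
            continue  # e.g. exons attached directly to a gene
        transcript_id, gene_id = transcripts[parent]
        gtf_attrs = f'gene_id "{gene_id}"; transcript_id "{transcript_id}";'
        out.append("\t".join(cols[:8] + [gtf_attrs]))
    return out
```

For example, `with open("annotation.gff3") as fh: rows = gff3_to_gtf(fh)` (hypothetical file name) gives the GTF rows as strings, to be written out one per line.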

