Hi, I am trying to convert a canine gene annotation (GTF) file downloaded from Ensembl to a BED file using the gtf2bed tool from BEDOPS. This command gives an error:
$ gtf2bed < Canis_familiaris.CanFam3.1.85_noheader.gtf > Canis_familiaris.CanFam3.1.85_noheader.bed
Error: Potentially missing gene or transcript ID from GTF attributes (malformed GTF at line ?)
I checked the first few lines of the GTF file and it seems to match up with the required format:
$ head Canis_familiaris.CanFam3.1.85_noheader.gtf
X ensembl gene 1575 5716 . + . gene_id "ENSCAFG00000010935"; gene_version "3"; gene_source "ensembl"; gene_biotype "protein_coding";
X ensembl transcript 1575 5716 . + . gene_id "ENSCAFG00000010935"; gene_version "3"; transcript_id "ENSCAFT00000017396"; transcript_version "3"; gene_source "ensembl"; gene_biotype "protein_coding"; transcript_source "ensembl"; transcript_biotype "protein_coding";
I looked at the source code on GitHub for this tool and can see that it checks for a gene or transcript ID and gives this error if one is not present. But gene_id is present in the first line here, so I am not sure how it reaches the error condition.
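One way to narrow this down is to count which feature types have no transcript_id attribute at all. The sketch below assumes a tab-delimited GTF with attributes in column 9. The function name missing_transcript_id is made up for illustration and is not part of BEDOPS.

```shell
# Count records that lack a transcript_id attribute, grouped by the
# feature type in column 3. Assumes tab-delimited GTF input.
# (missing_transcript_id is a hypothetical helper name.)
missing_transcript_id() {
  awk -F'\t' '
    !/^#/ && $9 !~ /transcript_id "/ { n[$3]++ }
    END { for (f in n) print f "\t" n[f] }
  ' "$1"
}
```

Running `missing_transcript_id Canis_familiaris.CanFam3.1.85_noheader.gtf` would list each feature type with the number of records missing the attribute. In the excerpt above, the "gene" record has a gene_id but no transcript_id, so it would be counted.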
I would appreciate any help with troubleshooting this error.
Thank you, - Pankaj
This method did not work for me. gtf2bed only outputs the GTF untouched.
I don't have enough information to debug, but one suggestion is that you put a tee statement in between gtf2bed so that you can examine what comes out of
The first solution (awk) worked. Thanks!
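For readers who cannot see the awk answer referred to here, a workaround of that general kind can be sketched as follows. This is an assumption about the approach, not a copy of the original answer: it appends an empty transcript_id attribute to any record that lacks one, so that every line carries both keys. The function name pad_transcript_id is hypothetical, and whether an empty value is accepted may depend on the gtf2bed version.

```shell
# Append an empty transcript_id attribute to records that lack one.
# Comment lines (starting with #) are passed through unchanged.
# (pad_transcript_id is a hypothetical helper name.)
pad_transcript_id() {
  awk '
    /^#/            { print; next }
    /transcript_id/ { print; next }
                    { print $0 " transcript_id \"\";" }
  ' "$1"
}
```

The padded stream could then be piped into the converter, for example `pad_transcript_id input.gtf | gtf2bed > output.bed`, where input.gtf and output.bed are placeholder names.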
Why can't the "gene" lines be used on their own, and why must every line have a "transcript_id"? I was wondering how to get a BED file with gene IDs.
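If only gene-level intervals are needed, one option that avoids gtf2bed entirely is to build the BED lines directly with awk. This is a minimal sketch, assuming a tab-delimited GTF. It keeps only "gene" records, converts the 1-based inclusive GTF start to a 0-based BED start, and uses gene_id as the name column. The function name genes_to_bed is made up for illustration.

```shell
# Convert "gene" records of a tab-delimited GTF to 6-column BED.
# BED starts are 0-based, so the GTF start is decremented by one.
# The score column is carried over from GTF column 6 as-is.
# (genes_to_bed is a hypothetical helper name.)
genes_to_bed() {
  awk -F'\t' 'BEGIN { OFS = "\t" }
    !/^#/ && $3 == "gene" {
      id = "."
      # Pull the quoted value that follows gene_id out of column 9.
      if (match($9, /gene_id "[^"]+"/))
        id = substr($9, RSTART + 9, RLENGTH - 10)
      print $1, $4 - 1, $5, id, $6, $7
    }' "$1"
}
```

For the gene record shown in the question, this would give chromosome X, start 1574, end 5716, name ENSCAFG00000010935, score "." and strand "+". Some downstream tools expect a numeric score, in which case the fifth column would need a fixed value instead.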
The next version of the kit will remove this requirement, and it will also allow the user to select a different key for retrieval into the ID field. This is available now in the development branch, available via