Question

chimeraTE GTF format error


Asked 12 weeks ago by frarodmar17

I am trying to run chimeraTE mode 1 with the T2T reference genome and its corresponding GTF annotation file, but I keep getting the same error:

ERROR: Bad GTF format: GTF does not contain coordinates of genes! The 3rd column must contain "gene" Exiting....

I have tried keeping only the features labelled "gene", and also setting the feature type (column 3) of every line to "gene", but the error still appears.

I would really appreciate any feedback.

chimeraTE • 7.6k views

Last updated 10 weeks ago by lieven.sterck


It always helps to include (at least) a few lines of the files you are working with; that way we can more easily spot potential problems.

Reply by lieven.sterck, 12 weeks ago


Sorry, I was having problems pasting the file contents. In this case, I ran the software with this GTF: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/009/914/755/GCF_009914755.1_T2T-CHM13v2.0/GCF_009914755.1_T2T-CHM13v2.0_genomic.gtf.gz. I would post the head of the file, but it is very hard to read in this comment box.

I ran chimeraTE mode 1 with this GTF, as-is, as the --gene argument, and got the following error:

Checking gene and TE annotations
GTF GENE
GCF_009914755.1_T2T-CHM13v2.0_genomic.gtf contains:
ERROR: Bad GTF format
GCF_009914755.1_T2T-CHM13v2.0_genomic.gtf does not contain coordinates of genes! The 3rd column must contain "gene" Exiting...

Reply by frarodmar17, 12 weeks ago (edited by GenoMax)


The third column does contain "gene" in this file:

#gtf-version 2.2
#!genome-build T2T-CHM13v2.0
#!genome-build-accession NCBI_Assembly:GCF_009914755.1
#!annotation-date 08/01/2025
#!annotation-source NCBI RefSeq GCF_009914755.1-RS_2025_08
NC_060925.1     BestRefSeq      gene    7506    138480  .       -       .       gene_id "LOC127239154"; transcript_id ""; db_xref "GeneID:127239154"; description "uncharacterized LOC127239154"; gbkey "Gene"; gene "LOC127239154"; gene_biotype "lncRNA"; partial "true";

Reply by GenoMax, 12 weeks ago
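To check what column 3 actually contains across the whole file, rather than only in the head, the feature types can be tallied directly. The sketch below is a minimal diagnostic, not part of chimeraTE; the function names `feature_counts` and `print_feature_counts` are hypothetical. It splits strictly on tabs, as GTF requires, and prints each feature type with `repr`, so a value such as `'gene '` with stray whitespace would stand out from `'gene'`.

```python
import gzip
from collections import Counter


def feature_counts(path):
    """Tally the values in column 3 (feature type) of a GTF file.

    Accepts plain or gzip-compressed input (chosen by the .gz suffix).
    Comment lines starting with '#' and blank lines are skipped, and
    data lines are split on tabs only.
    """
    opener = gzip.open if path.endswith(".gz") else open
    counts = Counter()
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 3:
                # The raw string is used as the key, so whitespace is preserved.
                counts[fields[2]] += 1
    return counts


def print_feature_counts(path):
    """Print each feature type (repr-quoted) with its count, most common first."""
    for feature, n in feature_counts(path).most_common():
        print(f"{feature!r}\t{n}")
```

For example, `print_feature_counts("GCF_009914755.1_T2T-CHM13v2.0_genomic.gtf.gz")`. If `'gene'` shows up with a non-zero count, the file meets the condition the error message describes, which would suggest the problem lies in how the tool reads the file rather than in its content.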


Yes, and that is why I do not understand what is actually failing.

Reply by frarodmar17, 12 weeks ago


Perhaps the program only expects lines with the gene feature. You could give the following a try; it keeps the header lines plus only the lines that have "gene" in column 3 of the GTF:

zcat GCF_009914755.1_T2T-CHM13v2.0_genomic.gtf.gz | awk '$0 ~ /^#/ || $3 == "gene"' > genes_with_header.gtf

Reply by GenoMax, 12 weeks ago

0

Entering edit mode

I have just tried it, but I get the same error again.

Reply by frarodmar17, 12 weeks ago


Perhaps a long shot, but your GTF file is tab-delimited, right? (And you did not open it in any Windows/DOS software, for editing for instance?)

Reply by lieven.sterck, 12 weeks ago
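Both suspicions, a non-tab delimiter and Windows line endings, can be checked programmatically. A minimal sketch, assuming a plain or gzip-compressed GTF; the function name `check_gtf_layout` is hypothetical. It opens the file with `newline=""` so that `\r\n` endings are kept as-is and can be detected, and it reports data lines that do not split into exactly 9 tab-separated fields.

```python
import gzip


def check_gtf_layout(path, max_reports=5):
    """Report GTF lines that break basic layout rules.

    Flags Windows (CRLF) line endings and data lines that do not split
    into exactly 9 tab-separated fields. Stops after `max_reports`
    problems and returns a list of (line_number, description) tuples.
    """
    opener = gzip.open if path.endswith(".gz") else open
    problems = []
    # newline="" disables newline translation, so '\r\n' stays visible
    with opener(path, "rt", newline="") as handle:
        for lineno, line in enumerate(handle, start=1):
            if line.endswith("\r\n"):
                problems.append((lineno, "CRLF line ending"))
            body = line.rstrip("\r\n")
            # Header/comment lines and blank lines are not field-checked
            if not body or body.startswith("#"):
                continue
            n_fields = len(body.split("\t"))
            if n_fields != 9:
                problems.append((lineno, f"{n_fields} tab-separated fields, expected 9"))
            if len(problems) >= max_reports:
                break
    return problems
```

An empty return value means no line ends in CRLF and every data line has exactly 9 tab-separated fields.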


I checked it, and it is tab-delimited. I have also removed all the rows that did not contain 9 columns, but the error persists.

Reply by frarodmar17, 12 weeks ago


OK, that rules out one thing already.

What do you mean by "all the rows that did not contain 9 columns"? They should all have 9 columns, shouldn't they?

Can you also post the exact command line you are trying to run?

Reply by lieven.sterck, 11 weeks ago


A correctly formatted GTF should have 9 columns in every row (seqname, source, feature, start, end, score, strand, frame, attributes). The command I am trying to run has the following form, where the "te" argument is the previously mentioned GTF that is causing the error:

python3 chimTE_mode1.py 
  --genome      Genome in fasta
  --input       Paired-end files and their respective group/replicate
  --project     Directory name with output data
  --te          GTF file containing TE information
  --gene        GTF file containing gene information
  --strand      Define the strandness direction of the RNA-seq. Two options: "rf-stranded" OR "fwd-stranded"

Reply by frarodmar17, 11 weeks ago


[I'm replying at this level, as otherwise we'll soon run out of space ;-) ]

OK. If you can, add the exact file names you are using on the command line; be as exact as possible, as if you were typing it into your terminal.

Another idea: can you run the 'default' example dataset that comes with the tool, to test that it works?

Also, though more difficult: try removing all entries that do not have CDS features (e.g. the first one is a lncRNA and thus has no CDS lines; perhaps the tool is stumbling over that kind of 'gene' ...)

You could also run your GTF file through a tool such as AGAT to double-check (and perhaps correct) its structure.

Reply by lieven.sterck, 11 weeks ago
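Removing genes without CDS takes two passes, because a gene's "gene" line appears before the CDS lines that show it is coding. A minimal sketch, assuming NCBI-style attributes where every data line carries `gene_id "..."`; the function names are hypothetical, and this only tests the hypothesis above, it is not a documented chimeraTE requirement.

```python
import gzip
import re

# gene_id at the start of column 9 or after a ';'/whitespace separator
GENE_ID = re.compile(r'(?:^|[;\s])gene_id "([^"]*)"')


def _open(path, mode="rt"):
    """Open plain or gzip-compressed text, chosen by the .gz suffix."""
    return gzip.open(path, mode) if path.endswith(".gz") else open(path, mode)


def keep_coding_genes(src, dst):
    """Write to dst only the GTF lines whose gene_id has at least one CDS.

    Pass 1 collects gene_ids that appear on a CDS line; pass 2 copies the
    '#' header lines and every data line (gene, transcript, exon, CDS, ...)
    belonging to one of those genes. Lines without 9 tab-separated fields
    are dropped. Returns the number of coding genes kept.
    """
    coding = set()
    with _open(src) as handle:
        for line in handle:
            fields = line.split("\t")
            if len(fields) == 9 and fields[2] == "CDS":
                match = GENE_ID.search(fields[8])
                if match:
                    coding.add(match.group(1))

    with _open(src) as handle, open(dst, "w") as out:
        for line in handle:
            if line.startswith("#"):
                out.write(line)
                continue
            fields = line.split("\t")
            if len(fields) != 9:
                continue
            match = GENE_ID.search(fields[8])
            if match and match.group(1) in coding:
                out.write(line)
    return len(coding)
```

For example, `keep_coding_genes("GCF_009914755.1_T2T-CHM13v2.0_genomic.gtf.gz", "coding_genes.gtf")` (output name hypothetical) would write a GTF without lncRNA and other non-coding genes, which could then be passed to `--gene`.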


I have just tried running it with the GTF included in the example dataset, and I also ran it again with my own GTF file. This time I used the latest version of the software, and got the following error:

print(str(f"ERROR: Bad GTF format\n{args.gene.name} does not contain coordinates of {feat}s! The 3rd column must contain \"{feat}\"\tExiting..."))
                                        ^^^^^^^^^^^^^^
AttributeError: 'str' object has no attribute 'name'

Reply by frarodmar17, 11 weeks ago
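The new traceback is raised inside the line that prints the error message itself, so it most likely hides the same "Bad GTF format" condition rather than being a different problem. Judging only from the line shown, `args.gene` seems to hold the path as a plain `str`, and a `str` has no `.name` attribute (file objects, such as those produced by `argparse.FileType`, do). The sketch below reproduces that behaviour with a hypothetical parser and path; it is not chimeraTE's code. `Path(args.gene).name` is one way a string path could be reported by its file name.

```python
import argparse
from pathlib import Path


def parse(argv):
    """Hypothetical stand-in for the tool's parser (not chimeraTE's code)."""
    parser = argparse.ArgumentParser()
    # No `type=` given, so argparse stores the value as a plain str.
    parser.add_argument("--gene")
    return parser.parse_args(argv)


# Hypothetical path, for illustration only
args = parse(["--gene", "/data/GCF_009914755.1_T2T-CHM13v2.0_genomic.gtf"])

try:
    # str has no .name, so this raises the AttributeError seen in the traceback
    print(args.gene.name)
except AttributeError as err:
    print("reproduced:", err)

# A str path can still be reported by its file name explicitly.
print(Path(args.gene).name)
```

Running it prints `reproduced: 'str' object has no attribute 'name'` followed by `GCF_009914755.1_T2T-CHM13v2.0_genomic.gtf`.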


Did the analysis work with the included GTF file?

Perhaps it needs a 'name' tag in the last column?

Did you run it through AGAT? What was the result?

Reply by lieven.sterck, 10 weeks ago


The analysis did not work with the included GTF file either. I also ran AGAT, but could not draw any conclusions: the output only showed the number of RNAs per gene and the maximum and minimum RNA lengths.

Reply by frarodmar17, 10 weeks ago


Hmm, if it also doesn't work with the included GTF, I would contact the developers of the tool and ask them for input.

For AGAT: it has many different sub-commands, so make sure you run the correct one ...

Reply by lieven.sterck, 10 weeks ago