19 days ago
frarodmar17
I am trying to run chimeraTE mode 1 using the T2T reference genome and its corresponding GTF annotation file, but I always get the same error:
ERROR: Bad GTF format: GTF does not contain coordinates of genes! The 3rd column must contain "gene" Exiting....
I have tried keeping only the features recognised as "gene", and also assigning this feature type to all the features, but the error keeps appearing.
I would greatly appreciate your feedback.
It always helps to add (at least) a few lines of the files you are working with, that way we can better spot potential problems.
Sorry, I was having problems uploading the code lines. In this case, I tried to run the software with this GTF: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/009/914/755/GCF_009914755.1_T2T-CHM13v2.0/GCF_009914755.1_T2T-CHM13v2.0_genomic.gtf.gz. I would like to send you the head, but it is very difficult to read in this box format.
I simply ran chimeraTE mode 1 using this GTF as the --gene argument. However, the error I got is the following:
Third column does contain "gene" in this file:
Yes, and that is why I do not understand the real failure.
Perhaps the program is only looking for lines with the "gene" attribute. You could give the following a try, which only selects lines that have "gene" in column 3 of the GTF.
I have just tried it, but it returns the same error again.
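The command originally suggested here is not preserved in the thread. As a rough illustration only, a filter of the kind described above (keep only records whose third column is exactly "gene") could look like the Python sketch below. The function name `keep_gene_lines` and the example records (`chr1`, `src`, `G1`, `T1`) are hypothetical and not taken from the actual T2T annotation.

```python
def keep_gene_lines(lines):
    """Yield comment/header lines and records whose 3rd column is exactly 'gene'."""
    for line in lines:
        if line.startswith("#"):
            # Keep header lines such as '#gtf-version' unchanged.
            yield line
            continue
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) >= 3 and fields[2] == "gene":
            yield line


# Hypothetical records, for illustration only.
example = [
    "#gtf-version 2.2\n",
    'chr1\tsrc\tgene\t100\t900\t.\t+\t.\tgene_id "G1";\n',
    'chr1\tsrc\ttranscript\t100\t900\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
    'chr1\tsrc\texon\t100\t300\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
]
kept = list(keep_gene_lines(example))
```

With the example above, `kept` holds the header line and the single "gene" record. On a real file, the same generator can be fed an open file handle and its output written to a new GTF.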
Perhaps a long shot, but your GTF file is tab-delimited, right? (And you did not open it in any Windows/DOS software, for editing for instance?)
I checked it and it is tab-delimited. I have also removed all the rows that did not contain 9 columns. However, the error persists.
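The two checks discussed here (tab-delimited records with 9 columns, and no Windows/DOS line endings) can be automated. The following is a minimal sketch under the assumption that a plain line-by-line scan is enough; the function name `check_gtf_lines` and the example records are hypothetical.

```python
def check_gtf_lines(lines):
    """Return (line_number, problem) tuples for records that look malformed."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment lines and blank lines
        if line.endswith("\r\n"):
            problems.append((number, "Windows (CRLF) line ending"))
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != 9:
            problems.append(
                (number, f"{len(fields)} tab-separated columns instead of 9")
            )
    return problems


# Hypothetical records, for illustration only.
lines = [
    'chr1\tsrc\tgene\t1\t10\t.\t+\t.\tgene_id "G1";\n',    # well formed
    'chr1 src gene 1 10 . + . gene_id "G1";\n',            # spaces, not tabs
    'chr1\tsrc\tgene\t1\t10\t.\t+\t.\tgene_id "G2";\r\n',  # CRLF ending
]
problems = check_gtf_lines(lines)
```

When reading from a real file, opening it with `open(path, newline="")` keeps the original line endings; Python's default text mode would translate `\r\n` to `\n` and hide the CRLF problem.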
ok, that's one thing already
What do you mean by "all the rows that did not contain 9 columns"? They should all have 9 columns.
Can you also post the exact command line you are trying to run?
A correctly formatted GTF should contain rows with 9 columns (start, end, strand, attributes, etc.). The code I am trying to run is the following, where the "te" argument is the previously mentioned GTF that is causing the error:
[I'm replying at this level, as otherwise we'll be running out of space soon ;-) ]
OK. If you can, do add the exact file names you are using in the command line; be as exact as possible, as if you were typing it in your terminal.
Another idea: can you run the 'default' dataset that comes with the tool, to test that it works?
Also, but more difficult: try removing all entries that do not have CDS features assigned (e.g. the first one is a lncRNA and thus has no CDS lines; perhaps the tool is stumbling over that kind of 'gene' ...)
What you can also consider is running your GTF file through a tool such as AGAT, to double-check (and perhaps correct) the structure of the GTF file.
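The suggestion above to drop entries without CDS features could be tried with a short script. The sketch below assumes that every record carries a `gene_id "..."` attribute in column 9, as is usual for GTF, and keeps only records belonging to genes that have at least one CDS line. The function names and example records are hypothetical, and whether this actually resolves the chimeraTE error is untested.

```python
import re

# Matches the gene_id attribute in GTF column 9, e.g. gene_id "G1";
GENE_ID = re.compile(r'gene_id "([^"]*)"')


def gene_id_of(attributes):
    """Return the gene_id value from a GTF attribute string, or None."""
    match = GENE_ID.search(attributes)
    return match.group(1) if match else None


def keep_genes_with_cds(lines):
    """Keep records of genes that have at least one CDS feature.

    Header/comment lines and records without 9 columns are dropped.
    """
    parsed = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) == 9:
            parsed.append((gene_id_of(fields[8]), fields[2], line))
    # Gene IDs that own at least one CDS record.
    coding = {gid for gid, feature, _ in parsed
              if feature == "CDS" and gid is not None}
    return [line for gid, _, line in parsed if gid in coding]


# Hypothetical records: G1 is coding, G2 is a lncRNA without CDS lines.
lines = [
    'chr1\tsrc\tgene\t100\t900\t.\t+\t.\tgene_id "G1";\n',
    'chr1\tsrc\tCDS\t150\t300\t.\t+\t0\tgene_id "G1"; transcript_id "T1";\n',
    'chr1\tsrc\tgene\t2000\t2500\t.\t-\t.\tgene_id "G2"; gene_biotype "lncRNA";\n',
    'chr1\tsrc\texon\t2000\t2500\t.\t-\t.\tgene_id "G2"; transcript_id "T2";\n',
]
coding_only = keep_genes_with_cds(lines)
```

Here `coding_only` contains only the two G1 records. The script reads all records into memory, which is a simplification; a two-pass approach over the file would be gentler on a full genome annotation.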
I have just tried to run the code using the GTF included in the example dataset, and I also ran the code again with my GTF file. In this case, I used the latest version of the software, and the error I got was the following:
Did the analysis work with the included GTF file?
Perhaps it needs a 'name' tag in the last column?
Did you run it through AGAT? What was the result of that?
The analysis did not work with the included GTF file. I also ran AGAT, but I could not draw any conclusions, as the output only showed me the number of RNAs per gene and the maximum and minimum length of the RNAs.
Hmm, if it also doesn't work with the included GTF, I would contact the developers of the tool and ask them for input.
For AGAT: it has many different sub-commands so make sure you run the correct one ...