Question: Has anybody used TEToolkit successfully to quantify transposable elements?
3.8 years ago by
Anna S500 wrote:


I am trying to quantify transposable elements, but TEToolkit exits with an error, as shown below. Perhaps it does not like the transposable element GTF file that I created manually for yeast? I know the syntax of this GTF file is correct, since I have built analogous GTF files for both the rabbit and mouse papilloma viruses, and those gave successful TopHat and Cufflinks runs. I am unsure about the contents of the file, however: how should a transposon be specified? For example, NCBI has one full transposon available for yeast, but the others are listed only as pieces flanking a gene. Has anyone run TEToolkit successfully who could shed some light on this? Thanks a lot! Anna

-bash-4.1$ ./TEtranscripts  --format BAM --mode multi -t ../../../../HuiLing_4567_030716/HLC1.trim.bam -c ../../../../HuiLing_4567_030716/HLC2.trim.bam --project TE_2v1  --GTF ../../../../ref/sacCerR64.gtf --TE ../../../../ref/sacCerR64_virusesalltransposons_only.gtf
INFO  @ Mon, 16 May 2016 15:01:03:
# name = TE_2v1
# treatment files = ['../../../../HuiLing_4567_030716/HLC1.trim.bam']
# control files = ['../../../../HuiLing_4567_030716/HLC2.trim.bam']
# GTF file = ../../../../ref/sacCerR64.gtf
# TE file = ../../../../ref/sacCerR64_virusesalltransposons_only.gtf
# multi-mapper mode = multi
# stranded = yes
# normalization = DESeq_default (rpm: Reads Per Million mapped; quant: Quantile normalization)
# FDR cutoff = 5.00e-02
# fold-change cutoff =  1.00
# read count cutoff = 1
# number of iteration = 10
# Alignments grouped by read ID = True

INFO  @ Mon, 16 May 2016 15:01:03: Processing GTF files ...

INFO  @ Mon, 16 May 2016 15:01:03: Building gene index .......

INFO  @ Mon, 16 May 2016 15:01:04: Done building gene index ......

INFO  @ Mon, 16 May 2016 15:01:04:
Building TE index .......

Error in building gene/TE index

The GitHub issue tracker would be a more appropriate place to ask for help.

Comment written 3.8 years ago by Matt Shirley
3.8 years ago by
Vancouver, BC
SES8.3k wrote:

The documentation says it relies on "specially curated GTF files", which the authors provide for a few model species. Having looked at the files, I can say it may be difficult to generate this format exactly, so I would post an issue on GitHub as previously suggested. I am posting this as an answer because you won't be able to find these files from UCSC or elsewhere.

As an aside, I don't fully agree with using GTF for these purposes. GTF was meant to describe coding features of genes and to be more stringent than GFF, so this is a bit odd. I'm sure they are using external tools that require GTF, but there are a couple of issues. Creating new attributes doesn't bother me; what does is the use of 'exon' to describe a transposon and the incorrect use of TE classification terms. I'm not trying to be critical of this specific tool, but we have to be careful how we extend tools and formats for other purposes. This raises some flags for me because it breaks from the specification, and does so in a way that doesn't describe the biology. To be fair, it is quite difficult to describe transposon properties with tools and formats not originally intended for that use, so some engineering is usually necessary.

3.8 years ago by
Anna S500 wrote:

The instructions say to use the UCSC RepeatMasker, which is not available for yeast. Does anyone know whether RepeatMasker output for yeast already exists and is publicly available? Thanks!

3.6 years ago by
CAnna20 wrote:

Hi!

I am having exactly the same problem. I built the TE GTF for the macaque exactly as specified on the Hammell lab website, and its structure looks exactly the same as the files provided there. I used RepeatMasker from UCSC, and even used exactly the same syntax for the transcript_id to give unique TE names.

TEtranscripts exits with the error "Error in building gene/TE index".

I can't figure out the problem.

Did you solve this issue in the end?

Thank you, Camille
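
For readers attempting the same conversion, here is a minimal sketch of turning rows from a UCSC RepeatMasker table dump (rmsk.txt, tab-separated, no header) into TE GTF lines. It is not the Hammell lab's own script. The function name `rmsk_to_te_gtf`, the `_dupN` suffix used to make each transcript_id unique, and the choice to copy repFamily and repClass into family_id and class_id are assumptions made for illustration; the 'exon' feature type and the family_id/class_id attribute names come from this thread.

```python
from collections import defaultdict

# Column order of the UCSC rmsk table dump (the file has no header row).
RMSK_COLUMNS = [
    "bin", "swScore", "milliDiv", "milliDel", "milliIns",
    "genoName", "genoStart", "genoEnd", "genoLeft", "strand",
    "repName", "repClass", "repFamily", "repStart", "repEnd", "repLeft", "id",
]


def rmsk_to_te_gtf(rmsk_lines, source="rmsk"):
    """Yield one GTF line per RepeatMasker hit (hypothetical helper).

    Each copy of a repeat gets a unique transcript_id of the form
    <repName>_dup<N>; this naming scheme is an assumption.
    """
    copies_seen = defaultdict(int)
    for line in rmsk_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) != len(RMSK_COLUMNS):
            raise ValueError(f"expected {len(RMSK_COLUMNS)} columns, got {len(fields)}")
        row = dict(zip(RMSK_COLUMNS, fields))

        copies_seen[row["repName"]] += 1
        transcript_id = f'{row["repName"]}_dup{copies_seen[row["repName"]]}'

        attributes = (
            f'gene_id "{row["repName"]}"; '
            f'transcript_id "{transcript_id}"; '
            f'family_id "{row["repFamily"]}"; '
            f'class_id "{row["repClass"]}";'
        )
        yield "\t".join([
            row["genoName"],
            source,
            "exon",
            str(int(row["genoStart"]) + 1),  # UCSC starts are 0-based, GTF is 1-based
            row["genoEnd"],
            row["swScore"],
            row["strand"],
            ".",
            attributes,
        ])
```

Typical use would be `with open("rmsk.txt") as fh: print("\n".join(rmsk_to_te_gtf(fh)))`, where the file name is a placeholder. Comparing a few output lines against one of the provided GTF files is a quick way to spot structural differences.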

3.5 years ago by
nikulina280 wrote:


Do you have 'family_id' and 'class_id' in the 9th column of your GTF file? In my case adding those 'dummy' fields resolved the issue.
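
A minimal sketch of that fix, assuming the attributes in column 9 follow the usual GTF `key "value";` form. The function name `add_missing_te_attributes` and the placeholder value "unknown" are hypothetical; the thread confirms only that adding family_id and class_id resolved the error in one case.

```python
import re

# Attributes the answer above suggests adding when absent.
TE_ATTRIBUTES = ("family_id", "class_id")

# Matches key "value" pairs in GTF column 9.
ATTR_RE = re.compile(r'(\w+)\s+"([^"]*)"')


def add_missing_te_attributes(gtf_line, dummy="unknown"):
    """Return (line, missing): the GTF line with dummy family_id/class_id
    appended where absent, and the list of attribute names that were added."""
    fields = gtf_line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")

    present = dict(ATTR_RE.findall(fields[8]))
    missing = [key for key in TE_ATTRIBUTES if key not in present]

    if missing:
        column9 = fields[8].rstrip()
        if not column9.endswith(";"):
            column9 += ";"
        extras = " ".join(f'{key} "{dummy}";' for key in missing)
        fields[8] = f"{column9} {extras}"

    return "\t".join(fields), missing
```

Running this over every non-comment line of a hand-built TE GTF and writing the results to a new file leaves existing attributes untouched, so the original file can be kept for comparison.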
