I can definitely help point you in the right direction, but it would also help to know some background on what you are trying to accomplish. In general, TE-finding programs are based on some combination of 1) mathematical repeat patterns (k-mer frequency), 2) similarity to some reference database, 3) clustering based on a self-comparison of the data set, or 4) structural features (LTRs, TIRs, etc.). I would say those approaches are listed roughly in increasing order of how complex they are to perform, and also of how biologically relevant the results are.
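To make approach 1 concrete, here is a minimal Python sketch of counting k-mers and flagging the ones that occur more than once. It is only an illustration of the idea, not how any particular TE finder is implemented; the function name `kmer_counts` and the toy sequence are made up for this example, and it ignores reverse complements and any statistical model of expected frequencies.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in a sequence (forward strand only)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


# Toy sequence (hypothetical): "ACGTAC" is planted three times between
# short stretches of unique filler sequence.
toy = "ACGTAC" + "TTGCA" + "ACGTAC" + "GGATC" + "ACGTAC"

counts = kmer_counts(toy, 6)

# Keep only k-mers seen more than once; these are the "repeat" candidates.
repeated = {kmer: n for kmer, n in counts.items() if n > 1}
print(repeated)
```

On a real genome the over-represented k-mers would only be a starting point; they say nothing yet about what kind of element they come from.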
Transposome was designed for characterizing TE abundance/diversity from raw reads, and it performs very well in terms of accuracy on plant genomes (an example with maize is presented in the paper). I'm the author, so I can answer any questions about its usage. Transposome is based on a clustering approach, with the annotations being assigned from a repeat database.
For identifying TEs from an assembled genome, you need to think about what type of TE you are interested in. I use many different programs for this task, each designed for one specific type of TE (based on its structural features). Programs like Recon and RepeatModeler are based on k-mer frequencies, and the goal of RepeatModeler is to try to construct a TE from k-mers. The result is going to be a contig representing the most frequently occurring parts of the element in the genome. Usually this will be the internal coding region, because it is more conserved than the flanking repeats. What you get is not a real transposon from a single locus; rather, it is just a representative of the repeats found in the genome. This approach can still be useful if you know exactly what you are trying to find out (e.g., a quick survey or a quick comparison of species). If you want a high-quality reference set of TEs for your genome, then I would strongly warn against this approach, because the output is not composed of real transposons (and is therefore not particularly useful for evolutionary analyses).
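The point about the output not being a real transposon can be shown with a toy Python example. This is a simple majority-vote consensus over pre-aligned copies, which is an assumption made for illustration and not the actual procedure used by RepeatModeler or Recon; the three "copies" and the function name `majority_consensus` are invented.

```python
from collections import Counter


def majority_consensus(aligned):
    """Return the most common base at each column of equal-length sequences."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*aligned)
    )


# Three hypothetical copies of the same element from different loci,
# each carrying one private substitution at a different position.
copies = [
    "TCGTACGA",
    "ACGAACGA",
    "ACGTACCA",
]

consensus = majority_consensus(copies)
print(consensus)            # the representative sequence
print(consensus in copies)  # whether it matches any actual copy
```

Here the representative sequence differs from every individual copy, so it cannot be traced back to one locus, which is why it is of limited use for questions about the history of individual insertions.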
modified 3.6 years ago
3.6 years ago by
SES ♦ 8.2k