Question

RepeatExplorer from Galaxy

9.8 years ago

aleix.arnau1990 ▴ 10

Hi,

I've been using RepeatExplorer in the Galaxy environment to characterize repeat elements in the genus Begonia. I'm looking for someone who has used it before or knows the tool well. RepeatExplorer provides clusters of repeat elements, and what I need to know is how to get the "consensus sequence" for each cluster, i.e. the sequence that is used to identify the repeat element through RepeatMasker.

The main clusters are not identified by RepeatMasker, so I would like to get the consensus sequence from each cluster to see what they look like, because they are probably repeat elements specific to Begonia.

Thanks very much!

repeatExplorer repeat elements • 3.1k views

updated 2.5 years ago by Ram 43k • written 9.8 years ago by aleix.arnau1990 ▴ 10

Cross-posted: http://seqanswers.com/forums/showthread.php?t=32078

updated 2.5 years ago by Ram 43k • written 9.8 years ago by SES 8.6k

Ram · Answer 1 · 2014-06-25

I am very familiar with this tool and have assessed its accuracy and performance in numerous ways. I did so because we needed a critical assessment of how it compares to the software I wrote for annotating repeats from sequence reads (that software is called Transposome; ~~not yet published~~). That aside, if you want to use Galaxy, then RepeatExplorer is the right tool for the job.

To answer your question, it is not really possible to get a consensus from a cluster, but there are approximations. For example, you can use centrality to pick a representative of the cluster, or you could try to assemble the sequences from the cluster into something representative of it and run that through RepeatMasker. Actually, RepeatExplorer already does this: the sequences from each cluster are assembled using CAP3, and then RepeatMasker is run on the assemblies. The results are used to generate the graphs in the output. Check the results directories; you should be able to find those CAP3 assemblies, which would be a good starting point.
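To make the centrality idea concrete, here is a minimal sketch in Python. It is not what RepeatExplorer or Transposome does internally; it simply assumes you have the pairwise hits between reads of one cluster as `(read_a, read_b, weight)` tuples (for example, parsed from an all-vs-all similarity search, with the score as the weight) and picks the read with the highest weighted degree. The function name `most_central_read` and the input layout are hypothetical.

```python
from collections import defaultdict


def most_central_read(edges):
    """Return the read ID with the highest weighted degree in one cluster.

    edges: iterable of (read_a, read_b, weight) tuples, one per pairwise hit.
    This input layout is an assumption made for illustration.
    """
    degree = defaultdict(float)
    for read_a, read_b, weight in edges:
        if read_a == read_b:
            continue  # ignore self-hits
        degree[read_a] += weight
        degree[read_b] += weight
    if not degree:
        raise ValueError("no pairwise hits in this cluster")
    # Iterate in sorted order so ties resolve to the smallest read ID.
    return max(sorted(degree), key=lambda read: degree[read])


# Toy cluster: r2 is similar to every other read, so it is the most central.
example_edges = [
    ("r1", "r2", 95.0),
    ("r2", "r3", 90.0),
    ("r2", "r4", 88.0),
    ("r3", "r4", 91.0),
]
representative = most_central_read(example_edges)
```

Weighted degree is only one of several centrality measures; it is used here because it needs nothing beyond the hit list. The read it returns is an actual sequence from the cluster, not a consensus, so it can be passed to RepeatMasker as-is.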

EDIT: For reference, Transposome is published in Bioinformatics. Also, you can get the consensus of a cluster with vmatch, usearch, vsearch, etc. This will probably be added to Transposome because a couple of users have asked for it.
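For anyone unsure what "consensus" means in practice, here is a minimal sketch of a column-wise majority consensus in Python. It assumes the cluster's sequences have already been aligned to equal length with `-` as the gap character, and it is not the algorithm that vmatch, usearch, or vsearch use; the function name `majority_consensus` is hypothetical.

```python
from collections import Counter


def majority_consensus(aligned_seqs, gap="-"):
    """Return the majority-rule consensus of pre-aligned sequences.

    aligned_seqs: list of equal-length strings, gaps written as `gap`.
    Columns where the gap character is the most common symbol are dropped.
    """
    if not aligned_seqs:
        raise ValueError("need at least one sequence")
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")

    consensus = []
    for column in zip(*aligned_seqs):
        counts = Counter(base.upper() for base in column)
        # On a tie, the symbol seen first in the column wins.
        base, _ = counts.most_common(1)[0]
        if base != gap:
            consensus.append(base)
    return "".join(consensus)


# Toy alignment of three reads from one cluster.
toy_alignment = ["ACGT-A", "ACGTTA", "ACCT-A"]
toy_consensus = majority_consensus(toy_alignment)
```

Real repeat clusters are noisy and often contain several variants, so a single majority sequence can hide that structure; treat the result as a rough summary, not a definitive element sequence.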