Question: How Do You Identify And Classify Novel Repetitive Elements In A De Novo Genome?
9.1 years ago by
Rob Syme540
Perth, Western Australia
Rob Syme540 wrote:

We have a de novo genome assembly and are looking for repetitive elements (ideally transposons) to submit to NCBI and RepBase. So far, the plan is:

  1. Mask known repeats in the genome with RepeatMasker and the RepBase libraries.
  2. Run de novo repeat finding on the masked genome with RepeatScout, then filter out low-complexity regions that RepeatMasker didn't pick up.
  3. Filter out repeats that have matches in gene regions (these sequences are likely to belong to a gene family or to be part of a conserved domain).
  4. BLAST each of the repeat sequences identified by RepeatScout against NR, discarding sequences that match genes or previously identified transposons.
  5. Submit the remaining sequences to RepBase and NCBI as unclassified repeats.
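
Steps 3 and 4 both come down to dropping library sequences that have a significant hit against some reference set. A minimal Python sketch of that filter follows, assuming the BLAST results were saved in the standard 12-column tabular format (`-outfmt 6`). The function names, the e-value cutoff of 1e-10, and the sequence identifiers are illustrative assumptions, not part of RepeatScout or BLAST:

```python
import io


def read_fasta(handle):
    """Parse FASTA text into a dict of {sequence_id: sequence}."""
    records = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token as the ID.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {seq_id: "".join(parts) for seq_id, parts in records.items()}


def ids_with_hits(blast_handle, max_evalue=1e-10):
    """Return query IDs with at least one hit at or below max_evalue.

    Expects BLAST tabular output: column 1 is the query ID and
    column 11 is the e-value. The cutoff is an arbitrary example value.
    """
    hit_ids = set()
    for line in blast_handle:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if float(fields[10]) <= max_evalue:
            hit_ids.add(fields[0])
    return hit_ids


def drop_matching(repeats, hit_ids):
    """Keep only the repeat sequences whose IDs are not in hit_ids."""
    return {seq_id: seq for seq_id, seq in repeats.items() if seq_id not in hit_ids}


# Hypothetical example data: rep1 has a strong hit, rep2 only a weak one.
library = read_fasta(io.StringIO(">rep1\nACGTACGT\n>rep2\nTTTTGGGG\n"))
blast = io.StringIO(
    "rep1\tgene1\t98.0\t8\t0\t0\t1\t8\t1\t8\t1e-50\t100\n"
    "rep2\tgene2\t80.0\t8\t1\t0\t1\t8\t1\t8\t0.5\t20\n"
)
kept = drop_matching(library, ids_with_hits(blast))
```

Which hits count as "genes or previously identified transposons" still has to be decided from the subject annotations; the sketch only handles the bookkeeping once that set of hits is known.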

This process feels incomplete to me and doesn't include any classification. Is there a formal process for identifying and classifying repetitive elements in de novo genome assemblies?

8.1 years ago by
Casey Bergman18k
Athens, GA, USA
Casey Bergman18k wrote:

Try running REPCLASS or TEclass on the output of RepeatScout (or RECON) for classification of putative TEs.

REPCLASS combines homology (HOM) and structural (STR) information from the input sequences with a scan of the de novo genome assembly, using the input library, to find target site duplications (TSDs) that are characteristic of particular TE classes.

[Figure: REPCLASS workflow]
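
To make the TSD idea concrete: a target site duplication shows up as the same short sequence immediately upstream and immediately downstream of an inserted element. The Python sketch below looks for the longest exact match between the end of the left flank and the start of the right flank. It is only an illustration and does not reproduce REPCLASS's actual search; the function name and the length bounds are assumptions, and real TSDs can contain mismatches that this exact-match version would miss:

```python
def find_tsd(left_flank, right_flank, min_len=2, max_len=20):
    """Return the longest exact TSD candidate, or None if there is none.

    left_flank:  sequence immediately upstream of the element
    right_flank: sequence immediately downstream of the element
    min_len must be at least 1; the bounds are example values only.
    """
    left = left_flank.upper()
    right = right_flank.upper()
    longest = min(max_len, len(left), len(right))
    # Try the longest candidate first so the first match is the longest.
    for k in range(longest, min_len - 1, -1):
        if left[-k:] == right[:k]:
            return left[-k:]
    return None


# Hypothetical flanks: the element sits between "...TTACG" and "TTACG...".
tsd = find_tsd("GGCATTACG", "TTACGCCA")
```

Because different TE groups tend to generate TSDs of different lengths, the length of the detected duplication is the useful signal here, not just its presence.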

TEclass uses oligomer frequencies of known TEs to train classifiers for different sequence lengths, which are applied in series.

[Figure: TEclass classification workflow]
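
As a rough illustration of what "oligomer frequencies" means as classifier input, the Python sketch below turns a sequence into a vector of relative k-mer frequencies. This covers only the feature-extraction idea; it is not TEclass's code, and the function name and the default k are assumptions:

```python
from collections import Counter
from itertools import product


def oligomer_frequencies(seq, k=4):
    """Return the relative frequency of every DNA k-mer in seq.

    Windows containing anything other than A, C, G or T (for example N)
    are skipped. The result has one entry per possible k-mer (4**k keys).
    """
    seq = seq.upper()
    alphabet = set("ACGT")
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= alphabet
    )
    total = sum(counts.values())
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in kmers}


# Dinucleotide frequencies for a toy sequence.
freqs = oligomer_frequencies("ACGTACGT", k=2)
```

Vectors like this, computed for known TEs of each class, are the kind of input a classifier could be trained on before being applied to the unclassified RepeatScout consensus sequences.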

9.1 years ago by
Barcelona, Spain
Darked894.2k wrote:

It does not cover classification, but you may find this page useful.


The descriptions there don't go much further than what I had already outlined, but the page did link me to the very comprehensive list of tools at the Bergman Lab, which led me to their review article. I might sketch out an answer based on the article later.

Reply written 9.1 years ago by Rob Syme