Question

How Do You Identify And Classify Novel Repetitive Elements In A De Novo Genome?

16


15.2 years ago

Rob Syme ▴ 540

We have a de novo genome assembly and are looking for repetitive elements (ideally transposons) to submit to NCBI and RepBase. So far, the plan is:

1. Mask known repeats in the genome with RepeatMasker and the RepBase libraries.
2. Run de novo repeat finding on the masked genome with RepeatScout, filtering out low-complexity regions that RepeatMasker did not pick up.
3. Filter out repeats that have matches in gene regions (these sequences are likely to belong to a gene family or to be part of a conserved domain).
4. BLAST each repeat sequence identified by RepeatScout against NR, discarding sequences that match genes or previously identified transposons.
5. Submit the remaining sequences to RepBase and NCBI as unclassified repeats.
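As a rough illustration of the BLAST filtering step, here is a minimal Python sketch. It assumes the BLAST results were saved in the standard 12-column tabular format (query ID in column 1, e-value in column 11) and that the repeat library is plain FASTA. The function names, the sequence IDs, and the 1e-5 e-value cutoff are all placeholders I made up for the example, not part of any of the tools above.

```python
def parse_fasta(text):
    """Return a dict mapping sequence ID to sequence from FASTA text."""
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token as the ID.
            current = line[1:].split()[0]
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def ids_with_hits(blast_tabular, max_evalue=1e-5):
    """Collect query IDs with at least one hit at or below max_evalue.

    Assumes 12-column tabular BLAST output; the cutoff is an
    arbitrary example value.
    """
    hits = set()
    for line in blast_tabular.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        query_id, evalue = fields[0], float(fields[10])
        if evalue <= max_evalue:
            hits.add(query_id)
    return hits


def drop_matched_repeats(fasta_text, blast_tabular, max_evalue=1e-5):
    """Keep only repeat sequences without a significant BLAST hit."""
    matched = ids_with_hits(blast_tabular, max_evalue)
    return {name: seq for name, seq in parse_fasta(fasta_text).items()
            if name not in matched}


# Toy data: rep1 has a strong hit, rep2 only a weak one.
library = ">rep1\nACGTACGT\n>rep2\nTTTTGGGG\n"
blast = (
    "rep1\tgeneX\t98.0\t100\t2\t0\t1\t100\t1\t100\t1e-50\t190\n"
    "rep2\tweak\t80.0\t30\t6\t0\t1\t30\t1\t30\t0.5\t20\n"
)
print(sorted(drop_matched_repeats(library, blast)))  # ['rep2']
```

In practice you would also want to inspect what each discarded sequence hit, since a match to a known transposon and a match to a host gene call for different follow-up.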

This process feels incomplete to me, and it doesn't include any classification. Is there a formal process for identifying and classifying repetitive elements in de novo genome assemblies?

repeats repeatmasker classification • 8.0k views

updated 14.2 years ago by Casey Bergman 18k • written 15.2 years ago by Rob Syme ▴ 540

Ram · Answer 1 · 2011-05-14

Try running REPCLASS or TEclass on the output of RepeatScout (or RECON) for classification of putative TEs.

REPCLASS combines three lines of evidence: homology (HOM) to known elements, structural (STR) features of the input sequences, and target site duplications (TSDs), which it finds by scanning the de novo genome assembly with the input library and which are characteristic of particular TE classes.
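To make the TSD idea concrete, here is a simplified Python sketch that looks for an exact direct repeat immediately flanking a putative insertion. This is only an illustration of the concept, not REPCLASS's actual algorithm: the function name, the exact-match requirement, and the 2 to 20 bp length bounds are assumptions chosen for the example.

```python
def find_tsd(genome, start, end, min_len=2, max_len=20):
    """Return the longest exact direct repeat flanking genome[start:end].

    The element is assumed to occupy genome[start:end]; the candidate
    TSD is the k bases just before `start` compared with the k bases
    just after `end`. Returns None if no repeat of at least min_len
    is found. Real TSDs may contain mismatches, which this ignores.
    """
    longest = min(max_len, start, len(genome) - end)
    for k in range(longest, min_len - 1, -1):
        left = genome[start - k:start]
        right = genome[end:end + k]
        if left == right:
            return left
    return None


# Toy example: a 12 bp "element" flanked by the 5 bp repeat ATTAC.
genome = "GGCC" + "ATTAC" + "TGCATGCATGCA" + "ATTAC" + "TTGG"
print(find_tsd(genome, 9, 21))  # ATTAC
```

The length of the duplication is what carries the class signal, so a real scan would tally TSD lengths across many copies of the same family rather than trusting a single insertion.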

TEclass trains classifiers on the oligomer frequencies of known TEs, with separate classifiers for different sequence-length ranges, and applies them in series.
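For a sense of what an oligomer-frequency feature vector looks like, here is a small Python sketch that computes normalized k-mer frequencies for one sequence. It is not TEclass code: the function name and the default k=4 are assumptions for illustration, and windows containing anything other than A, C, G, or T are simply skipped.

```python
from collections import Counter
from itertools import product


def oligomer_frequencies(sequence, k=4):
    """Return a dict of normalized k-mer frequencies over A/C/G/T.

    Every possible k-mer gets an entry, so vectors from different
    sequences have the same layout. Windows with ambiguous bases
    (for example N) are excluded from the counts.
    """
    sequence = sequence.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(sequence[i:i + k]
                     for i in range(len(sequence) - k + 1))
    total = sum(counts[kmer] for kmer in kmers)
    if total == 0:
        return {kmer: 0.0 for kmer in kmers}
    return {kmer: counts[kmer] / total for kmer in kmers}


freqs = oligomer_frequencies("ACGTACGT", k=2)
print(len(freqs))             # 16
print(round(freqs["AC"], 3))  # 0.286
```

Vectors like this, computed for a labeled set of known TEs, could then be fed to any standard classifier; which classifier and which k to use are separate choices that this sketch does not address.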

Ram · Answer 2 · 2010-05-17

3


15.2 years ago

Darked89 4.7k

It does not cover classification, but you may find this page useful.

updated 6.8 years ago by Ram 45k • written 15.2 years ago by Darked89 4.7k

0


The descriptions there don't go much further than what I had already outlined, but the page did link me to the very comprehensive list of tools at the Bergman Lab, which in turn led me to their review article. I might sketch out an answer based on the article later.

15.1 years ago by Rob Syme ▴ 540