Question: Finding % Of Genome Masked By Single Transposable Element Using Repeatmasker
David M wrote, 8.3 years ago:

I have a library of Transposable elements identified de novo using REPET. I would like to find out what percent of the genome each of these repeats masks (individually) using RepeatMasker.

I'm worried that running RepeatMasker with a library containing a single TE consensus sequence will mask genomic copies that are similar to it but in fact belong to a different family (that is, copies with less than 80% similarity to the consensus). So I masked the genome using the entire library of repeats instead.

The individual breakdown of hits/HSPs (I'm using rmblast rather than cross_match) is in the *.out file. I tallied all hits for each repeat in this file and took that as an estimate of the fraction of the genome that repeat masks. When I sum these percentages, however, I get a number quite a bit larger than the percentage of the genome masked according to the .tbl file (64% in the .tbl file, 79% by summing the .out lines).

Where does this discrepancy arise? Is there a way to correct for it or get around it? Am I going about this all wrong?

Thanks in advance!
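For the tally described above, a minimal Python sketch may help check one possible source of the gap: annotations in the .out file can overlap each other, so bases summed per hit may be counted more than once. This is an assumption, not something confirmed in this thread. The sketch assumes the usual whitespace-separated .out layout (query sequence in column 5, begin and end in columns 6 and 7, repeat name in column 10). The names `parse_out`, `merged_length`, and `summarize` are hypothetical helpers, and `genome_size` must be supplied by you; whether it should include N bases to match the .tbl denominator is also an open assumption.

```python
from collections import defaultdict


def parse_out(lines):
    """Yield (sequence, start, end, repeat_name) from RepeatMasker .out lines."""
    for line in lines:
        fields = line.split()
        # Data rows begin with an integer score; header and blank lines do not.
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        yield fields[4], int(fields[5]), int(fields[6]), fields[9]


def merged_length(intervals):
    """Bases covered by 1-based inclusive intervals, counting overlaps once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Close the previous merged block before starting a new one.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def summarize(lines, genome_size):
    """Return ({repeat: percent of genome}, percent covered by any repeat)."""
    per_repeat = defaultdict(lambda: defaultdict(list))
    all_hits = defaultdict(list)
    for seq, start, end, name in parse_out(lines):
        per_repeat[name][seq].append((start, end))
        all_hits[seq].append((start, end))
    pct = {
        name: 100.0 * sum(merged_length(iv) for iv in by_seq.values()) / genome_size
        for name, by_seq in per_repeat.items()
    }
    union = sum(merged_length(iv) for iv in all_hits.values())
    return pct, 100.0 * union / genome_size
```

Typical use would be `pct, total = summarize(open("genome.fa.out"), genome_size)`, where the file name is a placeholder. If the sum of the per-repeat percentages exceeds `total`, the difference is the share of bases annotated by more than one repeat family.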

Tags: genome annotation repeatmasker

When running RepeatMasker, you can limit masking to copies below a chosen percent divergence from the consensus sequence with the -div option, e.g. -div 6.

Comment written 8.2 years ago by Biomonika (Noolean)