Finding % Of Genome Masked By A Single Transposable Element Using RepeatMasker

13.2 years ago

David M ▴ 580

I have a library of transposable elements identified de novo using REPET. I would like to find out what percentage of the genome each of these repeats masks individually, using RepeatMasker.

I'm worried that running RepeatMasker with a library containing a single TE consensus sequence will mask instances in the genome that are similar to it but in fact belong to a different family (they have less than 80% similarity to the consensus). So I masked the genome using the entire library of repeats instead.

The individual breakdown of hits/HSPs (I'm using RMBlast rather than cross_match) is in the *.out file. I have tallied all hits for a given repeat in this file and take that as the estimate of that repeat's share of the genome. When I sum these percentages, however, I get a number quite a bit larger than the % of the genome masked reported in the .tbl file (64% in the .tbl file, 79% from summing the .out lines).
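A minimal sketch of this kind of tally, in case it helps to make the comparison concrete. It assumes the usual whitespace-separated .out layout (query sequence in column 5, begin and end in columns 6 and 7, repeat name in column 10) and a genome size supplied by the caller; the function names are made up for illustration. It reports the naive per-line sum next to the merged (union) coverage, because hits that overlap each other are counted more than once in the naive sum. Whether overlap accounts for the whole 64% vs 79% gap here is an assumption, not something this confirms.

```python
from collections import defaultdict


def parse_out(lines):
    """Yield (sequence, start, end, repeat_name) from RepeatMasker .out lines.

    Assumes the standard column order; header and blank lines are skipped
    because their first field is not an integer score.
    """
    for line in lines:
        fields = line.split()
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        yield fields[4], int(fields[5]), int(fields[6]), fields[9]


def merged_length(intervals):
    """Bases covered by 1-based inclusive intervals, counting overlaps once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Close the previous merged block before opening a new one.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def coverage(lines, genome_size):
    """Return (percent per repeat, naive summed percent, union percent)."""
    per_repeat = defaultdict(lambda: defaultdict(list))  # repeat -> seq -> intervals
    all_hits = defaultdict(list)                         # seq -> intervals
    naive_bases = 0
    for seq, start, end, name in parse_out(lines):
        per_repeat[name][seq].append((start, end))
        all_hits[seq].append((start, end))
        naive_bases += end - start + 1
    per_repeat_pct = {
        name: 100 * sum(merged_length(iv) for iv in seqs.values()) / genome_size
        for name, seqs in per_repeat.items()
    }
    union_bases = sum(merged_length(iv) for iv in all_hits.values())
    return per_repeat_pct, 100 * naive_bases / genome_size, 100 * union_bases / genome_size


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python tally.py genome.fa.out 123456789
    with open(sys.argv[1]) as handle:
        per_repeat_pct, naive_pct, union_pct = coverage(handle, int(sys.argv[2]))
    for name, pct in sorted(per_repeat_pct.items()):
        print(f"{name}\t{pct:.3f}")
    print(f"naive sum\t{naive_pct:.3f}")
    print(f"union\t{union_pct:.3f}")
```

If the union figure lands close to the .tbl value while the naive sum does not, overlapping annotations are a likely contributor; the per-repeat percentages will then legitimately sum to more than the total masked fraction.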

Where does this discrepancy arise? Is there a way to correct for it or get around it? Am I going about this all wrong?

Thanks in advance!

repeatmasker genome annotation • 3.1k views

updated 11.6 years ago by Biostar 20 • written 13.2 years ago by David M ▴ 580

When using RepeatMasker, you can limit the allowed divergence from the consensus sequence, e.g. -div 6 masks only repeats less than 6% diverged from the consensus.

13.1 years ago by Biomonika (Noolean) 3.2k