Finding % Of Genome Masked By A Single Transposable Element Using RepeatMasker
12.0 years ago
David M ▴ 580

I have a library of transposable elements (TEs) identified de novo using REPET. I would like to find out what percentage of the genome each of these repeats masks individually, using RepeatMasker.

I'm worried that running RepeatMasker with a library containing only a single TE consensus sequence would mask genomic copies that are similar to it but actually belong to a different family (i.e. they have less than 80% similarity to the consensus). So instead I masked the genome using the entire library of repeats.

The individual breakdown of hits/HSPs (I'm using rmblast rather than cross_match) is in the *.out file. I have tallied all hits for a given repeat in this file and take that as the estimate of that repeat's share of the genome. When I sum these percentages over all repeats, however, I get a number quite a bit larger than the percentage of the genome masked according to the .tbl file (64% in the .tbl file, 79% by summing the .out lines).
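
For reference, here is a minimal sketch of the tally described above, written so that overlaps can be checked. It assumes the standard whitespace-separated .out layout (query sequence in column 5, begin and end in columns 6 and 7, repeat name in column 10). The function names `merge_length` and `coverage_from_out` are made up for this illustration and are not part of RepeatMasker. Overlapping annotations in the .out file are one possible source of the discrepancy, so the sketch reports both the per-repeat base counts and the merged (non-redundant) total.

```python
from collections import defaultdict


def merge_length(intervals):
    """Bases covered by 1-based inclusive intervals, counting overlaps once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Close the previous merged block before opening a new one.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def coverage_from_out(lines):
    """Return (bases per repeat name, bases covered by any repeat)."""
    per_repeat = defaultdict(list)  # (repeat name, sequence) -> intervals
    all_hits = defaultdict(list)    # sequence -> intervals
    for line in lines:
        fields = line.split()
        # Skip blank lines and the header lines (first field is not a score).
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        seq, start, end = fields[4], int(fields[5]), int(fields[6])
        repeat = fields[9]
        per_repeat[(repeat, seq)].append((start, end))
        all_hits[seq].append((start, end))

    by_repeat = defaultdict(int)
    for (repeat, _seq), intervals in per_repeat.items():
        by_repeat[repeat] += merge_length(intervals)
    union = sum(merge_length(ivs) for ivs in all_hits.values())
    return dict(by_repeat), union


if __name__ == "__main__":
    import sys

    # Usage (file name is a placeholder): python tally.py genome.fa.out
    with open(sys.argv[1]) as handle:
        by_repeat, union = coverage_from_out(handle)
    print("sum of per-repeat bases:", sum(by_repeat.values()))
    print("bases covered by any repeat:", union)
```

Dividing each count by the genome length gives the percentages. If the sum of per-repeat bases is larger than the merged total, some positions are annotated by more than one repeat, and the summed percentages will overshoot for that reason.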

Where does this discrepancy arise? Is there a way to correct for it or get around it? Am I going about this all wrong?

Thanks in advance!

repeatmasker genome annotation • 2.9k views

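When running RepeatMasker, you can limit the allowed divergence from the consensus sequence with the -div option; e.g. -div 6 masks only repeats that are less than 6% diverged from the consensus.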
