How to calculate TE density from RepeatMasker result?
5 months ago
haruki ▴ 20

Hello, all

Is there a good way to calculate TE density per fixed-size window from RepeatMasker results, so that I can use it in a Circos plot?

Can I count the number of masked bases per window from the masked FASTA? (I don't think this is a good approach, because it includes not only TEs but also simple repeats, etc.) Or should I just sum the TE length per window from a RepeatMasker output such as the .out file?

repeatmasker circos elements transposable
I think it is better to use the GFF file that RepeatMasker outputs. It is easier to parse than the other output formats. The only difficulty may be its potentially large size. If you can load the file, for example in R, you could simply run coverage on a GenomicRanges object created from it. Alternatively, use BEDtools or similar software to calculate the coverage per window on a BED file made from the GFF (with gff2bed).
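If you would rather not depend on R or BEDtools, the same per-window coverage can be sketched in plain Python. This is only a minimal sketch under some assumptions: the function names (`parse_gff_line`, `merge_intervals`, `window_density`) and the 100 kb default window are made up for illustration, the GFF is assumed to have the standard nine tab-separated columns with 1-based inclusive coordinates, and chromosome sizes have to be supplied by you (for example from a .fai index). Overlapping annotations are merged first so that no base is counted twice.

```python
from collections import defaultdict


def parse_gff_line(line):
    """Return (chrom, start, end) as 0-based half-open, or None for comments."""
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    # GFF columns 4 and 5 are 1-based inclusive start and end.
    return fields[0], int(fields[3]) - 1, int(fields[4])


def merge_intervals(intervals):
    """Merge overlapping or touching (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def window_density(features, chrom_sizes, window=100_000):
    """Fraction of each window covered by features.

    features: iterable of (chrom, start, end), 0-based half-open.
    chrom_sizes: dict mapping chromosome name to length in bp.
    Returns a list of (chrom, win_start, win_end, fraction_covered).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in features:
        by_chrom[chrom].append((start, end))

    rows = []
    for chrom, size in chrom_sizes.items():
        n_windows = (size + window - 1) // window
        covered = [0] * n_windows
        for start, end in merge_intervals(by_chrom[chrom]):
            pos = max(start, 0)
            end = min(end, size)
            # Split the interval at window boundaries.
            while pos < end:
                idx = pos // window
                piece_end = min((idx + 1) * window, end)
                covered[idx] += piece_end - pos
                pos = piece_end
        for idx, bases in enumerate(covered):
            w_start = idx * window
            w_end = min(w_start + window, size)
            rows.append((chrom, w_start, w_end, bases / (w_end - w_start)))
    return rows


# Toy example with a 100 bp window on a 250 bp "chromosome".
toy = [("chr1", 0, 50), ("chr1", 25, 75), ("chr1", 90, 120)]
for chrom, s, e, frac in window_density(toy, {"chr1": 250}, window=100):
    print(chrom, s, e, frac)  # whitespace-separated: chr start end value
```

Each printed line has the form `chr start end value`, which is the general layout Circos expects for histogram or heatmap data; check the chromosome names against your karyotype file before plotting.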

Thank you for your reply!

In my GFF, there are some repeats that are not annotated as TEs, such as Simple_Repeat or Low_complexity. Should I remove these repeats if I want to calculate "TE density" (not repeat density) correctly? In my results, these repeats make up less than 1% of the total, so maybe I don't need to worry about them...
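In case it helps, dropping those classes before computing the density is a small filtering step. Below is a minimal sketch that reads the .out file rather than the GFF, on the assumption that the repeat class/family sits in the 11th whitespace-separated column of each .out record (the usual layout, but worth checking against your own file). The function name `te_intervals_from_out` and the contents of `NON_TE_CLASSES` are my own choices for illustration; which classes count as "not TE" is a judgment call you should adjust.

```python
# Classes treated as non-TE here; this set is an assumption, edit as needed.
NON_TE_CLASSES = {"simple_repeat", "low_complexity", "satellite"}


def te_intervals_from_out(lines, exclude=NON_TE_CLASSES):
    """Yield (chrom, start, end, class_family) for records kept as TEs.

    Coordinates are converted from 1-based inclusive to 0-based half-open.
    Header and blank lines are skipped because their first field is not
    an integer score.
    """
    for line in lines:
        fields = line.split()
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        class_family = fields[10]              # e.g. "LINE/L1"
        repeat_class = class_family.split("/")[0]
        if repeat_class.lower() in exclude:    # case-insensitive match
            continue
        yield fields[4], int(fields[5]) - 1, int(fields[6]), class_family


# Toy records in .out layout (values are invented for the example).
example = [
    "   SW   perc perc perc  query  position in query  matching  repeat",
    "score   div. del. ins.  sequence  begin end (left)  repeat  class/family",
    "",
    " 1320 15.6 6.2 0.0 chr1 1000 1200 (5000) + AluSx SINE/Alu 1 200 (112) 1",
    "   21  0.0 0.0 0.0 chr1 1500 1520 (4680) + (CA)n Simple_repeat 1 21 (0) 2",
]
for record in te_intervals_from_out(example):
    print(record)
```

The kept intervals can then be fed into whatever per-window coverage step you use. With non-TE repeats below 1% the difference in the plot will probably be small, but filtering keeps the label "TE density" accurate.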

