Does RepeatMasker remove all repeats, or does it keep the first instance?
7.6 years ago
Aurelie MLB


I was wondering whether RepeatMasker really masks all the repeats, or whether it keeps the first one so that alignment tools can still find hits that could be relevant even in repeated regions.

On the RepeatMasker website, it says:

The output of the program is a detailed annotation of the repeats that are present in the query sequence as well as a modified version of the query sequence in which all the annotated repeats have been masked (default: replaced by Ns).

But I still have a doubt, because surely it could be useful to keep the first instance.

Many thanks

genome
7.6 years ago

It will hard- or soft-mask all of the repeats; the first one isn't spared. If you want to mask all but the first instance, you'll have to write something to do it yourself (I'd recommend Biopython, since you could easily store the repeat types you've already seen in a dict and use SeqIO instead of writing your own FASTA parser). Keep in mind that doing this is probably not a good idea. Think of a read that only partially overlaps a repeat that isn't the first copy of its class: the part of the read inside the masked copy can't align, so you'll get a clipped or mismatch-heavy alignment, or the read may end up placed on the unmasked first copy instead of where it came from.
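A minimal sketch of the "mask all but the first" idea, in Python. It assumes the standard RepeatMasker `.out` table (query sequence name in column 5, 1-based inclusive begin/end in columns 6 and 7, repeat name in column 10), that the FASTA IDs match the query names in the `.out` file, that the input FASTA is the *unmasked* genome, and that "first" means the first hit of each repeat name in the order the `.out` file lists them. The function names `parse_rm_out`, `mask_all_but_first` and `main` are hypothetical; only `Bio.SeqIO.parse`, `Bio.SeqIO.write` and `Bio.Seq.Seq` come from Biopython, and Biopython is only needed by `main`.

```python
"""Mask every RepeatMasker hit except the first one of each repeat name.

Sketch only: assumes the standard RepeatMasker .out table (header lines,
then whitespace-separated rows where column 5 is the query sequence name,
columns 6-7 the 1-based inclusive begin/end, and column 10 the repeat name).
"""
from collections import namedtuple

# Coordinates stored 0-based, half-open, so they can index Python strings.
Hit = namedtuple("Hit", "seq_id start end repeat")


def parse_rm_out(lines):
    """Yield Hit records from the lines of a RepeatMasker .out file."""
    for line in lines:
        fields = line.split()
        # Skip header and blank lines: data rows start with an integer SW score.
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        yield Hit(fields[4], int(fields[5]) - 1, int(fields[6]), fields[9])


def mask_all_but_first(seqs, hits, soft=False):
    """Return a copy of seqs (dict id -> str) with every hit masked except
    the first hit of each repeat name, in the order the hits are given.
    Hard masking writes 'N'; soft masking lowercases the bases."""
    masked = {seq_id: list(s) for seq_id, s in seqs.items()}
    seen = set()
    for hit in hits:
        if hit.repeat not in seen:
            seen.add(hit.repeat)  # spare the first copy of this repeat
            continue
        bases = masked[hit.seq_id]
        for i in range(hit.start, hit.end):
            bases[i] = bases[i].lower() if soft else "N"
    return {seq_id: "".join(b) for seq_id, b in masked.items()}


def main(fasta_in, rm_out, fasta_out, soft=False):
    """Read an unmasked FASTA plus its .out file and write the result."""
    # Biopython is imported here so the helpers above work without it.
    from Bio import SeqIO
    from Bio.Seq import Seq

    records = list(SeqIO.parse(fasta_in, "fasta"))
    with open(rm_out) as handle:
        hits = list(parse_rm_out(handle))
    new = mask_all_but_first({r.id: str(r.seq) for r in records}, hits, soft)
    for r in records:
        r.seq = Seq(new[r.id])
    SeqIO.write(records, fasta_out, "fasta")
```

Usage would look like `main("genome.fa", "genome.fa.out", "genome.partial_mask.fa", soft=True)` (hypothetical file names). The caveat above still applies: reads that overlap a masked copy only partially will still align badly, so this mainly shows why the approach is awkward rather than recommending it.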

Are you looking for enrichment (maybe via ChIP-seq) or something like that in repeat regions?


Hi Ryan, thanks a lot for your answer! That's really useful to know. I am trying to align short (miRNA-sized) sequences to the genome while allowing mismatches, and I was wondering how much information I was losing by hard masking.
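One rough way to see how much sequence hard masking takes away is to count masked bases per record. This minimal sketch (the function `masking_summary` is hypothetical) reads FASTA lines and counts `N`/`n` as hard-masked and any other lowercase base as soft-masked. Keep in mind that assembly gaps are also `N`, so running the same count on the unmasked genome is needed to separate gaps from RepeatMasker masking, and that base counts alone don't tell you how many miRNA-sized hits you would lose.

```python
def masking_summary(lines):
    """Count total, hard-masked (N/n) and soft-masked (other lowercase)
    bases for each record in an iterable of FASTA lines."""
    counts = {}
    seq_id = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Record ID is the first word of the header line.
            seq_id = line[1:].split()[0]
            counts[seq_id] = {"total": 0, "hard": 0, "soft": 0}
            continue
        if seq_id is None:
            continue  # ignore any sequence text before the first header
        c = counts[seq_id]
        c["total"] += len(line)
        c["hard"] += line.count("N") + line.count("n")
        c["soft"] += sum(1 for b in line if b.islower() and b != "n")
    return counts
```

For example, `with open("genome.masked.fa") as fh: summary = masking_summary(fh)` (hypothetical file name) gives per-chromosome counts; `summary[k]["hard"] / summary[k]["total"]` is then the hard-masked fraction of record `k`.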

