How to mask all repeats and low complexity regions using RepeatMasker?
3.2 years ago
zwz110 • 0

I have a genome sequence in FASTA format, and I want to produce a soft-masked version of it.

After searching on Google, I found that I should do the following: all repeats and low-complexity regions should be replaced with the lower-case versions of their nucleotide bases. I have installed RepeatMasker on Linux, but I'm new to it. The RepeatMasker manual says "Default settings are for masking all type of repeats in a primate sequence.", but I'm not sure that suits my genome.

I'm quite confused and don't know what I should do. Can anyone tell me how to do it? Thank you!

3.2 years ago
2nelly ▴ 310

Hi zwz110,

You can directly download any masked genome from UCSC or NCBI golden path.

Masked regions are represented in lower case.
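
A quick way to confirm that a downloaded file really is soft-masked is to count its lower-case bases. The snippet below is only a minimal sketch: the function name `soft_masked_fraction` and the toy record are made up for illustration, and for a real genome you would read the file line by line rather than pass it in as one string.

```python
def soft_masked_fraction(fasta_text):
    """Return the fraction of sequence characters that are lower case."""
    total = 0
    masked = 0
    for line in fasta_text.splitlines():
        if line.startswith(">") or not line.strip():
            continue  # skip FASTA headers and blank lines
        seq = line.strip()
        total += len(seq)
        masked += sum(1 for base in seq if base.islower())
    return masked / total if total else 0.0


# Toy record (hypothetical): 8 of the 20 bases are lower case.
example = ">chr_demo\nACGTacgtACGT\nttttGGGG\n"
print(soft_masked_fraction(example))
```

A fraction of zero suggests the file is either unmasked or hard-masked with N instead.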

For instance, the masked human chromosome 1 of the GRCh38 assembly is here:

Then see this post: Can I Convert Fasta Lowercase Bases To 'N'?
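
For reference, converting a soft-masked sequence to a hard-masked one can be sketched in a few lines of Python. This is not taken from the linked post; `hard_mask` is a hypothetical helper name, and the sketch assumes that every lower-case character on a sequence line marks a masked base.

```python
import re


def hard_mask(fasta_text, mask_char="N"):
    """Replace every lower-case base with mask_char, leaving headers untouched."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            out.append(line)  # header lines are copied unchanged
        else:
            out.append(re.sub(r"[a-z]", mask_char, line))
    return "\n".join(out)


print(hard_mask(">chr_demo\nACGTacgtACGT"))
```

Only the sequence lines are rewritten, so lower-case letters in the record names are preserved.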

Thank you! I got it. I would also like to know more details, for example how they do the soft-masking with RepeatMasker and which parameters they use. That is to say, I want to learn what happens when the sequence Oryza_sativa.IRGSP-1.0.dna.toplevel.fa.gz becomes the sequence Oryza_sativa.IRGSP-1.0.dna_rm.toplevel.fa.gz. If you know, could you tell me?
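
The exact parameters used to produce that file are not stated in this thread, so the following is only a sketch of how a soft-masking run could be launched from Python. It relies on RepeatMasker's `-xsmall` option (repeats are returned in lower case instead of being replaced by N), `-species` to pick a repeat library other than the primate default, and `-pa` for parallel jobs. The species value `rice`, the thread count, and the file name `genome.fa` are assumptions for illustration. As far as I know, in Ensembl file naming `dna_rm` denotes the hard-masked (N) version and `dna_sm` the soft-masked one, which would correspond to running without and with `-xsmall` respectively.

```python
import subprocess


def repeatmasker_command(fasta_path, species="rice", threads=4):
    """Build a RepeatMasker command line for soft-masking (assumed settings)."""
    return [
        "RepeatMasker",
        "-xsmall",            # lower-case the repeats instead of writing N
        "-species", species,  # assumed species name; adjust to your genome
        "-pa", str(threads),  # number of parallel jobs (assumed value)
        fasta_path,
    ]


cmd = repeatmasker_command("genome.fa")  # hypothetical, uncompressed input file
print(" ".join(cmd))

# Uncomment to actually run it; requires RepeatMasker on the PATH.
# subprocess.run(cmd, check=True)
```

RepeatMasker writes the masked sequence next to the input with a `.masked` suffix, so here the result would be `genome.fa.masked`. Low-complexity regions are masked by default unless `-nolow` is given.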


