How to mask all repeats and low complexity regions using RepeatMasker?
3.2 years ago
zwz110 • 0

I have a genome sequence in FASTA format and want to produce a soft-masked version of the genomic DNA.

After searching Google, I found that I should do the following: all repeats and low-complexity regions should be replaced with lower-case versions of their nucleotide bases. I have installed RepeatMasker on Linux, but I'm new to it. The RepeatMasker manual says "Default settings are for masking all type of repeats in a primate sequence.", and I'm not sure those defaults suit my genome.
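To make the goal concrete, here is a minimal Python sketch of what soft-masking means. It is not how RepeatMasker works internally; the function name `soft_mask` and the 0-based, half-open interval convention are assumptions made only for illustration.

```python
def soft_mask(seq, intervals):
    """Lower-case the bases inside each (start, end) interval.

    Intervals are assumed to be 0-based and half-open, i.e. the base at
    `end` is NOT masked. Bases outside the intervals are left unchanged.
    """
    chars = list(seq)
    for start, end in intervals:
        # Replace the slice with its lower-case version (same length).
        chars[start:end] = seq[start:end].lower()
    return "".join(chars)


# Hypothetical example: pretend positions 2..4 were annotated as a repeat.
print(soft_mask("ACGTACGTAC", [(2, 5)]))
```

The sequence length and the bases themselves do not change; only the case does, which is why downstream tools can still read a soft-masked genome as normal sequence.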

I'm confused and don't know what to do. Can anyone tell me how to do it? Thank you!

3.2 years ago
2nelly ▴ 310

Hi zwz110,

You can directly download an already-masked genome from UCSC or the NCBI golden path.

Masked regions are represented in lower case.

For instance, the masked human chromosome 1 of the GRCh38 assembly is here:

Then see this post: Can I Convert Fasta Lowercase Bases To 'N'?
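If you do need the lower-case bases turned into 'N' (hard-masking), a minimal Python sketch could look like the following. The function names `hard_mask` and `hard_mask_fasta` are made up for this example, and it assumes a plain (uncompressed) FASTA where header lines start with `>`.

```python
def hard_mask(seq):
    """Replace every lower-case (soft-masked) base with 'N'."""
    return "".join("N" if c.islower() else c for c in seq)


def hard_mask_fasta(lines):
    """Yield FASTA lines with sequence lines hard-masked.

    Header lines (starting with '>') are passed through unchanged so that
    lower-case letters in sequence names are not touched.
    """
    for line in lines:
        line = line.rstrip("\n")
        yield line if line.startswith(">") else hard_mask(line)


# Small in-memory example; with a real file you would iterate over
# open("genome.fa") instead of this list.
for out in hard_mask_fasta([">chr1 example", "ACGTacgtACGT"]):
    print(out)
```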


Thank you! I got it. I would also like to know more details, for example how they do the soft-masking with RepeatMasker and which parameters they use. That is, I want to learn what happens when the sequence Oryza_sativa.IRGSP-1.0.dna.toplevel.fa.gz becomes the sequence Oryza_sativa.IRGSP-1.0.dna_rm.toplevel.fa.gz. If you know, can you tell me?
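The exact parameters used to produce those files are not stated in this thread, so the following is only a sketch of how a soft-masking run could be set up. It relies on RepeatMasker's `-xsmall` option (report repeats in lower case instead of N), `-species`, `-pa` (parallel jobs) and `-dir` (output directory). The species value `rice`, the thread count, the output directory name and the helper `build_repeatmasker_cmd` are assumptions for illustration, and the input is assumed to be already decompressed. Note also that in Ensembl's file naming, `dna_rm` files are hard-masked (repeats replaced by N), while `dna_sm` files are the soft-masked ones.

```python
import shlex


def build_repeatmasker_cmd(fasta, species, threads=4, out_dir="rm_out"):
    """Build an argument list for a soft-masking RepeatMasker run.

    -xsmall  : write repeats in lower case (soft-masking) instead of N
    -species : repeat library to use (value here is an assumption)
    -pa      : number of parallel jobs
    -dir     : directory for the output files
    """
    return [
        "RepeatMasker",
        "-xsmall",
        "-species", species,
        "-pa", str(threads),
        "-dir", out_dir,
        fasta,
    ]


cmd = build_repeatmasker_cmd("Oryza_sativa.IRGSP-1.0.dna.toplevel.fa", species="rice")
# Print the command for inspection. To actually run it, one could use
# subprocess.run(cmd, check=True), assuming RepeatMasker is on the PATH.
print(shlex.join(cmd))
```

With the default settings low-complexity regions and simple repeats are masked as well; adding `-nolow` would skip them, which is not what the original question asks for.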

