Hello,
I have a nucleotide sequence (FASTA format) of at most 20 kb. I also have my own genome sequence file (a repeat database, also in FASTA format, 200 GB) on my local machine. I want to identify the repetitive elements in the genome sequence.
Questions:
1) Which software is best? I have heard about RepeatMasker.
2) If I use RepeatMasker, what format does the repeat library need to be in? In other words, do I have to convert the FASTA file to some other format?
3) What are low-complexity DNA sequences and interspersed repeats? (Off topic, of course; you don't have to answer this one.)
Thanks.
Please read the RepeatMasker help pages and see whether they answer your question: http://www.repeatmasker.org/webrepeatmaskerhelp.html
Partly. I still don't know what format the reference repeat database should be in. Is a huge FASTA file okay?
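As far as I know, RepeatMasker's custom-library option (`-lib`) takes a plain FASTA file, so no conversion should be needed. My understanding is that headers of the form `>name#class/subclass` are what RepeatMasker uses to classify hits, and records without that tag come out unclassified; please check this against the help pages. Below is a minimal Python sketch, not part of RepeatMasker, that streams through a library file line by line, so even a very large file never has to fit in memory. It counts how many headers follow that assumed convention. The names `iter_fasta`, `check_library` and `HEADER_RE` are hypothetical. The sketch says nothing about whether RepeatMasker itself runs acceptably with a 200 GB library.

```python
import io
import re
from typing import Iterator, TextIO, Tuple

# ASSUMPTION: RepeatMasker custom-library headers look like >name#class or >name#class/subclass
HEADER_RE = re.compile(r"^>(\S+)#(\S+)")


def iter_fasta(handle: TextIO) -> Iterator[Tuple[str, int]]:
    """Yield (header, sequence_length) pairs, reading one line at a time."""
    header = None
    length = 0
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                yield header, length
            header, length = line, 0
        elif header is not None:
            # Sequence lines may be wrapped; sum their lengths for the current record.
            length += len(line.strip())
    if header is not None:
        yield header, length


def check_library(handle: TextIO) -> dict:
    """Count records, headers matching the assumed #class tag, and empty records."""
    stats = {"records": 0, "classified": 0, "unclassified": 0, "empty": 0}
    for header, length in iter_fasta(handle):
        stats["records"] += 1
        if HEADER_RE.match(header):
            stats["classified"] += 1
        else:
            stats["unclassified"] += 1
        if length == 0:
            stats["empty"] += 1
    return stats


if __name__ == "__main__":
    # Small in-memory example; for a real file use: with open(path) as fh: check_library(fh)
    demo = io.StringIO(">L1HS#LINE/L1\nACGTACGT\nACGT\n>myRepeat1\nTTTT\n>emptyRec#DNA\n")
    print(check_library(demo))
```

On the small example this prints `{'records': 3, 'classified': 2, 'unclassified': 1, 'empty': 1}`. Reading line by line keeps memory use roughly constant regardless of file size, which is the main concern with a library this large.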