Workflow for annotating repeat elements

6.7 years ago

amy.bashir ▴ 110

Hello everyone!

I am annotating repeat elements in a new genome. From what I have read online and in papers, the workflow is 1) DustMasker, 2) TRF (Tandem Repeats Finder), 3) RepeatModeler and 4) RepeatMasker.

I have just finished masking the low-complexity regions with DustMasker. Should I use the hard-masked file as input for TRF, or the original genome sequence? Is it ever a good idea to use a masked sequence as input for another repeat-masking program?
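To make the hard-masking question concrete, here is a minimal sketch of what hard-masking does to a sequence. It assumes a soft-masked FASTA in which masked bases are lowercase (the style RepeatMasker produces with -xsmall) and converts it to a hard-masked one, where every masked base becomes N. The helper name hardmask_fasta is hypothetical. Any program that later reads the hard-masked file sees only N at those positions.

```shell
#!/usr/bin/env bash
# hardmask_fasta (hypothetical helper): read a soft-masked FASTA on stdin,
# where masked bases are lowercase, and write a hard-masked copy to stdout,
# replacing every lowercase base with N. Header lines are left untouched.
hardmask_fasta() {
  LC_ALL=C awk '
    /^>/ { print; next }           # header line: pass through unchanged
    { gsub(/[a-z]/, "N"); print }  # sequence line: lowercase (masked) -> N
  '
}

# Example (hypothetical file names):
#   hardmask_fasta < genome.softmasked.fa > genome.hardmasked.fa
```

The conversion only goes one way. Once bases are replaced by N, the original sequence cannot be recovered from that file, whereas a soft-masked file still carries the full sequence.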

Thank you very much!

Tags: Repeat elements

RepeatModeler already uses Tandem Repeats Finder (TRF). Why are you running it before RepeatModeler? Also, I wouldn't mask my data: in masked data the repetitive regions are hidden.

Reply by ggman ▴ 80, 6.7 years ago

I saw that RepeatModeler uses TRF, but I got 0% for "simple repeats" and "low complexity", so I wondered whether running TRF separately would give a different result.
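If TRF is run on its own, its .dat output has to be turned into intervals before it can be compared with RepeatMasker's annotation. Below is a minimal sketch of that conversion to BED. It assumes the default .dat layout (not the -ngs format): a "Sequence: <name>" line before each sequence's hits, and hit lines whose first two columns are the 1-based inclusive start and end, with the consensus motif in column 14. The helper name trf_dat_to_bed and the file names are hypothetical. The trf parameters in the comment are a commonly used set; check them against the TRF documentation.

```shell
#!/usr/bin/env bash
# trf_dat_to_bed (hypothetical helper): convert a TRF .dat file (default
# format, i.e. not -ngs) on stdin to BED on stdout.
# Assumed .dat layout: a "Sequence: <name>" line precedes the hits for each
# sequence; each hit line has 15 columns, starting with its 1-based inclusive
# start and end, and column 14 is the consensus motif.
trf_dat_to_bed() {
  awk -v OFS='\t' '
    /^Sequence:/ { chrom = $2; next }   # remember the current sequence name
    chrom != "" && NF >= 14 && $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ {
      # BED is 0-based half-open: shift start by one, keep end as-is.
      print chrom, $1 - 1, $2, $14
    }
  '
}

# Example (-d writes the .dat file, -h suppresses the HTML output):
#   trf input_genome.fa 2 7 7 80 10 50 500 -d -h
#   trf_dat_to_bed < input_genome.fa.2.7.7.80.10.50.500.dat > trf.bed
```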

I am doing the repeat annotation to find out what percentage of the genome consists of repeat sequences, not to mask the sequence.
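When the goal is a percentage rather than a masked genome, the soft-masked FASTA from RepeatMasker's -xsmall option can be summarised directly. RepeatMasker's own .tbl summary is the figure usually reported, so treat this only as a cross-check. The minimal sketch below counts lowercase bases. It assumes masked bases are lowercase, and it excludes runs of N/n (assembly gaps) from both the count and the denominator. That exclusion is a choice made for this sketch. The helper name repeat_fraction is hypothetical.

```shell
#!/usr/bin/env bash
# repeat_fraction (hypothetical helper): estimate the repeat content of a
# soft-masked FASTA on stdin, where masked bases are lowercase.
# N/n bases (assembly gaps) are excluded from both the masked count and the
# denominator.
# Output: masked_bp <TAB> non_gap_bp <TAB> percent_masked
repeat_fraction() {
  LC_ALL=C awk '
    /^>/ { next }                         # skip header lines
    {
      s = $0
      total += length(s)
      gaps += gsub(/[Nn]/, "", s)         # count and remove gap bases first
      masked += gsub(/[a-z]/, "", s)      # remaining lowercase = masked bases
    }
    END {
      eff = total - gaps
      pct = (eff > 0) ? 100 * masked / eff : 0
      printf "%d\t%d\t%.2f\n", masked, eff, pct
    }
  '
}

# Example (hypothetical file name):
#   repeat_fraction < input_genome.fa.masked
```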

Also, most of the recent whole-genome papers I have come across use both RepeatModeler + RepeatMasker and TRF, so I was wondering whether they do different things.
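If both TRF and RepeatModeler + RepeatMasker are used, their calls can overlap. RepeatMasker also reports Simple_repeat and Low_complexity classes, so the two percentages cannot simply be added. One way to get a single figure is to take the union of the intervals. bedtools merge does this on sorted BED files, and the minimal sketch below does the same in awk. It assumes both annotations have already been converted to BED (chrom, start, end). The helper name union_bp and the file names are hypothetical.

```shell
#!/usr/bin/env bash
# union_bp (hypothetical helper): read BED intervals (chrom, start, end) on
# stdin, possibly pooled from several tools, merge overlapping or touching
# intervals per chromosome, and print the total number of bases covered.
union_bp() {
  LC_ALL=C sort -k1,1 -k2,2n | awk '
    NF < 3 { next }                          # ignore blank/short lines
    $1 != chrom {                            # new chromosome: close last run
      if (chrom != "") total += cur_end - cur_start
      chrom = $1; cur_start = $2 + 0; cur_end = $3 + 0; next
    }
    $2 + 0 <= cur_end {                      # overlaps or touches current run
      if ($3 + 0 > cur_end) cur_end = $3 + 0
      next
    }
    {                                        # gap: close run, start a new one
      total += cur_end - cur_start
      cur_start = $2 + 0; cur_end = $3 + 0
    }
    END {
      if (chrom != "") total += cur_end - cur_start
      print total + 0
    }
  '
}

# Example (hypothetical file names):
#   cat trf.bed repeatmasker.bed | union_bp
```

Dividing the result by the non-gap genome length gives a combined repeat percentage without double-counting regions that both tools annotated.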

Reply by amy.bashir ▴ 110, 6.7 years ago

I used the following workflow to annotate the repetitive elements of a new genome in my own work:

./BuildDatabase -name your_desired_name input_genome.fa

./RepeatModeler -engine ncbi -pa 15 -database your_desired_name

./RepeatMasker -pa 16 -gff -xsmall -lib /path/to/consensi.fa.classified -dir /path/to/output input_genome.fa
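Besides the GFF requested with -gff, RepeatMasker writes its usual .out table, and its .tbl file already summarises masked bases by class. To look at per-class totals directly, for example to check whether any Simple_repeat or Low_complexity hits were found at all, here is a minimal sketch. It assumes the standard .out layout: three header lines, then one hit per row with query begin/end (1-based, inclusive) in columns 6 and 7 and class/family in column 11. It does not merge overlapping hits. The helper name rm_out_class_bp is hypothetical.

```shell
#!/usr/bin/env bash
# rm_out_class_bp (hypothetical helper): sum annotated bases per top-level
# repeat class from a RepeatMasker .out file on stdin.
# Assumed .out layout: header lines, then one hit per row with the query
# begin/end (1-based, inclusive) in columns 6-7 and class/family in column 11.
# Overlapping hits are not merged, so totals can slightly exceed true coverage.
rm_out_class_bp() {
  awk '
    $1 ~ /^[0-9]+$/ && NF >= 15 {     # data rows start with a numeric SW score
      split($11, cls, "/")            # "LINE/L1" -> "LINE"
      bp[cls[1]] += $7 - $6 + 1
    }
    END { for (c in bp) printf "%s\t%d\n", c, bp[c] }
  ' | LC_ALL=C sort -k1,1
}

# Example (RepeatMasker names the table after the input file):
#   rm_out_class_bp < /path/to/output/input_genome.fa.out
```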

Reply by reza ▴ 300, written 6.7 years ago (updated by WouterDeCoster)