Question

Detect STRs in an Illumina library

0

7 months ago

kirillkirilenko ▴ 40

I have paired-end whole genome sequencing data, and I would like to try finding STRs (Short Tandem Repeats) in this data. What tools should I use? There is no assembled reference genome available for my species.

repeats annotation STR • 703 views

ADD COMMENT • link 6 months ago by kirillkirilenko ▴ 40

0

Have you checked these tools?

https://github.com/WGLab/repeathmm

https://github.com/WGLab/NanoRepeat

ADD REPLY • link 7 months ago by bk11 ★ 2.4k

0

Unfortunately, these tools only work with long reads. I have an Illumina short-read library.

ADD REPLY • link 7 months ago by kirillkirilenko ▴ 40

score 1 · Answer 1 · 2023-10-16

1

6 months ago

Brian Bushnell 20k

BBMask in the BBTools package can find short repeats, depending on the length you're interested in...

bbmask.sh in=reads.fq out=masked.fq maskrepeats minkr=1 maxkr=15 minlen=40 minrepeats=4 lowercase=t masklowentropy=f

That will mask (to lowercase) sequences with STRs with repeating subunits of length between 1 and 15. Then you can filter the reads with lowercase letters in them... I don't have a program for that though.
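The filtering step mentioned above could be sketched in a few lines of Python. This is only a minimal sketch, not part of BBTools: it assumes the masked output is plain (uncompressed) 4-line FASTQ, that the original bases were all uppercase so any lowercase base comes from masking, and the function names (`read_fastq`, `has_masked_bases`, `filter_masked`) are made up for illustration.

```python
import sys
from typing import Iterable, Iterator, Tuple

# One FASTQ record: (header, sequence, separator, quality)
Record = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield records from 4-line FASTQ input (assumes no wrapped sequence lines)."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        try:
            seq = next(it).rstrip("\n")
            plus = next(it).rstrip("\n")
            qual = next(it).rstrip("\n")
        except StopIteration:
            raise ValueError("truncated FASTQ record: " + header) from None
        yield header, seq, plus, qual


def has_masked_bases(seq: str) -> bool:
    """True if the sequence contains at least one lowercase (masked) base."""
    return any(base.islower() for base in seq)


def filter_masked(lines: Iterable[str]) -> Iterator[Record]:
    """Keep only the records whose sequence contains masked bases."""
    for record in read_fastq(lines):
        if has_masked_bases(record[1]):
            yield record


# Hypothetical command line: python filter_masked.py masked.fq str_reads.fq
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as fin, open(sys.argv[2], "w") as fout:
        for record in filter_masked(fin):
            fout.write("\n".join(record) + "\n")
```

Note that this treats each read on its own, so with paired-end data the two output files would no longer be in sync unless the mates are re-paired afterwards.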

ADD COMMENT • link 6 months ago by Brian Bushnell 20k

0

Thank you. I've since found that RepeatExplorer2 (https://www.nature.com/articles/s41596-020-0400-y) is the perfect tool for this purpose.

ADD REPLY • link 6 months ago by kirillkirilenko ▴ 40