I have paired-end whole genome sequencing data, and I would like to try finding STRs (Short Tandem Repeats) in this data. What tools should I use? There is no assembled reference genome available for my species.
BBMask in the BBTools package can find short repeats; whether it suits you depends on the repeat length you're interested in:
bbmask.sh in=reads.fq out=masked.fq maskrepeats minkr=1 maxkr=15 minlen=40 minrepeats=4 lowercase=t masklowentropy=f
That will mask (convert to lowercase) STRs whose repeating subunit is between 1 and 15 bases long. Then you can keep only the reads that contain lowercase letters... I don't have a program for that though.
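For the filtering step, a small script is enough. The following is only a minimal sketch, not part of BBTools: it assumes the masked FASTQ is uncompressed with exactly four lines per record (no wrapped sequences), and the function names `read_fastq`, `has_masked_bases` and `filter_masked` are made up for illustration.

```python
import sys


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) from a 4-line-per-record FASTQ.

    Assumption: sequences are not wrapped across several lines.
    """
    while True:
        header = handle.readline()
        if not header:  # end of file
            return
        seq = handle.readline()
        plus = handle.readline()
        qual = handle.readline()
        yield (header.rstrip("\n"), seq.rstrip("\n"),
               plus.rstrip("\n"), qual.rstrip("\n"))


def has_masked_bases(seq):
    """Return True if the sequence contains at least one lowercase base."""
    return any(base.islower() for base in seq)


def filter_masked(in_handle, out_handle):
    """Write only the reads with lowercase (masked) bases; return how many were kept."""
    kept = 0
    for header, seq, plus, qual in read_fastq(in_handle):
        if has_masked_bases(seq):
            out_handle.write(f"{header}\n{seq}\n{plus}\n{qual}\n")
            kept += 1
    return kept


if __name__ == "__main__":
    # Hypothetical usage: python filter_masked.py masked.fq str_reads.fq
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as fin, open(sys.argv[2], "w") as fout:
            n_kept = filter_masked(fin, fout)
        print(f"kept {n_kept} reads", file=sys.stderr)
```

Note that running this separately on the R1 and R2 files will break read pairing, because a repeat may be masked in only one mate; if you need the pairs intact, collect the read names that pass and pull both mates afterwards.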
Thank you. I've since found that RepeatExplorer2 (https://www.nature.com/articles/s41596-020-0400-y) is the perfect tool for this purpose.
Have you checked these tools?
https://github.com/WGLab/repeathmm
https://github.com/WGLab/NanoRepeat
Unfortunately, these tools only work with long reads. I have an Illumina short-read library.