I have paired-end whole genome sequencing data, and I would like to try finding STRs (Short Tandem Repeats) in this data. What tools should I use? There is no assembled reference genome available for my species.
BBMask in the BBTools package can find short repeats, depending on the length you're interested in...
bbmask.sh in=reads.fq out=masked.fq maskrepeats minkr=1 maxkr=15 minlen=40 minrepeats=4 lowercase=t masklowentropy=f
That will mask (convert to lowercase) sequences containing STRs whose repeating subunits are between 1 and 15 bases long. You can then filter for the reads that contain lowercase letters... I don't have a program for that, though.
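For the filtering step, a short script is probably enough. Here is a minimal sketch in Python, assuming the masked output is plain (uncompressed) FASTQ with exactly four lines per record and that lowercase bases only come from the masking step. The function names and the `min_masked` threshold are made up for illustration and are not part of BBTools.

```python
import sys
from typing import Iterable, Iterator, Tuple

# (header, sequence, plus line, quality); assumes 4-line FASTQ records
FastqRecord = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield FASTQ records from an iterable of lines (4 lines per record)."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        try:
            seq = next(it).rstrip("\n")
            plus = next(it).rstrip("\n")
            qual = next(it).rstrip("\n")
        except StopIteration:
            raise ValueError("truncated FASTQ record: " + header) from None
        yield header, seq, plus, qual


def masked_bases(seq: str) -> int:
    """Count lowercase (masked) bases in a sequence."""
    return sum(1 for base in seq if base.islower())


def filter_masked(records: Iterable[FastqRecord],
                  min_masked: int = 1) -> Iterator[FastqRecord]:
    """Keep only records with at least min_masked lowercase bases."""
    for rec in records:
        if masked_bases(rec[1]) >= min_masked:
            yield rec


# Hypothetical usage: python filter_masked.py masked.fq str_reads.fq
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as fin, open(sys.argv[2], "w") as fout:
        for rec in filter_masked(read_fastq(fin)):
            fout.write("\n".join(rec) + "\n")
```

Note that this looks at each read on its own, so if you run it on the two files of a paired-end library separately, the mates will no longer line up and you would need to re-pair them afterwards.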
Thank you. I've since found that RepeatExplorer2 (https://www.nature.com/articles/s41596-020-0400-y) is the perfect tool for this purpose.
Have you checked these tools?
Unfortunately, these tools only work with long reads. I have an Illumina short-read library.