Finding Minisatellite Repeat Motifs In DNA
13.7 years ago

Hi,

I have contigs representing genes of interest that have been 454 sequenced from BAC libraries. The BAC inserts are about 100 kb long and end up forming multiple contigs (about 50-100) after assembly. Among these contigs, some contain the sequence of the gene of interest, including the introns.

I am interested in finding minisatellite repeat motifs, from 10 to 60 bp. I have tried SSR finder before (online: SSR Finder) but apparently it is only for microsatellites (2 to 5 bp). My aim is not to mask them, but to find their position and sequence.

What software would be a good choice in your opinion?

Many thanks

repeats • 4.3k views
13.7 years ago

There are quite a few software tools out there; see the incomplete list at the bottom of this wiki page. I have been quite happy with TRF (Tandem Repeats Finder) in the past. You might want to write a script to post-process the result table and keep only the period range you want.
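A minimal sketch of such a post-processing script, in Python. It assumes TRF was run with the `-d` flag so that it writes a `.dat` file (for example `trf contigs.fa 2 7 7 80 10 50 500 -d -h`), and that each repeat record in that file has 15 whitespace-separated fields: start, end, period size, copy number, consensus size, percent matches, percent indels, score, A, C, G, T, entropy, consensus pattern, repeat sequence. The names `TandemRepeat` and `parse_trf_dat` are made up for this example; check the column order against your own TRF version before relying on it.

```python
from typing import Iterable, Iterator, NamedTuple


class TandemRepeat(NamedTuple):
    seq_id: str      # name taken from the preceding "Sequence:" line
    start: int       # 1-based start of the repeat in the sequence
    end: int         # 1-based end of the repeat
    period: int      # period size (length of the repeat unit)
    copies: float    # number of copies aligned with the consensus
    consensus: str   # consensus pattern of the repeat unit


def parse_trf_dat(lines: Iterable[str],
                  min_period: int = 10,
                  max_period: int = 60) -> Iterator[TandemRepeat]:
    """Yield repeats whose period lies in [min_period, max_period]."""
    seq_id = ""
    for line in lines:
        line = line.strip()
        if line.startswith("Sequence:"):
            # Keep only the first word after "Sequence:" as the identifier.
            parts = line.split(":", 1)[1].split()
            seq_id = parts[0] if parts else ""
            continue
        fields = line.split()
        # Skip banner, "Parameters:" and blank lines; records start with an integer.
        if len(fields) < 15 or not fields[0].isdigit():
            continue
        period = int(fields[2])
        if min_period <= period <= max_period:
            yield TandemRepeat(seq_id, int(fields[0]), int(fields[1]),
                               period, float(fields[3]), fields[13])


def report(path: str) -> None:
    """Print position and consensus of minisatellite-sized repeats in a .dat file."""
    with open(path) as handle:
        for rep in parse_trf_dat(handle):
            print(rep.seq_id, rep.start, rep.end, rep.period,
                  rep.copies, rep.consensus, sep="\t")
```

Calling `report("contigs.fa.2.7.7.80.10.50.500.dat")` (a hypothetical file name) would list, per contig, the position and consensus of every repeat with a 10 to 60 bp unit, which is what the question asks for.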

You can also run TRF on a big sequence file using Kent's trfBig tool. I'd have it create a BED file.

trfBig - Mask tandem repeats on a big sequence file.
usage:
   trfBig inFile outFile
This will repeatedly run trf to mask tandem repeats in infile
and put masked results in outFile.  inFile and outFile can be .fa
or .nib format. Outfile can be .bed as well

   -bed creates a bed file in current dir
   -bedAt=path.bed - create a bed file at explicit location
   -tempDir=dir Where to put temp files.
   -trf=trfExe explicitly specifies trf executable name
   -maxPeriod=N  Maximum period size of repeat (default 2000)
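If you go the BED route, the same range filter can be applied to the BED file. The sketch below is a guess at the layout rather than documented behaviour: it assumes the headerless BED file uses the same columns as UCSC's simpleRepeat table (chrom, start, end, name, period, copy number, consensus size, percent match, percent indel, score, A, C, G, T, entropy, sequence), so the period sits in the fifth column. The function name `filter_bed_by_period` and the `period_col` parameter are hypothetical; change `period_col` if your file differs.

```python
from typing import Iterable, Iterator, List


def filter_bed_by_period(lines: Iterable[str],
                         min_period: int = 10,
                         max_period: int = 60,
                         period_col: int = 4) -> Iterator[List[str]]:
    """Yield BED rows whose period column lies in [min_period, max_period].

    period_col is 0-based; 4 is an ASSUMPTION about the trfBig BED layout.
    """
    for line in lines:
        fields = line.split()
        if len(fields) <= period_col:
            continue  # blank or truncated line
        try:
            period = int(fields[period_col])
        except ValueError:
            continue  # not a data row (e.g. a track or comment line)
        if min_period <= period <= max_period:
            yield fields


def write_filtered(in_path: str, out_path: str) -> None:
    """Copy only the minisatellite-sized rows of in_path to out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for fields in filter_bed_by_period(src):
            dst.write("\t".join(fields) + "\n")
```

Passing `-maxPeriod=60` to trfBig would already cap the upper end, so the script mainly matters for dropping the short (microsatellite) periods.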

Hi @Haibao,

Can you tell me where to find the description of the BED output from trfBig? I am completely confused by it because the file has no header.


Hi @Haibao, Thanks for the TRF suggestion. I also tried mreps from the wiki page you point to and will investigate a few more as necessary. Many thanks!

13.7 years ago

I recommend vmatch / reputer. I've had success finding repeats in bacterial genomes, and it scales very well to larger (much larger) sequences.


Hi @Keith, Were you looking for minisats, as I am, or for microsats? Cheers


Minisatellites and larger. This was with the software in its reputer incarnation, which predates vmatch. I don't have any experience with vmatch's huge range of new parameters.
