Find Motifs in Genome
I am wondering if a tool already exists to accomplish the following task...
First, the motifs of interest:
Second, the pattern to find in the genome:
any motif, then a gap of 6-7 nt, then any motif
Finally, the expected output:
chr, start (of the first motif), gap (number of nt in between: 6 or 7)
For example:
chromosome: chr1
sequence: ATTCATGTGxxxxxxCATTTGCCG
output: chr1, 4, 6
I can write it myself but I would prefer not to reinvent the wheel. Thanks!
My wheel: seqkit (please update to v0.4.4 or later).
seqkit locate (see its usage) locates subsequences/motifs. Motifs can be EITHER plain sequences containing ACTGN OR regular expressions (the default), like A[TU]G(?:.{3})+?[TU](?:AG|AA|GA) for ORFs. Degenerate bases such as RYMM.. are also supported via the -d flag.
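To illustrate what searching with degenerate bases amounts to, here is a minimal Python sketch that expands IUPAC codes into regular-expression character classes. It is not seqkit's implementation; the function name degenerate_to_regex and the example motif CANNTG are hypothetical and chosen only for illustration.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def degenerate_to_regex(motif: str) -> str:
    """Expand each degenerate base into the character class it stands for.

    Hypothetical helper: a sketch of the idea, not seqkit's own code.
    """
    return "".join(IUPAC[base] for base in motif.upper())


# CANNTG is an illustrative degenerate motif, not one from the question.
pattern = degenerate_to_regex("CANNTG")
print(pattern)
print(bool(re.fullmatch(pattern, "CATGTG")))
```

The expanded pattern can then be used like any other regular expression, which is roughly the difference between the plain-sequence and degenerate-base modes described above.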
$ cat motifs.fa
>CATGTG
CATGTG
>CATTTG
CATTTG
>CACGTG
CACGTG
>motif4
CATTTG.{6,7}CACGTG
$ cat seqs.fa
>seq1
tactgCATGTGactangcgang
>seq2
cccCATTTGttttttCACGTGttt
>seq3
cccCATTTGttttCACGTGttt
$ seqkit locate -i -f motifs.fa seqs.fa | column -t
seqID  patternName  pattern             strand  start  end  matched
seq1   CATGTG       CATGTG              +       6      11   CATGTG
seq2   CATTTG       CATTTG              +       4      9    CATTTG
seq2   CACGTG       CACGTG              +       16     21   CACGTG
seq2   CACGTG       CACGTG              -       16     21   CACGTG
seq2   motif4       CATTTG.{6,7}CACGTG  +       4      21   CATTTGttttttCACGTG
seq3   CATTTG       CATTTG              +       4      9    CATTTG
seq3   CACGTG       CACGTG              +       14     19   CACGTG
seq3   CACGTG       CACGTG              -       14     19   CACGTG
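Sorry, you have to count the gaps yourself: the gap is the length of the matched sequence minus the lengths of the two motifs (for seq2, 18 - 12 = 6).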
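If you would rather get the gap directly, a small script can do the whole search. The following is a minimal Python sketch, assuming the three motifs from motifs.fa above and the 6-7 nt gap from the question. The function name find_pairs is hypothetical, it scans the forward strand only, and reading the genome FASTA is left out.

```python
# Motifs copied from motifs.fa above; allowed gaps come from the question.
MOTIFS = ("CATGTG", "CATTTG", "CACGTG")
GAPS = (6, 7)


def find_pairs(name, seq, motifs=MOTIFS, gaps=GAPS):
    """Return (name, 1-based start of first motif, gap) for every
    motif-gap-motif occurrence on the forward strand.

    Hypothetical helper: a sketch, not part of seqkit.
    """
    seq = seq.upper()  # case-insensitive, like `seqkit locate -i`
    hits = []
    for start in range(len(seq)):
        for first in motifs:
            if not seq.startswith(first, start):
                continue
            for gap in gaps:
                # Position where the second motif would have to begin.
                second_start = start + len(first) + gap
                if any(seq.startswith(m, second_start) for m in motifs):
                    hits.append((name, start + 1, gap))
    return hits


# The example from the question.
for hit in find_pairs("chr1", "ATTCATGTGxxxxxxCATTTGCCG"):
    print(*hit, sep=", ")
```

On the example from the question this prints chr1, 4, 6. To cover the minus strand as well, one option is to run the same function on the reverse complement of each sequence and convert the coordinates back.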
Can you check whether one of these tools does what you need?
fuzznuc
FIMO
HIMER