Find Motifs in Genome
7.2 years ago
fire_water ▴ 80

I am wondering if a tool already exists to accomplish the following task...

First, the motifs of interest:

  • CATGTG
  • CATTTG
  • CACGTG

Second, the pattern to find in the genome:

any motif, then a gap of 6-7 nt, then any motif

Finally, the expected output:

chr, start (1-based position of the first motif), gap (number of nt between the two motifs: 6 or 7)

For example:

  • chromosome: chr1
  • sequence: ATTCATGTGxxxxxxCATTTGCCG
  • output: chr1, 4, 6

I can write it myself but I would prefer not to reinvent the wheel. Thanks!

genome • 2.1k views

Can you check whether one of these tools does what you need?

fuzznuc
FIMO
HOMER

7.2 years ago

My wheel: seqkit (please update to v0.4.4 or later).

seqkit locate finds subsequences/motifs. A motif can be EITHER a plain sequence containing ACTGN OR a regular expression (the default), such as A[TU]G(?:.{3})+?[TU](?:AG|AA|GA) for ORFs. Degenerate bases such as R, Y and M are also supported with the -d flag.

$ cat motifs.fa 
>CATGTG
CATGTG
>CATTTG
CATTTG
>CACGTG
CACGTG
>motif4
CATTTG.{6,7}CACGTG

$ cat seqs.fa 
>seq1
tactgCATGTGactangcgang
>seq2
cccCATTTGttttttCACGTGttt
>seq3
cccCATTTGttttCACGTGttt

$ seqkit locate -i -f motifs.fa seqs.fa | column -t
seqID  patternName  pattern             strand  start  end  matched
seq1   CATGTG       CATGTG              +       6      11   CATGTG
seq2   CATTTG       CATTTG              +       4      9    CATTTG
seq2   CACGTG       CACGTG              +       16     21   CACGTG
seq2   CACGTG       CACGTG              -       16     21   CACGTG
seq2   motif4       CATTTG.{6,7}CACGTG  +       4      21   CATTTGttttttCACGTG
seq3   CATTTG       CATTTG              +       4      9    CATTTG
seq3   CACGTG       CACGTG              +       14     19   CACGTG
seq3   CACGTG       CACGTG              -       14     19   CACGTG

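Sorry, seqkit does not report the gap length, so you must work it out yourself from the start and end columns. Note also that motif4 only covers CATTTG followed by CACGTG; every other ordered pair of motifs needs its own pattern.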
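
If counting by hand is not practical, the gap can also be computed with a short script. The following is a minimal Python sketch, independent of seqkit, that reports every position where one of the three motifs is followed by another after exactly 6 or 7 nt. The function name find_motif_pairs and the constants are made up for illustration, and it only scans the forward strand, so treat it as a starting point rather than a finished tool.

```python
import re

# The three motifs from the question; all are 6 nt long.
MOTIFS = ("CATGTG", "CATTTG", "CACGTG")
MOTIF_LEN = 6
GAPS = (6, 7)  # allowed number of nt between the two motifs


def find_motif_pairs(name, seq):
    """Return (name, start, gap) for each motif-gap-motif hit.

    start is the 1-based position of the first motif, as in the
    question's example. Only the forward strand is scanned.
    """
    seq = seq.upper()
    # A lookahead makes the match zero-width, so overlapping motif
    # occurrences are all reported.
    pattern = "(?=(?:%s))" % "|".join(MOTIFS)
    starts = {m.start() for m in re.finditer(pattern, seq)}

    hits = []
    for s in sorted(starts):
        for gap in GAPS:
            # A second motif must begin right after the gap.
            if s + MOTIF_LEN + gap in starts:
                hits.append((name, s + 1, gap))
    return hits


if __name__ == "__main__":
    # The example from the question.
    for hit in find_motif_pairs("chr1", "ATTCATGTGxxxxxxCATTTGCCG"):
        print(*hit, sep=", ")
```

For a real genome, each chromosome sequence would have to be read from the FASTA file and passed to the function one at a time. Reverse-strand hits would need a second pass over the reverse complement.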
