Question

How to map motifs back to multiple sites in a reference sequence

0


7.3 years ago

chong.lua • 0

Hi all,

I am trying to align hundreds of transcription factor (TF) binding site sequences to my reference sequence. For example, in the sequence below, the two binding sites that one TF sequence maps to are shown in bold:

CTGGCGCGTGATCAACTGGCCAATCATGGCATCTGTCATTGTGAGTATAACCTCACACCCGTACTTCTAAACACACAGACCAGCCTCATACTGTATGCATTATGTCAGGCAGG GAGGGATTCTGCCAGCAAAGCAGACGAGGGGATGTGCTGAGTCTCACAGACACTTTCCTGGATAAGACATGAATGCAGGCATGTCAGGAAGAGCAAGCAAACACGCTGTCC

When I use the alignment function in SnapGene, the output shows only one site that the TF sequence maps to. However, when I search manually (Ctrl+F) for my TF sequence, two sites (bold above) match it 100%. SnapGene only aligns, or creates a feature for, the first matching site it comes across and ignores the second. Are there any tools or platforms (e.g. Python, R) that can map multiple short sequences to a reference sequence and annotate all possible alignment sites on it?

Edit: the ideal output would be a FASTA sequence with all possible alignments of the TF sequences mapped onto my reference DNA, e.g.:

Example sequence: ATTATGTCAGGCAGGGAGGGATTCTGCCAGCAAAGCAGACGAGGGGATGTGCTGAGTCTCACAGACACTTTCCTGGATAAGACATGAATGCAGGCATGTCAGGA

Warm regards, Joseph

alignment snapgene jaspar python R • 2.2k views


score 2 · Answer 1 · 2018-08-14

2


7.3 years ago

cpad0112 21k

Try seqkit locate. Please also add some example data (for the motifs) and the expected output format.
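Since the question also asks about Python, here is a minimal sketch of the same idea (reporting every exact match rather than only the first), using only the standard library. It is not how seqkit works internally. The function names `revcomp` and `locate_all` are hypothetical, and the motif `ATGTCAGG` is a made-up example, chosen only because it occurs twice in the example sequence from the question. The sketch assumes plain A/C/G/T motifs with no degenerate bases and reports 1-based, inclusive coordinates on the forward strand of the reference.

```python
import re

def revcomp(seq):
    """Reverse complement of a plain A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]

def locate_all(reference, motifs):
    """Find every exact occurrence of each motif on both strands.

    `motifs` maps a motif name to its sequence. Returns a list of
    (name, strand, start, end) tuples, 1-based and inclusive,
    sorted by start position.
    """
    ref = reference.upper()
    hits = []
    for name, motif in motifs.items():
        motif = motif.upper()
        targets = [("+", motif)]
        rc = revcomp(motif)
        if rc != motif:  # skip the minus strand for palindromic motifs
            targets.append(("-", rc))
        for strand, pattern in targets:
            # A zero-width lookahead lets finditer report overlapping matches.
            for m in re.finditer(f"(?={re.escape(pattern)})", ref):
                start = m.start() + 1
                hits.append((name, strand, start, start + len(pattern) - 1))
    return sorted(hits, key=lambda h: (h[2], h[0]))

# Example sequence taken from the question; the motif is an assumption.
reference = (
    "ATTATGTCAGGCAGGGAGGGATTCTGCCAGCAAAGCAGACGAGGGGATGTGCTGAGTCTC"
    "ACAGACACTTTCCTGGATAAGACATGAATGCAGGCATGTCAGGA"
)
hits = locate_all(reference, {"motif1": "ATGTCAGG"})
for name, strand, start, end in hits:
    print(name, strand, start, end, reference[start - 1:end], sep="\t")
```

Each reported interval can then be written out in whatever format is needed (BED, GFF, or features for an annotated sequence file). For hundreds of motifs, or motifs with degenerate bases, a dedicated tool is likely the better choice.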
