Hi all,
I am trying to align hundreds of transcription factor (TF) binding site sequences to my reference sequence. For example, in the sequence below, the two TF binding sites that one of my TF sequences maps to are shown in bold.
CTGGCGCGTGATCAACTGGCCAATCATGGCATCTGTCATTGTGAGTATAACCTCACACCCGTACTTCTAAACACACAGACCAGCCTCATACTGTATGCATTATGTCAGGCAGGGAGGGATTCTGCCAGCAAAGCAGACGAGGGGATGTGCTGAGTCTCACAGACACTTTCCTGGATAAGACATGAATGCAGGCATGTCAGGAAGAGCAAGCAAACACGCTGTCC
When I use the alignment function in SnapGene, the output shows only one site that the TF sequence maps to. However, when I manually search for my TF sequence with Ctrl+F, there are two sites (bold above) that match it 100%. SnapGene only aligns, or creates a feature for, the first matching site it comes across and ignores the second. Are there any tools or platforms (e.g. Python, R) that can map multiple short sequences to a reference sequence and annotate all possible alignment sites on the reference?
Edit: the ideal output would be a FASTA sequence with all possible alignments of the TF sequences mapped onto my reference DNA, e.g.:
Example sequence: ATTATGTCAGGCAGGGAGGGATTCTGCCAGCAAAGCAGACGAGGGGATGTGCTGAGTCTCACAGACACTTTCCTGGATAAGACATGAATGCAGGCATGTCAGGA
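To make the request concrete, here is a rough sketch in plain Python (standard library only) of the behaviour I am after. It assumes exact, 100% matches only (no mismatches or gaps), searches both strands, and reports overlapping sites too. The names `Hit`, `find_sites` and `annotate`, the toy reference, and the motif `ATGTCAGG` are all hypothetical placeholders for illustration, and marking sites in uppercase is just one possible way to show them in a sequence.

```python
from typing import Dict, List, NamedTuple


class Hit(NamedTuple):
    name: str    # label of the TF sequence (hypothetical naming)
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: str  # "+" = as given, "-" = reverse complement


_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def _find_all(reference: str, query: str) -> List[int]:
    """Every start index of query in reference, overlapping matches included."""
    starts = []
    pos = reference.find(query)
    while pos != -1:
        starts.append(pos)
        # Advance by one base only, so overlapping sites are not skipped.
        pos = reference.find(query, pos + 1)
    return starts


def find_sites(reference: str, motifs: Dict[str, str]) -> List[Hit]:
    """Exact-match every motif against both strands of the reference."""
    ref = reference.upper()
    hits = []
    for name, motif in motifs.items():
        fwd = motif.upper()
        if not fwd:
            continue  # an empty query would match everywhere
        for start in _find_all(ref, fwd):
            hits.append(Hit(name, start, start + len(fwd), "+"))
        rev = reverse_complement(fwd)
        if rev != fwd:  # palindromic sites are already covered by "+"
            for start in _find_all(ref, rev):
                hits.append(Hit(name, start, start + len(rev), "-"))
    return sorted(hits, key=lambda h: (h.start, h.name))


def annotate(reference: str, hits: List[Hit]) -> str:
    """Lowercase the reference and uppercase every matched site."""
    chars = list(reference.lower())
    for hit in hits:
        for i in range(hit.start, hit.end):
            chars[i] = chars[i].upper()
    return "".join(chars)


# Toy data, not my real sequences.
toy_reference = "AAATGTCAGGCCCATGTCAGGTT"
toy_motifs = {"tf1": "ATGTCAGG"}
toy_hits = find_sites(toy_reference, toy_motifs)
print(">toy_reference_annotated")
print(annotate(toy_reference, toy_hits))  # aaATGTCAGGcccATGTCAGGtt
```

For hundreds of TF sequences the same loop should still work, since each search is a simple substring scan. Something that tolerates mismatches would need a different approach than this exact-match sketch.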
Warm regards, Joseph