How to map motifs back to multiple sites in a reference sequence
7.1 years ago
chong.lua • 0

Hi all,

I am trying to align hundreds of transcription factor (TF) binding site sequences to my reference sequence. For example, the sequence below contains two sites (marked in bold) that one TF binding site sequence maps to:

CTGGCGCGTGATCAACTGGCCAATCATGGCATCTGTCATTGTGAGTATAACCTCACACCCGTACTTCTAAACACACAGACCAGCCTCATACTGTATGCATTATGTCAGGCAGG GAGGGATTCTGCCAGCAAAGCAGACGAGGGGATGTGCTGAGTCTCACAGACACTTTCCTGGATAAGACATGAATGCAGGCATGTCAGGAAGAGCAAGCAAACACGCTGTCC

When I use the alignment function in SnapGene, the output shows only one site where the TF sequence maps. However, when I search manually with Ctrl+F, I find two sites (bold above) that match my TF sequence 100%. SnapGene only aligns, or creates a feature for, the first matching site it comes across and ignores the second. Are there any tools or platforms (e.g. Python, R) that can map multiple short sequences to a reference sequence and annotate all possible alignment sites on the reference?

Edit: the ideal output would be a FASTA sequence with all possible alignments of the TF sequences mapped onto my reference DNA, e.g.:

Example sequence: ATTATGTCAGGCAGGGAGGGATTCTGCCAGCAAAGCAGACGAGGGGATGTGCTGAGTCTCACAGACACTTTCCTGGATAAGACATGAATGCAGGCATGTCAGGA

Warm regards, Joseph

alignment snapgene jaspar python R • 2.1k views
7.1 years ago

Try seqkit locate. Please also add some example data (for the motifs) and the expected output format.
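If you would rather stay in Python, here is a minimal sketch of the same idea: report every exact match of each motif on the reference, not just the first. It is not how seqkit or SnapGene work internally, only an illustration. The function names (reverse_complement, find_all_sites), the motif name example_motif and the motif sequence ATGTCAGG are placeholders I made up for the demo; the reference is the example sequence from the question. It assumes exact matching of plain A/C/G/T motifs (no IUPAC codes, no mismatches).

```python
import re

# Translation table for complementing DNA (plain A/C/G/T only; an assumption).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_all_sites(reference, motifs):
    """Find every exact occurrence of each motif, overlaps included.

    reference: DNA string; motifs: dict mapping a name to a motif sequence.
    Returns a list of (name, strand, start, end) tuples with 1-based,
    inclusive coordinates on the forward strand of the reference.
    """
    ref = reference.upper()
    hits = []
    for name, motif in motifs.items():
        motif = motif.upper()
        patterns = {"+": motif}
        rc = reverse_complement(motif)
        if rc != motif:  # skip the minus strand for palindromic motifs
            patterns["-"] = rc
        for strand, pattern in patterns.items():
            # A zero-width lookahead lets finditer report overlapping matches.
            for m in re.finditer("(?=" + re.escape(pattern) + ")", ref):
                start = m.start() + 1
                hits.append((name, strand, start, start + len(pattern) - 1))
    return sorted(hits, key=lambda h: (h[2], h[0]))


# Example sequence from the question.
reference = (
    "ATTATGTCAGGCAGGGAGGGATTCTGCCAGCAAAGCAGACGAGGGGATGTGCTGAGTCTCACAGACACTTTCC"
    "TGGATAAGACATGAATGCAGGCATGTCAGGA"
)
# Hypothetical motif, chosen only because it occurs twice in the example.
motifs = {"example_motif": "ATGTCAGG"}

hits = find_all_sites(reference, motifs)
for name, strand, start, end in hits:
    print(name, strand, start, end, sep="\t")
```

On this example it reports two forward-strand sites, at positions 4-11 and 96-103. The tab-separated rows are easy to turn into BED/GFF features or to import as annotations; for hundreds of motifs, load them into the motifs dict from your FASTA file.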
