Question: Question about mapping motif binding sites to genome location
Asked 3.3 years ago by kanwarjag (United States):

I have a naïve but complex question. I used RSAT to retrieve the 2000 bp sequence upstream of the TSS for 5 genes. I then ran FIMO on this FASTA file with a binding motif (identified from my experiment) to see where the motif's binding sites are. I know that the protein of interest binds very close to the TSS. I get the following results in the FIMO output:

# Columns 11-14 (up_*) describe the 2000 bp upstream-of-TSS sequence used as FIMO input; motif_alt_id is empty.
# motif_id  motif_alt_id  sequence_name                                 start  stop  strand  score    p-value   q-value   matched_sequence               up_chr  up_start   up_end     up_strand
2                         D20_ENSG00000130164-LDLR-ENST00000557958      80     108   +       41.9286  1.14E-14  6.18E-10  CTCTGCCACCCAGGCTGGAGTGCAATGGC  chr19   11102268   11104267   D
2                         D102_ENSG00000161048-NAPEPLD-ENST00000425379  106    134   -       37.6735  4.50E-13  1.15E-08  CTCTGTCACCCAGGCTGGAATACAGTGGC  chr7    103128761  103130760  R
2                         D17_ENSG00000130164-LDLR-ENST00000558518      309    337   -       32.8367  1.19E-11  1.10E-07  CTCTGTCACCCAGGCTGGAGCGCAGTGAC  chr19   11130163   11132162   D

In the table above, column 3 has the gene/transcript name (names are trimmed from the defaults because FIMO does not expect long names), and columns 4-6 show the start, end and strand of the motif binding region, respectively. The last four columns are the chromosome, start, end and strand of the 2000 bp upstream FASTA sequence that was used as input to FIMO. The problem is that I could not figure out how to map the locations in columns 4 and 5 (start and end of the motif binding region) onto columns 12 and 13, which represent the original FASTA coordinates.

In row 1, the original FASTA sequence (column 14) and the motif match (column 6) are both on the forward strand. To locate columns 4 and 5 within columns 12 and 13, should I simply compute 11102268 + 80 and 11104267 - 108? That does not give me the matched sequence, and the resulting interval is also longer than 29 bp. Similarly, in row 2, where both the motif match and the FASTA sequence are on the reverse strand, how should I map the match to the coordinates of the FASTA file?
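
The coordinate arithmetic in question can be sketched in Python as follows. This is only a sketch under stated assumptions, not an answer confirmed in the thread: it assumes that the FIMO start/stop values are 1-based, inclusive offsets into the input sequence (108 - 80 + 1 = 29 matches the 29-base matched sequence), that the upstream interval is 1-based and inclusive (11104267 - 11102268 + 1 = 2000), and that for an "R" gene the FASTA is the reverse complement of the genomic interval, so FASTA position 1 corresponds to the interval end. The function name `motif_to_genome` is hypothetical.

```python
def motif_to_genome(hit_start, hit_stop, hit_strand, up_start, up_end, up_strand):
    """Map a FIMO hit to genomic coordinates (1-based, inclusive).

    Assumptions (not confirmed by the thread):
      - hit_start/hit_stop are 1-based inclusive offsets into the FASTA record
      - up_start/up_end are 1-based inclusive genomic coordinates
      - up_strand "D" means the FASTA is the forward genomic strand,
        "R" means it is the reverse complement of the interval
    """
    if up_strand == "D":
        # Both offsets are measured from the interval start.
        g_start = up_start + hit_start - 1
        g_end = up_start + hit_stop - 1
        g_strand = hit_strand
    elif up_strand == "R":
        # FASTA position 1 is the interval end, so offsets count backwards
        # and the hit strand flips relative to the genome.
        g_start = up_end - hit_stop + 1
        g_end = up_end - hit_start + 1
        g_strand = "-" if hit_strand == "+" else "+"
    else:
        raise ValueError(f"unexpected upstream strand: {up_strand!r}")
    return g_start, g_end, g_strand


# The three hits from the table: (start, stop, strand, chr, up_start, up_end, up_strand)
hits = [
    (80, 108, "+", "chr19", 11102268, 11104267, "D"),
    (106, 134, "-", "chr7", 103128761, 103130760, "R"),
    (309, 337, "-", "chr19", 11130163, 11132162, "D"),
]

for start, stop, strand, chrom, up_start, up_end, up_strand in hits:
    g_start, g_end, g_strand = motif_to_genome(
        start, stop, strand, up_start, up_end, up_strand
    )
    print(f"{chrom}:{g_start}-{g_end} ({g_strand}), length {g_end - g_start + 1}")
```

Under these assumptions, both offsets are applied to the same end of the interval, which is why the mapped region stays 29 bp long; subtracting the stop from the interval end while adding the start to the interval start mixes the two reference points. Row 1 would map to chr19:11102347-11102375. The result is worth checking by extracting that region from the genome and comparing it (or its reverse complement) with the matched sequence.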

genome • 1.0k views
modified 3.2 years ago by Biostar • written 3.3 years ago by kanwarjag

Did you try using BLAT?

P.S. Actually, did you try Ctrl+F in a sequence editor? Sequence editors (e.g. APE) search for a sequence on either strand. Once you know the pattern, it will be easy in the future.

modified 3.3 years ago • written 3.3 years ago by Satyajeet Khare

That output is similar to the FIMO output format as documented here, but not identical. How exactly did you produce it?

Did you use retrieve-seq to get your upstream sequences? What did the first line in that file look like?

written 3.2 years ago by Malcolm.Cook