I have a naïve but complex question. I used RSAT to retrieve the 2000 bp sequence upstream of the TSS for 5 genes. I then ran FIMO with this FASTA file and a binding motif (identified in my experiment) to see where the motif binds. I know that the protein of interest binds very close to the TSS. I get the following results in the FIMO output:
| motif_id | motif_alt_id | sequence_name | start | stop | strand | score | p-value | q-value | matched_sequence | Chr | start | End | Strand |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | | D20_ENSG00000130164-LDLR-ENST00000557958 | 80 | 108 | + | 41.9286 | 1.14E-14 | 6.18E-10 | CTCTGCCACCCAGGCTGGAGTGCAATGGC | chr19 | 11102268 | 11104267 | D |
| 2 | | D102_ENSG00000161048-NAPEPLD-ENST00000425379 | 106 | 134 | - | 37.6735 | 4.50E-13 | 1.15E-08 | CTCTGTCACCCAGGCTGGAATACAGTGGC | chr7 | 103128761 | 103130760 | R |
| 2 | | D17_ENSG00000130164-LDLR-ENST00000558518 | 309 | 337 | - | 32.8367 | 1.19E-11 | 1.10E-07 | CTCTGTCACCCAGGCTGGAGCGCAGTGAC | chr19 | 11130163 | 11132162 | D |
In the table above, column 3 holds the gene/transcript name (trimmed from the default names because FIMO does not accept long names), and columns 4-6 show the start, end, and strand of the motif match, respectively. The last four columns are the chromosome, start, end, and strand of the -2000 bp upstream FASTA sequence that was used as input to FIMO. The problem is that I cannot figure out how to map columns 4 and 5 (start and end of the motif match) onto columns 12 and 13, which represent the original FASTA coordinates.
In row 1, the original FASTA sequence (column 14) and the motif match (column 6) are both on the forward strand. To get the location of columns 4 and 5 within columns 12 and 13, should I simply compute 11102268 + 80 and 11104267 - 108? That does not give me the matched sequence, and the resulting interval is also longer than 29 bp. Similarly, in row 2, where both the motif match and the FASTA sequence are on the reverse strand, how should I map the match to coordinates in the FASTA file?
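To make the arithmetic concrete, here is a minimal Python sketch of the conversion I think is needed. It rests on assumptions I have not confirmed: that the FIMO start/stop values are 1-based positions within the input FASTA sequence, that the Chr start/End values are 1-based inclusive genomic coordinates (11104267 - 11102268 + 1 = 2000, which is at least consistent with this), and that strand R means the FASTA sequence is the reverse complement of that genomic interval. The function name `fimo_hit_to_genome` is my own and is not part of RSAT or FIMO.

```python
def fimo_hit_to_genome(region_start, region_end, region_strand,
                       hit_start, hit_stop, hit_strand):
    """Map a FIMO hit (positions within the FASTA) to genomic coordinates.

    Assumptions (not confirmed by the tools' output):
    - hit_start/hit_stop are 1-based positions within the FASTA sequence
    - region_start/region_end are 1-based inclusive genomic coordinates
    - region_strand is "D" (direct) or "R" (reverse complement of the genome)
    """
    if region_strand == "D":
        # FASTA position 1 corresponds to region_start on the genome.
        g_start = region_start + hit_start - 1
        g_end = region_start + hit_stop - 1
        g_strand = hit_strand
    elif region_strand == "R":
        # FASTA position 1 corresponds to region_end, so offsets count backwards
        # and the strand of the hit is flipped relative to the genome.
        g_start = region_end - hit_stop + 1
        g_end = region_end - hit_start + 1
        g_strand = "+" if hit_strand == "-" else "-"
    else:
        raise ValueError(f"unexpected region strand: {region_strand!r}")
    return g_start, g_end, g_strand


# The three rows of the table: (chrom, region start, region end, region strand,
# FIMO start, FIMO stop, FIMO strand).
rows = [
    ("chr19", 11102268, 11104267, "D", 80, 108, "+"),
    ("chr7", 103128761, 103130760, "R", 106, 134, "-"),
    ("chr19", 11130163, 11132162, "D", 309, 337, "-"),
]

for chrom, r_start, r_end, r_strand, h_start, h_stop, h_strand in rows:
    g_start, g_end, g_strand = fimo_hit_to_genome(
        r_start, r_end, r_strand, h_start, h_stop, h_strand
    )
    print(chrom, g_start, g_end, g_strand, g_end - g_start + 1)
```

Under these assumptions, row 1 would map to chr19:11102347-11102375, which is 29 bp long. Both FIMO offsets are added to the region start, whereas 11102268 + 80 and 11104267 - 108 measure from opposite ends of the region, which may be why that interval came out longer than 29 bp.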