I have about 100 sequences around different transcription start sites, each containing a SNP, and I would like to know whether the SNP affects a transcription factor binding site. These SNPs are already known to influence the expression of these genes, so it would be interesting to show that they are the actual causal variants.
My first guess was to put both alleles into LASAGNA 2.0 (http://biogrid-lasagna.engr.uconn.edu/lasagna_search/) and use the JASPAR CORE matrices ("all vertebrates").
>seq_reference
CCATCTTGCGTCGCTCTTGCTTGAAGGCCG
>seq_alternative = higher expression
CCATCTTGCGTCGCTGTTGCTTGAAGGCCG
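To make it explicit where the two alleles differ, here is a minimal sketch that compares them position by position. The function name `allele_differences` is just illustrative; the coordinates are 0-based so that they match the positions LASAGNA reports.

```python
# The two alleles exactly as given in the FASTA records above.
ref = "CCATCTTGCGTCGCTCTTGCTTGAAGGCCG"
alt = "CCATCTTGCGTCGCTGTTGCTTGAAGGCCG"


def allele_differences(a, b):
    """Return (0-based position, base in a, base in b) for every mismatch."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length (SNPs only, no indels)")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


diffs = allele_differences(ref, alt)
print(diffs)  # [(15, 'C', 'G')]
```

So the SNP is a C>G change at 0-based position 15, and only predicted sites that span this position can differ between the two alleles.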
seq_reference

| Name | Sequence | Position (0-based) | Strand | Score | p-value | E-value |
|---|---|---|---|---|---|---|
| TFAP2A (MA0003.1) | GCCTTCAAG | 19 | - | 7.54 | 0.00085 | 0.0187 |
| PBX1 (MA0070.1) | CCTTCAAGCAAG | 15 | - | 7.58 | 0.00065 | 0.0123 |
| Pax6 (MA0069.1) | TTCAAGCAAGAGCG | 11 | - | 10.85 | 5.0E-5 | 0.00085 |

seq_alternative

| Name | Sequence | Position (0-based) | Strand | Score | p-value | E-value |
|---|---|---|---|---|---|---|
| BRCA1 (MA0133.1) | GCAACAG | 13 | - | 6 | 0.001 | 0.0240 |
| TFAP2A (MA0003.1) | GCCTTCAAG | 19 | - | 7.54 | 0.00085 | 0.0187 |
| Pax6 (MA0069.1) | TTCAAGCAACAGCG | 11 | - | 9.78 | 0.0002 | 0.0034 |
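One way to read the two result tables side by side is to compare them per transcription factor and check which hits actually span the SNP. The following is only a sketch: the hit values are copied by hand from the tables, `compare_hits` and `overlaps_snp` are illustrative helper names, and it assumes the reported position is the leftmost forward-strand coordinate of the site (which is consistent with the reverse complements of the sequences shown). A hit that is "lost" here only means it no longer passes LASAGNA's reporting threshold, not that binding is proven to be absent.

```python
# Hits copied from the two LASAGNA tables: name -> (start, site length, score).
ref_hits = {
    "TFAP2A": (19, 9, 7.54),
    "PBX1": (15, 12, 7.58),
    "Pax6": (11, 14, 10.85),
}
alt_hits = {
    "BRCA1": (13, 7, 6.0),
    "TFAP2A": (19, 9, 7.54),
    "Pax6": (11, 14, 9.78),
}
SNP_POS = 15  # 0-based position where the two alleles differ (C>G)


def overlaps_snp(start, length, snp=SNP_POS):
    """True if the site [start, start + length) covers the SNP position."""
    return start <= snp < start + length


def compare_hits(ref, alt):
    """Return (name, status, score change, overlaps SNP) for every factor."""
    rows = []
    for name in sorted(set(ref) | set(alt)):
        r, a = ref.get(name), alt.get(name)
        start, length, _ = r or a
        if r and a:
            delta = round(a[2] - r[2], 2)
            status = "unchanged" if delta == 0 else "score changed"
        elif r:
            delta, status = None, "lost in alternative"
        else:
            delta, status = None, "gained in alternative"
        rows.append((name, status, delta, overlaps_snp(start, length)))
    return rows


summary = compare_hits(ref_hits, alt_hits)
for row in summary:
    print(row)
```

Read this way, TFAP2A is identical in both alleles because its site does not cover the SNP, while PBX1 is reported only for the reference allele, BRCA1 only for the alternative allele, and the Pax6 score drops from 10.85 to 9.78.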
But I don't see much difference, and I don't really understand how to interpret this. Does anyone know if this is the right way to do it, or have other ideas? Or is the approach fine and this is just a bad example?