SNP in Transcription factor binding site analysis
6.6 years ago
Floris Brenk

Hi all,

I have about 100 sequences around different transcription start sites, each containing a SNP, and I would like to know whether the SNP affects a transcription factor binding site. These SNPs are already known to influence the expression of these genes, so it would be interesting to show that they are the actual causal variants.

My first idea was to submit both alleles to LASAGNA 2.0 (http://biogrid-lasagna.engr.uconn.edu/lasagna_search/) using the JASPAR CORE matrices for "all vertebrates".

>seq_reference
CCATCTTGCGTCGCTCTTGCTTGAAGGCCG
>seq_alternative = higher expression
CCATCTTGCGTCGCTGTTGCTTGAAGGCCG
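In general terms, a motif scanner scores every window of the sequence on both strands against a position weight matrix (PWM), and a SNP can only change a TF's score if it falls inside one of that TF's windows. Below is a minimal sketch of that comparison, assuming standard log2-odds PWM scoring (LASAGNA's own scoring and p-values may be computed differently). It locates the SNP, then reports the best window overlapping it in each allele. `TOY_COUNTS` is a hypothetical 4-position matrix invented for illustration, not a JASPAR matrix, and the pseudocount and uniform background are assumptions. For real use you would substitute the JASPAR CORE matrix of the TF you care about.

```python
import math

BASES = "ACGT"

# HYPOTHETICAL count matrix (rows = motif positions, columns = A, C, G, T counts).
# Invented for illustration only; it is NOT a JASPAR matrix.
TOY_COUNTS = [
    [1, 7, 1, 1],  # mostly C
    [1, 1, 1, 7],  # mostly T
    [1, 7, 1, 1],  # mostly C
    [1, 1, 1, 7],  # mostly T
]


def log_odds_pwm(counts, pseudocount=0.25, background=0.25):
    """Convert a count matrix into a log2-odds PWM (assumed uniform background)."""
    pwm = []
    for row in counts:
        total = sum(row) + 4 * pseudocount
        pwm.append({b: math.log2((c + pseudocount) / total / background)
                    for b, c in zip(BASES, row)})
    return pwm


def revcomp(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def best_hit_over_snp(seq, pwm, snp_pos):
    """Best (score, forward-strand start, strand) among windows that contain snp_pos."""
    width, n = len(pwm), len(seq)
    best = (-math.inf, -1, "+")
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for i in range(n - width + 1):
            # start of this window in forward-strand (0-based) coordinates
            start = i if strand == "+" else n - width - i
            if not (start <= snp_pos < start + width):
                continue  # window does not contain the SNP
            score = sum(pwm[j][s[i + j]] for j in range(width))
            if score > best[0]:
                best = (score, start, strand)
    return best


ref = "CCATCTTGCGTCGCTCTTGCTTGAAGGCCG"
alt = "CCATCTTGCGTCGCTGTTGCTTGAAGGCCG"
snp_pos = next(i for i, (a, b) in enumerate(zip(ref, alt)) if a != b)  # 0-based

pwm = log_odds_pwm(TOY_COUNTS)
ref_hit = best_hit_over_snp(ref, pwm, snp_pos)
alt_hit = best_hit_over_snp(alt, pwm, snp_pos)
print(f"SNP at {snp_pos}: ref {ref_hit}, alt {alt_hit}, delta {alt_hit[0] - ref_hit[0]:.2f}")
```

With the toy matrix, the reference allele contains a perfect CTCT match starting at position 13, and the C>G change at position 15 lowers that same window's score by the difference between the G and C weights at the third matrix position.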

output:

seq_reference
Name (ID)    Sequence    Position (0-based)    Strand    Score    p-value    E-value
TFAP2A (MA0003.1)    GCCTTCAAG    19    -    7.54    0.00085    0.0187
PBX1 (MA0070.1)    CCTTCAAGCAAG    15    -    7.58    0.00065    0.0123
Pax6 (MA0069.1)    TTCAAGCAAGAGCG    11    -    10.85    5.0E-5    0.00085

seq_alternative
Name (ID)    Sequence    Position (0-based)    Strand    Score    p-value    E-value
BRCA1 (MA0133.1)    GCAACAG    13    -    6    0.001    0.0240
TFAP2A (MA0003.1)    GCCTTCAAG    19    -    7.54    0.00085    0.0187
Pax6 (MA0069.1)    TTCAAGCAACAGCG    11    -    9.78    0.0002    0.0034
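One way to read the two outputs is to line them up by TF and check which reported sites actually contain the SNP. The sketch below uses only the hits shown above. The SNP position comes from comparing the two sequences, and minus-strand positions are read as forward-strand start coordinates, which is consistent with the reported sequences (e.g. TFAP2A's GCCTTCAAG is the reverse complement of positions 19 to 27). A TF reported for only one allele has presumably just fallen below the tool's reporting cutoff in the other, so "lost" means weakened, not necessarily abolished. `compare_hits` and `covers_snp` are hypothetical helper names.

```python
# Hits copied from the LASAGNA output above: TF -> (reported site, 0-based position, score).
# Positions are read as forward-strand start coordinates, also for minus-strand hits.
REF_HITS = {
    "TFAP2A": ("GCCTTCAAG", 19, 7.54),
    "PBX1": ("CCTTCAAGCAAG", 15, 7.58),
    "Pax6": ("TTCAAGCAAGAGCG", 11, 10.85),
}
ALT_HITS = {
    "BRCA1": ("GCAACAG", 13, 6.0),
    "TFAP2A": ("GCCTTCAAG", 19, 7.54),
    "Pax6": ("TTCAAGCAACAGCG", 11, 9.78),
}

REF = "CCATCTTGCGTCGCTCTTGCTTGAAGGCCG"
ALT = "CCATCTTGCGTCGCTGTTGCTTGAAGGCCG"
SNP_POS = next(i for i, (a, b) in enumerate(zip(REF, ALT)) if a != b)  # 0-based


def covers_snp(site, pos, snp_pos=SNP_POS):
    """True if the site starting at pos (forward-strand coordinates) contains the SNP."""
    return pos <= snp_pos < pos + len(site)


def compare_hits(ref_hits, alt_hits):
    """Return (TF, status, overlaps SNP, score change) for every TF reported in either allele."""
    rows = []
    for tf in sorted(set(ref_hits) | set(alt_hits)):
        ref, alt = ref_hits.get(tf), alt_hits.get(tf)
        if ref and alt:
            status, delta = "kept", round(alt[2] - ref[2], 2)
        else:
            # reported for only one allele: below the reporting cutoff in the other
            status, delta = ("gained" if alt else "lost"), None
        site, pos, _ = ref or alt
        rows.append((tf, status, covers_snp(site, pos), delta))
    return rows


for row in compare_hits(REF_HITS, ALT_HITS):
    print(row)
```

With these hits, TFAP2A is identical in both alleles because its site (positions 19 to 27) does not contain the SNP, while the changes (PBX1 lost, BRCA1 gained, Pax6 down from 10.85 to 9.78) all involve sites that overlap position 15.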

But I don't see much difference, and I don't really understand how to interpret this. Does anyone know whether this is the right way to do it, or have other ideas? Or is this the right approach and this just happens to be a bad example?

6.6 years ago
Denise CS

You can use the Ensembl VEP (Variant Effect Predictor) to see whether your SNPs map to regulatory regions in the human genome. The Ensembl Regulation team has annotated regulatory regions based on ChIP-Seq data (for TFs and histone marks) and DNaseI-Seq. They have also incorporated the data from JASPAR, so when you enter your SNPs into VEP you can see whether they fall in regions of the genome where Ensembl regulatory features and motif features have been annotated.
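If you have rsIDs, the same VEP annotation can also be retrieved programmatically from the Ensembl REST API. Below is a minimal sketch, assuming the `vep/human/id` endpoint and the JSON field names `regulatory_feature_consequences`, `motif_feature_consequences`, `regulatory_feature_id`, `motif_name` and `motif_score_change`. These are written from memory of the VEP output, and whether they are returned may depend on the options and Ensembl release, so check them against a real response. `regulatory_summary` is a hypothetical helper.

```python
import json
import urllib.request

# Ensembl REST VEP endpoint for a variant ID (assumed URL pattern; see the Ensembl REST docs).
VEP_URL = "https://rest.ensembl.org/vep/human/id/{rsid}?content-type=application/json"


def fetch_vep(rsid):
    """Fetch VEP results for one rsID from the Ensembl REST API (requires network access)."""
    with urllib.request.urlopen(VEP_URL.format(rsid=rsid)) as resp:
        return json.load(resp)


def regulatory_summary(vep_results):
    """Hypothetical helper: list regulatory-feature and motif-feature hits from VEP JSON.

    Field names are assumptions about the VEP JSON output; absent fields are skipped.
    """
    rows = []
    for result in vep_results:
        for reg in result.get("regulatory_feature_consequences", []):
            rows.append(("regulatory_feature", reg.get("regulatory_feature_id"),
                         tuple(reg.get("consequence_terms", []))))
        for motif in result.get("motif_feature_consequences", []):
            # motif_score_change: difference between reference and variant motif scores
            rows.append(("motif_feature", motif.get("motif_name"),
                         motif.get("motif_score_change")))
    return rows
```

Usage would be something like `regulatory_summary(fetch_vep("rs4763879"))`. A SNP that lands in a motif feature with a large score change is a better causal candidate than one that only overlaps a broad regulatory region.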

Thanks for your reply! Can I then also see whether the binding site is disrupted, so that this SNP can be identified as the causal variant?

6.6 years ago
Ming Tang

You can have a look at this website: http://regulome.stanford.edu/

It will tell you whether the SNP disrupts TF binding or not.

Wow, this site looks very interesting, thanks! For interpreting the scores (http://regulome.stanford.edu/help#score), I assume the lower the score, the more likely the SNP is "damaging". What is the difference between "matched TF motif", "any motif" and "TF binding"?

Examples (score -> SNP):

1F -> rs4763879
2A -> rs907611
3A -> rs6451493
4 -> rs1738074
5 -> rs5029939
6 -> rs7665090

Score Supporting data
1a eQTL + TF binding + matched TF motif + matched DNase Footprint + DNase peak
1b eQTL + TF binding + any motif + DNase Footprint + DNase peak
1c eQTL + TF binding + matched TF motif + DNase peak
1d eQTL + TF binding + any motif + DNase peak
1e eQTL + TF binding + matched TF motif
1f eQTL + TF binding / DNase peak
2a TF binding + matched TF motif + matched DNase Footprint + DNase peak
2b TF binding + any motif + DNase Footprint + DNase peak
2c TF binding + matched TF motif + DNase peak
3a TF binding + any motif + DNase peak
3b TF binding + matched TF motif
4 TF binding + DNase peak
5 TF binding or DNase peak
6 Other

The smaller the score, the more likely the SNP is to disrupt binding. You may want to read their paper to get a better idea: http://www.ncbi.nlm.nih.gov/pubmed/22955989
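To apply this to the example SNPs above, the scores can be turned into a sort key (category number first, then the letter) and mapped back to the evidence listed in the table. Below is a small sketch. `score_key`, `rank_snps` and `evidence_terms` are hypothetical helpers, and the evidence strings are copied from the table above, so check them against the current RegulomeDB help page. Comparing the evidence sets for 2a and 2b also shows that they differ only in the "matched" terms (matched TF motif and matched DNase Footprint versus any motif and DNase Footprint).

```python
import re

# Evidence behind each RegulomeDB score, copied from the score table above.
EVIDENCE = {
    "1a": "eQTL + TF binding + matched TF motif + matched DNase Footprint + DNase peak",
    "1b": "eQTL + TF binding + any motif + DNase Footprint + DNase peak",
    "1c": "eQTL + TF binding + matched TF motif + DNase peak",
    "1d": "eQTL + TF binding + any motif + DNase peak",
    "1e": "eQTL + TF binding + matched TF motif",
    "1f": "eQTL + TF binding / DNase peak",
    "2a": "TF binding + matched TF motif + matched DNase Footprint + DNase peak",
    "2b": "TF binding + any motif + DNase Footprint + DNase peak",
    "2c": "TF binding + matched TF motif + DNase peak",
    "3a": "TF binding + any motif + DNase peak",
    "3b": "TF binding + matched TF motif",
    "4": "TF binding + DNase peak",
    "5": "TF binding or DNase peak",
    "6": "other",
}


def score_key(score):
    """Sort key: category number first, then sub-letter ('' sorts before 'a')."""
    m = re.fullmatch(r"([1-6])([a-f]?)", score.strip().lower())
    if m is None:
        raise ValueError(f"not a RegulomeDB score: {score!r}")
    return int(m.group(1)), m.group(2)


def rank_snps(scores):
    """Return (rsID, score, evidence) tuples, most supporting evidence first."""
    ranked = sorted(scores.items(), key=lambda kv: score_key(kv[1]))
    return [(rsid, s, EVIDENCE.get(s.strip().lower())) for rsid, s in ranked]


def evidence_terms(score):
    """Split a score's evidence string into its individual terms."""
    return set(EVIDENCE[score.strip().lower()].split(" + "))


example = {"rs7665090": "6", "rs907611": "2A", "rs1738074": "4",
           "rs4763879": "1F", "rs5029939": "5", "rs6451493": "3A"}
for rsid, score, evidence in rank_snps(example):
    print(rsid, score, evidence)
print(evidence_terms("2a") - evidence_terms("2b"))
```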