Prioritizing TFBS SNPs
12.9 years ago
Jiny ▴ 20

I selected 147 functional SNPs in a set of genes using Genomatix and analyzed their polymorphic status. Of these, 47 were polymorphic and located in transcription factor binding sites (TFBS). Can anyone suggest bioinformatics methods for prioritizing these polymorphic SNPs, so that I can reduce the number of SNPs for further high-throughput genotyping?

snp transcription binding • 3.3k views

What if we already have a TFBS (ChIP-Seq) dataset? Can I use GATK?

12.9 years ago

Montgomery et al., in "A survey of genomic properties for the detection of regulatory polymorphisms", report that "distance to transcription start site, local repetitive content, sequence conservation, minor and derived allele frequencies, and presence of a CpG island" have discriminatory potential for identifying regulatory SNPs (rSNPs).

12.9 years ago
Dataminer ★ 2.8k

Have you tried MAPPER? It might solve your problem to an extent; in my case it did.

12.9 years ago

MAPPER is our tool of choice as well, since it uses both TRANSFAC and JASPAR motifs. Here's how we've analyzed SNPs with MAPPER:

Take a 41-bp segment of the genome with your SNP at position 21, that is, 20 bp of genomic sequence on either side of the SNP. I use 20 because the biggest models MAPPER uses are about 15 bp. Copy this sequence and append it to the end of the 41-bp segment, placing an N between the two copies (I use the N as a spacer or punctuation mark). Put allele 1 at position 21 and allele 2 at position 63. You now have a sequence of 83 bp in the following format:

(20 bp of genome, or bases 1-20)-allele 1-(next 20 bp of genome, or bases 22-41)-N-(20 bp of genome, or 1-20)-allele 2-(next 20 bp of genome, or 22-41)

In this manner I can assay one sequence to cover both alleles. Other approaches will work as well - e.g. two queries each with a different allele. Do as you wish.
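The construction above can be sketched in a few lines of Python. This is only an illustration: the function name `build_two_allele_query` and the placeholder flanks are my own, not part of MAPPER, and you would substitute the real 20-bp flanking sequences for your SNP.

```python
def build_two_allele_query(left_flank, right_flank, allele1, allele2,
                           flank_len=20, spacer="N"):
    """Return flank+allele1+flank + spacer + flank+allele2+flank.

    With 20-bp flanks the result is 83 bp, with allele 1 at position 21
    and allele 2 at position 63 (1-based), as described in the post.
    """
    if len(left_flank) != flank_len or len(right_flank) != flank_len:
        raise ValueError(f"each flank must be exactly {flank_len} bp")
    if len(allele1) != 1 or len(allele2) != 1:
        raise ValueError("this sketch only handles single-base alleles")
    first_copy = left_flank + allele1 + right_flank    # bases 1-41
    second_copy = left_flank + allele2 + right_flank   # bases 43-83
    return first_copy + spacer + second_copy


# Placeholder flanks (hypothetical); use the real genomic sequence instead.
query = build_two_allele_query("A" * 20, "C" * 20, "G", "T")
print(len(query))            # 83
print(query[20], query[62])  # alleles at 1-based positions 21 and 63
```

Python strings are 0-based, so 1-based positions 21 and 63 are indices 20 and 62.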

Run MAPPER and save your results. I filter the results by score and E-value to retain only the most likely predictions.

I then look for allele-specific binding of transcription factors that are relevant to the phenotypes we're following. This last point means that I delete predictions for plant and invertebrate TFs. I am also not interested in the many TFs that do not have a role in our research topics (e.g. obesity, diabetes). For me, the predictions by MAPPER must encompass the positions where the SNP alleles are in the query sequence: positions 21 and 63.
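A rough sketch of that filtering step, assuming you have already parsed the saved MAPPER results into simple records yourself. The `Hit` fields, the thresholds, the assumption that higher scores and lower E-values are better, and the example factor names and numbers are all made up for illustration; this is not MAPPER's actual output format. Comparing hits by factor name alone is also a simplification.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    factor: str    # predicted transcription factor
    start: int     # 1-based, inclusive, in the 83-bp query
    end: int       # 1-based, inclusive
    score: float
    evalue: float


def covers(hit, pos):
    """True if the predicted site spans the given 1-based position."""
    return hit.start <= pos <= hit.end


def allele_specific_factors(hits, min_score, max_evalue, pos1=21, pos2=63):
    """Keep confident hits, then report factors seen over only one allele."""
    kept = [h for h in hits if h.score >= min_score and h.evalue <= max_evalue]
    on_allele1 = {h.factor for h in kept if covers(h, pos1)}
    on_allele2 = {h.factor for h in kept if covers(h, pos2)}
    return {
        "allele1_only": sorted(on_allele1 - on_allele2),
        "allele2_only": sorted(on_allele2 - on_allele1),
    }


# Made-up example records, not real predictions.
hits = [
    Hit("PPARG", 15, 27, 5.2, 0.5),    # spans position 21
    Hit("PPARG", 57, 69, 5.0, 0.6),    # spans position 63: both alleles
    Hit("SREBF1", 18, 28, 4.8, 1.0),   # spans position 21 only
    Hit("HNF4A", 60, 70, 1.0, 20.0),   # fails the score/E-value filter
    Hit("CEBPA", 1, 12, 6.0, 0.1),     # does not span either allele
]
print(allele_specific_factors(hits, min_score=3.0, max_evalue=5.0))
```

Removing TFs from irrelevant species or unrelated to your phenotype would be an extra filter on `factor` before this step.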

I can highly recommend this approach as it has given us many good associations, even several that show interactions with components of the environment that drive activation of the TFs predicted by MAPPER.


Thanks, Larry, for providing such a descriptive answer :). I could have done that too, but I wanted Jiny to work her/his own way through the tool. @Jiny: Larry has laid out the exact path to follow.

10.8 years ago
mulin0424.li ▴ 120

A tool named GWAS3D, which combines genetic and epigenetic features from the recent ENCODE project, can help quite a lot with prioritizing regulatory SNPs. Please visit this site: http://jjwanglab.org/gwas3d
