Evaluating The Effect Of A Snp In A Tf Binding Site With The Ensembl Api
2
3
Entering edit mode
11.1 years ago

Hi all, good afternoon!

I have a somewhat naive question for those familiar with the Ensembl API, I hope somebody can help me. I am fetching transcription binding sites by genomic slice, and now I have got a list of them. The code basically looks like this:

my $slice_adaptor = $registry -> get_adaptor( 'human', 'core', 'slice' );
my $regfeat_adaptor = $registry -> get_adaptor( 'human', 'funcgen', 'regulatoryfeature' );

my $slice = $slice_adaptor -> fetch_by_region( 'chromosome', 5, 1200000, 1300000 );
my @reg_feats = @{ $regfeat_adaptor -> fetch_all_by_Slice( $slice ) };
foreach my $reg_feat ( @reg_feats ){
    my @motif_feats = @{ $reg_feat -> regulatory_attributes( 'motif' ) };
}

For these transcription binding sites, I am able to get a binding score and the sequence. All well so far, but now I'd like to see what effect a SNP or some variant in these sites would have on the binding, by seeing how it would affect the score. Is there a way to do this with the API? I was thinking the answer might lie with creating a new 'customized' slice object, but I am not sure if then it could be evaluated (and I am not quite sure it is possible either).

Any help would be greatly appreciated :)

Thanks a lot! Daniela

ensembl transcription-factor binding • 5.0k views
ADD COMMENT
2
Entering edit mode

I should also add that you could use the Variant Effect Predictor to do a similar task, although I believe this only shows whether the SNP is in a high information position and whether it increases or decreases the score wrt the consensus. This has the side benefit of providing much more useful information about variation consequences (e.g. sift, polyphen scores), and is not limited to the location of motif features:

http://www.ensembl.org/info/docs/variation/vep/index.html

Happy to help

Nath

ADD REPLY
4
Entering edit mode
11.0 years ago

Hi

You are very nearly there using the Ensembl API. Basically all you need to do is grab the sequence from the feature and substitute your base of interest, then pass it to the relative affinity method of the BindingMatrix assoicated with that particular motif. We have a very good example of this from our workshop questions (which I am covering at this very moment in a workshop!). This example also shows how to interrogate the gerp conservation scores and compare them to the frequencies from the binding matrix. It is available here:

http://cvs.sanger.ac.uk/cgi-bin/viewvc.cgi/ensembl-functgenomics/scripts/examples/solutions/features_4.pl?root=ensembl&view=markup

Or to apply it to you code above you would need something like this:

...

foreach my $reg_feat ( @reg_feats ){

my @motif_feats = @{ $reg_feat -> regulatory_attributes( 'motif' ) };

foreach my $mf(@motif_features){

  my $binding_matrix = $mf->binding_matrix;
  my $seq                 = $mf->feature_Slice->seq; 

  print "Original binding affinity:\t".$binding_matrix->relative_affinity($seq);

  my $new_seq = $seq;
  $new_seq =~ s///;#do you substitution here based on SNP and flanks or position.

  print "Ta da! New binding affinity:\t".$binding_matrix->relative_affinity($new_seq);

}

}

Apologies for odd formatting, couldn't figure out the right biostar voodoo.

Hope this helps.

Nath

ADD COMMENT
0
Entering edit mode

Hi Nath, Wow, this is exactly the answer I was looking for! Thanks very very much! Very useful. Just one very minor typo (if other people ever come across this question!), I guess is that the last line should say "$new_seq" instead of "$seq" :)

Again, thanks a lot - this really helps!

ADD REPLY
0
Entering edit mode

Amended, thanks for spotting that.

ADD REPLY
0
Entering edit mode

when I do:

my $rfa = $registry->get_adaptor('Human', 'funcgen', 'regulatoryfeature');

I get:

-------------------- WARNING ----------------------
MSG: Human is not a valid species name (check DB and API version)
FILE: Bio/EnsEMBL/Registry.pm LINE: 1187
CALLED BY: Bio/EnsEMBL/Registry.pm  LINE: 972
Date (localtime)    = Thu Apr  4 15:19:57 2013
Ensembl API version = 71
---------------------------------------------------

-------------------- EXCEPTION --------------------
MSG: Can not find internal name for species 'Human'
STACK Bio::EnsEMBL::Registry::get_adaptor
/vol2/home/brentp/src/perl/ensembl/modules/Bio/EnsEMBL/Registry.pm:974
STACK toplevel ens.pl:12
Date (localtime)    = Thu Apr  4 15:19:57 2013
Ensembl API version = 71
---------------------------------------------------

any ideas? I tried 'human' as well.

ADD REPLY
0
Entering edit mode

Hi Brent. You seem to have downloaded the Ensembl 71 API by accident. It's not accepting the species name because you don't have permission to connect to the version 71 database. We're currently on version 70, with version 71 due out next week.

Did you checkout the code using CVS? If you look at the links for the four modules, they're in this style: http://cvs.sanger.ac.uk/cgi-bin/viewvc.cgi/ensembl.tar.gz?root=ensembl&only_with_tag=branch-ensembl-70&view=tar

The "only_with_tag=branch-ensembl-70" is vital as it makes you check out version 70. If you skip it, it will automatically go to the newest version which, as we're gearing up for our release next week, is 71.

We've edited the way the links come up in our latest version so in future we won't have any problems with this.

ADD REPLY
0
Entering edit mode

I used the links under 2. here: http://uswest.ensembl.org/info/docs/api/api_installation.html which all say 70. But my version does say 71.

edit: when I download the tar ball under 1. it shows 70, but fails when I do:

my @reg_feats = @{$regfeat_adaptor->fetch_all_by_Slice($slice)};

with error:

DBD::mysql::st execute failed: Unknown column 'ms.internal_schema_build' in 'where clause' at /vol2/home/brentp/src/perl/ensembl/modules/Bio/EnsEMBL/DBSQL/CoordSystemAdaptor.pm line 251.
ADD REPLY
1
Entering edit mode

To other Biostar users (not Brent who I've already spoken to via the Ensembl helpdesk).

I saw the full script and this issue was down to a different database version being specified in the script than the API version the script used.

Ensembl APIs are specific to their Ensembl database and will fail if you try to access a different database to the API. This is because we change our underlying databases, whilst trying to keep API commands constant. So a command will get exactly the same data between releases, but it might do so in a completely different way, accessing completely different mySQL tables. This means trying to access a database with a different API version will cause chaos.

ADD REPLY
0
Entering edit mode

thanks Emily. Here's a full version of the script for people like me who hadn't used the perl API before (genome coordinates changed to protect the guilty SNPs)

ADD REPLY
0
Entering edit mode

You just clicked on them and downloaded that way? Or did you download via the command line?

ADD REPLY
0
Entering edit mode

I used wget. and untar'ed each link.

I sent an email to helpdesk. Will follow-up there.

ADD REPLY
0
Entering edit mode

The danger with using wget is skipping the important end part of the URL. This will be fixed in the new release as the version will be specified in the first part of the URL.

ADD REPLY
2
Entering edit mode
11.0 years ago

I do not know how to do that using the Ensembl API but I would recommend you to read at least the following literature dedicated to this matter:

Hope it helps.

ADD COMMENT
0
Entering edit mode

Thank you, Anthony, these papers look very interesting, will definitely read :) However, my question was more towards the specific use to the Ensembl API to do this. I have little experience with TF binding sites, but I know you can use the score matrices (like those stored in JASPAR) to scan a sequence and report a score that predicts how likely it is that the TF will bind to it. I can easily get this for the reference human sequence in Ensembl but I want to see if there is a way to 'introduce' a SNP into this sequence and scan it somehow with this tool. Thank you! :)

ADD REPLY

Login before adding your answer.

Traffic: 2607 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6