VEP is returning MOTIF_NAME values not recognised by Ensembl
12 months ago
jeni ▴ 50

Hi!

I am using VEP to annotate a VCF and I am getting some results in the MOTIF_NAME field, which indicates if the variant occurs in a region containing a TF binding site.

However, those motif names are not being recognised by Ensembl.

For example, these are some of the MOTIF_NAME values that I get:

ENSM00525642949 ENSM00525625129

Do you know where I can find the names of the factors corresponding to these binding sites?

Thanks!

Hello jeni!

It appears that your post has been cross-posted to another site: https://bioinformatics.stackexchange.com/questions/13460/where-to-find-ensembl-tfbs-names

This is typically not recommended as it runs the risk of annoying people in both communities.

Maybe this helps: http://www.ensembl.info/2018/10/03/ensembl-94-is-out/

Transcription factor binding motifs

We’ve updated our transcription factor binding motifs (TFBMs) pipeline in human and mouse. We are using TFBMs computed by the SELEX project, which is much broader than the JASPAR collection we previously used. Further we have altered our filtering process to align more closely with standard practice, to ensure that our calls capture the breadth of validated binding sites. This has resulted in over 200 million TFBMs across the human genome and 30 million in the mouse genome. As well as the motif positions, we have also annotated which cell lines the transcription factor is known to bind these positions in, based on matched ChIP-seq data. These data are available in our existing interfaces such as the region in detail view and the regulation tab as well as a new interface to display TFBMs. Because of the number of motifs, which cover a large proportion of the genome, we are no longer assigning motif feature consequences to variants, either for known variants or for VEP analysis. It will be possible to find the motif features a variant overlaps using VEP custom annotation with a BED file export of these data.

I haven't used VEP in a while, so check whether this export option exists.

@ATpoint Thanks for your answer, but I've been looking at VEP custom annotation and I don't think it is what I am searching for. VEP is already returning the motif IDs that exist at some of the coordinates where I found variants. The problem is that these IDs, which I think are in Ensembl format, are not being recognised by Ensembl, and I don't know how to get the motif names.

I cannot build a custom database of these names because that is precisely what I am looking for. Do you know where I could find the database from which VEP is getting these motif names? I have been looking in the cache directory, but it is hard to find it there.

Hi, I'm having the same issue. Have you found a way to look up those motifs from the VEP motif IDs (ENSM..........)?

12 months ago

Hi Jeni,

The REST API currently supports the use case of providing a region of interest and retrieving the list of motif features that overlap that region, e.g.

https://rest.ensembl.org/overlap/region/human/7:140624000-140624200?feature=motif;content-type=application/json

This allows you to link motif feature stable IDs (ENSM) to binding matrix stable IDs (ENSPFM). ENSPFM* IDs are recognised by BioMart and http://jaspar.genereg.net/
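For anyone who wants to script this, here is a minimal Python sketch of the region lookup described above, using only the standard library. The URL pattern is the one from this answer; the JSON field names (`stable_id`, `binding_matrix_stable_id`, `transcription_factor_complex`) are my assumption about what the endpoint returns, so check them against a real response before relying on them. The function names are just illustrative.

```python
import json
import urllib.request

REST_SERVER = "https://rest.ensembl.org"


def build_url(region, species="human"):
    """Build the overlap/region URL for motif features (same pattern as above)."""
    return (
        f"{REST_SERVER}/overlap/region/{species}/{region}"
        "?feature=motif;content-type=application/json"
    )


def fetch_motif_features(region, species="human"):
    """Download the motif features overlapping a region such as '7:140624000-140624200'."""
    with urllib.request.urlopen(build_url(region, species), timeout=30) as response:
        return json.load(response)


def index_by_stable_id(features):
    """Map each motif feature stable ID (ENSM...) to its binding matrix and TF complex.

    The keys read from each record are assumed field names; records without a
    'stable_id' are skipped and missing fields come back as None.
    """
    index = {}
    for feature in features:
        stable_id = feature.get("stable_id")
        if stable_id is None:
            continue
        index[stable_id] = {
            "binding_matrix": feature.get("binding_matrix_stable_id"),
            "tf_complex": feature.get("transcription_factor_complex"),
        }
    return index
```

To use it, call `index_by_stable_id(fetch_motif_features("7:140624000-140624200"))` with a small window around your variant instead of the example region, then look up the ENSM ID that VEP reported in the resulting dictionary.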

If you are specifically interested in retrieving motif feature information by motif feature stable ID, then the only option is the Perl API: https://github.com/Ensembl/ensembl-funcgen/blob/release/100/modules/Bio/EnsEMBL/Funcgen/DBSQL/MotifFeatureAdaptor.pm#L667

Hope that helps.

Best wishes, Michal