Ensembl variant consequences and classification info in table format
6 months ago
rebeliscu

I want to get the information on these pages:

https://uswest.ensembl.org/info/genome/variation/prediction/predicted_data.html
https://uswest.ensembl.org/info/genome/variation/prediction/classification.html#classes

...in table format to use in R. Is there a way to access this information (e.g. the term and its corresponding SO ID) other than manually copying and pasting from these pages? I looked into using the Ensembl API, but it does not seem straightforward.
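Here is a minimal sketch of the API route, written in Python for illustration. Ensembl's REST service has an info/variation/consequence_types endpoint that lists the consequence terms shown on predicted_data.html. The response field names SO_term, SO_accession and description are assumed from the REST documentation, and the function names are hypothetical, so check both before relying on them. The download and the conversion are kept as separate functions so that the conversion can be checked offline.

```python
import csv
import io
import json
import urllib.request

# Assumed Ensembl REST endpoint listing variant consequence types.
REST_URL = "https://rest.ensembl.org/info/variation/consequence_types"


def fetch_consequence_types(url=REST_URL):
    """Download the consequence list as JSON text (needs network access)."""
    req = urllib.request.Request(url, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


def consequences_to_tsv(json_text, fields=("SO_term", "SO_accession", "description")):
    """Convert a JSON list of consequence records into tab-separated text.

    The field names are assumptions about the REST response; a record that
    lacks a field gets an empty cell instead of raising KeyError.
    """
    records = json.loads(json_text)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(fields)  # header row
    for rec in records:
        writer.writerow([rec.get(f, "") for f in fields])
    return out.getvalue()
```

For example, you could write consequences_to_tsv(fetch_consequence_types()) to consequences.tsv and then load it in R with read.delim("consequences.tsv").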


6 months ago

Using xsltproc with the following stylesheet, saved as transform.xsl:

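<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text"/>

<!-- select every row of the table with id="variation_classes" -->
<xsl:template match="/">
<xsl:apply-templates select="//table[@id='variation_classes']/tr"/>
</xsl:template>

<!-- one output line per row: each header/data cell, whitespace-normalized, followed by a tab -->
<xsl:template match="tr">
<xsl:for-each select="th|td">
<xsl:value-of select="normalize-space(.)"/>
<xsl:text>&#9;</xsl:text>
</xsl:for-each>
<xsl:text>&#10;</xsl:text>
</xsl:template>

</xsl:stylesheet>
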
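Then download the page and pipe it through xsltproc. The --html flag parses the page as HTML, and 2> /dev/null discards the parser's warnings:

$ wget -q -O - "https://www.ensembl.org/info/genome/variation/prediction/classification.html#classes" | xsltproc --html transform.xsl - 2> /dev/null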
*   SO term SO description  SO accession    Called for (e.g.)   
    SNV SNVs are single nucleotide positions in genomic DNA at which different sequence alternatives exist.    SO:0001483   Variant     
    substitution    A sequence alteration where the length of the change in the variant is the same as that of the reference.   SO:1000002  Variant     
    Alu_deletion    A deletion of an Alu mobile element with respect to a reference.    SO:0002070  SV  
    Alu_insertion   An insertion of sequence from the Alu family of mobile elements.    SO:0002063  SV  
    HERV_deletion   A deletion of the HERV mobile element with respect to a reference.  SO:0002067  SV  
    HERV_insertion  An insertion of sequence from the HERV family of mobile elements with respect to a reference.   SO:0002187  SV      
    LINE1_deletion  A deletion of a LINE1 mobile element with respect to a reference.   SO:0002069  SV  
    LINE1_insertion An insertion from the Line1 family of mobile elements.  SO:0002064  SV      
    SVA_deletion    A deletion of an SVA mobile element.    SO:0002068  SV      
    SVA_insertion   An insertion of sequence from the SVA family of mobile elements.    SO:0002065  SV  
    complex_structural_alteration   A structural sequence alteration or rearrangement encompassing one or more genome fragments, with 4 or more breakpoints.    SO:0001784  SV      
    complex_substitution    When no simple or well defined DNA mutation event describes the observed DNA change, the keyword "complex" should be used. Usually there are multiple equally plausible explanations for the change.    SO:1000005  SV      
    copy_number_gain    A sequence alteration whereby the copy number of a given regions is greater than the reference sequence.    SO:0001742  SV      
    copy_number_loss    A sequence alteration whereby the copy number of a given region is less than the reference sequence.    SO:0001743  SV      
    copy_number_variation   A variation that increases or decreases the copy number of a given region.  SO:0001019  SV      
    duplication An insertion which derives from, or is identical in sequence to, nucleotides present at a known location in the genome. SO:1000035  SV      
    interchromosomal_breakpoint A rearrangement breakpoint between two different chromosomes.   SO:0001873  SV      
    interchromosomal_translocation  A translocation where the regions involved are from different chromosomes.  SO:0002060  SV      
    intrachromosomal_breakpoint A rearrangement breakpoint within the same chromosome.  SO:0001874  SV  
    intrachromosomal_translocation  A translocation where the regions involved are from the same chromosome.    SO:0002061  SV      
    inversion   A continuous nucleotide sequence is inverted in the same position.  SO:1000036  SV  
    loss_of_heterozygosity  A functional variant whereby the sequence alteration causes a loss of function of one allele of a gene. SO:0001786  SV      
    mobile_element_deletion A deletion of a mobile element when comparing a reference sequence (has mobile element) to a individual sequence (does not have mobile element).    SO:0002066  SV      
    mobile_element_insertion    A kind of insertion where the inserted sequence is a mobile element.    SO:0001837  SV      
    novel_sequence_insertion    An insertion the sequence of which cannot be mapped to the reference genome.    SO:0001838  SV      
    short_tandem_repeat_variation   A variation that expands or contracts a tandem repeat with regard to a reference.   SO:0002096  SV      
    tandem_duplication  A duplication consisting of 2 identical adjacent regions.   SO:1000173  SV  
    translocation   A region of nucleotide sequence that has translocated to a new position. The observed adjacency of two previously separated regions.    SO:0000199  SV      
    deletion    The point at which one or more contiguous nucleotides were excised. SO:0000159  VariantSV       
    indel   A sequence alteration which included an insertion and a deletion, affecting 2 or more bases.    SO:1000032  VariantSV       
    insertion   The sequence of one or more nucleotides added between two adjacent nucleotides in the sequence. SO:0000667  VariantSV       
    sequence_alteration A sequence_alteration is a sequence_feature whose extent is the deviation from another sequence.    SO:0001059  VariantSV       
    probe   A DNA sequence used experimentally to detect the presence or absence of a complementary nucleic acid.   SO:0000051  CNV probe
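If xsltproc is not available, the same extraction can be sketched with Python's standard-library HTMLParser. The class and function names are hypothetical. The table id variation_classes comes from the stylesheet. Like normalize-space(.) in the XSLT, the sketch collapses whitespace and reads only text content, so image alt text is ignored. It is a rough equivalent, not a drop-in replacement: it also picks up rows nested inside thead/tbody, which the stylesheet's table/tr path would skip.

```python
import csv
import io
import urllib.request
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect th/td text, row by row, from the table whose id is table_id."""

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.depth = 0    # 0 = outside target table, 1 = inside it, >1 = nested table
        self.rows = []    # list of rows, each a list of cell strings
        self.cell = None  # text fragments of the cell currently being read

    def flush(self):
        # Close the open cell, collapsing whitespace like normalize-space().
        if self.cell is not None:
            self.rows[-1].append(" ".join("".join(self.cell).split()))
            self.cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            if self.depth:
                self.depth += 1  # nested table: its text joins the open cell
            elif dict(attrs).get("id") == self.table_id:
                self.depth = 1   # entered the target table
        elif self.depth == 1 and tag == "tr":
            self.flush()         # handles an omitted </td>
            self.rows.append([])
        elif self.depth == 1 and tag in ("th", "td") and self.rows:
            self.flush()
            self.cell = []

    def handle_endtag(self, tag):
        if tag == "table" and self.depth:
            if self.depth == 1:
                self.flush()
            self.depth -= 1
        elif self.depth == 1 and tag in ("tr", "th", "td"):
            self.flush()

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)


def html_table_to_tsv(html, table_id):
    """Return the rows of the given table as tab-separated text."""
    parser = TableExtractor(table_id)
    parser.feed(html)
    parser.close()
    parser.flush()  # in case the table was never closed
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


def fetch(url):
    """Download a page as text (needs network access)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

For example, you could save html_table_to_tsv(fetch("https://www.ensembl.org/info/genome/variation/prediction/classification.html"), "variation_classes") to a file and load it in R with read.delim().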

wow, thank you!

