Ensembl variant consequences and classification info in table format
6 months ago
rebeliscu ▴ 30

I want to get the information on these pages:

...in table format, for use in R. Is there a way to access this information (e.g. the term and the corresponding SO ID) other than manually copying and pasting what's on the page? I looked into using the Ensembl API, but it does not seem straightforward.

Thanks.

SO Ensembl API • 313 views
6 months ago

Using xsltproc with the following stylesheet (saved as transform.xsl):

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet  xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text"/>

<xsl:template match="/">
<xsl:apply-templates select="//table[@id='variation_classes']/tr"/>
</xsl:template>

<xsl:template match="tr">
<xsl:for-each select="th|td">
<xsl:value-of select="normalize-space(.)"/>
<xsl:text>&#9;</xsl:text>
</xsl:for-each>
<xsl:text>
</xsl:text>
</xsl:template>

</xsl:stylesheet>


Usage:

$ wget -q -O - "https://www.ensembl.org/info/genome/variation/prediction/classification.html#classes" | xsltproc --html transform.xsl - 2> /dev/null
*   SO term SO description  SO accession    Called for (e.g.)
SNV SNVs are single nucleotide positions in genomic DNA at which different sequence alternatives exist.    SO:0001483   Variant
substitution    A sequence alteration where the length of the change in the variant is the same as that of the reference.   SO:1000002  Variant
Alu_deletion    A deletion of an Alu mobile element with respect to a reference.    SO:0002070  SV
Alu_insertion   An insertion of sequence from the Alu family of mobile elements.    SO:0002063  SV
HERV_deletion   A deletion of the HERV mobile element with respect to a reference.  SO:0002067  SV
HERV_insertion  An insertion of sequence from the HERV family of mobile elements with respect to a reference.   SO:0002187  SV
LINE1_deletion  A deletion of a LINE1 mobile element with respect to a reference.   SO:0002069  SV
LINE1_insertion An insertion from the Line1 family of mobile elements.  SO:0002064  SV
SVA_deletion    A deletion of an SVA mobile element.    SO:0002068  SV
SVA_insertion   An insertion of sequence from the SVA family of mobile elements.    SO:0002065  SV
complex_structural_alteration   A structural sequence alteration or rearrangement encompassing one or more genome fragments, with 4 or more breakpoints.    SO:0001784  SV
complex_substitution    When no simple or well defined DNA mutation event describes the observed DNA change, the keyword "complex" should be used. Usually there are multiple equally plausible explanations for the change.    SO:1000005  SV
copy_number_gain    A sequence alteration whereby the copy number of a given regions is greater than the reference sequence.    SO:0001742  SV
copy_number_loss    A sequence alteration whereby the copy number of a given region is less than the reference sequence.    SO:0001743  SV
copy_number_variation   A variation that increases or decreases the copy number of a given region.  SO:0001019  SV
duplication An insertion which derives from, or is identical in sequence to, nucleotides present at a known location in the genome. SO:1000035  SV
interchromosomal_breakpoint A rearrangement breakpoint between two different chromosomes.   SO:0001873  SV
interchromosomal_translocation  A translocation where the regions involved are from different chromosomes.  SO:0002060  SV
intrachromosomal_breakpoint A rearrangement breakpoint within the same chromosome.  SO:0001874  SV
intrachromosomal_translocation  A translocation where the regions involved are from the same chromosome.    SO:0002061  SV
inversion   A continuous nucleotide sequence is inverted in the same position.  SO:1000036  SV
loss_of_heterozygosity  A functional variant whereby the sequence alteration causes a loss of function of one allele of a gene. SO:0001786  SV
mobile_element_deletion A deletion of a mobile element when comparing a reference sequence (has mobile element) to a individual sequence (does not have mobile element).    SO:0002066  SV
mobile_element_insertion    A kind of insertion where the inserted sequence is a mobile element.    SO:0001837  SV
novel_sequence_insertion    An insertion the sequence of which cannot be mapped to the reference genome.    SO:0001838  SV
short_tandem_repeat_variation   A variation that expands or contracts a tandem repeat with regard to a reference.   SO:0002096  SV
tandem_duplication  A duplication consisting of 2 identical adjacent regions.   SO:1000173  SV
translocation   A region of nucleotide sequence that has translocated to a new position. The observed adjacency of two previously separated regions.    SO:0000199  SV
deletion    The point at which one or more contiguous nucleotides were excised. SO:0000159  VariantSV
indel   A sequence alteration which included an insertion and a deletion, affecting 2 or more bases.    SO:1000032  VariantSV
insertion   The sequence of one or more nucleotides added between two adjacent nucleotides in the sequence. SO:0000667  VariantSV
sequence_alteration A sequence_alteration is a sequence_feature whose extent is the deviation from another sequence.    SO:0001059  VariantSV
probe   A DNA sequence used experimentally to detect the presence or absence of a complementary nucleic acid.   SO:0000051  CNV probe
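
Since the question only asks for the term and its SO ID, the output above can be narrowed down with awk. This is a minimal sketch, not part of the original answer: it assumes the cells are separated by tab characters and that the SO term sits two columns before the SO accession, as in the output shown. The function name so_terms is made up for illustration.

```shell
# Hypothetical helper: print only "SO term <TAB> SO accession" per row.
# Assumes tab-separated input in which the SO term is two columns
# before the SO accession (term, description, accession).
so_terms() {
    awk -F '\t' '{
        # Find the accession column; the header row has none and is skipped.
        for (i = 3; i <= NF; i++)
            if ($i ~ /^SO:[0-9]+$/) { print $(i - 2) "\t" $i; break }
    }'
}

# Try it on a header row and one sample row copied from the output above.
printf '*\tSO term\tSO description\tSO accession\tCalled for (e.g.)\n' > sample.tsv
printf 'Alu_deletion\tA deletion of an Alu mobile element with respect to a reference.\tSO:0002070\tSV\n' >> sample.tsv
so_terms < sample.tsv
```

In the real pipeline this would go at the end, e.g. `... | xsltproc --html transform.xsl - 2> /dev/null | so_terms > so_terms.tsv`, and the resulting file could then be loaded in R with `read.delim("so_terms.tsv", header = FALSE)`.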


wow, thank you!