Ensembl 91 has been released!

written 4 days ago by Ensembl Blog

Ensembl 91 is now live! The cat is truly out of the bag now, and we can safely say that there’s been no monkeying around this release; we’ve been very busy! Read on to discover the highlights of this new …

2017 Nanopore Community Meeting: An Incomplete Summary

written 5 days ago by Omics! Omics! by Keith Robinson

The 2017 Nanopore Community Meeting was over a week ago in New York City, so I'm grossly overdue in cobbling together some observations and opinions based on the tweet stream (I had a critical day-job meeting at the same time and wasn't in New York). I did dash off the bit about SmidgION potentially being like the early Macs (though I got the nomenclature wrong: the original was the Mac 128K; the Mac Classic was a later model that resembled it). Oxford also deviated this autumn from the pattern of public information they had seemingly established: major news at London Calling, smaller updates at the community meeting, and a pair of Clive Brown webcasts, each falling roughly halfway between the two meetings. This fall, no webcast. Nanopore has its own Day 1 and Day 2 write-ups, and there is an independent write-up from Arwyn Edwards.

Platform

Per the usual pattern, Oxford showed off previously announced hardware but made no solid announcements. I've put together a Storify of relevant tweets, which may hold further information.

Flongle/SmidgION

SmidgION pumping out data with an attached Android phone calling the bases was a heavily tweeted and retweeted photo. Alas, Oxford apparently scheduled release of the SmidgION/Flongle components for the second half of next year, so no SmidgIONs adorning Christmas trees this year while happy recipients sing Flongle Bells ("Oh what fun, it is to sequence, in a one horse open sleigh, hey!").

Anxiously waiting for these little bad boys to develop! SmidgION for sequencing w cell phone ...

The #CommonsPilot kicks off!!

written 6 days ago by Living in an Ivory Basement by Titus Brown

The start of a new Data Commons effort!

On the Problem of Sequence Leakage

written 8 days ago by Omics! Omics! by Keith Robinson

I've been spending some time lately in an unfamiliar world: the eukaryotic section of NCBI's NR protein database. I've been almost exclusively a bacterial guy for six years, but the other side of starbase had an interest in finding homologs of a particular protein, so I went diving for some. That experience has reminded me of two serious issues with public sequence databases. Tonight I'll dash off a bit about one; expect the other complaint to show up in the not-so-distant future. Tonight's lament is the increasing dispersion of sequence repositories.


written 8 days ago by Kevin's GATTACA World

15 months ago by cshevlin: Spotted this ad on Biostars. The IBM Watson Health business division is now looking for talented individuals destined to usher in the next era of healthcare. We live in a moment of remarkable change and opportunity. The convergence of data and technology is transforming healthcare and life sciences organizations in every way. New roles are being created that never existed before to meet the demands of this transformation.

We are now looking for a Genomic Data Scientist to join our team. You will have an opportunity to work directly with the team building new healthcare solutions using genomic analytics and serving oncologists, pathologists and other specialists caring for patients. You will help define, design, and build those solutions and apply your expertise to work with different analytical and statistical models.

Key Responsibilities:
- Develop tools to transform, load, and validate data
- Strategize new uses for data and its interaction with data design
- Perform data studies of new and diverse data sources
- Find new uses for existing data sources
- Discover “stories” told by the data and present them to other scientists and business managers
- Generate algorithms and create computer models

Ideal candidates will possess the following: Candidates should first and foremost have a strong background in data mining and statistics, and hands-on experience in programming and in using databases and tools to mine data, including practical experience in extracting, transforming, and loading data as well as developing statistical and analytical models. Candidates must have a demonstrated capacity to adapt to demanding and high-pressure ...

Getting to know us: Thomas from Regulation

written 8 days ago by Ensembl Blog

This December, we’re meeting Thomas Juettemann, who is part of our Regulation team. What is your job in Ensembl? I am working in the Regulation team. Our main task is to predict regulatory regions like promoters and enhancers in the …

How can UniProtKB help the gene regulation community?

written 11 days ago by Inside UniProt

This question was asked at a recent meeting of a group discussing the availability of information about the regulation of gene expression. The first thing most researchers in this field ask for is simply a list of known transcriptional regulators. These can easily be retrieved from, for example, the human proteome by using the Advanced Search to specify the Keyword as “Transcription regulation” and the Organism as “Homo sapiens”.

Adding the keyword “DNA-binding” to the search limits the results to entries annotated as DNA-binding transcription factors. Finally, selecting ‘Reviewed’ in the filters on the left-hand sidebar restricts the results to UniProtKB/Swiss-Prot entries and completes your search.

If you are just interested in the list of UniProtKB accessions or protein names, you can export it using the download functionality and selecting your favourite format (select “List” to get just the accession numbers). However, if you want to review information about any of these entries, for example human TP63 (UniProt accession Q9H3D4), clicking on the accession number gives you access to a wealth of protein information. For example, you may wish to identify the DNA-binding region of the protein. The “Display” menu on the left-hand side of the UniProtKB entry offers options to see the protein sequence features in a tabular view via the Feature table, or in a graphical view with the ProtVista feature viewer (accessible via the ‘Feature viewer’ link). From this, the “Variants” track can be expanded to show the individual single nucleotide ...
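The same three search criteria can also be combined programmatically. The sketch below is an illustration, not part of the UniProt post: it assumes the current REST endpoint at rest.uniprot.org (which post-dates this post) and the keyword accessions KW-0805 (“Transcription regulation”) and KW-0238 (“DNA-binding”) as we understand them, so check both against UniProt's own documentation before relying on it. It only builds the URL, so it runs offline.

```python
from urllib.parse import urlencode

# Assumption: the current UniProt REST search endpoint (newer than this post).
UNIPROT_SEARCH = "https://rest.uniprot.org/uniprotkb/search"

# Keyword accessions as we understand them; verify against UniProt's keyword list.
KW_TRANSCRIPTION_REGULATION = "KW-0805"
KW_DNA_BINDING = "KW-0238"


def build_tf_query(organism_id: int = 9606, dna_binding: bool = True,
                   reviewed_only: bool = True) -> str:
    """Combine the Advanced Search criteria described above into one query string."""
    terms = [f"keyword:{KW_TRANSCRIPTION_REGULATION}", f"organism_id:{organism_id}"]
    if dna_binding:
        terms.append(f"keyword:{KW_DNA_BINDING}")  # narrow to DNA-binding regulators
    if reviewed_only:
        terms.append("reviewed:true")  # UniProtKB/Swiss-Prot entries only
    return " AND ".join(terms)


def build_search_url(query: str, fmt: str = "list", size: int = 500) -> str:
    """Return a search URL; the 'list' format asks for bare accession numbers."""
    return UNIPROT_SEARCH + "?" + urlencode({"query": query, "format": fmt, "size": size})


if __name__ == "__main__":
    # Print the URL instead of fetching it, so the sketch works without network access.
    print(build_search_url(build_tf_query()))
```

To actually download the accessions you could pass the URL to `urllib.request.urlopen`; large result sets are likely to be paginated, so treat a single request as a first page only.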

SmidgION: Mac Classic for the 21st Century?

written 12 days ago by Omics! Omics! by Keith Robinson

Apple launched the Macintosh computer with a famous television ad playing on the launch year, 1984. What emerged was what we now know as the Mac Classic. What may be less known is why the Mac Classic had that distinctive shape: it was intended to be backpack-portable, as Apple had a deal with a consortium of top U.S. universities to sell Macintoshes to their students. Perhaps even more forgotten is that one of those schools, Drexel University in Philadelphia, made owning a Macintosh a requirement for students.

Four steps in five minutes to deploy a Carpentry lesson for a class of 30

written 15 days ago by Living in an Ivory Basement by Titus Brown

Binders full of Carpentry!

Why are taxonomic assignments so different for Tara bins? (Black Friday Morning Bioinformatics)

written 23 days ago by Living in an Ivory Basement by Titus Brown

A more refined taxonomic analysis

run gistic2 with sequenza segmentation output from whole exome sequencing

written 24 days ago by Diving into Genetics and Genomics

Convert sequenza output to gistic input

Gistic was designed for SNP6 array data, but I saw many papers use it for whole exome sequencing data as well. I have the segment files from sequenza and want to convert them to gistic input.

Input format for gistic:

segment file:
(1) Sample (sample name)
(2) Chromosome (chromosome number)
(3) Start Position (segment start position, in bases)
(4) End Position (segment end position, in bases)
(5) Num markers (number of markers in segment)
(6) Seg.CN (log2() - 1 of copy number)

See the gistic-forum thread #!msg/gistic-forum/yYxIe58qLkA/4dXWAPuMEgAJ. The conversion should be log2 (logarithm base 2) - 1, so that copy number 2 is 0. Every segment start and end in the segments file should appear in the markers file, not the other way around. When the copy number is 0 (a homozygous deletion of both copies), you can't compute log2(0) - 1; just put a small number, e.g. -5.

marker file (see the gistic-forum thread !searchin/gistic-forum/marker$20file/gistic-forum/Vq9WWDiy7jU/BSFg2zmBZ1EJ):
(1) Marker Name
(2) Chromosome
(3) Marker Position (in bases)

Note that gistic2 does not require a marker file anymore.

output of sequenza

sequenza gives a segment file. Segmentation was done by the copynumber Bioconductor package. The *segments.txt file has 13 columns:
"chromosome" "start.pos" "end.pos" "Bf" "N.BAF" "sd.BAF" "depth.ratio" "N.ratio" "sd.ratio" "CNt" "A" "B" "LPP"

We only need the chromosome, start.pos, end.pos, N.BAF and depth.ratio columns. The depth.ratio column is the GC-content-normalized ratio. A depth ratio of 1 means a copy number of 2 (the same as the normal blood control in my case), so copy number ≈ 2 * depth.ratio. To convert to gistic input, I have to do log2(2 * depth.ratio) - 1 = log2(depth.ratio).

I have a bunch of segment files in the same folder. Only retain the first header ...
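A minimal Python sketch of the conversion described above, not the post's own script (which is cut off here). It assumes sequenza files are tab-separated with an R-style quoted header, that they are named `<sample>_segments.txt` (a hypothetical convention), that N.BAF fills the "Num markers" column, and that copy number ≈ 2 * depth.ratio with a floor of -5 for homozygous deletions.

```python
import csv
import glob
import math
import os

# Floor for homozygous deletions: log2(0) is undefined, so a small
# number such as -5 is used instead, as suggested above.
MIN_SEG_CN = -5.0


def seg_cn(depth_ratio: float) -> float:
    """GISTIC Seg.CN = log2(copy number) - 1, assuming copy number = 2 * depth.ratio."""
    if depth_ratio <= 0:
        return MIN_SEG_CN
    # log2(2 * r) - 1 simplifies to log2(r)
    return max(math.log2(depth_ratio), MIN_SEG_CN)


def convert_rows(lines, sample: str):
    """Yield GISTIC segment rows from the lines of one sequenza *_segments.txt file."""
    for row in csv.DictReader(lines, delimiter="\t"):
        if row["depth.ratio"] in ("", "NA"):
            continue  # skip segments without a usable ratio
        chrom = row["chromosome"]
        if chrom.startswith("chr"):
            chrom = chrom[3:]  # GISTIC expects bare chromosome numbers
        yield [
            sample,
            chrom,
            row["start.pos"],
            row["end.pos"],
            row["N.BAF"],  # assumption: N.BAF fills the "Num markers" column
            f"{seg_cn(float(row['depth.ratio'])):.4f}",
        ]


def convert_folder(folder: str, out_path: str) -> None:
    """Merge every *_segments.txt in `folder` into one GISTIC file with a single header."""
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        writer.writerow(["Sample", "Chromosome", "Start", "End", "Num_markers", "Seg.CN"])
        for path in sorted(glob.glob(os.path.join(folder, "*_segments.txt"))):
            # Hypothetical naming: sample name = file name minus "_segments.txt"
            sample = os.path.basename(path)[: -len("_segments.txt")]
            with open(path, newline="") as handle:
                writer.writerows(convert_rows(handle, sample))
```

Writing the header once in `convert_folder` and streaming rows from each file is one way to "only retain the first header" when merging many samples.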

Twitter coverage of the Australian Bioinformatics & Computational Biology Society Conference 2017

written 26 days ago by What You're Doing Is Rather Desperate by Neil Saunders

You know the drill by now. Grab the tweets. Generate the report using RMarkdown. Push to GitHub. Publish to RPubs. This time it’s the Australian Bioinformatics & Computational Biology Society Conference 2017, including the COMBINE symposium. Looks like a good time was had by all in Adelaide. A couple of quirks this time around. First, …

A compilation of conversion tools for BED, SAM/BAM, psl, pslx, blast tabular and blast xml

written 4 weeks ago by Bioinformatics I/O

A wide range of formats exist for representing comparisons of sequences to each other: blast tabular, blast xml, psl, pslx, SAM/BAM, and BED. Most of these formats can be converted from one to another. Sometimes the conversion is lossless, allowing the original data to be converted perfectly without any loss of information. […]
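To make the lossy case concrete, here is a small Python sketch (not one of the tools from the compilation) that turns BLAST tabular hits in the default 12-column `-outfmt 6` layout into BED6 intervals on the subject sequence. Percent identity, mismatches, gap opens, query coordinates and the e-value are all dropped, so the BLAST record cannot be rebuilt from the BED line; clamping the bitscore to BED's conventional 0-1000 score range is our own choice.

```python
import sys


def blast_tab_to_bed(line: str) -> str:
    """Convert one BLAST tabular hit (-outfmt 6, default 12 columns) to a BED6 line.

    The interval is placed on the subject sequence; most BLAST columns are discarded,
    so this conversion is lossy.
    """
    fields = line.rstrip("\n").split("\t")
    qseqid, sseqid = fields[0], fields[1]
    sstart, send = int(fields[8]), int(fields[9])
    bitscore = float(fields[11])
    strand = "+" if sstart <= send else "-"  # minus-strand hits have sstart > send
    start = min(sstart, send) - 1  # 1-based inclusive -> 0-based half-open
    end = max(sstart, send)
    score = min(int(round(bitscore)), 1000)  # BED scores are conventionally 0-1000
    return "\t".join([sseqid, str(start), str(end), qseqid, str(score), strand])


if __name__ == "__main__":
    for raw in sys.stdin:
        if raw.strip() and not raw.startswith("#"):  # skip blanks and -outfmt 7 comments
            print(blast_tab_to_bed(raw))
```

Usage: `python blast2bed.py < hits.tsv > hits.bed` (the script name is hypothetical).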

Mapping data using R and leaflet

written 4 weeks ago by What You're Doing Is Rather Desperate by Neil Saunders

The R language provides many different tools for creating maps and adding data to them. I’ve been using the leaflet package at work recently, so I thought I’d provide a short example here. Whilst searching for some data that might make a nice map, I came across this article at ABC News. It includes a …

Getting to know us: Uma from EnsemblProtists

written 5 weeks ago by Ensembl Blog

This month we’re getting to know Uma Maheswari, who works in the Protist branch of Ensembl Genomes. What is your job in Ensembl? I am a Bioinformatician in the Ensembl Genomes team. As the team motto goes, “Extending Ensembl across the taxonomic …

Visualising protein interactions in UniProt

written 5 weeks ago by Inside UniProt

The UniProtKB entries include an Interaction section, which details the protein’s binary interactions with other proteins, using a high-quality dataset supplied by the IMEx Consortium. You can now view the binary interactions in a graph that shows the interaction partners of your protein and also which of those partners interact with each other. For example, here is the interaction matrix for the human E3 ubiquitin-protein ligase parkin protein.

Dots dots dots

Each interaction edge is represented by a dot whose intensity reflects the number of experiments supporting the interaction. Hovering over the dot highlights both partners.

Information on click

Clicking on an interaction dot brings up a popup window with details about the interaction. This window contains more information about the interacting partners:
- Names
- Identifiers and link to UniProt entry
- List of diseases, and link to the relevant section of the UniProt entry
- Subcellular location
- Number of experiments, and link to IntAct

Filtering the display

We currently have two filters that let users filter data out of the graph. A filter applies if any of the partners in the interaction satisfies the selected criteria. The two filters are:
- Subcellular location: a tree-based selection menu that lets users filter proteins based on their location within the cell
- Disease: only show proteins that are involved in the specified disease(s)

We are working on enhancing this view further. Are there any more filters or other improvements that you would like to suggest? Let us know!
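As a rough illustration of the data behind such a view (a toy sketch, not UniProt's implementation), the Python below collects a focal protein's partners, keeps the edges among them with their experiment counts, scales counts to a 0-1 dot intensity, and applies an "any partner matches" filter as described above. Protein identifiers and annotation terms here are hypothetical inputs; self-interactions are ignored for simplicity.

```python
from itertools import combinations


def interaction_matrix(focal, interactions):
    """Return (partners, matrix) for a focal protein.

    `interactions` holds (protein_a, protein_b, n_experiments) tuples. The matrix maps
    sorted protein pairs to experiment counts and covers the focal protein's edges plus
    any edges between its partners; self-interactions are ignored.
    """
    counts = {tuple(sorted((a, b))): n for a, b, n in interactions if a != b}
    partners = sorted({b if a == focal else a for a, b in counts if focal in (a, b)})
    members = [focal] + partners
    matrix = {}
    for a, b in combinations(members, 2):
        pair = tuple(sorted((a, b)))
        if pair in counts:
            matrix[pair] = counts[pair]
    return partners, matrix


def dot_intensity(matrix):
    """Scale experiment counts to 0-1 so a dot's shade tracks its supporting evidence."""
    top = max(matrix.values(), default=1)
    return {pair: n / top for pair, n in matrix.items()}


def filter_matrix(matrix, annotations, wanted):
    """Keep an edge if ANY partner carries one of the wanted terms (e.g. a location or disease)."""
    return {pair: n for pair, n in matrix.items()
            if any(annotations.get(p, set()) & wanted for p in pair)}
```

Applying the filter per edge, rather than per protein, matches the behaviour described in the post: an interaction stays visible as long as one of its two partners meets the criteria.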

The Vital Role of Genetic Counselors

written 5 weeks ago by MassGenomics by Dan Koboldt

In September of this year, genetic counselors from all over the United States descended on Columbus for the annual meeting of NSGC, the National Society of Genetic Counselors. One of the major outcomes of that meeting was the announcement that November 9th (today) would be Genetic Counselor Awareness Day. I’m not a genetic counselor, nor […]

How specific are k-mers for taxonomic assignment of microbes, anyway?

written 5 weeks ago by Living in an Ivory Basement by Titus Brown

K-mers are pretty specific at the genus level.
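One simple way to put a number on "how specific are k-mers" is sketched below in Python. It is a toy measure, not the method from the post: it takes genomes labelled by genus, collects canonical k-mers (a k-mer or its reverse complement, whichever sorts first), and reports the fraction of distinct k-mers seen in exactly one genus. The input layout and function names are our own assumptions.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def kmers(seq: str, k: int) -> set:
    """Distinct canonical k-mers in `seq`, skipping any that contain non-ACGT bases."""
    seq = seq.upper()
    return {canonical(seq[i:i + k]) for i in range(len(seq) - k + 1)
            if set(seq[i:i + k]) <= set("ACGT")}


def genus_specific_fraction(genomes: dict, k: int) -> float:
    """Fraction of distinct k-mers that occur in exactly one genus.

    `genomes` maps a genome name to a (genus, sequence) tuple.
    """
    genera_by_kmer = defaultdict(set)
    for genus, seq in genomes.values():
        for km in kmers(seq, k):
            genera_by_kmer[km].add(genus)
    if not genera_by_kmer:
        return 0.0
    specific = sum(1 for genera in genera_by_kmer.values() if len(genera) == 1)
    return specific / len(genera_by_kmer)
```

With real genomes you would use a much larger k (e.g. the 21 to 51 range common in k-mer classifiers) and hashed sketches rather than full k-mer sets, but the counting logic is the same.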

A Nucleotide Mixture-Based Error Correcting Short Read Chemistry

written 5 weeks ago by Omics! Omics! by Keith Robinson

Sometimes polony-style short-read sequencing seems like old news. The underlying technology has been commercially available for over a decade. I focus much of my attention on gains in long-read technologies, though incremental improvements to read lengths or polony densities still appear. Now, in Nature Biotechnology, a group from Peking University has published a new twist on sequencing-by-synthesis that is claimed to offer significant improvements in read accuracy.

Lior Pachter’s Keynote at Genome Informatics 2017 #GI2017

written 6 weeks ago by Next Gen Seek

AlphaGo & Biology

written 6 weeks ago by Omics! Omics! by Keith Robinson

A comment was left on an earlier piece suggesting I comment on the recent AlphaGo paper and the possible applicability of this approach to the biomedical sciences. I'm not sure I have anything terribly original to say, but who can refuse a request?

Visualising sub-cellular locations in UniProt

written 6 weeks ago by Inside UniProt

The UniProt Knowledgebase provides protein entries covering key aspects of protein biology, divided into sections that group related information. One of the sections on the protein entry pages is Subcellular Location. This section provides information on the location and the topology of the mature protein in the cell. You can now visually explore the subcellular location in UniProtKB entries. The visualisation presents image templates from COMPARTMENTS combined with protein location data from UniProt (expert annotation, rule-based automatic annotation) and imported from Gene Ontology (GO) annotation. The figure below shows the subcellular location view for the human Copper-transporting ATPase 2 protein.

Colour-coded by evidence

The subcellular locations in which the protein is found are shown using colours and titles for the compartments. Gold indicates 'Manual annotation' and blue indicates 'Automatic computational assertion'. These colours are also reflected in the clickable evidence tags on the right-hand side, in the tabs showing the text annotation.

Source-based annotation tabs

There are two tabs based on the source of annotation: one for UniProt annotation and one for GO (Gene Ontology) annotation. Click a tab to view the annotation from that source; the image on the left-hand side updates to reflect the selected tab.

Click to highlight

You can also click on a coloured subcellular location compartment to quickly highlight the corresponding annotation on the right-hand side. Try it out and let us know what you think! What else would you like to see ...

For want of $2.41... some background on reimbursements.

written 6 weeks ago by Living in an Ivory Basement by Titus Brown

I hate reimbursements

The 2017 binder workshop!

written 6 weeks ago by Living in an Ivory Basement by Titus Brown

We had a workshop! On binder!

Classifying genome bins using a custom reference database - maybe this time it'll work?

written 7 weeks ago by Living in an Ivory Basement by Titus Brown

Classifying genome bins with a custom database! Another try!
