Ensembl 88 has been released!

written 1 day ago by Ensembl Blog

Ensembl 88 is now live. Read on to find out what’s new, and join us for a tour during our release webinar, at 16.00 BST on Tuesday, April 4. Ensembl 88 updates our gene annotations, variation databases, comparative genomics analyses and […]

Differential Mammalian Toxicity: Why Do Some Human Foods Kill Dogs?

written 2 days ago by Omics! Omics! by Keith Robison

I've been contemplating this post for a while, but it can be seen as another angle on my recent post on the challenges of drug discovery, so it finally left the mental queue. We often use other mammalian species in drug development to predict human toxicity. We know animals aren't the same as people, but lacking a better alternative that's what we do. Now, as regular readers know, I keep company with a dog, and that sometimes has me wondering: how well do we understand the cases of things we can eat but which are dangerous for our canines?

Targets: Drugability Revisited

written 4 days ago by Omics! Omics! by Keith Robison

My correspondent @datarade shot a tweet my way on his quest to understand drug discovery. He does this despite the fact I've promised posts on previous tweets that are submerged in my mental queue. But the best part of teaching is forcing yourself to rethink what you think you know, so I'm going to actually take this one on in the space of "what is a target, how do we pick them and how do we drug them". Which I've found to be enlightening and frustrating. It's a messy space because so much is empirical, and I keep devising and then discarding taxonomies and explanatory approaches because they all seem unsatisfactory.

Contacting the Ensembl Helpdesk? Let us know how to reach you.

written 6 days ago by Ensembl Blog

Lately we’ve noticed a few users sending us queries without providing contact details. Please enter your email address in our web form if you require a reply! If you typically reach out to us via our web “Contact us” form, […]

Friday SNPpets

written 6 days ago by The OpenHelix Blog

This week’s tips contain quite a range of things, from patent battles to drying tardigrades (probably somebody patented this?). I put in the goat genome again because I like goats. We have precision medicine, and mutants asking to not be discriminated against. Some interesting tools this week too. Welcome to our Friday feature link collection: […]

Obviousness: Rarely Obvious

written 8 days ago by Omics! Omics! by Keith Robison

Pacific Biosciences has made new thrusts in their ongoing intellectual property action against Oxford Nanopore, adding two recently issued patents to the fray. Oxford has publicly brushed these off as "another pore excuse for a lawsuit", but certainly the battle is not over. One of these patents, 9,542,527 "Compositions and methods for nucleic acid sequencing", appears to concern using hairpin linkages to read both strands, much like the 9,404,146 "Compositions and methods for nucleic acid sequencing" patent that PacBio led with. Since Oxford has announced they will abandon their "2D" methods that use such hairpins, this angle would seem to be soon irrelevant (as I predicted back when PacBio originally attacked). But the other, US 9,546,400 "Nanopore sequencing using n-mers" covers basecalling methods, which is a new twist. A route to challenge any patent is to identify "prior art", information which was publicly available at the time of the patent filing which impinges on the claims in the patent application. Not only can exact matches to prior art be an issue, but also anything which would be "obvious" to a skilled practitioner. And that can certainly be a can of worms.

Extraction of FASTA sequences from Oxford Nanopore fast5 files – a comparison of tools

written 8 days ago by Bioinformatics I/O

ONT sequencers output the results of a sequencing run in the FAST5 format, which is a variant of HDF5. “HDF5 is a data model, library, and file format for storing and managing data. It supports an unlimited variety of datatypes, and is designed for flexible and efficient I/O and for high volume and complex data. HDF5 is […]
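
As a rough illustration of the last step in such an extraction, the sketch below converts FASTQ text to FASTA using only the Python standard library. It assumes the basecalled FASTQ record has already been read out of the FAST5/HDF5 file with an HDF5 library; the function name `fastq_to_fasta` and the sample read are hypothetical, and this is not one of the tools compared in the post.

```python
def fastq_to_fasta(fastq_text):
    """Convert FASTQ text (four lines per record) to FASTA text.

    Assumption: records are not line-wrapped and no read is empty,
    so blank lines can safely be ignored.
    """
    lines = [ln.strip() for ln in fastq_text.splitlines() if ln.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input must have four lines per record")

    out = []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        record_no = i // 4 + 1
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record {record_no}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in record {record_no}")
        out.append(">" + header[1:])  # FASTA header keeps the read name
        out.append(seq)               # quality string is dropped
    return "\n".join(out) + "\n" if out else ""


# Hypothetical read, standing in for text pulled from a FAST5 file.
example = "@read1 ch=1\nACGT\n+\n!!!!\n"
print(fastq_to_fasta(example), end="")
```

The quality line is validated but discarded, since FASTA carries only the read name and sequence.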

The joy of (blogging) Bach

written 8 days ago by Bits of DNA by Lior Pachter

According to Bach’s first biographer, when the great composer was asked to explain the secret of his success he replied “I was obliged to be industrious. Whoever is equally industrious will succeed equally well”. I respectfully disagree. First, Bach didn’t have to drive the twenty kids he fathered over his lifetime to soccer practices, arrange their playdates, prepare and cook […]

Diverse Approaches to Rare Disease Genetics

written 10 days ago by MassGenomics by Dan Koboldt

Among the many fields of research and medicine aided by next-generation sequencing, few have seen a greater impact than the study of rare inherited diseases. Hundreds (certainly) or thousands (probably) of new disease-gene relationships have been reported in the past several years, causing exponential growth in databases like HGMD and ClinVar. The growth is also evident […]

plexWell: Illumina Libraries by the Plateload

written 10 days ago by Omics! Omics! by Keith Robison

The advent of so-called next generation sequencers, particularly those from Illumina, has brought the price of sequence data down dramatically. However, there is a catch: the cost of preparing DNA to go into the sequencer, the process known as library preparation, has glided downwards on a much shallower trajectory. This means that for projects wishing to sequence very large numbers of small genomes or large constructs, the cost of library preparation can be similar to or even exceed the cost of data generation. A small company north of Boston called seqWell Inc™ has a new approach to Illumina library generation which they are on the cusp of making widely available, and not only does this bring the cost per well down but it is designed to yield normalized libraries from relatively unnormalized samples.

Friday SNPpets

written 13 days ago by The OpenHelix Blog

This week we find that all biology is computational biology. And that coding is missing. And I loved the knitted example of chromosomes–knitting is code. Also, some new misuse of data, and new appropriate uses. Get a fungus mug. Patients are going to be getting data, but nobody in the public knows about it. It’s […]

Registration reminder for our two-week summer workshop on high-throughput sequencing data analysis!

written 14 days ago by Living in an Ivory Basement by Titus Brown

Our two-week summer workshop (announcement, direct link) is shaping up quite well, but the application deadline is today! So if you're interested, you should apply sometime before the end of the day. (We'll leave applications open as long as it's March 17th somewhere in the world.) Some updates and expansions on the original announcement -- we'll be training attendees in high-performance computing, in the service of doing bioinformatics analyses. To that end, we've received a large grant from NSF XSEDE, and we'll be using JetStream for our analyses. We have limited financial support that will be awarded after acceptances are issued in a week. Here's the original announcement below: ANGUS: Analyzing High Throughput Sequencing Data. June 26-July 8, 2017. University of California, Davis. Zero-entry - no experience required or expected! Hands-on training in using the UNIX command line to analyze your sequencing data. Friendly, helpful instructors and TAs! Summer sequencing camp - meet and talk science with great people! Now in its eighth year! The workshop fee will be $500 for the two weeks, and on-campus room and board is available for $500/week. Applications will close March 17th. International and industry applicants are more than welcome! Please see for more information, and contact if you have questions or suggestions. --titus

ONT Updates: GridION X5, PromethION, 1D^2, Scrappie, FPGAs and More

written 15 days ago by Omics! Omics! by Keith Robison

Clive Brown gave a webcast today with updates on a number of Oxford Nanopore topics, but clearly the flagship announcement was a new instrument, GridION X5. Due to the raging snowstorm in the Boston area I was home with my teammate and we've been doggedly going through the tweets (now storified) and my notes (plus David Eccles' nice set) to retrieve the juiciest bones therein. "Blog team member intently watching @Clive_G_Brown webcast - now must confer & write-up impressions" (Keith Robison, @OmicsOmicsBlog, March 14, 2017)

An update to the nhmrcData R package

written 15 days ago by What You're Doing Is Rather Desperate by Neil Saunders

Just pushed an updated version of my nhmrcData R package to GitHub. A quick summary of the changes: in response to feedback, added the packages required for vignette building as dependencies (Imports) – commit; added 8 new datasets with funding outcomes by gender for 2003 – 2013, created from a spreadsheet that I missed first […]

Down for essential maintenance

written 16 days ago by Ensembl Blog

All Ensembl websites (the main site at, the GRCh37 site, Pre! and archives) will be down for around 30-45 min at 14.00 GMT on the 15th March 2017. This is to allow for essential database migration. The mirror sites […]

Oxford Nanopore Updates on GridION and PromethION storified

written 16 days ago by Next Gen Seek

[View the story “GridION X5 – The Sequel” by Clive G Brown, CTO, Oxford Nanopore Technologies, on Storify]

Friday SNPpets

written 20 days ago by The OpenHelix Blog

This week I gave a pub talk on the UCSC Genome Browser. It was the first time I’d tried a more general-public version of this. It was huge fun. And it was great timing to have this example of 5000+ samples from autism families to make the case about how hard it is to visualize […]

A draft bit of text on open science communities

written 21 days ago by Living in an Ivory Basement by Titus Brown

This is early draft text that Anita and I put together from a bunch of brainstorming done at the Imagining Tomorrow's University workshop. Comments welcome! Communities are the fabric of open research, and serve as the basis for development and sharing of best practices, building effective open source tools, and engaging with researchers newly interested in practicing open research. Effective communities often emerge from bottom-up interactions, and can serve as a support network for individual open researchers. A few points: These communities can consist of virtual clusters of like-minded individuals; they can include scholars, librarians, developers and tech staff or open research advocates at all levels of experience and with different backgrounds; the communities themselves can be short-lived and focused on a specific issue, tool, or approach, or they can have more long-term goals and aspirations. A key defining feature of these groups is that the principles of open science permeate their practice, meaning they are inherently inclusive, and aim to open up the process of scholarly exploration to the widest possible audience. We recommend that all stakeholders take steps to create an ecosystem that encourages these communities to develop. This means supporting common standards, funding "connective tissue" between different efforts, and sharing practices, tools, and people between communities. After collecting a series of narratives on effective and intentional approaches to creating, growing, and nurturing such communities, we recommended the following actions for different stakeholders to support the formation of adaptive and organic, bottom-up, distributed and open research communities: ...

A new 1st semester bachelor course “Introduction to Computational Modelling for the Biosciences”

written 22 days ago by In between lines of code by Lex Nederbragt

As part of the bachelor-studies reform here at the University of Oslo, the Institute of Biosciences, where I work, is reorganising its bachelor curriculum. One exciting part is the implementation of the Computing in Science Education (CSE) project into the different subjects. The goal of the CSE project is to make calculations/computing an integral part […]

The nhmrcData package: NHMRC funding outcomes data made tidy

written 22 days ago by What You're Doing Is Rather Desperate by Neil Saunders

Do you like R? Information about Australian biomedical research funding outcomes? Tidy data? If the answers to those questions are “yes”, then you may also like nhmrcData, a collection of datasets derived from funding statistics provided by the Australian National Health & Medical Research Council. It’s also my first R package (more correctly, R data […]

MinION Leviathan Reads: An Update

written 22 days ago by Omics! Omics! by Keith Robison

Last week I posted a piece on some amazing new nanopore data, only to be red-faced to discover the next morning that I had misread the axes. So I re-posted the piece with the offending data and subsequent analysis in strike-thru font. After I did that, I was informed that the same dataset actually did have leviathan reads, bigger than my misinterpretation.

HTML vignettes crashing your RStudio? This may be the reason

written 24 days ago by What You're Doing Is Rather Desperate by Neil Saunders

Short version: if RStudio on Windows 7 crashes when viewing vignettes in HTML format, it may be because those packages specify knitr::rmarkdown as the vignette engine instead of knitr::knitr, and you’re using rmarkdown v1. Longer version with details – read on. Update: looks like this issue relates to the installed version of rmarkdown (1.3 in my […]
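
To see which vignettes declare which engine, one could scan their source files for the engine declaration. The sketch below is a minimal Python illustration, assuming the vignette declares its engine with the usual `%\VignetteEngine{...}` line; the helper name `vignette_engine` and the sample text are hypothetical, and the check says nothing about whether a given setup will actually crash.

```python
import re

# Matches a declaration such as: %\VignetteEngine{knitr::rmarkdown}
_ENGINE_RE = re.compile(r"%\s*\\VignetteEngine\{([^}]+)\}")


def vignette_engine(source_text):
    """Return the declared vignette engine, or None if none is declared."""
    match = _ENGINE_RE.search(source_text)
    return match.group(1).strip() if match else None


# Hypothetical R Markdown vignette header used only for illustration.
sample = """---
title: "Example"
vignette: >
  %\\VignetteIndexEntry{Example}
  %\\VignetteEngine{knitr::rmarkdown}
---
"""

print(vignette_engine(sample))  # prints: knitr::rmarkdown
```

Running the helper over each file in a package's `vignettes/` directory would give a quick list of which ones use knitr::rmarkdown.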

Instructor training at the 2017 Data Intensive Biology Summer Institute at UC Davis

written 24 days ago by In between lines of code by Lex Nederbragt

[Adapted from Titus Brown’s blog post] Titus Brown has been so kind as to invite me to co-instruct this week-long workshop (thanks!). So I thought to make a bit of a commercial for it: Are you interested in Getting started with, or getting better at, teaching the Analysis of High Throughput Sequencing Data Hands-on training in […]

New Insights into Long Noncoding RNAs

written 27 days ago by MassGenomics by Dan Koboldt

As long-time readers of MassGenomics probably know, I’m fascinated by studies that interrogate functional elements of the human genome. The ENCODE Project is perhaps the most visible consortium effort in the United States, employing a variety of high-throughput genomic technologies such as RNA sequencing (expression), DNase I sequencing (open chromatin), and ChIP-seq (DNA-protein interactions). However, […]

Friday SNPpets

written 27 days ago by The OpenHelix Blog

This week’s list of interesting tidbits is a mix of the promise and the peril. So much data, so much drama on how to move forward with it. But, as our focus is largely software tools, there are some useful tools as well. It’s not obvious from the tweet, but the “Evolving Health Care” article […]
