425 results • Page 2 of 15
La Cava W, Williams H, Fu W, Vitale S, Srivatsan D, Moore JH. Evaluating recommender systems for AI-driven biomedical informatics. Bioinformatics. 2020 Aug 7:btaa698. doi: 10.1093/bioinformatics/btaa698. Epub ahead of print. PMID: 32766825. [PubMed] [Bioinformatics] Abstract. Motivation: Many researchers with domain expertise are unable to easily apply machine learning to their bioinformatics data due to a lack of machine learning and/or coding expertise. Methods that have been proposed thus far to automate machine learning mostly require programming experience as well as expert knowledge to tune and apply the algorithms correctly. Here, we study a method of automating biomedical data science using a web-based platform that uses AI to recommend model choices and conduct experiments. We have two goals in mind: first, to make it easy to construct sophisticated models of biomedical processes; and second, to provide a fully automated AI agent that can choose and conduct promising experiments for the user, based on the user's experiments as well as prior knowledge. To validate this framework, we experiment with hundreds of classification problems, comparing to state-of-the-art, automated approaches. Finally, we use this tool to develop predictive models of septic shock in critical care patients. Results: We find that matrix factorization-based recommendation systems outperform meta-learning methods … go to blog
Li R, Chen Y, Ritchie MD, Moore JH. Electronic health records and polygenic risk scores for predicting disease risk. Nat Rev Genet. 2020 Aug;21(8):493-502. doi: 10.1038/s41576-020-0224-1. Epub 2020 Mar 31. PMID: 32235907. [PubMed] [Nature Reviews] Abstract. Accurate prediction of disease risk based on the genetic make-up of an individual is essential for effective prevention and personalized treatment. Nevertheless, to date, individual genetic variants from genome-wide association studies have achieved only moderate prediction of disease risk. The aggregation of genetic variants under a polygenic model shows promising improvements in prediction accuracies. Increasingly, electronic health records (EHRs) are being linked to patient genetic data in biobanks, which provides new opportunities for developing and applying polygenic risk scores in the clinic, to systematically examine and evaluate patient susceptibilities to disease. However, the heterogeneous nature of EHR data brings forth many practical challenges along every step of designing and implementing risk prediction strategies. In this Review, we present the unique considerations for using genotype and phenotype data from biobank-linked EHRs for polygenic risk prediction. go to blog
Almost done with my J.P. Morgan summaries -- this will be the last focused on a specific company: nanoString. They wish to emphasize that they are becoming the company for spatial analysis of DNA, RNA and proteins in biological samples. They also want us to differentiate that space into two segments: profiling and imaging. Profiling gathers spatial information from regions of multiple cells; imaging in their lingo covers spatial techniques with single cell or subcellular localization. In both cases nanoString is betting heavily on oligo-tagged antibodies to enable deep multiplexing of protein detection to be integrated with RNA and DNA detection. Read more » go to blog
Genapsys' J.P. Morgan presentation by CEO Hesaam Esfandyarpour focused on their story of delivering a compact sequencer based on electronic detection that offers low capital, low cost sequencing. There were two bits of specific product news, but mostly general painting of a rosy picture. Read more » go to blog
Snakemake checkpoints are awesome go to blog
PacBio CEO Christian Henry’s presentation at J.P. Morgan wasn't rich in technical specifics. But he gave a very bullish portrait of a company aiming for the stars. A conflict reminder: he’s a member of the Board of the Strain Factory that employs me, though I haven’t yet had the pleasure of meeting him. The biggest news is a broad partnership with Invitae for clinical human genome sequencing. The only specific here is that this is not the whole enchilada; platform development will take place both within the Invitae collaboration and outside it. What might that development be? Between Henry’s comments in the Q&A and a few info crumbs on slides, there will be pushes to further tune all the canister. He mentioned efforts on dyes and further improving SMRTcell loading efficiency. There was chatter on Twitter about an overdue update to improve HiFi yields. Henry talked of the importance of increasing ZMW packing, but gave no specifics other than to suggest this is more "development" than "innovation" -- this was in response to a question asking if technical breakthroughs are required. But we are left wondering on a timetable as well as what the next density might be; four-fold to 32M wouldn’t be … go to blog
We have noted that the time between new datasets appearing on SRA and being processed by DEE2 has been about 3 to 6 months. Our dream is to shrink this down to two weeks, but we simply do not have access to that much compute power at the moment. To address this we have devised an "on-demand" feature so that you can request certain datasets to be processed rapidly. We think this is a great feature because it serves the main mission of the DEE2 project, which is to make all RNA-seq data freely available to everyone. Here's how to use it:
1. Visit http://dee2.io/request.html and you will be greeted with a webform. Select the organism of interest.
2. Provide the SRA project accession number of the dataset. These numbers begin with SRP/ERP/DRP. If you have a different type of accession such as GEO Series (GSE) or Bioproject (PRJNA) then you will need to navigate NCBI to find the SRP number.
3. Check that the SRP number is in the standard DEE2 queue. To do that, follow the link provided above the request web form, click on the queue that corresponds to the organism of interest and use ctrl+f to search … go to blog
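Before filling in the request form, it can help to confirm that the accession in hand is an SRA project accession and not a GEO Series or BioProject ID. The sketch below is a hypothetical helper, not part of DEE2: the function name `is_sra_project` is an assumption, and it checks only the SRP/ERP/DRP prefix followed by digits, as described in step 2. It does not query SRA or the DEE2 queue.

```shell
# Hypothetical helper (not a DEE2 tool): print "yes" if the argument looks like
# an SRA project accession (SRP/ERP/DRP followed only by digits), else "no".
is_sra_project() {
  case "$1" in
    [SED]RP*[!0-9]*) echo no ;;   # prefix present, but a non-digit follows it
    [SED]RP[0-9]*)   echo yes ;;  # prefix followed by one or more digits
    *)               echo no ;;   # anything else, e.g. GSE or PRJNA accessions
  esac
}

is_sra_project SRP012345   # an SRA project accession
is_sra_project GSE80251    # a GEO Series accession; look up its SRP at NCBI first
```

A "no" here only means the string has the wrong shape; a "yes" does not guarantee that the project exists or that it is already in the queue.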
As I attempt to collate various incomplete thoughts about the J.P. Morgan presentations I have read and listened to from genomics instrument shops, one thing stands out about 10X Genomics: they actually announced new gadgets and kits! I should thank the company for supplying the slides after I snarked on Twitter about how they weren't archived in the J.P. Morgan webcast -- but now they are there. So either my eyes failed again or I had a personal IT failure (I think the website doesn't like iOS and I may have forgotten that). The slides were presented by CEO Serge Saxonov. Read more » go to blog
Illumina presented at J.P. Morgan on Monday, reminding us that they aren't just a sequencing instrument company but an interlocking set of businesses focused on genomics. CEO Francis deSouza spent much of his time discussing the Grail acquisition and some of the other ways in which Illumina is pushing rapidly to become an essential part of clinical medicine, but there was one slide on future improvements to sequencing technology and a few on the lineup of existing sequencers. Reminder: I'm working off public sources, as during the day we work closely with Illumina and they even sank some serious cash into my employer last May. Read more » go to blog
The J.P. Morgan Healthcare Conference has started this morning in virtual form, so I'd really better get this draft cleaned up and out (indeed, Roche is presenting as I hurriedly type, though about pharma not diagnostics). 2021 already feels like a darker continuation of 2020, between the appalling putsch attempt in my nation's center of government last Wednesday and the still buggy roll-out of the coronavirus vaccine. As I noted in my piece on the Oxford Nanopore Community Meeting, the many disruptions of 2020 make grading the progress of companies essentially impossible: many were disrupted by lockdowns, supply chain issues and the general distraction from the year of doomscrolling. Read more » go to blog
Honestly, I didn’t know what minigraph would be good for when I was writing the code. When I was writing the paper, I pitched minigraph as a fast caller for structural variations (SVs). However, except for performance and convenience, minigraph is not that special. In fact, in the paper, minigraph is not as good as read-based SV callers because it randomly misses one parental allele when most assemblies in the paper are not phased. My exploration took a turn when one anonymous reviewer asked me to check the LPA gene. It was not in the graph because the gene was collapsed or missed in all input assemblies. Fortunately, I had several phased hifiasm assemblies at hand. LPA is there and minigraph generates a complex subgraph (figure below) far beyond the capability of VCF. Then I realized what minigraph is truly good for: complex SVs. With the current SV calling pipelines, we typically map reads or an assembly against a reference genome, call SVs and then merge pairwise SV calls into a multi-sample call set. This sounds simple but doesn’t work well for complex events. First, the position of an SV may be shifted by small variants. We have to heuristically group … go to blog
Bit of a thread for some updates to the DEE2 data set. It's a resource of uniformly processed RNA-seq data free to use under a permissive GPL3 licence. Find it at http://dee2.io Yesterday a batch of 117k human runs was uploaded. This brings the total number of runs to 1,298,581. To my knowledge this is the largest such data set in the world. This is 10x larger than our first release in 2015 (125k runs)! The number of SRA projects with completed data analysis bundles is 32692. Accessible here: http://dee2.io/huge/ The getDEE2 package is the recommended way to access this data if you are familiar with R. You can access individual runs or entire data bundles with the various functions. The pkg is part of the latest BioC release out today https://www.bioconductor.org/packages/devel/bioc/html/getDEE2.html The button for redirecting DEE2 data directly to Degust is broken and we are looking for a fix. For now you will need to download the data to the disk and upload to the Degust webpage: https://degust.erc.monash.edu/ Search capability has always been a bit underwhelming with the DEE2 web interface so we are working on a more modern approach to this in the next couple of months. We always hoped that DEE2 data could … go to blog
A bunch of coding types at the Strain Factory participated in The Advent of Code, a clever 24-day set of programming challenges that runs each year before Christmas. Each day a new two-part programming challenge was posted. Technically it is a speed contest, but you won't find me on the public leaderboard as I'm not nearly quick enough to ever rate a point there. One of my major official activities last month was contributing towards screening candidates for three different computational positions, one of which we threw open to general data science experience. As a result, I've been thinking far too much about the FizzBuzz problem and my prejudices towards it. Read more » go to blog
Ever since the community meeting I've been toying with an idea, then never quite trying to code it. So on New Year's Eve I started getting the dataset together and reducing it to a bunch of dataframes, and today I pushed that a bit further and started graphing some of it. It's very much a rough project -- some of the dataframes have some issues I'm still chasing down with redundant data not being initially collapsed, but I think the data is accurate. I also think I have my conventions consistent -- at one point I confused myself into inverting the labels on the plots! In other words, ApG would be labeled GpA -- not good! There are already some intriguing patterns, which are presumably the sort of signal tools like Medaka use to polish assemblies from FASTQ data aligned to draft references. Read more » go to blog
Why? It is usually easy to evaluate the contiguity of a de novo assembly – just compute N50. It is much harder to evaluate the correctness. We typically identify misassemblies by aligning contigs to a reference genome. However, it is tricky to interpret the results. In the case of human, there are thousands of structural variations (SVs) between the reference and the sample being assembled. Alignment-based evaluation often mistakes these SVs for misassemblies. For example, QUAST identifies >10,000 “misassemblies” in the T2T assembly when compared to GRCh38. We can’t reliably tell misassemblies from SVs, which leads to an overestimated misassembly rate. A second problem with reference-based alignment is that most alignment differences come from complex regions such as centromeres and subtelomeres. It fails to evaluate the gene regions we are most interested in; on the contrary, it penalizes an assembly that represents these complex regions better. How? Most assembly problems are caused by repetitive or paralogous regions. When an assembler cannot resolve such a region, it either creates an assembly gap or forces through the region with a misassembly. To probe these issues, we can align a multi-copy gene to the assembly and see if it remains multi-copy. More precisely, we do the … go to blog
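For readers who have not computed N50 by hand, here is a minimal sketch of the "just compute N50" step mentioned above. It assumes the usual definition (the length of the contig at which the running total of contig lengths, sorted from longest to shortest, first reaches half of the total assembly length) and takes one contig length per line on standard input. The function name `n50` is an illustration only and is not taken from the post.

```shell
# n50: read one contig length per line on stdin and print the N50.
n50() {
  sort -rn | awk '
    { len[NR] = $1; total += $1 }        # store lengths (longest first) and sum them
    END {
      half = total / 2
      for (i = 1; i <= NR; i++) {
        running += len[i]                # cumulative length so far
        if (running >= half) {           # first contig to reach half the total
          print len[i]
          exit
        }
      }
    }'
}

# Six contigs totalling 290 bp; the running sum passes 145 at the second contig.
printf '%s\n' 100 80 50 30 20 10 | n50
```

Contig lengths can come from any source, for example the second column of a FASTA index. N50 measures contiguity only, which is why the post goes on to look for a separate way to assess correctness.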
Sometimes we need to extract data from an Excel spreadsheet for analysis. Here is one approach using the ssconvert tool. If this isn't installed on your Linux machine then you can most likely get it from the package repository. On Debian and Ubuntu, ssconvert ships as part of the gnumeric package:
$ sudo apt install gnumeric
Then if you want to extract a spreadsheet file into a tsv it can be done like this:
$ ssconvert -S --export-type Gnumeric_stf:stf_assistant -O 'separator="'$'\t''"' SomeData.xlsx SomeData.xlsx.tsv
You will notice that all the sheets are output to separate tsv files. This approach is nice as it can accommodate high throughput screening, as I implemented in my Gene Name Errors paper a while back. Here is an example of obtaining some data from GEO.
$ # first download
$ curl 'https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE80251&format=file&file=GSE80251%5Fprocessed%5FRNA%5Fexpression%5Fmnfyap%2Exlsx' > GSE80251.xlsx
$ # now extract
$ ssconvert -S --export-type Gnumeric_stf:stf_assistant -O 'separator="'$'\t''"' GSE80251.xlsx GSE80251.xlsx.txt
$ head -5 GSE80251.xlsx.txt.0 | cut -f-4
gene	Untreated.AVE	Erythromycin.AVE	Clindamycin.AVE
aaaD	464.2738789413636	216.166053873777	908.8802442142787
aaaE	864.2227500897734	561.77470127908	662.8782261191445
aaeA	4380.496862018132	2618.7642171454263	5816.7951770285545
aaeB	7846.3239697416175	7792.316476955105	10312.333492435688
Further reading: readxl is an R package designed to import Excel data into R https://readxl.tidyverse.org/ go to blog
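Because `-S` writes one file per sheet, a quick look at what was produced is often the next step. The sketch below assumes the naming seen in the example above, where the output name gains a numeric suffix per sheet (`GSE80251.xlsx.txt.0` and so on). `summarize_sheets` is a hypothetical helper name, not part of ssconvert.

```shell
# summarize_sheets PREFIX: for every file named PREFIX.<something>, print the
# file name and its line count, separated by a tab.
summarize_sheets() {
  prefix="$1"
  for f in "$prefix".*; do
    [ -f "$f" ] || continue                       # skip if the glob matched nothing
    printf '%s\t%s\n' "$f" "$(wc -l < "$f" | tr -d ' ')"
  done
}

# Example call, assuming the extraction above has already been run:
# summarize_sheets GSE80251.xlsx.txt
```

This only counts lines. For a screening run over many spreadsheets, the same loop is a natural place to add a per-sheet check.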
Researching your family history is a fascinating experience. Even if you find out that your ancestors led very normal lives. Just being able to see the name and life milestones of your great-great-great-great-great (etc) grandmother is exciting. It’s amazing to be able to track your ancestry. But it can also be a confusing process. This … Best Genealogy Software Read More » go to blog
I realized a few Oxford Nanopore announcements too late that I should have tried to log all their predictions with a date made so I could track carefully any delays or quiet disappearances from the new feature lineup. If I had done that, this year would have presented an even worse conundrum: how do you score progress in a year of constant disruptions? Like many companies in the sequencing field, at least some of that disruption has been a diversion of attention and resources to fighting the pandemic. For ONT that is largely supporting the ARTIC viral genome sequencing and also developing LamPORE diagnostics. Read more » go to blog
In May 2019, it was reported in an article that over ten thousand sequence mismatches were observed between messenger RNA and DNA from the same individuals. More recently, three technical comments were published by Science surrounding this article. It was concluded that at least 90% of the Li et al. RDD sites are technical artifacts. … Questioning the Evidence for Non-Canonical RNA Editing in Humans Read More » go to blog
A little while back, Razib Khan used data from 23andMe to explore his family’s genetic history. He previously published his findings and summarized them. Today, I’m going to fill you in on what he had to say! Khan was interested in genetics, anthropology, and history, mainly how we have changed the way lineages are marked … My Personal Genome: What Razib Khan Had to Say Read More » go to blog
Alzheimer’s disease is a form of dementia that impacts the brain and results in memory loss. In recent years we have been able to study it and its risk factors to calculate the chances of you being affected by the disease. Today we are going to look at how that risk is calculated. Most people … Calculating your Alzheimer’s Risk Read More » go to blog
It’s been over eight years since Oxford Nanopore presented the first ever nanopore sequencing data at the AGBT conference in February 2012, where they provided an overview of the hardware and software behind the GridION and MinION systems. Even today Oxford Nanopore could be seen as a dark horse. Their GridION platform is used … Cluster Sequencing with Oxford Nanopore’s GridION System Read More » go to blog
At the annual Advances in Genome Biology and Technology (AGBT) conference held in Florida during 2012, there were many exciting announcements and developments in the world of DNA sequencing technology. An especially cool piece of news came from the team at Oxford Nanopore, the stars of our piece on cluster sequencing, about their (then) brand … Making Sequencing Simpler With Nanopores Read More » go to blog
It is usually thought that we can confidently say that if our genotyping results say that we carry a certain genetic variant, that we actually do carry that variant. However, why does this not mean that we can be confident about the prediction about disease risks? There are many risks and benefits associated with population … How Well Can a Screening Test Predict Disease Risk? Read More » go to blog
The common notion running through molecular biology is that the information present in DNA is transferred to RNA and then to protein. Back in 2010, researchers made a potentially ground-breaking observation. They found that within any given individual, there are tens of thousands of places where transcribed RNA does not match the template … Notes on the Evidence for Extensive RNA Editing in Humans Read More » go to blog
I agreed to make my 23andMe genotyping results available publicly as part of GNZ because I knew that the results were slightly dull, and I’m not majorly at high or low risk for any diseases. I was also very unsurprised to find out that I have blue eyes and that I was identified to … Testing Possibilities About My Ancestry Read More » go to blog
This was once a guest post by Karol Estrada, who was a postdoctoral research fellow in the Analytic and Translational Research Unit at Massachusetts General Hospital and the Broad Institute of MIT and Harvard. It was written in memory of Laura Riba. We have briefly summarised her thoughts and findings from that post below. Karol … A Rare Variant in Mexico with Far-reaching Implications Read More » go to blog
I reported on the AGBT 2020 final talk a few centuries ago -- or at least it seems like that given how quickly the world went to hell just after that -- by BGI in which Rade Drmanac showed off a system which I described as a deconstructed sequencer -- an integrated set of plate handling robots, liquid handlers and imagers which dipped the slides into reservoirs of reagents instead of flowing them through a flowcell. Now BGI has a preprint on BioRxiv which takes this idea a bit further, changing out the reagent tanks for a polymer film on which a thin layer of reagent is distributed, which is then pressed gently against the slide surface to deliver the reagent to the DNA Nanoball (DNB) array. The preprint is filled with eye-popping numbers -- Petabase sequencing! Read more » go to blog
1
Engineering student #1: “Where did you get this great bike?”
Engineering student #2: “Well, I was walking yesterday, minding my own business, when a beautiful woman rode up on this bike, threw it to the ground, took off all her clothes and said, ‘Take what you want’.”
Engineering student #1: “Good choice: The clothes probably wouldn’t have fitted you anyway.”
2
A priest, a physician, and an engineer are waiting for a particularly slow group of golfers to finish their game.
All: “What’s up with those guys? We’ve been waiting for a very long time!”
Golf Official: “That’s a group of blind firemen. They lost their sight saving our clubhouse from a fire last year, so we always let them play for free anytime!”
The group fell silent for a moment.
Priest: “That’s so sad. I’ll say a special prayer for them tonight.”
Physician: “I’ll contact my ophthalmologist and see if there’s anything she can do for them.”
Engineer: “Why don’t they play at night?”
3
An engineer was crossing a road one day, when a frog called out to him and said, “If you kiss me, I’ll turn into a beautiful princess.”
He bent over, picked up the frog, and put it in his pocket.
The frog spoke up again and said, “If … go to blog