The Pine Biotech team has been working on several educational projects, each of which follows a publication and extracts a portion of its raw data for anyone to analyze. Using the beta version of the T-BioInfo platform, one can run the whole process in just under an hour and experience the power of machine learning for bioinformatics data analysis.
We have four featured projects on our website, which can be viewed at https://edu.t-bio.info/projects/
PDX Models: Tumor-stroma interaction
This project is based on the results reported in Bradford et al., 2016, “Whole transcriptome profiling of patient-derived xenograft models as a tool to identify both tumor and stromal specific biomarkers”. The tumor stroma, or the microenvironment around a tumor, is a complex system made up of endothelial cells, fibroblasts, immune cells and more. Its interactions with the tumor can enable many hallmarks of cancer, such as resisting cell death and metastasis. Patient-derived xenograft (PDX) models are human tumor samples implanted into immune-deficient mice. This project’s approach focused on comparing several different breast cancer types using RNA-seq and machine learning methods, and included 79 PDX mouse models with human primary tumors. The T-BioInfo platform was used to perform transcriptomic analysis of the raw sequence data and to apply a number of machine learning methods.
The identification of tumor-stroma crosstalk, or the complex cell signaling that occurs between tumor cells and their surrounding microenvironment, is a challenging problem. The experimental study performed by Bradford et al. provided the data that allowed an essential step forward in this direction. However, the analysis presented by Bradford et al. was limited to a study of one-to-one correlations. Furthermore, the study was compromised by an uncorrected batch effect. Subsequently, an alternative analysis of the experimental data provided deeper insight into the problem and identified new, biologically meaningful group-wise associations between tumor and stroma genes. A group of tumors was identified that appeared to be enriched for immune processes, and it was hypothesized that they were lymphomas associated with Epstein-Barr virus. This study is the first comprehensive analysis across PDX models that focused on identifying the specific stromal cell type, investigated the relationship between human tumor and mouse stroma, and identified specific biomarkers for both tumor and stroma.
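To make the idea of a one-to-one correlation study concrete, here is a minimal Python sketch. It is not the analysis performed by Bradford et al. or by the T-BioInfo platform; it only shows what correlating each tumor (human) gene with each stroma (mouse) gene, one pair at a time, looks like. The gene names and expression values are invented for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical expression values across five PDX samples (made-up numbers).
tumor_expr = {
    "HUMAN_GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "HUMAN_GENE_B": [5.0, 3.0, 4.0, 1.0, 2.0],
}
stroma_expr = {
    "mouse_gene_x": [2.0, 4.0, 6.0, 8.0, 10.0],
    "mouse_gene_y": [3.0, 1.0, 2.0, 5.0, 4.0],
}

def pairwise_correlations(tumor, stroma):
    """Correlate each tumor gene with each stroma gene, one pair at a time."""
    return {
        (t_gene, s_gene): pearson(t_vals, s_vals)
        for t_gene, t_vals in tumor.items()
        for s_gene, s_vals in stroma.items()
    }

correlations = pairwise_correlations(tumor_expr, stroma_expr)

# List gene pairs from strongest to weakest absolute correlation.
for pair, r in sorted(correlations.items(), key=lambda kv: -abs(kv[1])):
    print(pair, round(r, 3))
```

A pairwise view like this scores each tumor-stroma gene pair in isolation, so it cannot reveal associations that only appear when groups of genes are considered together, which is the gap the alternative analysis described above set out to address.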
Cell Lines: multi-omics network of associations to model precision treatment
Breast cancer can be subdivided into a number of subtypes. Six major subtypes, previously identified and documented, are considered particularly useful for prognosis and treatment strategy. These subtypes respond differently to chemotherapy and hormone treatments. Currently, doctors test for only a handful of molecular signatures, and the cancers of over 40% of patients do not fit into those categories.
Cell lines are often used in research as pre-clinical models: they make it possible to study cancer in the lab without human or animal subjects, by modeling interactions between the sample and various drugs and therapeutics. Breast cancer cell lines mirror breast cancer in a number of ways, including many of the cellular and molecular characteristics of tumors.
This project was inspired by Daemen et al., 2013, “Modeling precision treatment of breast cancer”, which focuses on over 70 different breast cancer cell lines and over 90 different therapeutic agents. The project included SNP array data (a type of microarray), RNA-seq (which looks at the whole transcriptome, that is, all of the genes expressed at a given point in time), exome-seq (exome capture, which sequences the protein-coding regions of the genome), and genome-wide methylation (epigenetics), and integrated a number of algorithmic methods to identify molecular features using advanced machine learning algorithms. The Biassociation algorithm was used to integrate a number of different omics data types, including RNA expression, cell mutations, and drugs, to find relationships and better understand how medications affect the breast cancer cells. This work developed predictive drug response signatures, and the research can be built upon with future clinical models. One limitation of the study is that a cell line panel does not capture features such as the tumor microenvironment, which is critical to understanding tumors.
CirSeq: Study of Evolution in Virology
Circular sequencing (CirSeq) is a genetic sequencing technique that was developed at UCSF by the Andino lab and has become known for its high accuracy. CirSeq provides accurate detection of virus genome mutations by identifying and correcting sequencing errors, and this highly accurate sequence data allows individual viral strains to be identified within a population. This in turn allows for a new genetic approach to studying the evolution of viruses within the context of their host. CirSeq works by creating a large number of copies of each individual virus through rolling circle replication. Because each individual virus is represented by a large number of copies, researchers can see whether that individual virus had a mutation. (If only one virus in a population has a mutation according to sequencing data, it is not clear whether that mutation is real or perhaps a sequencing error. If 100 copies of the virus all have that mutation, it is clear that it is indeed a mutation and not a sequencing error.)
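The reasoning in the parenthetical above can be sketched in a few lines of Python. This is only an illustration of the consensus idea, not the CirSeq software itself: the function names, the agreement threshold, and the toy sequences are all assumptions made for this example. Positions are reported 0-based.

```python
from collections import Counter

def consensus(copies, min_agreement=1.0):
    """Build a consensus sequence from repeated copies of one template.

    A base is called only if the most common base at that position reaches
    the agreement threshold; otherwise the position is masked as 'N'.
    """
    if not copies:
        raise ValueError("need at least one copy")
    if len({len(c) for c in copies}) != 1:
        raise ValueError("all copies must have the same length")
    called = []
    for column in zip(*copies):
        base, count = Counter(column).most_common(1)[0]
        called.append(base if count / len(column) >= min_agreement else "N")
    return "".join(called)

def call_mutations(reference, consensus_seq):
    """Return (position, reference_base, observed_base) for confident differences."""
    return [
        (pos, ref, alt)
        for pos, (ref, alt) in enumerate(zip(reference, consensus_seq))
        if alt != "N" and alt != ref
    ]

reference = "ACGTAC"  # toy reference, not a real viral sequence

# Every copy carries the same change: treated as a real mutation.
consistent_copies = ["ACGAAC", "ACGAAC", "ACGAAC"]
# Only one copy carries the change: treated as a sequencing error.
noisy_copies = ["ACGTAC", "ACGAAC", "ACGTAC"]

print(call_mutations(reference, consensus(consistent_copies)))
print(call_mutations(reference, consensus(noisy_copies)))
```

With the default threshold, a change is reported only when all copies agree; a looser setting such as `min_agreement=0.6` would instead take a majority vote, which in the noisy example recovers the reference base.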
The frequencies of the detected mutations were mapped over time across poliovirus passages, in which the viruses were allowed to infect multiple cells, and the rate and type of mutations that occurred were then estimated. The clustering methods suggest that the 2B segment of the polio genome carries mutations that are highly associated with fitness (better ability to survive, reproduce, etc.). Other clusters of mutations were found in the protease and polymerase regions of the polio genome, which are also important regions.
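As a rough illustration of what tracking mutation frequencies across passages involves, the sketch below computes a frequency trajectory for each mutation and flags the ones that rise at every passage. It is a deliberately crude stand-in, not the clustering method used in the project, and the mutation labels and read counts are invented.

```python
# Hypothetical data: for each mutation, (mutant_reads, total_reads) per passage.
observations = {
    "mutation_1": [(2, 1000), (10, 1000), (50, 1000), (200, 1000)],
    "mutation_2": [(5, 1000), (4, 1000), (6, 1000), (5, 1000)],
}

def frequency_trajectory(counts):
    """Fraction of reads carrying the mutation at each passage."""
    return [mutant / total for mutant, total in counts]

def is_rising(trajectory):
    """True if the frequency increases from every passage to the next."""
    return all(later > earlier for earlier, later in zip(trajectory, trajectory[1:]))

trajectories = {
    name: frequency_trajectory(counts) for name, counts in observations.items()
}
rising = [name for name, traj in trajectories.items() if is_rising(traj)]

for name, traj in trajectories.items():
    print(name, traj)
print("steadily rising:", rising)
```

A mutation whose frequency climbs passage after passage is a candidate for being beneficial to the virus, while one that hovers at a low level is more likely neutral or harmful; a real analysis would also need to account for sequencing depth and sampling noise.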
In the bigger picture, this could mean that changes in certain regions of the virus could prevent it from properly infecting a cell. Understanding the mutations that occur and how the different viruses are able to adapt gives researchers insight into potential drug or vaccine targets.
Microbiome of the “American Gut”
The human body is home to a diverse ecosystem of trillions of bacteria. Your digestive tract alone (including your small and large intestine) houses about 99% of your entire microbiome. In fact, the human body has more microbes than human cells. Some of these bacteria can make you sick, but the overwhelming majority are important for keeping you healthy and fending off infections. Sometimes, the same bacteria can do both. Recently, scientists began studying the correlation between the health of the gut microecosystem and other health issues, and found that gut bacteria are closely tied to other aspects of health. The American Gut Project used over 11,000 16S rRNA samples from people across the United States in its analyses.
Phenotypic attributes and geolocation allowed for examination of the association between the human microbiome and factors such as age, race, diet, and place of residence. From this large database, a subset of samples was selected. Each sample came from one of several body sites: gut, mouth, or skin. Sequences from the samples were mapped to a database of known microbial sequences to generate abundance tables, which record how many times each bacterial sequence from the database was found in each sample. Tables were generated for two sets of samples: skin and stool, and stool and mouth.
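The structure of an abundance table can be shown with a small Python sketch. It assumes the mapping step has already been done and that each read has been assigned to a taxon; the sample IDs and counts are invented, and the taxon names are used only as example labels. It is not the pipeline used in the project.

```python
from collections import Counter, defaultdict

# Hypothetical mapping results: (sample_id, taxon the read was assigned to).
assignments = [
    ("stool_01", "Bacteroides"),
    ("stool_01", "Bacteroides"),
    ("stool_01", "Faecalibacterium"),
    ("skin_01", "Staphylococcus"),
    ("skin_01", "Staphylococcus"),
    ("skin_01", "Bacteroides"),
]

def build_abundance_table(assignments):
    """Count reads per taxon in each sample; taxa absent from a sample get 0."""
    counts = defaultdict(Counter)
    for sample, taxon in assignments:
        counts[sample][taxon] += 1
    taxa = sorted({taxon for _, taxon in assignments})
    return {
        sample: {taxon: per_sample[taxon] for taxon in taxa}
        for sample, per_sample in counts.items()
    }

def relative_abundance(table):
    """Convert raw counts to per-sample fractions so samples are comparable."""
    result = {}
    for sample, row in table.items():
        total = sum(row.values())
        result[sample] = {taxon: count / total for taxon, count in row.items()}
    return result

table = build_abundance_table(assignments)
fractions = relative_abundance(table)

for sample, row in table.items():
    print(sample, row)
```

Each row is a sample and each column a taxon, so tables built this way for skin and stool samples, or for stool and mouth samples, can be compared directly. Converting counts to fractions is a common first step because samples rarely contain the same number of reads.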
To learn more about the educational initiative at Pine Biotech, or if you have an idea for a public domain data project, email us: firstname.lastname@example.org