Hail on Cloudera QuickStart
Unlock the value of your genomics data with big data capabilities
With the rise of more affordable cloud storage and the advent of federally funded initiatives like All of Us (previously known as the Precision Medicine Initiative, or PMI), health and life sciences organizations such as biotechs, pharmaceutical companies, research institutions, and providers are increasingly looking for ways to process and analyze massive amounts of genomic data. However, legacy approaches to genomic processing, data integration, and analysis can be costly, time-intensive, and cumbersome at this scale.
To address this need, MetiStream, in partnership with Cloudera, has developed an implementation offering for Hail (https://hail.is/ and https://github.com/hail-is/hail) built on Cloudera’s CDH big data platform, which incorporates Apache Spark. Hail is an open-source genomic processing tool designed by a team of researchers from the Broad Institute of MIT and Harvard and the Analytic and Translational Genetics Unit of Massachusetts General Hospital. With a suite of built-in tools for quality control, genomic annotation, and statistical analysis, Hail helps you quickly draw insights from massive amounts of data. Our solution also lets customers pass their genomic data through to Cloudera’s Spark environment, where the Machine Learning (ML) packages designed for Apache Spark become available. As a result, this service offering gives customers a comprehensive, fast, and cost-efficient framework for supporting the downstream whole-genome pipeline.