Lynn Langit is a consulting cloud architect who holds recognitions from all three major cloud vendors on her contributions to their respective communities. On today’s podcast, Wes talks with Lynn about a concept she calls 25% time and a project it led her to become involved within genomic research. 25% time is her own method of learning while collaborating with someone else for a greater good. A recent project leads her to become involved with the Commonwealth Scientific and Industrial Research Organisation (CSIRO) in Australia. Through cloud adoption and some lean startup practices, they were able to drop the run time for a machine learning algorithm against a genomic dataset from 500 hours to 10 minutes.
Why listen to this podcast:
- 25% time is a way to learn, study, or collaborate with someone else for a greater good. It’s unbilled time in the service of offers. Using the idea of 25% time along with some personal events that occurred in her life, Lynn became involved with genomic researchers in Australia.
- Price of genomic sequencing has dropped. The price drop has enabled researchers to create huge repositories of genomic data; however, it was mostly on-prem. The idea of building data pipelines was pretty new in the genome community. Additionally, the genome itself is 3 billion data points. A variant of as little at 10-15 variants can be statistically significant.
- The challenge was to leverage cloud resources. To gain a quick win and buy-in for Commonwealth Scientific and Industrial Research Organisation (or CSIRO an independent Australian federal government agency) for cloud adoption, a first step was to capture interest in the idea. So the team stored their reference data in the cloud and enabled access via a Jupyter Notebook.
- They demonstrated a use case against the genomic data set leveraging a synthetic phenotype (or a fake disease) called hipsterdom. The solution became a basis for global discussion that got more people involved in the community.
- By leveraging cloud resources, the CSIRO was able to get a run their dataset that took 500 hours against an on-prem Spark cluster to 10 minutes.
- Learning new programming language has unseen benefits. For example, Ballerina (a language written as an integration language between APIs) interested Lynn because of its live visual diagrams; however, benefited her with some of the cloud pipelines because of its ability to produce YAML files.
You can also subscribe to the InfoQ newsletter to receive weekly updates on the hottest topics from professional software development. bit.ly/24x3IVq
Like InfoQ on Facebook: bit.ly/2jmlyG8
Follow on Twitter: twitter.com/InfoQ
Follow on LinkedIn: www.linkedin.com/company/infoq
Check the landing page on InfoQ: https://bit.ly/2T2LZBQ