Author: John O’Brien, CEO, Radiant Advisors
Editor: Lindy Ryan, Research Director, Radiant
The Definitive Guide to the Data Lake
It would be an understatement to say that the hype surrounding the data lake is causing confusion in the industry. Perhaps this is an inherent consequence of the data industry’s need for buzzwords: it is not uncommon for a term to rise to popularity long before it has a clear definition and repeatable business value. We have seen this phenomenon many times, when concepts including “big data,” “data reservoir,” and even the “data warehouse” first emerged in the industry. Today’s newcomer to the data world vernacular, the “data lake,” is a term that has endured both the scrutiny of pundits who harp on the risk of digging a data swamp and the vision of those who see the concept’s potential to have a profound impact on enterprise data architecture. As the data lake term begins to come off its hype cycle and face the pressures of pragmatic IT and business stakeholders, the demand for clear data lake definitions, use cases, and best practices continues to grow.
This paper aims to clarify the data lake concept by combining fundamental data and information management principles with the experiences of existing implementations to explain how current data architectures will transform into a modern data architecture. The data lake is a foundational component and common denominator of the modern data architecture, enabling and complementing specialized components such as enterprise data warehouses, discovery-oriented environments, and highly specialized analytic or operational data technologies within or external to the Hadoop ecosystem. The data lake has therefore become the metaphor for the transformation of enterprise data management, and its definition will continue to evolve according to the principles, drivers, and best practices that will quickly emerge as companies apply hindsight to their implementations.