Talk: Building a Successful Data Lake

Track: Engineering Applied to Machine Learning

Room: Room 4

Time: 4:05pm - 4:50pm

Day of the week: Monday

Level: Intermediate

Persona: Architect, Data Scientist, Developer/Programmer, Senior Developer, Technical Lead

Presentation in English

Key Takeaways

  • What makes a successful data lake 
  • Organizing a data lake to enable self-service analytics
  • Governing the lake: access control, de-identification, regulatory compliance (GDPR, CCPA, etc.)
  • Keeping track of what’s in the lake: data catalogs, lineage, access control
  • Architecting a data lake: cloud, on-premises, hybrid and logical


Companies are investing in building data lakes to support analytics and data science initiatives, but many of these lakes end up as data swamps: expensive, yet largely unused and unusable. This talk assumes a basic understanding of data lake and big data principles and focuses on how to avoid building a data swamp by applying best practices for governed self-service. It is based on the recent O’Reilly book “The Enterprise Big Data Lake: Delivering the Promise of Big Data and Data Science” and on discussions with dozens of data lake teams about what worked for them and what did not.

Speaker: Alex Gorelik

Founder and CTO at Waterline Data

Alex is the founder and CTO of Waterline Data, a developer of an AI-driven data catalog. Prior to Waterline Data, Alex served as SVP and General Manager of Informatica’s Data Quality Business Unit, driving R&D, Product Marketing, and Product Management for an $80M business. Alex joined Informatica from IBM, where he was an IBM Distinguished Engineer on the InfoSphere team. IBM acquired Alex’s second startup, Exeros (currently marketed as InfoSphere Discovery), where he was founder, CTO and VP of Engineering.
Previously, Alex was co-founder, CTO and VP of Engineering at Acta Technology, a pioneering ETL and EII company acquired by Business Objects and now marketed as SAP Data Services. Prior to founding Acta, Alex managed the development of Replication Server at Sybase and worked on Sybase’s strategy for enterprise application integration (EAI).
Alex is a frequent speaker at industry conferences and the author of “The Enterprise Big Data Lake,” published by O’Reilly.
Alex holds a B.S. in Computer Science from Columbia University School of Engineering and an M.S. in Computer Science from Stanford University.
