LinkedIn has announced that it is open-sourcing its control plane for table management in data lake deployments.
The tool, called OpenHouse, has been in use at LinkedIn for the past year. The company currently has 3,500 OpenHouse tables in production.
It is designed to offer self-service table management in open data lakes. According to LinkedIn, the company lacked a well-managed experience for running a data warehouse, so end users often had to deal with low-level infrastructure issues, which took time away from working on their products.
“Overall, since launching OpenHouse, we’ve seen a drastic reduction in operational work for data infra teams, an improved developer experience for data infra users, and improved LinkedIn data management,” Sumedh Sakdeo, senior software engineer at LinkedIn and creator of OpenHouse, wrote in a blog post.
OpenHouse consists of a declarative catalog and a suite of data services. The catalog holds table definitions, their schemas, and associated metadata, and integrates with Apache Spark. It supports standard syntax such as SHOW DATABASES, SHOW TABLES, CREATE TABLE, ALTER TABLE, SELECT FROM, INSERT INTO, and DROP TABLE. The catalog is also where users specify retention, replication, and sharing policies for a table.
Another key element of OpenHouse is that it reconciles the observed state of each table with its desired state. This is where the data services come in: they are responsible for orchestrating the table maintenance jobs that close the gap.
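The reconciliation idea can be illustrated with a minimal sketch. This is not OpenHouse code: the class names, fields, and job names below are hypothetical and chosen only to show how a declared (desired) state might be compared with an observed state to decide which maintenance jobs to schedule.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DesiredState:
    """Policies a user declares for a table (hypothetical fields)."""
    retention_days: Optional[int] = None  # None means no retention policy
    replicas: int = 1


@dataclass(frozen=True)
class ObservedState:
    """What the system currently sees for the same table (hypothetical fields)."""
    oldest_partition_age_days: int
    replicas: int


def reconcile(table: str, desired: DesiredState, observed: ObservedState) -> List[str]:
    """Return the maintenance jobs needed to move observed state toward desired state."""
    jobs: List[str] = []
    # Data older than the declared retention window should be expired.
    if (desired.retention_days is not None
            and observed.oldest_partition_age_days > desired.retention_days):
        jobs.append(f"expire_old_data:{table}")
    # Fewer copies than declared means a replication job is needed.
    if observed.replicas < desired.replicas:
        jobs.append(f"replicate:{table}")
    return jobs


if __name__ == "__main__":
    desired = DesiredState(retention_days=30, replicas=2)
    observed = ObservedState(oldest_partition_age_days=45, replicas=1)
    for job in reconcile("db.events", desired, observed):
        print(job)
```

In this sketch the user only declares what the table should look like; the decision about which jobs to run is derived from the difference between the two states, so a table that already matches its policies produces no work.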
According to LinkedIn, the goal was always to open source the project at some point, so it was designed to plug into different storage, authentication, authorization, database, and job dispatch services.
“Now that we’ve reached an open source milestone, we invite you to explore OpenHouse and give us your valuable feedback. We are interested in working with customers to understand how OpenHouse works in different environments, whether it is integrated into a cloud infrastructure or adapted to their preferred table formats,” Sakdeo wrote.