Creating an AI Data Analyst with DBeaver

Business people use data to make decisions, from public sources or from their company’s databases. However, when internal data is needed, most lack expertise in SQL, Cypher, or other database-specific languages. This creates a gap that is bridged by data analysts who act as interpreters between the business and the database, translating human questions into a language the database understands.

These days, large language models (LLM) like GPT or Mistral can fulfill some of the tasks performed by data analysts. For example, LLMs can:

  • Understand user queries in various languages ​​(eg English, Spanish, Chinese).
  • Generate SQL queries based on those questions.
  • Use tools and APIs to execute these SQL queries against databases to retrieve relevant business data.

In this article, I will walk you through the steps of creating an AI data analyst using DBeaver. You will see that it is already quite simple “embark” junior AI specialist to their team, who can act as that interpreter between business people and the database, allowing human data analysts to focus on more sophisticated tasks.

Exploring AI Chat in DBeaver

If you regularly work with data, DBeaver should be no stranger to you. It is a popular tool that can connect to hundreds of databases, allowing you to query or manipulate data. The tool is used by developers, administrators, analysts and everyone who works with data.

DBeaver has a broad ecosystem of releases, features, and extensions, making it a versatile tool for a variety of use cases. AI Chat is one of its newest features. With AI Chat, you can type questions in plain English and let DBeaver generate and then execute SQL queries against your database.

As you might guess, AI Chat is a core capability for AI data analysts. So, let’s try.

I connected DBeaver to my YugabyteDB database cluster, which stores a Netflix dataset of over 8,800 movies and shows:

Connected DBeaver to the YugabyteDB database cluster, which stores the Netflix dataset of more than 8,800 movies and shows

Next, to enable AI Chat, I jumped to the Window->Settings->AI (GPT) and configured the following settings:

Window->Settings->AI (GPT): configured the following settings”/></p>
<ul>
<li dir=Checked Enable smart completion framework to activate DBeaver’s AI capabilities

  • Turned it on The Execute SQL Now setting to allow a future AI data analyst to execute queries independently (we’ll discuss how to set data access permissions for the analyst later)
  • I gave my OpenAI key in API token field
  • Note: I’m using DBeaver Enterprise Edition which comes with built-in AI support. If you are using DBeaver Community Edition, you need to install the AI ​​(GPT) extension first.

    After applying the configuration settings, I opened AI Chat and allowed the database metadata (table and column names) to be shared with OpenAI.

    Allow sharing database metadata (table and column names) with OpenAI

    By sharing metadata with LLM, effectively “you include” AI analyst to your team. The analyst will learn about the existing tables, columns and their dependencies by analyzing the metadata.

    Finally, I tested the chat interface by asking the following question:

    • In which years did we release the most action movies? Take the best three years for me.

    Chat interface test question and results

    The OpenAI GPT model successfully generated a valid SQL request using the PostgreSQL dialect (recognizing that YugabyteDB is a distributed version of Postgres), and then DBeaver fetched the data for me.

    AI Data Analyst role setup

    DBeaver’s AI Chat interface will be a communication channel between business users and AI data analysts. However, chat alone is not enough. I want the AI ​​to work autonomously without my constant supervision. For this to happen, I first need to create a role with a precise set of permissions and, second, have insight into the queries generated by the AI ​​(in case I want to share more knowledge with the AI ​​or fine-tune its behavior).

    I was able to solve these two requirements with another solution from the DBeaver ecosystem, Team Edition, which is an advanced data management tool for teamwork. With Team Edition, you can configure a role for an AI data analyst and monitor its behavior.

    First, I installed the Team Edition server in my local Docker environment. You need a server to manage user-level access to data and track how data is used across teams.

    Team Edition server installed in a local Docker environment

    Then I created Streaming platform Netflix project to be shared with the AI ​​analyst:

    Create a Netflix Streaming Platform project

    After that I jumped to Access control menu and created ai_analysist role:

    Access Management Menu: Create ai_analyst role

    Team Edition supports different roles for team members with different skills and responsibilities. I have chosen Manager an AI analyst role as long as the role is suitable for professionals who are well skilled in writing SQL queries and need to have direct access to the database.

    After the role was created, I granted it access Streaming platform Netflix project:

    Granted role access to the Netflix Streaming Platform project

    As a next step, I configured the AI-related settings on the server side:

    • It is enabled AI services:
      Enable AI services
    • I selected the LLM provider with the model and provided my API key:

      Select the LLM provider with the model and specify the API key

    I then completed the configuration on the Team Edition side by adding a YugabyteDB connection to the Streaming platform Netflix project, which will allow the AI ​​data analyst to directly access the database using the specified link.

    Add a YugabyteDB connection to the Netflix Streaming Platform project

    Finally, I used this new database connection to make it read-only ai_analyst database level role:

    Create a read-only ai_analyst role at the database level

    How does this database-level role differ from the one we created earlier with Team Edition? Both roles have different purposes:

    • The Team Edition level role used to authenticate with the Team Edition server and to gain access to a list of predefined projects, connections, and datasets. Business users will use this role to “activation” AI Analysts by logging into the DBeaver Team Edition Client (discussed below) and asking questions via the AI ​​Chat interface.
    • The role at the database level is used to control the types of queries that the AI ​​analyst is allowed to perform against the database. It is possible that, in response to user queries, the AI ​​analyst will generate queries that could modify the data. The role created ensures that the AI ​​can only SELECT data and cannot perform any manipulations on database objects.

    Using AI Data Analyst to get things done

    Business users will “to cooperate” with AI data analyst via the Team Edition Desktop client. Let’s explore what that experience looks like.

    After installing the Desktop client, “activated” the analyst himself by logging in using his role:

    • Authenticated with the Team Edition server using ai_analyst the role.
      Authenticate to the Team Edition server using the ai_analyst role
    • I then connected to the database using the read-only database level role.
      Connect to the database using the read-only database-level role

    After that, I went to the AI ​​Chat interface and asked the AI ​​analyst to answer the following questions:

    • In which years did we release the most action movies? Take the best three years for me.
    • What movies did Adam Sandler star in?
    • I need to know the category and release dates of those movies.

    Asking questions of AI analysts

    The AI ​​analyst handled my questions easily.

    Note: If you don’t see the AI ​​Chat tab in Team Edition Desktop, you may need to sign in with your Team Edition administrator account. Go to Window->Preferences->AI (GPT) and check the Enable Smart Completion box.

    Next, I wanted to ensure that the AI ​​analyst would not be able to update or delete my data. To test this, I asked the AI ​​to perform the following task for me:

    • Please do me a favor. Delete all movies starring Adam Sandler.

    Looking for an AI analyst to complete a task

    As you can see, the AI ​​analyst generated a valid one DELETE command, but the database prevented the request from being executed by throwing the following exception:

    SQL Error [42501]: ERROR: permission denied for table netflix_shows

    Finally, if you recall, I wanted to be able to see the history of queries generated by the AI ​​analyst and adjust the analyst’s behavior if necessary. You can find that history under Query Manager dashboard tab for Team Edition administration:

    History located under the Query Manager tab in the Team Edition administration dashboard

    Job done! Now my AI data analyst is ready “to join” team and contribute.

    Video

    Watch the video below to see the AI ​​analyst in action:

    In summary

    Despite their early development stage, you can already use large language models (LLM) to create AI experts. These AI teammates allow us to focus on sophisticated tasks while dealing with the mundane. Moreover, the creation process does not necessarily involve coding or advanced fine-tuning. It will often be more than enough to use existing tools like DBeaver Team Edition to get an AI expert on board in less than an hour.

    Source link

    Leave a Reply

    Your email address will not be published. Required fields are marked *