Why do we need databases and SQL?

SQL has a long and proven history. He survived the NoSQL scramble. Even if it’s not perfect, it’s proven to be the best data language available. This is no surprise! The story begins in the 1960s with the development of databases — an era marked by the introduction of the Integrated Data Store (IDS) at General Electric. However, it was Edgar Codd’s relational model that revolutionized data handling. His model, which turned data into a series of tables (or, more strictly, relations), has influenced database systems ever since. This era also saw the birth of SQL (Structured Query Language), which became the standard language for interacting with relational databases, including MariaDB and others.

Usefulness of relational database systems

So why do we need all these databases? Let’s imagine you’re building an app, perhaps a simple to-do list to track daily tasks. At first you might think, “Why not just save each task directly to a file?” After all, my programming language has constructs and libraries for saving and reading data from disk. Also, the implementation of this seems simple: create a task, write it to a file, delete the task and remove it from the file. These are good points; however, as your app becomes more popular, users begin to accumulate and suddenly you have thousands of users trying to add, delete, and modify tasks simultaneously. At this point the simplicity of the files becomes fragile. Imagine one user updating a task at the exact moment another is trying to delete it. Or maybe two users are editing the same task at the same time. With a simple file system, you are likely to end up with corrupted or lost data because there is no inherent mechanism to resolve such conflicts.

Databases handle these situations gracefully through Properties of ACIDS. Essentially, the set of principles ensures that even if your application crashes in the middle of an update, the data remains consistent and no half-finished tasks are left hanging. Going back to the task app example, imagine you’re trying to move your “Buy Groceries” task from pending to completed which also requires a change to last_updated property, but your app crashes right in the middle. With a relational database, it’s all or nothing—or the task is marked complete and last_updated property reflects the new time value, or it’s as if you never tried to update it, avoiding those incorrect half-states.

Let us now consider data relationships. In your application, tasks can belong to different categories or users. In a file system, maintaining these relationships is cumbersome. You might end up with a separate file for each category or user, but then how do you quickly find all the tasks in the categories or ensure that no two users end up with the same task ID? Databases have the ability to manage complex relationships, making it easy to query all tasks for a specific user or category, or even more complex queries like “show me the number of completed tasks for user U grouped by category C over the last month.”

Safety is another big thing. In a file system, if someone gets access to your files, they have your data. Databases offer strong security features, such as access control and encryption, protecting your data from unauthorized views.

And there is also the issue of growth. Your simple to-do app could eventually evolve into a complex business project management tool. With a file system, every change can look like renovating a building that still has people in it. Databases are built to be flexible and scalable, meaning they’re designed to grow with your needs, whether you’re adding new features or handling multiple users.

In the end, choosing a database over a simple file system means setting yourself up for success while standing on solid ground. It’s about making sure that as your application grows, your data remains secure, consistent and manageable, and your users happy. After all, no one likes to lose their to-do list due to a random crash or wait forever for their tasks to load because the system is stuck resolving conflicts and searching!

A bit of history

It was Edgar Codd who proposed the relational model for databases, and being a mathematician, he formalized the concepts by creating what is called relational algebra and relational calculus. This was all theoretical until IBM and others began implementing the concepts in academic and research projects. They also wanted to devise a standard language for querying data in relational databases. At first they invented QUEL (Querying Using the English Language) at the University of California, Berkeley. At IBM, researchers wanted to design their own language and started a project that I see more as a game between colleagues called SQUARE (Specifying Queries Using a Relational Environment). This led to a questionable language that had scientific notation with subscripts and superscripts, which was difficult to type on computer keyboards. To solve this, they redefined the language to use only standard characters and, in a clever and probably friendly mocking way, named it SEQUEL. This name, however, was trademarked in Great Britain, which prevented them from using it. They removed the vowels in CONTINUATION, and boom! SQL was born. By 1986, SQL would become an ISO and ANSI standard.

As an interesting historical note, even though their inventors had to rename SEQUEL to SQL, they continued to call it “sequel”. Even today, many software developers and IT professionals continue to pronounce it “continuation”. The name Structured Query Language (SQL) will appear later.

Usefulness of SQL

SQL is a declarative language, which means you specify what you want to get, not how to get it. The database is in charge of doing whatever it takes to get the requested data. SQL isolates the complexity of the database. A database is a complex software with many algorithms implemented in it. These algorithms deal with different ways to store data on disk or in memory. Different algorithms are more efficient in different circumstances involving different queries and different datasets.

For example, in MariaDB, a component called the query optimizer is responsible for deciding which algorithms to use given an SQL query and statistics collected on real data. The query optimizer analyzes the SQL query, data structures, database schema and statistical distribution of the data. It then decides whether to use an index, which join algorithm is best, and how to sequence the operations. This process involves an incredible amount of complexity and mathematical precision, all of which is abstractly managed by the database for you. As a developer, you only need to worry about creating queries to get the data you need and let the database decide whether to use an index (with some datasets, not using an index can be faster), B-trees, hash tables, etc. even whether to add data to the in-memory cache, as well as many other things.

SQL also allows you to handle writes, that is, creating and updating data. It also allows you to define the database schema, or in short and oversimplified, the tables and their column structure. In fact, SQL allows you to do much more, and its functionality can be divided into four categories:

Data Definition Language (DDL): Creating and manipulating a schema.
Data Manipulation Language (DML): Inserting, updating and deleting data from the database.
Data Query Language (DQL): Retrieving data from the database.
Data Control Language (DCL): Working with rights and permissions over the database and its objects.

In my 15+ years of industry experience, I have rarely seen the previous categories used in a work environment, with the exception of DDL, which refers to activities related to handling database schema updates. These categories are useful mainly in academia or in teams implementing relational database management software. However, it is good to know that these terms exist and are used by others as they help in database technology discussions. With that in mind, let me touch briefly on one such debate.

Some would say that developers only need to deal with DML and DQL, while DDL and DCL are the concern of DBAs. In practice, this division is not so easy to make. Developers need to understand how database objects (such as tables and columns) are created and how access to those objects is managed. However, the truth is that developers spend most of their time writing SQL statements to modify and query data. You’ll find that this book focuses on DML and DQL while explaining other categories as needed. On the other hand, DBAs are experts in all things database—from infrastructure and general database management to SQL query optimization and migration, a DBA is always a valuable brain for your team.

Conclusion

In conclusion, databases solve real problems that application developers face, thanks to their ability to ensure data integrity through ACID properties, manage complex relationships, and provide robust security features. I’ve only scratched the surface here, but this should be enough to give novice IT professionals a quick refresher on the importance of relational databases and SQL.

Source link