
I'm still writing SQL

Code generation with SQLC


Foreword

Today, I’d like to tell you about a proof of concept I recently implemented at work. But first, let me tell you how I came to be interested in the subject.

I’ve always loved SQL. This language, both simple and elegant, allows you to perform queries within a database as if you were speaking to it directly.
When I was at school, it was perhaps the first time I felt like I was speaking another language in computing. A database will always give you the information you want, as long as you know how to ask for it.

I then became a developer, and I remembered that if I had learned to communicate with a database, it was above all to use it within my own programs: embedding SQL queries in source code, that is. Like any student, I experienced the joy of writing SELECT * and INSERT INTO in strings concatenated with values passed in from forms, and the wonder of seeing my database data displayed in my web pages.

Like many students, I put my first websites online to show them off to everyone.
Then, like everyone else who’d done the same thing as me, I discovered the joy of getting my database pwned with SQL injections.
What, it’s wrong to build SQL out of concatenated strings?

I then discovered prepared statements.
We all ran into the same problems:

  • It’s tedious
  • It’s easy to make mistakes
  • The fear of letting an injection slip through is always present
  • You have to map your objects carefully

For this last point, I use “object” in its general, philosophical sense, even though I’m about to illustrate the problem with Go, which is not an object-oriented language.

I wasn’t the only one to notice the need to map structures/classes to database tables. That shared need is what led to the creation of ORMs (Object-Relational Mapping).

The advantages of ORMs are immense:

  • Direct manipulation of objects
  • Totally abstracts the SQL layer, no need to write SQL at all
  • Greatly facilitates serialization / deserialization
  • Simplified CRUD operations
  • Simplified relationship management
  • Compatibility between several databases (PostgreSQL, MySQL, etc.)

There are also unavoidable disadvantages:

  • Loss of performance
  • More limited control
  • Debugging sometimes more complicated

While these disadvantages exist, the advantages are far too obvious to ignore. The ORM principle is widely adopted in just about every programming language.

I used to love SQL, but now I have to face facts: writing SQL in this day and age is obsolete. Nobody does it anymore. Or so I thought.
I’ve been talking to my databases via ORMs for years, and I’ve come to terms with it. But I must confess I’m nostalgic for the days when I spoke the language of my databases.

Recently at work, we’ve been confronted with a number of problems linked to the lack of control that ORMs sometimes entail, particularly when we need to change the database schema.
That’s when a colleague came to introduce us to a technology I hadn’t heard of before: SQLC.

SQLC and ORMs

SQLC is a tool that automatically generates code from manually written SQL queries.
It was created to combine the security and performance of raw SQL with the ease of use of an ORM, while avoiding the processing overload and loss of control that ORMs can introduce.

Initially written in Go for the Go language, SQLC now supports several other languages.
Here, we’ll talk about it mainly in Go, and use the GORM ORM for comparison.

Let’s take a very simple database schema:

+---------------------+          +------------------+
|         book        |          |      author      |
+---------------------+          +------------------+
| ID         (PK)     |     +--> | ID      (PK)     |
| Title               |     |    | Email            |
| PublicationDate     |     |    | Bio              |
| AuthorID   (FK)     | ----+    |                  |
+---------------------+          +------------------+

At first glance, it’s a bit of a step backwards: we’re back to using prepared queries written directly in our source code.

In the case of ORMs, SQL is totally invisible.
Adding a row to a database literally takes a line of code:

func CreateAuthor(db *gorm.DB, author *models.Author) error {
    return db.Create(author).Error
}

However, as mentioned above, we want to free ourselves from the limitations of ORMs. If manipulating databases with an ORM is easy with trivial examples, what about when we want to execute complex queries?
It’s not necessarily impossible, but we sometimes have to make concessions (splitting up queries to delegate some of the calculations to the programming language, creating views, etc.). Finally, aren’t there other ways of dealing with the disadvantages of manually prepared queries?

SQLC takes the problem the other way round: no dependency in the code, just a local tool that generates code compatible with the standard library.
Note that generating code for other libraries is also possible, but here it’s precisely having no dependency beyond the standard library that interests us.
The result:

  • The SQL is written by hand again
  • 100% standard-library-compatible code, ready to use
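Before generation can happen, SQLC needs a small configuration file pointing at the queries and the schema. As a sketch (the paths and package name here are assumptions about project layout):

```yaml
version: "2"
sql:
  - engine: "postgresql"
    queries: "queries/"    # hand-written .sql query files
    schema: "migrations/"  # schema or migration scripts
    gen:
      go:
        package: "db"
        out: "internal/db" # where the generated Go code lands
```

Running `sqlc generate` then produces the Go files shown below.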

Just add a comment above the queries to specify the desired function name:

author.sql

-- name: createAuthor :one
INSERT INTO author (email, bio)
VALUES ($1, $2)
RETURNING id;

-- name: getAuthor :one
SELECT *
FROM author
WHERE id = $1;  

author.sql.go (generated file)

import (
    "context"
    "database/sql"
)

const createAuthor = `-- name: createAuthor :one
INSERT INTO author (email, bio)
VALUES ($1, $2)
RETURNING id
`

type createAuthorParams struct {
    Email string
    Bio   sql.NullString
}

func (q *Queries) createAuthor(ctx context.Context, arg createAuthorParams) (int32, error) {
    row := q.db.QueryRowContext(ctx, createAuthor, arg.Email, arg.Bio)
    var id int32
    err := row.Scan(&id)
    return id, err
}

const getAuthor = `-- name: getAuthor :one
SELECT id, email, bio
FROM author
WHERE id = $1
`

func (q *Queries) getAuthor(ctx context.Context, id int32) (Author, error) {
    row := q.db.QueryRowContext(ctx, getAuthor, id)
    var i Author
    err := row.Scan(&i.ID, &i.Email, &i.Bio)
    return i, err
}

It’s another way of looking at the problem: no more writing boilerplate, no more fear of potential injections due to human error.
In short, the benefits are very similar to those provided by ORMs, but with control over queries and no loss of performance.
You can find a more complete example on the SQLC Playground.

Writing SQL is cool.

You’ll notice that the SELECT * has been transformed into SELECT id, email, bio in the generated code.
Indeed, SQLC replaces SELECT * with an explicit list of columns to guarantee the stability of the generated code, optimize performance and avoid errors linked to schema changes.

You’ll also notice that calling the functions provided by an ORM and calling those generated by SQLC look quite similar.

GORM

func GetAuthor(db *gorm.DB, id int64) (*models.Author, error) {
    var author models.Author
    if err := db.First(&author, id).Error; err != nil {
        return nil, err
    }
    return &author, nil
}

SQLC

func (q *Queries) GetAuthor(ctx context.Context, id int32) (*Author, error) {
    author, err := q.getAuthor(ctx, id)
    if err != nil {
        return nil, err
    }
    return &author, nil
}

In the case of SQLC, we use a structure of type *Queries, which is an encapsulation of DBTX, an interface that defines a contract for executing SQL queries. Both are generated by SQLC.

We’re free to use them as we wish. For my part, I’ve chosen to keep the generated functions internal (hence the lowercase initial), and to encapsulate them in exported functions of the same name that I define as *Queries methods.

And that’s all there is to it! Just as easy to use as before, but with many advantages.

There are other advantages besides, and I’d like to draw attention to two of them that I really appreciate.
The first: the models themselves are also generated from the SQL scripts.

Model generation

Here are the Go models generated from my SQL scripts:

type Author struct {
    ID    int32
    Email string
    Bio   sql.NullString
}

type Book struct {
    ID              int32
    Title           string
    PublicationDate time.Time
    AuthorID        int32
}

Now that you’re getting the idea behind SQLC, you’ll have understood that these models were generated from SQL scripts.
So here’s an SQL script that would allow these models to be generated:

CREATE TABLE author (
    id SERIAL PRIMARY KEY,
    email TEXT NOT NULL UNIQUE,
    bio TEXT
);

CREATE TABLE book (
    id SERIAL PRIMARY KEY,
    title TEXT NOT NULL,
    publication_date DATE NOT NULL,
    author_id INT NOT NULL,
    FOREIGN KEY (author_id) REFERENCES author(id) ON DELETE CASCADE
);

It works, but in the corporate world, database tables are rarely static. They evolve. That’s why you’ll usually find migration scripts.
This is where it gets interesting: if SQLC is able to generate our models via SQL, it can then do so via migration scripts directly.

Here are the scripts I actually used to obtain the same result:

v0001.sql

-- +migrate Up
CREATE TABLE author (
    id SERIAL PRIMARY KEY,
    email TEXT NOT NULL UNIQUE,
    bio TEXT
);

-- +migrate Down
DROP TABLE author;

v0002.sql

-- +migrate Up
CREATE TABLE book (
    id SERIAL PRIMARY KEY,
    title TEXT NOT NULL,
    publication_date DATE NOT NULL,
    author_id INT NOT NULL,
    FOREIGN KEY (author_id) REFERENCES author(id)
);

-- +migrate Down
DROP TABLE book;

v0003.sql

-- +migrate Up
ALTER TABLE book DROP CONSTRAINT book_author_id_fkey;
ALTER TABLE book ADD CONSTRAINT book_author_id_fkey FOREIGN KEY (author_id) REFERENCES author(id) ON DELETE CASCADE;

-- +migrate Down
ALTER TABLE book DROP CONSTRAINT book_author_id_fkey;
ALTER TABLE book ADD CONSTRAINT book_author_id_fkey FOREIGN KEY (author_id) REFERENCES author(id);

At my work, database management is handled by a dedicated team. They have their own Git repositories, and we open PRs against them to evolve our schemas.
Thanks to SQLC, we can now add these repos directly as Git submodules, so that SQLC fetches the scripts from them to generate our models.
Before, we had to replicate the data models within our own code, with all the human error that could entail.

This method provides a real and reliable link between the code of our microservices and that of the database migrations.
The database team thus becomes the real source of truth.

As an example, I’ve reproduced this proof-of-concept at home: https://github.com/sebferrer/poc-sqlc ;
and stored my migration scripts in a dedicated repo: https://github.com/sebferrer/poc-sqlc-db

Mocking

I’d like to mention another advantage that stood out for me. Generating your code via SQLC greatly simplifies mocking, and for a good reason: we’re the ones who wrote the SQL, so we know exactly what to expect.

This may seem trivial, but remember that with an ORM we have no control over the SQL that is actually generated behind the scenes, which can make mocks particularly tedious to write.
If I want to mock the SELECT generated behind my Get, what should I expect as a result? What is the exact query I need to intercept?

Let’s take GORM as an example:

mock.ExpectQuery(regexp.QuoteMeta(`SELECT * FROM "author" WHERE "author"."id" = $1 ORDER BY "author"."id" LIMIT $2`))

It works, and that’s cool. But I had to understand a little about how GORM works to know that this is precisely the query to expect if I want to mock my Get function.

Does my ORM generate keywords in lowercase or uppercase? Does it use single or double quotes? Does it systematically prefix columns with the table name? And so on.
To answer these questions, you’ll need to read the documentation carefully, or even go straight to the ORM’s source code. You can also work from the error messages of the SQL mock library you’re using, but you have to admit that takes a bit of tinkering.
And that goes for every SQL query involved in your mocks.

After switching to SQLC, the mock looks like this:

mock.ExpectQuery(regexp.QuoteMeta(`INSERT INTO author (email, bio)
                                   VALUES ($1, $2)
                                   RETURNING id`))

In other words, exactly the query I wrote myself (or almost, see the SELECT * example, but the generated query is directly accessible).

When it comes to mocking a whole battery of SQL queries for our unit tests, I’ve discovered how much more convenient it is to simply copy and paste my own queries.
In my opinion, a solution like SQLC adds real value to the use of SQL mock libraries.

Some subtleties

Parameter structures

SQLC generates Go structures to encapsulate SQL query parameters, but only when there is more than one parameter. If the SQL query has only one parameter, SQLC does not use a structure and passes the corresponding value directly to the function.

In the example above, a structure was generated to carry the necessary parameters, and it is passed to the function that was generated alongside it:

type createAuthorParams struct {
    Email string
    Bio   sql.NullString
}

func (q *Queries) createAuthor(ctx context.Context, arg createAuthorParams) (int32, error)

It would have been the same for the SELECT query if it had needed more than a single parameter.
On the other hand, if only one parameter is needed for the query, SQLC won’t bother creating a structure for nothing:

-- name: deleteAuthor :exec
DELETE FROM author
WHERE id = $1
func (q *Queries) deleteAuthor(ctx context.Context, id int32) error {
    _, err := q.db.ExecContext(ctx, deleteAuthor, id)
    return err
}

Return structure

How does SQLC manage the return type to return the selected data? It’s pretty straightforward.

If the SELECT includes all the table’s columns, as with a SELECT *, it directly reuses the model type generated earlier:

-- name: getAuthor :one
SELECT *
FROM author
WHERE id = $1
func (q *Queries) getAuthor(ctx context.Context, id int32) (Author, error)

Otherwise, it generates an intermediate structure, leaving us free to choose how to handle it.

-- name: getAuthor :one
SELECT id, email
FROM author
WHERE id = $1;
type getAuthorRow struct {
	ID    int32
	Email string
}

func (q *Queries) getAuthor(ctx context.Context, id int32) (getAuthorRow, error)

To wrap up this post, if you’d like to know a little more, I’ll leave you with a few final notes including some good-to-know subtleties I’ve noticed about SQLC.

Other subtleties in bulk

  • NULL fields in SQL are represented with sql.NullXxx in Go: this lets you handle NULL values without runtime errors.
  • SQLC does not generate transactions automatically: you need to explicitly handle BEGIN and COMMIT in Go functions if you want to execute several queries in one transaction.
  • The INSERT ... RETURNING queries are used to obtain the ID of a new record directly: SQLC translates them into QueryRowContext and uses Scan to retrieve the returned value.

Note also that SQLC doesn’t only generate code for the standard database/sql library: it also supports pgx/v4 and pgx/v5.

Conclusion

I think I’ve pretty much covered what I wanted to talk about.
I hope I’ve been able to convey why I became interested in this approach, which is a little different from what many of us commonly use.

In conclusion, ORMs like GORM offer a robust abstraction, facilitate migrations, and provide an intuitive CRUD interface. However, they require some learning investment (albeit minimal) and can introduce performance overhead, especially with large databases and complex operations.

On the other hand, raw SQL offers greater control over queries and can be more efficient for large databases, but it requires strong SQL expertise and involves a longer implementation process in Go code.

So, I would say: if simplicity and rapid development are the priority, an ORM like GORM is a great option. If optimization and precise control over queries are essential, then I would recommend using raw SQL.

The point isn’t to say that one approach is better than the other.
I love ORMs and continue to use them in many of my projects, but I wanted to introduce you to this concept, which has been one of my favorites this year.

SQLC

GORM

Comparison / Benchmark

My proof-of-concept

Many thanks to Gab who introduced me to this technology and got me interested!
