# Querying Markdown tables with SQLite
I keep a personal wiki that acts as a journal and a record of things I do. It's a collection of Markdown files on my computer. I also run, and rather than use a fitness tracker, I record my runs in the wiki. Recently, I had a pretty good run. I thought it was a personal best, but I had to check the wiki to be sure.
After I run, I track the run in a page of my wiki that looks kind of like this:
```markdown
# Exercise

Some paragraph text. A description of my current workout.

## Log

| Date       | Duration | Distance | Notes                      |
|------------|----------|----------|----------------------------|
| 2025-02-09 | 38:40.00 | 4.45     |                            |
| 2025-02-07 | 39:40.00 | 4.45     |                            |
| 2025-02-05 | 41:10.00 | 4.45     | Had to stop to tie my shoe |

Table: Tracking my runs.
```
In order to check whether I had achieved a personal best, I really wanted to execute this SQL query against my running table:
```sql
SELECT *
FROM runs
WHERE Distance = '4.45'
ORDER BY Duration
LIMIT 1;
```
Here's what I did to accomplish this.
## Table metadata
First, I needed a way to associate a SQLite table name with the Markdown table. I decided to surround the table in a fenced div with an attribute block indicating my desired table name on the SQL side.
```markdown
::: { sqlite_table_name="runs" }

| Date       | Duration | Distance | Notes                      |
|------------|----------|----------|----------------------------|
| 2025-02-09 | 38:40.00 | 4.45     |                            |
| 2025-02-07 | 39:40.00 | 4.45     |                            |
| 2025-02-05 | 41:10.00 | 4.45     | Had to stop to tie my shoe |

Table: Tracking my runs.

:::
```
When I eventually switch things over to djot, this should carry over nicely.
## Extracting from Markdown
Second, I needed to extract the tables from my wiki. `pandoc` is already the backbone of my wiki, so it was natural for me to write this as a pandoc filter.

The filter needs to do two things. First, it needs to remove everything that isn't a table. A pandoc filter is a Lua program with a function for each type of node in the syntax tree we're interested in modifying. In our case, we want to drop all syntax tree nodes except for `Div` nodes with the attribute `sqlite_table_name`.
```lua
function only_sqlite_tables(elem)
  if not elem.attributes["sqlite_table_name"] then
    return {}
  end
  return elem, false
end

return {
  -- remove everything except tables
  {
    traverse = 'topdown',
    Div = only_sqlite_tables,
    Header = function(elem)
      return {}
    end,
    Para = function(elem)
      return {}
    end,
    Table = function(elem)
      return {}
    end,
    -- All the other node types...
  },
}
```
We need to `return elem, false` because we know that when we encounter a `Div` node with the correct attribute, we want to stop processing and keep the whole node. This allows us to delete other `Table` nodes that aren't decorated with the right attribute, which means I can incrementally add SQLite table names to tables without requiring all tables to be exported. For this same reason, we set `traverse = 'topdown'` on the first filter.
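To make that concrete, here's a contrived input of my own: the decorated table survives because the `Div` handler returns it and halts traversal into its children, while the bare table below it is caught by the `Table` handler and deleted.

```markdown
::: { sqlite_table_name="runs" }
| Date       | Duration | Distance | Notes |
|------------|----------|----------|-------|
| 2025-02-09 | 38:40.00 | 4.45     |       |
:::

| Scratch column |
|----------------|
| not exported   |
```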
Second, it needs to put those tables into a SQLite database for me to query. In order to do this, let's walk over the table cells and build a tab-separated table, then use SQLite's `.import` to import the table.
```lua
function filter_main(doc)
  for i, div in ipairs(doc.blocks) do
    local t = {}
    -- one TSV file per decorated div
    io.output(string.format("db/%s.tsv", div.attributes["sqlite_table_name"]))
    for i, tble in ipairs(div.content) do
      -- header row
      local head = {}
      for i, cell in ipairs(tble.head.rows[1].cells) do
        head[#head + 1] = cell.contents[1].content[1].text
      end
      t[#t + 1] = table.concat(head, "\t")
      -- body rows
      for i, row in ipairs(tble.bodies[1].body) do
        local r = {}
        for i, cell in ipairs(row.cells) do
          if cell.contents[1] then
            -- join the cell's inline text with spaces
            local c = {}
            for i, content in ipairs(cell.contents[1].content) do
              c[#c + 1] = content.text
            end
            r[#r + 1] = table.concat(c, " ")
          else
            r[#r + 1] = ''
          end
        end
        t[#t + 1] = table.concat(r, "\t")
      end
    end
    io.write(table.concat(t, "\n"))
  end
  os.exit(0)
end
```
```lua
return {
  -- remove everything except tables
  -- { -- the first filter we wrote
  --   ...
  -- },
  -- save tables as files in `db/`
  {
    Pandoc = filter_main,
  }
}
```
Please forgive the horror of this function. By returning a list of filters at the end, we ask `pandoc` to run the first filter then pass its results to the second. In this way, the second filter will only be operating on `Div` nodes with the special `sqlite_table_name` attribute. We write each Markdown table, formatted as a tab-separated table, into a file named `db/<table name>.tsv`. In this example, we're only dealing with one table, but my actual wiki has multiple tables over multiple files.
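For the example table above, `db/runs.tsv` should come out looking something like this (the gaps are literal tab characters):

```
Date	Duration	Distance	Notes
2025-02-09	38:40.00	4.45	
2025-02-07	39:40.00	4.45	
2025-02-05	41:10.00	4.45	Had to stop to tie my shoe
```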
## Import into SQLite
Finally, I needed to import these TSV files into a SQLite database. The root of my wiki has a `justfile` for maintenance tasks, so this makes sense as a `just` recipe:
```just
sql:
    #!/usr/bin/env fish
    rm db/*
    pandoc -f markdown --lua-filter=extract-tables.lua *.md
    for table in db/*.tsv
        set -l table_name (basename -s '.tsv' $table)
        sqlite3 -tabs db/db.sql ".import $table $table_name"
    end
    sqlite3 -cmd ".mode markdown" db/db.sql
```
This recipe clears out old TSV files (so the database is generated fresh on each run), calls `pandoc` with the filter, then iterates over the TSV files in `db/` and imports them into SQLite. Finally, it launches a SQLite shell with the new database. Launching SQLite in Markdown mode isn't technically required, but I do think it looks a little nicer.
Here's the final result:
```
$ just sql
SQLite version 3.48.0 2025-01-14 11:05:00
Enter ".help" for usage hints.
sqlite> SELECT *
   ...> FROM runs
   ...> WHERE Distance = '4.45'
   ...> ORDER BY Duration
   ...> LIMIT 1;
| Date       | Duration | Distance | Notes |
|------------|----------|----------|-------|
| 2025-02-09 | 38:40.00 | 4.45     |       |
```
## Thoughts
Instead of maintaining these tables in Markdown, I could maintain them in a SQLite database and write the reverse of the filters above, extracting all my runs when I convert the Markdown to HTML. That seemed like it would be a little less ergonomic for me, but I'll think on it.
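As a rough illustration, a reverse filter might look something like this minimal, untested sketch: it shells out to `sqlite3` and rebuilds a pandoc table from the rows. It assumes the same `db/db.sql` database and a pandoc new enough to ship `pandoc.utils.from_simple_table`; shelling out is just the simplest approach I could think of, not a recommendation.

```lua
-- Untested sketch: replace each decorated Div with a table built
-- from the corresponding SQLite table.
function Div(div)
  local name = div.attributes["sqlite_table_name"]
  if not name then
    return nil -- leave undecorated divs alone
  end
  -- Ask sqlite3 for tab-separated rows, header line first.
  -- (No escaping of the table name; fine for a personal wiki.)
  local cmd = string.format(
    "sqlite3 -tabs -header db/db.sql 'SELECT * FROM %s'", name)
  local proc = io.popen(cmd)
  local rows = {}
  for line in proc:lines() do
    local cells = {}
    -- split on tabs; the appended tab lets the pattern catch the last field
    for field in string.gmatch(line .. "\t", "([^\t]*)\t") do
      cells[#cells + 1] = { pandoc.Plain { pandoc.Str(field) } }
    end
    rows[#rows + 1] = cells
  end
  proc:close()
  local header = table.remove(rows, 1)
  local aligns, widths = {}, {}
  for i = 1, #header do
    aligns[i], widths[i] = pandoc.AlignDefault, 0
  end
  local simple = pandoc.SimpleTable({}, aligns, widths, header, rows)
  return pandoc.utils.from_simple_table(simple)
end
```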
I think it would also be very ergonomic to put some metadata on the column headers indicating my desired datatype on the SQL side. I could probably use bracketed spans to accomplish this, but I haven't tried it.
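Something like this might work, though it's untested; `sqlite_type` is an attribute name I just made up, and the filter would need to learn to read it and emit a `CREATE TABLE` before the `.import`:

```markdown
| [Date]{sqlite_type="TEXT"} | [Duration]{sqlite_type="TEXT"} | [Distance]{sqlite_type="REAL"} | Notes |
|----------------------------|--------------------------------|--------------------------------|-------|
```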