# Querying Markdown tables with SQLite
I keep a personal wiki that acts as a journal and a record of things I do. It's a collection of Markdown files on my computer. I also run, and rather than use a fitness tracker, I record my runs in the wiki. Recently, I had a pretty good run. I thought it was a personal best, but I had to check the wiki to be sure.
After I run, I track the run in a page of my wiki that looks kind of like this:
```markdown
# Exercise

Some paragraph text. A description of my current workout.

## Log

| Date       | Duration | Distance | Notes                      |
|------------|----------|----------|----------------------------|
| 2025-02-09 | 38:40.00 | 4.45     |                            |
| 2025-02-07 | 39:40.00 | 4.45     |                            |
| 2025-02-05 | 41:10.00 | 4.45     | Had to stop to tie my shoe |

Table: Tracking my runs.
```
In order to check whether I had achieved a personal best, I really wanted to execute this SQL query against my running table:
```sql
SELECT *
FROM runs
WHERE Distance = '4.45'
ORDER BY Duration
LIMIT 1;
```
Here's what I did to accomplish this.
## Table metadata
First, I needed a way to associate a SQLite table name with the Markdown table. I decided to surround the table in a fenced div with an attribute block indicating my desired table name on the SQL side.
```markdown
::: { sqlite_table_name="runs" }

| Date       | Duration | Distance | Notes                      |
|------------|----------|----------|----------------------------|
| 2025-02-09 | 38:40.00 | 4.45     |                            |
| 2025-02-07 | 39:40.00 | 4.45     |                            |
| 2025-02-05 | 41:10.00 | 4.45     | Had to stop to tie my shoe |

Table: Tracking my runs.

:::
```
When I eventually switch things over to djot, this should carry over nicely.
## Extracting from Markdown
Second, I needed to extract the tables from my wiki. `pandoc` is already the backbone of my wiki, so it was natural for me to write this as a pandoc filter.

The filter needs to do two things. First, it needs to remove everything that isn't a table. A pandoc filter is a Lua program with a function for each type of node in the syntax tree we're interested in modifying. In our case, we want to drop all syntax tree nodes except for `Div` nodes with the attribute `sqlite_table_name`.
```lua
function only_sqlite_tables(elem)
  if not elem.attributes["sqlite_table_name"] then
    return {}
  end
  return elem, false
end

return {
  -- remove everything except tables
  {
    traverse = 'topdown',
    Div = only_sqlite_tables,
    Header = function(elem)
      return {}
    end,
    Para = function(elem)
      return {}
    end,
    Table = function(elem)
      return {}
    end,
    -- All the other node types...
  },
}
```
We need to `return elem, false` because we know that when we encounter a `Div` node with the correct attribute, we want to stop processing and keep the whole node. This allows us to delete other `Table` nodes that aren't decorated with the right attribute, which means I can incrementally add SQLite table names to tables without requiring all tables to be exported. For this same reason, we set `traverse = 'topdown'` on the first filter.
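To make that concrete, here's a contrived input of my own: the decorated table survives because the `Div` handler returns it and halts traversal into its children, while the bare table below it is caught by the `Table` handler and deleted.

```markdown
::: { sqlite_table_name="runs" }
| Date       | Duration | Distance | Notes |
|------------|----------|----------|-------|
| 2025-02-09 | 38:40.00 | 4.45     |       |
:::

| Scratch column |
|----------------|
| not exported   |
```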
Second, it needs to put those tables into a SQLite database for me to query. In order to do this, let's walk over the table cells and build a tab-separated table, then use SQLite's `.import` to import the table.
```lua
function filter_main(doc)
  for i, div in ipairs(doc.blocks) do
    local t = {}
    -- one TSV file per decorated div
    io.output(string.format("db/%s.tsv", div.attributes["sqlite_table_name"]))
    for i, tble in ipairs(div.content) do
      -- header row
      local head = {}
      for i, cell in ipairs(tble.head.rows[1].cells) do
        head[#head + 1] = cell.contents[1].content[1].text
      end
      t[#t + 1] = table.concat(head, "\t")
      -- body rows
      for i, row in ipairs(tble.bodies[1].body) do
        local r = {}
        for i, cell in ipairs(row.cells) do
          if cell.contents[1] then
            -- join the cell's inline text with spaces
            local c = {}
            for i, content in ipairs(cell.contents[1].content) do
              c[#c + 1] = content.text
            end
            r[#r + 1] = table.concat(c, " ")
          else
            r[#r + 1] = ''
          end
        end
        t[#t + 1] = table.concat(r, "\t")
      end
    end
    io.write(table.concat(t, "\n"))
  end
  os.exit(0)
end
```
```lua
return {
  -- remove everything except tables
  -- { -- the first filter we wrote
  --   ...
  -- },
  -- save tables as files in `db/`
  {
    Pandoc = filter_main,
  }
}
```
Please forgive the horror of this function. By returning a list of filters at the end, we ask `pandoc` to run the first filter then pass its results to the second. In this way, the second filter will only be operating on `Div` nodes with the special `sqlite_table_name` attribute. We write each Markdown table, formatted as a tab-separated table, into a file named `db/<table name>.tsv`. In this example, we're only dealing with one table, but my actual wiki has multiple tables over multiple files.
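For the example table above, `db/runs.tsv` should come out looking something like this (the gaps are literal tab characters):

```
Date	Duration	Distance	Notes
2025-02-09	38:40.00	4.45	
2025-02-07	39:40.00	4.45	
2025-02-05	41:10.00	4.45	Had to stop to tie my shoe
```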
## Import into SQLite
Finally, I needed to import these TSV files into a SQLite database. The root of my wiki has a `justfile` for maintenance tasks, so this makes sense as a `just` recipe:
```just
sql:
    #!/usr/bin/env fish
    rm db/*
    pandoc -f markdown --lua-filter=extract-tables.lua *.md
    for table in db/*.tsv
        set -l table_name (basename -s '.tsv' $table)
        sqlite3 -tabs db/db.sql ".import $table $table_name"
    end
    sqlite3 -cmd ".mode markdown" db/db.sql
```
This recipe clears out old TSV files (so the database is generated fresh on each run), calls `pandoc` with the filter, then iterates over the TSV files in `db/` and imports them into SQLite. Finally, it launches a SQLite shell with the new database. Launching SQLite in Markdown mode isn't technically required, but I do think it looks a little nicer.
Here's the final result:
```
$ just sql
SQLite version 3.48.0 2025-01-14 11:05:00
Enter ".help" for usage hints.
sqlite> SELECT *
   ...> FROM runs
   ...> WHERE Distance = '4.45'
   ...> ORDER BY Duration
   ...> LIMIT 1;
| Date       | Duration | Distance | Notes |
|------------|----------|----------|-------|
| 2025-02-09 | 38:40.00 | 4.45     |       |
```
## Thoughts
Instead of maintaining these tables in Markdown, I could maintain them in a SQLite database and write the reverse of the filters above, extracting all my runs when I convert the Markdown to HTML. That seemed like it would be a little less ergonomic for me, but I'll think on it.
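As a rough illustration, a reverse filter might look something like this minimal, untested sketch: it shells out to `sqlite3` and rebuilds a pandoc table from the rows. It assumes the same `db/db.sql` database and a pandoc new enough to ship `pandoc.utils.from_simple_table`; shelling out is just the simplest approach I could think of, not a recommendation.

```lua
-- Untested sketch: replace each decorated Div with a table built
-- from the corresponding SQLite table.
function Div(div)
  local name = div.attributes["sqlite_table_name"]
  if not name then
    return nil -- leave undecorated divs alone
  end
  -- Ask sqlite3 for tab-separated rows, header line first.
  -- (No escaping of the table name; fine for a personal wiki.)
  local cmd = string.format(
    "sqlite3 -tabs -header db/db.sql 'SELECT * FROM %s'", name)
  local proc = io.popen(cmd)
  local rows = {}
  for line in proc:lines() do
    local cells = {}
    -- split on tabs; the appended tab lets the pattern catch the last field
    for field in string.gmatch(line .. "\t", "([^\t]*)\t") do
      cells[#cells + 1] = { pandoc.Plain { pandoc.Str(field) } }
    end
    rows[#rows + 1] = cells
  end
  proc:close()
  local header = table.remove(rows, 1)
  local aligns, widths = {}, {}
  for i = 1, #header do
    aligns[i], widths[i] = pandoc.AlignDefault, 0
  end
  local simple = pandoc.SimpleTable({}, aligns, widths, header, rows)
  return pandoc.utils.from_simple_table(simple)
end
```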
I think it would also be very ergonomic to put some metadata on the column headers indicating my desired datatype on the SQL side. I could probably use bracketed spans to accomplish this, but I haven't tried it.
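Something like this might work, though it's untested; `sqlite_type` is an attribute name I just made up, and the filter would need to learn to read it and emit a `CREATE TABLE` before the `.import`:

```markdown
| [Date]{sqlite_type="TEXT"} | [Duration]{sqlite_type="TEXT"} | [Distance]{sqlite_type="REAL"} | Notes |
|----------------------------|--------------------------------|--------------------------------|-------|
```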