SQL:DeclarativeQuery
SQL is not a general-purpose programming language — it's a declarative query language. On this site it lives in the "script / query" bucket alongside Bash, not next to C / Rust / Java. Codd published the relational model in 1970, Chamberlin & Boyce drafted SEQUEL at IBM in 1974, Oracle sold the first commercial RDBMS in 1979. Fifty-six years later, the data layer's default is still SQL.
relational model · CACM
1970 → 2026
PG / MySQL / SQLite / DuckDB / ...
SQLite · every phone / browser
What is SQL
SQL = Structured Query Language. Designed in 1974 by Donald Chamberlin and Raymond Boyce inside IBM's System R project as SEQUEL, renamed over a trademark conflict. It's not a Turing-complete general-purpose language (standard SQL isn't) — it's a declarative query language for relational databases: you write "give me these rows", the engine decides how to fetch them.
You state what (SELECT ... WHERE), you don't write the iteration, the index choice, or join order. The optimiser bets for you — fifty years of database research live in that translation.
The underlying model comes from Codd's 1970 paper: relations (tables), tuples (rows), and five algebraic operations — projection / selection / join / union / Cartesian product. SQL is syntactic sugar over that algebra.
The ANSI / ISO standard runs from SQL-86 to SQL:2023, but no engine implements 100% of any version. CTEs, window functions, JSON, arrays — each vendor moves at its own pace. This is the real day-to-day pain of SQL engineering.
SQL itself doesn't write web pages, drive hardware, or train ML. It sits beside Bash in the "script / query" bucket — single-purpose, 50 years steady. Staying perfectly still for half a century is its own kind of skill.
postgres=> SELECT name, country, single FROM wca_results WHERE event_id = '333' AND single IS NOT NULL ORDER BY single LIMIT 5;
| name | country | single |
|---|---|---|
| Yiheng Wang (王艺衡) | CN | 3.05 |
| Max Park | US | 3.13 |
| Xuanyi Geng (耿暄一) | CN | 3.47 |
| Asher Kim-Magierek | US | 3.66 |
| Tymon Kolasiński | PL | 3.79 |
History : Timeline
Fifty-six years of SQL: from a 1970 paper at IBM San Jose, through the Oracle / Sybase / MSSQL commercial wars, the PostgreSQL / MySQL / SQLite open-source troika, the NoSQL revolt, and the Snowflake / BigQuery / ClickHouse / DuckDB analytics resurgence — the language barely moved, the engines underneath rotated entirely.
- 1970·06
Codd publishes the relational model
At IBM San Jose, Edgar F. Codd publishes "A Relational Model of Data for Large Shared Data Banks" (CACM, June 1970). Eleven pages lift data from hierarchical / network models into relational algebra. No SQL yet — but this is its seed.
- 1973—74
System R · SEQUEL is designed
Inside IBM's System R project, Donald Chamberlin and Raymond Boyce design SEQUEL (Structured English Query Language) — a language to let non-mathematicians query relational data. Renamed SQL over a trademark clash. Boyce dies in 1974 at age 35.
- 1979
Oracle V2 — the first commercial RDBMS
Larry Ellison's Relational Software Inc. (later renamed Oracle) ships the first commercial relational database. IBM wrote the paper, Oracle sold the product — a lesson in software commercialisation that echoed for decades.
- 1986
ANSI SQL-86 standard
SQL gets its first formal spec — SQL-86 ratified by ANSI, then ISO the same year. Standardisation only on paper — vendor dialects had already forked; the standard codifies the lowest common denominator.
- 1989
Microsoft SQL Server 1.0 (a Sybase fork)
Microsoft partners with Sybase to port Sybase SQL Server to OS/2; that becomes Microsoft SQL Server. It moves to Windows NT in 1993 and gradually splits from Sybase into its own T-SQL line. It will lock up US enterprise SQL for the next three decades.
- 1992
SQL-92 — the baseline most people still learn
SQL-92 is the most consequential version in history: explicit
JOINsyntax (INNER / LEFT / RIGHT / FULL),VARCHAR,DATETIME, three-valuedNULLlogic all settle here. Even in 2026, most textbooks still teach this version. - 1996·07
PostgreSQL 1.0
Michael Stonebraker's POSTGRES project at UC Berkeley (1986) gains a SQL frontend in 1995 — renamed Postgres95 — and ships as PostgreSQL 1.0 in July 1996. Open source, BSD-style licence. Thirty years later it is the de-facto king of open-source SQL.
- 1995
MySQL — Monty's database
Sweden's Michael "Monty" Widenius names the database after his daughter My. Early MySQL is fast but lacks full transactions / ACID, yet it fuses with PHP through the Web-1.0 years. The "M" in LAMP is this.
- 1999
SQL:1999 — CTEs, triggers, OLAP, regex
A big version: recursive CTEs, triggers, OLAP aggregates (
ROLLUP/CUBE), regex, object types. Vendors had most of it; the standard merely caught up. But recursive CTE was a genuine novelty and defined a whole shape of query. - 2000·08
SQLite 1.0 (D. Richard Hipp)
D. Richard Hipp at Hwaci writes an embedded SQL engine for the US Navy's DDG-79 destroyer. No server, one file per database, public-domain licence. It quietly becomes the most-deployed database on Earth — every phone, every browser, every aircraft avionics box.
- 2003
SQL:2003 — window functions
Window functions (
ROW_NUMBER() OVER ...,RANK,LAG,LEAD,SUM() OVER) enter the standard. This is SQL's single most modernising step: expressing "look at this row plus its neighbours" without breaking relational algebra. XML also joins the spec the same year. - 2008
Sun acquires MySQL · the NoSQL wave
Sun buys MySQL AB for $1B. Two years later Oracle buys Sun. Monty walks and forks MariaDB (2009). Meanwhile NoSQL takes off: MongoDB / Cassandra / Couchbase / Redis — "Schemas are dead" was the slogan.
- 2012
Google Spanner — distributed SQL
Google publishes the Spanner paper: globally distributed, strongly consistent (TrueTime API), SQL frontend. It debunks the "distributed means NoSQL" axiom. CockroachDB / TiDB / YugabyteDB are all the open-source response to this thread.
- 2014
Snowflake — the cloud-warehouse era
Snowflake goes public-cloud GA. Storage / compute separated, billed per second, SQL data warehouse — unlike legacy Teradata / Vertica, it's cloud-native from day one. IPOs at roughly $70B. BigQuery is maturing inside Google in the same window.
- 2016
SQL:2016 + ClickHouse open-sourced
Standards side: SQL:2016 adds JSON path and row-pattern matching. Open-source side: Yandex open-sources ClickHouse — a columnar OLAP engine that will dominate logs, time series and realtime analytics in the years to come.
- 2018
DuckDB 1.0 — the pandas killer
Mark Raasveldt and Hannes Mühleisen at CWI Amsterdam release DuckDB: an embedded columnar analytical SQL engine — SQLite-shaped, but built for analytics. "Replace most pandas workflows with DuckDB" becomes the biggest 2024-2026 consensus shift in data tooling.
- 2020
NoSQL → "let's add SQL back"
The 2009-2014 "schemaless" pitch ages badly — engineers realise "no schema means schema scattered across application code". MongoDB adds an Aggregation Pipeline (SQL-shaped); Cassandra ships CQL; Kafka ships ksqlDB. SQL didn't win — it just refused to die.
- 2023
The LLM era — text → SQL
GPT-4 / Claude / Gemini bring "natural-language to query" close to free. BI tools (Mode, Hex, ChartDB) plug LLMs in everywhere. But "let the LLM write production SQL" is still handled gingerly — schema leakage, wrong joins, runaway cost are all unresolved trade-offs.
- 2026
SQL at 56 — still the default of the data layer
State of the union, 2026: OLTP runs on PostgreSQL; analytics runs on ClickHouse / DuckDB / Snowflake / BigQuery; embedded runs on SQLite; enterprise legacy runs on Oracle / MS SQL. SQL the language never changed; the engines underneath rotated entirely — that is how it survived 56 years.
Language Essentials : SqlAlphabet
Eight cards covering the core structures every SQL writer needs: the declarative essence of SELECT, the four joins, the two-step aggregate, window functions, CTEs, the NULL three-valued-logic trap, transactions, indexes. Master these eight and 90% of daily queries are within reach.
SELECT — the declarative core
SQL is declarative: you state what you want, not how to fetch it. The query optimiser turns SELECT into an execution plan — 50 years of database research piled into that translation.
SELECT name, country, single
FROM wca_results
WHERE event_id = '333'
AND single IS NOT NULL
ORDER BY single ASC
LIMIT 10;JOIN — the meat of relational algebra
Four joins (INNER / LEFT / RIGHT / FULL) plus SQL-92's explicit syntax. A relational database's power is in the join: normalised tables, cross-table queries, every metric ultimately rides on it.
SELECT p.name, c.name AS comp
FROM persons p
JOIN results r ON r.person_id = p.id
JOIN competitions c ON c.id = r.comp_id
WHERE p.country = 'CN';GROUP BY + HAVING
Aggregation in two steps: GROUP BY groups, then HAVING filters those groups (distinct from WHERE, which filters rows). This is the pair beginners conflate most often.
SELECT country, count(*) AS n
FROM persons
GROUP BY country
HAVING count(*) > 1000
ORDER BY n DESC;Window functions — the gift of SQL:2003
OVER (PARTITION BY ... ORDER BY ...) makes "look at self plus neighbours" a one-liner. Ranks, running totals, moving averages, previous/next rows — all collapse to one expression. SQL's single most modernising feature.
SELECT name, event, single,
ROW_NUMBER() OVER (
PARTITION BY event
ORDER BY single ASC
) AS rk
FROM wca_results;CTEs — WITH ... AS
Pulls subqueries into named steps, chains them readably, and supports recursion (SQL:1999). "Five levels of nested subquery" is a code smell — rewrite it as five named CTE blocks.
WITH top10 AS (
SELECT id, single FROM results
ORDER BY single LIMIT 10
)
SELECT p.name, t.single
FROM top10 t
JOIN persons p ON p.id = t.id;NULL — the three-valued logic trap
NULL means "unknown", not "empty". NULL = NULL is not true — it's UNKNOWN. WHERE x = NULL always returns no rows; you must use IS NULL. The single largest source of SQL bugs.
-- Wrong: returns 0 rows
SELECT * FROM t WHERE x = NULL;
-- Correct:
SELECT * FROM t WHERE x IS NULL;Transactions / ACID
SQL databases offer real transactions: BEGIN / COMMIT / ROLLBACK. ACID (atomic / consistent / isolated / durable) is the SQL camp's moat — and the direct counterpoint to early NoSQL's "eventually consistent is good enough".
BEGIN;
UPDATE accounts SET bal = bal - 100
WHERE id = 'A';
UPDATE accounts SET bal = bal + 100
WHERE id = 'B';
COMMIT;Indexes — the actual database magic
B-tree / hash / GIN / BRIN / columnar / bitmap. The same SQL is 1000× faster with the right index. The most valuable thing to learn beyond SQL itself — EXPLAIN is the DBA's stethoscope.
CREATE INDEX idx_results_event_single
ON results (event_id, single)
WHERE single IS NOT NULL;
EXPLAIN ANALYZE
SELECT ... -- see the planSQL itself isn't Turing-complete — but PL/pgSQL is
Pure SQL (SQL-92 core) is not Turing-complete: no loops, no recursion (recursive CTE arrived in SQL:1999 in a restricted form). But every engine ships a procedural extension: PL/pgSQL (PG), T-SQL (MSSQL), PL/SQL (Oracle), SQLite's WITH RECURSIVE — and these are Turing-complete. When you "write SQL", you almost always write some dialect mix of these.
"SQL is the standard; dialects are the reality — and the gap between them is every SQL engineer's daily job."
The Dialect Zoo : DialectMatrix
"I write SQL" — really you write some engine's dialect. Ten major dialects, six features, what each one ships. The SQL standard is the map; engine dialects are the terrain.
| Window fn | CTE | Recursive CTE | JSON | Arrays | JIT | |
|---|---|---|---|---|---|---|
| ANSI SQL:2016 | ✓ | ✓ | ✓ | ✓ | ✓ | — |
| PostgreSQL | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| MySQL 8.0+ | ✓ | ✓ | ✓ | ✓ | — | — |
| MariaDB 10.6+ | ✓ | ✓ | ✓ | ✓ | — | — |
| SQLite | ✓ | ✓ | ✓ | ✓ | — | — |
| MS SQL Server | ✓ | ✓ | ✓ | ✓ | — | — |
| Oracle | ✓ | ✓ | ✓ | partial | — | ✓ |
| BigQuery | ✓ | ✓ | ✓ | ✓ | ✓ | — |
| Snowflake | ✓ | ✓ | ✓ | ✓ | partial | — |
| DuckDB | ✓ | ✓ | ✓ | ✓ | ✓ | — |
| ClickHouse | ✓ | ✓ | partial | ✓ | ✓ | — |
How to read: "yes" = supported in standard syntax; "partial" = supported but with non-standard syntax or limits; "—" = not natively (often emulable via JSON etc.). The JIT column means engine-level LLVM-class compilation (PostgreSQL 11+, Oracle 12c+ ship it). State as of 2024-2026 — vendors are still catching up and rows will shift.
Famous Anti-Patterns : DontDoThis
Fifty years on, SQL anti-patterns are more stable than its good patterns — every new engineer rediscovers them. These six are the ones that make every DBA reach for the off switch.
SELECT *Fine for ad-hoc; fatal in production code, views, or reports. A new column breaks downstream consumers; the optimiser loses column pruning; covering indexes go to waste. Listing columns is the floor.
"For each row, run another query". 1 + N round-trips; at N=10000, latency alone kills you. The single most common ORM-induced bug — fix with one JOIN or a batched WHERE id IN (...).
OR in WHEREWHERE a = 1 OR b = 2 — many optimisers can't use two indexes at once and fall back to a full scan. Rewriting as UNION ALL of two halves, or adding a composite (a, b) index, often unlocks speedup.
WHERE id = '123' when id is INT: the engine often casts the column (cast(id AS TEXT)) and the index dies. Same trap with dates, charsets, collations. Match types literally — don't gamble on the optimiser.
GROUP BY 1, 2, 3Using positional ordinals instead of column names — change the column order and the grouping silently shifts. Painful to review. Fine for one-off ad-hoc; never in production code, reports, or views.
NULL treated as emptyCOUNT(col) ignores NULL but COUNT(*) doesn't. SUM silently skips NULLs, but col + 1 on a NULL gives NULL. Three-valued logic is the most invisible bug source in SQL; explicit COALESCE(col, 0) is the antidote.
OLTP vs OLAP — two workloads : OneSqlTwoWorlds
One SQL, two utterly different engines. OLTP (online transactional) is write-heavy, single-row, millisecond-scale; OLAP (online analytical) is read-heavy, aggregation-heavy, scans gigabytes. The engines diverge wildly, the SQL stays the same — this is what makes SQL genuinely remarkable.
Online Transactional
For applications. "Give me user 1234's cart". A query touches a handful of rows; rate is 10⁴-10⁶ QPS; strong consistency; strict ACID.
- Storage — row-store (rows contiguous)
- Indexes — B-tree
- Representative — PostgreSQL, MySQL, Oracle, MSSQL, SQLite
- Per-query size — rows < 1000
- Latency target — < 10ms
- SQL — SELECT single-table + LIMIT, lots of INSERT/UPDATE
Typical query: SELECT * FROM users WHERE id = ?. An index pinpoints one row — the DB may be TBs but no single query scans the whole thing.
Online Analytical
For analysts, reports, ML features. "Total sales by country, last 30 days". A query scans billions of rows; rate is 10⁰-10² QPS; eventual consistency is acceptable.
- Storage — column-store (columns contiguous)
- Indexes — zone maps / bitmaps / data skipping
- Representative — ClickHouse, Snowflake, BigQuery, DuckDB, Redshift
- Per-query size — rows 10⁶-10¹¹
- Latency target — seconds to minutes
- SQL — SELECT multi-table + heavy GROUP BY, occasional bulk INSERT
Typical query: SELECT country, SUM(amount) FROM sales WHERE date >= ... GROUP BY country. No index helps — but columnar storage means "read only the country and amount columns" costs 1/100 of a row-store scan.
In one line: For OLTP, reach for PostgreSQL / MySQL / SQLite; for OLAP, reach for ClickHouse / DuckDB / Snowflake / BigQuery. The wrong engine costs you 100× in performance; the right one runs the same SQL you already wrote.
Why SQL Won't Die : WhySqlLasts
The past 25 years contain at least four predictions of SQL's imminent replacement: object databases, NoSQL, GraphQL, ORM abstractions. None landed — this section is about why SQL keeps surviving.
"SQL doesn't survive because it's good — it survives because it's good enough and stable enough. For 25 years engines moved from single-node to cloud to columnar to distributed, and the SQL interface barely flinched. Replacing the language costs 100× more than replacing the engine, and the whole data stack knows this.
Declarative — let the optimiser bet for you
You say "give me these rows"; the optimiser decides join order, index choice, memory vs spill. Fifty years of query-plan research live behind that single SELECT — which is precisely why imperative APIs never managed to replace SQL.
-- Same SQL, different engines:
-- PostgreSQL: nested loop + index
-- ClickHouse: parallel column scan
-- DuckDB: vectorized hash join
-- Your SQL doesn't change.The relational model — the only paradigm that lasted
Hierarchical / network / object / document / graph databases all came and went; relational + SQL is the only paradigm intact across 50 years. The reason: it's a mathematical definition (relational algebra), not one product's implementation. After the NoSQL wave receded, the consensus left behind was "SQL is non-optional".
-- Codd 1970: relations + tuples
-- ↓ 50 years
-- 2026: still relations + tuplesOne language, N engines
The same SELECT runs against PostgreSQL (row-store OLTP) / ClickHouse (column-store OLAP) / SQLite (embedded) / Spanner (distributed) — the engine changes, the SQL doesn't. This is SQL's real power as a universal interface: swap engines without rewriting application code.
-- portable across:
-- · PostgreSQL (row OLTP)
-- · ClickHouse (column OLAP)
-- · SQLite (embedded)
-- · DuckDB (analytical)
-- · BigQuery (cloud DW)NoSQL's quiet reversal
The 2009-2014 NoSQL wave promised "death to schemas". A decade later the consensus: no schema doesn't mean no schema — it means the schema is scattered across application code. MongoDB ships an Aggregation Pipeline, Cassandra ships CQL, ksqlDB lets you SELECT against Kafka — SQL didn't win the argument; it just refused to be left behind.
-- MongoDB Aggregation Pipeline
-- looks suspiciously like SQL:
db.x.aggregate([
{ $match: { y: { $gt: 10 } } },
{ $group: { _id: "$z", n: { $sum: 1 } } },
{ $sort: { n: -1 } }
])"ORMs will replace SQL" → "SQL replaced the ORM"
In the 2010s everyone tried Hibernate / ActiveRecord / Django ORM / Eloquent — "finally no more SQL". In the 2020s everyone is writing raw SQL again, just type-checked: sqlc (Go), sqlx (Rust), Drizzle (TS), jOOQ (Java). The tide came in twice; the SQL was still there.
-- 2014 (Hibernate):
session.createCriteria(User.class)...
-- 2026 (Drizzle / sqlc):
-- write the SQL · get typed rows
SELECT id, name FROM users WHERE ...Window functions
Before 2003, "top 3 cubers per country" required a correlated subquery plus self-join — thirty lines of SQL that ran slowly. Window functions collapse it to one line: ROW_NUMBER() OVER (PARTITION BY country ORDER BY single) with WHERE rk <= 3. SQL's single most modernising step.
ROW_NUMBER()— row index, ties get different numbersRANK()— ties get same rank, with gapsDENSE_RANK()— ties get same rank, no gapsLAG / LEAD— previous / next rowSUM() OVER (...)— running totals / moving averagesFIRST_VALUE / LAST_VALUE— window-edge values
-- top-3 333 singles per country
WITH ranked AS (
SELECT
name, country, single,
ROW_NUMBER() OVER (
PARTITION BY country
ORDER BY single ASC
) AS rk
FROM wca_results
WHERE event_id = '333'
AND single IS NOT NULL
)
SELECT * FROM ranked
WHERE rk <= 3
ORDER BY country, rk;The Ten Engines : SqlEngines
SQL's real strength isn't the language — it's the engine diversity. From SQLite at tens of megabytes embedded in every iPhone, to Snowflake and BigQuery spanning multi-region clouds, the same SQL covers ten orders of magnitude of data scale.
2026 toolchain / clients / frameworks
SQL vs Python pandas vs MongoDB : HowItStacks
SQL's real competition isn't another SQL — it's different paradigms for getting at data. Python pandas is imperative dataframes; MongoDB is document queries. Three styles, the same "fetch data" task.
| SQL | Python pandas | MongoDB | |
|---|---|---|---|
| Paradigm | Declarative · relational | Imperative · dataframes | Document + Aggregation Pipeline |
| Origin | IBM · 1974 | Wes McKinney · 2008 | 10gen / MongoDB Inc · 2009 |
| Data structure | Tables (row × column × type) | DataFrame | BSON documents (JSON-like) |
| Schema | Enforced (DDL-declared) | Inferred dynamically | Optional / weak |
| JOIN | First-class · 4 semantics | merge() | $lookup (added late · slow) |
| Aggregation | GROUP BY + HAVING | groupby().agg() | $group |
| Transactions | Native ACID | None (in-process memory) | 4.0+ multi-doc transactions |
| Data scale | bytes → many TB | RAM-bound (~10⁸ rows) | GB → TB · shardable |
| Interop | Every language has a client | Deep in Python ecosystem | JS / Python / Java widely |
| 2026 usage | OLTP + OLAP + embedded + warehouse | Data-science exploration | Logs / IoT / config / cache |
| Trend | Steady · 50-year default | ↘ DuckDB eating into it | Steady · added SQL-ish interface |
Outlook : NextFiftyYears
SQL at 56 won't change its syntax — but the surroundings keep moving. LLMs help humans write queries; distributed SQL reshapes the ops model; DuckDB pulls analytics back to the laptop. SQL stays still, everything around it moves.
LLM-written SQL — still not "production-ready"
Since 2023, LLMs write SQL surprisingly well, and every BI tool (Hex / Mode / ChartDB) plugs one in. But "let the LLM write production SQL unattended" hasn't landed — three blockers: schema injection (context windows can't hold large schemas), hallucinated queries (LLMs invent non-existent tables or columns), and cost (LLMs don't reason about query plans — one bad SQL can melt a warehouse).
State of the art in 2026: "LLM drafts, human reviews" remains the norm. Text-to-SQL models (NSQL, Defog SQLCoder) hit ~85% accuracy on narrow schemas; complex joins / cross-DB / tuned queries still need humans.
Distributed SQL — after Spanner
CockroachDB / TiDB / YugabyteDB open-source the Spanner pattern: distributed strong-consistency plus SQL frontend. The 2026 question isn't "can it speak SQL" — it's "can ops cost match plain Postgres" — and it doesn't quite yet.
DuckDB — embedded analytics returns
Analytics used to demand a Snowflake / BQ warehouse. Now DuckDB chews 100 GB of Parquet on a laptop and runs SQL 10-100× faster than pandas. "Small-data analytics moves back local" is the most under-discussed data-stack shift of 2024-2026.
In one line: SQL doesn't need a "future" — it's already the data layer's default, 56 years deep, and likely good for another 56. New engines speak SQL, new BI tools emit SQL, even NoSQL came back and added SQL. It isn't a winner — it's infrastructure.