// 2023 — 2026 · Modular · Chris Lattner · Python syntax / C speed / MLIR IR

Mojo:AIKernel

Write Python's syntax, run at C's speed, with MLIR under the hood — the goal Chris Lattner set in 2023. Three years on, Mojo runs production inference at frontier AI labs, codegens one kernel to NVIDIA / AMD / Apple Silicon, and is the first credible CUDA challenger since OpenCL.

2023 · Unveiled May 2
Modular Keynote

35k× · matmul vs Python
launch demo · same hw

3 arch · NVIDIA / AMD / Apple
one kernel · MLIR codegen

→1.0 · Pre-1.0 · pre-open
magic · stable stdlib


What is `Mojo`

Mojo is an AI systems language released in 2023 by Modular (Chris Lattner). The design goal is blunt: weld Python-grade syntax onto C/Rust-grade performance, with MLIR under the hood. MLIR is the AI compiler IR whose development Lattner led at Google after joining in 2017.

Python superset (goal) syntax

Syntactically Mojo aims to be a Python superset: def, indentation, import are nearly identical. "Superset" is the roadmap, not today's exact state — a few Python corner cases still don't run.

MLIR native compiler

The compiler's IR is MLIR directly — not LLVM IR. Multi-level dialects, extensible, the same program lowers to CPU / GPU / TPU. This is the deepest structural gap between Mojo and every other "speed up Python" project.

Ownership · no GC memory

Three argument conventions (borrowed / inout / owned), no GC, and structs that default to value types. The aim is Rust-style memory safety without hand-written lifetime annotations: the convention keywords plus compiler inference do the work.

SIMD / GPU first-class parallel

SIMD is a type, not an intrinsic; GPU codegen flows through MLIR → PTX (NVIDIA) / ROCm (AMD). You don't "call CUDA from Mojo" — what you write IS the kernel.

matmul.py · Pure Python

def matmul(a, b, c, n):
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]

# 1024×1024 matrix · minutes-class runtime
# interpreted + refcounted + everything boxed

matmul.mojo · Mojo

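from algorithm import vectorize
from sys.info import simdwidthof

# Matrix is a user-defined struct (not shown) with rows, cols,
# scalar indexing, and width-parameterised load / store.
fn matmul(inout c: Matrix,
           a: Matrix, b: Matrix):
    alias nelts = simdwidthof[DType.float32]()
    for i in range(c.rows):
        for k in range(a.cols):
            @parameter
            fn v[w: Int](j: Int):
                c.store[w](i, j,
                  c.load[w](i, j)
                  + a[i,k] * b.load[w](k, j))
            vectorize[v, nelts](c.cols)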

# Modular's number: ~35,000× over interpreted Python
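The loop order and the vectorize call in the Mojo listing can be modelled in plain Python. This is only a sketch of the chunking idea, assuming vectorize walks the column range in width-sized chunks and finishes any remainder one element at a time. The names vectorize and matmul_chunked and the list-of-lists matrices are illustrative stand-ins, not Mojo's API, and nothing here executes as SIMD.

```python
def vectorize(body, width, size):
    """Call body(w, j) over [0, size): full chunks of `width`, then a scalar tail."""
    j = 0
    while j + width <= size:
        body(width, j)
        j += width
    while j < size:
        body(1, j)
        j += 1


def matmul_chunked(c, a, b, width=4):
    """Accumulate a @ b into c using the i-k-j loop order of the Mojo listing."""
    rows, inner, cols = len(c), len(b), len(b[0])
    for i in range(rows):
        for k in range(inner):
            aik = a[i][k]  # one scalar, reused across the whole chunk

            def v(w, j):
                # One "vector" step: w adjacent columns of row i.
                for jj in range(j, j + w):
                    c[i][jj] += aik * b[k][jj]

            vectorize(v, width, cols)
    return c


a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]                     # 3 x 2
b = [[1.0, 0.0, 2.0, 1.0, 3.0], [0.0, 1.0, 1.0, 2.0, 1.0]]   # 2 x 5
c = [[0.0] * 5 for _ in range(3)]
matmul_chunked(c, a, b)
print(c[0])
```

With five columns and a width of four, each row update runs one four-wide step plus one scalar step. The inner j loop is the part a SIMD register replaces.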

History `: Timeline`

Mojo didn't appear from nowhere in 2023 — it is Lattner's third language, on a line that runs from his 2003 LLVM thesis through LLVM, Clang, Swift (/code/swift), Tesla, Google Brain's MLIR, and finally lands at Modular.

2003
Chris Lattner submits the LLVM thesis
His UIUC master's thesis, "LLVM: An Infrastructure for Multi-Stage Optimization", and the follow-up paper with Vikram Adve, "LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation". The infra they laid down became the substrate for the next two decades of Lattner's work: Clang, Swift, MLIR, and finally Mojo. The origin of the whole line.
2014·06
WWDC: Swift ships
The "modern Objective-C replacement" Lattner led inside Apple goes public (/code/swift). Swift becomes Apple's full-stack language. Lattner stays through 2017. It is his second language to reshape an industry; Mojo is the third.
2017·01
Lattner → Tesla → Google Brain
January: he joins Tesla as VP of Autopilot, then resigns six months later. Autumn: he lands at Google Brain on TPU compiler infrastructure — where MLIR is born. MLIR is the direct technical seed of Mojo: multi-level IR, extensible dialects, designed for heterogeneous hardware.
2022·01
Modular is founded
Lattner and ex-Google ML product lead Tim Davis co-found Modular. The mission is bluntly simple: "AI shouldn't be locked into CUDA + Python." Early engineers come from Google, Apple and SiFive's compiler crowd.
2023·05·02
Mojo unveiled at Modular's keynote
Three-line pitch: Python syntax, C-class speed, MLIR for the IR. The first demo runs a naive Python matmul 35,000× faster on the same hardware — the number stuns the room. The AI world learns the name overnight.
2023·09
Mojo SDK 0.1 — first downloadable build
The 0.1 SDK ships in September, Linux only, behind a license waitlist. The language surface is still volatile, but you can finally run it locally. The early crowd is ML engineers plus compiler hobbyists.
2024·01
macOS support
Mojo lands on Apple Silicon. M-series chips are sweet-spot targets for LLVM/MLIR codegen; "runs on a Mac" instantly doubles the developer hardware base. The first wave of non-Linux issues hits the tracker.
2024·03·29
Standard library open-sourced (Apache 2.0)
March 29: Modular open-sources the stdlib on modularml/mojo. The compiler stays closed for now — same playbook as early Swift: hand the library to the community first, decide on the toolchain later. ~100 outside PRs land within a week.
2024·08
Mojo 24.4 — ownership overhaul
The Rust-flavoured ownership story gets a thorough rework: borrowed / inout / owned become argument conventions, annotated explicitly rather than inferred. The Reference[T] type lines up at the same time. The community senses the language is "starting to set."
2024·09
GPU kernels land — H100 / A100
The first NVIDIA H100 / A100 codegen ships, via the MLIR → PTX pipeline. This is the first real proof that you can write GPU kernels without CUDA C++. Triton (OpenAI) had been doing it via Python AST, but the route is different.
2025·02
MAX 24.6 — production inference
Modular's MAX inference engine wires Mojo kernels into production at multiple AI startups (Replit and Together AI have said so publicly). The "research-stage language" label starts peeling off.
2025·09
AMD GPU support — via ROCm/MLIR
Mojo gains AMD GPU support through a ROCm/MLIR backend. It matters: this is the first credible CUDA alternative since OpenCL. One kernel source, two vendors — the first real crack in the CUDA monopoly.
2025·11
Approaching 1.0 — package manager + stable stdlib
The new magic package manager ships; the stdlib enters its "breaking changes need an RFC" phase; the doc generator lands. The "still being torn up" feel fades; 1.0 is in sight.
2026
Three years in — production inference at frontier labs
Mojo in 2026: frontier AI labs use MAX + Mojo for production inference; the "Python is slow" pitch holds — Modular's matmul / softmax numbers are reproducible; the ecosystem is still tiny next to PyTorch-native. Hardware-portable AI kernels are the real 2026 battleground, with Mojo / Triton / CUDA / JAX-XLA all competing.

Language Essentials `: MojoAlphabet`

The eight cards below are where Mojo differs hardest from the other 11 languages on this site: def vs fn, ownership annotations, @parameter, SIMD types, @value, struct vs class, alias, perf annotations. The ninth covers the Python-superset story today.

`def` vs `fn`

Mojo carries both Python-loose def and strict typed fn. One file can mix: prototype with def like Python, then switch the hot path to fn for compile-time checks.

def loose(x):
    return x * 2

fn strict(x: Int) -> Int:
    return x * 2
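Python has no compile-time counterpart to fn, so the closest illustration is a check at call time. The sketch below uses a hypothetical checked decorator (not part of Python or Mojo) that rejects arguments whose types do not match the annotations. Mojo's fn reports the same class of mistake before the program runs.

```python
import inspect
from typing import get_type_hints


def checked(func):
    """Hypothetical helper: enforce parameter annotations when the function is called."""
    hints = get_type_hints(func)
    sig = inspect.signature(func)

    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            if expected is not None and not isinstance(value, expected):
                raise TypeError(
                    f"{name} must be {expected.__name__}, got {type(value).__name__}"
                )
        return func(*args, **kwargs)

    return wrapper


def loose(x):
    return x * 2          # accepts anything that supports *


@checked
def strict(x: int) -> int:
    return x * 2          # only reached when x is an int


print(loose("ab"))
print(strict(21))
```

Calling strict("ab") raises TypeError here, whereas loose("ab") quietly returns a doubled string.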

`borrowed` / `inout` / `owned`

Rust-flavoured ownership, but explicitly annotated: the default is borrowed (a read-only reference), inout is a mutable borrow, and owned transfers the value to the callee. None of Rust's lifetime-annotation pain, but the semantics stay clear.

fn peek(borrowed s: String):
    print(s)

fn grow(inout s: String):
    s += "!"

fn eat(owned s: String):
    pass  # s is destroyed when eat returns

`@parameter` — compile-time programming

Square-bracket parameters plus the @parameter decorator cover generics, conditional compilation, and loop unrolling. Runtime and compile-time code read the same; the compiler specialises each instantiation.

fn repeat[n: Int]():
    @parameter
    for i in range(n):
        print(i)  # unrolled at compile time
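To make "unrolled at compile time" concrete, here is a rough Python analogue: a factory that fixes n up front and generates a function whose body has no loop left in it. The make_repeat helper is hypothetical, and exec stands in for the compiler. Mojo does this specialisation inside the compiler rather than by building source text.

```python
def make_repeat(n):
    """Build a function whose body is the loop over range(n), already unrolled."""
    lines = ["def repeat(out):"]
    lines += [f"    out.append({i})" for i in range(n)] or ["    pass"]
    source = "\n".join(lines)
    namespace = {}
    exec(source, namespace)  # "compile time": n is fixed here, not at call time
    return namespace["repeat"], source


repeat3, source = make_repeat(3)
out = []
repeat3(out)
print(source)
print(out)
```

The generated source contains three straight-line statements and no for loop, which is the shape the unrolled Mojo function takes for n = 3.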

`SIMD[T, n]` as a first-class type

SIMD is a type, not an intrinsic. SIMD[DType.float32, 8] is a parametric vector; arithmetic auto-parallelises — reads like scalar, runs as AVX/NEON.

var a = SIMD[DType.float32, 8](1.0)
var b = SIMD[DType.float32, 8](2.0)
var c = a * b + a   # 8-wide FMA
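The "SIMD is a type" idea can be sketched in Python as a small fixed-width vector class whose operators apply lane by lane. The Vec class and its splat and reduce_add methods are illustrative names chosen to mirror the Mojo snippet. This version runs as ordinary scalar Python, so it shows only the semantics, not the speed.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Vec:
    """Fixed-width vector; every operator applies lane by lane."""
    lanes: Tuple[float, ...]

    @classmethod
    def splat(cls, value, width):
        # Broadcast one scalar into every lane.
        return cls(tuple(float(value) for _ in range(width)))

    def _zip(self, other, op):
        if len(self.lanes) != len(other.lanes):
            raise ValueError("width mismatch")
        return Vec(tuple(op(x, y) for x, y in zip(self.lanes, other.lanes)))

    def __add__(self, other):
        return self._zip(other, lambda x, y: x + y)

    def __mul__(self, other):
        return self._zip(other, lambda x, y: x * y)

    def reduce_add(self):
        # Horizontal sum across lanes.
        return sum(self.lanes)


a = Vec.splat(1.0, 8)
b = Vec.splat(2.0, 8)
c = a * b + a
print(c.lanes)
print((a * b).reduce_add())
```

The expression a * b + a reads like scalar arithmetic but produces eight results, which is the property the Mojo type exposes to the hardware.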

`@value` — auto-derived methods

Tag a struct with @value and the memberwise __init__, __copyinit__ and __moveinit__ are generated for you. Roughly Rust's #[derive(Clone)] plus a constructor: one line saves 30 of boilerplate.

@value
struct Point:
    var x: Float64
    var y: Float64

# __init__ / __copyinit__ / __moveinit__ all derived
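The nearest Python analogue is the standard-library dataclass decorator, which also generates the constructor from the field list. The comparison is loose: Python classes are reference types, so the copy below has to be requested explicitly, whereas a Mojo struct tagged with @value gets copy and move behaviour as part of the type.

```python
import copy
from dataclasses import dataclass


@dataclass
class Point:
    x: float
    y: float


p = Point(1.0, 2.0)   # __init__ generated from the two fields
q = copy.copy(p)      # explicit copy; plain assignment would only alias p
q.x = 10.0            # does not affect p
print(p, q)
```

Equality and repr come for free as well, which is why the printed values are readable without any hand-written methods.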

`struct` vs `class`

Mojo prefers struct (value types): stack-allocated, ownership-aware. class (reference type, GC-flavoured) is deferred for now — a deliberate trade: nail numeric / systems code first.

struct Vec3:
    var x: Float32
    var y: Float32
    var z: Float32

# class { ... }  # not stable yet, on roadmap

`alias` — compile-time constants

One keyword for every compile-time bound value. The three things C++ splits across #define, const and constexpr are all just alias here.

alias WIDTH: Int = 8
alias F32x8 = SIMD[DType.float32, WIDTH]

var v: F32x8 = 0

`@always_inline` and friends

Mojo doesn't leave everything to LLVM heuristics; it gives programmers explicit knobs: @always_inline, @no_inline, @register_passable. Performance becomes more predictable, with fewer "regressed when the compiler upgraded" surprises.

@always_inline
fn dot(a: F32x8, b: F32x8) -> Float32:
    return (a * b).reduce_add()


Python superset: how far it actually goes today

"Python superset" is Modular's public slogan, but in 2026 the reality is: most Python runs, corners don't. Working: def, list/dict literals, indentation, and import of third-party packages, which run inside an embedded CPython interpreter (GIL included). Not working: full metaclass machinery, exec / eval of dynamic code, parts of the dunder protocol. "Use Mojo as Python" largely works; just don't expect 100%.

"Python compatibility is a roadmap, not a finished checklist — Modular itself is explicit about this."

Why Mojo `: WhyMojo`

Mojo isn't out to replace Python for scripting, or Rust for OS work. It targets the gap nobody filled in the last 15 years: AI kernels that are both fast and portable, without forcing ML engineers to drop down into CUDA C++.


No more pybind11

Speeding up Python used to mean C/C++ extensions, pybind11 wrangling, and GIL accounting — three languages and three build systems. Mojo just imports any Python package and lets you write fn hot paths in the same file. The FFI ceremony is gone.

from python import Python

fn main() raises:
    var np = Python.import_module("numpy")
    var arr = np.array([1, 2, 3])
    print(arr)


MLIR-native, not bolted on

Most languages bolt "AI acceleration" onto the toolchain (TorchScript, JAX trace, TF graph). Mojo inverts that — MLIR is the IR itself. Multi-level dialects, extensible, the same program lowers to CPU / GPU / TPU back ends without language-level changes.

# Mojo source → MLIR → LLVM IR → CPU
#                  → PTX     → NVIDIA
#                  → ROCm    → AMD


Hardware-portable — one kernel, many chips

The same Mojo kernel codegens to CPU, NVIDIA, AMD GPU, and Apple Silicon. The CUDA-era world of "one kernel, one vendor" is loosening. This is the biggest contested ground in 2026's AI infrastructure.

# illustrative only; the exact target flags depend on the toolchain version
# mojo build matmul.mojo --target=cuda
# mojo build matmul.mojo --target=rocm
# mojo build matmul.mojo --target=cpu


Lattner's track record

2003 LLVM → 2007 Clang → 2014 Swift → 2017 MLIR → 2023 Mojo. Every project Lattner has shipped became industry infrastructure. Whether Mojo joins them, time will tell — but the resume is the strongest signal developers bet on.

# LLVM     · 2003 — every modern compiler
# Clang    · 2007 — C/C++/ObjC frontend
# Swift    · 2014 — Apple full stack
# MLIR     · 2017 — AI compiler IR
# Mojo     · 2023 — ?


The first credible crack in CUDA's monopoly

For 15 years, GPU programming meant CUDA C++. Mojo plus its ROCm back end is the first credible CUDA challenger since OpenCL: open kernels, open IR, open competition. Not "kill CUDA," but "finally give an alternative a real path".

# single kernel source · NVIDIA + AMD
# open IR · open stdlib · Apache 2.0

Who's Using `: ProductionUsers`

Mojo is young; the list is far shorter than Python's — but every entry is a real user, none invented. Modular's own MAX is the biggest; Replit and Together AI are publicly named AI platforms; the rest are robotics / quant-finance / drug-discovery shops Modular has called out on its blog.

Modular MAX

In-house inference engine · home of Mojo

Replit

Code execution + AI inference

Together AI

Inference cloud · Mojo-written kernels

MAX Kernels

matmul / softmax / attention

AI Robotics

Edge inference · realtime control

Quant Finance

Low-latency numeric kernels

Drug Discovery

Molecular simulation · GPU backend

Open stdlib

Apache 2.0 · community PRs

Edge Inference

IoT / mobile inference

Research Labs

Compiler / ML-systems groups

The AI Era `: Built For AI`

This is the heart of the page: Mojo is one of the very few languages designed for the AI era from day one, rather than retrofitted. PyTorch / vLLM / TensorRT-LLM are upper-layer frameworks; Mojo stands beside them at the kernel layer, not as a replacement.

"For fifteen years the AI stack has been pinned to a single thread: Python calling CUDA C++. Algorithm engineers write Python, performance engineers write CUDA, with a wall between them. We're not building Mojo to make Python faster; we want the same person, in the same language, to write both the algorithm and the kernel."
Chris Lattner · Modular CEO · LLVM / Swift / MLIR / Mojo · paraphrased from interviews + keynotes

35k×

matmul vs Python · same hardware

Modular's launch-demo number: a 1024×1024 float32 matmul, Mojo with SIMD + tiling + parallelize vs three nested Python for loops, on the same Intel Xeon ≈ 35,000×. Narrowly defined: one kernel, same hardware, interpreted Python baseline — those are the real bounds.
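A quick back-of-the-envelope helps read that figure. The sketch below counts the floating-point work in a 1024×1024 matmul and shows what a 35,000× factor would imply for a given baseline. The 600-second baseline is a hypothetical placeholder, not a measurement from Modular or from this page; substitute a timing from your own machine.

```python
def matmul_flops(n):
    """Floating-point operations in a dense n x n matmul: n^3 multiplies + n^3 adds."""
    return 2 * n ** 3


def implied_seconds(baseline_seconds, speedup):
    """Runtime implied by a speedup factor over a measured baseline."""
    return baseline_seconds / speedup


n = 1024
flops = matmul_flops(n)
baseline = 600.0  # HYPOTHETICAL interpreted-Python time in seconds, not a measurement
fast = implied_seconds(baseline, 35_000)

print(flops)  # 2147483648
print(f"{fast * 1e3:.1f} ms, {flops / fast / 1e9:.1f} GFLOP/s")
```

The useful part is the shape of the calculation: the speedup only means something relative to the baseline it was measured against, which is why the same-hardware, interpreted-Python qualifiers matter.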

~7×

softmax vs PyTorch CUDA

A more realistic comparison: Modular's fused softmax on H100 is roughly 7× faster than PyTorch eager and on par with Triton (OpenAI). Numbers shift across kernels and hardware, but the conclusion "on par with hand-written CUDA" is stable.

3 targets

One kernel, multiple vendors

One .mojo file targets NVIDIA (PTX), AMD (ROCm), Apple Silicon. The first credible path since OpenCL. CUDA's monopoly isn't broken — it's cracked for the first time, and what grows from here is worth tracking.

SPOTLIGHT

SIMD + GPU — one kernel, every silicon target

What a Mojo kernel looks like: SIMD is a type, GPU is a back end. You write SIMD[DType.float32, 8]; the compiler, given a target, lowers it to AVX-512 / NEON / PTX / ROCm. One source, no abstraction tax, no performance lost.

SIMD type — parametric vector, arithmetic auto-parallel
GPU codegen — MLIR → PTX (NVIDIA) / ROCm (AMD)
Apple Silicon — M-series NEON + Metal compute
No CUDA C++ — No third language, no second build system

Compared with Triton (OpenAI): Triton uses a Python AST + JIT, GPU-only; Mojo is a standalone language covering CPU / GPU / edge. The two coexist; they don't substitute.

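# one kernel · NVIDIA / AMD / CPU
# sketch: row_max, scale_row and the rows / cols fields are assumed helpers
from tensor import Tensor
from algorithm import vectorize, parallelize
from math import exp
from sys.info import simdwidthof

fn softmax(inout x: Tensor[DType.float32]):
    alias nelts = simdwidthof[DType.float32]()

    @parameter
    fn row(i: Int):
        var m = x.row_max(i)      # subtract the row max for numerical stability
        var s: Float32 = 0
        @parameter
        fn v[w: Int](j: Int):
            var e = exp(x.load[w](i,j) - m)
            x.store[w](i, j, e)
            s += e.reduce_add()   # running sum of the exponentials
        vectorize[v, nelts](x.cols)
        scale_row(x, i, 1/s)      # normalise: each row now sums to 1

    parallelize[row](x.rows)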

# build (illustrative flags; check the toolchain's own target options):
#        mojo build softmax.mojo --target=cuda
#        mojo build softmax.mojo --target=rocm
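For checking a kernel like this, a scalar reference is handy. The following is a minimal pure-Python version of the same three steps (row max, exponentiate and sum, scale by the reciprocal), written under the assumption that the input is a list of rows. It is a correctness reference only and makes no claim about how Modular's kernel is implemented.

```python
import math


def softmax_rows(x):
    """In-place, numerically stable softmax over each row of a list-of-lists."""
    for row in x:
        m = max(row)                   # row max, subtracted for stability
        s = 0.0
        for j, value in enumerate(row):
            e = math.exp(value - m)
            row[j] = e                 # store the exponentials first
            s += e                     # accumulate the row sum
        inv = 1.0 / s
        for j in range(len(row)):
            row[j] *= inv              # scale the row by 1/s
    return x


x = [[1.0, 2.0, 3.0], [1000.0, 1000.0, 1000.0]]
softmax_rows(x)
print(x)
```

The second row is the reason for subtracting the maximum: exponentiating 1000.0 directly would overflow, while the shifted values exponentiate to 1.0 each.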

2026 toolchain / backends / surroundings

MAX Engine

Modular inference runtime

Mojo stdlib

Open source · Apache 2.0

magic

Official package manager

MLIR

Underlying IR · LLVM family

Mojo Playground

In-browser sandbox

CUDA backend

NVIDIA H100 / A100

ROCm backend

AMD GPU · 2025-09

Apple Silicon

M-series codegen

Python interop

Import any PyPI package

PyTorch bridge

Tensor interop

NumPy bridge

Shared array buffers

Triton (cmp)

OpenAI · GPU-kernel rival

AI ERA

Counter-intuitive: AIs write Mojo worse than older languages

An interesting paradox: Mojo is an AI-era language, yet LLMs write it less reliably than Python / Java / Rust. The cause is simple — less training data. Public Mojo on GitHub in 2026 is still a few thousand repos, four orders of magnitude below Java.

Actual workflow: "AI writes the Python prototype → human translates to a Mojo kernel" is still dominant. Modular ships its own "AI-assisted Mojo authoring" tooling — feeding the model the stdlib + docs — but in 2026 we're still far from "let the AI write kernels for you".

Ironic but expected: every new language pays a "training-data cold-start tax". Rust paid it early; Zig still pays it; Mojo can't dodge it either. Every early PR, blog post and public kernel is teaching the models this language.

# Status today: AI writes Python prototype → human ports to Mojo

# Python (AI-friendly) · NumPy-style pseudocode; sqrt and softmax assumed in scope
def attention(q, k, v):
    dim = q.shape[-1]
    scores = q @ k.T / sqrt(dim)
    return softmax(scores) @ v

# Mojo (human-translated · kernel-class speed)
fn attention(borrowed q: Tensor,
             borrowed k: Tensor,
             borrowed v: Tensor) -> Tensor:
    # SIMD-packed matmul + fused softmax
    # vectorize / parallelize / tile
    ...

In one line: Mojo isn't "Python glue" and isn't a "CUDA replacement" slogan — it's the first real language that puts the algorithm and the GPU kernel in one place. In 2026 it's young, the ecosystem is small, and the AIs still struggle with it — but the architecture is right: MLIR + Lattner's track record + real production users.

vs Python / Swift `: Mojo vs Python vs Swift`

Versus Python (/code/python): Mojo is Python's acceleration off-ramp, not a replacement. Versus Swift (/code/swift): same designer (Chris Lattner), but Swift targets app developers and Mojo targets ML compiler engineers. One person, two completely different audiences.

	Python	Mojo	Swift
Origin	Guido · 1991	Modular · 2023	Apple · 2014
Designer	Guido van Rossum	Chris Lattner	Chris Lattner
Primary audience	Scripts · data · AI algorithms	AI kernels · compiler engineers	iOS/macOS app developers
Syntax	Python itself	Python superset (in progress)	Own syntax · ML-flavoured
Performance	Interpreted (CPython)	C / Rust class · MLIR codegen	C-class · LLVM
Memory model	GC + refcount	borrowed / inout / owned	ARC + value types
GPU	CUDA C++ via PyTorch	Native · NVIDIA + AMD + Apple	Metal · Apple GPU only
SIMD	NumPy abstraction, outside the language	`SIMD[T, n]` first-class type	`SIMD4<Float>` etc. · stdlib
Compile-time programming	None (dynamic language)	`@parameter` · shared with generics	Yes (generics / macros)
Interop	Everything · pip ecosystem	Native Python import · via embedded CPython	C direct · ObjC bridge
Ecosystem maturity	35 years · the largest of all	3 years · early (~10³ public repos)	12 years · Apple-saturated
Open source	Fully · PSF	stdlib yes · compiler closed (open before 1.0)	Fully · Apache 2.0

Outlook `: TheRoadAhead`

Mojo in 2026 sits on the eve of 1.0 — open-sourcing the compiler is the final big gate. Deeper NumPy / PyTorch ABI interop is on the way; Apple's own MLX is a same-niche competitor. Whether Mojo escapes AI to become a general systems language is an open question.

HOT · 2026+

Open-sourcing the compiler — the last gate on the way to 1.0

The stdlib opened in 2024; the compiler is still closed. Community pressure to fork, embed and audit codegen has been steady. Modular has publicly committed to "open it before 1.0" — the cadence mirrors early Swift's exactly.

What it unlocks: this is the final gate between "an interesting Modular product" and "a real industrial language." Only after open-source do you get third-party compilers, teaching distros, and a cross-vendor RFC process — the Rust pattern.

Compiler closed (today, 2026) · ~1%

Compiler open (post-1.0) · 100%

INTEROP

Deeper NumPy / PyTorch ABI

Today's Python interop goes through the CPython interpreter via Python.import_module, so data crossing the boundary is converted or copied. The next step is shared buffers / zero-copy tensors: operating on a PyTorch Tensor as a Mojo struct without moving the bytes. The path from "fast but isolated" to "fast and seamless."
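The difference between a round trip and a shared buffer can be shown with the Python standard library alone. In this sketch, array plays the role of the tensor that owns the bytes and memoryview plays the role of a zero-copy handle. It illustrates the concept only and says nothing about how a future Mojo/PyTorch bridge would be built.

```python
from array import array

data = array("f", [1.0, 2.0, 3.0, 4.0])   # owner of the bytes

copied = array("f", data)                  # round trip: a second buffer
view = memoryview(data)                    # zero-copy: same bytes, new handle

view[0] = 10.0                             # write through the view
copied[1] = 20.0                           # write to the copy only

print(data.tolist())
print(view.nbytes, data.itemsize * len(data))
```

The write through the view shows up in data, while the write to the copy does not. A zero-copy bridge aims for the first behaviour across the language boundary.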

MLX

Apple MLX — direct competition

Apple shipped MLX in 2023: NumPy-style, Apple-Silicon-tuned, built on Metal rather than MLIR. On macOS, Mojo competes with Apple's first-party stack. The very line Lattner walked away from, Apple has now picked up, and he sees this competitor more clearly than anyone.

GENERAL

A general-purpose systems language?

Mojo in 2026 is positioned for AI, but the language itself is general — struct + ownership + SIMD + MLIR has no "AI-only" baked in. Can it leave the AI niche and challenge Rust for general systems work? Depends on where the post-1.0 community pushes it. An open question, not a roadmap item.
