// 2023 — 2026 · Modular · Chris Lattner · Python syntax / C speed / MLIR IR

Mojo:AIKernel

Python's syntax, C's speed, MLIR under the hood: that is the goal Chris Lattner set in 2023. Three years on, Mojo runs production inference at frontier AI labs, compiles one kernel source for NVIDIA, AMD, and Apple Silicon, and is the first credible CUDA challenger since OpenCL.

2023 · Unveiled May 2 · Modular keynote
35k× · matmul vs Python · launch demo, same hardware
3 arch · NVIDIA / AMD / Apple · one kernel, MLIR codegen
→ 1.0 · Pre-1.0, pre-open · magic, stable stdlib
01

What is Mojo

Mojo is an AI systems language released in 2023 by Modular (Chris Lattner). The design goal is blunt: weld Python-grade syntax onto C/Rust-grade performance, with MLIR under the hood. MLIR is the compiler IR framework Lattner started at Google after joining in 2017.

Python superset (goal) syntax

Syntactically Mojo aims to be a Python superset: def, indentation, import are nearly identical. "Superset" is the roadmap, not today's exact state — a few Python corner cases still don't run.

MLIR native compiler

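The compiler is built on MLIR from the front end down rather than targeting LLVM IR directly; LLVM IR appears only as a late lowering step on the CPU path. Multi-level, extensible dialects mean the same program is designed to lower to CPU / GPU / TPU targets. This is the deepest structural gap between Mojo and every other "speed up Python" project.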

Ownership · no GC memory

Three argument conventions (borrowed / inout / owned), no GC, and structs that default to value types. The result is Rust-style memory safety without hand-written lifetime annotations: the conventions are explicit and the compiler infers the rest.

SIMD / GPU first-class parallel

SIMD is a type, not an intrinsic; GPU codegen flows through MLIR → PTX (NVIDIA) / ROCm (AMD). You don't "call CUDA from Mojo" — what you write IS the kernel.

matmul.py · Pure Python
def matmul(a, b, c, n):
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]

# 1024×1024 matrix · minutes-class runtime
# interpreted + refcounted + everything boxed
matmul.mojo · Mojo
fn matmul(inout c: Matrix,
           a: Matrix, b: Matrix):
    alias nelts = simdwidthof[DType.float32]()
    for i in range(c.rows):
        for k in range(a.cols):
            @parameter
            fn v[w: Int](j: Int):
                c.store[w](i, j,
                  c.load[w](i, j)
                  + a[i,k] * b.load[w](k, j))
            vectorize[v, nelts](c.cols)

# Modular's number: ~35,000× over interpreted Python
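
The Mojo kernel above reorders the loops to i, k, j so that the innermost loop walks a contiguous row of b and c, which is what lets vectorize handle several j values per step. A minimal pure-Python sketch of the same reordering follows, assuming list-of-lists matrices. The names `matmul_naive` and `matmul_ikj` and the chunk width of 4 are illustrative choices, not taken from Modular's code.

```python
def matmul_naive(a, b, n):
    """Textbook i-j-k order: the inner loop strides down a column of b."""
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_ikj(a, b, n, width=4):
    """i-k-j order: the inner loop walks rows of b and c in chunks.

    `width` stands in for the SIMD width; each chunk is the work one
    vector instruction would cover in the Mojo version.
    """
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        c_row = c[i]
        for k in range(n):
            a_ik = a[i][k]              # scalar reused across the whole row
            b_row = b[k]
            for j0 in range(0, n, width):
                for j in range(j0, min(j0 + width, n)):
                    c_row[j] += a_ik * b_row[j]
    return c


n = 6
a = [[float(i + j) for j in range(n)] for i in range(n)]
b = [[float(i * j + 1) for j in range(n)] for i in range(n)]
same = matmul_naive(a, b, n) == matmul_ikj(a, b, n)
```

Both versions add the k contributions to each c[i][j] in the same order, so the results match exactly; only the memory access pattern changes.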
02

History : Timeline

Mojo didn't appear from nowhere in 2023. It is Lattner's third major platform after LLVM and Swift, on a line that runs from his LLVM thesis through Clang, Swift (/code/swift), Tesla, and Google Brain's MLIR before landing at Modular.

  1. 2003

    Chris Lattner submits the LLVM thesis

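    His UIUC master's thesis is "LLVM: An Infrastructure for Multi-Stage Optimization"; the better-known title "LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation" belongs to the follow-up paper with Vikram Adve (CGO 2004). The infrastructure laid down here became the substrate for the next two decades of Lattner's work: Clang, Swift, MLIR, and finally Mojo. The origin of the whole line.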

  2. 2014·06

    WWDC: Swift ships

    The "modern Objective-C replacement" Lattner led inside Apple goes public (/code/swift). Swift becomes Apple's full-stack language, and Lattner stays at Apple until January 2017. It is his second project to reshape an industry, after LLVM; Mojo aims to be the third.

  3. 2017·01

    Lattner → Tesla → Google Brain

    January: he joins Tesla as VP of Autopilot, then resigns six months later. Autumn: he lands at Google Brain on TPU compiler infrastructure — where MLIR is born. MLIR is the direct technical seed of Mojo: multi-level IR, extensible dialects, designed for heterogeneous hardware.

  4. 2022·01

    Modular is founded

    Lattner and ex-Google ML-compiler lead Tim Davis co-found Modular. The mission is bluntly simple: "AI shouldn't be locked into CUDA + Python." Early engineers come from Google, Apple and SiFive's compiler crowd.

  5. 2023·05·02

    Mojo unveiled at Modular's keynote

    Three-line pitch: Python syntax, C-class speed, MLIR for the IR. The first demo runs a naive Python matmul 35,000× faster on the same hardware — the number stuns the room. The AI world learns the name overnight.

  6. 2023·09

    Mojo SDK 0.1 — first downloadable build

    The 0.1 SDK ships in September, Linux only, behind a license waitlist. The language surface is still volatile, but you can finally run it locally. The early crowd is ML engineers plus compiler hobbyists.

  7. 2024·01

    macOS support

    Mojo lands on Apple Silicon. M-series chips are sweet-spot targets for LLVM/MLIR codegen; "runs on a Mac" instantly doubles the developer hardware base. The first wave of non-Linux issues hits the tracker.

  8. 2024·03·29

    Standard library open-sourced (Apache 2.0)

    March 29: Modular open-sources the stdlib on modularml/mojo. The compiler stays closed for now — same playbook as early Swift: hand the library to the community first, decide on the toolchain later. ~100 outside PRs land within a week.

  9. 2024·08

    Mojo 24.4 — ownership overhaul

    The Rust-flavoured ownership story gets a thorough rework: borrowed / inout / owned become argument conventions, explicitly annotated rather than inferred. The Reference[T] type lines up at the same time. The community senses the language is "starting to set."

  10. 2024·09

    GPU kernels land — H100 / A100

    The first NVIDIA H100 / A100 codegen ships, via the MLIR → PTX pipeline. This is Mojo's first real proof that you can write GPU kernels without CUDA C++. Triton (OpenAI) had already shown this with a Python-embedded DSL and a JIT compiler; Mojo takes a different route as a standalone compiled language.

  11. 2025·02

    MAX 24.6 — production inference

    Modular's MAX inference engine wires Mojo kernels into production at multiple AI startups (Replit and Together AI have said so publicly). The "research-stage language" label starts peeling off.

  12. 2025·09

    AMD GPU support — via ROCm/MLIR

    Mojo gains AMD GPU support through a ROCm/MLIR backend. It matters: this is the first credible CUDA alternative since OpenCL. One kernel source, two vendors — the first real crack in the CUDA monopoly.

  13. 2025·11

    Approaching 1.0 — package manager + stable stdlib

    The new magic package manager ships; the stdlib enters its "breaking changes need an RFC" phase; the doc generator lands. The "still being torn up" feel fades; 1.0 is in sight.

  14. 2026

    Three years in — production inference at frontier labs

    Mojo in 2026: frontier AI labs use MAX + Mojo for production inference; the "Python is slow" pitch holds — Modular's matmul / softmax numbers are reproducible; the ecosystem is still tiny next to PyTorch-native. Hardware-portable AI kernels are the real 2026 battleground, with Mojo / Triton / CUDA / JAX-XLA all competing.

03

Language Essentials : MojoAlphabet

The eight cards below are where Mojo differs hardest from the other 11 languages on this site: def vs fn, ownership annotations, @parameter, SIMD types, @value, struct vs class, alias, perf annotations. The ninth covers the Python-superset story today.

A

def vs fn

Mojo carries both Python-loose def and strict typed fn. One file can mix: prototype with def like Python, then switch the hot path to fn for compile-time checks.

def loose(x):
    return x * 2

fn strict(x: Int) -> Int:
    return x * 2
B

borrowed / inout / owned

Rust-flavoured ownership, but explicitly annotated: the default is borrowed (a read-only reference), inout marks a mutable borrow, and owned transfers ownership. There are no Rust-style lifetime annotations to write, yet the semantics stay clear.

fn peek(borrowed s: String):
    print(s)

fn grow(inout s: String):
    s += "!"

fn eat(owned s: String): ...
C

@parameter — compile-time programming

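Compile-time values are declared in square brackets, and the @parameter decorator asks the compiler to evaluate an if, a for loop, or a nested closure at compile time. Together they cover generics, conditional compilation, and loop unrolling. Runtime and compile-time code read the same; the compiler decides how to specialise.

fn repeat[n: Int]():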
    @parameter
    for i in range(n):
        print(i)  # unrolled at compile time
D

SIMD[T, n] as a first-class type

SIMD is a type, not an intrinsic. SIMD[DType.float32, 8] is a parametric vector; arithmetic auto-parallelises — reads like scalar, runs as AVX/NEON.

var a = SIMD[DType.float32, 8](1.0)
var b = SIMD[DType.float32, 8](2.0)
var c = a * b + a   # 8-wide FMA
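
To make "reads like scalar, runs as a vector" concrete, here is a small Python model of an 8-lane value whose operators apply lane by lane. It only illustrates the semantics: the class name `Lanes` is hypothetical, and none of this reflects Mojo's implementation, which maps the same expression onto hardware vector registers.

```python
class Lanes:
    """Toy model of a fixed-width SIMD value: one operator, every lane."""

    def __init__(self, values):
        self.values = tuple(float(v) for v in values)

    @classmethod
    def splat(cls, scalar, width=8):
        # Broadcast one scalar to all lanes, like SIMD[DType.float32, 8](1.0)
        return cls([scalar] * width)

    def __mul__(self, other):
        return Lanes(x * y for x, y in zip(self.values, other.values))

    def __add__(self, other):
        return Lanes(x + y for x, y in zip(self.values, other.values))

    def reduce_add(self):
        # Horizontal sum across all lanes
        return sum(self.values)


a = Lanes.splat(1.0)
b = Lanes.splat(2.0)
c = a * b + a          # each lane computes 1.0 * 2.0 + 1.0
```

The expression on the last line is the same one the Mojo snippet uses; the difference is that Mojo resolves the lane count at compile time instead of looping at run time.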
E

@value — auto-derived methods

Tag a struct with @value and the member-wise __init__, __copyinit__, and __moveinit__ are generated for you. It plays the role of Rust's #[derive(Clone)]: one line replaces a screen of boilerplate.

@value
struct Point:
    var x: Float64
    var y: Float64

# __init__ / __copyinit__ / __moveinit__ all derived
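
Python readers can think of @value as close in spirit to `@dataclass`, which also synthesises the constructor from the field list. The comparison is an analogy only: Python has no move semantics, so `copy.copy` stands in for `__copyinit__` here and nothing corresponds to `__moveinit__`.

```python
import copy
from dataclasses import dataclass


@dataclass
class Point:
    x: float
    y: float


p = Point(1.0, 2.0)        # __init__ generated from the field list
q = copy.copy(p)           # independent copy, like copying a value type
q.x = 10.0                 # mutating the copy leaves the original alone
```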
F

struct vs class

Mojo prefers struct (value types): stack-allocated, ownership-aware. class (reference type, GC-flavoured) is deferred for now — a deliberate trade: nail numeric / systems code first.

struct Vec3:
    var x: Float32
    var y: Float32
    var z: Float32

# class { ... }  # not stable yet, on roadmap
G

alias — compile-time constants

One keyword for every compile-time bound value. The three things C++ splits across #define, const and constexpr are all just alias here.

alias WIDTH: Int = 8
alias F32x8 = SIMD[DType.float32, WIDTH]

var v: F32x8 = 0
H

@always_inline and friends

Rather than leaving everything to LLVM's heuristics, Mojo gives programmers explicit knobs: @always_inline, @no_inline, @register_passable. Performance becomes more predictable, with fewer "regressed when the compiler upgraded" surprises.

@always_inline
fn dot(a: F32x8, b: F32x8) -> Float32:
    return (a * b).reduce_add()

Python superset: how far it actually goes today

"Python superset" is Modular's public slogan, but in 2026 the reality is that most Python runs and the corners don't. Working: def, list/dict literals, indentation, and importing third-party packages (which run inside an embedded CPython interpreter, GIL included). Not working: full metaclass machinery, exec / eval of dynamic code, parts of the dunder protocol. "Use Mojo as Python" largely works; just don't expect 100%.

"Python compatibility is a roadmap, not a finished checklist — Modular itself is explicit about this."

04

Why Mojo : WhyMojo

Mojo isn't out to replace Python for scripting, or Rust for OS work. It targets the gap nobody filled in the last 15 years: AI kernels that are both fast and portable, without forcing ML engineers to drop down into CUDA C++.

No more pybind11

Speeding up Python used to mean C/C++ extensions, pybind11 wrangling, and GIL accounting — three languages and three build systems. Mojo just imports any Python package and lets you write fn hot paths in the same file. The FFI ceremony is gone.

from python import Python
var np = Python.import_module("numpy")
var arr = np.array([1,2,3])

MLIR-native, not bolted on

Most languages bolt "AI acceleration" onto the toolchain (TorchScript, JAX trace, TF graph). Mojo inverts that — MLIR is the IR itself. Multi-level dialects, extensible, the same program lowers to CPU / GPU / TPU back ends without language-level changes.

# Mojo source → MLIR → LLVM IR → CPU
#                  → PTX     → NVIDIA
#                  → ROCm    → AMD

Hardware-portable — one kernel, many chips

The same Mojo kernel codegens to CPU, NVIDIA, AMD GPU, and Apple Silicon. The CUDA-era world of "one kernel, one vendor" is loosening. This is the biggest contested ground in 2026's AI infrastructure.

# illustrative invocations; exact target flags vary by toolchain version
# mojo build matmul.mojo --target=cuda
# mojo build matmul.mojo --target=rocm
# mojo build matmul.mojo --target=cpu

Lattner's track record

2003 LLVM → 2007 Clang → 2014 Swift → 2017 MLIR → 2023 Mojo. Each of the earlier projects became industry infrastructure. Whether Mojo joins them, time will tell, but that résumé is the strongest signal developers are betting on.

# LLVM     · 2003 — every modern compiler
# Clang    · 2007 — C/C++/ObjC frontend
# Swift    · 2014 — Apple full stack
# MLIR     · 2017 — AI compiler IR
# Mojo     · 2023 — ?

The first credible crack in CUDA's monopoly

For 15 years, GPU programming meant CUDA C++. Mojo plus its ROCm back end is the first credible CUDA challenger since OpenCL: open kernels, open IR, open competition. Not "kill CUDA," but "finally give an alternative a real path".

# single kernel source · NVIDIA + AMD
# open IR · open stdlib · Apache 2.0
05

Who's Using : ProductionUsers

Mojo is young; the list is far shorter than Python's — but every entry is a real user, none invented. Modular's own MAX is the biggest; Replit and Together AI are publicly named AI platforms; the rest are robotics / quant-finance / drug-discovery shops Modular has called out on its blog.

06

The AI Era : Built For AI

This is the heart of the page: Mojo is one of the very few languages designed for the AI era from day one, rather than retrofitted. PyTorch / vLLM / TensorRT-LLM are upper-layer frameworks; Mojo stands beside them at the kernel layer, not as a replacement.

"For fifteen years the AI stack has been pinned to a single thread: Python calling CUDA C++. Algorithm engineers write Python, performance engineers write CUDA, with a wall between them. We're not building Mojo to make Python faster; we want the same person, in the same language, to write both the algorithm and the kernel."

Chris Lattner · Modular CEO · LLVM / Swift / MLIR / Mojo · paraphrased from interviews + keynotes
35k×
matmul vs Python · same hardware

Modular's launch-demo number: a 1024×1024 float32 matmul, Mojo with SIMD + tiling + parallelize vs three nested Python for loops, on the same Intel Xeon ≈ 35,000×. Narrowly defined: one kernel, same hardware, interpreted Python baseline — those are the real bounds.
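
A speedup figure like this is a ratio of two wall-clock times for the same result on the same machine, so the choice of baseline decides most of the number. The sketch below shows how such ratios are computed against two different baselines. The workloads, the `best_of` helper, and the repeat count are assumptions made for illustration and say nothing about Modular's actual benchmark harness.

```python
import time


def best_of(fn, repeats=5):
    """Smallest wall-clock time over several runs of fn(), in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return max(best, 1e-9)  # guard against a zero reading on coarse timers


N = 20_000


def loop_sum():
    # Baseline A: explicit interpreted loop
    total = 0
    for i in range(N):
        total += i * i
    return total


def builtin_sum():
    # Baseline B: same work driven by the built-in sum()
    return sum(i * i for i in range(N))


def closed_form():
    # "Optimised" version: sum of squares 0..N-1 in constant time
    return (N - 1) * N * (2 * N - 1) // 6


t_opt = best_of(closed_form)
speedup_vs_loop = best_of(loop_sum) / t_opt
speedup_vs_builtin = best_of(builtin_sum) / t_opt
```

The two ratios describe the same optimised code yet generally differ, which is why a headline multiplier should always be read together with its baseline.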

~7×
softmax vs PyTorch CUDA

A more realistic comparison: Modular's fused softmax on H100 is roughly 7× faster than PyTorch eager and on par with Triton (OpenAI). Numbers shift across kernels and hardware, but the conclusion of rough parity with hand-tuned GPU kernels is stable.

3 targets
One kernel, multiple vendors

One .mojo file targets NVIDIA (PTX), AMD (ROCm), Apple Silicon. The first credible path since OpenCL. CUDA's monopoly isn't broken — it's cracked for the first time, and what grows from here is worth tracking.

SPOTLIGHT

SIMD + GPU one kernel, every silicon target

What a Mojo kernel looks like: SIMD is a type, GPU is a back end. You write SIMD[DType.float32, 8]; the compiler, given a target, lowers it to AVX-512 / NEON / PTX / ROCm. One source, no abstraction tax, no performance lost.

  • SIMD type: parametric vector, arithmetic auto-parallel
  • GPU codegen: MLIR → PTX (NVIDIA) / ROCm (AMD)
  • Apple Silicon: M-series NEON + Metal compute
  • No CUDA C++: no third language, no second build system

Compared with Triton (OpenAI): Triton uses a Python AST + JIT, GPU-only; Mojo is a standalone language covering CPU / GPU / edge. The two coexist; they don't substitute.

# one kernel · NVIDIA / AMD / CPU
# (row_max and scale_row are helper routines not shown here)
from math import exp
from tensor import Tensor
from algorithm import vectorize, parallelize

fn softmax(inout x: Tensor[DType.float32]):
    alias nelts = simdwidthof[DType.float32]()

    @parameter
    fn row(i: Int):
        var m = x.row_max(i)
        var s: Float32 = 0
        @parameter
        fn v[w: Int](j: Int):
            var e = exp(x.load[w](i,j) - m)
            x.store[w](i, j, e)
            s += e.reduce_add()
        vectorize[v, nelts](x.cols)
        scale_row(x, i, 1/s)

    parallelize[row](x.rows)

# build (flags illustrative; check `mojo build --help` for your version):
#   mojo build softmax.mojo --target=cuda
#   mojo build softmax.mojo --target=rocm
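
The kernel follows the usual numerically stable recipe: subtract the row maximum, exponentiate, accumulate the sum, then scale the row. A plain-Python reference for the same steps can be handy for checking a kernel's output on small inputs. This is a validation sketch, not Modular's test code, and `softmax_rows` is a name chosen here.

```python
import math


def softmax_rows(x):
    """Row-wise softmax over a list of lists, computed in place."""
    for row in x:
        m = max(row)                      # row max, for numerical stability
        s = 0.0
        for j, value in enumerate(row):
            e = math.exp(value - m)       # largest term is exp(0) = 1
            row[j] = e
            s += e
        inv = 1.0 / s
        for j in range(len(row)):         # the scale_row step
            row[j] *= inv
    return x


data = [[1.0, 2.0, 3.0], [1000.0, 1000.0, 1000.0]]
softmax_rows(data)
```

The second row shows why the maximum is subtracted first: exponentiating 1000.0 directly would overflow, while the shifted values stay in range.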

2026 toolchain / backends / surroundings

  • MAX Engine: Modular inference runtime
  • Mojo stdlib: open source · Apache 2.0
  • magic: official package manager
  • MLIR: underlying IR · LLVM family
  • Mojo Playground: in-browser sandbox
  • CUDA backend: NVIDIA H100 / A100
  • ROCm backend: AMD GPU · 2025-09
  • Apple Silicon: M-series codegen
  • Python interop: import any PyPI package
  • PyTorch bridge: tensor interop
  • NumPy bridge: shared array buffers
  • Triton (cmp): OpenAI · GPU-kernel rival
AI ERA

Counter-intuitive: AIs write Mojo worse than older languages

An interesting paradox: Mojo is an AI-era language, yet LLMs write it less reliably than Python / Java / Rust. The cause is simple — less training data. Public Mojo on GitHub in 2026 is still a few thousand repos, four orders of magnitude below Java.

Actual workflow: "AI writes the Python prototype → human translates to a Mojo kernel" is still dominant. Modular ships its own "AI-assisted Mojo authoring" tooling — feeding the model the stdlib + docs — but in 2026 we're still far from "let the AI write kernels for you".

Ironic but expected: every new language pays a "training-data cold-start tax". Rust paid it early; Zig still pays it; Mojo can't dodge it either. Every early PR, blog post and public kernel is teaching the models this language.

# Status today: AI writes Python prototype → human ports to Mojo

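# Python (AI-friendly)
import numpy as np

def attention(q, k, v):
    scores = q @ k.T / np.sqrt(q.shape[-1])  # scale by sqrt(head dim)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return (w / w.sum(axis=-1, keepdims=True)) @ v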

# Mojo (human-translated · kernel-class speed)
fn attention(borrowed q: Tensor,
              borrowed k: Tensor,
              borrowed v: Tensor) -> Tensor:
    # SIMD-packed matmul + fused softmax
    # vectorize / parallelize / tile
    ...

In one line: Mojo isn't "Python glue" and isn't a "CUDA replacement" slogan — it's the first real language that puts the algorithm and the GPU kernel in one place. In 2026 it's young, the ecosystem is small, and the AIs still struggle with it — but the architecture is right: MLIR + Lattner's track record + real production users.

07

vs Python / Swift : Mojo vs Python vs Swift

Versus Python (/code/python): Mojo is Python's acceleration off-ramp, not a replacement. Versus Swift (/code/swift): same designer (Chris Lattner), but Swift targets app developers and Mojo targets ML compiler engineers. One person, two completely different audiences.

Aspect | Python | Mojo | Swift
Origin | Guido · 1991 | Modular · 2023 | Apple · 2014
Designer | Guido van Rossum | Chris Lattner | Chris Lattner
Primary audience | Scripts · data · AI algorithms | AI kernels · compiler engineers | iOS/macOS app developers
Syntax | Python itself | Python superset (in progress) | Own syntax · ML-flavoured
Performance | Interpreted (CPython) | C / Rust class · MLIR codegen | C-class · LLVM
Memory model | GC + refcount | borrowed / inout / owned | ARC + value types
GPU | CUDA C++ via PyTorch | Native · NVIDIA + AMD + Apple | Metal · Apple GPU only
SIMD | NumPy abstraction, outside the language | SIMD[T, n] first-class type | SIMD4<Float> etc. · stdlib
Compile-time programming | None (dynamic language) | @parameter · shared with generics | Yes (associated types / macros)
Interop | Everything · pip ecosystem | Native Python import · via CPython | C direct · ObjC bridge
Ecosystem maturity | 35 years · the largest of all | 3 years · early (~10³ public repos) | 12 years · Apple-saturated
Open source | Fully · PSF | stdlib yes · compiler closed (open before 1.0) | Fully · Apache 2.0
08

Outlook : TheRoadAhead

Mojo in 2026 sits on the eve of 1.0 — open-sourcing the compiler is the final big gate. Deeper NumPy / PyTorch ABI interop is on the way; Apple's own MLX is a same-niche competitor. Whether Mojo escapes AI to become a general systems language is an open question.

HOT · 2026+

Open-sourcing the compiler — the last gate on the way to 1.0

The stdlib opened in 2024; the compiler is still closed. Community pressure to fork, embed and audit codegen has been steady. Modular has publicly committed to "open it before 1.0" — the cadence mirrors early Swift's exactly.

What it unlocks: this is the final gate between "an interesting Modular product" and "a real industrial language." Only after open-source do you get third-party compilers, teaching distros, and a cross-vendor RFC process — the Rust pattern.

Compiler closed (today, 2026): ~1%
Compiler open (post-1.0): 100%
INTEROP

Deeper NumPy / PyTorch ABI

Today's Python interop goes through the CPython interpreter via Python.import_module, so data has to round-trip between Python objects and Mojo values. The next step is shared buffers / zero-copy tensors: operating on a PyTorch Tensor as a Mojo struct without moving the bytes. That is the path from "fast but isolated" to "fast and seamless."
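
The difference between a round-trip and a shared buffer can be seen with Python's own buffer protocol, the kind of mechanism zero-copy interop typically builds on. The sketch uses only the standard library; it illustrates the idea and makes no claim about how Mojo's interop is or will be implemented.

```python
from array import array

# Owner of the bytes: a contiguous float32 buffer
tensor = array("f", [1.0, 2.0, 3.0, 4.0])

copied = array("f", tensor)      # round-trip: a second buffer with the same values
shared = memoryview(tensor)      # zero-copy: a view over the original bytes

shared[0] = 42.0                 # write through the view

view_sees_owner = tensor[0] == 42.0    # the owner observes the write
copy_is_stale = copied[0] == 1.0       # the copy does not
```

With a view, both sides see one set of bytes and nothing is moved; with a copy, every hand-off costs memory traffic proportional to the tensor size.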

MLX

Apple MLX — direct competition

Apple shipped MLX in late 2023: NumPy-style, tuned for Apple Silicon's unified memory, and built on Metal. On macOS, Mojo competes with Apple's first-party stack. Apple has picked up the line of work Lattner walked away from when he left the company, so he understands this competitor better than most.

GENERAL

A general-purpose systems language?

Mojo in 2026 is positioned for AI, but the language itself is general — struct + ownership + SIMD + MLIR has no "AI-only" baked in. Can it leave the AI niche and challenge Rust for general systems work? Depends on where the post-1.0 community pushes it. An open question, not a roadmap item.