Shuang Chen (cs0x7f)'s Java implementation pushes Kociemba two-phase to its practical limit. Symmetry compression shrinks tables by an order of magnitude, huge prune tables tighten phase-1 lower bounds, and the search interleaves phase-1 candidates with phase-2 to minimise total length. It's what csTimer and TNoodle ship.
min2phase is written by Chinese speedcuber Shuang Chen (Chen Shuang, GitHub: cs0x7f) and released under dual GPLv3 + MIT. The Java package is literally cs.min2phase. The earliest public traces go back to Chinese cubing forums in the early 2010s; it has long been the bundled solver behind csTimer and TNoodle, and the upstream is still actively maintained by the author as of 2023.
The algorithmic skeleton is still Kociemba's two-phase. What sets min2phase apart is that every engineering knob has been turned to its limit: prune tables compressed by symmetry classes into a few hundred KB, ~200 ms startup, ~0.8 ms average solve, average length 19–20.6 moves depending on probe budget. After csTimer compiles it to JS in the browser, virtually every scramble a WCA-aware local timer hands you was generated by this solver.
This page is about the 3x3 two-phase solver. Inside csTimer, cs0x7f also wrote standalone solvers for 2x2 / 4x4 / 5x5 / Square-1 / Skewb with the same family of tricks; their tables are different and out of scope here.
Herbert Kociemba's 1992 algorithm splits the 3x3 solve into G0→G1→solved. Phase 1 forces the cube into the subgroup ⟨U,R²,F²,D,L²,B²⟩ (zero edge orientation, zero corner orientation, four UD-slice edges in their layer). Phase 2 finishes the permutation inside that subgroup. The original implementation uses dense prune tables of tens of MB, and feeds the first phase-1 solution straight into phase-2, which is not necessarily globally shortest. min2phase keeps the same split but rebuilds five things.
| Aspect | Vanilla Kociemba | min2phase |
|---|---|---|
| Prune-table storage | Dense raw-coord arrays, tens of MB | Sym-class compressed, 4-bit packed, ~1 MB total |
| Heuristic lower bound | UDSliceTwist or UDSliceFlip alone | Max of three: UDSliceTwist + UDSliceFlip + TwistFlip |
| Multi-axis + inverse | Single axis, forward only | 3 G1 axes × forward / inverse = 6 orientations in parallel |
| Phase 1 ↔ Phase 2 interleaving | Commits to first phase-1 solution | Iterates on total length with allowShorter + pre-moves |
| Pre-scramble | None | MAX_PRE_MOVES=20, conjugate-equivalent re-roots |
Stacked together: at maxDepth=21 the full-init average is 0.8 ms per solve at average length 20.6; raising probeMin to 10000 brings the average down to 18.63 moves, within striking distance of the proven 20-move God's number.
CubieCube is the cubie-level representation: 8 corners + 12 edges, each carrying (position, orientation). In code it is literally two arrays:
// CubieCube.java
byte[] ca = {0, 1, 2, 3, 4, 5, 6, 7}; // 8 corners
byte[] ea = {0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22}; // 12 edges
// ca[i] = orientation << 3 | position
// ea[i] = position << 1 | orientationEach byte packs both fields: corners have 2 bits of twist (0–2) and 3 bits of slot (0–7); edges have 1 bit of flip and 4 bits of slot. Composing two CubieCubes (acting one cube state on another) is just permutation algebra:
// Corner multiplication: prod = a * b
static void CornMult(CubieCube a, CubieCube b, CubieCube prod) {
for (int corn = 0; corn < 8; corn++) {
int oriA = a.ca[b.ca[corn] & 7] >> 3;
int oriB = b.ca[corn] >> 3;
prod.ca[corn] = (byte) (a.ca[b.ca[corn] & 7] & 7
| (oriA + oriB) % 3 << 3);
}
}CubieCube is complete but doing 8+12 byte-array multiplications per node is too slow. CoordCube is its coordinate projection: collapse a cube state into a handful of integers (twist / flip / slice / …), and precompute, for each coordinate, where every one of the 18 moves takes it. Search then becomes table lookups instead of array ops:
// CoordCube.java — every coordinate gets a move table static char[][] UDSliceMove = new char[N_SLICE][N_MOVES]; // 495 × 18 static char[][] TwistMove = new char[N_TWIST_SYM][N_MOVES]; // 324 × 18 (sym-compressed) static char[][] FlipMove = new char[N_FLIP_SYM][N_MOVES]; // 336 × 18 static char[][] CPermMove = new char[N_PERM_SYM][N_MOVES2]; // 2768 × 10 static char[][] EPermMove = new char[N_PERM_SYM][N_MOVES2]; // 2768 × 10 static char[][] MPermMove = new char[N_MPERM][N_MOVES2]; // 24 × 10
Names ending in SYM are symmetry-compressed versions; raw tables are indexed by the raw coordinate value. Each cell stores a char (16 bits) — wide enough for the (class<<3 | sym) packing used by sym-coords.
The point of coordinates: slice the full 4.3×10¹⁹-state space along some equivalence and label each slice with one integer, such that the 18 moves' action on the label is precomputable. min2phase uses six of them:
| Coord | Size | Meaning | Phase |
|---|---|---|---|
Twist | 2187 = 37 | Orientation of 8 corners (7 independent) | P1 |
Flip | 2048 = 211 | Orientation of 12 edges (11 independent) | P1 |
UDSlice | 495 = C(12,4) | Position of FL/FR/BL/BR among 12 edges, ignoring permutation among themselves | P1 |
CPerm / EPerm | 40320 = 8! | Permutation of 8 corners / 8 UD-layer edges | P2 |
MPerm | 24 = 4! | Permutation of the 4 E-slice edges within E layer | P2 |
CComb (Parity+Pos) | 140 = 70 × 2 | Position of 4 U-layer corners (70) × edge parity (×2) | P2 |
Phase 1 wants Twist, Flip and UDSlice all simultaneously zero — equivalently, zero corner-orientation, zero edge-orientation, and four E-slice edges back in the E layer. That is exactly the G1 subgroup definition. Phase 2 then drives CPerm/EPerm/MPerm to zero inside G1.
Parity is not a free coordinate; cs0x7f folds it into the high bit of CComb (70 corner positions × 2 parity bits = 140), saving one table.
The cube has 48 spatial symmetries; the UD-axis-preserving subgroup has 16 elements (generated by S_U4 quarter-turns, the LR mirror S_LR2 and the F-axis 180° S_F2). If b = S⁻¹ a S for some such S, then a and b have the exact same optimal solution length — they are the same state viewed from a different orientation. So we can collapse each 16-orbit into one symmetry class, storing only the representative + a small sym index back.
| Coord | raw | sym class | ratio |
|---|---|---|---|
| Twist | 2187 | 324 | 6.75× |
| Flip | 2048 | 336 | 6.10× |
| EPerm / CPerm | 40320 | 2768 | 14.57× |
The ratio falls short of 16× because some states are self-symmetric and need extra entries (selfSym). The packed format is (class<<3 | sym), fitting in a 16-bit char. Move transitions on a sym-coord use precomputed SymMult / SymMove / Sym8Move tables to translate move m into its equivalent in the representative's frame:
// CoordCube.doMovePrun — phase 1 transition on a sym-coord twist = TwistMove[cc.twist][CubieCube.Sym8Move[m << 3 | cc.tsym]]; tsym = (twist & 7) ^ cc.tsym; twist >>= 3;
It is set up inside CubieCube.initFlipSym2Raw / initTwistSym2Raw / initPermSym2Raw, producing the bidirectional maps FlipS2R / TwistS2R / EPermS2R.
UDSlice has only 495 values to begin with, and it is aligned with the UD axis — most symmetries in the group don't preserve "those four edges are in E", so compressing it is more trouble than it's worth. It stays a raw coordinate; the symmetry compression only applies to the other coordinate, giving raw×sym tables like UDSliceTwistPrun[495 × 324].
The phase-1 heuristic is the max of three tables:
// CoordCube.setWithPrun (phase 1 lower-bound)
prun = max(
getPruning(TwistFlipPrun, twist<<11 | flipIdx), // 2187 × 2048 → "huge"
getPruning(UDSliceTwistPrun, twist*N_SLICE + sliceX), // 324 × 495
getPruning(UDSliceFlipPrun, flip *N_SLICE + sliceX)); // 336 × 495The first one — TwistFlipPrun — is cs0x7f's key addition, the "Huge prune table". It has N_TWIST_SYM × N_FLIP = 324 × 2048 = 663,552 cells, 4 bits each, totalling 332 KB. It supplies a joint lower bound on corner+edge orientation alone, abstracting UDSlice away entirely. The accuracy is visible in pruningValue.txt — by depth 9 it already covers 99.97% of states:
// pruningValue.txt — TwistFlipPrun depth distribution 1 3 2 15 3 106 4 951 5 9826 6 89711 7 447496 8 662070 9 663552 ← full coverage at depth 9, total 663,552 cells
The other two saturate only around depth 14; the huge table gives a bound that is on average 1–2 moves tighter, which cascades into exponential pruning of the phase-1 tree. Algorithm.md gives the per-table averages: UDSliceTwist 6.76, UDSliceFlip 6.85, TwistFlip 7.18 — the huge table is tightest but the real win is taking max over all three at every node.
All prune tables are 4-bit packed: one int holds 8 nibbles, each nibble being that cell's distance to solved mod 16. Read/write are one-liners:
static int getPruning(int[] t, int i) { return t[i >> 3] >> (i << 2) & 0xf; }
static void setPruning(int[] t, int i, int v) { t[i >> 3] ^= v << (i << 2); }The (i << 2) wraps mod 32 in Java, so the same expression yields both the word offset and the nibble bit offset, saving an (i & 7) * 4. TwistFlipPrun is 332 KB, UDSliceTwistPrun / UDSliceFlipPrun ~80 KB each, MCPermPrun 33 KB, EPermCCombPPrun 195 KB — together right around the advertised ~1 MB.
Construction is BFS with an inverse-fill switch: forward-fill while the frontier is small, then once depth exceeds INV_DEPTH switch to scanning unfilled cells and asking whether any successor is already filled — cheaper when the unfilled set has shrunk. See CoordCube.initRawSymPrun.
Phase 1 is IDA* iterated by total depth length1 from 0 upwards. At each iteration limit, a DFS expands all 18 moves and calls doMovePrun to get the successor's heuristic; the branch is pruned the moment prun + depth > maxl:
// Search.phase1 (excerpt)
for (int axis = 0; axis < 18; axis += 3) {
if (axis == lm || axis == lm - 9) continue; // skip same / opposite axis
for (int power = 0; power < 3; power++) {
int m = axis + power;
if (skipMoves != 0 && (skipMoves & 1 << m) != 0) continue;
int prun = nodeUD[maxl].doMovePrun(node, m, true);
if (prun > maxl) break; // whole axis ruled out
else if (prun == maxl) continue; // this power ruled out
if (USE_CONJ_PRUN) {
prun = nodeUD[maxl].doMovePrunConj(node, m);
if (prun > maxl) break;
if (prun == maxl) continue;
}
move[depth1 - maxl] = m;
int ret = phase1(nodeUD[maxl], ssym & moveCubeSym[m], maxl - 1, axis);
if (ret == 0) return 0;
if (ret >= 2) break;
}
}Three things to notice. (1) Among the three powers (U / U2 / U') on the same axis, continue rules out just that power while break kills the whole axis. cs0x7f signals the grain via "> maxl" vs "== maxl". (2) skipMoves uses selfSym: if the current state has a non-trivial stabilizer, we can drop first-moves that are symmetry-equivalent to ones already tried, avoiding duplicate expansions. (3) USE_CONJ_PRUN queries the same prune table at one conjugate of the state — effectively doubling the heuristic for free, picking up another move or two of pruning.
When node.prun == 0 && maxl < 5 and allowShorter permits, phase1 chooses to exit early — the hook into phase 2 that the next section unpacks.
Inside phase 2 the move set shrinks to 10: U, U2, U', D, D2, D', R2, F2, L2, B2 — anything else would leave G1. Three coordinates fully describe the state: EPerm (UD edges), CPerm (corners), MPerm (E-slice internal). The heuristic is again a max:
// Search.initPhase2 — phase-2 lower bound
int prun = max(
getPruning(MCPermPrun, cperm * N_MPERM + MPermConj[mid][csym]),
getPruning(EPermCCombPPrun, edge * N_COMB + CCombPConj[Perm2CombP[corn]][...]),
getPruning(EPermCCombPPrun, edgei * N_COMB + CCombPConj[Perm2CombP[corni]][...]) // inverse
);The third term is the same heuristic evaluated on the inverse of the current state. cs0x7f makes getPermSymInv an O(1) lookup, so it's free. Phase-2 IDA* runs within maxDep2 (≤12). USE_COMBP_PRUN folds parity into the 70–139 range of CComb so it contributes to the pruning bound automatically.
// Search.phase2 — branching with parity-aware move skip
int moveMask = Util.ckmv2bit[lm];
for (int m = 0; m < 10; m++) {
if ((moveMask >> m & 1) != 0) { m += 0x42 >> m & 3; continue; }
int midx = MPermMove[mid][m];
int cornx = CPermMove[corn][SymMoveUD[csym][m]];
int csymx = SymMult[cornx & 0xf][csym];
cornx >>= 4;
int edgex = EPermMove[edge][SymMoveUD[esym][m]];
int esymx = SymMult[edgex & 0xf][esym];
edgex >>= 4;
// … prun checks … then recurse phase2(edgex, esymx, cornx, csymx, midx, maxl-1, …);
}m += 0x42 >> m & 3 is one of cs0x7f's bit-LUT tricks: 0x42 = 0100 00102 encodes "at m=1 skip 2, at m=6 skip 2", letting the loop hop past sibling powers on the same axis without an inner branch.
Vanilla Kociemba commits to the first phase-1 solution and chains phase-2, returning a legal but not necessarily shortest answer (say 23 moves). The issue: a 7-move phase 1 + 16-move phase 2 may lose to a 9-move phase 1 + 12-move phase 2 = 21, because the longer phase 1 lands at a friendlier point in G1. Vanilla won't look back.
min2phase iterates the outer loop by total length length1. The top-level search() loop:
// Search.search() — outer loop on total length
for (length1 = 0; length1 < solLen; length1++) {
maxDep2 = Math.min(MAX_DEPTH2, solLen - length1 - 1);
for (urfIdx = 0; urfIdx < 6; urfIdx++) {
if ((conjMask & 1 << urfIdx) != 0) continue; // axis already covered by selfSym
if (phase1PreMoves(maxPreMoves, -30,
urfCubieCube[urfIdx],
(int)(selfSym & 0xffff)) == 0) {
return solution == null ? "Error 8" : solution.toString();
}
}
}For each value of length1, every phase-1 solution found is immediately handed to phase 2 (initPhase2); a phase1+phase2 chain of total ≤ solLen is accepted. The instant one is accepted, maxDep2 is tightened to solLen - length1 - 1 and the search keeps probing more phase-1 routes at this same length — a different phase-1 path may unlock a shorter phase 2. probeMin sets how many phase-2 probes must elapse before early exit. That's the "find shorter solutions" knob in the README.
Two more multipliers: 3 axes × forward/inverse. urfCubieCube[0..5] stores the same cube viewed through six URF-conjugates and inverses; phase 1 runs once on each — matching Algorithm.md's "try three G1 targets + the inverse state". Pre-moves rewrite Phase1 · Phase2 · PreMoves = Scramble⁻¹, letting phase 1 not start at the literal scramble — another factor of candidates. conjMask uses the selfSym bitmap to skip urfIdx values already covered by a symmetry equivalence.
(a) Solution at length ≤ maxDepth and probe ≥ probeMin → return; (b) probe ≥ probeMax with no solution → "Error 8"; (c) length1 exhausts solLen with no solution → "Error 7" (no solution within the given maxDepth).
Benchmark.md ships two tables (author's environment). The first is target-length vs solve time:
| Init | Unlimited | 21 moves | 20 moves | Init time |
|---|---|---|---|---|
| Full | 0.685 ms | 0.805 ms | 4.62 ms | 195 ms |
| Partial | 4.12 ms | 4.52 ms | 19.0 ms | 60 ms |
Full init precomputes every prune table (~195 ms), after which each solve costs 0.8 ms; partial init only fills BFS up to a shallow depth — init drops to 60 ms but solves are 5–6× slower. The second table shows the probe-budget trade-off:
| probeMin | Avg length | Time |
|---|---|---|
| 5 | 20.63 | 0.827 ms |
| 10 | 20.48 | 1.07 ms |
| 20 | 20.27 | 1.55 ms |
| 50 | 19.95 | 2.83 ms |
| 100 | 19.68 | 4.87 ms |
| 200 | 19.48 | 8.58 ms |
| 500 | 19.29 | 19.4 ms |
| 1000 | 19.11 | 36.3 ms |
| 10000 | 18.63 | — |
Typical csTimer setup: probeMin = 0 (accept the first legal-length-≤21 solution); one scramble costs ~1 ms. For fewest-move analysis, raise probeMin to 1000+ and the average compresses to 19.1 moves at a tens-of-ms cost per solve. Memory per the README: ~1 MB with the TwistFlip table, ~0.7 MB without. That budget is precisely why this fits inside a browser JS bundle.
initLevel.nodeUD/nodeRL/nodeFB are 21 preallocated CoordCube slots; urfCubieCube[0..5] and preMoveCubes[1..MAX_PRE_MOVES] likewise. The entire inner search reuses these slots — zero per-node allocation.isRec = true); next(probeMax, probeMin, verbose) resumes IDA* in place to keep finding shorter solutions until the probe budget runs out.searchopt() which iterates all three axes' phase 1 simultaneously, producing a provably optimal solve but an order of magnitude slower. csTimer leaves it off.min2phase ships as Java. Putting it inside csTimer is less about hand-porting and more about exploiting Java's strictness and zero external deps. Historically csTimer used GWT (Google Web Toolkit) to compile cs.min2phase to equivalent JS; later cs0x7f maintained a hand-translated JS port at cstimer/src/lib/scramble_333.js, almost line-for-line aligned with the Java source, swapping char[][] for Uint16Array and int[] for Int32Array.
TNoodle (the WCA's official scramble generator) takes a different route: it runs cs.min2phase as a jar on the JVM, hashes the scrambles, and feeds them into the WCA competition system. csTimer and TNoodle share the same algorithm but on different runtimes — TNoodle generates every official WCA scramble.
The iframe at /cstimer on this site runs that JS port of min2phase; the default 3x3 scramble at /scramble/gen is also it (random state + min2phase solve, scramble keeps the inverse). /scramble/analyzer uses cubeopt-wasm — a more aggressive SAB + multi-thread approach — running in parallel.
min2phase is the textbook case of "engineering the last drop out of an existing algorithm". Kociemba's two-phase framework is not touched; symmetry compression + the huge prune + multi-axis views + full-length IDA* iteration drop the same algorithm from tens of MB and seconds to 1 MB and milliseconds. It doesn't propose a new algorithm — it just makes the old one cheap enough to live in a browser tab. Which is why every WCA scrambler of the last decade looks the way it does.