| The WhitePaper Reading Club + zkHub | Nov 10 2024 |
|---|---|
| zkMIPS: A high-level specification | Avdhesh Charjan (@CuriousCharjan) |
| https://whitepaper.zkm.io/new_zkMIPS_white_paper.pdf |
Summary
zkMIPS is a zero-knowledge virtual machine that enables verifiable execution of MIPS instruction set programs by arithmetizing computation traces into polynomials and proving their correctness using advanced cryptographic proof systems like STARKs and PLONKs.
Why This Is Important
It allows for secure, efficient, and scalable verification of computations across a wide range of applications utilizing the MIPS architecture that ensures computational integrity without revealing internal states or requiring trust in external computation parties, impacting industries needing verifiable off-chain computations
Key Innovation
zkMIPS introduces a multi-layered, recursive proof system architecture that combines STARKs, PLONKs, and Groth16 proofs to achieve both scalability and succinctness in verifying MIPS program executions. By leveraging the well-established and stable MIPS ISA, it bridges the gap between theoretical cryptographic proofs and practical, real-world processor architectures, enabling compatibility with existing compilers and toolchains.
Overview
zkMIPS is designed to provide a zero-knowledge proof framework for programs executed on the MIPS architecture. It models the execution of MIPS programs as a sequence of valid CPU states forming a computation trace. This trace is then arithmetized into polynomials, enabling the application of cryptographic proofs. The proof system employs a hierarchical, multi-layered structure:
- Module Proofs (Layer 3): Individual instructions are categorized into modules (arithmetic, logic, memory, control, Poseidon hashing), and each module is independently verified using specialized STARK proofs.
- Segment Proofs (Layer 2): Program execution is divided into segments. Each segment collects traces from module proofs and uses a lookup scheme to verify that these traces match the segment’s execution trace.
- Continuation Proofs (Layer 1): Segment proofs are recursively combined using continuation proofs based on the PLONK protocol, resulting in a single proof for the entire program execution.
An optional additional layer employs Groth16 proofs for on-chain verification, enhancing compatibility with Ethereum.
Background
MIPS: MIPS (Microprocessor without Interlocked Pipeline Stages) is a RISC (Reduced Instruction Set Computing) architecture that uses a simplified, fixed-length instruction set and a load/store design to enhance performance through efficient pipelining. Compared to other RISC architectures, MIPS emphasizes simplicity and scalability in hardware implementation, making it suitable for a wide range of computing applications.
(STARKs, PLONKs, and Groth16)
Questions
Why go full on MIPS? Why is removing interlocking so important? Doesn’t this put more effort on the software engineer to do dependency management?
Write about Starky
Team
(i) Lucas Fraga: Lead researcher focusing on cryptographic protocols and the integration of proof systems within zkMIPS. (ii) Stephen Duan: Key developer responsible for implementing the zkVM and optimizing the interaction between the MIPS architecture and cryptographic proofs.
(iii) Jeroen van de Graaf: Contributor specializing in theoretical underpinnings and the design of the zkMIPS protocol. (iv) Ming Guo: Researcher and developer working on performance optimization and system architecture of zkMIPS.
Questions
- How does zkMIPS handle the arithmetization and proof of complex MIPS instructions involving memory access, branching and system calls within the polynomial constraints?
- What specific optimizations are implemented to reduce proof size and verification, especially in the context of recursive proofs and multi-layered architecture?
- How does zkMIPS’s performance, scalability and proof size compare to other zkVMs based on different instruction set architectures such as RISC-V (or even Ethereum’s EVM bytecode)?
- Are there any plans to extend zkMIPS to support Ethereum’s smart contracts directly?
Components
| Multi-Layered Proof Architecture | zkMIPS’s proof system is divided into three distinct layers—continuation, segmentation, and modularization—which separate the verification process into manageable parts. Each layer has specialized proofs:Module Proofs: At the lowest layer, individual instructions are grouped into modules based on their functionality (arithmetic, logic, memory, control). Each module is independently verified using specialized STARK proofs, which arithmetize the module’s computation trace into polynomials.Segment Proofs: Execution is divided into segments, and a lookup scheme (LogUp protocol, implemented as a STARK proof) is used to verify that the module proofs correctly correspond to the segment’s execution trace. This layer enhances parallelism and scalability.Continuation Proofs: Segment proofs are recursively combined using continuation proofs based on the PLONK protocol. This recursive composition results in a single, succinct proof representing the entire program execution. |
|---|---|
| Combination of Cryptographic Proof Systems | STARKs and SNARKs: zkMIPS uses STARK for transparent, scalable proof generation, ideal for large computations.PLONKs: Employed in continuation proofs to achieve succinctness and efficient recursive proof composition.Groth16 Proofs: Optionally used for on-chain verification, leveraging pairing-based cryptography to enable efficient verification within blockchain environments like Ethereum.This combination of proof systems allows zkMIPS to maintain both security and efficiency. STARK-based proofs ensure transparency without requiring a trusted setup, while SNARK-based proofs (such as those based on Groth16) provide succinctness in settings where computational resources or time is limited, enabling efficient on-chain verification and making zkMIPS adaptable to various verification environments. |
| Arithmetization Strategy | Trace Representation: The computation trace is carefully structured as a table to represent CPU states and transitions accurately. There is no fixed correspondence between CPU register and trace columns as the set of trace columns is optimized to improve instructions proving. Registers and memory states are arithmetized into polynomials, and specialized constraints ensure the validity of state transitions. Every register-write is saved in the memory and every register-read is loaded from memory, meaning register consistency is guaranteed by memory consistency.Optimized Field Selection: The Goldilocks field (a 64-bit prime field) is used to balance computational efficiency and proof size, enabling efficient representation of 16-bit and 32-bit values within the proof system. |
| Memory Management and Merkle Trees | Poseidon Hashing: To maintain memory integrity, zkMIPS leverages Poseidon hashes, optimized for zero-knowledge environments, as part of a Merkle tree structure. This setup records the state of each 4KB memory page, creating a memory state root that can be efficiently verified through Merkle paths. Memory consistency is verified in parallel with the execution of CPU states, ensuring that only modified variables are logged, which minimizes overhead and enables faster verification times.Memory Consistency: Constraints are established to ensure memory consistency across state transitions, and memory access operations are verified within the proof system. |
| Prover Networks for Scalability | zkMIPS incorporates a network of provers that enables distributed and parallel computation of proofs, allowing large tasks to be divided across multiple computational units. This parallel processing capability results in significant performance gains and supports segment-based aggregation, meaning that for large-scale computations, zkMIPS can achieve logarithmic-time proof verification relative to the number of segments. The networked prover design is ideal for applications requiring rapid verification of numerous or computationally intensive tasks, making zkMIPS scalable and suitable for real-world, large-scale applications. |
Opinions
zkMIPS represents a forward-thinking solution for general-purpose computation in zero-knowledge form, but its performance scalability and flexibility for larger MIPS applications will be crucial to its adoption. The reliance on MIPS, a stable architecture, is a strategic choice, as it appeals to industries needing long-term support. However, how well it scales on different hardware platforms (especially with the memory overhead) will determine its broader applicability.
Integrating zero-knowledge proofs with the MIPS architecture could democratize access to privacy-preserving computation by leveraging a well-understood and simple instruction set. It might lower barriers for developers and increase the adoption of secure computing practices.
The multi-layered proof system architecture is innovative, aiming to balance scalability, succinctness, and transparency.
The paper could be strengthened by providing more detailed analyses in several areas. Specifically, a deeper exploration of performance metrics, scalability, and comparisons with other zkVMs would enhance the assessment of zkMIPS’s practicality and competitiveness. Additionally, addressing the complexities introduced by the multi-layered approach and providing optimization strategies would offer valuable insights to researchers.
Overall, zkMIPS is a promising contribution to the field of verifiable computing. With further refinement and more comprehensive evaluations, it has the potential to become a foundational platform for zero-knowledge proofs in general-purpose computing contexts.
References
- https://whitepaper.zkm.io/new_zkMIPS_white_paper.pd
- https://www.cs.nthu.edu.tw/~ychung/slides/CSC3050/MIPS-ISA.pdf
Glossary
- Arithmetization: The process of transforming computational logic, operations, and state transitions into algebraic representations, typically polynomials over finite fields. This conversion allows computational problems to be expressed as mathematical equations suitable for cryptographic proofs, particularly in zero-knowledge proof systems.
- Common Reference String (CRS): A set of public parameters generated during a trusted setup phase in some zero-knowledge proof systems. The CRS contains cryptographic material that is used by the prover to construct proofs and by the verifier to verify them. In SNARKs like Groth16 and PLONK, the CRS is crucial for the security and efficiency of the proof system.
- Constraint Polynomial: Polynomials that encode the constraints or rules that a computation must satisfy. In zero-knowledge proofs, constraint polynomials define the relationships between variables at each step of the computation. The prover demonstrates that these polynomials evaluate to zero, verifying that the computation adheres to the specified constraints.
- Interactive Oracle Proof (IOP): A proof system model that combines aspects of interactive proofs and probabilistically checkable proofs (PCPs). In an IOP, the prover and verifier interact by exchanging messages, and the verifier has oracle access to the prover’s messages, allowing for efficient verification of complex statements while reducing communication overhead.
- FRI Protocol (Fast Reed-Solomon Interactive Oracle Proofs of Proximity): A protocol used in STARKs for efficiently testing whether a function is close to a low-degree polynomial over a finite field. It allows for probabilistic verification of polynomial degrees without revealing the polynomials themselves, ensuring that the prover’s commitments are sound.
- DEEP-FRI Protocol (Direct Extension of Error-correcting Probabilistic Fast Reed-Solomon Interactive Oracle Proofs of Proximity): An advanced version of the FRI protocol used in STARKs for low-degree testing of polynomials. DEEP-FRI improves soundness and efficiency by sampling polynomials outside of the initially claimed domain, enhancing the robustness and security of the proof.
- Fiat-Shamir Transformation: A cryptographic technique that converts an interactive proof protocol into a non-interactive one by replacing the verifier’s random challenges with the output of a cryptographic hash function applied to the transcript of the proof so far. This heuristic assumes the hash function behaves like a random oracle, enabling practical non-interactive zero-knowledge proofs.
- Goldilocks Field: A finite field used in cryptographic protocols, especially in zero-knowledge proof systems like Plonky2. The field is defined by a 64-bit prime modulus that allows for optimized arithmetic operations on modern processors. It balances computational efficiency and security, enabling efficient representation of 16-bit and 32-bit values within the proof system.
- Groth16: A succinct non-interactive zero-knowledge proof system (zk-SNARK) introduced by Jens Groth in 2016. Groth16 features very short proof sizes and fast verification times but requires a trusted setup to generate a common reference string (CRS). It is widely used in blockchain applications for efficient on-chain verification.
- MIPS (Microprocessor without Interlocked Pipeline Stages): A RISC (Reduced Instruction Set Computing) architecture known for its simplicity, regularity, and efficiency. MIPS uses a fixed instruction length and a load/store architecture, where operations are performed on registers, and memory is accessed through explicit load and store instructions. Its simplicity makes it suitable for modeling in zero-knowledge proof systems like zkMIPS.
- Plonky: A proof system and library that combines features from both PLONK and STARKs, aiming to achieve transparency (no trusted setup) and efficiency. Plonky utilizes Merkle Trees as polynomial commitments and the DEEP-FRI protocol for low-degree testing, removing the need for a trusted setup (transparency) and supporting recursive proofs for scalability.
- Plonky2: An advanced implementation of Plonky that further improves performance and supports efficient recursive proof composition. Plonky2 leverages the Goldilocks field for optimized arithmetic operations and integrates STARK-like features into a module called Starky.
- PLONK (Permutation over Lagrange-bases for Oecumenical Non-Interactive arguments of Knowledge): A universal zk-SNARK protocol that supports arbitrary circuits without the need for per-circuit trusted setups. PLONK introduces permutation arguments and uses polynomial commitments to verify that computations satisfy specified constraints. It relies on a variation of CRSs called structured reference string (SRS), which is generated during a trusted setup and can be updated for different circuits.
- Polynomial Commitment Scheme (PCS): A cryptographic protocol that allows a prover to commit to a polynomial and later prove properties about it, such as evaluations at certain points, without revealing the polynomial itself. Polynomial commitments are essential for constructing zero-knowledge proofs that involve polynomials, as they enable the verifier to check the correctness of computations efficiently.
- Poseidon Hash Function: A cryptographic hash function designed specifically for efficiency within zero-knowledge proof systems. Poseidon is optimized for operations over prime fields and offers low computational complexity within zk-SNARKs and zk-STARKs, making it suitable for hashing within polynomial constraint systems used in protocols like zkMIPS.
- Rank-1 Constraint System (R1CS): A representation of computational statements where each constraint is a rank-1 quadratic equation over a finite field. In R1CS, constraints take the form (\sum a_i x_i)(\sum b_i x_i) = \sum c_i x_i, allowing arbitrary computations to be encoded as a set of polynomial equations suitable for zk-SNARKs like Groth16.
- SNARK (Succinct Non-interactive Argument of Knowledge): A type of cryptographic proof that is succinct (having short proof sizes and quick verification times), non-interactive (requires no interaction between prover and verifier after setup), and provides an argument of knowledge (ensures that the prover possesses a witness to the statement). SNARKs typically require a trusted setup phase.
- STARK (Scalable Transparent ARgument of Knowledge): A type of zero-knowledge proof that is scalable (efficient for large computations), transparent (requires no trusted setup), and provides an argument of knowledge. STARKs use hash functions and probabilistic checking techniques to ensure soundness and are considered post-quantum secure due to their reliance on minimal cryptographic assumptions (hash function outputs are random and computing a hash collision is difficult).
- zk-SNARK (Zero-Knowledge Succinct Non-interactive Argument of Knowledge): A SNARK that also provides the zero-knowledge property, ensuring that the verifier learns nothing beyond the validity of the statement being proven. zk-SNARKs are widely used in applications requiring privacy and succinct proofs, such as confidential transactions in blockchain systems.
- zk-STARK (Zero-Knowledge Scalable Transparent ARgument of Knowledge): A STARK that includes the zero-knowledge property. zk-STARKs are transparent (requiring no trusted setup), scalable for large computations, and post-quantum secure. They are suitable for applications requiring both privacy and robustness against quantum attacks.
Notesbtw, Optimism's fraud proof also runs on a MIPS virtual machine implemented as a smart contract: https://docs.optimism.io/stack/fault-proofs/mips@alex.xiong.tech@gmail.com ohh waow, that's super interesting!that's not the reason for choosing MIPS, team choose MIPS for its maturity and wide-adoption, availability of toolkits etc. RISC-V is newer, allow customization via extension instruction sets. Whereas MIPS is stable.
forget about interlock in its name, it's irrelevant to the discussion here.
but if one has to ask, interlocked operations lead to deadlock (thread A wait for B to release the lock while thread B is waiting for A to release its lock, both wait indefinitely). this might happen in concurrent executionThanks for clarifying, actually zkMIPS is using MIPS32, and only small part of syscalls are supported and suppose to have no concurrent execution. https://github.com/zkMIPS/zkm/blob/main/prover/src/witness/operation.rs#L934Thanks for clarifying!👍just like any other VM, like SP1 which runs on RISC-V, there's memory region and register region. You prove valid read/write to those region. "Memory checking" in verifiable compute literature is a well-studied problem. nothing fancy here.
Ben-Sasson et.al described how to do it in 2013: https://eprint.iacr.org/2013/507.pdf
Nexus has its own technique: https://docs.nexus.xyz/specs/nexus-memory-checkingThank you! From the paper "Less rich ISAs are reduced instruction set computer (RISC) machines designed for devices like
smartphones (ARM) and embedded microcontrollers (Atmel AVR). Yet, even these “simple” ISAs are quite
rich: they support many addressing modes, many conditional branches, floating point arithmetic, instructions
for parallel execution, and so on."i didn't read ZKM's whitepaper, but it seems that they rely on plonky2. the recursion is just out of the box. I don't think they did anything more special on top. (i could be wrong)
so if you want to know the technique plonky2 uses, we already discussed during last reading session, right? https://github.com/0xPolygonZero/plonky2/blob/main/plonky2/plonky2.pdf
use of snark-friendly hash (poseidon), FRI optimization. etc.Thank you!!!it all depends on the concrete proof system chosen. each proof system comes with its performance and tradeoffs.
there's nothing inherently fancy about choosing one RISC instruction set over another.
I'd say they are pretty much the same across all metrics if they use the same library (say plonky2)👍EVM node software (geth or reth) can be compiled down to MIPS target, then any contract will be re-interpreted as MIPS instructions.
then just run the entire EVM processing transactions, and prove that using zkMIPS.Should we also have a short 1-2 line on Groth 16?umm i was thinking of adding a new section., "glossary' where we explain groth16, snarks, starks, mips, etcPerfect