Logo
Research

KEVM: A Complete Formal Semantics of the Ethereum Virtual Machine

Abstract

Smart contracts on the Ethereum blockchain manage assets worth billions of dollars, yet the Ethereum Virtual Machine (EVM) that executes them lacked a rigorous, complete formal semantics. The existing reference was the Ethereum Yellow Paper by Wood, a semi-formal document written in modified set-theoretic notation that serves as the de facto standard but suffers from ambiguities, inconsistencies, and gaps. Hildenbrandt, Saxena, Zhu, Rodrigues, Daian, Guth, Moore, Zhang, Park, and Rosu present KEVM -- the first complete, executable, formally rigorous semantics of the EVM bytecode stack-based language, built using the K Framework. KEVM serves simultaneously as a formal specification, a reference interpreter, a test-suite runner, and a foundation for program verification tools. The semantics passes all 40,683 tests in the official Ethereum VM test suite, making it the first formal semantics to achieve this milestone.

The Ethereum Virtual Machine

The Ethereum Virtual Machine is a stack-based bytecode language that processes transactions by executing contract code. Each transaction creates an execution environment consisting of a program counter, a word stack (of 256-bit integers), local memory (a byte array), and persistent account storage (a word-to-word mapping). The EVM consumes a finite resource called gas with each instruction: users attach gas to transactions, and execution halts if gas is exhausted. The gas model has evolved over multiple hard forks (Frontier, Homestead, Tangerine Whistle, Spurious Dragon, Byzantium, Constantinople, and beyond), each adjusting costs for specific opcodes. KEVM is made parametric in the particular fee schedule, meaning it defines a family of EVMs -- one per fee schedule -- allowing a single semantics to capture the behavior of all historical and future EVM versions.

The K Configuration for EVM

KEVM is developed in the K Framework, a rewriting-based semantic framework where language semantics are defined through a configuration (the program state structure) and a set of rewrite rules (the transition relation). The K configuration for the EVM uses an XML-like notation with over 70 nested cells organized hierarchically. The top-level <kevm> cell contains an <ethereum> cell with an <evm> cell (holding execution state: program counter <pc>, word stack <wordStack>, local memory <localMem>, gas <gas>, call depth <callDepth>, output <output>, and status code <statusCode>) and a <network> cell (holding account states with <acctID>, <balance>, <code>, <storage>, and <nonce>). The EVM data types module defines 256-bit word arithmetic, word stack operations, local memory as a byte-indexed mapping, and account storage as a word-to-word mapping, all with precise mathematical definitions:

W256={0,1,,22561}chop(i)=imod2256W_{256} = \{0, 1, \ldots, 2^{256} - 1\} \qquad \text{chop}(i) = i \mod 2^{256}

Opcode Semantics and Execution Cycle

The semantics is organized into several K modules that mirror the structure of the EVM specification. The core execution cycle operates through the #next [_] operator, which loads the current opcode from the program at the program counter position and orchestrates execution. For each opcode, the semantics: (1) checks for stack over/underflow conditions, (2) verifies no static-mode violations occur, (3) calculates any required address conversions for items on the word stack, (4) deducts gas according to the current fee schedule, and (5) executes the opcode's state transition. Arithmetic opcodes like ADD operate on the word stack directly -- for example, the ADD rule pops two values W0W_0 and W1W_1 from the stack, pushes chop(W0+W1)\text{chop}(W_0 + W_1), increments the program counter, and deducts gas:

rule <k> ADD W0 W1 => . ... </k>
     <wordStack> WS => chop(W0 +Int W1) : WS </wordStack>
     <pc> PCOUNT => PCOUNT +Int 1 </pc>
     <gas> G => G -Int Gverylow </gas>

More complex opcodes such as CALL, CREATE, DELEGATECALL, and STATICCALL involve intricate interactions between multiple cells, including creating new execution contexts, transferring Ether between accounts, managing the call stack, and handling return data. The SSTORE opcode for persistent storage is particularly complex due to gas refund mechanisms that vary across hard forks. The semantics also handles exceptional conditions including out-of-gas, invalid opcodes, stack violations, and revert operations, each producing appropriate status codes. Precompiled contracts (ECDSARECOVER, SHA256, RIPEMD160, ID, MODEXP, ECADD, ECMUL, ECPAIRING) are given explicit K rules that capture their input/output behavior.

Empirical Evaluation

The empirical evaluation of KEVM is extensive. The semantics was tested against the official Ethereum test suite, which contains 40,683 individual tests covering arithmetic operations, control flow, environment information, block information, system operations, and more. KEVM passes all tests completely. Regarding performance, when running with the automatically generated OCaml-based interpreter from K, KEVM was only approximately one order of magnitude slower than the reference C++ implementation of the EVM (cpp-ethereum). The authors also identified several ambiguities and inconsistencies in the Yellow Paper during the formalization process, including unclear specifications of gas costs for certain opcodes, edge cases in the CREATE instruction behavior, and imprecise definitions of exceptional halting conditions. These findings demonstrate the value of a formal executable semantics as a supplement to or replacement for informal documentation.

A formal semantics of a language is more than just a reference manual -- it is a mathematical object that can serve as the basis for rigorous analysis and verification.

Smart Contract Verification

The paper demonstrates KEVM's utility for smart contract verification by verifying two different implementations of the ERC20 Standard Token specification -- one written in Solidity and one in Vyper. The verification uses K's built-in reachability logic theorem prover, which reasons about all possible execution paths through the contract. The ERC20 specification defines functions totalSupply(), balanceOf(address), transfer(address, uint256), approve(address, uint256), transferFrom(address, address, uint256), and allowance(address, address), along with the Transfer and Approval events. The verification proves that both implementations correctly implement all ERC20 functions, handle edge cases (zero transfers, self-transfers, insufficient balance), and emit the required events. The specifications are written as reachability claims that describe the relationship between pre-states and post-states for each function, with conditions covering all relevant cases including arithmetic overflow checks.

Significance

The significance of KEVM extends beyond its immediate technical contributions. By demonstrating that a complete, executable formal semantics of a production virtual machine is achievable and practical, KEVM establishes a methodology for bringing formal methods to blockchain systems. The semantics has been adopted by Runtime Verification Inc. and is actively maintained as an open-source project. It has been used to verify high-profile smart contracts including the Ethereum Casper protocol, DappHub MakerDAO contracts, and various DeFi protocols. The work shows that the K Framework's approach -- where a single semantic definition yields an interpreter, a symbolic execution engine, and a deductive verifier -- is particularly well-suited for domains where correctness is critical and the cost of bugs is measured in real financial loss. KEVM remains the gold standard for EVM formalization and continues to evolve alongside the Ethereum protocol.

Stay Informed

Sign up for updates and never miss important announcements.

Subscribe

Join the community

Join our Discord server to get support or connect with the Pi² community.

Join discord

Follow us

Learn about company and product updates, upcoming events, rewards, and more.

TwitterTelegramDiscordLinkedInGitHubYouTube