
A Complete Formal Semantics of x86-64 User-Level Instruction Set Architecture
Abstract
This paper, authored by Sandeep Dasgupta, Daejun Park, Theodoros Kasampalis, Vikram S. Adve, and Grigore Rosu, presents the most complete and thoroughly tested formal semantics of the x86-64 instruction set architecture (ISA) to date. Published at PLDI 2019 (the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation), the work faithfully formalizes all non-deprecated, sequential user-level instructions of the x86-64 Haswell microarchitecture. This amounts to a staggering 3,155 instruction variants corresponding to 774 mnemonics, making it by far the largest formal ISA semantics ever produced. The semantics is defined in the K Semantic Framework (), making it fully executable and amenable to formal analysis, and has been validated against more than 7,000 instruction-level test cases as well as the GCC torture test suite.
Challenges of x86-64 Formalization
The x86-64 architecture is notoriously complex, presenting enormous challenges for formal semantics. The instruction set has evolved over decades through numerous extensions (MMX, SSE, SSE2, SSE3, SSE4, AVX, AVX2, etc.), resulting in a vast number of instructions with intricate encoding schemes, implicit operand conventions, and complex flag behavior. The official Intel reference manual spans thousands of pages and is written in informal English prose, which is inherently ambiguous and has been shown to contain errors. Previous attempts at formalizing x86-64 semantics, such as those by Strata and Stoke, covered only subsets of the ISA and were limited in their approach. The authors' work addresses these limitations by providing a complete, machine-readable, and rigorously tested formal specification that can serve as a single source of truth for the architecture's behavior.
Machine Configuration in K
The machine configuration in K defines the formal state of the x86-64 processor. This configuration is organized into cells representing the key components of the processor's user-level state:
- General-purpose registers: The 16 64-bit registers
RAX,RBX,RCX,RDX,RSI,RDI,RSP,RBP,R8--R15, along with their sub-register views (32-bit, 16-bit, and 8-bit) - SIMD registers: The 256-bit
YMM0--YMM15registers (and their 128-bitXMMsub-views) for vector operations - RFLAGS register: Individual flag cells for
CF(carry),PF(parity),AF(auxiliary carry),ZF(zero),SF(sign), andOF(overflow), along with the critical modeling of undefined flag behavior - Memory: A flat address space model for user-level instruction execution
- Instruction pointer: The
RIPregister tracking the current execution point
Each instruction's semantics is specified as a K rewrite rule that transforms the relevant portions of this configuration. For example, an ADD instruction updates the destination operand and modifies all six status flags according to the result.
Methodology: Combining Automated and Manual Formalization
The systematic methodology for producing the semantics involved a multi-phase approach. First, the authors leveraged the existing work of Strata, a project that had used program synthesis to automatically learn semantics for approximately 60% of the x86-64 instruction variants. The Strata-derived semantics were translated into K rules and served as a foundation. For the remaining approximately 40% of instructions -- which Strata could not handle due to their complexity (e.g., string operations, system-level instructions with user-mode behavior, complex floating-point operations) -- the authors manually wrote K semantic rules by carefully studying the Intel Software Developer's Manual. This manual effort took approximately 8 person-months and required resolving numerous ambiguities and inconsistencies in the reference documentation.
The extensive testing process revealed bugs in both the Intel x86-64 reference manual itself and in other existing formal and informal semantics, including Strata and Stoke, underscoring the value of rigorous formal specification.
Validation and Testing
Validation was performed at two levels of granularity. At the instruction level, over 7,000 test cases were developed to exercise individual instructions with various operand combinations, edge cases, and flag behaviors. These tests compare the output of the K semantics (executing via the K interpreter) against the actual behavior of a physical x86-64 processor. At the program level, the semantics was validated against the GCC torture test suite, a widely-used collection of C programs designed to stress-test compiler correctness. The ability to execute compiled binaries against the formal semantics and obtain correct results provides strong evidence that the semantics faithfully models the real hardware. The testing process was instrumental in uncovering errors: bugs were found in Stoke's modeling of undefined flag behavior and in the Intel manual's description of certain instruction behaviors.
Modeling Undefined and Implementation-Defined Flag Behavior
One of the most challenging aspects of x86-64 semantics is the treatment of undefined and implementation-defined behavior, particularly regarding the RFLAGS register. Many x86-64 instructions leave certain flags in an "undefined" state, meaning the processor is free to set them to any value. Formally modeling this requires careful distinction between flags that are (1) set according to the result, (2) cleared to 0, (3) set to 1, (4) left unchanged, or (5) left in an undefined state. The K semantics models undefined flags explicitly, enabling analyses that can reason about programs relying (potentially unsafely) on specific flag values after instructions that leave those flags undefined. This precise modeling is essential for applications such as compiler verification and binary analysis.
Applications of the Formal Semantics
The paper discusses several applications enabled by the formal semantics. These include program verification of assembly and binary code, co-simulation with hardware implementations for processor verification, compiler testing and validation by comparing compiled output behavior against source-level semantics, and binary analysis for security applications such as vulnerability detection. The formal semantics can also serve as a reference specification for developing new tools, such as disassemblers, decompilers, and binary rewriters. In the broader context of Pi2 Labs's work, this x86-64 semantics exemplifies how formal language definitions can underpin verifiable computing: by providing a mathematically precise definition of what each instruction does, one can generate proofs that a given binary execution produced the correct result, which can then be verified cryptographically.
Significance
The completeness and rigor of this work set a new standard for ISA formalization. Prior efforts either covered only subsets of the instruction set, used informal or semi-formal notations, or lacked systematic validation. By combining the scale of Strata's automated approach with the precision of manual K formalization and the rigor of extensive testing, Dasgupta et al. produced an artifact that is simultaneously complete, formal, executable, and empirically validated. The work demonstrates that even the most complex real-world instruction set architectures can be faithfully captured in a formal framework, opening the door to a new era of trustworthy systems software built on formally verified foundations.
Stay Informed
Sign up for updates and never miss important announcements.
Join the community
Join our Discord server to get support or connect with the Pi² community.