I came across this very interesting paper from Andreas Lööw, A Proof-Producing Translator for Verilog Development in HOL, and that piqued my curiosity. Speaking as a digital design architect, it would indeed be very interesting to be able to generate RTL code upon a foundation of proof.

Side note: more related papers

Verified Compilation on a Verified Processor
They’re the same picture: a software-verification flow adapted for hardware verification

I decided to give it a try, but there was just a little obstacle in front of me; my basic knowledge of the de facto standard for formal verification in hardware design (i.e. model checking and SystemVerilog assertions) and my utter ignorance in whatever is related to theorem proving. But, hey, how hard can it be? It’s just another programming language! This post is essentially the description of my first dive into the theorem proving ocean.

Setting up the toolchain and cloning the repo is trivial.
Feeling a sense of fearful reverence towards the topic, I decided to stay away from the complete processor stuff and focus on the simplest examples: the average calculator (a component calculating the mean of the last 4 input values) seemed an accessible example.

The project is composed by 2 files: avgCircuitScript.sml, which describes the implementation of the module and the verification proofs, and avgCircuitCompilerScript.sml, which implements the translation to Verilog.

Armed with documentation and a smart chatbot, I started unraveling the mysteries of HOL4 – here’s what I learned.

The code starts with the inclusion of the necessary libraries

open hardwarePreamble;
open translatorTheory translatorCoreLib;
open blastLib;

whose content I skipped, as I deemed it too advanced for my first dive into the topic.

Then comes the name of the current module/theory

val _ = new_theory "avgCircuit";

The next instruction

val _ = prefer_num ();

probably indicates that numbers are to be interpreted as naturals.

And then the juicy part starts.

Then comes the declaration of the datatypes:

Datatype:
 avg_state_very_inner = <| h0 : word8 |>
End

Datatype:
 avg_state_inner = <| very_inner : avg_state_very_inner; h1 : word8 |>
End

Datatype:
 avg_state = <| inner : avg_state_inner; h2 : word8; h3 : word8; avg : word8 |>
End

Datatype:
 avg_ext_state = <| signal : word8 |>
End

h0, h1, h2 and h3 represent the status of the system (it stores the last 4 input values).
avg is the output of the system and signal is the input (note: the input seems to be often referred as “external state” in CakeML).
The notation <| ... |> indicates that these types are records/structures.
All types are considered 8-bit words (word8).

I couldn’t help but notice some odd design choices:

  • the output is listed together with the status of the system
  • part of the status is nested (avg_state_inner and avg_state_very_inner)

Side note: I have modified the definition using a single structure as data type for the internal state and I obtained the same results.

Next is the definition of the division by 4 operation, implemented (as it would be in RTL) as right shift

Definition soft_div_by_4_def:
 soft_div_by_4 s' = let
  s' = s' with avg := (0 :+ word_bit 2 s'.avg) s'.avg;
  s' = s' with avg := (1 :+ word_bit 3 s'.avg) s'.avg;
  s' = s' with avg := (2 :+ word_bit 4 s'.avg) s'.avg;
  s' = s' with avg := (3 :+ word_bit 5 s'.avg) s'.avg;
  s' = s' with avg := (4 :+ word_bit 6 s'.avg) s'.avg;
  s' = s' with avg := (5 :+ word_bit 7 s'.avg) s'.avg;
  s' = s' with avg := (6 :+ F) s'.avg;
  s' = s' with avg := (7 :+ F) s'.avg in
 s'
End

soft_div_by_4_def is just a label for the definition; the definition itself is soft_div_by_4.

s' declared after soft_div_by_4 is the input argument of the definition; the last s' is the result of the operation.

Everything contained between let and in is the implementation of the divide by 4 operation. So we have:

  • s' = s' with avg := (0 :+ word_bit 2 s'.avg) s'.avg; means that s' is the same as s', it’s element avg is equal to s'.avg but it’s bit 0 must be updated with the value of bit 2
  • s' = s' with avg := (6 :+ F) s'.avg; is the same as the above, with the difference that but 6 in avg must be replaced with zero (false)

The ' (prime) character is just part of the name and not a special character; I would interpret it as “the updated version of s”. I think the same variable s' is used both as initial state and as final state because it simplifies the notation; if we used a different variable for the initial state, only the assignment of bit 0 would have used it: after that, s' contains the new value, to be used in the following assignments.

Afterwords comes the first proof, we want to demonstrate that the right shift by 2 bits equals a division by 4

Theorem soft_div_by_4_correct:
 ∀s. soft_div_by_4 s = s with avg := s.avg // 4w
Proof
 rw [soft_div_by_4_def, theorem"avg_state_component_equality"] \\ BBLAST_TAC
QED

The theorem says that for any value of s, if we apply soft_div_by_4 to it we get the same as if we updated its avgfield with its value divided by 4 (avg := s.avg // 4w).
The demonstration is done by rewriting (rw) the definition of the “divide by shifting” operation using the bit-blasting tactic (BBLAST_TAC). From the cheatsheet: “bit-blasting (i.e. SAT) for goals concerning concrete word types (i.e. word32 and others)”.

We then define the way the averaging is performed

Definition avg_comb_def:
 avg_comb (fext:avg_ext_state) (s:avg_state) (s':avg_state) = let
  s' = s' with avg := s.inner.very_inner.h0 + s.inner.h1 + s.h2 + s.h3 in
  soft_div_by_4 s'
End

We first add the 4 state values (s' = s' with avg := s.inner.very_inner.h0 + s.inner.h1 + s.h2 + s.h3) and we apply the division by 4 using the previous definition.
Note that the value of some of the states is nested, because of how the states have been defined.
Also note that the arguments of avg_comb have their type specified (e.g. the initial state s must be of type avg_state). Removing the type specified results in a correct proof (avgCircuitTheory is compiled) but the Verilog is not generated – I don’t know why.
I was also wondering why both an initial state s and a final state s' have been indicated as arguments: in a previous definition we had just one. I tried simplifying the notation and using only s'(I did the same also in the definition of avg_com_alt and modified avg_comb_const accordingly), but the compilation fails when processing avg = mk_module (procs [avg_ff]) (procs [avg_comb]) avg_init. The error message suggests that procs requires 3 arguments and the code confirm it.

The next theorem

Theorem avg_comb_trans = REWRITE_RULE [soft_div_by_4_def] avg_comb_def;

includes the definition of soft_div_by_4_def into avg_comb_def.

Then comes the proof for the implementation of the combinatorial logic (avg_comb_def)

Theorem avg_comb_alt:
 avg_comb fext s s' = let
  s' = s' with avg := s.inner.very_inner.h0 + s.inner.h1 + s.h2 + s.h3 in
  s' with avg := s'.avg // 4w
Proof
 rw [avg_comb_def, soft_div_by_4_correct]
QED

It is basically saying that using the division to calculate the average is the same as doing the right shift. The proof is done by rewriting the statements and verifying that they are equal.

Another property which is worth verifying is that the combinatorial logic doesn’t alter the state of the system

Theorem avg_comb_const:
 (avg_comb fext s s').inner = s'.inner ∧
 (avg_comb fext s s').h2 = s'.h2 ∧
 (avg_comb fext s s').h3 = s'.h3
Proof
 simp [avg_comb_alt]
QED

(avg_comb fext s s').inner = s'.inner means that “applying avg_comb leave the inner element of avg_state unchanged. The same must be true also for h2 and h3. avg is notably not mentioned.

This terminates the design and verification of the combinatorial part of the module. Let’s have a look at the sequential part.

Definition avg_ff_def:
 avg_ff (fext:avg_ext_state) (s:avg_state) (s':avg_state) = let
  s' = s' with inner := (s'.inner with very_inner := (s'.inner.very_inner with h0 := fext.signal));
  s' = s' with inner := (s'.inner with h1 := s.inner.very_inner.h0);
  s' = s' with h2 := s.inner.h1 in
  s' with h3 := s.h2
End

By now the syntax should not be scary any more: it is simply stating that h0 = signal (signal being the external state, i.e. the input), h1 = h0 and so on.

Last comes the definition of the initial state

val init_tm = add_x_inits “<| inner := <| very_inner := <| h0 := 0w; |>; h1 := 0w; |>; h2 := 0w; h3 := 0w |>”;

Definition avg_init_def:
 avg_init fbits = ^init_tm
End

which consists of zeroing the h0h3 variables.
^ extracts the value from the variable on its right.

Then we say that the average calculator is composed by the sequential and combinatorial functions and initial value defined before.

Definition avg_def:
 avg = mk_module (procs [avg_ff]) (procs [avg_comb]) avg_init
End

We prove that the averaging module can be expressed using an alternative syntax

Theorem avg_alt:
 avg = mk_module avg_ff avg_comb avg_init
Proof
 simp [avg_def, procs_sing]
QED

procs_sing (defined here) states that procs [x] = x. Not sure why this is necessary.

Another property we have to verify is that the average is actually calculated over the last 4 most recent values of the input signal (and that otherwise 0 is considered).
We first define that we have to look at old samples or take the value 0

Definition signal_previously_def:
 signal_previously fext n shift = if n < shift then 0w else (fext (n - shift)).signal
End

and then we specify that the average is calculated taking the last 4 values

Definition avg_spec_def:
 avg_spec fext n =
  ((signal_previously fext n 1) +
   (signal_previously fext n 2) +
   (signal_previously fext n 3) +
   (signal_previously fext n 4)) // 4w
End

and an invariant relationship: the previous input values are stored in the state register hx and the average is the one we’ve specified before

Definition avg_inv_def:
 avg_inv fext n s ⇔
  s.inner.very_inner.h0 = signal_previously fext n 1 ∧
  s.inner.h1 = signal_previously fext n 2 ∧
  s.h2 = signal_previously fext n 3 ∧
  s.h3 = signal_previously fext n 4 ∧
  s.avg = avg_spec fext n
End

We then verify that avg always maintains the invariant relationship

Theorem avg_inv_avg:
 ∀fext fbits n. avg_inv fext n (avg fext fbits n)
Proof
 ntac 2 gen_tac \\ Induct >- EVAL_TAC \\
 fs [avg_inv_def, avg_spec_def, avg_alt, mk_module_def, mk_circuit_def, avg_comb_const] \\
 ntac 4 (conj_asm1_tac >-
         (rw [avg_ff_def, signal_previously_def, arithmeticTheory.ADD1] \\ gs [])) \\
 simp [avg_comb_alt]
QED

I couldn’t understand the proof, so I asked Claude.ai

~~~

This theorem (avg_inv_avg) proves that the avg module maintains its invariant at all time steps. Let’s break down the proof:

1. ∀fext fbits n. avg_inv fext n (avg fext fbits n)

  • States that for all external inputs ( fext ), initial bits ( fbits ), and time steps ( n )
  • The invariant avg_inv holds for the state produced by avg module

2. The proof steps:

ntac 2 gen_tac \\\\ Induct >- EVAL_TAC

  • ntac 2 gen_tac : Handle the first two universal quantifiers ( fext and fbits )
  • Induct : Do induction on n (time steps)
  • EVAL_TAC : Base case (n=0) is solved by evaluation

3.

fs [avg_inv_def, avg_spec_def, avg_alt, mk_module_def, mk_circuit_def, avg_comb_const]

  • Unfolds various definitions for the inductive case

4.

ntac 4 (conj_asm1_tac >-
(rw [avg_ff_def, signal_previously_def, arithmeticTheory.ADD1] \\\\ gs []))

  • ntac 4 : Do this 4 times (likely for the 4 history registers h0-h3)
  • conj_asm1_tac : Handle conjunctions in assumptions
  • Rewrites using flip-flop definitions and signal history
  • arithmeticTheory.ADD1 : Handles arithmetic for n+1 case

5.

simp [avg_comb_alt]

  • Finishes proof using alternative combinational logic definition

The proof structure suggests it’s showing:

  1. The invariant holds initially (base case)
  2. For each history register, it maintains the correct previous value
  3. The average computation matches the specification
  4. If the invariant holds at n, it holds at n+1 (inductive step)

This is a typical structure for proving correctness of sequential circuits using invariants.

~~~

Lastly, we prove that our implementation of avg matches the specifications avg_spec

Theorem avg_correct:
 ∀fext fbits n. (avg fext fbits n).avg = avg_spec fext n
Proof
 metis_tac [avg_inv_avg, avg_inv_def]
QED

From the cheatsheet: metis_tac is a “first-order decision procedure using the supplied theorems”.

And with this the D&V is over.

I haven’t focused that much on the Verilog emitter, avgCircuitCompilerScript.sml.

local
 val module_def = avg_def;
 val abstract_fields = [];
 val outputs = ["avg"];
 val comms = ["h0", "h1", "h2", "h3"];
in
 val trans_thm = module2hardware_old avg_def abstract_fields outputs comms
end

module2hardware creates an AST which represents the Verilog code.

This can be converted to a string with

val verilogstr =
 definition"avg_v_def"
 |> REWRITE_RULE [definition"avg_v_seqs_def", definition"avg_v_combs_def", definition"avg_v_decls_def"]
 |> concl
 |> rhs
 |> verilog_print "avg";

and printed to file with writeFile "avg.sv" verilogstr;

The next portion of the file synthesizes the module and again emits the corresponding netlist.

The behavioural verilog looks like this

`timescale 1ns / 1ps

module avg(
  input clk,
  input logic[7:0] signal,
  output logic[7:0] avg = 'x
);

logic[7:0] h0 = 8'd0;
logic[7:0] h1 = 8'd0;
logic[7:0] h2 = 8'd0;
logic[7:0] h3 = 8'd0;

always_comb begin
avg = ((h0 + h1) + h2) + h3;
avg[0] = avg[2];
avg[1] = avg[3];
avg[2] = avg[4];
avg[3] = avg[5];
avg[4] = avg[6];
avg[5] = avg[7];
avg[6] = 0;
avg[7] = 0;
end

always_ff @ (posedge clk) begin
h0 <= signal;
h1 <= h0;
h2 <= h1;
h3 <= h2;
end

endmodule

That’s cool right? Instinctively I would say a big “YES!” – a proof-producing translation tool exists!

But a deeper look at the code shows that it is wrong. The problem is that adding 4 8bit words requires a 10bit signal, while avg is only 8bit wide.
If you don’t believe it, we can go the old way: this testbench

module avg_tb;

logic clk;

logic[7:0] signal = 8'h40;
logic[7:0] mean;

initial begin
    clk = 0;
    forever #5 clk = ~clk;
end

initial begin
    $display("start: %x", mean);
    @(posedge clk);
    $display("step: %x", mean);
    @(posedge clk);
    $display("step: %x", mean);
    @(posedge clk);
    $display("step: %x", mean);
    @(posedge clk);
    $display("step: %x", mean);
    @(posedge clk);
    $display("step: %x", mean);
    #100
    $display("end: %x", mean);
    $finish();

end

avg i_avg (
    .clk,
    .signal,
    .avg (mean)
);

endmodule

produces

# start: xx
# step: 00
# step: 10
# step: 20
# step: 30
# step: 00
# end: 00

What surprises me most is that the verification flow has not identified this problem. Why is that?
I would say that the specifications are wrong: they specify word8 as data type for the result of the addition. It’s easy to have this sort of problems when both design and verification are performed by the same person. An alternative approach would be to split the design and verification: e.g. have one file which only defines the functionality of the module (and is used to generate the Verilog) and another file which describes the proves, decoupling the two processes. I have tried removing all the theorems from avgCircuitScript.sml (with the exception of avg_comb_trans otherwise the Verilog translation failed) and I get the same output as above.

I also wonder whether it would be possible to add a check for overflows in the theorem prover logic – this situation is quite common, both in hardware and in software.

Side note: I have checked with Andreas, who is aware of the problem. He commented that this is just an example and he wanted to keep it simple. I understand his approach.

To conclude: this finding does not invalidate the whole framework, I am still very curious and I want to understand more – and possibly use it for real-word problems; we just have to be conscious of its how it works and ideally collaborate further to make an awesome project out of it. That’s the beauty of open source!

Next step: modify the specifications to fix the problem just described. Stay tuned for more!