I came across this very interesting paper from Andreas Lööw, A Proof-Producing Translator for Verilog Development in HOL, and that piqued my curiosity. Speaking as a digital design architect, it would indeed be very interesting to be able to generate RTL code upon a foundation of proof.
Side note: more related papers
• Verified Compilation on a Verified Processor
• They’re the same picture: a software-verification flow adapted for hardware verification
I decided to give it a try, but there was just a little obstacle in front of me; my basic knowledge of the de facto standard for formal verification in hardware design (i.e. model checking and SystemVerilog assertions) and my utter ignorance in whatever is related to theorem proving. But, hey, how hard can it be? It’s just another programming language! This post is essentially the description of my first dive into the theorem proving ocean.
Setting up the toolchain and cloning the repo is trivial.
Feeling a sense of fearful reverence towards the topic, I decided to stay away from the complete processor stuff and focus on the simplest examples: the average calculator (a component calculating the mean of the last 4 input values) seemed an accessible example.
The project is composed by 2 files: avgCircuitScript.sml, which describes the implementation of the module and the verification proofs, and avgCircuitCompilerScript.sml, which implements the translation to Verilog.
Armed with documentation and a smart chatbot, I started unraveling the mysteries of HOL4 – here’s what I learned.
The code starts with the inclusion of the necessary libraries
open hardwarePreamble;
open translatorTheory translatorCoreLib;
open blastLib;
whose content I skipped, as I deemed it too advanced for my first dive into the topic.
Then comes the name of the current module/theory
val _ = new_theory "avgCircuit";
The next instruction
val _ = prefer_num ();
probably indicates that numbers are to be interpreted as naturals.
And then the juicy part starts.
Then comes the declaration of the datatypes:
Datatype:
avg_state_very_inner = <| h0 : word8 |>
End
Datatype:
avg_state_inner = <| very_inner : avg_state_very_inner; h1 : word8 |>
End
Datatype:
avg_state = <| inner : avg_state_inner; h2 : word8; h3 : word8; avg : word8 |>
End
Datatype:
avg_ext_state = <| signal : word8 |>
End
h0
, h1
, h2
and h3
represent the status of the system (it stores the last 4 input values).avg
is the output of the system and signal
is the input (note: the input seems to be often referred as “external state” in CakeML).
The notation <| ... |>
indicates that these types are records/structures.
All types are considered 8-bit words (word8
).
I couldn’t help but notice some odd design choices:
- the output is listed together with the status of the system
- part of the status is nested (
avg_state_inner
andavg_state_very_inner
)
Side note: I have modified the definition using a single structure as data type for the internal state and I obtained the same results.
Next is the definition of the division by 4 operation, implemented (as it would be in RTL) as right shift
Definition soft_div_by_4_def:
soft_div_by_4 s' = let
s' = s' with avg := (0 :+ word_bit 2 s'.avg) s'.avg;
s' = s' with avg := (1 :+ word_bit 3 s'.avg) s'.avg;
s' = s' with avg := (2 :+ word_bit 4 s'.avg) s'.avg;
s' = s' with avg := (3 :+ word_bit 5 s'.avg) s'.avg;
s' = s' with avg := (4 :+ word_bit 6 s'.avg) s'.avg;
s' = s' with avg := (5 :+ word_bit 7 s'.avg) s'.avg;
s' = s' with avg := (6 :+ F) s'.avg;
s' = s' with avg := (7 :+ F) s'.avg in
s'
End
soft_div_by_4_def
is just a label for the definition; the definition itself is soft_div_by_4.
s'
declared after soft_div_by_4
is the input argument of the definition; the last s'
is the result of the operation.
Everything contained between let
and in
is the implementation of the divide by 4 operation. So we have:
s' = s' with avg := (0 :+ word_bit 2 s'.avg) s'.avg;
means thats'
is the same ass'
, it’s elementavg
is equal tos'.avg
but it’s bit 0 must be updated with the value of bit 2s' = s' with avg := (6 :+ F) s'.avg;
is the same as the above, with the difference that but 6 inavg
must be replaced with zero (false)
The '
(prime) character is just part of the name and not a special character; I would interpret it as “the updated version of s
”. I think the same variable s'
is used both as initial state and as final state because it simplifies the notation; if we used a different variable for the initial state, only the assignment of bit 0 would have used it: after that, s'
contains the new value, to be used in the following assignments.
Afterwords comes the first proof, we want to demonstrate that the right shift by 2 bits equals a division by 4
Theorem soft_div_by_4_correct:
∀s. soft_div_by_4 s = s with avg := s.avg // 4w
Proof
rw [soft_div_by_4_def, theorem"avg_state_component_equality"] \\ BBLAST_TAC
QED
The theorem says that for any value of s
, if we apply soft_div_by_4
to it we get the same as if we updated its avg
field with its value divided by 4 (avg := s.avg // 4w
).
The demonstration is done by rewriting (rw
) the definition of the “divide by shifting” operation using the bit-blasting tactic (BBLAST_TAC
). From the cheatsheet: “bit-blasting (i.e. SAT) for goals concerning concrete word types (i.e. word32
and others)”.
We then define the way the averaging is performed
Definition avg_comb_def:
avg_comb (fext:avg_ext_state) (s:avg_state) (s':avg_state) = let
s' = s' with avg := s.inner.very_inner.h0 + s.inner.h1 + s.h2 + s.h3 in
soft_div_by_4 s'
End
We first add the 4 state values (s' = s' with avg := s.inner.very_inner.h0 + s.inner.h1 + s.h2 + s.h3
) and we apply the division by 4 using the previous definition.
Note that the value of some of the states is nested, because of how the states have been defined.
Also note that the arguments of avg_comb
have their type specified (e.g. the initial state s
must be of type avg_state
). Removing the type specified results in a correct proof (avgCircuitTheory
is compiled) but the Verilog is not generated – I don’t know why.
I was also wondering why both an initial state s
and a final state s'
have been indicated as arguments: in a previous definition we had just one. I tried simplifying the notation and using only s'
(I did the same also in the definition of avg_com_alt
and modified avg_comb_const
accordingly), but the compilation fails when processing avg = mk_module (procs [avg_ff]) (procs [avg_comb]) avg_init
. The error message suggests that procs
requires 3 arguments and the code confirm it.
The next theorem
Theorem avg_comb_trans = REWRITE_RULE [soft_div_by_4_def] avg_comb_def;
includes the definition of soft_div_by_4_def
into avg_comb_def
.
Then comes the proof for the implementation of the combinatorial logic (avg_comb_def
)
Theorem avg_comb_alt:
avg_comb fext s s' = let
s' = s' with avg := s.inner.very_inner.h0 + s.inner.h1 + s.h2 + s.h3 in
s' with avg := s'.avg // 4w
Proof
rw [avg_comb_def, soft_div_by_4_correct]
QED
It is basically saying that using the division to calculate the average is the same as doing the right shift. The proof is done by rewriting the statements and verifying that they are equal.
Another property which is worth verifying is that the combinatorial logic doesn’t alter the state of the system
Theorem avg_comb_const:
(avg_comb fext s s').inner = s'.inner ∧
(avg_comb fext s s').h2 = s'.h2 ∧
(avg_comb fext s s').h3 = s'.h3
Proof
simp [avg_comb_alt]
QED
(avg_comb fext s s').inner = s'.inner
means that “applying avg_comb
leave the inner
element of avg_state
unchanged. The same must be true also for h2
and h3
. avg
is notably not mentioned.
This terminates the design and verification of the combinatorial part of the module. Let’s have a look at the sequential part.
Definition avg_ff_def:
avg_ff (fext:avg_ext_state) (s:avg_state) (s':avg_state) = let
s' = s' with inner := (s'.inner with very_inner := (s'.inner.very_inner with h0 := fext.signal));
s' = s' with inner := (s'.inner with h1 := s.inner.very_inner.h0);
s' = s' with h2 := s.inner.h1 in
s' with h3 := s.h2
End
By now the syntax should not be scary any more: it is simply stating that h0 = signal
(signal
being the external state, i.e. the input), h1 = h0
and so on.
Last comes the definition of the initial state
val init_tm = add_x_inits “<| inner := <| very_inner := <| h0 := 0w; |>; h1 := 0w; |>; h2 := 0w; h3 := 0w |>”;
Definition avg_init_def:
avg_init fbits = ^init_tm
End
which consists of zeroing the h0
… h3
variables.^
extracts the value from the variable on its right.
Then we say that the average calculator is composed by the sequential and combinatorial functions and initial value defined before.
Definition avg_def:
avg = mk_module (procs [avg_ff]) (procs [avg_comb]) avg_init
End
We prove that the averaging module can be expressed using an alternative syntax
Theorem avg_alt:
avg = mk_module avg_ff avg_comb avg_init
Proof
simp [avg_def, procs_sing]
QED
procs_sing
(defined here) states that procs [x] = x
. Not sure why this is necessary.
Another property we have to verify is that the average is actually calculated over the last 4 most recent values of the input signal (and that otherwise 0 is considered).
We first define that we have to look at old samples or take the value 0
Definition signal_previously_def:
signal_previously fext n shift = if n < shift then 0w else (fext (n - shift)).signal
End
and then we specify that the average is calculated taking the last 4 values
Definition avg_spec_def:
avg_spec fext n =
((signal_previously fext n 1) +
(signal_previously fext n 2) +
(signal_previously fext n 3) +
(signal_previously fext n 4)) // 4w
End
and an invariant relationship: the previous input values are stored in the state register hx
and the average is the one we’ve specified before
Definition avg_inv_def:
avg_inv fext n s ⇔
s.inner.very_inner.h0 = signal_previously fext n 1 ∧
s.inner.h1 = signal_previously fext n 2 ∧
s.h2 = signal_previously fext n 3 ∧
s.h3 = signal_previously fext n 4 ∧
s.avg = avg_spec fext n
End
We then verify that avg
always maintains the invariant relationship
Theorem avg_inv_avg:
∀fext fbits n. avg_inv fext n (avg fext fbits n)
Proof
ntac 2 gen_tac \\ Induct >- EVAL_TAC \\
fs [avg_inv_def, avg_spec_def, avg_alt, mk_module_def, mk_circuit_def, avg_comb_const] \\
ntac 4 (conj_asm1_tac >-
(rw [avg_ff_def, signal_previously_def, arithmeticTheory.ADD1] \\ gs [])) \\
simp [avg_comb_alt]
QED
I couldn’t understand the proof, so I asked Claude.ai
~~~
This theorem (avg_inv_avg
) proves that the avg
module maintains its invariant at all time steps. Let’s break down the proof:
1. ∀fext fbits n. avg_inv fext n (avg fext fbits n)
- States that for all external inputs ( fext ), initial bits ( fbits ), and time steps ( n )
- The invariant avg_inv holds for the state produced by avg module
2. The proof steps:
ntac 2 gen_tac \\\\ Induct >- EVAL_TAC
- ntac 2 gen_tac : Handle the first two universal quantifiers ( fext and fbits )
- Induct : Do induction on n (time steps)
- EVAL_TAC : Base case (n=0) is solved by evaluation
3.
fs [avg_inv_def, avg_spec_def, avg_alt, mk_module_def, mk_circuit_def, avg_comb_const]
- Unfolds various definitions for the inductive case
4.
ntac 4 (conj_asm1_tac >-
(rw [avg_ff_def, signal_previously_def, arithmeticTheory.ADD1] \\\\ gs []))
- ntac 4 : Do this 4 times (likely for the 4 history registers h0-h3)
- conj_asm1_tac : Handle conjunctions in assumptions
- Rewrites using flip-flop definitions and signal history
- arithmeticTheory.ADD1 : Handles arithmetic for n+1 case
5.
simp [avg_comb_alt]
- Finishes proof using alternative combinational logic definition
The proof structure suggests it’s showing:
- The invariant holds initially (base case)
- For each history register, it maintains the correct previous value
- The average computation matches the specification
- If the invariant holds at n, it holds at n+1 (inductive step)
This is a typical structure for proving correctness of sequential circuits using invariants.
~~~
Lastly, we prove that our implementation of avg
matches the specifications avg_spec
Theorem avg_correct:
∀fext fbits n. (avg fext fbits n).avg = avg_spec fext n
Proof
metis_tac [avg_inv_avg, avg_inv_def]
QED
From the cheatsheet: metis_tac
is a “first-order decision procedure using the supplied theorems”.
And with this the D&V is over.
I haven’t focused that much on the Verilog emitter, avgCircuitCompilerScript.sml.
local
val module_def = avg_def;
val abstract_fields = [];
val outputs = ["avg"];
val comms = ["h0", "h1", "h2", "h3"];
in
val trans_thm = module2hardware_old avg_def abstract_fields outputs comms
end
module2hardware creates an AST which represents the Verilog code.
This can be converted to a string with
val verilogstr =
definition"avg_v_def"
|> REWRITE_RULE [definition"avg_v_seqs_def", definition"avg_v_combs_def", definition"avg_v_decls_def"]
|> concl
|> rhs
|> verilog_print "avg";
and printed to file with writeFile "avg.sv" verilogstr;
The next portion of the file synthesizes the module and again emits the corresponding netlist.
The behavioural verilog looks like this
`timescale 1ns / 1ps
module avg(
input clk,
input logic[7:0] signal,
output logic[7:0] avg = 'x
);
logic[7:0] h0 = 8'd0;
logic[7:0] h1 = 8'd0;
logic[7:0] h2 = 8'd0;
logic[7:0] h3 = 8'd0;
always_comb begin
avg = ((h0 + h1) + h2) + h3;
avg[0] = avg[2];
avg[1] = avg[3];
avg[2] = avg[4];
avg[3] = avg[5];
avg[4] = avg[6];
avg[5] = avg[7];
avg[6] = 0;
avg[7] = 0;
end
always_ff @ (posedge clk) begin
h0 <= signal;
h1 <= h0;
h2 <= h1;
h3 <= h2;
end
endmodule
That’s cool right? Instinctively I would say a big “YES!” – a proof-producing translation tool exists!
But a deeper look at the code shows that it is wrong. The problem is that adding 4 8bit words requires a 10bit signal, while avg is only 8bit wide.
If you don’t believe it, we can go the old way: this testbench
module avg_tb;
logic clk;
logic[7:0] signal = 8'h40;
logic[7:0] mean;
initial begin
clk = 0;
forever #5 clk = ~clk;
end
initial begin
$display("start: %x", mean);
@(posedge clk);
$display("step: %x", mean);
@(posedge clk);
$display("step: %x", mean);
@(posedge clk);
$display("step: %x", mean);
@(posedge clk);
$display("step: %x", mean);
@(posedge clk);
$display("step: %x", mean);
#100
$display("end: %x", mean);
$finish();
end
avg i_avg (
.clk,
.signal,
.avg (mean)
);
endmodule
produces
# start: xx
# step: 00
# step: 10
# step: 20
# step: 30
# step: 00
# end: 00
What surprises me most is that the verification flow has not identified this problem. Why is that?
I would say that the specifications are wrong: they specify word8
as data type for the result of the addition. It’s easy to have this sort of problems when both design and verification are performed by the same person. An alternative approach would be to split the design and verification: e.g. have one file which only defines the functionality of the module (and is used to generate the Verilog) and another file which describes the proves, decoupling the two processes. I have tried removing all the theorems from avgCircuitScript.sml (with the exception of avg_comb_trans
otherwise the Verilog translation failed) and I get the same output as above.
I also wonder whether it would be possible to add a check for overflows in the theorem prover logic – this situation is quite common, both in hardware and in software.
Side note: I have checked with Andreas, who is aware of the problem. He commented that this is just an example and he wanted to keep it simple. I understand his approach.
To conclude: this finding does not invalidate the whole framework, I am still very curious and I want to understand more – and possibly use it for real-word problems; we just have to be conscious of its how it works and ideally collaborate further to make an awesome project out of it. That’s the beauty of open source!
Next step: modify the specifications to fix the problem just described. Stay tuned for more!