Meeting Preparation
Summary
VERT
- Fine-tuned llama3.2-8B-bnb-4bit using unsloth
- Used a lower rank than the paper
- But there is a fundamental flaw in this paper
- If they have a deterministic way of extracting these properties, there is no need for an LLM
- Maybe the point of the project was to improve the conditional extraction capabilities of an LLM
Nuances to writing Assertions
- Writing assertions is very hard
- Seemingly simple natural-language sentences are very hard to express in LTL
- Usage of operators such as $past, $stable, etc.
- There are many properties that you simply cannot describe in pure LTL
- Need to generate auxiliary Verilog code
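To illustrate the last two points, here is a minimal hypothetical sketch (signal names req/ack/rst_n are made up, not from the project): counting outstanding transactions is not expressible in pure LTL, so auxiliary Verilog code maintains a counter that the assertion refers to, and $stable covers a check that is awkward to phrase otherwise.

```systemverilog
// Hypothetical sketch: auxiliary Verilog state feeding assertions
module aux_counter_example (input logic clk, rst_n, req, ack);
  int unsigned outstanding;

  // Auxiliary code: track requests that have not yet been acked
  always_ff @(posedge clk or negedge rst_n) begin
    if (!rst_n) outstanding <= 0;
    else        outstanding <= outstanding + req - ack;
  end

  // An ack must never arrive when nothing is outstanding
  a_no_spurious_ack: assert property (
    @(posedge clk) disable iff (!rst_n) ack |-> outstanding > 0);

  // $stable-style check: a pending request must hold until acked
  a_req_held: assert property (
    @(posedge clk) disable iff (!rst_n) (req && !ack) |=> $stable(req));
endmodule
```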
CVA-6 formal verification setup
- Set up the CVA-6 verification environment for the cv64a60ax configuration
- cv64 means RV64 (64-bit), a6 means 6 pipeline stages, 0ax is the product identifier
ChatGPT results
- 20260105T110941-chatgpt_cva6_verification_results
- These are just cover properties
- The results are not looking great
- I haven’t gotten coverage measurement working yet; there was an error when I naively tried it
Best coverage metric?
- liCoverageDrivenFormalMethodology2019 recommends 100% functional coverage (which requires writing extra assertions), 100% COI coverage, and all properties being proven
- This paper is from Intel and is aimed at actual designers, so the above constraints are strict enough that they don’t face a loss after fabrication
- To test out our generation methods, these may be overkill
- JasperGold recommends starting with COI coverage too
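For context on what the “extra assertions” for functional coverage look like: in a formal flow, functional coverage goals are typically written as cover properties. A minimal hypothetical sketch (signal names are made up):

```systemverilog
// Hypothetical sketch: a functional-coverage goal as a cover property
module cover_example (input logic clk, rst_n, valid, ready);
  // Reachability goal: a back-to-back handshake must be possible
  c_b2b_handshake: cover property (
    @(posedge clk) disable iff (!rst_n)
    (valid && ready) ##1 (valid && ready));
endmodule
```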
Some SOTA methods
- I took a look at 4 papers 20260102T143933-sota_assertion_gen
- baiAssertionForgeEnhancingFormal2025 seems to be the best
- They compare against fangAssertLLMGeneratingEvaluating2024, but AssertLLM is a pre-RTL methodology, so I wasn’t sure it was an apt comparison
- AssertionForge actually has good coverage
- maliChIRAAGChatGPTInformed2024 converts the specification and RTL into a JSON format (essentially structuring them somehow). They compare against OpenTitan’s own assertions, or lack thereof; OpenTitan plans to write assertions, but it is not their focus right now.
- guptaSANGAMSystemVerilogAssertion2025 is mostly prompt engineering.
Takeaways
- All of the above methods involve some kind of synthesis of specification documents and RTL
- But as of now we have only looked at generating assertions from pure RTL with no spec document
Benchmark used
- OpenTitan hw blocks: guptaSANGAMSystemVerilogAssertion2025, maliChIRAAGChatGPTInformed2024
- Opencores: baiAssertionForgeEnhancingFormal2025, fangAssertLLMGeneratingEvaluating2024
- pulavarthiAssertionBenchBenchmarkEvaluate2025 also uses Opencores
AssertionBench Recreation
- I wasn’t satisfied with the organization of this project. So I was working on cleaning it up. I decided to replicate some of the benchmarks.
- I noticed that the assertions verified by HARM and Texada have very poor coverage
- Example: arithmetic_core_2d_fht/fht_1d_x8 has 597 assertions
- Out of these, only 133 are proven and reachable
- 240 are vacuous proofs
- 224 are wrong
- I am currently working on Coverage for these. I am not really sure how to set things up properly
- Even some of the proven and reachable properties are useless, e.g. assert property(@(posedge sclk) ((x_data >= 0) && (x_data <= 63)) |-> (x_data >= 0) && (x_data <= 124)); — the antecedent already implies the consequent, so this is a tautology
- The supposedly golden assertions had 0% toggle coverage. I have to look more into what toggle coverage means and if I’m setting that up correctly
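For reference, a vacuous proof in the sense above is typically a property whose antecedent is unreachable. A minimal hypothetical sketch (not from the benchmark):

```systemverilog
// Hypothetical sketch: req is tied low, so the antecedent never fires;
// the tool reports the property as proven, but vacuously — the
// consequent is never actually checked.
module vacuous_example (input logic clk, ack);
  logic req = 1'b0;  // never asserted
  a_vacuous: assert property (@(posedge clk) req |-> ##1 ack);
endmodule
```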