Meeting Preparation
Summary
VERT
- Fine-tuned llama3.2-8B-bnb-4bit using unsloth
- Used a lower rank than the paper
- But there is a fundamental flaw in this paper
- If they have a deterministic way of extracting these properties, there is no need for an LLM
- Maybe the point of the project was to improve the conditional extraction capabilities of an LLM
Nuances to writing Assertions
- Writing assertions is very hard
- Seemingly simple natural-language sentences are very hard to express in LTL
- Usage of operators such as $past, $stable, etc.
- There are many properties that you simply cannot describe in pure LTL
- Need to generate auxiliary Verilog code
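To illustrate the last two points, here is a minimal hypothetical sketch (signal names req/ack/rst_n are made up, not from the project): counting outstanding transactions is not expressible in pure LTL, so auxiliary Verilog code maintains a counter that the assertion refers to, and $stable covers a check that is awkward to phrase otherwise.

```systemverilog
// Hypothetical sketch: auxiliary Verilog state feeding assertions
module aux_counter_example (input logic clk, rst_n, req, ack);
  int unsigned outstanding;

  // Auxiliary code: track requests that have not yet been acked
  always_ff @(posedge clk or negedge rst_n) begin
    if (!rst_n) outstanding <= 0;
    else        outstanding <= outstanding + req - ack;
  end

  // An ack must never arrive when nothing is outstanding
  a_no_spurious_ack: assert property (
    @(posedge clk) disable iff (!rst_n) ack |-> outstanding > 0);

  // $stable-style check: a pending request must hold until acked
  a_req_held: assert property (
    @(posedge clk) disable iff (!rst_n) (req && !ack) |=> $stable(req));
endmodule
```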
CVA-6 formal verification setup
- Set up the CVA-6 verification environment for the cv64a60ax configuration
- cv64 means RV64 (64-bit), a6 means 6 pipeline stages, 0ax is the product identifier
ChatGPT results
- 20260105T110941-chatgpt_cva6_verification_results
- These are just cover properties
- The results are not looking great
- I haven’t gotten coverage measurement working yet; there was an error when I naively tried it
Best coverage metric?
- liCoverageDrivenFormalMethodology2019 recommends 100% functional coverage (which requires writing extra assertions), 100% COI coverage, and all properties being proven
- This paper is from Intel and is aimed at actual designers, so the above constraints are strict enough that they don’t face a loss after fabrication
- To test out our generation methods, these may be overkill
- JasperGold recommends starting with COI coverage too
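For context on what the “extra assertions” for functional coverage look like: in a formal flow, functional coverage goals are typically written as cover properties. A minimal hypothetical sketch (signal names are made up):

```systemverilog
// Hypothetical sketch: a functional-coverage goal as a cover property
module cover_example (input logic clk, rst_n, valid, ready);
  // Reachability goal: a back-to-back handshake must be possible
  c_b2b_handshake: cover property (
    @(posedge clk) disable iff (!rst_n)
    (valid && ready) ##1 (valid && ready));
endmodule
```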
Some SOTA methods
- I took a look at 4 papers 20260102T143933-sota_assertion_gen
- baiAssertionForgeEnhancingFormal2025 seems to be the best
- They compare against fangAssertLLMGeneratingEvaluating2024, but AssertLLM is a pre-RTL methodology, so I wasn’t sure it was an apt comparison
- AssertionForge actually has good coverage
- maliChIRAAGChatGPTInformed2024 converts the specification and RTL into a JSON format (essentially structuring them somehow). They compare against OpenTitan’s own assertions, or lack thereof; OpenTitan plans to write assertions, but it is not their focus right now.
- guptaSANGAMSystemVerilogAssertion2025 is mostly prompt engineering.
Takeaways
- All of the above methods involve some kind of synthesis of specification documents and RTL
- But as of now we have only looked at generating assertions from pure RTL with no spec document
Benchmark used
- OpenTitan hw blocks: guptaSANGAMSystemVerilogAssertion2025, maliChIRAAGChatGPTInformed2024
- Opencores: baiAssertionForgeEnhancingFormal2025, fangAssertLLMGeneratingEvaluating2024
- pulavarthiAssertionBenchBenchmarkEvaluate2025 also uses Opencores
AssertionBench Recreation
- I wasn’t satisfied with the organization of this project. So I was working on cleaning it up. I decided to replicate some of the benchmarks.
- I noticed that the assertions verified by HARM and Texada have very poor coverage
- Example: arithmetic_core_2d_fht/fht_1d_x8 has 597 assertions
- Out of these, only 133 are proven and reachable
- 240 are vacuous proofs
- 224 are wrong
- I am currently working on Coverage for these. I am not really sure how to set things up properly
- Even some of the proven and reachable properties are useless, e.g. assert property(@(posedge sclk) ((x_data >= 0) && (x_data <= 63)) |-> (x_data >= 0) && (x_data <= 124)); — the antecedent already implies the consequent, so this is a tautology
- The supposedly golden assertions had 0% toggle coverage. I have to look more into what toggle coverage means and if I’m setting that up correctly
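For reference, a vacuous proof in the sense above is typically a property whose antecedent is unreachable. A minimal hypothetical sketch (not from the benchmark):

```systemverilog
// Hypothetical sketch: req is tied low, so the antecedent never fires;
// the tool reports the property as proven, but vacuously — the
// consequent is never actually checked.
module vacuous_example (input logic clk, ack);
  logic req = 1'b0;  // never asserted
  a_vacuous: assert property (@(posedge clk) req |-> ##1 ack);
endmodule
```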