Comparative Analysis Of The Commonly Used Code Generated By Large Language Models For MISRA C Compliance

2026-1
Öztop, Umut
This report investigates whether Large Language Models (LLMs) such as ChatGPT and Gemini can generate safety-critical C code compliant with MISRA C:2012. We tasked six models with implementing the CRC16 and COBS algorithms, verifying their output with PC-lint Plus and functional tests. Our results show a clear drop in compliance as algorithmic complexity increases. While the models excelled at the simple CRC16 task (Gemini averaging 0.3 violations), the memory-intensive COBS task caused widespread safety failures, especially violations of the single-point-of-exit rule. We also observed a distinct "compliance paradox": Claude produced the fewest violations but frequently generated non-functional code, whereas ChatGPT achieved 100% functional accuracy but weakened its compliance score by adding unrequested complexity. Ultimately, while LLMs show promise as prototyping assistants, they cannot yet autonomously generate certification-ready embedded software.
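To illustrate the coding style the study targets, the following is a minimal sketch of a CRC16 routine written with MISRA C:2012 conventions in mind, including the single-point-of-exit requirement (Rule 15.5) mentioned above. This is not code from the thesis; it assumes the common CRC-16/CCITT-FALSE variant (polynomial 0x1021, initial value 0xFFFF) purely as an example.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical example: bitwise CRC-16/CCITT-FALSE (poly 0x1021, init 0xFFFF).
 * Written with explicit casts, braced bodies, and a single return statement,
 * in the spirit of MISRA C:2012 Rule 15.5 (single point of exit). */
static uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFFU;
    size_t i;

    for (i = 0U; i < len; i++)
    {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);

        uint8_t bit;
        for (bit = 0U; bit < 8U; bit++)
        {
            if ((crc & 0x8000U) != 0U)
            {
                crc = (uint16_t)((uint16_t)(crc << 1) ^ 0x1021U);
            }
            else
            {
                crc = (uint16_t)(crc << 1);
            }
        }
    }

    return crc; /* single exit point */
}
```

Running this routine over the standard check string "123456789" should yield the well-known check value 0x29B1 for this CRC variant.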
Citation Formats
U. Öztop, “Comparative Analysis Of The Commonly Used Code Generated By Large Language Models For MISRA C Compliance,” M.S. - Master Of Science Without Thesis, Middle East Technical University, 2026.