CI/CD for Bare-Metal Embedded Development
Practical guide to automating build, flashing, and testing of microcontrollers
Why is this needed?
Many embedded developers are used to working without automated tests, relying on manual testing and debugging via a programmer. This seems like a simple and quick solution for small projects. However, as the codebase and team grow, this approach leads to critical problems: bugs return in new releases, system knowledge is stored only in developers’ heads, and every change requires lengthy manual testing on a bench.
CI/CD automation for embedded systems solves these problems, although it requires initial effort to set up the infrastructure.
The harsh truth about embedded development
Typical excuses without tests:
- “It’s a complex problem” = “I don’t know where the bug is”
- “Need to test on the bench” = “Hope it works”
- “It’s a hardware problem” = “Don’t want to dig into the code”
Tests provide objectivity:
- Either the test passes
- Or the test fails
- Or there are no tests
A situation that repeats in 90% of companies:
- Found a bug on the bench (at best, if there is a bench)
- Developer debugs for a week via J-Link
- Changes one line of code
- “Fixed it!”
- Commits only the fix without tests
- After 2 months the bug returns in a new release
- New developer spends another week searching
What really happens:
// BAD: typical "fix" through debugging
// Was:
if (adc_value > threshold) {
    set_alarm();
}
// Became after 5 days of debugging:
if (adc_value > threshold && !is_calibrating) {
    set_alarm();
}
Nobody will know:
- Why exactly this logic?
- What edge cases were considered?
- How to reproduce the problem?
- What was checked?
The right approach: Test-Driven Bug Fixing:
// 1. Write a test that fails
TEST(AlarmTest, ShouldNotTriggerDuringCalibration) {
    start_calibration();
    set_adc_value(threshold + 100); // Value above threshold
    process_alarm_logic();
    ASSERT_FALSE(alarm_triggered()); // Test fails!
}
// 2. Fix the code
if (adc_value > threshold && !is_calibrating) {
    set_alarm();
}
// 3. Test passes
// 4. Now we have a regression test FOREVER
Benefits for management
Developer report without tests:
- “Fixed a bug with false alarm triggers”
- Time: 5 days
- Changes: 1 line of code
- Guarantees: “seems to work”
System report with tests:
- Added regression tests (including one reproducing the bug): 150 lines
- Fixed bug: 3 lines
- Code coverage: +15%
- Guarantees: automatic check on every commit
Benefits for developers
Without tests:
- “Need to remember all my workarounds” — cognitive load
- “This bug returned again” — endless refactoring of the same places
- “It’s not my bug, the hardware is glitching” — constant disputes with the hardware team
- “Need to flash 10 boards manually” — boring routine instead of development
With tests:
- “My 100 tests confirm the fix works” — confidence in code
- “CI failed — my code broke something” — instant feedback
- “Here’s a test proving the hardware problem” — documented bug reports
- “Flashed 10 boards with one commit” — automated routine
Benefits for the team
Without tests:
- “Who broke this?” — looking for culprits instead of solving problems
- “It only works on Vasya’s machine — let him fix it” — knowledge bottleneck
- “We can’t hire a new developer — they won’t understand anything” — bus factor = 1
- “This is legacy code, better not touch it” — fear of changes
With tests:
- “CI showed that Peter broke GPIO” — objective diagnosis
- “Anyone can modify code — tests will catch issues” — collective ownership
- “Newcomer wrote working feature in a week” — fast onboarding
- “Refactor confidently — 200 tests confirm functionality” — architecture evolution
Benefits for the product
Without tests:
- “Release and pray” — Russian roulette with releases
- “Worked on the bench…” — gap between dev and prod
- “Client found a bug we didn’t see for 2 years” — market embarrassment
- “Can’t add features — everything will fall apart” — technical debt
With tests:
- “Release every Tuesday” — predictable process
- “CI tests on real hardware” — identical bench and production
- “Client bugs reproduced in 5 minutes” — fast response
- “Adding features without fear” — development speed
Why might this not be needed?
If your goal is to become an “irreplaceable” developer, the only person who understands the code and can fix it, then automated testing really isn’t for you. Tests make code transparent, understandable, and accessible to the entire team.
Your career strategy without tests:
- “This is a very complex system” = “Only I know how it works”
- “Better not touch it” = “My job security”
- “Need deep context” = “I’m irreplaceable”
- “This is legacy code” = “My personal family jewel”
General pipeline
What’s required for minimal CI?
The bare minimum is to at least build the project and flash the microcontroller. Sounds simple, but in practice situations often arise where a project builds for one developer but not for another. This can be due to different compiler versions, missing dependencies, or code changes that broke the build for certain environments.
Finding the cause of such problems can take hours. CI solves this problem: each commit is automatically built in a clean environment, guaranteeing that the build isn’t broken and the project can be reproduced on any machine.
Let’s examine the necessary tools with a concrete example.
Required components
- GitHub or GitLab — version control system with CI/CD support. All examples in this article will be for GitHub. Simply create a new repository; we’ll return to it later.
- Build and test server — a regular computer, Raspberry Pi, or even a virtual machine will do. GitHub provides free runners, but with execution-time limits. For embedded development, where access to real hardware is needed, a self-hosted runner is typically used.
Alternative: VCON — a third-party service for remote device access, used, for example, by the Mongoose project. It works like this: an ESP32 running VCON firmware connects to Wi-Fi and registers on their server, acting as an over-the-air programmer. The target device connects to it, and through CI you can upload firmware, read logs, and so on.
VCON Pros:
- Everything ready to use out of the box
- No need to configure own server
- Access to device from anywhere in the world
VCON Cons:
- Limited set of supported devices
- Dependency on external service
- Non-standard programmer (not suitable for field use)
- Programmer — I’ll use J-Link, as it provides convenient tools for working with RTT (Real-Time Transfer). Technically any programmer will work (ST-Link, CMSIS-DAP, etc.), but J-Link offers more automation capabilities.
- Target device or development bench — the microcontroller or board on which tests will run.
CI Architecture for embedded systems
┌─────────────┐     ┌──────────────────┐     ┌────────────────┐
│   GitHub    │────>│   Self-hosted    │────>│     J-Link     │
│ Repository  │     │      Runner      │     │   Programmer   │
└─────────────┘     │ (Linux/Mac/Win)  │     └────────┬───────┘
                    └──────────────────┘              │ SWD/JTAG
                                                      │
                                                      v
                                              ┌──────────────┐
                                              │  Target MCU  │
                                              │ (STM32F103)  │
                                              └──────────────┘
For regular software, cloud CI servers from GitHub/GitLab are sufficient — you can test on different operating systems without additional equipment. But in embedded development, access to real hardware is needed, so a self-hosted server with connected programmer and target device is required.
If you only need to verify that firmware builds, standard GitHub Actions without hardware are sufficient. But for full functional testing, a self-hosted runner is needed.
Software on the server
- GitHub Actions Runner — the agent that executes CI jobs. Download and register it in your repository through GitHub settings (Settings → Actions → Runners → New self-hosted runner). After registration it runs as a background service and waits for jobs from GitHub. You can run multiple runners for different devices, distinguishing them with labels.
- J-Link Software — utilities for working with the J-Link programmer, including command-line tools for flashing and reading RTT. I recommend J-Link specifically for SEGGER RTT — a fast, low-latency debug-output technology.
- CMake (or another build system) — the example uses CMake as a cross-platform meta-build system. You can use Make, Meson, or other tools of your choice.
- Python — for automating flashing and analyzing test results. The pylink library allows programmatic J-Link control.
- ARM GCC Toolchain — compiler for ARM microcontrollers (arm-none-eabi-gcc).
Installing necessary packages on Ubuntu/Debian:
# ARM toolchain
sudo apt-get install gcc-arm-none-eabi
# CMake
sudo apt-get install cmake
# Python and dependencies
sudo apt-get install python3 python3-pip
pip3 install pylink-square
Project example with CI
runit project structure
Let’s examine a real example from the runit library — a framework for unit testing on bare-metal systems:
runit/
├── .github/
│ ├── workflows/
│ │ └── build.yml # CI configuration
│ └── scripts/
│ ├── flashing.py # Flashing script
│ └── units.py # Test running script
├── src/
│ ├── runit.h # Library header
│ └── runit.c # Implementation
├── examples/
│ └── f103re-cmake-baremetal-builtin/
│ ├── CMakeLists.txt # Build configuration
│ ├── main.c # MCU tests
│ ├── startup_stm32f103xe.s
│ └── STM32F103RETX_FLASH.ld
└── tst/
└── selftest.c # Linux tests
GitHub Actions Workflow
Create CI with five stages:
- Clone repository to server
- Configure CMake
- Build project
- Flash microcontroller
- Run tests on device
File .github/workflows/build.yml:
name: Build Runit Selftest

on: [pull_request]

jobs:
  # Job 1: Linux testing (without hardware)
  linux_build:
    runs-on: self-hosted
    steps:
      - uses: actions/checkout@v4
      - name: Configure and Build project
        run: |
          cmake -S . -B build
          cmake --build build
      - name: Run selftest
        run: ./build/runit-selftest

  # Job 2: STM32F103 testing (with real hardware)
  stm32f103re_build:
    runs-on: self-hosted
    steps:
      - uses: actions/checkout@v4
        with:
          submodules: recursive
          fetch-depth: 1
      - name: Configure and Build project
        run: |
          cmake -S examples/f103re-cmake-baremetal-builtin -B examples/f103re-cmake-baremetal-builtin/build
          cmake --build examples/f103re-cmake-baremetal-builtin/build
      - name: Flash firmware
        run: |
          python3 .github/scripts/flashing.py ${{ secrets.JLINK_SERIAL_CI_STM32F103RE }} STM32F103RE examples/f103re-cmake-baremetal-builtin/build/example_f103re.bin
      - name: Unit tests
        run: |
          python3 .github/scripts/units.py ${{ secrets.JLINK_SERIAL_CI_STM32F103RE }} STM32F103RE
Important points:
- runs-on: self-hosted — specifies using your own runner, not GitHub’s cloud servers
- secrets.JLINK_SERIAL_CI_STM32F103RE — a secret variable holding the J-Link programmer serial number. Configured in Settings → Secrets → Actions of your repository. This protects your device from unauthorized access.
- Two independent jobs run in parallel: one for the Linux version of the library, another for the microcontroller.
Detailed stage breakdown
Stage 1: Linux build (cross-platform verification)
The runit library is cross-platform — works both on microcontrollers and regular OSes. So the first job simply builds and runs tests on Linux:
cmake -S . -B build
cmake --build build
./build/runit-selftest
If the executable returns a non-zero exit code, the CI step is considered failed. This is the standard convention for test runners on Unix systems.
Stages 2-5: Build, flash, and test on STM32
Now let’s move to the most interesting part — automated flashing and testing on real microcontroller:
1. Project build
cmake -S examples/f103re-cmake-baremetal-builtin -B examples/f103re-cmake-baremetal-builtin/build
cmake --build examples/f103re-cmake-baremetal-builtin/build
At this stage we guarantee the project builds without errors. If the build fails, the problem is localized: we know the changes broke compilation.
Bonus: the binary can be saved as a GitHub Actions artifact and used for flashing a batch of devices, or handed to the team for testing without a local build.
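As a sketch, the binary could be published with the standard upload-artifact action (the step name is illustrative; the path matches the example build above):

```yaml
- name: Upload firmware artifact
  uses: actions/upload-artifact@v4
  with:
    name: example_f103re-firmware
    path: examples/f103re-cmake-baremetal-builtin/build/example_f103re.bin
```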
2. Microcontroller flashing
Python script .github/scripts/flashing.py is used:
import os
import sys

import pylink


def flash_device_by_usb(jlink_serial: int, fw_file: str, mcu: str) -> None:
    jlink = pylink.JLink()
    jlink.open(serial_no=jlink_serial)
    if jlink.opened():
        jlink.set_tif(pylink.enums.JLinkInterfaces.SWD)
        jlink.connect(mcu)
        print(jlink.flash_file(fw_file, 0x08000000))
        jlink.reset(halt=False)
    jlink.close()


def main():
    try:
        jlink_serial = int(sys.argv[1].strip())
        mcu = sys.argv[2].strip()
        fw_file = os.path.abspath(sys.argv[3].strip())
        flash_device_by_usb(jlink_serial, fw_file, mcu)
    except Exception as e:
        print(f"Error: {e}")
        sys.exit(1)


if __name__ == "__main__":
    main()
Script accepts three parameters:
- J-Link programmer serial number
- MCU name (e.g., STM32F103RE)
- Path to the firmware binary file
If flashing fails, script returns error code 1, and CI stops.
3. Run tests and read results via RTT
The most interesting part — how to get test results from microcontroller?
SEGGER RTT — fast data transfer technology
SEGGER RTT (Real-Time Transfer) — bidirectional data transfer technology between target device and host via debug interface (SWD/JTAG). Developed by SEGGER.
RTT advantages:
- High speed — up to 2 MB/sec
- No delays — doesn’t block program execution
- No additional pins required (unlike UART or SWO) — it uses the existing debug interface, so it works even on chips without SWO
- Bidirectional communication — you can not only read data but also send commands
How it works:
- A small buffer is allocated in MCU RAM (usually 1-16 KB)
- MCU code writes data to this buffer (SEGGER_RTT_printf())
- The programmer reads data from the buffer via SWD/JTAG
- A Python script on the host receives and analyzes this data
RTT disadvantage: Limited buffer size. If there are too many logs and they don’t get read in time, overwriting occurs and some data is lost. Solution — increase buffer size or optimize log output.
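If the buffer needs to be larger, the up-buffer size is normally set in SEGGER_RTT_Conf.h. A sketch (macro name per SEGGER's RTT sources; check your version's defaults):

```c
/* SEGGER_RTT_Conf.h: size of the target-to-host (up) buffer in bytes.
   Increase it if log lines are dropped under heavy output. */
#define BUFFER_SIZE_UP    4096   /* SEGGER's default is 1024 */
```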
Python script for running tests
File .github/scripts/units.py:
import re
import sys
import time

import pylink
from pylink import JLink


def remove_ansi_colors(text: str) -> str:
    """Remove ANSI color codes from text."""
    return re.sub(r"\x1b\[[0-9;]*m", "", text)


def run_tests_by_rtt(jlink: JLink, duration: float = 10.0) -> bool:
    has_error = False
    try:
        jlink.rtt_start()
        start_time = time.time()
        while time.time() - start_time < duration:
            response = jlink.rtt_read(0, 1024)
            if response:
                text = remove_ansi_colors(bytes(response).decode("utf-8", errors="ignore"))
                # Parse test results
                for line in text.splitlines():
                    # Look for report lines: "REPORT | File: ... | Passes: X | Failures: Y"
                    match = re.search(
                        r'REPORT\s*\|\s*File:\s*(.*?)\s*\|\s*Test case:\s*(.*?)\s*\|\s*Passes:\s*(\d+)\s*\|\s*Failures:\s*(\d+)',
                        line
                    )
                    if match:
                        passed = match.group(3)
                        failed = match.group(4)
                        print(f"Test result: {passed} passed, {failed} failed")
                        if failed != '0':
                            has_error = True
                    elif "All tests passed successfully!" in line:
                        # Note: don't reset has_error here — a failure seen earlier must stick
                        print("All tests passed successfully!")
                    elif line.strip():
                        print(line)
    finally:
        jlink.rtt_stop()
    return has_error


def main():
    jlink_serial = int(sys.argv[1].strip())
    mcu = sys.argv[2].strip()
    jlink = pylink.JLink()
    jlink.open(serial_no=jlink_serial)
    jlink.set_tif(pylink.enums.JLinkInterfaces.SWD)
    jlink.connect(mcu)
    has_error = run_tests_by_rtt(jlink, 10.0)
    jlink.close()
    if has_error:
        sys.exit(1)


if __name__ == "__main__":
    main()
How it works:
- The script connects to the J-Link programmer
- Starts an RTT connection
- After reset, the microcontroller executes the tests and outputs results via SEGGER_RTT_printf()
- The script reads the output in real time (for 10 seconds)
- Parses the results by pattern and determines whether the tests passed
- Returns an error code if there are failed tests
Example code with tests on MCU
File examples/f103re-cmake-baremetal-builtin/main.c contains runit library self-testing:
#include <stm32f103xe.h>
#include <stdio.h>
#include "runit.h"

static size_t expected_failures_counter = 0;

/* Wrapped in do/while(0) so the macro behaves as a single statement */
#define SHOULD_FAIL(failing) \
    do { \
        printf("Expected failure: "); \
        expected_failures_counter++; \
        failing; \
    } while (0)

static void test_eq(void)
{
    runit_eq(12, 12);
    runit_eq(12.0f, 12U);
    SHOULD_FAIL(runit_eq(100, 1)); // This check should fail
}

static void test_gt(void)
{
    runit_gt(100, 1);
    SHOULD_FAIL(runit_gt(1, 100)); // This check should fail
}

static void test_fapprox(void)
{
    runit_fapprox(1.0f, 1.0f);
    runit_fapprox(1.0f, 1.000001f);
    SHOULD_FAIL(runit_fapprox(1.0f, 1.1f)); // This check should fail
}

int main(void)
{
    test_eq();
    test_gt();
    test_fapprox();

    runit_report(); // Outputs the final report

    if (expected_failures_counter != runit_counter_assert_failures)
        printf("Expected %u failures, but got %u\n",
               (unsigned)expected_failures_counter,
               (unsigned)runit_counter_assert_failures);
    else
        printf("All tests passed successfully!\n");

    for (;;) {} // Infinite loop

    return 0;
}
Important: For RTT output, the _write function is redefined to use SEGGER_RTT_PutChar(). This allows using standard printf() in test code, and all output automatically goes to RTT buffer.
Example of _write redefinition in syscalls.c file:
#include "SEGGER_RTT.h"
__attribute__((weak)) int _write(int file, char* ptr, int len)
{
for (int i = 0; i < len; i++)
{
SEGGER_RTT_PutChar(0, ptr[i]);
}
return len;
}
The weak attribute allows redefining this function elsewhere in the project if needed.
The runit_report() function outputs one line with cumulative statistics of executed tests. You can call runit_report() multiple times in different places of the program — each call will output a separate report with accumulated statistics. To reset counters between test groups, you need to zero internal library variables.
RTT output looks like this:
REPORT | File: main.c:42 | Test case: main | Passes: 5 | Failures: 3
All tests passed successfully!
Python script parses this output and determines the result.
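To see the parsing in isolation, the regex from units.py can be exercised on the sample line above:

```python
import re

# Same pattern as in units.py, applied to the sample RTT report line
REPORT_RE = re.compile(
    r'REPORT\s*\|\s*File:\s*(.*?)\s*\|\s*Test case:\s*(.*?)'
    r'\s*\|\s*Passes:\s*(\d+)\s*\|\s*Failures:\s*(\d+)'
)

line = "REPORT | File: main.c:42 | Test case: main | Passes: 5 | Failures: 3"
match = REPORT_RE.search(line)
passes, failures = int(match.group(3)), int(match.group(4))
print(passes, failures)  # -> 5 3
```

A non-zero failure count is what flips has_error in the CI script.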
Extended testing capabilities
Test organization strategies
Your project may have multiple build targets:
- Production build — final production firmware without debug code
- Test build — special version with unit tests for libraries and modules
- Debug build — working firmware with a DEBUG flag, where the self-test module is enabled by conditional compilation
Choice of approach depends on your needs and capabilities:
Option 1: Separate test project
# CMakeLists.txt for tests
add_executable(firmware_tests
    tests/test_main.c
    tests/test_uart.c
    tests/test_modbus.c
    src/uart.c
    src/modbus.c
)
Option 2: Conditional test compilation
#ifdef DEBUG_TESTS
static void run_all_tests(void) {
    test_uart();
    test_modbus();
    test_eeprom();
    runit_report();
}
#endif

int main(void) {
    system_init();

#ifdef DEBUG_TESTS
    // Tests run on command via RTT
    if (check_rtt_command("run_tests")) {
        run_all_tests();
    }
#endif

    // Main firmware code
    while (1) {
        main_loop();
    }
}
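One hedged way to wire up that flag from the build system (option and target names are illustrative, not from the runit example):

```cmake
# Hypothetical CMake switch for the conditional-compilation approach
option(DEBUG_TESTS "Build firmware with the on-target self-test module" OFF)
if(DEBUG_TESTS)
    target_compile_definitions(firmware PRIVATE DEBUG_TESTS)
endif()
```

Configuring with -DDEBUG_TESTS=ON then produces the test build from the same sources.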
Personally, I use the build flag approach and added the ability to invoke tests via RTT commands. This allows:
- Not rebuilding firmware to run tests
- Running tests at any time on running device
- Testing specific modules on demand
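The host side of such a command channel might look like this sketch (frame_command is a hypothetical helper; the pylink calls require connected hardware and are shown commented out):

```python
def frame_command(cmd: str) -> bytes:
    """Append a newline terminator so the firmware can split incoming commands."""
    return (cmd.strip() + "\n").encode("ascii")


# With hardware attached (sketch, not run here):
# import pylink
# jlink = pylink.JLink()
# jlink.open(serial_no=123456789)   # illustrative serial number
# jlink.connect("STM32F103RE")
# jlink.rtt_start()
# jlink.rtt_write(0, list(frame_command("run_tests")))  # down-buffer 0

print(frame_command("run_tests"))  # b'run_tests\n'
```

On the device side, the firmware would poll the RTT down-buffer and compare the received line against known commands.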
Protocol and interface testing
A Python script can interact with the microcontroller not only via RTT, but also test production firmware through its real interfaces:
Example: Modbus RTU testing
Suppose the device communicates via Modbus RTU. Connect it to the CI server through the corresponding interface and run Python tests:
import time

import serial
from pymodbus.client import ModbusSerialClient


def test_modbus_valid_requests():
    """Check valid requests"""
    client = ModbusSerialClient(port='/dev/ttyUSB0', baudrate=9600)
    client.connect()

    # Read registers
    result = client.read_holding_registers(address=0, count=10, slave=1)
    assert not result.isError(), "Registers should read correctly"
    assert len(result.registers) == 10

    # Write register
    result = client.write_register(address=0, value=100, slave=1)
    assert not result.isError(), "Should be able to write"

    # Check the write
    result = client.read_holding_registers(address=0, count=1, slave=1)
    assert result.registers[0] == 100, "Value should persist"


def test_modbus_invalid_requests():
    """Check invalid request handling"""
    client = ModbusSerialClient(port='/dev/ttyUSB0', baudrate=9600)
    client.connect()

    # Non-existent address
    result = client.read_holding_registers(address=9999, count=1, slave=1)
    assert result.isError(), "Should return error for non-existent address"

    # Corrupted data (wrong CRC)
    # Device should ignore such packets
    with serial.Serial('/dev/ttyUSB0', 9600, timeout=0.5) as ser:
        ser.write(b'\x01\x03\x00\x00\x00\x0A\xFF\xFF')  # Wrong CRC
        time.sleep(0.5)
        response = ser.read_all()
        assert len(response) == 0, "Corrupted packets should be ignored"


if __name__ == "__main__":
    test_modbus_valid_requests()
    test_modbus_invalid_requests()
    print("All Modbus tests passed!")
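The wrong-CRC frame above is hardcoded; to build valid frames for more elaborate tests, the textbook Modbus CRC-16 algorithm can be used (this is the standard algorithm, not any particular library's API):

```python
def modbus_crc16(frame: bytes) -> bytes:
    """CRC-16/MODBUS: init 0xFFFF, reflected polynomial 0xA001, little-endian result."""
    crc = 0xFFFF
    for byte in frame:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc.to_bytes(2, "little")


# Self-check: recomputing over a frame plus its own CRC must yield zero —
# the same property a Modbus receiver uses to validate incoming frames.
request = b"\x01\x03\x00\x00\x00\x0a"
crc = modbus_crc16(request)
print(modbus_crc16(request + crc))  # b'\x00\x00'
```

Appending modbus_crc16(payload) to a payload produces a frame the device should accept, which complements the corrupted-frame test above.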
Such tests verify:
- Correct handling of valid data
- Proper input data validation
- Predictable behavior with incorrect requests
- Protocol specification compliance
Similarly you can test:
- CAN interface — sending/receiving messages, bus error handling
- Ethernet/TCP — connection establishment, link break handling
- I2C/SPI — peripheral interaction
- GPIO — signal level checking, timing
- Performance measurement — response time, throughput
Main argument for implementing CI
For those who care but are lazy:
No more need to:
- Convince yourself to retest everything after each change
- Worry something broke if you didn’t retest
- Remember which modules depend on changed code
- Spend time manually testing same scenarios
CI does this for you:
- Retests all scenarios automatically
- Points exactly where the problem is
- Runs on every Pull Request
- Added a new test? It runs forever
Real time savings: Made changes to UART library? CI automatically runs:
- Unit tests of library itself
- Integration tests with Modbus (which uses UART)
- Communication protocol tests
- Memory leak checks
- Timing validation
All this — without your involvement, in minutes, with exact indication of problem location.
Example from a real project: in the BMPLC device, the EEPROM library is tested automatically (during local development, the tests can also be run manually via RTT commands). The test suite checks critical memory operation scenarios:
void run_eeprom_tests(void) {
    eeprom_partial_page_write_test(); // Partial page write correctness
    eeprom_size_limit_test();         // Protection from memory bounds overflow
    eeprom_multi_page_write_test();   // Multi-page write (several pages at once)
    eeprom_random_access_test();      // Random access to different addresses
    runit_report();
}
These tests reveal typical problems:
- Writes past the end of the address space (32 KB for the AT24C256)
- Errors when writing data larger than the page size
- Basic write and read correctness
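As an illustration of the kind of check eeprom_size_limit_test performs, here is a minimal sketch of a bounds guard, assuming the 32 KB AT24C256 mentioned above (the function and macro names are hypothetical, not the BMPLC API):

```c
#include <stdbool.h>
#include <stddef.h>

#define EEPROM_SIZE_BYTES 32768u  /* AT24C256: 256 Kbit = 32 KB */

/* A write is accepted only if it fits entirely within the address space */
static bool eeprom_write_in_bounds(size_t addr, size_t len)
{
    return addr < EEPROM_SIZE_BYTES && len <= EEPROM_SIZE_BYTES - addr;
}
```

Such a test would assert that a write exactly filling the memory is accepted, while any write crossing the boundary is rejected.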
Important point: From constant CI runs, test device can exhaust EEPROM resource (usually 100,000 - 1,000,000 write cycles). Similarly, MCU Flash memory degrades from frequent flashing. But this is a small price for code quality confidence — replacing one test device costs incomparably less than an error in production.
If tests suddenly start failing:
- Run old, verified firmware version → tests pass → memory works, problem in new code
- Run old version → tests fail → test device exhausted resource, replace it
Without automatic tests, such library error could get into production and lead to data corruption on all devices requiring mass reflashing.
Step-by-step implementation guide
Step 1: Repository preparation
- Create repository on GitHub
- Add .github/workflows/build.yml with the CI configuration
- Create a .github/scripts/ folder for Python scripts
Step 2: Server setup
- Install necessary dependencies:
  sudo apt-get install gcc-arm-none-eabi cmake python3 python3-pip
  pip3 install pylink-square
- Download J-Link Software from the SEGGER website
- Register GitHub Actions Runner:
- Open Settings → Actions → Runners → New self-hosted runner
- Follow instructions for your OS
- Run runner as service
Step 3: Equipment connection
- Connect J-Link to server via USB
- Connect target device to J-Link via SWD/JTAG
- Check connection: JLinkExe → connect → specify the MCU
- Find the serial number: lsusb -v, or via the JLinkExe utility
Step 4: Secrets configuration
- In repository: Settings → Secrets and variables → Actions → New repository secret
- Add JLINK_SERIAL_CI with the programmer serial number
Step 5: Adding tests
- Integrate runit (or another test framework) into your project:
  git submodule add https://github.com/RoboticsHardwareSolutions/runit.git libs/runit
- Add SEGGER RTT to the project (via CMake FetchContent or manually)
- Write tests in runit style:
  void test_my_function(void)
  {
      runit_eq(my_function(5), 25);
      runit_gt(my_function(10), 90);
  }
- In main(), call the tests and runit_report()
Step 6: First run
- Create Pull Request
- GitHub Actions automatically starts CI
- Check execution logs
- If needed, debug by running tests locally with same scripts
Conclusion
CI/CD automation for embedded systems requires initial effort but pays off many times over:
- Accelerated development through fast feedback
- Protection from regressions and repeat bugs
- Objective code quality metrics
- Simplified teamwork and onboarding
- Confidence in every release
The runit library and described approach are just a starting point. You can extend the testing system for your needs: add coverage analysis, integration with test benches, automatic release creation, and much more.
Start small — automate build and basic tests. Gradually add new checks. And remember: every automatic test is an investment in stability and speed of your development.
Useful links
- runit on GitHub — unit testing framework
- SEGGER RTT — RTT documentation
- pylink-square — Python library for J-Link
- GitHub Actions — CI/CD documentation