2025.2 Series Release Notes

v5.0.0-beta.1-392

New Features

  • The Valkey service is now available in Atmosphere. It is required to support Octavia Amphora V2.

  • Added a specific helm-toolkit patch on top of 0.2.78 that makes the DB drop and DB init jobs compatible with SQLAlchemy 2.0.

  • Octavia Amphora V2 is now supported and enabled by default in Atmosphere. The Amphora V2 provider driver improves control plane resiliency: should a control plane host go down during a load balancer provisioning operation, an alternate controller can resume the in-process provisioning and complete the request. This resolves the issue of resources stuck in PENDING_* states by persisting task state information in a backend and monitoring job claims through the jobboard.

  • Add confluent-kafka Python package to OpenStack images to enable the use of Kafka for notifications.

  • The Keystone role now supports additional parameters when creating the Keycloak realm to allow for the configuration of options such as password policy, brute force protection, and more.

  • Added support for deploying the frr-k8s chart for BGP routing with OVN. Introduced the ovn_bgp_agent_enabled flag. When set to true, the frr-k8s chart will be automatically installed before OVN deployment.

  • Added the glance_image_tempfile_path variable to allow users to change the temporary path where images are downloaded before being uploaded to Glance.

  • Keycloak is now configured to have the token-exchange and the admin-fine-grained-authz features enabled to allow for use of the OAuth Token Exchange protocol.

  • The Keystone role now supports configuring multi-factor authentication for the users within the Atmosphere realm.

  • Add Neutron plugins for neutron-dynamic-routing and networking-generic-switch. These modules enable support for Neutron BGP agents and Ironic networking.

  • Added support for a Neutron policy check when performing a port update that adds allowed address pairs. This adds a POST method, /address-pair, which checks whether both ports to be paired were created within the same project. With this check, non-admin users can manage address pair bindings without the risk of exposing resources to other projects.
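
    A minimal Python sketch of the same-project rule described above, assuming a hypothetical Port record that carries only an id and a project_id. Neutron's actual policy engine and the /address-pair handler are not shown and differ in detail; this only illustrates the comparison being made.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Port:
    """Hypothetical minimal port model; real Neutron ports have many more fields."""

    id: str
    project_id: str


def can_pair(port: Port, peer: Port) -> bool:
    """Allow an address pair binding only when both ports belong to the same project."""
    return port.project_id == peer.project_id


a = Port(id="port-a", project_id="proj-1")
b = Port(id="port-b", project_id="proj-1")
c = Port(id="port-c", project_id="proj-2")
print(can_pair(a, b), can_pair(a, c))  # True False
```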

  • The ovn-bgp-agent has been added to the OVN Helm chart, where it is deployed as a DaemonSet.

  • Add OVN BGP Agent image build.

  • Introduced a new Rust-based binary ovsinit which focuses on handling the migration of IP addresses from a physical interface to an OVS bridge during the Neutron or OVN initialization process.

  • Added udev rules for Pure Storage devices to optimize iSCSI LUN performance. The rules:

    • Set the I/O scheduler to none for improved throughput.

    • Reduce CPU usage by disabling entropy collection.

    • Balance CPU load by directing I/O completions to the originating CPU.

    • Increase the HBA timeout to 60 seconds for reliable I/O operations.

  • Added a basic Atmosphere upgrade process.

  • DPDK interfaces can now be configured using interface names in addition to the pci_id, which eases deployment in heterogeneous environments.
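
    For context on how an interface name relates to a pci_id, here is a small Python sketch that resolves one to the other through the standard Linux sysfs layout, where /sys/class/net/<name>/device is a symlink to the PCI device directory. This is not how Atmosphere performs the lookup; the function names are hypothetical, and virtual interfaces without a device link will raise OSError.

```python
import os


def pci_address_from_link(target: str) -> str:
    """Extract the PCI address from a sysfs 'device' symlink target.

    The target typically looks like '../../../0000:03:00.0'; the last path
    component is the PCI address.
    """
    return os.path.basename(os.path.normpath(target))


def pci_address_for_interface(name: str, sysfs_root: str = "/sys") -> str:
    """Resolve a network interface name (e.g. 'enp3s0') to its PCI address."""
    link = os.path.join(sysfs_root, "class", "net", name, "device")
    return pci_address_from_link(os.readlink(link))
```

    Separating the path parsing from the filesystem access keeps the parsing testable on machines that lack the interface.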

  • All roles that deploy Ingress resources as part of the deployment process now support the ability to specify the class name to use for the Ingress resource. This is done by setting the <role>_ingress_class_name variable to the desired class name.

  • Introduced the ability to specify a prefix for image names. This allows for easier integration with image proxies and caching mechanisms, eliminating the need to maintain separate inventory overrides for each image.
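
    A minimal Python sketch of what an image prefix does to an image reference. The notes name the variable atmosphere_image_prefix but do not spell out how it is joined to each image, so the joining rule below (a single "/" separator, empty prefix is a no-op) and the apply_image_prefix() helper are assumptions for illustration only.

```python
def apply_image_prefix(image: str, prefix: str = "") -> str:
    """Prepend a registry or proxy prefix to a full image reference.

    Assumed semantics for illustration only: the prefix is joined to the
    original reference with a single '/', and an empty prefix is a no-op.
    """
    if not prefix:
        return image
    return prefix.rstrip("/") + "/" + image


print(apply_image_prefix("ghcr.io/vexxhost/docker-openvswitch:v3.3.6-2", "mirror.example.com/cache/"))
# mirror.example.com/cache/ghcr.io/vexxhost/docker-openvswitch:v3.3.6-2
```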

  • It’s now possible to use the default TLS certificates configured within the ingress by using the ingress_use_default_tls_certificate variable which will omit the tls section from any Ingress resources managed by Atmosphere.
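
    The two Ingress options above (the <role>_ingress_class_name variable and ingress_use_default_tls_certificate) can be pictured with this Python sketch that builds a Kubernetes networking.k8s.io/v1 Ingress manifest as a dict. The build_ingress() helper and the "<name>-tls" secret naming are hypothetical and are not Atmosphere's templates; only the field names follow the Kubernetes Ingress schema.

```python
def build_ingress(name, host, service, port, class_name=None,
                  use_default_tls_certificate=False, tls_secret=None):
    """Return a networking.k8s.io/v1 Ingress manifest as a dict."""
    spec = {
        "rules": [{
            "host": host,
            "http": {"paths": [{
                "path": "/",
                "pathType": "Prefix",
                "backend": {"service": {"name": service, "port": {"number": port}}},
            }]},
        }],
    }
    if class_name:
        # Corresponds to setting <role>_ingress_class_name.
        spec["ingressClassName"] = class_name
    if not use_default_tls_certificate:
        # Explicit TLS section pointing at a per-Ingress secret (hypothetical name).
        spec["tls"] = [{"hosts": [host], "secretName": tls_secret or f"{name}-tls"}]
    # Otherwise the tls section is omitted, leaving the ingress controller's
    # default certificate in use.
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": name},
        "spec": spec,
    }
```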

  • Barbican now supports multiple KEKs in configuration. The config value .conf.simple_crypto_plugin_rewrap.old_kek now accepts comma-separated strings for KEK lists, and multiple .conf.barbican.simple_crypto_plugin.kek values can now be specified. The first key in the comma-separated .conf.simple_crypto_plugin_rewrap.old_kek string is used for encrypting new data, while additional keys are used for decrypting existing data. This behavior is consistent with .conf.barbican.simple_crypto_plugin.kek.
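
    The key-selection rule described above can be sketched in Python as a split of the comma-separated value into one encryption key plus the additional keys. The split_kek_list() name and the whitespace stripping are assumptions, and the placeholder strings stand in for real key material; this is not Barbican's parsing code.

```python
def split_kek_list(value: str) -> tuple[str, list[str]]:
    """Split a comma-separated KEK string into (encryption key, additional keys).

    The first key encrypts new data; the remaining keys are used to decrypt
    existing data. Surrounding whitespace is stripped (an assumption).
    """
    keys = [k.strip() for k in value.split(",") if k.strip()]
    if not keys:
        raise ValueError("at least one KEK is required")
    return keys[0], keys[1:]


primary, others = split_kek_list("new-kek, old-kek-1,old-kek-2")
print(primary, others)  # new-kek ['old-kek-1', 'old-kek-2']
```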

  • The Barbican role now allows users to configure the priorityClassName and the runtimeClassName for all of the different components of the service.

  • The Storpool driver has been updated from the Bobcat release to the Caracal release.

  • Upgraded OpenStack service containers from Ubuntu 22.04 (Jammy) to Ubuntu 24.04 (Noble). All images now run on the latest Ubuntu LTS release with improved security and enhanced system libraries.

  • Upgraded OpenStack service containers from Python 3.10 to 3.12, delivering significant performance improvements and better memory management while maintaining backward compatibility.

  • The Cinder role now allows users to configure the priorityClassName and the runtimeClassName for all of the different components of the service.

  • The Designate role now allows users to configure the priorityClassName and the runtimeClassName for all of the different components of the service.

  • Atmosphere previously deactivated the Keystone auth token cache due to bug https://tracker.ceph.com/issues/64094. This issue is now resolved upstream, making it safe to reactivate the cache in the new version of Ceph which includes the fix (18.2.7).

  • The Atmosphere project now includes the Tap-as-a-Service (TaaS) extension for the OpenStack Neutron networking service. This feature introduces local and remote port mirroring capabilities, enabling tenants and cloud administrators to monitor and debug complex virtual networks by capturing and analyzing network traffic associated with virtual machines.

  • Applied the same pod affinity rules used for the OVN NB/SB StatefulSets to the ovn-northd Deployment, and changed the default pod affinity rules from preferred during scheduling to required during scheduling.

  • The ovn-northd service did not have a liveness probe enabled, which could result in the pod failing readiness checks without being automatically restarted. The liveness probe is now enabled by default and will restart any stuck ovn-northd processes.

  • The Glance role now allows users to configure the priorityClassName and the runtimeClassName for all of the different components of the service.

  • The Heat role now allows users to configure the priorityClassName and the runtimeClassName for all of the different components of the service.

  • The Horizon role now allows users to configure the priorityClassName and the runtimeClassName for all of the different components of the service.

  • The Ironic role now allows users to configure the priorityClassName and the runtimeClassName for all of the different components of the service.

  • The Keystone role now allows users to configure the priorityClassName and the runtimeClassName for all of the different components of the service.

  • The OpenStack database exporter has been updated, and Octavia metrics are now collected exclusively through it.

  • Added alerting for Amphorae to cover cases where an Amphora enters the ERROR state or remains not ready for an unexpectedly long duration.

  • The Magnum role now allows users to configure the priorityClassName and the runtimeClassName for all of the different components of the service.

  • The Manila role now allows users to configure the priorityClassName and the runtimeClassName for all of the different components of the service.

  • Adjusted the Neutron server policy to use network scope checks for port update and delete operations. This improves scope checking when Neutron evaluates the policy for a port update or delete while an allowed-address-pair binding exists.

  • The Neutron role now allows users to configure the priorityClassName and the runtimeClassName for all of the different components of the service.

  • The Nova role now allows users to configure the priorityClassName and the runtimeClassName for all of the different components of the service.

  • The Octavia role now allows users to configure the priorityClassName and the runtimeClassName for all of the different components of the service.

  • The Open vSwitch container image now uses a more centralized location at ghcr.io/vexxhost/docker-openvswitch. This provides better maintainability and a dedicated repository for the Open vSwitch container images. The image now uses a specific version tag (v3.3.6-2) for better reproducibility and stability.

  • Neutron now supports using the built-in DHCP agent when using OVN (Open Virtual Network) for cases when DHCP relay is necessary.

  • The Placement role now allows users to configure the priorityClassName and the runtimeClassName for all of the different components of the service.

  • The ovn-controller image is now being pre-pulled on the nodes prior to the Helm chart being deployed. This will help reduce the time it takes to switch over to the new version of the ovn-controller image.

  • The Staffeln role now allows users to configure the priorityClassName and the runtimeClassName for all of the different components of the service.

  • Added the required Neutron plugin to support port mirroring with the OVN backend.

  • The frr-k8s webhook server now runs on the control plane.

  • Upgrade Percona XtraDB Cluster operator from 1.14.0 to 1.16.1 and Percona XtraDB Cluster from 8.0.36-28.1 to 8.0.41-32.1. This update includes performance improvements and bug fixes.

Known Issues

  • The MTU for the metadata interfaces for OVN was not being set correctly, leading to a mismatch between the MTU of the metadata interface and the MTU of the network. This has been fixed with a Neutron change to ensure the neutron:mtu value in external_ids is set correctly.

Upgrade Notes

  • Bump Cert-Manager from v1.12.10 to v1.12.17 to address a breaking change in Cloudflare’s API which impacted ACME DNS-01 challenges using Cloudflare.

  • Bump Kubernetes collection from 2.0.1 to 2.3.2 to fix bugs and add new features.

  • Bump the Cluster API driver for Magnum from 0.30.0 to 0.31.2 to improve stability, fix bugs and add new features.

  • Bump OVN from 24.03.1-44 to 24.03.2.34.

  • Upgraded Portworx CSI operator to version 25.2.1 from 23.10.5 for improved stability and performance.

  • Updated Portworx OCI monitor to version 25.4.0 from 3.1.1 to support the latest operator features.

  • Upgraded RabbitMQ operator to version 2.16.1 from 2.9.0 for improved stability and performance.

  • Upgraded RabbitMQ server to version 4.1.4 from 3.13.3 for improved stability and performance.

  • RabbitMQ 4.1.x supports upgrades from 3.13.x and 4.0.x versions.

  • The max_allowed_packet setting increased from 4M (the default in MySQL 5.x) to 16M to support larger queries. Because MySQL 8.x uses a new default of 64M, the configuration no longer specifies this setting.

  • Upgrade Cluster API driver for Magnum to 0.26.0.

  • Upgrade CAPI and CAPO version to 1.10.5 and 0.12.4 respectively.

Security Issues

  • The Horizon service now runs as the non-privileged user horizon in the container.

  • The Horizon service ALLOWED_HOSTS setting is now configured to point to the configured endpoints for the service.

  • The CORS headers are now configured to only allow requests from the configured endpoints for the service.

  • Set libvirt’s TLS remote API on port 16514 to accept TLS 1.3 only to improve service security.
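
    To check the restriction from a client, a Python sketch using the standard ssl module can refuse anything older than TLS 1.3 and report the negotiated version. The helper names are hypothetical; libvirt's TLS transport normally also requires a client certificate (load it with ctx.load_cert_chain), and without one the server may reject the handshake.

```python
import socket
import ssl


def tls13_only_context(cafile=None) -> ssl.SSLContext:
    """Client context that refuses to negotiate anything older than TLS 1.3."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=cafile)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    return ctx


def negotiated_version(host: str, port: int = 16514, cafile=None) -> str:
    """Connect to the libvirt TLS port and return the protocol, e.g. 'TLSv1.3'."""
    ctx = tls13_only_context(cafile)
    with socket.create_connection((host, port), timeout=5) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()
```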

  • Upgrade nginx ingress controller from 1.10.1 to 1.12.1 to fix CVE-2025-1097, CVE-2025-1098, CVE-2025-1974, CVE-2025-24513, and CVE-2025-24514.

Bug Fixes

  • Applied patch 948053 to resolve database synchronization issues between Neutron and Open Virtual Network (OVN) for log resources. This patch addresses bug 2107925 where the neutron_pg_drop table could be incorrectly deleted during synchronization when existing log resources are present. The fix also updates the Access Control List (ACL) table to maintain proper synchronization of log resources between the Neutron and OVN databases.

  • The [privsep_osbrick]/helper_command configuration value was not configured in either the Cinder or the Nova service, which led to the inability to run certain CLI commands because a plain sudo was attempted instead. This has been fixed by adding the missing helper command configuration to both services.

  • The dmidecode package which is required by the os-brick library for certain operations was not installed on the images that needed it, which can cause NVMe-oF discovery issues. The package has been added to all images that require it.

  • The [cinder]/auth_type configuration value was not set, which resulted in the entire [cinder] section not being rendered in the configuration file. It is now set to password, which fully renders the [cinder] section for OpenStack Nova.

  • The nova user within the nova-ssh image was missing the SHELL build argument, which caused live and cold migrations to fail. This has been resolved by adding the missing build argument.

  • The generic switch networking driver now uses a coordination backend to enable a distributed lock on switches.

  • During a Neutron or OVN initialization process, the routes assigned to the physical interface are now removed and added to the OVS bridge to maintain the connectivity of the host.

  • The Cluster API driver for Magnum has been bumped to 0.28.0 to improve stability, fix bugs and add new features.

  • The Cluster API driver for Magnum has been bumped to 0.27.0 to improve stability, fix bugs and add new features.

  • The Cluster API driver for Magnum has been bumped to 0.26.2 to address bugs around cluster deletion.

  • The Open vSwitch version has been bumped to 3.3.0 to resolve packet drops that produce “Packet dropped. Max recirculation depth exceeded.” messages in the Open vSwitch log.

  • This change fixes a regression where Cinder volume creation failed with the error FailedToDropPrivileges. Since the update to Cinder 24.0.0, the Cinder Ceph container needs additional capabilities for operations such as booting from a volume or creating a volume from an image.

  • This fix introduces a kernel option to adjust aio-max-nr, ensuring that the system can handle more asynchronous I/O events, preventing VM startup failures related to AIO limits.

  • Fixed containers failing to validate TLS certificates on Red Hat-based systems. The issue occurred when mounting the OpenSSL trusted certificate bundle (/etc/pki/ca-trust/extracted/openssl/ca-bundle.trust.crt) which uses the “TRUSTED CERTIFICATE” format that’s incompatible with Go applications. The configuration now uses the standard PEM format bundle (/etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem) on Red Hat systems, which resolves certificate validation errors.

  • Added a custom build of Cluster API driver for OpenStack which includes fixes unblocking upgrades of Magnum clusters created using a specific network or subnet configuration.

  • Corrected Cinder authentication configuration handling in Nova. Nova now respects authentication overrides defined in OpenStack Helm endpoints, such as openstack_helm_endpoints_nova_region_name.

  • In an OVN deployment where external (baremetal) ports connect to VLAN networks, you need to bind the internal router port associated with the network to the same ha_chassis_group as the network. This setup mimics how the external port of the router functions in relation to the upstream gateway.

    In essence, the baremetal ports aren’t able to communicate with their default gateway if either the internal router port is unbound or if the vrouter doesn’t have an external gateway set, with the external router port bound to the same exact chassis and with the same exact priorities as the ha_chassis_group of the VLAN network.

  • The Ironic agent for Neutron uses the internal API endpoint by default to avoid hitting the public endpoint unnecessarily.

  • Manila now uses Nova micro-version 2.60 by default. This change enables support for attaching multiple volumes to an instance.

  • Manila now connects to the internal Nova and Glance endpoints instead of the public ones. This improves performance and reduces reliance on external network paths.

  • Fixed an issue in the Manila service image where the fetch-public-ssh-keys systemd service could start too early in the boot process, before the instance metadata service or network was fully available. This caused failures to retrieve and install SSH public keys.

  • Fixed an issue where the neutron-ironic-agent service failed to start.

  • Fixed the node-exporter Prometheus monitoring configuration by setting the nodeExporterSelector to filter metrics by job="node-exporter" label. This ensures that node-exporter dashboards and alerts correctly reference the appropriate metrics.

  • Addressed an issue where instances not booted from volume would fail to resize. This issue was caused by a missing trailing newline in the SSH key, which led to misinterpretation of the key material during the resize operation. Adding proper handling of SSH keys ensures that the resize process works as intended for all instances.
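
    The fix amounts to normalizing the key's line ending before use. A minimal Python sketch, with a hypothetical function name and not taken from Nova's code, might look like this; OpenSSH can reject private key files whose last line is not newline-terminated.

```python
def normalize_key_material(key: str) -> str:
    """Return SSH key material terminated by exactly one newline.

    Any trailing run of CR/LF characters is collapsed to a single '\n',
    so keys with no newline, extra newlines or CRLF endings all normalize.
    """
    return key.rstrip("\r\n") + "\n"
```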

  • Fixed the OAuth2 Proxy configuration to enable API access using valid JWT tokens without requiring interactive login. Previously, OAuth2 Proxy enforced login for all requests by default. This change lets the Alertmanager API and other services behind OAuth2 Proxy support programmatic access via JWT tokens.

  • Fix OctaviaAmphoraNotOperational monitoring rule to exclude DELETED Amphora status.

  • Fix OctaviaAmphoraNotReady monitoring rule to recognize both READY and ALLOCATED as valid Amphora statuses. Previously, the monitoring rule fired for Amphora instances in ALLOCATED status, which is a normal operational state. The monitoring rule now uses the name OctaviaAmphoraNotOperational to better reflect its purpose of detecting non-operational Amphora instances.
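
    The combined effect of the two OctaviaAmphoraNotOperational fixes above can be restated in Python. This is a restatement of the status logic only, not the Prometheus rule itself, and it does not model how long a status must persist before the alert fires.

```python
# Statuses treated as healthy by the rule.
OPERATIONAL_STATUSES = {"READY", "ALLOCATED"}
# Deleted Amphorae are excluded from alerting entirely.
IGNORED_STATUSES = {"DELETED"}


def amphora_not_operational(status: str) -> bool:
    """Return True when an Amphora in this status should raise the alert."""
    return status not in OPERATIONAL_STATUSES | IGNORED_STATUSES


for s in ("READY", "ALLOCATED", "DELETED", "ERROR", "BOOTING"):
    print(s, amphora_not_operational(s))
```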

  • Improve alert generation for load balancers that have a non-ACTIVE provisioning state despite an ONLINE operational state. Previously, if a load balancer was in a transitional state such as PENDING_UPDATE (provisioning_state) while still marked as ONLINE (operational_state), the gauge metric openstack_loadbalancer_loadbalancer_status{provisioning_status!="ACTIVE"} did not trigger an alert. This update addresses the issue by ensuring that alerts are properly generated in these scenarios.
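
    The selection behaves like the label matcher provisioning_status!="ACTIVE": the operating status plays no part. The Python sketch below applies that matcher to label sets; only provisioning_status comes from the note, while the id and operating_status label names are assumptions for illustration.

```python
def loadbalancers_to_alert(samples):
    """Return IDs of load balancers whose provisioning status is not ACTIVE.

    The operating status is deliberately ignored, so an ONLINE load balancer
    stuck in PENDING_UPDATE is still reported. A missing label also matches,
    as it would for a PromQL != matcher.
    """
    return [
        labels["id"]
        for labels in samples
        if labels.get("provisioning_status") != "ACTIVE"
    ]
```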

  • Add required OVN VPN configuration files to Neutron server so VPN features behave as expected. The Neutron server receives RPC calls from the Neutron OVN VPN agent and executes VPN operations. Therefore, VPN configurations must be present on the Neutron server.

  • When using OVS with DPDK, both OVS and OVN run as the root user by default, which can prevent QEMU from writing the vhost-user socket file in the Open vSwitch runtime directory (/run/openvswitch). This has been fixed by configuring the Open vSwitch and OVN components to run as the non-root user ID 42424, which matches QEMU and the other OpenStack services inside the container.

  • The CI tooling for pinning images has been fixed to properly work after a regression caused by the introduction of the atmosphere_image_prefix variable.

  • Increased the liveness probe timeouts for the Percona XtraDB Cluster. The configuration now sets timeoutSeconds to 60 and failureThreshold to 100. This change helps the cluster remain responsive and prevents unnecessary restarts during prolonged operations.

  • Changed the liveness check for the MySQL exporter sidecar to a readiness check. The sidecar should wait indefinitely for the main containers and should not terminate database pods, especially during long SST operations. This change improves the cluster’s stability during extended operations.

  • Resolved an issue where the QEMU VNC and API TLS certificates failed to renew, which prevented access to the virtual machine (VM) console via the dashboard and caused live migration failures.

  • The Staffeln Cinder policy now honors the atmosphere_staffeln_enabled setting when it is given a boolean value.

  • The documentation for using the vTPM was pointing to the incorrect metadata properties for images. This has been corrected to point to the correct metadata properties.

  • Fixed two redundant securityContext problems in the statefulset-compute-ironic.yaml template.

  • The Barbican KEK rewrap now checks whether a database transaction has already started, and uses a nested transaction if the DB session has already begun its root transaction.
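
    Barbican's rewrap runs on SQLAlchemy, where the analogous calls are Session.in_transaction() and Session.begin_nested(). To keep the sketch dependency-free, the same check-then-nest pattern is shown with Python's standard sqlite3 module and SQLite savepoints; the transaction() helper and the savepoint name are hypothetical and are not Barbican's code.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def transaction(conn: sqlite3.Connection, savepoint: str = "kek_rewrap"):
    """Run the body in a root transaction, or in a SAVEPOINT if one is already open.

    The connection must use isolation_level=None so that sqlite3 does not
    open transactions implicitly; `savepoint` must be a trusted identifier.
    """
    if conn.in_transaction:
        # A root transaction is already active: nest via a savepoint.
        conn.execute(f"SAVEPOINT {savepoint}")
        try:
            yield conn
        except Exception:
            # Undo only the nested work, then drop the savepoint.
            conn.execute(f"ROLLBACK TO SAVEPOINT {savepoint}")
            conn.execute(f"RELEASE SAVEPOINT {savepoint}")
            raise
        conn.execute(f"RELEASE SAVEPOINT {savepoint}")
    else:
        # No transaction yet: this call owns the root transaction.
        conn.execute("BEGIN")
        try:
            yield conn
        except Exception:
            conn.execute("ROLLBACK")
            raise
        conn.execute("COMMIT")
```

    A failure inside the nested block rolls back only the nested work, while the outer transaction stays open and commits normally.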

  • Fixed an issue preventing automatic certificate renewal for Octavia load balancers. The fix ensures proper TLS certificate mounting for job board communication between Octavia components and Valkey, enabling certificates to renew correctly.

  • Fixed type errors in networking-generic-switch when users pass numeric configuration values as strings. The driver now automatically converts port numbers and timeout values to their correct types (int or float), preventing ConnectHandler failures when establishing connections to network devices.
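
    networking-generic-switch connects to devices through Netmiko's ConnectHandler. The Python sketch below shows the kind of coercion described, without calling Netmiko; the key set (port as int, any key ending in "timeout" as float) and the coerce_device_options() name are assumptions for illustration, not the driver's actual code.

```python
def coerce_device_options(options: dict) -> dict:
    """Convert string-typed numeric options to the types a connection call expects.

    Assumed key set: 'port' becomes an int and any key ending in 'timeout'
    becomes a float; all other options are passed through unchanged.
    """
    coerced = {}
    for key, value in options.items():
        if key == "port" and value is not None:
            coerced[key] = int(value)
        elif key.endswith("timeout") and value is not None:
            coerced[key] = float(value)
        else:
            coerced[key] = value
    return coerced
```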

  • Switched Valkey and Redis exporter images to Bitnami legacy repository due to Bitnami retiring their main registry. The upstream Valkey images don’t work out of the box, so this serves as a temporary workaround.

  • The designate-producer service runs a single replica instead of three to avoid issues with DNS zone serial updates. This is a workaround until the service has proper centralized locking.

  • Upgrade the libvirt Helm chart from 0.1.27 to 1.1.0 to address critical issues with pod termination on systems using newer kernels. The updated chart includes proper mounting of the misc cgroup controller, which resolves failures where pods were unable to terminate correctly. This fix ensures stable pod lifecycle management in environments with modern kernel versions.

  • The Cluster API driver for Magnum is now configured to use the internal endpoints by default in order to avoid going through the ingress and leverage client-side load balancing.

Other Notes

  • Add documentation about database backup and restore procedures.

  • The documentation has been updated to include release notes for all of the currently supported Atmosphere releases.

  • Updated Helm Toolkit dependency from version 0.2.69 to 2025.1.8. This update includes improved template consistency, enhanced support for newer Kubernetes versions, and updated helper functions for better maintainability.

  • The Atmosphere collection now uses the new major version of the OpenStack collection as a dependency.

  • The libvirt exporter image now uses ghcr.io/inovex/prometheus-libvirt-exporter, offering greater stability and performance for libvirt metrics collection.

  • The upload jobs have been removed from the gate pipeline and replaced by the same build jobs since we use the intermediate registry to store the images.

  • The project has adopted reno for release notes, and all changes must now include a release note so that the release notes stay complete.

  • The heavy CI jobs are now skipped when release notes are changed.

  • The image build process has been refactored to use docker-bake which allows us to use context/built images from one target to another, allowing for a much easier local building experience. There is no functional change in the images.

  • The images now use the uv tool to create the virtual environment which is faster and more reliable than the previous method.