Resolved -
This incident has been resolved.
Nov 25, 12:30 CET
Monitoring -
We are pleased to confirm that all production tenants have been successfully restored and services are now operational.
We are now addressing any remaining individual tenant-specific concerns through our standard support channels. Our engineering team continues to actively monitor all systems to ensure stability.
We will transition this incident to monitoring status and plan to fully resolve it within the coming days, provided no new issues arise.
Nov 20, 11:25 CET
Update -
Recovery work continued as planned yesterday. We deployed an additional tenant and made significant progress on the Authentication Hub migration to AWS infrastructure. Module deployment is ongoing and will continue today, followed by testing before the switchover.
The status remains consistent with our previous update - nearly all tenants are operational, with final deployment work continuing for the remaining tenants.
Nov 19, 07:18 CET
Update -
Recovery work is resuming today following the weekend. The status remains consistent with our Friday update - nearly all tenants are operational, with final deployment work continuing for the remaining tenants pending DNS and database configuration.
We will continue with Authentication Hub migration and version 6.2.2 deployment this week.
Nov 18, 13:22 CET
Update -
Recovery progress continues as planned. Nearly all tenants have been successfully restored and are operational.
We are completing deployment of the remaining tenants, pending DNS configuration confirmation for several tenants. We are also migrating Authentication Hub to AWS infrastructure and implementing a Logstash fix.
Nov 14, 09:47 CET
Update -
Nearly all tenants running on version 6 have been successfully restored and are operational. Our team is actively working on the final remaining tenants.
Today, we will deploy version 6.2.2 to all restored tenants to address a critical logging infrastructure issue. This will be performed with minimal service impact.
Recovery work continues for the remaining tenants running on version 5. We will provide updates as this work progresses.
Nov 13, 09:50 CET
Update -
Authentication Hub is restored and functional. We will proceed with recovering the remaining tenants.
Nov 12, 09:55 CET
Update -
We have deployed the first set of tenants and are now monitoring their operation, fine-tuning resources and limits to ensure smooth performance. Once validated, we will proceed with deploying the remaining tenants.
Nov 11, 17:47 CET
Update -
Tenant restoration has begun, with work on the first tenants currently in progress. Our team is working through the restoration process, including database restoration, DNS configuration, and SSL certificate setup. Once the first tenants are fully restored and verified, we will continue systematically with the remaining tenants.
Nov 11, 13:02 CET
Update -
New AWS infrastructure has been prepared over the weekend and is now ready for tenant restoration. We are completing final testing today to ensure stability before beginning the restoration process. Tenant restoration will begin today in a phased approach, starting with tenants that have simpler configurations, then proceeding systematically through all remaining tenants. We're rebuilding Docker images in AWS ECR, with versions 6.2.1, 6.1.9, and 6.1.8 already available, and other versions to follow. Work on recovering the Ceph cluster continues in parallel.
Nov 11, 09:29 CET
Update -
Preparation work to restore hosted customers continues. In the meantime, the new AWS ECR registry is ready with the latest version (older versions will be restored later). Access credentials will be sent to all self-managed customers shortly.
Nov 10, 13:15 CET
Update -
We prepared a new AWS cluster over the weekend to restore hosted customers running on version 6 from backups. Restoration will take place during the day. Our team is also continuing work on the full recovery of the Docker image registry. Additionally, we’re working toward restoring AuthHub later today, though this depends on how the remaining recovery tasks progress.
Nov 10, 10:00 CET
Update -
Recovery of the Ceph storage system has proven more complex and slower than anticipated. We are therefore executing a contingency plan to recreate the cluster in an alternative environment and restore all tenants from backups. The new cluster has been deployed, and we are currently validating the tenant recovery process.
Nov 9, 17:58 CET
Update -
~19% of OSDs left to recover. Ceph is slowly recovering. Work on recovering our registry continues.
Nov 8, 10:45 CET
Update -
76% of OSDs are up. Once the remaining OSDs are up, the Ceph cluster will need to rebalance.
Nov 7, 22:03 CET
Update -
Over 60% of OSDs restored. Restoration of Docker images in progress.
Nov 7, 17:02 CET
Update -
We’re still working on the broken OSDs — it’s taking longer than expected to get them all back online. In the meantime, we’ve also started fixing the issues with the Docker registry.
Nov 7, 14:30 CET
Update -
We’ve fixed the root cause affecting our Ceph cluster, and the cluster is now being restored. The issue turned out to be a bug in the Ceph cluster, where the Python sub-interpreter used by the mgr modules wasn’t loading plugins correctly.
In the meantime, our team has fixed the manager side and is working on restoring the broken OSDs. We expect this to take 2-3 hours.
Nov 7, 11:13 CET
Update -
We’re currently experiencing a partial service outage affecting some tenants following our scheduled maintenance. Unfortunately, a few unexpected issues have made recovery take longer than planned.
Our engineering team is fully engaged and working hard to restore all services as quickly as possible. We’re making progress, but it may take more time before everything is back to normal.
We’re very sorry for the ongoing disruption and truly appreciate your patience and understanding while we work to resolve this. We’ll continue to share updates as soon as we have more information.
Nov 7, 08:56 CET
Identified -
During ongoing scheduled maintenance, we identified unexpected issues impacting service availability for some tenants. While the maintenance activities were still in progress, several tenants began experiencing partial outages and degraded performance.
Our engineering team is actively investigating the root cause and working to restore full functionality as quickly as possible. Further updates will be provided as more information becomes available.
Nov 6, 13:15 CET