Re-Architecting a Multi-Tenant Cloud Vulnerability Scanning Platform for Scale, Isolation, and Observability

This figure is just for illustrative purposes.
When a client's primary vulnerability scanning service, a critical backend component of their Cloud Security Posture Module (CSPM), began failing under operational load, Falistro was asked to step in. The legacy system, built on a traditional client-server, pull-based SaltStack architecture, was inefficient, unstable, and unable to handle the required scale.
Falistro's diagnosis confirmed that the system's core deficiencies were significant:
- Poor Multi-Tenancy: It could not ensure fair resource sharing between tenants, a critical issue when some customer accounts generated 100x more data than others.
- Lack of Isolation: Failures in scanning one cloud account could cascade, causing resource starvation or logical conflicts that affected other accounts.
- Low Observability: The client's primary concern was a lack of insight into the system's operational status, which made troubleshooting difficult.
- Complex Constraints: The system needed to perform scanning on dedicated nodes, support workload prioritization, and function across multiple, disparate environments: a public SaaS, customer on-premise deployments (both connected and air-gapped), and federated scenarios in which on-premise clients could use the SaaS infrastructure.
​
Architectural Solution & Implementation
Falistro proposed a fundamental architectural shift from the legacy pull model to a push-based system. The new design centered on Kubernetes priority-based workloads to manage job execution, resource allocation, and fault tolerance.
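To make the idea of priority-based Kubernetes workloads concrete, here is a minimal sketch of how a single scan could be described as a Kubernetes Job. It is an illustration under stated assumptions, not the client's actual manifest: the image name, the `workload: scanner` node label, the priority class name, and the resource figures are all hypothetical. The Kubernetes fields it uses (`priorityClassName`, `nodeSelector`, `backoffLimit`, `resources`) are standard parts of the `batch/v1` Job and pod specifications.

```python
import json


def build_scan_job(tenant_id: str, account_id: str, priority_class: str) -> dict:
    """Return a batch/v1 Job manifest for scanning one cloud account.

    Assumptions (hypothetical, for illustration only):
    - tenant_id and account_id are already valid lowercase DNS label fragments
    - a PriorityClass named `priority_class` exists in the cluster
    - dedicated scanner nodes carry the label workload=scanner
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {
            "name": f"scan-{tenant_id}-{account_id}",
            # Labels let operators filter jobs and metrics per tenant.
            "labels": {"tenant": tenant_id, "account": account_id},
        },
        "spec": {
            # Retry a failed scan a bounded number of times.
            "backoffLimit": 2,
            "template": {
                "spec": {
                    # The scheduler orders and preempts pods by priority class.
                    "priorityClassName": priority_class,
                    # Pin scans to the dedicated scanning nodes.
                    "nodeSelector": {"workload": "scanner"},
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": "scanner",
                            "image": "registry.example.com/scanner:latest",
                            "args": ["--account", account_id],
                            # Requests and limits cap what one scan can
                            # consume, so a heavy tenant cannot starve others.
                            "resources": {
                                "requests": {"cpu": "500m", "memory": "512Mi"},
                                "limits": {"cpu": "1", "memory": "1Gi"},
                            },
                        }
                    ],
                }
            },
        },
    }


if __name__ == "__main__":
    job = build_scan_job("tenant-a", "acct-123", "scan-high")
    print(json.dumps(job, indent=2))
```

In a push-based design, a controller would submit a manifest like this to the cluster for each scan, leaving ordering, placement, and retries to the Kubernetes scheduler and Job controller rather than to polling agents.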
​
This approach, which the client termed "revolutionary," was meticulously modeled and presented with supporting data to demonstrate its viability. After securing client buy-in, Falistro completely overhauled and rewrote this critical component.
​
Falistro engineered the new architecture to address all initial requirements:
- Scalability & Fairness: A push-based model with priority queues lets the system manage workloads dynamically, ensuring fair resource distribution and horizontal scalability.
- Resilience & Isolation: Workloads were containerized and isolated, preventing issues in one tenant's scan from affecting the stability of the entire system.
- High Observability: The new system was built with deep instrumentation, giving the client's team the visibility it needed to manage operations effectively.
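The fairness property in the list above can be illustrated with a small in-memory dispatcher. This is a sketch under assumptions, not the production design: the `ScanJob` and `FairScanQueue` names are hypothetical, priorities are plain integers where a lower number means more urgent, and the `dispatched` counter is only a stand-in for real instrumentation. Jobs are served strictly by priority, and tenants within the same priority level take turns round-robin, so a tenant with 100x more work cannot monopolize the queue.

```python
from collections import Counter, OrderedDict, deque
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScanJob:
    tenant: str
    account: str
    priority: int  # lower number = more urgent (assumed convention)


class FairScanQueue:
    """Priority queue with round-robin fairness between tenants."""

    def __init__(self) -> None:
        # Maps priority -> OrderedDict of tenant -> deque of that tenant's jobs.
        self._levels = {}
        # Per-tenant dispatch counts, a placeholder for real metrics.
        self.dispatched = Counter()

    def push(self, job: ScanJob) -> None:
        tenants = self._levels.setdefault(job.priority, OrderedDict())
        tenants.setdefault(job.tenant, deque()).append(job)

    def pop(self) -> Optional[ScanJob]:
        # Visit priority levels from most to least urgent.
        for priority in sorted(self._levels):
            tenants = self._levels[priority]
            if not tenants:
                continue
            # Serve the tenant at the front of the rotation.
            tenant, jobs = next(iter(tenants.items()))
            job = jobs.popleft()
            if jobs:
                # Tenant still has work: send it to the back of the rotation.
                tenants.move_to_end(tenant)
            else:
                del tenants[tenant]
            self.dispatched[tenant] += 1
            return job
        return None


if __name__ == "__main__":
    queue = FairScanQueue()
    for account in ("a1", "a2", "a3"):
        queue.push(ScanJob("big", account, priority=1))
    queue.push(ScanJob("small", "b1", priority=1))
    queue.push(ScanJob("small", "b0", priority=0))

    order = []
    while (job := queue.pop()) is not None:
        order.append(job.account)
    print(order)  # ['b0', 'a1', 'b1', 'a2', 'a3']
```

The urgent job runs first, and the small tenant's normal-priority job is served after only one of the large tenant's jobs rather than after all three. A real deployment would more likely rely on Kubernetes scheduling and quotas than on an in-process queue; the sketch only shows the ordering behavior.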
​
Outcome
The resulting architecture not only resolved the existing inefficiencies and bugs but also supported all three deployment scenarios (SaaS, on-premise, and federated). The re-engineered service exceeded client expectations, delivering a highly scalable, observable, and efficient solution that secured a vital component of their cloud security offering.
