
Re-Architecting a Multi-Tenant Cloud Vulnerability Scanning Platform for Scale, Isolation, and Observability

Cloud Vulnerability Scanning Flow

This figure is just for illustrative purposes.

When a client's primary vulnerability scanning service, a critical backend component for their Cloud Security Posture Module (CSPM), began failing under operational load, Falistro was asked to step in. The legacy system, built on a traditional client-server, pull-based SaltStack architecture, was exhibiting inefficiency, instability, and an inability to handle the required scale.

 

Falistro's diagnosis confirmed the system's core deficiencies were significant:
  • Poor Multi-Tenancy: It could not ensure fair resource sharing between tenants, a critical issue when some customer accounts generated 100x more data than others.

  • Lack of Isolation: Failures in scanning one cloud account could cascade, causing resource starvation or logical conflicts that impacted other accounts.

  • Low Observability: The client's primary concern was a lack of insight into the system's operational status, which made troubleshooting challenging.

  • Complex Constraints: The system needed to perform scanning on dedicated nodes, support workload prioritization, and function across multiple, disparate environments: a public SaaS, customer on-premise (both connected and air-gapped), and federated scenarios where on-premise clients could utilize the SaaS infrastructure.

Architectural Solution & Implementation

Falistro proposed a fundamental architectural shift, moving from the legacy pull model to a push-based system. The new design centered on Kubernetes priority-based workloads to manage job execution, resource allocation, and fault tolerance.

This approach, which the client termed "revolutionary," was meticulously modeled and presented with supporting data to demonstrate its viability. After securing client buy-in, the project involved a complete overhaul and rewrite of this critical component.
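As a rough illustration of what pushing a scan as a Kubernetes priority-based workload could look like, the sketch below builds a Job manifest as a plain Python dictionary. The fields `priorityClassName`, `nodeSelector`, `restartPolicy`, and `backoffLimit` are standard Kubernetes Job and Pod fields; everything else (the function name, the `scan-high` priority class, the `workload: scanner` node label, the image name, and the `--account` argument) is hypothetical and not taken from the client's system.

```python
import json


def build_scan_job(tenant_id: str, account_id: str, priority_class: str) -> dict:
    """Return a Kubernetes Job manifest for one scan (illustrative only)."""
    # Hypothetical naming scheme; Kubernetes object names must be lowercase.
    name = f"scan-{tenant_id}-{account_id}".lower()
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name, "labels": {"tenant": tenant_id}},
        "spec": {
            # Retry a failed scan a bounded number of times.
            "backoffLimit": 2,
            "template": {
                "spec": {
                    # The scheduler uses the priority class to order and,
                    # if needed, preempt pods.
                    "priorityClassName": priority_class,
                    # Assumed label for the dedicated scanning nodes.
                    "nodeSelector": {"workload": "scanner"},
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": "scanner",
                            # Placeholder image, not a real registry path.
                            "image": "example.invalid/scanner:latest",
                            "args": ["--account", account_id],
                            # Per-job limits keep one scan from starving others.
                            "resources": {"limits": {"cpu": "1", "memory": "1Gi"}},
                        }
                    ],
                }
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_scan_job("tenant-a", "acct-123", "scan-high"), indent=2))
```

In this sketch, each scan is an independent Job, so a failure is retried or dropped without touching scans for other accounts, and the priority class is the only knob the submitting service needs to set.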

Falistro engineered the new architecture to address all initial requirements:
  • Scalability & Fairness: A push-based model with priority queues allowed the system to manage workloads dynamically, ensuring fair resource distribution and horizontal scalability.

  • Resilience & Isolation: Workloads were containerized and isolated, preventing issues in one tenant's scan from affecting the stability of the entire system.

  • High Observability: The new system was built with deep instrumentation, providing the clear observability the client's team needed to manage operations effectively.
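To make the fairness idea concrete, here is a minimal sketch of one way priority queues can be combined with per-tenant fairness: each tenant gets its own priority heap, and tenants are served in round-robin order, so a tenant with 100x more jobs cannot starve a small one. This is an assumption-laden illustration, not the dispatcher used in the project; the class name `FairScanQueue` and the convention that a lower number means higher priority are hypothetical.

```python
import heapq
from collections import OrderedDict
from itertools import count


class FairScanQueue:
    """Round-robin across tenants, priority order within each tenant."""

    def __init__(self):
        # tenant -> heap of (priority, sequence, job); insertion order of
        # the OrderedDict is the round-robin order.
        self._queues = OrderedDict()
        # Monotonic counter keeps equal-priority jobs in FIFO order.
        self._seq = count()

    def push(self, tenant, job, priority=10):
        # Assumption: lower number means higher priority.
        heap = self._queues.setdefault(tenant, [])
        heapq.heappush(heap, (priority, next(self._seq), job))

    def pop(self):
        """Return (tenant, job) for the next scan, or None if empty."""
        if not self._queues:
            return None
        tenant, heap = next(iter(self._queues.items()))
        _, _, job = heapq.heappop(heap)
        # Move the tenant to the back of the rotation if it has work left.
        del self._queues[tenant]
        if heap:
            self._queues[tenant] = heap
        return tenant, job


if __name__ == "__main__":
    q = FairScanQueue()
    q.push("big", "b1")
    q.push("big", "b2", priority=1)
    q.push("big", "b3")
    q.push("small", "s1")
    while (item := q.pop()) is not None:
        print(item)
```

With this rotation, the single job from the small tenant is dispatched second, right after the large tenant's most urgent job, rather than waiting behind the large tenant's whole backlog.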

Outcome

The resulting architecture resolved the existing inefficiencies and bugs and supported all three deployment scenarios (SaaS, on-premise, and federated). The re-engineered service exceeded client expectations, delivering a scalable, observable, and efficient solution that secured a vital component of their cloud security offering.

Design. Develop. Scale

Registered Address

Basement, S-145 Panchsheel Park, New Delhi, 110017, India
