Infrastructure, Sizing & Performance

Meshery’s resource footprint is driven by two largely independent forces: the volume of user-facing API and UI traffic, and the scope of infrastructure it continuously discovers. Sizing a production deployment means accounting for both. This page provides per-component resource guidance, the levers you can pull to scale, and the performance bounds to design around.

What drives Meshery’s footprint

| Driver | Primary effect | Components affected |
|---|---|---|
| Number and size of managed clusters | Larger MeshSync snapshot; more events; larger database | MeshSync, Broker, Meshery Server, Database |
| Rate of change in managed clusters | Higher event throughput | Broker, MeshSync, Meshery Server |
| Concurrent users and API/GraphQL load | More request handling; capability lookups | Meshery Server |
| Designs, models, and registry size | Larger in-memory registry and database | Meshery Server |
| Relationship/policy evaluation | CPU during evaluation | Meshery Server |

The discovery-driven forces are easy to underestimate. A handful of large, rapidly changing clusters can place more sustained load on Meshery than a busy but small user base. Size for your discovery scope first.

Per-component sizing guidance

Use these as conservative starting points for a production deployment, then right-size from observed utilization. Allocate generously where state is held (Meshery Server) and modestly where components are stateless and bursty (Operator, Adapters).

| Component | Starting requests | Starting limits | Notes |
|---|---|---|---|
| Meshery Server | 500m CPU / 512Mi | 2 CPU / 2Gi | Scale memory with discovery scope, registry size, and database growth. The largest and most important allocation. |
| Meshery Operator | 50m CPU / 64Mi | 200m CPU / 256Mi | One per managed cluster; lightweight reconciliation. |
| MeshSync | 100m CPU / 128Mi | 500m CPU / 512Mi+ | Scale with cluster size and rate of change; the heaviest discovery component. |
| Meshery Broker (NATS) | 100m CPU / 128Mi | 500m CPU / 512Mi+ | Memory tracks in-flight, unconsumed messages. Bursts during reconnects and large resyncs. |
| Meshery Adapter (each, optional) | 50m CPU / 64Mi | 200m CPU / 256Mi | Only if you deploy adapters; stateless and transactional. |
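
As a rough illustration of how these starting points add up, the sketch below totals the starting requests for one Meshery Server plus the per-cluster components. It assumes every managed cluster runs MeshSync in operator mode, so each cluster gets its own Operator, MeshSync, and Broker. The `Request` type and `total_requests` function are hypothetical helpers, not part of Meshery. The total is fleet-wide: the per-cluster components run in the managed clusters, not alongside the Server.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    cpu_millicores: int
    memory_mib: int


# Starting requests copied from the sizing table above.
SERVER = Request(500, 512)
ADAPTER = Request(50, 64)
# Assumption: operator mode, so each managed cluster runs all three.
PER_CLUSTER = {
    "operator": Request(50, 64),
    "meshsync": Request(100, 128),
    "broker": Request(100, 128),
}


def total_requests(clusters: int, adapters: int = 0) -> Request:
    """Sum starting requests for one Server, per-cluster components, and adapters."""
    if clusters < 0 or adapters < 0:
        raise ValueError("clusters and adapters must be non-negative")
    cpu = SERVER.cpu_millicores + adapters * ADAPTER.cpu_millicores
    mem = SERVER.memory_mib + adapters * ADAPTER.memory_mib
    for component in PER_CLUSTER.values():
        cpu += clusters * component.cpu_millicores
        mem += clusters * component.memory_mib
    return Request(cpu, mem)


if __name__ == "__main__":
    # Three managed clusters, no adapters.
    print(total_requests(3))
```

For three managed clusters this comes to 1250m CPU and 1472Mi of requested memory across the fleet. Treat the result as a floor for scheduling, not a prediction of actual usage.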

Storage:

  • Meshery Server / Database. Provision disk for the on-disk cache under the Server’s data folder. The database grows with discovery scope and registry size. Because it is a cache (not a system of record), size for working-set performance rather than long-term retention—durable data lives with the Remote Provider.
  • Meshery Broker. No persistent volume is required; the Broker holds messages in memory until consumed. Size memory, not disk, for the Broker.

MeshSync: tiered discovery and scoping

MeshSync is the component most sensitive to the size of managed clusters. It uses tiered discovery to progressively refine identification of resources, balancing granularity against speed and scalability. Two controls let you bound its cost:

  1. Blacklist uninteresting resources. MeshSync’s discovery is governed by an informer_config on the meshsyncs.meshery.io CRD. Editing the CRD to blacklist resource types you do not need to track reduces event volume, database growth, and Server memory pressure. Retrieve the CRD, edit informer_config, and re-apply it:

    kubectl get crd meshsyncs.meshery.io -o yaml > meshsync.yaml
    # edit informer_config to blacklist unwanted resource types
    kubectl apply -f meshsync.yaml
    
  2. Choose the right deployment mode. MeshSync runs in operator mode (deployed into the managed cluster) or embedded mode (a library inside Meshery Server, deploying nothing into the cluster). Embedded mode reduces in-cluster footprint but shifts discovery work into the Server process; it is useful for clusters where you cannot or prefer not to deploy the Operator. The default for new connections is set by MESHSYNC_DEFAULT_DEPLOYMENT_MODE (embedded or operator). See Multi-Cluster & Multi-Cloud.

For very large clusters, blacklisting noisy, high-cardinality resource types is the single most effective lever on Meshery’s footprint.
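
One way to pick blacklist candidates is to rank resource types by how many events they generate and estimate the share of volume each candidate set would remove. The sketch below does that arithmetic. The event counts are invented for illustration, and how you collect per-type counts in your environment is left open; `noisiest` and `blacklist_savings` are hypothetical helpers, not Meshery or MeshSync functions.

```python
def noisiest(event_counts: dict[str, int], top: int = 3) -> list[str]:
    """Return the resource types with the most events, highest first."""
    return sorted(event_counts, key=event_counts.get, reverse=True)[:top]


def blacklist_savings(event_counts: dict[str, int], blacklist: set[str]) -> float:
    """Return the fraction of total events removed by not tracking `blacklist`."""
    total = sum(event_counts.values())
    if total == 0:
        return 0.0
    removed = sum(n for kind, n in event_counts.items() if kind in blacklist)
    return removed / total


if __name__ == "__main__":
    # Hypothetical events per hour by resource type, for illustration only.
    counts = {
        "Pod": 1200,
        "Event": 6000,
        "EndpointSlice": 1800,
        "Deployment": 300,
        "ConfigMap": 700,
    }
    candidates = set(noisiest(counts, top=2))
    print(candidates, blacklist_savings(counts, candidates))
```

With these made-up numbers, dropping the two noisiest types removes 78% of event volume. Whether a noisy type is safe to drop depends on what you need Meshery to show, so event share is only one input to the decision.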

Broker throughput

The Meshery Broker (NATS) streams discovery data and events between each cluster and the Server. Production guidance:

  • Run one Broker per managed cluster. Each Broker carries only its own cluster’s data, so scale it vertically to match that cluster’s volume; the number of managed clusters does not change the load on any single Broker.
  • Because messages are held in memory until consumed, sustained Server unavailability or a slow consumer causes Broker memory to grow. Ensure Meshery Server keeps up and that the Broker has memory headroom for reconnect bursts and large resyncs.
  • The Broker recovers messages from its NATS topics across brief connectivity interruptions, which smooths transient Server or network blips.
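
To reason about Broker memory headroom, a back-of-the-envelope estimate of backlog growth while the Server is not consuming can help. The sketch below assumes a steady message rate and a fixed average message size, and ignores any per-message overhead inside NATS. All inputs are hypothetical and should be replaced with figures measured in your environment.

```python
BYTES_PER_MIB = 1024 * 1024


def backlog_mib(msgs_per_second: float, avg_message_bytes: float,
                outage_seconds: float) -> float:
    """Estimate memory (MiB) held by unconsumed messages after an outage."""
    return msgs_per_second * avg_message_bytes * outage_seconds / BYTES_PER_MIB


def seconds_until_limit(limit_mib: float, baseline_mib: float,
                        msgs_per_second: float, avg_message_bytes: float) -> float:
    """Estimate how long the consumer can be absent before the limit is reached."""
    growth_mib_per_second = msgs_per_second * avg_message_bytes / BYTES_PER_MIB
    if growth_mib_per_second <= 0:
        return float("inf")
    return max(0.0, (limit_mib - baseline_mib) / growth_mib_per_second)


if __name__ == "__main__":
    # Hypothetical load: 200 messages/s averaging 2 KiB, Server away for 5 minutes.
    print(backlog_mib(200, 2048, 300))
    # Hypothetical Broker: 512 MiB limit, 128 MiB already in use.
    print(seconds_until_limit(512, 128, 200, 2048))
```

Under these assumptions a five-minute outage accumulates roughly 117 MiB, and a Broker at 128 MiB with a 512 MiB limit has a little over 16 minutes before it reaches the limit. An estimate like this is a reasonable basis for choosing both the memory limit and the alert threshold.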

Meshery Server: API, registry, and policy

  • API/GraphQL load scales with concurrent users and clients. Horizontal replication helps with read/stateless request handling; mind the database caveats in High Availability & Resiliency.
  • The registry and its models are held in memory to serve design and relationship operations; larger registries increase baseline memory.
  • Relationship/policy evaluation is CPU-bound and time-boxed by POLICY_EVAL_TIMEOUT (default 3m). If you see evaluations timing out on large designs, raise the timeout or allocate more CPU. The engine selection (USE_GO_POLICY_ENGINE) also affects evaluation characteristics. See the environment variables reference.
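
When deciding whether to raise the timeout, it can help to compare observed evaluation times against the configured value. The sketch below assumes POLICY_EVAL_TIMEOUT takes a Go-style duration string (the default of 3m suggests this, but check the environment variables reference). The observed durations are invented, and `parse_duration` and `timeout_margin` are hypothetical helpers rather than Meshery code.

```python
import re

_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}
# "ms" is listed first so that "500ms" is not read as "500m" followed by "s".
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)(ms|h|m|s)")


def parse_duration(text: str) -> float:
    """Parse a duration such as '3m', '90s', or '1m30s' into seconds."""
    position = 0
    total = 0.0
    for match in _TOKEN.finditer(text):
        if match.start() != position:
            raise ValueError(f"invalid duration: {text!r}")
        total += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        position = match.end()
    if position == 0 or position != len(text):
        raise ValueError(f"invalid duration: {text!r}")
    return total


def timeout_margin(observed_seconds: list[float], timeout: str = "3m") -> float:
    """Return the fraction of the timeout left unused by the slowest evaluation."""
    return 1.0 - max(observed_seconds) / parse_duration(timeout)


if __name__ == "__main__":
    # Hypothetical evaluation times in seconds for three large designs.
    print(timeout_margin([42.0, 95.0, 153.0], "3m"))
```

A margin close to zero, or negative, suggests that the largest designs are at risk of timing out and that the timeout, the CPU allocation, or both need attention.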

Scalability levers, at a glance

| Lever | Effect | Where configured |
|---|---|---|
| Meshery Server CPU/memory requests & limits | Headroom for traffic, registry, and discovery snapshot | Helm resources |
| Meshery Server replicas | More request-handling capacity (see HA caveats) | Helm replicaCount + autoscaling |
| MeshSync blacklist (informer_config) | Lower event volume and database growth | meshsyncs.meshery.io CRD |
| MeshSync mode (operator vs. embedded) | Shifts discovery cost in-cluster vs. into Server | Per connection / MESHSYNC_DEFAULT_DEPLOYMENT_MODE |
| Broker memory | Absorbs in-flight message bursts | Helm values for the Broker/Operator |
| POLICY_EVAL_TIMEOUT, CPU | Tolerance for large policy evaluations | Env var / Helm |

Known performance bounds and caveats

  • The database is a single-writer SQLite/Bitcask cache. It is excellent for a cached working set but is not a horizontally shared, multi-writer datastore. This limits how far a single Server instance scales and constrains how replicas behave (see High Availability & Resiliency).
  • Discovery cost is proportional to cluster size and churn. Without blacklisting, very large or high-churn clusters can dominate Server memory and database growth.
  • Broker memory is bounded by consumption. A wedged or far-behind Server consumer lets Broker memory climb; alert on it.
  • Policy evaluation is time-boxed. Large designs may hit POLICY_EVAL_TIMEOUT; tune CPU and the timeout together.

Capacity-planning workflow

  1. Start from the per-component starting points above.
  2. Connect representative clusters and let MeshSync reach steady state.
  3. Measure Server memory/CPU, database size, and Broker memory under real discovery and user load.
  4. Blacklist resource types you do not need; re-measure.
  5. Set requests/limits with headroom (especially Server memory) and configure autoscaling where appropriate.
  6. Add the resulting thresholds to your alerting—see Monitoring, Observability & Health KPIs.
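
Steps 3 through 6 can be sketched as a small calculation that turns observed memory samples into a request, a limit, and an alert threshold. The policy used here is an assumption chosen for illustration, not Meshery guidance: request at the 90th percentile of observed usage, limit at the observed peak plus 30% headroom, and alert at 80% of the limit. The sample values are invented.

```python
def percentile(samples: list[int], pct: int) -> int:
    """Return the nearest-rank percentile of `samples`."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, -(-pct * len(ordered) // 100))  # ceiling division
    return ordered[rank - 1]


def recommend(samples_mib: list[int], request_pct: int = 90,
              headroom_pct: int = 30, alert_pct: int = 80) -> dict[str, int]:
    """Derive a memory request, limit, and alert threshold from usage samples."""
    peak = max(samples_mib)
    limit = (peak * (100 + headroom_pct) + 99) // 100  # round up to whole MiB
    return {
        "request_mib": percentile(samples_mib, request_pct),
        "limit_mib": limit,
        "alert_mib": limit * alert_pct // 100,
    }


if __name__ == "__main__":
    # Hypothetical Meshery Server memory samples (MiB) at steady state.
    observed = [610, 640, 700, 720, 690, 900, 650, 680, 705, 660]
    print(recommend(observed))
```

With these samples the sketch suggests a 720Mi request, a 1170Mi limit, and an alert at 936Mi. Re-run the calculation after each blacklist change, since the working set can shift noticeably.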