dstack features auto-scaling for services published via the gateway. The general flow is:
- STEP 1: `dstack-gateway` parses nginx `access.log` to collect per-second statistics about requests to the service and request times.
- STEP 2: `dstack-gateway` aggregates statistics over a 1-minute window.
- STEP 3: The dstack server pulls all service statistics in the `process_gateways` background task.
- STEP 4: The `process_runs` background task passes statistics and current replicas to the autoscaler.
- STEP 5: The autoscaler (configured via the `dstack.yml` file) returns the replica change as an int.
- STEP 6: `process_runs` calls `scale_run_replicas` to add or remove replicas.
- STEP 7: `scale_run_replicas` terminates or starts replicas.
  - `SUBMITTED` and `PROVISIONING` replicas get terminated before `RUNNING`.
  - Replicas are terminated by descending `replica_num` and launched by ascending `replica_num`.

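The ordering rules in STEP 7 can be illustrated with a small sketch. This is not the actual `scale_run_replicas` code: the `Replica` class, both helper functions, and the rank table are hypothetical names introduced here. The sketch also assumes one particular reading of the rules: not-yet-running replicas are removed first, ties are broken by descending `replica_num`, and new replicas take the lowest free numbers.

```python
from dataclasses import dataclass

# Lower rank = terminated earlier. Only the three statuses named in STEP 7
# are modelled; the numeric ranks are an assumption made for this sketch.
TERMINATION_RANK = {"SUBMITTED": 0, "PROVISIONING": 0, "RUNNING": 1}


@dataclass(frozen=True)
class Replica:
    """Hypothetical stand-in for whatever model dstack uses for a replica."""

    replica_num: int
    status: str


def pick_replicas_to_terminate(replicas, count):
    """Return up to `count` replicas to stop.

    Not-yet-running replicas come first; within each group the highest
    replica_num comes first.
    """
    ordered = sorted(
        replicas,
        key=lambda r: (TERMINATION_RANK[r.status], -r.replica_num),
    )
    return ordered[:count]


def pick_replica_nums_to_launch(active_nums, count):
    """Return the `count` lowest replica numbers that are not in use."""
    taken = set(active_nums)
    picked = []
    candidate = 0
    while len(picked) < count:
        if candidate not in taken:
            picked.append(candidate)
        candidate += 1
    return picked
```

For example, with replicas 0 and 1 `RUNNING`, 2 `PROVISIONING`, and 3 `SUBMITTED`, a change of -3 would stop replicas 3, 2, and 1 in that order under these assumptions.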
RPSAutoscaler implements simple target tracking scaling. The target value represents requests per second per replica (in a 1-minute window).
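
A minimal sketch of target tracking on requests per second follows. It is not the `RPSAutoscaler` source: the function names, the `min_replicas`/`max_replicas` bounds, and the use of `ceil` are assumptions chosen to show the idea. The desired replica count is the observed rate divided by the per-replica target, and the result is returned as a signed change.

```python
import math


def average_rps(requests_per_second, window_seconds=60):
    """Average request rate over the last `window_seconds` seconds.

    `requests_per_second` is a list of per-second request counts, oldest
    first. Seconds missing from a shorter list count as zero requests.
    """
    recent = requests_per_second[-window_seconds:]
    return sum(recent) / window_seconds


def replica_delta(rps, target, current, min_replicas, max_replicas):
    """Return the signed replica change needed to track `target` RPS per replica."""
    # Round up so that each replica stays at or below the target rate.
    desired = math.ceil(rps / target)
    # Clamp to the allowed range (assumed bounds, not taken from the source).
    desired = max(min_replicas, min(max_replicas, desired))
    return desired - current
```

With a target of 10 and an observed average of 25 requests per second, this sketch asks for 3 replicas, so a service currently running 1 replica gets a change of +2.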
`scale_up_delay` is the minimum time that must pass since the last scaling event (upscale or downscale) before the next upscale. `scale_down_delay` is the minimum time that must pass since the last scaling event (upscale or downscale) before the next downscale.
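
The two delays can be read as a cooldown gate applied to the requested change. The sketch below is an illustration under that reading, not dstack's implementation: `apply_delays` and its parameters are hypothetical, and it assumes the time of the last scaling event is tracked as a single timestamp.

```python
from datetime import datetime, timedelta


def apply_delays(delta, now, last_scaled_at, scale_up_delay, scale_down_delay):
    """Return `delta` if the relevant delay has elapsed, otherwise 0.

    `last_scaled_at` is the time of the last upscale or downscale event,
    or None if the service has never been scaled.
    """
    if delta == 0 or last_scaled_at is None:
        return delta
    elapsed = now - last_scaled_at
    if delta > 0 and elapsed < scale_up_delay:
        return 0  # too soon after the last scaling event to scale up
    if delta < 0 and elapsed < scale_down_delay:
        return 0  # too soon after the last scaling event to scale down
    return delta
```

Keeping the downscale delay longer than the upscale delay is a common way to react quickly to load spikes while avoiding rapid oscillation; whether that suits a given service is a tuning decision.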