Monitoring and Grafana
What this page covers
This page explains how to enable AsterDrive's built-in Prometheus metrics, how Prometheus scrapes them, and how to import the official Grafana dashboard.
If you are only doing a final pre-launch check, still focus on the "Production Security Boundary" section to confirm the metrics endpoint is not exposed directly to the public internet.
AsterDrive's Prometheus metrics are optional and are not enabled in the default build. Once enabled, the service registers:
GET /health/metricsunder the health check path. The output is Prometheus text exposition and covers HTTP, database, uploads and downloads, background tasks, storage drivers, authentication events, share download rollback, and process resources.
Enable the Metrics Feature
When building from source, use:
cargo build --release --features metricsOr use the build that includes all optional capabilities:
cargo build --release --features fullAfter startup, confirm the instance returns metrics:
curl http://<asterdrive-host>:3000/health/metricsIf it returns 404, the current binary usually was not built with the metrics feature.
If it returns HTML or a proxy error, check the reverse proxy path and backend port first.
Prometheus Scrape Configuration
When Prometheus and AsterDrive are on the same host or in the same network, scrape like this:
global:
scrape_interval: 15s
scrape_configs:
- job_name: asterdrive
metrics_path: /health/metrics
static_configs:
- targets:
- 127.0.0.1:3000If Prometheus runs in Docker and AsterDrive runs on the host, Docker Desktop usually uses:
global:
scrape_interval: 15s
scrape_configs:
- job_name: asterdrive
metrics_path: /health/metrics
static_configs:
- targets:
- host.docker.internal:3000On Linux Docker, if host.docker.internal is unavailable, add this to the Prometheus container:
extra_hosts:
- host.docker.internal:host-gatewayThen still use host.docker.internal:3000 as the target.
Docker Compose Example
Below is a minimal Prometheus + Grafana compose example for the scenario where Prometheus / Grafana run in Docker and AsterDrive listens on host port 3000. For production deployment, adjust the network, ports, persistent directories, and access control to match your environment.
services:
prometheus:
image: prom/prometheus:v3.7.3 # Example pinned to a verified version; validate config compatibility before upgrading in your deployment window.
command:
- --config.file=/etc/prometheus/prometheus.yml
- --storage.tsdb.path=/prometheus
- --web.enable-lifecycle
ports:
- "9090:9090"
volumes:
- ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
- prometheus-data:/prometheus
extra_hosts:
- host.docker.internal:host-gateway
grafana:
image: grafana/grafana:12.3.0 # Example pinned to a verified version; validate config compatibility before upgrading in your deployment window.
ports:
- "3300:3000"
environment:
- GF_SECURITY_ADMIN_USER=admin
- GF_SECURITY_ADMIN_PASSWORD=admin
volumes:
- grafana-data:/var/lib/grafana
depends_on:
- prometheus
volumes:
prometheus-data:
grafana-data:prometheus.yml:
global:
scrape_interval: 15s
evaluation_interval: 15s
scrape_configs:
- job_name: asterdrive
metrics_path: /health/metrics
static_configs:
- targets:
- host.docker.internal:3000After starting the example, open:
- Prometheus:
http://localhost:9090 - Grafana:
http://localhost:3300
The sample Grafana account is admin / admin. In production, change it to an independent strong password and restrict access sources for the Grafana admin entry.
Grafana Dashboard
The AsterDrive docs site provides a Grafana dashboard JSON that can be imported directly. Both the official docs site and self-hosted VitePress docs site distribute this file from the site root:
/grafana/asterdrive-overview.jsonFull URL on the online docs site:
https://asterdrive.docs.esap.cc/grafana/asterdrive-overview.jsonImport steps:
- Open Grafana.
- Go to
Dashboards -> New -> Import. - Paste the JSON URL above, or download the JSON and upload it.
- Select your Prometheus datasource.
- Import
AsterDrive Overview.
This dashboard provides:
- Overview: target health, process uptime, RSS, CPU, HTTP RPS, and 5xx ratio.
- HTTP: request count by route/status, p95/p99 latency, error ratio, and slow routes.
- Database: SeaORM query count, error count, p95 latency, and average latency.
- Storage drivers: operation count, not_found, hard failures, p95 latency, and average latency.
- Transfer: uploads, downloads, upload session lifecycle, and transfer failures.
- Background tasks: backlog, status transitions, retry, and share download rollback queue.
- Authentication: login and refresh token events, authentication failure ratio.
- Process: RSS, CPU core usage, uptime.
Dashboard queries are based on AsterDrive's current metric labels:
jobinstancemethodroutestatusbackendkinddriveroperationmodesourceoutcomerangeactionreasonevent
job and instance come from the Prometheus scrape target. They are not business labels written by AsterDrive itself.
Production Security Boundary
AsterDrive currently does not apply application-layer authentication to /health/metrics. This keeps Prometheus scraping simple and stable. Access control should be handled at the network boundary.
In production, you must ensure:
/health/metricsis not exposed directly to the public internet.- Only scrapers such as Prometheus, Grafana Agent, and VictoriaMetrics Agent can access it.
- The reverse proxy applies source IP restrictions to
/health/metrics, or it only listens on the internal network. - Do not protect metrics with normal browser login state. Scrapers should not depend on browser sessions.
Handle normal user entry points and the metrics endpoint separately in the reverse proxy. The public site can expose /health, but /health/metrics should only be visible to the monitoring network.
What to Watch
On the first day after launch, focus on:
- whether 5xx keeps increasing in
http_requests_total - whether p95/p99 in
http_request_duration_secondsrises over time - whether
db_queries_total{status="error"}is non-zero - whether
db_query_duration_secondsshows abnormal slow query distributions - whether
storage_driver_operations_total{status="failure",kind!="not_found"}is non-zero - whether
background_tasks_pendingkeeps accumulating - whether
background_task_retries_totalkeeps increasing - whether
share_download_rollback_pendingkeeps accumulating - whether
process_memory_rss_bytesgrows in only one direction and does not drop after idle periods
Observe storage_driver_operations_total{status="failure",kind="not_found"} separately. In thumbnail, cache, or object probing scenarios, not_found is often an expected miss. Alert rules should prioritize hard failures such as storage_driver_operations_total{status="failure",kind!="not_found"}.