Troubleshooting

Common issues, solutions, and debugging techniques for ScoutQuest deployment and operation.

Common Issues

Service Registration Problems

❌ Problem: "Connection refused" when registering service

Symptoms: Client throws connection errors when trying to register

Error: Request failed with status code ECONNREFUSED
at ScoutQuestClient.registerService (client.js:45)

// Or in Rust:
Error: Network error: Connection refused (os error 61)

Causes & Solutions:

  • ScoutQuest server not running: Start the server with ./scoutquest-server
  • Wrong server URL: Check if using correct host and port (default: http://localhost:8080)
  • Firewall blocking: Ensure port 8080 is open and accessible
  • Network connectivity: Test with curl http://localhost:8080/health
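
The checks above can also be run from code. The following is a minimal sketch, not part of the ScoutQuest client: `probeServer` is a hypothetical helper, it assumes Node 18+ (built-in `fetch`) and the `/health` endpoint shown above, and it accepts an injectable `fetchImpl` so it can be exercised without a network.

```javascript
// Hypothetical helper: classify why the ScoutQuest server cannot be reached.
async function probeServer(baseUrl, fetchImpl = fetch) {
  try {
    const res = await fetchImpl(`${baseUrl}/health`);
    return res.ok
      ? { reachable: true, detail: `health returned ${res.status}` }
      : { reachable: true, detail: `server answered with HTTP ${res.status}` };
  } catch (err) {
    // Node's fetch wraps the socket error; the error code sits on err.cause.
    const code = (err.cause && err.cause.code) || err.code || 'UNKNOWN';
    const hint = code === 'ECONNREFUSED'
      ? 'nothing is listening: is the server running on this port?'
      : code === 'ENOTFOUND'
        ? 'host name did not resolve: check the server URL'
        : 'check firewall rules and network connectivity';
    return { reachable: false, detail: `${code}: ${hint}` };
  }
}
```

Calling `probeServer('http://localhost:8080')` before registering gives a clearer message than the raw connection error.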

❌ Problem: Service registration succeeds but service not discoverable

Symptoms: Registration returns success but discoverService() fails

// Registration succeeds
const instance = await client.registerService('my-service', 'localhost', 3000);
console.log('Registered:', instance.id); // Works

// Discovery fails
const discovered = await client.discoverService('my-service'); // ServiceNotFoundError

Solutions:

  • Check service name spelling: Service names are case-sensitive
  • Verify health status: Service might be registered but marked unhealthy
  • Database issues: Check server logs for database connection errors
  • Cache issues: Clear discovery cache or wait for TTL expiration
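
The first two checks can be sketched as a small function run against the service list. This is illustrative only: `diagnoseDiscovery` is a hypothetical helper, and the input shape (an array of `{ name, instances: [{ id, status }] }`) is an assumption about what the server returns, not a documented format.

```javascript
// Assumed input shape (hypothetical):
// [{ name: 'my-service', instances: [{ id: 'a1', status: 'healthy' }] }]
function diagnoseDiscovery(services, wantedName) {
  const exact = services.find((s) => s.name === wantedName);
  if (!exact) {
    // Service names are case-sensitive, so look for a case-only mismatch.
    const nearMiss = services.find(
      (s) => s.name.toLowerCase() === wantedName.toLowerCase()
    );
    return nearMiss
      ? `name mismatch: registered as '${nearMiss.name}'`
      : 'not registered';
  }
  const healthy = exact.instances.filter((i) => i.status === 'healthy');
  return healthy.length === 0
    ? 'registered, but no healthy instances'
    : `ok: ${healthy.length} healthy instance(s)`;
}
```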

❌ Problem: Health check failures

Symptoms: Services registered but marked as unhealthy

// Health check endpoint not responding
GET http://localhost:3000/health → 404 Not Found

// Or timeout
Error: Health check timeout after 5000ms

Solutions:

  • Implement health endpoint: Add /health endpoint to your service
  • Check endpoint path: Ensure health check URL is correct
  • Verify service is running: Test service accessibility directly
  • Adjust timeout: Increase health check timeout in configuration

Service Discovery Issues

❌ Problem: "ServiceNotFoundError" for existing services

Symptoms: Service exists but cannot be discovered

ScoutQuestError: Service 'user-service' not found
    at ServiceDiscoveryClient.discoverService

Debug Steps:

# 1. List all services
curl http://localhost:8080/api/services

# 2. Check specific service instances
curl http://localhost:8080/api/services/user-service/instances

# 3. Verify service registration
curl -X POST http://localhost:8080/api/services \
  -H "Content-Type: application/json" \
  -d '{
    "name": "user-service",
    "host": "localhost",
    "port": 3000
  }'

❌ Problem: Load balancing not working

Symptoms: Discovery always returns the same instance; requests are not distributed across instances

Solutions:

  • Multiple instances: Ensure multiple healthy instances are registered
  • Check strategy: Verify load balancing strategy configuration
  • Clear cache: Discovery cache might return same cached instance

// Force fresh discovery (bypass cache)
const options = {
  bypassCache: true,
  loadBalancingStrategy: 'round_robin'
};
const instance = await client.discoverServiceWithOptions('user-service', options);
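
To see why a single healthy instance defeats any strategy, here is a minimal round-robin sketch. It is not ScoutQuest's implementation; `createRoundRobin` and the `status` field are assumptions made for illustration.

```javascript
// Round-robin over healthy instances only (illustrative sketch).
function createRoundRobin() {
  let next = 0;
  return function pick(instances) {
    const healthy = instances.filter((i) => i.status === 'healthy');
    if (healthy.length === 0) return null;
    const chosen = healthy[next % healthy.length];
    next += 1;
    return chosen;
  };
}
```

With two healthy instances the picks alternate. With one healthy and one unhealthy instance, every pick returns the same instance, which matches the symptom described above.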

HTTP Communication Problems

❌ Problem: HTTP requests through ScoutQuest fail

Symptoms: Direct calls work, but calls through discovery fail

// Direct call works
const direct = await fetch('http://localhost:3000/api/users');

// Through ScoutQuest fails
const viaScoutQuest = await client.getService('user-service', '/api/users'); // Error

Debug approach:

// 1. Verify service discovery works
const instance = await client.discoverService('user-service');
console.log('Discovered:', instance);

// 2. Test direct call to discovered instance
const directResponse = await fetch(`http://${instance.host}:${instance.port}/api/users`);
console.log('Direct call status:', directResponse.status);

// 3. Enable debug logging (set this where the client is constructed)
const debugClient = new ScoutQuestClient({
  serverUrl: 'http://localhost:8080',
  debug: true
});

❌ Problem: Timeout errors

Symptoms: Requests timeout inconsistently

ScoutQuestError: Request timeout after 5000ms

Solutions:

  • Increase timeout: Adjust client timeout configuration
  • Check service performance: Monitor target service response times
  • Network issues: Verify network connectivity and latency

// Increase timeout for specific calls
const response = await client.getService('slow-service', '/api/data', {
  timeout: 30000 // 30 seconds
});
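
When choosing a timeout, it helps to base it on measured response times rather than a guess. The sketch below assumes you have already collected latency samples in milliseconds. `percentile` and `suggestTimeout` are hypothetical helpers, and the 1.5x headroom is an arbitrary illustrative default.

```javascript
// Nearest-rank percentile of a list of latency samples (milliseconds).
function percentile(samplesMs, p) {
  if (samplesMs.length === 0) throw new Error('no samples');
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Suggest a timeout: 95th percentile plus some headroom (illustrative policy).
function suggestTimeout(samplesMs, headroom = 1.5) {
  return Math.ceil(percentile(samplesMs, 95) * headroom);
}
```

A large gap between the median and the 95th percentile usually points at the target service or the network rather than at the timeout value itself.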

Server-Side Issues

❌ Problem: ScoutQuest server won't start

Common error messages and solutions:

# Error: Address already in use
Error: Address already in use (os error 48)

Cause: Another process is already using port 8080. Stop that process or start ScoutQuest on a different port.

# Find the process using port 8080
lsof -i :8080
kill -9 <PID>

# Or use a different port
SCOUTQUEST_PORT=8081 ./scoutquest-server

# Error: Database connection failed
Error: Failed to connect to database: Connection refused

Solutions:

  • Start PostgreSQL: brew services start postgresql (macOS) or sudo systemctl start postgresql (Linux)
  • Create database: createdb scoutquest
  • Check connection string in config file

❌ Problem: High memory usage

Symptoms: Server memory consumption grows over time

Diagnostic steps:

# Check memory usage
ps aux | grep scoutquest-server

# Monitor over time
watch -n 5 'ps aux | grep scoutquest-server'

# Check metrics endpoint
curl http://localhost:8080/metrics | grep memory

Solutions:

  • Reduce cache size: Lower cache TTL and max size in configuration
  • Connection pool limits: Reduce database connection pool size
  • Service cleanup: Implement automatic deregistration of dead services
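
The service cleanup idea can be sketched as a periodic pruning step. This is an assumption-laden illustration, not the server's actual logic: `pruneStale` is a hypothetical helper and `lastHeartbeatMs` is a hypothetical field holding the time of the last successful health check.

```javascript
// Split instances into those still fresh and those whose last heartbeat is too old.
function pruneStale(instances, nowMs, maxAgeMs) {
  const kept = [];
  const removed = [];
  for (const inst of instances) {
    const isStale = nowMs - inst.lastHeartbeatMs > maxAgeMs;
    (isStale ? removed : kept).push(inst);
  }
  return { kept, removed };
}
```

Running a step like this on a timer and deregistering everything in `removed` keeps dead instances from accumulating in memory.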

Debugging Techniques

Enable Debug Logging

// Enable debug logging in JavaScript
const client = new ScoutQuestClient({
  serverUrl: 'http://localhost:8080',
  debug: true,
  logLevel: 'debug'
});

// Or set environment variable
process.env.SCOUTQUEST_DEBUG = 'true';
process.env.SCOUTQUEST_LOG_LEVEL = 'debug';

# Enable debug logging in Rust
RUST_LOG=debug cargo run

# Or limit it to the scoutquest crate
RUST_LOG=scoutquest_rust=debug cargo run

// In code
use tracing::{debug, info, warn, error};

let client = ServiceDiscoveryClient::builder()
    .server_url("http://localhost:8080")
    .enable_debug_logging(true)
    .build()?;

# Enable server debug logging
RUST_LOG=debug ./scoutquest-server

# Or via configuration
[logging]
level = "debug"
format = "text"  # Easier to read during debugging

Network Debugging

# Test ScoutQuest server connectivity
curl -v http://localhost:8080/health

# Test service registration
curl -X POST http://localhost:8080/api/services \
  -H "Content-Type: application/json" \
  -d '{
    "name": "test-service",
    "host": "localhost",
    "port": 3000,
    "metadata": {"version": "1.0.0"}
  }'

# Test service discovery
curl http://localhost:8080/api/services/test-service

# Test health checks
curl http://localhost:3000/health

# Monitor network traffic (lo0 is the macOS loopback interface; use lo on Linux)
sudo tcpdump -i lo0 port 8080

Database Debugging

# Connect to PostgreSQL database
psql -h localhost -U scoutquest -d scoutquest

-- Check services table
SELECT * FROM services;

-- Check service instances
SELECT * FROM service_instances;

-- Check health check status
SELECT si.*, hc.status, hc.last_check
FROM service_instances si
LEFT JOIN health_checks hc ON si.id = hc.service_instance_id;

Performance Analysis

# Server metrics
curl http://localhost:8080/metrics

# Specific metrics to check
curl -s http://localhost:8080/metrics | grep -E "(http_requests|service_discovery|health_check)"

# Load testing with Apache Bench
ab -n 1000 -c 10 http://localhost:8080/api/services

# Monitor with htop (-f matches the full command line; -d, joins multiple PIDs with commas)
htop -p "$(pgrep -d, -f scoutquest-server)"

Common Error Messages

Client Errors

ServiceNotFoundError: Service 'xyz' not found

Service is not registered or all instances are unhealthy

ServiceUnavailableError: Service 'xyz' is unavailable

Service registered but no healthy instances available

NetworkError: Connection refused

Cannot connect to ScoutQuest server

TimeoutError: Request timeout after Xms

Request took longer than configured timeout

CircuitBreakerOpenError: Circuit breaker is open

Too many failures, circuit breaker protecting service
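
One way to act on these errors in calling code is to map each one to a next step. The sketch below assumes the client errors expose the names listed above through `err.name`, which is an assumption rather than documented behavior. `nextAction` and its return values are hypothetical.

```javascript
// Errors that are usually transient and worth retrying (a judgment call).
const RETRYABLE = new Set(['ServiceUnavailableError', 'NetworkError', 'TimeoutError']);

// Map a client error to a suggested next step.
function nextAction(err) {
  if (RETRYABLE.has(err.name)) return 'retry-with-backoff';
  if (err.name === 'CircuitBreakerOpenError') return 'wait-for-breaker-reset';
  if (err.name === 'ServiceNotFoundError') return 'check-registration';
  return 'rethrow';
}
```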

Server Errors

Address already in use (os error 48)

Port 8080 is already in use by another process (os error 48 is the macOS code; Linux reports os error 98)

Failed to connect to database

PostgreSQL is not running or connection configuration is incorrect

Redis connection failed

Redis server is not accessible (if using Redis for caching)

TLS certificate error

Invalid or expired TLS certificates

Health Check Debugging

Manual Health Check Testing

# Test health endpoint directly
curl -v http://localhost:3000/health

# Expected response
HTTP/1.1 200 OK
Content-Type: application/json

{"status": "healthy", "timestamp": "2024-01-15T10:30:00Z"}

Health Check Implementation Examples

app.get('/health', (req, res) => {
  // Basic health check
  res.json({
    status: 'healthy',
    timestamp: new Date().toISOString(),
    uptime: process.uptime(),
    version: process.env.npm_package_version
  });
});

// Advanced health check with dependencies (register this instead of the basic handler above)
app.get('/health', async (req, res) => {
  const checks = {};
  let healthy = true;

  // Database check
  try {
    await db.query('SELECT 1');
    checks.database = 'healthy';
  } catch (error) {
    checks.database = 'unhealthy';
    healthy = false;
  }

  // Redis check
  try {
    await redis.ping();
    checks.redis = 'healthy';
  } catch (error) {
    checks.redis = 'unhealthy';
    healthy = false;
  }

  const status = healthy ? 200 : 503;
  res.status(status).json({
    status: healthy ? 'healthy' : 'unhealthy',
    checks,
    timestamp: new Date().toISOString()
  });
});

// Rust (axum). The pool types below (sqlx::PgPool, r2d2::Pool<redis::Client>)
// are assumed; substitute the types your service uses.
use axum::{http::StatusCode, response::Json, routing::get, Extension, Router};
use serde_json::{json, Value};

async fn health_check() -> Result<Json<Value>, StatusCode> {
    // Basic health check
    Ok(Json(json!({
        "status": "healthy",
        "timestamp": chrono::Utc::now().to_rfc3339(),
        "version": env!("CARGO_PKG_VERSION")
    })))
}

// Advanced health check
async fn advanced_health_check(
    Extension(db_pool): Extension<sqlx::PgPool>,
    Extension(redis_pool): Extension<r2d2::Pool<redis::Client>>,
) -> (StatusCode, Json<Value>) {
    let mut checks = serde_json::Map::new();
    let mut healthy = true;

    // Database check
    let db_status = match sqlx::query("SELECT 1").fetch_one(&db_pool).await {
        Ok(_) => "healthy",
        Err(_) => { healthy = false; "unhealthy" }
    };
    checks.insert("database".to_string(), json!(db_status));

    // Redis check
    let redis_status = match redis_pool.get() {
        Ok(mut conn) => {
            match redis::cmd("PING").query::<String>(&mut *conn) {
                Ok(_) => "healthy",
                Err(_) => { healthy = false; "unhealthy" }
            }
        }
        Err(_) => { healthy = false; "unhealthy" }
    };
    checks.insert("redis".to_string(), json!(redis_status));

    let status_code = if healthy { StatusCode::OK } else { StatusCode::SERVICE_UNAVAILABLE };

    // Return the status code with the body so an unhealthy service answers 503, not 200
    (status_code, Json(json!({
        "status": if healthy { "healthy" } else { "unhealthy" },
        "checks": checks,
        "timestamp": chrono::Utc::now().to_rfc3339()
    })))
}

Performance Issues

❌ Problem: Slow service discovery

Solutions:

  • Enable caching: Use discovery cache with appropriate TTL
  • Database optimization: Ensure proper indexes on services table
  • Connection pooling: Use database connection pooling

// Enable client-side caching
const client = new ScoutQuestClient({
  serverUrl: 'http://localhost:8080',
  cache: {
    enabled: true,
    ttl: 60000, // 1 minute
    maxSize: 1000
  }
});
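
For intuition about what `ttl` and `maxSize` control, here is a minimal TTL cache sketch. It is not the ScoutQuest client's cache: `TtlCache` is a hypothetical class, the eviction policy (drop the oldest entry) is an assumption, and `now` is injectable only so expiry can be tested without waiting.

```javascript
class TtlCache {
  constructor({ ttl, maxSize, now = Date.now }) {
    this.ttl = ttl;
    this.maxSize = maxSize;
    this.now = now;
    this.entries = new Map();
  }

  get(key) {
    const hit = this.entries.get(key);
    if (!hit) return undefined;
    if (this.now() >= hit.expiresAt) {
      this.entries.delete(key); // expired: force a fresh discovery
      return undefined;
    }
    return hit.value;
  }

  set(key, value) {
    this.entries.delete(key); // re-insert so the key counts as newest
    if (this.entries.size >= this.maxSize) {
      // Map iterates in insertion order, so the first key is the oldest.
      this.entries.delete(this.entries.keys().next().value);
    }
    this.entries.set(key, { value, expiresAt: this.now() + this.ttl });
  }
}
```

A longer TTL reduces load on the server but delays how quickly clients notice unhealthy instances, so the value is a trade-off rather than something to maximize.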

❌ Problem: High latency HTTP calls

Diagnostic approach:

# Measure different components
time curl http://localhost:8080/api/services/user-service  # Discovery time
time curl http://localhost:3000/api/users  # Direct service time
time scoutquest-client get user-service /api/users  # Through ScoutQuest

Solutions:

  • Keep-alive connections: Enable HTTP keep-alive
  • Connection pooling: Reuse HTTP connections
  • Local instances: Prefer geographically closer instances
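
Keep-alive and connection reuse can be sketched with Node's built-in `http.Agent`. This shows the general technique for direct calls to a service; whether and how the ScoutQuest client exposes an agent option is not covered here, and `getWithKeepAlive` and the `maxSockets` value are illustrative assumptions.

```javascript
const http = require('http');

// Reuse TCP connections between calls instead of opening one per request.
const keepAliveAgent = new http.Agent({
  keepAlive: true,
  maxSockets: 50 // cap on concurrent sockets per host (illustrative value)
});

// Hypothetical helper: GET a URL through the shared agent and resolve with the status code.
function getWithKeepAlive(url) {
  return new Promise((resolve, reject) => {
    http
      .get(url, { agent: keepAliveAgent }, (res) => {
        res.resume(); // drain the body so the socket returns to the pool
        res.on('end', () => resolve(res.statusCode));
      })
      .on('error', reject);
  });
}
```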

Getting Help

Before Opening an Issue

  1. Check this troubleshooting guide
  2. Search existing GitHub issues
  3. Enable debug logging and gather relevant logs
  4. Prepare a minimal reproducible example
  5. Include your environment details (OS, ScoutQuest version, etc.)