Troubleshooting
Common issues and solutions when working with AI Agents Management.
WebSocket Connection Issues
Problem: "WebSocket closed (1006)" Error
Symptoms:
- Agent instances show "Offline" status
- Configuration changes don't take effect
- Real-time updates not working
Solutions:
- Check WebSocket URL: Ensure the WebSocket endpoint is accessible
- Verify Authentication: Make sure JWT tokens are valid and not expired
- Check Network: Verify firewall settings allow WebSocket connections
- Restart Instance: Try stopping and starting the agent instance
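When reading the error, it helps to know that close code 1006 is never sent by the server. RFC 6455 reserves it for the local endpoint to report that the connection dropped without a close frame, which usually points at the network, a proxy, or a crashed process rather than an application-level rejection. The sketch below maps a few standard codes to hints for log output; describeCloseCode is a hypothetical helper, not part of the product.

```javascript
// Meanings taken from RFC 6455, section 7.4.1. Only a few common codes are listed.
const CLOSE_CODE_MEANINGS = {
  1000: "normal closure",
  1001: "endpoint going away (for example server shutdown or page navigation)",
  1006: "abnormal closure: the connection dropped without a close frame",
  1008: "policy violation",
  1011: "server encountered an unexpected condition",
};

// Hypothetical helper: turns a close code into a readable hint for logs.
function describeCloseCode(code) {
  return CLOSE_CODE_MEANINGS[code] ?? `unrecognized close code ${code}`;
}

console.log(describeCloseCode(1006));
```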
Problem: "No Active WebSocket" Error
Symptoms:
- Cannot send configuration updates
- Instance appears disconnected
Solutions:
- Wait for Reconnection: WebSocket connections auto-reconnect every 30 seconds
- Check Instance Status: Ensure the instance is in "Running" state
- Verify DO_SYNC_SECRET: Check that the secret is properly configured
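Because reconnection happens on a 30-second cycle, a client can simply poll for a little longer than that before sending an update. The following is a minimal sketch of that idea. The isConnected callback is an assumption: you would supply it yourself (for example by checking the instance status), since this guide does not name an API for it.

```javascript
// Hypothetical helper: polls isConnected() until it returns true or the timeout passes.
// The default timeout is slightly longer than the 30-second reconnect interval.
async function waitForReconnect(isConnected, { timeoutMs = 35000, pollMs = 1000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await isConnected()) return true;
    // Pause between checks so the loop does not spin.
    await new Promise((resolve) => setTimeout(resolve, pollMs));
  }
  return false; // still disconnected when the timeout expired
}
```

If the function resolves to false, the connection did not come back within one reconnect cycle, and the remaining checks in the list above are the next step.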
Agent Instance Issues
Problem: Instance Stuck in "Starting" State
Symptoms:
- Instance shows "Starting" for more than 5 minutes
- No logs appear in the system
Solutions:
- Check Agent Code: Verify the agent code doesn't have infinite loops or blocking operations
- Review Dependencies: Ensure all required packages are available
- Check Memory Limits: Verify the instance has sufficient memory
- Restart Instance: Stop and start the instance again
Problem: Instance Shows "Error" Status
Symptoms:
- Red error indicator in the dashboard
- Agent stops responding
Solutions:
- Check System Logs: Go to Admin Panel → System Logs to view error details
- Review Agent Code: Look for runtime errors or exceptions
- Verify Configuration: Check environment variables and settings
- Check Resource Usage: Ensure CPU and memory usage are within limits
Scheduler Issues
Problem: Cron Expression Validation Fails
Symptoms:
- "Invalid cron expression" error message
- Cannot save scheduler configuration
Solutions:
- Check Format: Ensure you're using either 5-field or 6-field format consistently
- Validate Syntax: Use online cron validators to check your expression
- Common Mistakes:
  - Using */5 in 5-field format (should be 0 */5 * * *)
  - Invalid day of week values (use 0-7, not 1-7)
  - Invalid month values (use 1-12, not 0-11)
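The field-count and range mistakes listed above can be caught before saving with a small local check. The sketch below covers 5-field expressions only and understands numbers, *, ranges, lists, and steps. It is an illustration, not the validator the product uses, so it may accept or reject expressions differently (for example, it does not handle month or weekday names).

```javascript
// Allowed numeric range for each field of a 5-field cron expression.
const FIELD_RANGES = [
  { name: "minute", min: 0, max: 59 },
  { name: "hour", min: 0, max: 23 },
  { name: "day of month", min: 1, max: 31 },
  { name: "month", min: 1, max: 12 },
  { name: "day of week", min: 0, max: 7 }, // 0 and 7 both mean Sunday
];

// Checks one comma-separated part such as "*", "*/5", "10", "1-5" or "1-5/2".
function isValidPart(part, { min, max }) {
  const [range, step, ...extra] = part.split("/");
  if (extra.length > 0) return false;
  if (step !== undefined && !(/^\d+$/.test(step) && Number(step) > 0)) return false;
  if (range === "*") return true;
  const bounds = range.split("-");
  if (bounds.length > 2 || !bounds.every((b) => /^\d+$/.test(b))) return false;
  const numbers = bounds.map(Number);
  if (numbers.some((n) => n < min || n > max)) return false;
  return numbers.length === 1 || numbers[0] <= numbers[1];
}

// Returns a list of problems; an empty list means the expression passed.
function validateCron5(expression) {
  const fields = expression.trim().split(/\s+/);
  if (fields.length !== 5) {
    return [`expected 5 fields, got ${fields.length}`];
  }
  const errors = [];
  fields.forEach((field, i) => {
    const ok = field.split(",").every((part) => isValidPart(part, FIELD_RANGES[i]));
    if (!ok) errors.push(`invalid ${FIELD_RANGES[i].name} field: "${field}"`);
  });
  return errors;
}

console.log(validateCron5("*/5"));         // one error: wrong field count
console.log(validateCron5("0 0 * 0 *"));   // one error: month 0 is out of range
console.log(validateCron5("0 */5 * * *")); // no errors
```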
Problem: Scheduled Tasks Not Running
Symptoms:
- Cron expression is valid but tasks don't execute
- No execution logs in the system
Solutions:
- Check Instance Status: Ensure the instance is "Running"
- Verify Timezone: All cron expressions are evaluated in UTC, so convert local times to UTC before scheduling
- Check Next Execution: Verify the next scheduled time is correct
- Review Agent Code: Ensure the agent handles scheduled execution properly
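Since expressions run in UTC, a task meant for a local wall-clock time has to be shifted by the zone's offset. The sketch below shows the arithmetic for a daily task, assuming a fixed offset in minutes. dailyCronInUtc is a hypothetical helper, and it deliberately ignores daylight saving time, so a zone that changes offset during the year needs the expression updated accordingly.

```javascript
// Hypothetical helper: builds a daily 5-field cron expression in UTC
// from a local time and a fixed UTC offset (in minutes, e.g. 120 for UTC+2).
function dailyCronInUtc(localHour, localMinute, utcOffsetMinutes) {
  const minutesInDay = 24 * 60;
  const localTotal = localHour * 60 + localMinute;
  // UTC = local time minus the zone's offset; wrap the result into 0..1439.
  const utcTotal =
    (((localTotal - utcOffsetMinutes) % minutesInDay) + minutesInDay) % minutesInDay;
  return `${utcTotal % 60} ${Math.floor(utcTotal / 60)} * * *`;
}

console.log(dailyCronInUtc(9, 0, 120));  // 09:00 at UTC+2 → "0 7 * * *"
console.log(dailyCronInUtc(9, 0, -300)); // 09:00 at UTC-5 → "0 14 * * *"
```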
Performance Issues
Problem: High CPU Usage
Symptoms:
- Instance shows high CPU time in metrics
- Slow response times
Solutions:
- Optimize Agent Code: Review loops and recursive functions
- Add Delays: Use await new Promise(resolve => setTimeout(resolve, 1000)) to add delays
- Batch Operations: Process data in smaller chunks
- Check External APIs: Verify third-party services aren't causing delays
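The delay and batching suggestions combine naturally: split the work into chunks and pause between them so a long job does not hold the CPU continuously. The following is a minimal sketch under that assumption. The names chunk and processInBatches, and the default batch size and pause, are illustrative choices rather than platform settings.

```javascript
// Promise-based pause, the same pattern as the one-line delay above.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Splits an array into consecutive slices of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Runs `handler` on every item, pausing after each batch.
async function processInBatches(items, handler, { batchSize = 50, pauseMs = 1000 } = {}) {
  const results = [];
  for (const batch of chunk(items, batchSize)) {
    for (const item of batch) {
      results.push(await handler(item));
    }
    await sleep(pauseMs); // give the runtime a break between batches
  }
  return results;
}
```

A call such as processInBatches(records, handleRecord, { batchSize: 20, pauseMs: 500 }) would process 20 records, wait half a second, and continue; tune both values against the CPU metrics you observe.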
Problem: Memory Leaks
Symptoms:
- Instance memory usage keeps increasing
- Instance becomes unresponsive over time
Solutions:
- Review Variable Scope: Release references to objects you no longer need so they can be garbage-collected
- Close Connections: Always close database and HTTP connections
- Clear Intervals: Use clearInterval() and clearTimeout() properly
- Monitor Objects: Avoid keeping large objects in memory unnecessarily
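One way to make timer cleanup hard to forget is to create every timer through a single object and clear them all in one call when the agent shuts down. The sketch below illustrates the pattern; TimerRegistry is a hypothetical class written for this example, built only on the standard setInterval, setTimeout, clearInterval, and clearTimeout functions.

```javascript
// Hypothetical helper: tracks timers so they can all be cleared together.
class TimerRegistry {
  constructor() {
    this.intervals = new Set();
    this.timeouts = new Set();
  }

  // Starts a repeating timer and remembers its id.
  every(fn, ms) {
    const id = setInterval(fn, ms);
    this.intervals.add(id);
    return id;
  }

  // Starts a one-shot timer; it removes itself from the registry once it fires.
  after(fn, ms) {
    const id = setTimeout(() => {
      this.timeouts.delete(id);
      fn();
    }, ms);
    this.timeouts.add(id);
    return id;
  }

  // Clears every pending timer and empties the registry.
  dispose() {
    for (const id of this.intervals) clearInterval(id);
    for (const id of this.timeouts) clearTimeout(id);
    this.intervals.clear();
    this.timeouts.clear();
  }
}
```

Calling dispose() from the agent's shutdown or error path releases the callbacks and anything they reference, which is often where slow memory growth comes from.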
Authentication Issues
Problem: "Unauthorized" Errors
Symptoms:
- Cannot access instance details
- API calls return 401 errors
Solutions:
- Check JWT Token: Verify the token is valid and not expired
- Verify Permissions: Ensure your account has access to the instance
- Re-login: Try logging out and logging back in
- Check CLI Authentication: For CLI users, run lifectl auth login
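To check whether a token has expired without guessing, you can decode its payload and compare the exp claim, which RFC 7519 defines as seconds since the Unix epoch, with the current time. The sketch below does only that. It does not verify the signature, so it is a diagnostic aid and not an authentication check, and both function names are hypothetical. It relies on the atob and TextDecoder globals available in browsers and recent Node.js versions.

```javascript
// Decodes the payload (second segment) of a JWT without verifying the signature.
function decodeJwtPayload(token) {
  const parts = token.split(".");
  if (parts.length !== 3) throw new Error("not a three-part JWT");
  // JWTs use base64url: swap the URL-safe characters and restore padding.
  const base64 = parts[1].replace(/-/g, "+").replace(/_/g, "/");
  const padded = base64.padEnd(base64.length + ((4 - (base64.length % 4)) % 4), "=");
  const bytes = Uint8Array.from(atob(padded), (c) => c.charCodeAt(0));
  return JSON.parse(new TextDecoder().decode(bytes));
}

// Returns true when the token's exp claim is at or before `nowSeconds`.
function isJwtExpired(token, nowSeconds = Math.floor(Date.now() / 1000)) {
  const { exp } = decodeJwtPayload(token);
  if (typeof exp !== "number") return false; // no exp claim, so nothing to expire
  return nowSeconds >= exp;
}
```

If isJwtExpired returns true, logging in again is the fix; if it returns false and requests still return 401, the permissions check in the list above is the more likely cause.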
Data Issues
Problem: Instance Data Not Persisting
Symptoms:
- Configuration changes are lost after restart
- Instance state resets unexpectedly
Solutions:
- Check Durable Objects: Verify DO storage is working properly
- Review Save Operations: Ensure data is properly saved in agent code
- Check Storage Limits: Verify storage quotas aren't exceeded
- Use Proper APIs: Use the correct storage APIs for persistence
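A quick way to rule out the agent code is to await every write and read the value back immediately. The sketch below assumes a storage object with asynchronous get(key) and put(key, value) methods, which is the general shape of key-value storage such as Durable Object storage; check your runtime's documentation for the exact interface. MemoryStorage and saveConfig are hypothetical names, and the in-memory class exists only so the example runs on its own.

```javascript
// In-memory stand-in for a persistent key-value store (assumed async get/put).
class MemoryStorage {
  constructor() {
    this.map = new Map();
  }
  async get(key) {
    return this.map.get(key);
  }
  async put(key, value) {
    // Store a copy so later mutations of `value` do not change what was saved.
    this.map.set(key, JSON.parse(JSON.stringify(value)));
  }
}

// Hypothetical helper: writes the config, then reads it back to confirm it was stored.
async function saveConfig(storage, config) {
  // Await the write so a failure surfaces here instead of going unnoticed.
  await storage.put("config", config);
  const stored = await storage.get("config");
  if (JSON.stringify(stored) !== JSON.stringify(config)) {
    throw new Error("config was not persisted as written");
  }
  return stored;
}
```

If the read-back succeeds but the value is gone after a restart, the problem is more likely in the storage layer or its quotas than in the save logic.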
Getting Help
If you continue to experience issues:
- Check System Logs: Always review the logs first for error details
- Document the Issue: Note the exact error messages and steps to reproduce
- Check Instance Details: Include instance ID, agent version, and configuration
- Contact Support: Provide all relevant information when requesting help
Preventive Measures
- Regular Monitoring: Check instance status and logs regularly
- Resource Limits: Set appropriate CPU and memory limits
- Error Handling: Implement proper error handling in agent code
- Testing: Test agents thoroughly before deploying to production
- Backup Configuration: Keep backups of important configurations