AWS re:Invent 2025 - Chaos & Continuity: Using Gen AI to improve humanitarian workload resilience
At AWS re:Invent 2025, Mike George presented an approach to cloud resilience aimed at humanitarian workloads, which have to work when they are needed most. He frames it with the acronym SEEMS: Single points of failure, Excessive load, Excessive latency, Misconfiguration & bugs, and Shared fate. These are the five key principles he uses for building robust systems. Resilience is not just a matter of adding more instances; it means understanding and mitigating each of these failure modes. The talk treats planning for failure as a core tenet.
The highlight? A live demo of an agentic resilience advisor built on Amazon Bedrock and the Strands Agents SDK. The agent analyzes AWS workloads against the SEEMS principles using three tools: "use AWS" (to inspect resources), "calculate letter grade" (for a quick resilience assessment), and "AWS documentation MCP server" (for detailed insights). It produces architecture diagrams in Mermaid format, offers observability recommendations, and generates operational runbooks tailored to specific RTO and RPO targets. The result is an automated resilience assessment that comes with actionable steps for improvement.
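The summary does not say how the "calculate letter grade" tool arrives at its grades, so the following is only a minimal sketch of one way such a tool could work. The Finding type, the 1 to 3 severity scale, the 10-point penalty, and the grade thresholds are all assumptions made for illustration, not details from the talk.

```python
from dataclasses import dataclass

# The five SEEMS categories named in the talk.
SEEMS = (
    "Single points of failure",
    "Excessive load",
    "Excessive latency",
    "Misconfiguration & bugs",
    "Shared fate",
)


@dataclass
class Finding:
    """A hypothetical resilience finding; not a type from the demo."""
    category: str  # one of the SEEMS categories
    severity: int  # assumed scale: 1 (minor) to 3 (critical)


def score_to_letter(score: int) -> str:
    """Map a 0-100 score to a letter using assumed thresholds."""
    for threshold, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if score >= threshold:
            return letter
    return "F"


def calculate_letter_grades(findings, penalty_per_point=10):
    """Grade each SEEMS category: start at 100, subtract a penalty per severity point."""
    grades = {}
    for category in SEEMS:
        severity_total = sum(f.severity for f in findings if f.category == category)
        score = max(0, 100 - penalty_per_point * severity_total)
        grades[category] = score_to_letter(score)
    return grades


# Example input: two made-up findings for an imaginary workload.
findings = [
    Finding("Single points of failure", 3),
    Finding("Misconfiguration & bugs", 2),
]
for category, grade in calculate_letter_grades(findings).items():
    print(f"{category}: {grade}")
```

In an agent setup, a function like this would typically be registered as a tool so the model can call it after inspecting resources. Keeping the scoring deterministic and outside the model makes the grades repeatable from one run to the next.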
In the demo, the agent assesses a workload, assigns resilience grades, and generates a Mermaid diagram of the architecture. It then gives concrete observability suggestions, such as implementing comprehensive CloudWatch alarms, enabling X-Ray for tracing, and setting up enhanced logging with CloudWatch Logs Insights, and it links directly to the relevant documentation pages. Finally, the agent writes a runbook for recovering from operational events, with steps for individual components such as Lambda functions and S3 buckets. The demo shows how Gen AI can turn resilience planning from a theoretical exercise into a practical, automated process.
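To make the Mermaid output concrete, here is a small sketch of how a list of discovered resources could be rendered as a Mermaid flowchart. In the demo the agent writes the diagram itself; the to_mermaid helper, the node IDs, and the example architecture below are hypothetical and only illustrate the output format.

```python
def to_mermaid(nodes, edges, direction="LR"):
    """Render nodes and edges as Mermaid flowchart text.

    nodes: dict mapping a node ID to its display label
    edges: list of (source_id, target_id) pairs
    """
    lines = [f"flowchart {direction}"]
    for node_id, label in nodes.items():
        lines.append(f'    {node_id}["{label}"]')
    for source, target in edges:
        lines.append(f"    {source} --> {target}")
    return "\n".join(lines)


# Hypothetical workload: an API in front of a Lambda function that writes to S3.
nodes = {
    "api": "API Gateway",
    "fn": "Lambda function",
    "bucket": "S3 bucket",
}
edges = [("api", "fn"), ("fn", "bucket")]

print(to_mermaid(nodes, edges))
```

The printed text can be pasted into any Mermaid renderer to view the diagram.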
📰 Original article: https://dev.to/kazuya_dev/aws-reinvent-2025-chaos-continuity-using-gen-ai-to-improve-humanitarian-workload-resilience-4924
This content has been curated and summarized for Code Crafts readers.