Course Outline
Introduction to AIOps
- Understanding AIOps and its significance.
- Contrasting traditional monitoring with AIOps-driven observability.
- Exploring AIOps architecture and essential components.
Collecting and Normalizing Operational Data
- Identifying types of observability data: metrics, logs, and traces.
- Ingesting data from diverse sources such as servers, containers, and cloud environments.
- Utilizing agents and exporters like Prometheus, Beats, and Fluentd.
Data Correlation and Anomaly Detection
- Employing time series correlation and statistical methods.
- Applying ML models for effective anomaly detection.
- Identifying incidents across distributed systems.
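As a rough illustration of the statistical methods listed above, the sketch below flags points in a metric series that deviate sharply from a trailing window. The function name `zscore_anomalies`, the window size, the threshold, and the sample latency values are all assumptions chosen for illustration; they are not taken from the course material, and production AIOps tools typically use more robust models.

```python
from statistics import mean, stdev


def zscore_anomalies(values, window=5, threshold=3.0):
    """Return indices whose value lies more than `threshold` sample
    standard deviations from the mean of the preceding `window` points."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # A perfectly flat history gives no scale to compare against; skip it.
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical latency samples in milliseconds with one obvious spike.
latency_ms = [100, 102, 98, 101, 99, 100, 250, 101, 100]
print(zscore_anomalies(latency_ms))  # prints [6]
```

Note that the spike also inflates the standard deviation of the windows that follow it, which is one reason a plain z-score is only a starting point.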
Alerting and Noise Reduction
- Designing intelligent alert rules and thresholds.
- Implementing suppression, deduplication, and alert grouping strategies.
- Integrating with platforms such as Alertmanager, Slack, PagerDuty, or Opsgenie.
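Deduplication and grouping from the list above can be sketched in a few lines. This is a simplified illustration, not how Alertmanager or any other listed platform implements it; the `group_alerts` function, the label names, and the sample alerts are hypothetical.

```python
from collections import defaultdict


def group_alerts(alerts, group_by=("service", "alertname")):
    """Drop alerts with identical label sets, then bucket the rest
    by the values of the labels named in `group_by`."""
    seen = set()
    groups = defaultdict(list)
    for alert in alerts:
        # The full, sorted label set acts as the alert's fingerprint.
        fingerprint = tuple(sorted(alert["labels"].items()))
        if fingerprint in seen:
            continue  # exact duplicate: suppress it
        seen.add(fingerprint)
        key = tuple(alert["labels"].get(name, "") for name in group_by)
        groups[key].append(alert)
    return dict(groups)


# Hypothetical incoming alerts; the second repeats the first.
alerts = [
    {"labels": {"alertname": "HighLatency", "service": "checkout", "instance": "web-1"}},
    {"labels": {"alertname": "HighLatency", "service": "checkout", "instance": "web-1"}},
    {"labels": {"alertname": "HighLatency", "service": "checkout", "instance": "web-2"}},
    {"labels": {"alertname": "DiskFull", "service": "db", "instance": "db-1"}},
]

grouped = group_alerts(alerts)
for key, members in grouped.items():
    print(key, len(members))
```

Four raw alerts collapse into two notifications here, one per service and alert name, which is the noise reduction the grouping strategy aims for.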
Root Cause Analysis and Visualization
- Utilizing dashboards to visualize metrics and identify trends.
- Examining events and timelines to facilitate RCA (Root Cause Analysis).
- Tracing issues across layers using distributed tracing tools.
Automation and Remediation
- Triggering automated scripts or workflows in response to incidents.
- Integrating with ITSM systems like ServiceNow and Jira.
- Reviewing use cases such as self-healing, scaling, and traffic rerouting.
Open Source and Commercial AIOps Platforms
- Surveying tools such as Prometheus, Grafana, ELK, Moogsoft, and Dynatrace.
- Establishing evaluation criteria for selecting an appropriate AIOps platform.
- Participating in a demo and hands-on session with a selected stack.
Summary and Next Steps
Requirements
- A foundational understanding of IT operations and system monitoring concepts.
- Prior experience with monitoring tools or dashboards.
- Familiarity with basic log and metric formats.
Audience
- Operations teams managing infrastructure and applications.
- Site Reliability Engineers (SREs).
- Teams focused on IT monitoring and observability.
14 Hours