CloudWatch

When ZMON runs on AWS, you can use cloudwatch() to access AWS metrics easily.

cloudwatch(region=None, assume_role_arn=None)

Initialize the CloudWatch wrapper.

Parameters:
  • region (str) – AWS region for CloudWatch queries. Will be auto-detected if not supplied.
  • assume_role_arn (str) – AWS IAM role ARN to be assumed. This can be useful in cross-account CloudWatch queries.

Methods of CloudWatch

query_one(dimensions, metric_name, statistics, namespace, period=60, minutes=5, start=None, end=None, extended_statistics=None)

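Query a single AWS CloudWatch metric and return a scalar value (float), or a dict if more than one value is returned, e.g. when extended_statistics are requested. The metric is aggregated using the provided statistics over the time window set by minutes (default: the last five minutes), or by start and end if supplied.

This method is a lower-level variant of the query method: all parameters, including all dimensions, need to be known.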

Parameters:
  • dimensions (dict) – CloudWatch dimensions. Example: {'LoadBalancerName': 'my-elb-name'}
  • metric_name (str) – CloudWatch metric. Example: 'Latency'
  • statistics (str or list) – CloudWatch metric statistics. Example: 'Sum'
  • namespace (str) – CloudWatch namespace. Example: 'AWS/ELB'
  • period (int) – CloudWatch statistics granularity in seconds. Default is 60.
  • minutes (int) – Used to determine the start time of the CloudWatch query. Default is 5. Ignored if start is supplied.
  • start (int) – CloudWatch start timestamp. Default is None.
  • end (int) – CloudWatch end timestamp. Default is None. If not supplied, the end time is now.
  • extended_statistics (list) – CloudWatch ExtendedStatistics for percentile queries. Example: ['p95', 'p99']
Returns:

A float if a single value is returned, otherwise a dict keyed by statistic name.

Return type:

float, dict
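The way minutes, start and end interact can be summarized in a few lines. This is a sketch of the documented defaults only, not ZMON's implementation: the helper resolve_window is hypothetical, timestamps are assumed to be Unix epoch seconds, and when only end is supplied the start is assumed to be counted back from now (the parameter list does not say).

```python
import time
from typing import Optional, Tuple


def resolve_window(minutes: int = 5,
                   start: Optional[int] = None,
                   end: Optional[int] = None,
                   now: Optional[int] = None) -> Tuple[int, int]:
    """Return (start, end) as epoch seconds, mirroring the documented defaults.

    Hypothetical helper: not part of ZMON's API.
    """
    if now is None:
        now = int(time.time())
    # end defaults to "now" when it is not supplied
    resolved_end = end if end is not None else now
    # minutes is only used when start is not supplied
    # (assumption: counted back from now, not from end)
    resolved_start = start if start is not None else now - minutes * 60
    return resolved_start, resolved_end


# Default window: the last five minutes
print(resolve_window(now=1_700_000_000))  # (1699999700, 1700000000)
# An explicit start takes precedence over minutes
print(resolve_window(minutes=30, start=1_699_990_000, now=1_700_000_000))  # (1699990000, 1700000000)
```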

Example query with percentiles for AWS ALB:

cloudwatch().query_one({'LoadBalancer': 'app/my-alb/1234'}, 'TargetResponseTime', 'Average', 'AWS/ApplicationELB', extended_statistics=['p95', 'p99', 'p99.45'])
{
    'Average': 0.224,
    'p95': 0.245,
    'p99': 0.300,
    'p99.45': 0.500
}
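Because query_one() returns a float for a single value but a dict when extended statistics are requested, code consuming its result may want a single shape. A minimal sketch, using the values from the example output; the helper name as_dict is hypothetical and not part of ZMON.

```python
from typing import Dict, Union


def as_dict(result: Union[float, Dict[str, float]], statistic: str) -> Dict[str, float]:
    """Normalize a query_one() result to a dict keyed by statistic name.

    Hypothetical helper: wraps a plain float under the given statistic name
    and copies a dict result unchanged.
    """
    if isinstance(result, dict):
        return dict(result)
    return {statistic: float(result)}


# A single statistic comes back as a plain float ...
print(as_dict(0.224, 'Average'))  # {'Average': 0.224}
# ... while a percentile query already comes back as a dict
print(as_dict({'Average': 0.224, 'p95': 0.245}, 'Average'))  # {'Average': 0.224, 'p95': 0.245}
```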

Note

In very rare cases, e.g. for ELB metrics, you may see only 1/2, 1/3 or 2/3 of the expected value in ZMON. This is caused by a race condition: not all data points may be present in CloudWatch yet when the check runs. To fix this, click “evaluate” on the alert; this triggers the check and moves its execution to a new start time.
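Another way to reduce the chance of partial data, sketched here under assumptions, is to query only periods that have already closed: align the window to period boundaries and end it one period in the past, then pass the pair as start and end to query_one(). The helper complete_window and the one-period lag are hypothetical choices, the sketch assumes start and end accept Unix epoch seconds (the int type listed for query_one), and it lowers rather than eliminates the risk of late-arriving data.

```python
import time
from typing import Optional, Tuple


def complete_window(period: int = 60, minutes: int = 5, lag_periods: int = 1,
                    now: Optional[int] = None) -> Tuple[int, int]:
    """Return (start, end) epoch seconds covering only closed periods.

    Hypothetical helper: end is rounded down to a period boundary and moved
    back by lag_periods, so the newest, possibly incomplete, period is skipped.
    """
    if now is None:
        now = int(time.time())
    end = (now // period) * period - lag_periods * period
    start = end - minutes * 60
    return start, end


print(complete_window(now=1_700_000_030))  # (1699999620, 1699999920)
```

The trade-off is freshness: with a one-period lag, the newest data point the check sees is at least one period old.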

query(dimensions, metric_name, statistics='Sum', namespace=None, period=60, minutes=5)

Query AWS CloudWatch for metrics. Metrics will be aggregated over the last five minutes using the provided aggregation type (default “Sum”).

dimensions is a dictionary to filter the metrics to query. See the list_metrics boto documentation. You can provide the special value “NOT_SET” for a dimension to only query metrics where the given key is not set. This is useful e.g. for ELB metrics, which are available both per AZ (“AvailabilityZone” has a value) and aggregated over all AZs (“AvailabilityZone” not set). Additionally, you can include the special “*” character in a dimension value for fuzzy (shell-glob) matching.
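The filter semantics can be illustrated with a small matcher over metric descriptions shaped like list_metrics results (dimensions as Name/Value pairs). This is a sketch of the described behavior, not ZMON's code: the function matches is hypothetical, and it assumes plain values must match exactly, values containing “*” use shell-style globbing (Python's fnmatchcase), “NOT_SET” requires the dimension to be absent, and dimensions not named in the filter are ignored.

```python
from fnmatch import fnmatchcase
from typing import Dict, List


def matches(metric: dict, dimensions: Dict[str, str]) -> bool:
    """Return True if a list_metrics-style metric satisfies the filter.

    Hypothetical helper illustrating the documented semantics.
    """
    actual = {d['Name']: d['Value'] for d in metric.get('Dimensions', [])}
    for name, wanted in dimensions.items():
        if wanted == 'NOT_SET':
            # the dimension must be absent
            if name in actual:
                return False
        elif name not in actual:
            return False
        elif '*' in wanted:
            # shell-style glob, case-sensitive on every platform
            if not fnmatchcase(actual[name], wanted):
                return False
        elif actual[name] != wanted:
            return False
    return True


metrics: List[dict] = [
    # aggregated over all AZs: no AvailabilityZone dimension
    {'MetricName': 'RequestCount',
     'Dimensions': [{'Name': 'LoadBalancerName', 'Value': 'pierone-1'}]},
    # per-AZ variant of the same metric
    {'MetricName': 'RequestCount',
     'Dimensions': [{'Name': 'LoadBalancerName', 'Value': 'pierone-1'},
                    {'Name': 'AvailabilityZone', 'Value': 'eu-west-1a'}]},
    # a different load balancer
    {'MetricName': 'RequestCount',
     'Dimensions': [{'Name': 'LoadBalancerName', 'Value': 'other-elb'}]},
]

flt = {'AvailabilityZone': 'NOT_SET', 'LoadBalancerName': 'pierone-*'}
print([matches(m, flt) for m in metrics])  # [True, False, False]
```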

metric_name is the name of the metric to filter against (e.g. “RequestCount”).

namespace is an optional namespace filter (e.g. “AWS/EC2”).

To query an ELB for requests per second:

# uses both the special "NOT_SET" value and the "*" wildcard in dimensions:
val = cloudwatch().query({'AvailabilityZone': 'NOT_SET', 'LoadBalancerName': 'pierone-*'}, 'RequestCount', 'Sum')['RequestCount']
requests_per_second = val / 60

You can find existing metrics with the AWS CLI tools:

$ aws cloudwatch list-metrics --namespace "AWS/EC2"

Use the --dimensions option to list only the metrics with the given dimension value(s):

$ aws cloudwatch list-metrics --namespace "AWS/EC2" --dimensions Name=AutoScalingGroupName,Value=my-asg-FEYBCZF

The desired metric can now be queried in ZMON:

cloudwatch().query({'AutoScalingGroupName': 'my-asg-*'}, 'DiskReadBytes', 'Sum')
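The CLI prints dimensions as a list of Name/Value pairs, while query() expects a plain dict. A minimal sketch that converts the JSON printed by aws cloudwatch list-metrics into keyword arguments named after query()'s parameters; the helper to_query_args and the sample output are hypothetical, and real output depends on your account.

```python
import json
from typing import Dict, List

# Hypothetical, trimmed output of:
#   aws cloudwatch list-metrics --namespace "AWS/EC2" \
#       --dimensions Name=AutoScalingGroupName,Value=my-asg-FEYBCZF
CLI_OUTPUT = """
{
  "Metrics": [
    {
      "Namespace": "AWS/EC2",
      "MetricName": "DiskReadBytes",
      "Dimensions": [{"Name": "AutoScalingGroupName", "Value": "my-asg-FEYBCZF"}]
    }
  ]
}
"""


def to_query_args(cli_json: str) -> List[Dict]:
    """Turn list-metrics JSON into dicts keyed like query()'s parameters."""
    result = []
    for metric in json.loads(cli_json)['Metrics']:
        result.append({
            # Name/Value pairs become a plain {name: value} dict
            'dimensions': {d['Name']: d['Value'] for d in metric['Dimensions']},
            'metric_name': metric['MetricName'],
            'namespace': metric['Namespace'],
        })
    return result


print(to_query_args(CLI_OUTPUT))
```

Since the keys match the parameter names in the query() signature, each entry could be passed as keyword arguments, with statistics left at its default of 'Sum'.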

alarms(alarm_names=None, alarm_name_prefix=None, state_value=STATE_ALARM, action_prefix=None, max_records=50)

Retrieve CloudWatch alarms filtered by state value.

See the describe_alarms boto documentation for more details.

Parameters:
  • alarm_names (list) – List of alarm names.
  • alarm_name_prefix (str) – Alarm name prefix. Cannot be specified if alarm_names is specified.
  • state_value (str) – State value used in alarm filtering. Available values are OK, ALARM (default) and INSUFFICIENT_DATA.
  • action_prefix (str) – Action name prefix. Example: arn:aws:autoscaling: to filter results for all Auto Scaling related alarms.
  • max_records (int) – Maximum records to be returned. Default is 50.
Returns:

List of MetricAlarms.

Return type:

list
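The parameters correspond closely to the arguments of boto3's CloudWatch describe_alarms() (AlarmNames, AlarmNamePrefix, StateValue, ActionPrefix, MaxRecords). The sketch below shows one plausible mapping; the helper describe_alarms_kwargs is hypothetical, the mapping is an assumption rather than ZMON's actual code, and the default STATE_ALARM is assumed to equal 'ALARM'.

```python
from typing import List, Optional

VALID_STATES = {'OK', 'ALARM', 'INSUFFICIENT_DATA'}


def describe_alarms_kwargs(alarm_names: Optional[List[str]] = None,
                           alarm_name_prefix: Optional[str] = None,
                           state_value: str = 'ALARM',
                           action_prefix: Optional[str] = None,
                           max_records: int = 50) -> dict:
    """Build keyword arguments for boto3's describe_alarms().

    Hypothetical helper: only parameters that were actually given are
    included, and the documented constraints are checked up front.
    """
    if alarm_names and alarm_name_prefix:
        raise ValueError('alarm_name_prefix cannot be combined with alarm_names')
    if state_value not in VALID_STATES:
        raise ValueError('unknown state_value: %s' % state_value)
    kwargs = {'StateValue': state_value, 'MaxRecords': max_records}
    if alarm_names:
        kwargs['AlarmNames'] = alarm_names
    if alarm_name_prefix:
        kwargs['AlarmNamePrefix'] = alarm_name_prefix
    if action_prefix:
        kwargs['ActionPrefix'] = action_prefix
    return kwargs


print(describe_alarms_kwargs(action_prefix='arn:aws:autoscaling:'))
# {'StateValue': 'ALARM', 'MaxRecords': 50, 'ActionPrefix': 'arn:aws:autoscaling:'}
```

With credentials configured, such a dict could be passed to a boto3 CloudWatch client's describe_alarms(), whose response lists the alarms under 'MetricAlarms'.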

cloudwatch().alarms(state_value='ALARM')[0]
{
    'ActionsEnabled': True,
    'AlarmActions': ['arn:aws:autoscaling:...'],
    'AlarmArn': 'arn:aws:cloudwatch:...',
    'AlarmConfigurationUpdatedTimestamp': datetime.datetime(2016, 5, 12, 10, 44, 15, 707000, tzinfo=tzutc()),
    'AlarmDescription': 'Scale-down if CPU < 50% for 10.0 minutes (Average)',
    'AlarmName': 'metric-alarm-for-service-x',
    'ComparisonOperator': 'LessThanThreshold',
    'Dimensions': [
        {
            'Name': 'AutoScalingGroupName',
            'Value': 'service-x-asg'
        }
    ],
    'EvaluationPeriods': 2,
    'InsufficientDataActions': [],
    'MetricName': 'CPUUtilization',
    'Namespace': 'AWS/EC2',
    'OKActions': [],
    'Period': 300,
    'StateReason': 'Threshold Crossed: 1 datapoint (36.1) was less than the threshold (50.0).',
    'StateReasonData': '{...}',
    'StateUpdatedTimestamp': datetime.datetime(2016, 5, 12, 10, 44, 16, 294000, tzinfo=tzutc()),
    'StateValue': 'ALARM',
    'Statistic': 'Average',
    'Threshold': 50.0
}
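The full MetricAlarm dicts are verbose. A minimal sketch of condensing them into a smaller result, grouping alarm names by namespace; the helper alarms_by_namespace is hypothetical, and the sample dicts are trimmed to a few of the keys shown in the example output.

```python
from collections import defaultdict
from typing import Dict, List


def alarms_by_namespace(alarms: List[dict]) -> Dict[str, List[str]]:
    """Group MetricAlarm dicts by namespace, listing alarm names.

    Hypothetical helper: names are sorted so the result is stable
    between runs.
    """
    grouped: Dict[str, List[str]] = defaultdict(list)
    for alarm in alarms:
        grouped[alarm.get('Namespace', 'unknown')].append(alarm['AlarmName'])
    return {ns: sorted(names) for ns, names in grouped.items()}


# Trimmed, hypothetical sample dicts with keys from the example output
sample = [
    {'AlarmName': 'metric-alarm-for-service-x', 'Namespace': 'AWS/EC2', 'StateValue': 'ALARM'},
    {'AlarmName': 'elb-latency-high', 'Namespace': 'AWS/ELB', 'StateValue': 'ALARM'},
    {'AlarmName': 'cpu-high-service-y', 'Namespace': 'AWS/EC2', 'StateValue': 'ALARM'},
]
print(alarms_by_namespace(sample))
# {'AWS/EC2': ['cpu-high-service-y', 'metric-alarm-for-service-x'], 'AWS/ELB': ['elb-latency-high']}
```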