CloudWatch
If you are running on AWS, you can use cloudwatch() to access AWS metrics easily.
cloudwatch(region=None, assume_role_arn=None)
Initialize CloudWatch wrapper.
Parameters:
Methods of Cloudwatch
query_one(dimensions, metric_name, statistics, namespace, period=60, minutes=5, start=None, end=None, extended_statistics=None)
Query a single AWS CloudWatch metric and return a single scalar value (float). By default the metric is aggregated over the last five minutes using the provided aggregation type.
This method is a more low-level variant of the query method: all parameters, including all dimensions, need to be known.
Parameters:
- dimensions (dict) – Cloudwatch dimensions. Example {'LoadBalancerName': 'my-elb-name'}.
- metric_name (str) – Cloudwatch metric. Example 'Latency'.
- statistics (list) – Cloudwatch metric statistics. Example 'Sum'.
- namespace (str) – Cloudwatch namespace. Example 'AWS/ELB'.
- period (int) – Cloudwatch statistics granularity in seconds. Default is 60.
- minutes (int) – Used to determine the start time of the Cloudwatch query. Default is 5. Ignored if start is supplied.
- start (int) – Cloudwatch start timestamp. Default is None.
- end (int) – Cloudwatch end timestamp. Default is None. If not supplied, then end time is now.
- extended_statistics (list) – Cloudwatch ExtendedStatistics for percentiles query. Example ['p95', 'p99'].
Returns: Return a float if single value, dict otherwise.
Return type: float or dict
Example query with percentiles for AWS ALB:
cloudwatch().query_one({'LoadBalancer': 'app/my-alb/1234'}, 'TargetResponseTime', 'Average', 'AWS/ApplicationELB', extended_statistics=['p95', 'p99', 'p99.45'])
{
    'Average': 0.224,
    'p95': 0.245,
    'p99': 0.300,
    'p99.45': 0.500
}
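Because query_one is documented to return either a float or a dict, check code may want a single shape to work with. The following is a minimal sketch, not part of ZMON: the helper name as_stat_dict is hypothetical, and the sample values stand in for real query_one results so that no AWS call is needed.

```python
def as_stat_dict(result, statistic):
    """Return a query_one() style result as a dict keyed by statistic name.

    `result` is either a scalar (single statistic) or a dict (several
    statistics, e.g. when extended_statistics is used).
    """
    if isinstance(result, dict):
        # Already keyed by statistic name; copy so callers can mutate freely.
        return dict(result)
    # Scalar case: wrap the value under the statistic that was requested.
    return {statistic: float(result)}


# Stand-ins for the two documented return shapes.
single = as_stat_dict(0.224, 'Average')
multi = as_stat_dict({'Average': 0.224, 'p95': 0.245}, 'Average')
```

With this in place, an alert condition can always read values by key, for example multi['p95'], regardless of how many statistics were requested.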
Note
In very rare cases, e.g. for ELB metrics, you may see only a fraction of the expected value in ZMON (one half, or one to two thirds) because of a race condition around which data is already present in CloudWatch. To fix this, click “evaluate” on the alert; this triggers the check and moves its execution time to a new start time.
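The interplay of the minutes, start and end parameters of query_one can be sketched as follows. This is an illustration of the documented defaults only, not the wrapper's actual code: the function name resolve_window is hypothetical, timestamps are assumed to be epoch seconds, and start is assumed to be counted back from the end time.

```python
import time


def resolve_window(minutes=5, start=None, end=None, now=None):
    """Return (start, end) for a query, following the documented defaults.

    - end defaults to the current time
    - start defaults to `minutes` before end
    - minutes is ignored when start is given
    """
    if now is None:
        now = int(time.time())
    if end is None:
        end = now
    if start is None:
        # Assumption: the window is measured back from the end timestamp.
        start = end - minutes * 60
    return start, end


# Default window: the last five minutes before "now".
default_window = resolve_window(now=1000)
# Explicit start: minutes has no effect.
explicit_window = resolve_window(minutes=10, start=100, now=1000)
```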
query(dimensions, metric_name, statistics='Sum', namespace=None, period=60, minutes=5)
Query AWS CloudWatch for metrics. Metrics will be aggregated over the last five minutes using the provided aggregation type (default “Sum”).
dimensions is a dictionary to filter the metrics to query. See the list_metrics boto documentation. You can provide the special value “NOT_SET” for a dimension to only query metrics where the given key is not set. This makes sense e.g. for ELB metrics as they are available both per AZ (“AvailabilityZone” has a value) and aggregated over all AZs (“AvailabilityZone” not set). Additionally you can include the special “*” character in a dimension value to do fuzzy (shell globbing) matching.
metric_name is the name of the metric to filter against (e.g. “RequestCount”).
namespace is an optional namespace filter (e.g. “AWS/EC2”).
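The dimension filter semantics described above ("NOT_SET" and "*" globbing) can be illustrated with a small standalone sketch. It is an approximation under stated assumptions, not the wrapper's implementation: the function dimensions_match and the sample metrics are hypothetical, and globbing is modelled with Python's fnmatch module.

```python
from fnmatch import fnmatchcase

NOT_SET = 'NOT_SET'


def dimensions_match(wanted, actual):
    """Check one metric's dimensions against a query() style filter.

    wanted: filter dict, values may be 'NOT_SET' or contain '*'
    actual: the metric's dimensions as {name: value}
    """
    for key, pattern in wanted.items():
        if pattern == NOT_SET:
            # The key must be absent from the metric's dimensions.
            if key in actual:
                return False
        elif key not in actual or not fnmatchcase(actual[key], pattern):
            # The key must be present and match the shell-style pattern.
            return False
    return True


# Hypothetical metrics: one aggregated over all AZs, one per AZ, one unrelated.
metrics = [
    {'LoadBalancerName': 'pierone-1'},
    {'LoadBalancerName': 'pierone-1', 'AvailabilityZone': 'eu-west-1a'},
    {'LoadBalancerName': 'other-elb'},
]
wanted = {'AvailabilityZone': 'NOT_SET', 'LoadBalancerName': 'pierone-*'}
selected = [m for m in metrics if dimensions_match(wanted, m)]
```

Only the first metric is selected here: the second has "AvailabilityZone" set, and the third does not match the name pattern.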
To query an ELB for requests per second:
# both using special "NOT_SET" and "*" in dimensions here:
val = cloudwatch().query({'AvailabilityZone': 'NOT_SET', 'LoadBalancerName': 'pierone-*'}, 'RequestCount', 'Sum')['RequestCount']
requests_per_second = val / 60
You can find existing metrics with the AWS CLI tools:
$ aws cloudwatch list-metrics --namespace "AWS/EC2"
Use the “dimensions” argument to select on what dimension(s) to aggregate over:
$ aws cloudwatch list-metrics --namespace "AWS/EC2" --dimensions Name=AutoScalingGroupName,Value=my-asg-FEYBCZF
The desired metric can now be queried in ZMON:
cloudwatch().query({'AutoScalingGroupName': 'my-asg-*'}, 'DiskReadBytes', 'Sum')
alarms(alarm_names=None, alarm_name_prefix=None, state_value=STATE_ALARM, action_prefix=None, max_records=50)
Retrieve CloudWatch alarms filtered by state value.
See describe_alarms boto documentation for more details.
Parameters:
- alarm_names (list) – List of alarm names.
- alarm_name_prefix (str) – Prefix of alarms. Cannot be specified if alarm_names is specified.
- state_value (str) – State value used in alarm filtering. Available values are OK, ALARM (default) and INSUFFICIENT_DATA.
- action_prefix (str) – Action name prefix. Example arn:aws:autoscaling: to filter results for all autoscaling related alarms.
- max_records (int) – Maximum records to be returned. Default is 50.
Returns: List of MetricAlarms.
Return type: list
cloudwatch().alarms(state_value='ALARM')[0]
{
    'ActionsEnabled': True,
    'AlarmActions': ['arn:aws:autoscaling:...'],
    'AlarmArn': 'arn:aws:cloudwatch:...',
    'AlarmConfigurationUpdatedTimestamp': datetime.datetime(2016, 5, 12, 10, 44, 15, 707000, tzinfo=tzutc()),
    'AlarmDescription': 'Scale-down if CPU < 50% for 10.0 minutes (Average)',
    'AlarmName': 'metric-alarm-for-service-x',
    'ComparisonOperator': 'LessThanThreshold',
    'Dimensions': [
        {
            'Name': 'AutoScalingGroupName',
            'Value': 'service-x-asg'
        }
    ],
    'EvaluationPeriods': 2,
    'InsufficientDataActions': [],
    'MetricName': 'CPUUtilization',
    'Namespace': 'AWS/EC2',
    'OKActions': [],
    'Period': 300,
    'StateReason': 'Threshold Crossed: 1 datapoint (36.1) was less than the threshold (50.0).',
    'StateReasonData': '{...}',
    'StateUpdatedTimestamp': datetime.datetime(2016, 5, 12, 10, 44, 16, 294000, tzinfo=tzutc()),
    'StateValue': 'ALARM',
    'Statistic': 'Average',
    'Threshold': 50.0
}
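Alarm records like the one above carry many fields, so a check may prefer to return a compact summary. The sketch below is one possible way to do that and is not part of ZMON: summarize_alarms is a hypothetical helper, and the sample list reuses a few fields from the example record in place of a real alarms() result.

```python
def summarize_alarms(alarms):
    """Reduce a list of MetricAlarm dicts to name, metric and state reason."""
    summary = []
    for alarm in alarms:
        summary.append({
            'name': alarm['AlarmName'],
            # Namespace and metric name joined for a readable identifier.
            'metric': '{}/{}'.format(alarm['Namespace'], alarm['MetricName']),
            # StateReason is read defensively in case a record lacks it.
            'reason': alarm.get('StateReason', ''),
        })
    return summary


# Stand-in for cloudwatch().alarms(state_value='ALARM'), using fields from the example above.
sample_alarms = [{
    'AlarmName': 'metric-alarm-for-service-x',
    'Namespace': 'AWS/EC2',
    'MetricName': 'CPUUtilization',
    'StateReason': 'Threshold Crossed: 1 datapoint (36.1) was less than the threshold (50.0).',
    'StateValue': 'ALARM',
}]
alarm_summary = summarize_alarms(sample_alarms)
```

An alert condition could then trigger on len(alarm_summary) > 0 and show the summary as the check value.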