AWS ECS Insights: A Comprehensive Guide to Calculating CPU Utilization from ECS Endpoint

Introduction / Summary

This article walks through how to check and calculate container CPU utilization from the ECS metadata endpoint.

Scenario

Sometimes you want to measure the resource usage of your containers. One option is a sidecar container that reads the metrics provided by the ECS metadata endpoint and then collects, calculates, and exports performance metrics such as CPU utilization.

Explanation

The CPU metrics provided by ECS metadata endpoint

The ECS metadata documentation lists the following paths for the resource usage of a container or task:

  • ${ECS_CONTAINER_METADATA_URI_V4}/stats: This path returns Docker stats for the specific container.
  • ${ECS_CONTAINER_METADATA_URI_V4}/task/stats: This path returns Docker stats for all of the containers associated with the task.

Accessing the endpoint ${ECS_CONTAINER_METADATA_URI_V4}/stats from inside a container returns the following CPU statistics:

$ curl ${ECS_CONTAINER_METADATA_URI_V4}/stats | jq '.cpu_stats'
{
  "cpu_usage": {
    "total_usage": 6854309182,
    "percpu_usage": [
      3836064097,
      3018245085
    ],
    "usage_in_kernelmode": 3510000000,
    "usage_in_usermode": 2790000000
  },
  "system_cpu_usage": 1139128190000000,
  "online_cpus": 2,
  "throttling_data": {
    "periods": 0,
    "throttled_periods": 0,
    "throttled_time": 0
  }
}
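
The same request can also be made from application code, for example in a sidecar. The following is a minimal Python sketch, assuming the ECS_CONTAINER_METADATA_URI_V4 environment variable is set as it is inside an ECS task. The helper names stats_url and fetch_stats are hypothetical and are not part of any AWS SDK.

```python
import json
import os
import urllib.request


def stats_url(base_uri: str, task_level: bool = False) -> str:
    """Build the stats URL for one container or for the whole task."""
    base = base_uri.rstrip("/")
    return f"{base}/task/stats" if task_level else f"{base}/stats"


def fetch_stats(task_level: bool = False, timeout: float = 2.0) -> dict:
    """Fetch Docker stats from the ECS metadata endpoint (hypothetical helper).

    Assumes ECS_CONTAINER_METADATA_URI_V4 is present in the environment,
    which is only the case inside a running ECS task.
    """
    base_uri = os.environ["ECS_CONTAINER_METADATA_URI_V4"]
    url = stats_url(base_uri, task_level)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Example usage inside a container:
#   cpu_stats = fetch_stats()["cpu_stats"]
```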

The response also contains a field called precpu_stats. According to the Docker API ContainerStats documentation, precpu_stats holds the CPU statistics from the previous read and is used together with cpu_stats to calculate the CPU usage percentage.

$ curl ${ECS_CONTAINER_METADATA_URI_V4}/stats | jq '.precpu_stats'
{
  "cpu_usage": {
    "total_usage": 6854629990,
    "percpu_usage": [
      3836187199,
      3018442791
    ],
    "usage_in_kernelmode": 3510000000,
    "usage_in_usermode": 2790000000
  },
  "system_cpu_usage": 1139150390000000,
  "online_cpus": 2,
  "throttling_data": {
    "periods": 0,
    "throttled_periods": 0,
    "throttled_time": 0
  }
}

How to calculate CPU utilization?

The ECS metadata documentation states that this data is retrieved via the Docker API ContainerStats call, and the Docker API ContainerStats documentation describes how to calculate CPU utilization:

  • cpu_delta = cpu_stats.cpu_usage.total_usage - precpu_stats.cpu_usage.total_usage
  • system_cpu_delta = cpu_stats.system_cpu_usage - precpu_stats.system_cpu_usage
  • number_cpus = length(cpu_stats.cpu_usage.percpu_usage) or cpu_stats.online_cpus
  • CPU usage % = (cpu_delta / system_cpu_delta) * number_cpus * 100.0
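
The formula above can be sketched as a small Python function that takes one parsed stats response. This is an illustrative sketch, not Docker's or ECS's own implementation: the function name cpu_percent is hypothetical, and preferring online_cpus over the length of percpu_usage is an assumption of this sketch. It returns None when the deltas are not usable, for example on a first read.

```python
from typing import Optional


def cpu_percent(stats: dict) -> Optional[float]:
    """Compute CPU usage % from cpu_stats and precpu_stats of ONE response."""
    cpu = stats["cpu_stats"]
    precpu = stats["precpu_stats"]

    cpu_delta = cpu["cpu_usage"]["total_usage"] - precpu["cpu_usage"]["total_usage"]
    system_cpu_delta = cpu.get("system_cpu_usage", 0) - precpu.get("system_cpu_usage", 0)

    # Assumption: use online_cpus when present, else fall back to percpu_usage.
    number_cpus = cpu.get("online_cpus") or len(cpu["cpu_usage"].get("percpu_usage") or [])

    # Guard against unusable samples (no elapsed system time, counter reset).
    if system_cpu_delta <= 0 or cpu_delta < 0 or not number_cpus:
        return None
    return cpu_delta / system_cpu_delta * number_cpus * 100.0


# Made-up sample values, chosen only to make the arithmetic easy to follow.
sample = {
    "cpu_stats": {
        "cpu_usage": {"total_usage": 2_000_000_000},
        "system_cpu_usage": 20_000_000_000,
        "online_cpus": 2,
    },
    "precpu_stats": {
        "cpu_usage": {"total_usage": 1_000_000_000},
        "system_cpu_usage": 10_000_000_000,
    },
}
print(cpu_percent(sample))  # (1e9 / 10e9) * 2 * 100 = 20.0
```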

With the example above, we can then calculate the average CPU utilization. Note that the two outputs above came from separate requests, and the precpu_stats output was captured by the later one, so both deltas are negative here and the signs cancel. In practice, read cpu_stats and precpu_stats from the same response.

  • cpu_delta = cpu_stats.cpu_usage.total_usage - precpu_stats.cpu_usage.total_usage
    • cpu_delta = 6854309182 - 6854629990 = -320808
  • system_cpu_delta = cpu_stats.system_cpu_usage - precpu_stats.system_cpu_usage
    • system_cpu_delta = 1139128190000000 - 1139150390000000 = -22200000000
  • number_cpus = length(cpu_stats.cpu_usage.percpu_usage) or cpu_stats.online_cpus
    • number_cpus = 2
  • CPU usage % = (cpu_delta / system_cpu_delta) * number_cpus * 100.0
    • CPU usage % = (6854309182 - 6854629990) / (1139128190000000 - 1139150390000000) * 2 * 100 ≈ 0.0029
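
As a quick check of the arithmetic, the example values can be evaluated directly in Python. This only restates the numbers from the two outputs above; the variable names mirror the formula.

```python
# Values copied from the cpu_stats and precpu_stats outputs shown earlier.
cpu_delta = 6854309182 - 6854629990
system_cpu_delta = 1139128190000000 - 1139150390000000
number_cpus = 2

# Both deltas are negative with these sample values, so the signs cancel.
cpu_usage_percent = cpu_delta / system_cpu_delta * number_cpus * 100.0

print(cpu_delta)                    # -320808
print(system_cpu_delta)             # -22200000000
print(f"{cpu_usage_percent:.4f}%")  # 0.0029%
```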

Is system_cpu_usage shared across containers or separate for each container?

A common question is whether system_cpu_usage is the same in every container's stats output or differs per container. Checking the related code shows that system_cpu_usage is a system-level value:

  1. Find the system_cpu_usage field in the code:
// System Usage. Linux only.

SystemUsage uint64 `json:"system_cpu_usage,omitempty"`
  2. See that getSystemCPUUsage returns the system CPU usage:
// Sample system CPU usage close to container usage to avoid
// noise in metric calculations.
systemUsage, err := s.getSystemCPUUsage()
if err != nil {
  logrus.WithError(err).WithField("container_id", pair.container.ID).Errorf("collecting system cpu usage")
  continue
}
  3. Confirm that the system CPU usage is read from /proc/stat:
func (s *Collector) getSystemCPUUsage() (uint64, error) {
	f, err := os.Open("/proc/stat")
...
...
  4. /proc/stat holds kernel-level, system-wide statistics (see the proc(5) Linux manual page):
[root@ip-172-31-83-85 ~]# cat /proc/stat
cpu  1858672 145 937683 147711096 42100 0 8871 790 0 0
cpu0 948377 51 470796 73832328 20560 0 5919 200 0 0
cpu1 910294 94 466886 73878768 21539 0 2951 589 0 0
  5. Confirm that system_cpu_usage is nearly the same for both containers in the same task by accessing ${ECS_CONTAINER_METADATA_URI_V4}/task/stats:
$ curl ${ECS_CONTAINER_METADATA_URI_V4}/task/stats

  "5f71e2b846442b1cc146d399135f490fdbbb5f3936b38119ffcf3c5d85308ac5": {
    "cpu_stats": {
...
      "system_cpu_usage": 1478570440000000,
...
  "b0f919cbff4c787ae765cbcc8b094d4375bdb002848d78e9e635356bd94d8484": {
    "cpu_stats": {
...
      "system_cpu_usage": 1478570450000000,
...
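
The comparison in the last step can be sketched in Python as follows. The function name system_cpu_usage_by_container is hypothetical, and the sample dictionary contains only the container IDs and system_cpu_usage values from the output above; every other field is left out.

```python
def system_cpu_usage_by_container(task_stats: dict) -> dict:
    """Map each container ID to its cpu_stats.system_cpu_usage value."""
    return {
        container_id: stats["cpu_stats"]["system_cpu_usage"]
        for container_id, stats in task_stats.items()
        if stats and "cpu_stats" in stats  # skip containers without stats
    }


# Trimmed-down version of the /task/stats output shown above.
task_stats = {
    "5f71e2b846442b1cc146d399135f490fdbbb5f3936b38119ffcf3c5d85308ac5": {
        "cpu_stats": {"system_cpu_usage": 1478570440000000},
    },
    "b0f919cbff4c787ae765cbcc8b094d4375bdb002848d78e9e635356bd94d8484": {
        "cpu_stats": {"system_cpu_usage": 1478570450000000},
    },
}

usage = system_cpu_usage_by_container(task_stats)
spread = max(usage.values()) - min(usage.values())
print(spread)  # 10000000
```

A small spread like this is what you would expect when each container is sampled at a slightly different moment from the same system-wide counter.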

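The two values differ only by 10000000, which is consistent with the containers being sampled a moment apart. This shows that system_cpu_usage is a system-level metric shared by the containers in the task rather than a per-container counter.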
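
To illustrate how a system-wide value of this size can come from /proc/stat, here is a rough Python sketch. The Go excerpt above is truncated, so this sketch rests on two assumptions that the excerpt does not confirm: that the first seven fields of the aggregate cpu line are summed, and that clock ticks are converted to nanoseconds at 100 ticks per second. The function name is hypothetical.

```python
NANOSECONDS_PER_SECOND = 1_000_000_000
CLOCK_TICKS_PER_SECOND = 100  # assumption: common USER_HZ value on Linux


def system_cpu_usage_from_proc_stat(proc_stat_text: str) -> int:
    """Estimate total system CPU time in nanoseconds from /proc/stat text."""
    for line in proc_stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":  # aggregate line, not cpu0, cpu1, ...
            if len(parts) < 8:
                raise ValueError("unexpected number of cpu fields")
            # Assumption: sum user, nice, system, idle, iowait, irq, softirq.
            total_ticks = sum(int(value) for value in parts[1:8])
            return total_ticks * NANOSECONDS_PER_SECOND // CLOCK_TICKS_PER_SECOND
    raise ValueError("no aggregate cpu line found")


# The /proc/stat sample shown above.
proc_stat_sample = """\
cpu  1858672 145 937683 147711096 42100 0 8871 790 0 0
cpu0 948377 51 470796 73832328 20560 0 5919 200 0 0
cpu1 910294 94 466886 73878768 21539 0 2951 589 0 0
"""

print(system_cpu_usage_from_proc_stat(proc_stat_sample))  # 1505585670000000
```

Under these assumptions the result has the same order of magnitude as the system_cpu_usage values in the stats output, which fits the view that the field is derived from host-wide counters.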

Summary

The ECS metadata endpoint exposes container resource metrics, which gives you the flexibility to collect and calculate container performance with your own application or a third-party tool such as Datadog.

If you want to understand a metric in more depth, you can also dig into the code to find where it comes from.

This post is licensed under CC BY 4.0 by the author.