author | Zuma copybara merger <zuma-automerger@google.com> | 2023-03-21 05:02:50 +0000 |
---|---|---|
committer | Copybara-Service <copybara-worker@google.com> | 2023-03-23 22:03:43 -0700 |
commit | 0a36ddb6b5b8c8b0de49b604b70eddd2d8b7ad54 (patch) | |
tree | 387863340c4f5e8e19fcd68494dabd6c2f3594c3 | |
parent | 7bb5660317e125dee80c6adbb27d1a3b136c15f4 (diff) | |
[Copybara Auto Merge] Merge branch zuma into android14-gs-pixel-5.15
edgetpu: Continue powering up if the block is still on
Bug: 272701322
edgetpu: Add ABI documentation
edgetpu: usage_stats add cluster reconfigurations counters
Bug: 271372136
Bug: 271374892
edgetpu: usage_stats: process metrics v2 data
Bug: 271372136 (repeat)
Bug: 271374892 (repeat)
Signed-off-by: Zuma copybara merger <zuma-automerger@google.com>
GitOrigin-RevId: 599a31d4efcc191a247d4918b802758ec12e97ef
Change-Id: Ice5e0584766693ecf7a760cca9336d51b556f5ea
-rw-r--r-- | Documentation/ABI/stable/sysfs-class-edgetpu | 205 |
-rw-r--r-- | Documentation/ABI/stable/thermal-cdev | 13 |
-rw-r--r-- | drivers/edgetpu/edgetpu-usage-stats.c | 335 |
-rw-r--r-- | drivers/edgetpu/edgetpu-usage-stats.h | 24 |
-rw-r--r-- | drivers/edgetpu/mobile-pm.c | 2 |
-rw-r--r-- | drivers/edgetpu/rio/config.h | 3 |
6 files changed, 415 insertions, 167 deletions
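The first change in the commit message ("Continue powering up if the block is still on") alters what happens when the retry loop in `mobile_power_up()` runs out of attempts. The following is a minimal userspace sketch of that control flow, not the driver code: `power_up_sketch`, `RETRY_TIMES`, the `warned` flag, and the two probe callbacks are hypothetical names introduced here, the early `break` inside the loop is an assumption about the part of the loop the diff does not show, and the `usleep_range()` delay is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define RETRY_TIMES 3 /* hypothetical stand-in for BLOCK_DOWN_RETRY_TIMES */

typedef bool (*is_block_down_fn)(void *ctx);

/* Example probes for the sketch: one never reports "down", one always does. */
static bool never_down(void *ctx) { (void)ctx; return false; }
static bool already_down(void *ctx) { (void)ctx; return true; }

/*
 * Polls the block state a bounded number of times. Always returns 0 so the
 * caller continues powering up; *warned records whether the block was still
 * on after the last retry.
 */
static int power_up_sketch(is_block_down_fn is_block_down, void *ctx, bool *warned)
{
	int times = 0;

	assert(is_block_down && warned);
	*warned = false;
	do {
		if (is_block_down(ctx)) /* assumed early exit, not shown in the diff */
			break;
		/* The driver sleeps between polls here; omitted in this sketch. */
	} while (++times < RETRY_TIMES);

	if (times >= RETRY_TIMES && !is_block_down(ctx)) {
		/* Before this commit the driver returned -EAGAIN at this point. */
		fprintf(stderr, "Block is still on\n");
		*warned = true;
	}
	return 0;
}
```

Calling `power_up_sketch(never_down, NULL, &warned)` returns 0 with `warned` set, which models the new behavior: the condition is logged, and power-up is attempted anyway instead of failing the request.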
diff --git a/Documentation/ABI/stable/sysfs-class-edgetpu b/Documentation/ABI/stable/sysfs-class-edgetpu
new file mode 100644
index 0000000..ad63661
--- /dev/null
+++ b/Documentation/ABI/stable/sysfs-class-edgetpu
@@ -0,0 +1,205 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+# Firmware Management
+
+What: /sys/class/edgetpu/edgetpu-soc/device/load_firmware
+Date: January 2020
+Description:
+	To load a firmware file, echo the OS firmware location-relative path of the firmware
+	image file to load to this attribute. For example:
+	# echo google/my-test.fw > /sys/class/edgetpu/edgetpu-soc/device/load_firmware
+	cat this file to see the name of the currently-loaded firmware image.
+Users: Edge TPU runtime library (libedgetpu)
+
+What: /sys/class/edgetpu/edgetpu-soc/device/firmware_type
+Date: August 2020
+Description:
+	“prod” or “test” firmware type/flavor. (Or “unknown” or “custom” or “stage 2
+	bootloader”.)
+Users: Edge TPU runtime library (libedgetpu)
+
+What: /sys/class/edgetpu/edgetpu-soc/device/firmware_version
+Date: November 2020
+Description:
+	Firmware major.minor version from the header, plus VII and KCI version numbers and
+	google3 build CL:
+	# cat /sys/class/edgetpu/hermosa.0.0/device/firmware_version
+	1.0 vii=2 kci=1 cl=371245025
+Users: Edge TPU runtime library (libedgetpu)
+
+# General Status
+
+What: /sys/class/edgetpu/edgetpu-soc/device/clients
+Date: July 2021
+Description:
+	List clients that have opened the device by process and thread IDs. Also shows
+	current wakelock counts for debugging which client is holding the device powered.
+Users: Edge TPU runtime library (libedgetpu)
+
+What: /sys/class/edgetpu/edgetpu-soc/device/groups
+Date: August 2021
+Description:
+	List currently formed, forming, and disbanding device groups, with client PIDs and
+	amount of host and dma-buf memory mapped to the TPU, plus errors and VCIDs.
+Users: Edge TPU runtime library (libedgetpu)
+
+# Error Statistics
+# These statistics are maintained by the kernel driver.
+
+What: /sys/class/edgetpu/edgetpu-soc/device/firmware_crash_count
+Date: April 2021
+Description:
+	Count of “unrecoverable” firmware crash events; does not include “non-fatal” crashes
+	in non-privileged VII job processing code from which the firmware indicates it can
+	recover (that is, it only counts crashes in privileged firmware processing). (No
+	clear action.)
+Users: Edge TPU runtime library (libedgetpu)
+
+What: /sys/class/edgetpu/edgetpu-soc/device/watchdog_timeout_count
+Date: April 2021
+Description:
+	Count of watchdog timeout events, including both host software watchdog timeouts
+	(that is, firmware fails to respond to periodic query by the host kernel) and
+	device-side watchdog timeout events sent from firmware (on Hermosa). (No clear
+	action.)
+Users: Edge TPU runtime library (libedgetpu)
+
+# Performance/Usage Statistics
+# These stats are gathered from the firmware periodically while the device is powered up, and also
+# at mobile power down time (or Hermosa device group disband time). Reading the sysfs file will
+# immediately poll for updated values if the TPU device is currently powered on; if the (mobile)
+# device is powered down then the last received value is returned. Some of these attributes are
+# only provided for certain chipsets as noted below.
+
+What: /sys/class/edgetpu/edgetpu-soc/device/tpu_usage
+Date: January 2021
+Description:
+	TPU usage duration in microseconds per “UID” (an Android app context ID for
+	Android/Pixel; on Hermosa the UID is always zero). Write to clear. Used for
+	Android battery consumption blaming.
+Users: Edge TPU runtime library (libedgetpu)
+
+What: /sys/class/edgetpu/edgetpu-soc/device/tpu_utilization
+Date: February 2021
+Description:
+	TPU (GCB only) utilization as a percentage of time. (No clear action.)
+Users: Edge TPU runtime library (libedgetpu)
+
+What: /sys/class/edgetpu/edgetpu-soc/device/device_utilization
+Date: February 2021
+Description:
+	Whole TPU device utilization as a percentage of time. (No clear action.)
+Users: Edge TPU runtime library (libedgetpu)
+
+What: /sys/class/edgetpu/edgetpu-soc/device/tpu_active_cycle_count
+Date: March 2021
+Description:
+	Number of active TPU cycles since last reset (Mobile power down or Hermosa device
+	group disband). Write to clear.
+Users: Edge TPU runtime library (libedgetpu)
+
+What: /sys/class/edgetpu/edgetpu-soc/device/tpu_throttle_stall_count
+Date: March 2021
+Description:
+	Number of hardware throttling stall cycles inserted since last reset (Mobile power
+	down or Hermosa device group disband). Write to clear. (Always zero on Abrolhos.)
+Users: Edge TPU runtime library (libedgetpu)
+
+What: /sys/class/edgetpu/edgetpu-soc/device/inference_count
+Date: April 2021
+Description:
+	Number of graph invocations. Write to clear.
+Users: Edge TPU runtime library (libedgetpu)
+
+What: /sys/class/edgetpu/edgetpu-soc/device/tpu_op_count
+Date: April 2021
+Description:
+	Number of TPU offload op invocations. Write to clear.
+Users: Edge TPU runtime library (libedgetpu)
+
+What: /sys/class/edgetpu/edgetpu-soc/device/param_cache_hit_count
+Date: April 2021
+Description:
+	Number of times a TPU op invocation used its cached parameters. Write to clear.
+Users: Edge TPU runtime library (libedgetpu)
+
+What: /sys/class/edgetpu/edgetpu-soc/device/param_cache_miss_count
+Date: April 2021
+Description:
+	Number of times a TPU op invocation had to cache its parameters. Write to clear.
+Users: Edge TPU runtime library (libedgetpu)
+
+What: /sys/class/edgetpu/edgetpu-soc/device/context_preempt_count
+Date: April 2021
+Description:
+	Number of times an application/client context was preempted by another context.
+	Write to clear.
+Users: Edge TPU runtime library (libedgetpu)
+
+What: /sys/class/edgetpu/edgetpu-soc/device/outstanding_commands_max
+Date: April 2021
+Description:
+	Maximum number of outstanding commands. Write to clear.
+Users: Edge TPU runtime library (libedgetpu)
+
+What: /sys/class/edgetpu/edgetpu-soc/device/preempt_depth_max
+Date: April 2021
+Description:
+	Maximum number of preempted application/client contexts at any time. Write to
+	clear.
+Users: Edge TPU runtime library (libedgetpu)
+
+What: /sys/class/edgetpu/edgetpu-soc/device/fw_thread_stats
+Date: April 2021
+Description:
+	Maximum stack depth per thread id. Write to clear.
+Users: Edge TPU runtime library (libedgetpu)
+
+# The following are not present or not meaningful (always zero) on Abrolhos or Hermosa; are present
+# on mobile Janeiro and beyond.
+
+What: /sys/class/edgetpu/edgetpu-soc/device/hardware_preempt_count
+Date: November 2021
+Description:
+	Number of times a hardware preemption occurred. Write to clear.
+Users: Edge TPU runtime library (libedgetpu)
+
+What: /sys/class/edgetpu/edgetpu-soc/device/hardware_ctx_save_time
+Date: April 2022
+Description:
+	Hardware context save time in usecs. Write to clear.
+Users: Edge TPU runtime library (libedgetpu)
+
+What: /sys/class/edgetpu/edgetpu-soc/device/hardware_ctx_save_time_max
+Date: April 2022
+Description:
+	Maximum time spent saving a hardware context, in usecs. Write to clear.
+Users: Edge TPU runtime library (libedgetpu)
+
+What: /sys/class/edgetpu/edgetpu-soc/device/scalar_fence_wait_time
+Date: April 2022
+Description:
+	Total time spent waiting to hit a scalar fence during hardware preemption, in usecs.
+	Write to clear.
+Users: Edge TPU runtime library (libedgetpu)
+
+What: /sys/class/edgetpu/edgetpu-soc/device/scalar_fence_wait_time_max
+Date: April 2022
+Description:
+	Maximum time spent waiting to hit a scalar fence during hardware preemption, in
+	usecs. Write to clear.
+Users: Edge TPU runtime library (libedgetpu)
+
+What: /sys/class/edgetpu/edgetpu-soc/device/long_suspend_count
+Date: April 2022
+Description:
+	Count of “long” suspends (“number of times Pipeline::Suspend takes longer than
+	5ms”). Write to clear.
+Users: Edge TPU runtime library (libedgetpu)
+
+What: /sys/class/edgetpu/edgetpu-soc/device/suspend_time_max
+Date: April 2022
+Description:
+	Maximum suspend time (“high water mark for time spent in Pipeline::Suspend”). Write
+	to clear.
+Users: Edge TPU runtime library (libedgetpu)
diff --git a/Documentation/ABI/stable/thermal-cdev b/Documentation/ABI/stable/thermal-cdev
new file mode 100644
index 0000000..3c457e2
--- /dev/null
+++ b/Documentation/ABI/stable/thermal-cdev
@@ -0,0 +1,13 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+What: /dev/thermal/cdev-by-name/tpu_cooling/user_vote
+Date: October 2021
+Description:
+	To set a thermal state vote.
+Users: Tj Thermal
+
+What: /dev/thermal/cdev-by-name/tpu_cooling/state2power_table
+Date: March 2023
+Description:
+	Thermal state to power consumption table for thermal throttliing.
+Users: Tj Thermal
diff --git a/drivers/edgetpu/edgetpu-usage-stats.c b/drivers/edgetpu/edgetpu-usage-stats.c
index 60751dd..9934ca6 100644
--- a/drivers/edgetpu/edgetpu-usage-stats.c
+++ b/drivers/edgetpu/edgetpu-usage-stats.c
@@ -74,6 +74,7 @@ int edgetpu_usage_add(struct edgetpu_dev *etdev, struct tpu_usage *tpu_usage)
 	if (!ustats)
 		return 0;
 
+	/* Note: as of metrics v2 the cluster_id is always zero and is ignored. */
 	etdev_dbg(etdev, "%s: uid=%u state=%u dur=%u", __func__,
 		  tpu_usage->uid, tpu_usage->power_state,
 		  tpu_usage->duration_us);
@@ -125,63 +126,78 @@ static void edgetpu_utilization_update(
 	mutex_unlock(&ustats->usage_stats_lock);
 }
 
-static void edgetpu_counter_update(
-	struct edgetpu_dev *etdev,
-	struct edgetpu_usage_counter *counter)
+static void edgetpu_counter_update(struct edgetpu_dev *etdev, struct edgetpu_usage_counter *counter,
+				   uint version)
 {
 	struct edgetpu_usage_stats *ustats = etdev->usage_stats;
+	uint component = version > 1 ? counter->component_id : 0;
 
 	if (!ustats)
 		return;
 
-	etdev_dbg(etdev, "%s: type=%d value=%llu\n", __func__,
-		  counter->type, counter->value);
+	etdev_dbg(etdev, "%s: type=%d value=%llu comp=%u\n", __func__, counter->type,
+		  counter->value, component);
 
 	mutex_lock(&ustats->usage_stats_lock);
 	if (counter->type >= 0 && counter->type < EDGETPU_COUNTER_COUNT)
-		ustats->counter[counter->type] += counter->value;
+		ustats->counter[counter->type][component] += counter->value;
 	mutex_unlock(&ustats->usage_stats_lock);
 }
 
-static void edgetpu_counter_clear(
-	struct edgetpu_dev *etdev,
-	enum edgetpu_usage_counter_type counter_type)
+static void edgetpu_counter_clear(struct edgetpu_dev *etdev,
+				  enum edgetpu_usage_counter_type counter_type)
 {
 	struct edgetpu_usage_stats *ustats = etdev->usage_stats;
+	int i;
 
-	if (!ustats)
-		return;
 	if (counter_type >= EDGETPU_COUNTER_COUNT)
 		return;
 
 	mutex_lock(&ustats->usage_stats_lock);
-	ustats->counter[counter_type] = 0;
+	for (i = 0; i < EDGETPU_TPU_CLUSTER_COUNT; i++)
+		ustats->counter[counter_type][i] = 0;
 	mutex_unlock(&ustats->usage_stats_lock);
 }
 
-static void edgetpu_max_watermark_update(
-	struct edgetpu_dev *etdev,
-	struct edgetpu_usage_max_watermark *max_watermark)
+static void edgetpu_max_watermark_update(struct edgetpu_dev *etdev,
+					 struct edgetpu_usage_max_watermark *max_watermark,
+					 uint version)
 {
 	struct edgetpu_usage_stats *ustats = etdev->usage_stats;
+	uint component = version > 1 ? max_watermark->component_id : 0;
 
 	if (!ustats)
 		return;
 
-	etdev_dbg(etdev, "%s: type=%d value=%llu\n", __func__,
-		  max_watermark->type, max_watermark->value);
+	etdev_dbg(etdev, "%s: type=%d value=%llu comp=%u\n", __func__, max_watermark->type,
+		  max_watermark->value, component);
 
 	if (max_watermark->type < 0 ||
 	    max_watermark->type >= EDGETPU_MAX_WATERMARK_TYPE_COUNT)
 		return;
 
 	mutex_lock(&ustats->usage_stats_lock);
-	if (max_watermark->value > ustats->max_watermark[max_watermark->type])
-		ustats->max_watermark[max_watermark->type] =
+	if (max_watermark->value > ustats->max_watermark[max_watermark->type][component])
+		ustats->max_watermark[max_watermark->type][component] =
 			max_watermark->value;
 	mutex_unlock(&ustats->usage_stats_lock);
 }
 
+static void edgetpu_max_watermark_clear(struct edgetpu_dev *etdev,
+					enum edgetpu_usage_max_watermark_type max_watermark_type)
+{
+	struct edgetpu_usage_stats *ustats = etdev->usage_stats;
+	int i;
+
+	if (max_watermark_type < 0 || max_watermark_type >= EDGETPU_MAX_WATERMARK_TYPE_COUNT)
+		return;
+
+	mutex_lock(&ustats->usage_stats_lock);
+	for (i = 0; i < EDGETPU_TPU_CLUSTER_COUNT; i++)
+		ustats->max_watermark[max_watermark_type][i] = 0;
+	mutex_unlock(&ustats->usage_stats_lock);
+}
+
 static void edgetpu_thread_stats_update(
 	struct edgetpu_dev *etdev,
 	struct edgetpu_thread_stats *thread_stats)
@@ -288,19 +304,16 @@ void edgetpu_usage_stats_process_buffer(struct edgetpu_dev *etdev, void *buf)
 				etdev, &metric->component_activity);
 			break;
 		case EDGETPU_METRIC_TYPE_COUNTER:
-			edgetpu_counter_update(etdev, &metric->counter);
+			edgetpu_counter_update(etdev, &metric->counter, version);
 			break;
 		case EDGETPU_METRIC_TYPE_MAX_WATERMARK:
-			edgetpu_max_watermark_update(
-				etdev, &metric->max_watermark);
+			edgetpu_max_watermark_update(etdev, &metric->max_watermark, version);
 			break;
 		case EDGETPU_METRIC_TYPE_THREAD_STATS:
-			edgetpu_thread_stats_update(
-				etdev, &metric->thread_stats);
+			edgetpu_thread_stats_update(etdev, &metric->thread_stats);
 			break;
 		case EDGETPU_METRIC_TYPE_DVFS_FREQUENCY_INFO:
-			edgetpu_dvfs_frequency_update(
-				etdev, metric->dvfs_frequency_info);
+			edgetpu_dvfs_frequency_update(etdev, metric->dvfs_frequency_info);
 			break;
 		default:
 			etdev_dbg(etdev, "%s: %d: skip unknown type=%u",
@@ -328,36 +341,72 @@ int edgetpu_usage_get_utilization(struct edgetpu_dev *etdev,
 	return val;
 }
 
-static int64_t edgetpu_usage_get_counter(
-	struct edgetpu_dev *etdev,
-	enum edgetpu_usage_counter_type counter_type)
+/*
+ * Resyncs firmware stats and formats the requested counter in the supplied buffer.
+ *
+ * If @report_per_cluster is true, and if the firmware implements metrics V2 or higher,
+ * then one value is formatted per cluster (for chips with only one cluster only one value is
+ * formatted).
+ *
+ * Returns the number of bytes written to buf.
+ */
+static ssize_t edgetpu_usage_format_counter(struct edgetpu_dev *etdev, char *buf,
+					    enum edgetpu_usage_counter_type counter_type,
+					    bool report_per_cluster)
 {
 	struct edgetpu_usage_stats *ustats = etdev->usage_stats;
-	int64_t val;
+	uint ncomponents = report_per_cluster && !etdev->usage_stats->use_metrics_v1 ?
+		EDGETPU_TPU_CLUSTER_COUNT : 1;
+	uint i;
+	ssize_t ret = 0;
 
 	if (counter_type >= EDGETPU_COUNTER_COUNT)
-		return -1;
+		return 0;
 	edgetpu_kci_update_usage(etdev);
 	mutex_lock(&ustats->usage_stats_lock);
-	val = ustats->counter[counter_type];
+	for (i = 0; i < ncomponents; i++) {
+		if (i)
+			ret += scnprintf(buf + ret, PAGE_SIZE - ret, " ");
+		ret += scnprintf(buf + ret, PAGE_SIZE - ret, "%llu",
+				 ustats->counter[counter_type][i]);
+	}
 	mutex_unlock(&ustats->usage_stats_lock);
-	return val;
+	ret += scnprintf(buf + ret, PAGE_SIZE - ret, "\n");
+	return ret;
 }
 
-static int64_t edgetpu_usage_get_max_watermark(
-	struct edgetpu_dev *etdev,
-	enum edgetpu_usage_max_watermark_type max_watermark_type)
+/*
+ * Resyncs firmware stats and formats the requested max watermark in the supplied buffer.
+ *
+ * If @report_per_cluster is true, and if the firmware implements metrics V2 or higher,
+ * then one value is formatted per cluster (for chips with only one cluster only one value is
+ * formatted).
+ *
+ * Returns the number of bytes written to buf.
+ */
+static ssize_t edgetpu_usage_format_max_watermark(
+	struct edgetpu_dev *etdev, char *buf,
+	enum edgetpu_usage_max_watermark_type max_watermark_type, bool report_per_cluster)
 {
 	struct edgetpu_usage_stats *ustats = etdev->usage_stats;
-	int64_t val;
+	uint ncomponents = report_per_cluster && !etdev->usage_stats->use_metrics_v1 ?
+		EDGETPU_TPU_CLUSTER_COUNT : 1;
+	uint i;
+	ssize_t ret = 0;
 
 	if (max_watermark_type >= EDGETPU_MAX_WATERMARK_TYPE_COUNT)
-		return -1;
+		return 0;
 	edgetpu_kci_update_usage(etdev);
 	mutex_lock(&ustats->usage_stats_lock);
-	val = ustats->max_watermark[max_watermark_type];
+	for (i = 0; i < ncomponents; i++) {
+		if (i)
+			ret += scnprintf(buf + ret, PAGE_SIZE - ret, " ");
+		ret += scnprintf(buf + ret, PAGE_SIZE - ret, "%llu",
+				 ustats->max_watermark[max_watermark_type][i]);
+	}
 	mutex_unlock(&ustats->usage_stats_lock);
-	return val;
+	ret += scnprintf(buf + ret, PAGE_SIZE - ret, "\n");
+	return ret;
 }
 
 static ssize_t tpu_usage_show(struct device *dev,
@@ -471,11 +520,8 @@ static ssize_t tpu_active_cycle_count_show(struct device *dev,
 					   char *buf)
 {
 	struct edgetpu_dev *etdev = dev_get_drvdata(dev);
-	int64_t val;
 
-	val = edgetpu_usage_get_counter(etdev,
-					EDGETPU_COUNTER_TPU_ACTIVE_CYCLES);
-	return scnprintf(buf, PAGE_SIZE, "%llu\n", val);
+	return edgetpu_usage_format_counter(etdev, buf, EDGETPU_COUNTER_TPU_ACTIVE_CYCLES, false);
 }
 
 static ssize_t tpu_active_cycle_count_store(struct device *dev,
@@ -496,11 +542,8 @@ static ssize_t tpu_throttle_stall_count_show(struct device *dev,
 					     char *buf)
 {
 	struct edgetpu_dev *etdev = dev_get_drvdata(dev);
-	int64_t val;
 
-	val = edgetpu_usage_get_counter(etdev,
-					EDGETPU_COUNTER_TPU_THROTTLE_STALLS);
-	return scnprintf(buf, PAGE_SIZE, "%llu\n", val);
+	return edgetpu_usage_format_counter(etdev, buf, EDGETPU_COUNTER_TPU_THROTTLE_STALLS, false);
 }
 
 static ssize_t tpu_throttle_stall_count_store(struct device *dev,
@@ -521,11 +564,8 @@ static ssize_t inference_count_show(struct device *dev,
 				    struct device_attribute *attr, char *buf)
 {
 	struct edgetpu_dev *etdev = dev_get_drvdata(dev);
-	int64_t val;
 
-	val = edgetpu_usage_get_counter(etdev,
-					EDGETPU_COUNTER_INFERENCES);
-	return scnprintf(buf, PAGE_SIZE, "%llu\n", val);
+	return edgetpu_usage_format_counter(etdev, buf, EDGETPU_COUNTER_INFERENCES, true);
 }
 
 static ssize_t inference_count_store(struct device *dev,
@@ -541,21 +581,15 @@ static ssize_t inference_count_store(struct device *dev,
 static DEVICE_ATTR(inference_count, 0664, inference_count_show,
 		   inference_count_store);
 
-static ssize_t tpu_op_count_show(struct device *dev,
-				 struct device_attribute *attr, char *buf)
+static ssize_t tpu_op_count_show(struct device *dev, struct device_attribute *attr, char *buf)
 {
 	struct edgetpu_dev *etdev = dev_get_drvdata(dev);
-	int64_t val;
 
-	val = edgetpu_usage_get_counter(etdev,
-					EDGETPU_COUNTER_TPU_OPS);
-	return scnprintf(buf, PAGE_SIZE, "%llu\n", val);
+	return edgetpu_usage_format_counter(etdev, buf, EDGETPU_COUNTER_TPU_OPS, true);
 }
 
-static ssize_t tpu_op_count_store(struct device *dev,
-				  struct device_attribute *attr,
-				  const char *buf,
-				  size_t count)
+static ssize_t tpu_op_count_store(struct device *dev, struct device_attribute *attr,
+				  const char *buf, size_t count)
 {
 	struct edgetpu_dev *etdev = dev_get_drvdata(dev);
 
@@ -564,22 +598,16 @@ static ssize_t tpu_op_count_store(struct device *dev,
 }
 static DEVICE_ATTR(tpu_op_count, 0664, tpu_op_count_show, tpu_op_count_store);
 
-static ssize_t param_cache_hit_count_show(struct device *dev,
-					  struct device_attribute *attr,
+static ssize_t param_cache_hit_count_show(struct device *dev, struct device_attribute *attr,
 					  char *buf)
 {
 	struct edgetpu_dev *etdev = dev_get_drvdata(dev);
-	int64_t val;
 
-	val = edgetpu_usage_get_counter(etdev,
-					EDGETPU_COUNTER_PARAM_CACHE_HITS);
-	return scnprintf(buf, PAGE_SIZE, "%llu\n", val);
+	return edgetpu_usage_format_counter(etdev, buf, EDGETPU_COUNTER_PARAM_CACHE_HITS, false);
 }
 
-static ssize_t param_cache_hit_count_store(struct device *dev,
-					   struct device_attribute *attr,
-					   const char *buf,
-					   size_t count)
+static ssize_t param_cache_hit_count_store(struct device *dev, struct device_attribute *attr,
+					   const char *buf, size_t count)
 {
 	struct edgetpu_dev *etdev = dev_get_drvdata(dev);
 
@@ -589,22 +617,16 @@ static ssize_t param_cache_hit_count_store(struct device *dev,
 static DEVICE_ATTR(param_cache_hit_count, 0664, param_cache_hit_count_show,
 		   param_cache_hit_count_store);
 
-static ssize_t param_cache_miss_count_show(struct device *dev,
-					   struct device_attribute *attr,
+static ssize_t param_cache_miss_count_show(struct device *dev, struct device_attribute *attr,
 					   char *buf)
 {
 	struct edgetpu_dev *etdev = dev_get_drvdata(dev);
-	int64_t val;
 
-	val = edgetpu_usage_get_counter(etdev,
-					EDGETPU_COUNTER_PARAM_CACHE_MISSES);
-	return scnprintf(buf, PAGE_SIZE, "%llu\n", val);
+	return edgetpu_usage_format_counter(etdev, buf, EDGETPU_COUNTER_PARAM_CACHE_MISSES, false);
 }
 
-static ssize_t param_cache_miss_count_store(struct device *dev,
-					    struct device_attribute *attr,
-					    const char *buf,
-					    size_t count)
+static ssize_t param_cache_miss_count_store(struct device *dev, struct device_attribute *attr,
+					    const char *buf, size_t count)
 {
 	struct edgetpu_dev *etdev = dev_get_drvdata(dev);
 
@@ -614,22 +636,16 @@ static ssize_t param_cache_miss_count_store(struct device *dev,
 static DEVICE_ATTR(param_cache_miss_count, 0664, param_cache_miss_count_show,
 		   param_cache_miss_count_store);
 
-static ssize_t context_preempt_count_show(struct device *dev,
-					  struct device_attribute *attr,
+static ssize_t context_preempt_count_show(struct device *dev, struct device_attribute *attr,
 					  char *buf)
 {
 	struct edgetpu_dev *etdev = dev_get_drvdata(dev);
-	int64_t val;
 
-	val = edgetpu_usage_get_counter(etdev,
-					EDGETPU_COUNTER_CONTEXT_PREEMPTS);
-	return scnprintf(buf, PAGE_SIZE, "%llu\n", val);
+	return edgetpu_usage_format_counter(etdev, buf, EDGETPU_COUNTER_CONTEXT_PREEMPTS, true);
 }
 
-static ssize_t context_preempt_count_store(struct device *dev,
-					   struct device_attribute *attr,
-					   const char *buf,
-					   size_t count)
+static ssize_t context_preempt_count_store(struct device *dev, struct device_attribute *attr,
+					   const char *buf, size_t count)
 {
 	struct edgetpu_dev *etdev = dev_get_drvdata(dev);
 
@@ -643,10 +659,8 @@ static ssize_t hardware_preempt_count_show(struct device *dev, struct device_att
 					   char *buf)
 {
 	struct edgetpu_dev *etdev = dev_get_drvdata(dev);
-	int64_t val;
 
-	val = edgetpu_usage_get_counter(etdev, EDGETPU_COUNTER_HARDWARE_PREEMPTS);
-	return scnprintf(buf, PAGE_SIZE, "%llu\n", val);
+	return edgetpu_usage_format_counter(etdev, buf, EDGETPU_COUNTER_HARDWARE_PREEMPTS, true);
 }
 
 static ssize_t hardware_preempt_count_store(struct device *dev, struct device_attribute *attr,
@@ -664,10 +678,9 @@ static ssize_t hardware_ctx_save_time_show(struct device *dev, struct device_att
 					   char *buf)
 {
 	struct edgetpu_dev *etdev = dev_get_drvdata(dev);
-	int64_t val;
 
-	val = edgetpu_usage_get_counter(etdev, EDGETPU_COUNTER_HARDWARE_CTX_SAVE_TIME_US);
-	return scnprintf(buf, PAGE_SIZE, "%llu\n", val);
+	return edgetpu_usage_format_counter(etdev, buf, EDGETPU_COUNTER_HARDWARE_CTX_SAVE_TIME_US,
+					    true);
 }
 
 static ssize_t hardware_ctx_save_time_store(struct device *dev, struct device_attribute *attr,
@@ -685,10 +698,9 @@ static ssize_t scalar_fence_wait_time_show(struct device *dev, struct device_att
 					   char *buf)
 {
 	struct edgetpu_dev *etdev = dev_get_drvdata(dev);
-	int64_t val;
 
-	val = edgetpu_usage_get_counter(etdev, EDGETPU_COUNTER_SCALAR_FENCE_WAIT_TIME_US);
-	return scnprintf(buf, PAGE_SIZE, "%llu\n", val);
+	return edgetpu_usage_format_counter(etdev, buf, EDGETPU_COUNTER_SCALAR_FENCE_WAIT_TIME_US,
+					    true);
 }
 
 static ssize_t scalar_fence_wait_time_store(struct device *dev, struct device_attribute *attr,
@@ -703,13 +715,11 @@ static DEVICE_ATTR(scalar_fence_wait_time, 0664, scalar_fence_wait_time_show,
 		   scalar_fence_wait_time_store);
 
 static ssize_t long_suspend_count_show(struct device *dev, struct device_attribute *attr,
-					char *buf)
+				       char *buf)
 {
 	struct edgetpu_dev *etdev = dev_get_drvdata(dev);
-	int64_t val;
 
-	val = edgetpu_usage_get_counter(etdev, EDGETPU_COUNTER_LONG_SUSPEND);
-	return scnprintf(buf, PAGE_SIZE, "%llu\n", val);
+	return edgetpu_usage_format_counter(etdev, buf, EDGETPU_COUNTER_LONG_SUSPEND, false);
 }
 
 static ssize_t long_suspend_count_store(struct device *dev, struct device_attribute *attr,
@@ -723,15 +733,53 @@ static ssize_t long_suspend_count_store(struct device *dev, struct device_attrib
 static DEVICE_ATTR(long_suspend_count, 0664, long_suspend_count_show,
 		   long_suspend_count_store);
 
+#if EDGETPU_TPU_CLUSTER_COUNT > 1
+static ssize_t reconfigurations_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	struct edgetpu_dev *etdev = dev_get_drvdata(dev);
+
+	return edgetpu_usage_format_counter(etdev, buf, EDGETPU_COUNTER_RECONFIGURATIONS, false);
+}
+
+static ssize_t reconfigurations_store(struct device *dev, struct device_attribute *attr,
+				      const char *buf, size_t count)
+{
+	struct edgetpu_dev *etdev = dev_get_drvdata(dev);
+
+	edgetpu_counter_clear(etdev, EDGETPU_COUNTER_RECONFIGURATIONS);
+	return count;
+}
+static DEVICE_ATTR(reconfigurations, 0664, reconfigurations_show, reconfigurations_store);
+
+static ssize_t preempt_reconfigurations_show(struct device *dev, struct device_attribute *attr,
+					     char *buf)
+{
+	struct edgetpu_dev *etdev = dev_get_drvdata(dev);
+
+	return edgetpu_usage_format_counter(etdev, buf, EDGETPU_COUNTER_PREEMPT_RECONFIGURATIONS,
+					    false);
+}
+
+static ssize_t preempt_reconfigurations_store(struct device *dev, struct device_attribute *attr,
+					      const char *buf, size_t count)
+{
+	struct edgetpu_dev *etdev = dev_get_drvdata(dev);
+
+	edgetpu_counter_clear(etdev, EDGETPU_COUNTER_PREEMPT_RECONFIGURATIONS);
+	return count;
+}
+static DEVICE_ATTR(preempt_reconfigurations, 0664, preempt_reconfigurations_show,
+		   preempt_reconfigurations_store);
+#endif /* EDGETPU_TPU_CLUSTER_COUNT > 1 */
+
+
 static ssize_t outstanding_commands_max_show(
 	struct device *dev, struct device_attribute *attr, char *buf)
 {
 	struct edgetpu_dev *etdev = dev_get_drvdata(dev);
-	int64_t val;
 
-	val = edgetpu_usage_get_max_watermark(
-			etdev, EDGETPU_MAX_WATERMARK_OUT_CMDS);
-	return scnprintf(buf, PAGE_SIZE, "%llu\n", val);
+	return edgetpu_usage_format_max_watermark(etdev, buf, EDGETPU_MAX_WATERMARK_OUT_CMDS,
+						  false);
 }
 
 static ssize_t outstanding_commands_max_store(
@@ -739,14 +787,8 @@ static ssize_t outstanding_commands_max_store(
 	const char *buf, size_t count)
 {
 	struct edgetpu_dev *etdev = dev_get_drvdata(dev);
-	struct edgetpu_usage_stats *ustats = etdev->usage_stats;
-
-	if (ustats) {
-		mutex_lock(&ustats->usage_stats_lock);
-		ustats->max_watermark[EDGETPU_MAX_WATERMARK_OUT_CMDS] = 0;
-		mutex_unlock(&ustats->usage_stats_lock);
-	}
 
+	edgetpu_max_watermark_clear(etdev, EDGETPU_MAX_WATERMARK_OUT_CMDS);
 	return count;
 }
 static DEVICE_ATTR(outstanding_commands_max, 0664,
@@ -757,11 +799,9 @@ static ssize_t preempt_depth_max_show(
 	struct device *dev, struct device_attribute *attr, char *buf)
 {
 	struct edgetpu_dev *etdev = dev_get_drvdata(dev);
-	int64_t val;
 
-	val = edgetpu_usage_get_max_watermark(
-			etdev, EDGETPU_MAX_WATERMARK_PREEMPT_DEPTH);
-	return scnprintf(buf, PAGE_SIZE, "%llu\n", val);
+	return edgetpu_usage_format_max_watermark(etdev, buf, EDGETPU_MAX_WATERMARK_PREEMPT_DEPTH,
+						  true);
 }
 
 static ssize_t preempt_depth_max_store(
@@ -769,14 +809,8 @@ static ssize_t preempt_depth_max_store(
 	const char *buf, size_t count)
 {
 	struct edgetpu_dev *etdev = dev_get_drvdata(dev);
-	struct edgetpu_usage_stats *ustats = etdev->usage_stats;
-
-	if (ustats) {
-		mutex_lock(&ustats->usage_stats_lock);
-		ustats->max_watermark[EDGETPU_MAX_WATERMARK_PREEMPT_DEPTH] = 0;
-		mutex_unlock(&ustats->usage_stats_lock);
-	}
 
+	edgetpu_max_watermark_clear(etdev, EDGETPU_MAX_WATERMARK_PREEMPT_DEPTH);
 	return count;
 }
 static DEVICE_ATTR(preempt_depth_max, 0664, preempt_depth_max_show,
@@ -786,11 +820,10 @@ static ssize_t hardware_ctx_save_time_max_show(
 	struct device *dev, struct device_attribute *attr, char *buf)
 {
 	struct edgetpu_dev *etdev = dev_get_drvdata(dev);
-	int64_t val;
 
-	val = edgetpu_usage_get_max_watermark(
-			etdev, EDGETPU_MAX_WATERMARK_HARDWARE_CTX_SAVE_TIME_US);
-	return scnprintf(buf, PAGE_SIZE, "%llu\n", val);
+	return edgetpu_usage_format_max_watermark(etdev, buf,
+						  EDGETPU_MAX_WATERMARK_HARDWARE_CTX_SAVE_TIME_US,
+						  true);
 }
 
 static ssize_t hardware_ctx_save_time_max_store(
@@ -798,14 +831,8 @@ static ssize_t hardware_ctx_save_time_max_store(
 	const char *buf, size_t count)
 {
 	struct edgetpu_dev *etdev = dev_get_drvdata(dev);
-	struct edgetpu_usage_stats *ustats = etdev->usage_stats;
-
-	if (ustats) {
-		mutex_lock(&ustats->usage_stats_lock);
-		ustats->max_watermark[EDGETPU_MAX_WATERMARK_HARDWARE_CTX_SAVE_TIME_US] = 0;
-		mutex_unlock(&ustats->usage_stats_lock);
-	}
 
+	edgetpu_max_watermark_clear(etdev, EDGETPU_MAX_WATERMARK_HARDWARE_CTX_SAVE_TIME_US);
 	return count;
 }
 static DEVICE_ATTR(hardware_ctx_save_time_max, 0664, hardware_ctx_save_time_max_show,
@@ -815,11 +842,9 @@ static ssize_t scalar_fence_wait_time_max_show(
 	struct device *dev, struct device_attribute *attr, char *buf)
 {
 	struct edgetpu_dev *etdev = dev_get_drvdata(dev);
-	int64_t val;
 
-	val = edgetpu_usage_get_max_watermark(
-			etdev, EDGETPU_MAX_WATERMARK_SCALAR_FENCE_WAIT_TIME_US);
-	return scnprintf(buf, PAGE_SIZE, "%llu\n", val);
+	return edgetpu_usage_format_max_watermark(
+		etdev, buf, EDGETPU_MAX_WATERMARK_SCALAR_FENCE_WAIT_TIME_US, true);
 }
 
 static ssize_t scalar_fence_wait_time_max_store(
@@ -827,14 +852,8 @@ static ssize_t scalar_fence_wait_time_max_store(
 	const char *buf, size_t count)
 {
 	struct edgetpu_dev *etdev = dev_get_drvdata(dev);
-	struct edgetpu_usage_stats *ustats = etdev->usage_stats;
-
-	if (ustats) {
-		mutex_lock(&ustats->usage_stats_lock);
-		ustats->max_watermark[EDGETPU_MAX_WATERMARK_SCALAR_FENCE_WAIT_TIME_US] = 0;
-		mutex_unlock(&ustats->usage_stats_lock);
-	}
 
+	edgetpu_max_watermark_clear(etdev, EDGETPU_MAX_WATERMARK_SCALAR_FENCE_WAIT_TIME_US);
 	return count;
 }
 static DEVICE_ATTR(scalar_fence_wait_time_max, 0664, scalar_fence_wait_time_max_show,
@@ -844,11 +863,9 @@ static ssize_t suspend_time_max_show(
 	struct device *dev, struct device_attribute *attr, char *buf)
 {
 	struct edgetpu_dev *etdev = dev_get_drvdata(dev);
-	int64_t val;
 
-	val = edgetpu_usage_get_max_watermark(
-			etdev, EDGETPU_MAX_WATERMARK_SUSPEND_TIME_US);
-	return scnprintf(buf, PAGE_SIZE, "%llu\n", val);
+	return edgetpu_usage_format_max_watermark(etdev, buf, EDGETPU_MAX_WATERMARK_SUSPEND_TIME_US,
+						  false);
 }
 
 static ssize_t suspend_time_max_store(
@@ -856,14 +873,8 @@ static ssize_t suspend_time_max_store(
 	const char *buf, size_t count)
 {
 	struct edgetpu_dev *etdev = dev_get_drvdata(dev);
-	struct edgetpu_usage_stats *ustats = etdev->usage_stats;
-
-	if (ustats) {
-		mutex_lock(&ustats->usage_stats_lock);
-		ustats->max_watermark[EDGETPU_MAX_WATERMARK_SUSPEND_TIME_US] = 0;
-		mutex_unlock(&ustats->usage_stats_lock);
-	}
 
+	edgetpu_max_watermark_clear(etdev, EDGETPU_MAX_WATERMARK_SUSPEND_TIME_US);
 	return count;
 }
 static DEVICE_ATTR(suspend_time_max, 0664, suspend_time_max_show,
@@ -924,6 +935,10 @@ static struct attribute *usage_stats_dev_attrs[] = {
 	&dev_attr_hardware_ctx_save_time.attr,
 	&dev_attr_scalar_fence_wait_time.attr,
 	&dev_attr_long_suspend_count.attr,
+#if EDGETPU_TPU_CLUSTER_COUNT > 1
+	&dev_attr_reconfigurations.attr,
+	&dev_attr_preempt_reconfigurations.attr,
+#endif
 	&dev_attr_outstanding_commands_max.attr,
 	&dev_attr_preempt_depth_max.attr,
 	&dev_attr_hardware_ctx_save_time_max.attr,
diff --git a/drivers/edgetpu/edgetpu-usage-stats.h b/drivers/edgetpu/edgetpu-usage-stats.h
index ee908e1..2d97043 100644
--- a/drivers/edgetpu/edgetpu-usage-stats.h
+++ b/drivers/edgetpu/edgetpu-usage-stats.h
@@ -13,6 +13,9 @@
 /* The highest version of usage metrics handled by this driver. */
 #define EDGETPU_USAGE_METRIC_VERSION 2
 
+/* Max # of TPU clusters accounted for in the highest supported metrics version. */
+#define EDGETPU_USAGE_CLUSTERS_MAX 3
+
 /*
  * Size in bytes of usage metric v1.
  * If fewer bytes than this are received then discard the invalid buffer.
@@ -54,6 +57,7 @@ struct tpu_usage {
 
 	/* Compute Core: TPU cluster ID. */
 	/* Called core_id in FW. */
+	/* Note: as of metrics v2 the cluster_id is always zero and is ignored. */
 	uint8_t cluster_id;
 	/* Reserved. Filling out the next 32-bit boundary. */
 	uint8_t reserved[3];
@@ -69,6 +73,7 @@ enum edgetpu_usage_component {
 	/* Just the TPU core (scalar core and tiles) */
 	EDGETPU_USAGE_COMPONENT_TPU = 1,
 	/* Control core (ARM Cortex-R52 CPU) */
+	/* Note: this component is not reported as of metrics v2. */
 	EDGETPU_USAGE_COMPONENT_CONTROLCORE = 2,
 
 	EDGETPU_USAGE_COMPONENT_COUNT = 3, /* number of components above */
@@ -114,10 +119,16 @@ enum edgetpu_usage_counter_type {
 
 	/* The following counters are added in metrics v2. */
 
-	/* Number of context switches on a compute core. */
+	/* Counter 11 not used on TPU. */
 	EDGETPU_COUNTER_CONTEXT_SWITCHES = 11,
 
-	EDGETPU_COUNTER_COUNT = 12, /* number of counters above */
+	/* Number of TPU Cluster Reconfigurations. */
+	EDGETPU_COUNTER_RECONFIGURATIONS = 12,
+
+	/* Number of TPU Cluster Reconfigurations motivated exclusively by a preemption. */
+	EDGETPU_COUNTER_PREEMPT_RECONFIGURATIONS = 13,
+
+	EDGETPU_COUNTER_COUNT = 14, /* number of counters above */
 };
 
 /* Generic counter. Only reported if it has a value larger than 0. */
@@ -173,10 +184,11 @@ struct __packed edgetpu_usage_max_watermark {
 /* Must be kept in sync with firmware enum class UsageTrackerThreadId. */
 enum edgetpu_usage_threadid {
 	/* Individual thread IDs do not have identifiers assigned. */
-	/* Thread ID 14, used for other IP, is not used for TPU */
+
+	/* Thread ID 14 is not used for TPU */
 
 	/* Number of task identifiers. */
-	EDGETPU_FW_THREAD_COUNT = 14,
+	EDGETPU_FW_THREAD_COUNT = 17,
 };
 
 /* Statistics related to a single thread in firmware. */
@@ -225,8 +237,8 @@ struct edgetpu_usage_stats {
 	DECLARE_HASHTABLE(uid_hash_table, UID_HASH_BITS);
 	/* component utilization values reported by firmware */
 	int32_t component_utilization[EDGETPU_USAGE_COMPONENT_COUNT];
-	int64_t counter[EDGETPU_COUNTER_COUNT];
-	int64_t max_watermark[EDGETPU_MAX_WATERMARK_TYPE_COUNT];
+	int64_t counter[EDGETPU_COUNTER_COUNT][EDGETPU_USAGE_CLUSTERS_MAX];
+	int64_t max_watermark[EDGETPU_MAX_WATERMARK_TYPE_COUNT][EDGETPU_USAGE_CLUSTERS_MAX];
 	int32_t thread_stack_max[EDGETPU_FW_THREAD_COUNT];
 	struct mutex usage_stats_lock;
 };
diff --git a/drivers/edgetpu/mobile-pm.c b/drivers/edgetpu/mobile-pm.c
index 53571e0..1e0cbb5 100644
--- a/drivers/edgetpu/mobile-pm.c
+++ b/drivers/edgetpu/mobile-pm.c
@@ -213,7 +213,7 @@ static int mobile_power_up(void *data)
 			usleep_range(BLOCK_DOWN_MIN_DELAY_US, BLOCK_DOWN_MAX_DELAY_US);
 		} while (++times < BLOCK_DOWN_RETRY_TIMES);
 		if (times >= BLOCK_DOWN_RETRY_TIMES && !platform_pwr->is_block_down(etdev))
-			return -EAGAIN;
+			etdev_warn(etdev, "Block is still on\n");
 	}
 
 	etdev_info(etdev, "Powering up\n");
diff --git a/drivers/edgetpu/rio/config.h b/drivers/edgetpu/rio/config.h
index 4434a2e..db6ae79 100644
--- a/drivers/edgetpu/rio/config.h
+++ b/drivers/edgetpu/rio/config.h
@@ -26,6 +26,9 @@
 /* Pre-allocate 1 IOMMU domain per VCID */
 #define EDGETPU_NUM_PREALLOCATED_DOMAINS EDGETPU_NUM_VCIDS
 
+/* Number of TPU clusters for metrics handling. */
+#define EDGETPU_TPU_CLUSTER_COUNT 3
+
 /* Placeholder value */
 #define EDGETPU_TZ_MAILBOX_ID 31
 