The availability metric is defined as the "Percentage availability of the CDR platform over time".
Using the picture as a reference, the availability of the platform refers to the entire CDR solution stack and not just the various API endpoints (Banking, Common, Infosec, and Admin).
How are the Big Four and other Data Holders measuring this availability?
A number of approaches can be used, including:
Sending "synthetic requests" to these API endpoints and treating a 200 OK response as evidence that the platform is available.
Pros: Ideal, as it measures the full response to a valid request.
Cons: This approach is not easy to implement. To do so, the Data Holder may register its own ADR in the ecosystem, partner with an ADR for this activity, or employ a bypass for requests from its synthetic client.
Sending API requests as a non-existent ADR (i.e. one without valid consent or access tokens). When an endpoint returns an error such as invalid token use or mTLS failure, the test infers that the APIs are up and running.
Pros: This is simple to implement.
Cons: It confirms only that the API endpoint is running and can respond with an error to a particular invalid request.
Measuring the availability of the CDR platform from within the solution. For example, the DH might instrument all the solution components using an APM (Application Performance Management) tool.
Pros: Simple to implement. Most applications are already monitored using one or more tools.
Cons: This validates only that the solution components are running. Edge-related issues would not be visible (e.g. a network issue preventing requests from reaching the API gateway that exposes the APIs).
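As a rough illustration of how the first two approaches differ, the sketch below classifies the result of a single probe and turns a series of results into a percentage. It is not taken from the standards or from any Data Holder's implementation. The function names and the set of "expected rejection" status codes are assumptions made for the example, and an mTLS rejection that surfaces as a handshake error rather than an HTTP status would need its own handling.

```python
from typing import Iterable, Optional

# Hypothetical set of status codes that an unauthenticated probe accepts as
# "the endpoint is up and correctly rejecting the request". Adjust this to
# whatever the gateway actually returns for an invalid token.
EXPECTED_REJECTIONS = frozenset({400, 401, 403})


def probe_is_up(status: Optional[int], authenticated: bool) -> bool:
    """Classify one probe result.

    status is the HTTP status code, or None if no HTTP response arrived
    (for example a timeout or a refused connection).
    """
    if status is None:
        return False
    if authenticated:
        # Approach 1: a valid synthetic request must succeed in full.
        return status == 200
    # Approach 2: an invalid request must be rejected in the expected way.
    return status in EXPECTED_REJECTIONS


def availability_percent(results: Iterable[bool]) -> float:
    """Percentage of probes that were classified as up."""
    results = list(results)
    if not results:
        raise ValueError("no probe results to summarise")
    return 100.0 * sum(results) / len(results)
```

With this split, a 503 from the gateway counts as down under both approaches, while a 401 counts as up only for the unauthenticated probe. That is the limitation noted above: the second approach shows that the endpoint responds, not that a valid request would succeed.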
How have the various Data Holders approached this?
Can the DSB or ACCC provide clarification on which approaches are in use and which of the approaches above are compliant or non-compliant?
The DSB cannot comment on how each DH chooses to implement their monitoring. It is reasonable to expect some form of external API monitoring could be part of the solution. Internal monitoring could also be used.
The Data Holder can decide to use external API monitoring to infer availability of their CDR solution. The DH would need to be confident that it can infer unavailability from errors or scheduled outages.
Availability represents the whole CDR solution, so it applies to all APIs and services defined in the standards.
It is likely that external polling would be only part of the solution for Data Holders monitoring availability. Each Data Holder must decide, based on their architecture, how far into the application stack internal monitoring is required.
Unavailability may also be identified and reported based on internal incident management processes.
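One way to combine these sources is to treat each of them (external polling, internal monitoring, incident records) as a list of unavailability intervals, merge the overlaps so the same outage is not counted twice, and express the remainder as a percentage of the reporting period. The sketch below assumes that model. The function names are illustrative, and whether a scheduled outage belongs in the list is a decision for the Data Holder to make against the standards, not something the code settles.

```python
from datetime import datetime
from typing import List, Tuple

Interval = Tuple[datetime, datetime]  # (start, end) of one unavailability window


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous window: extend it instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def period_availability(period_start: datetime,
                        period_end: datetime,
                        outages: List[Interval]) -> float:
    """Percentage of the period not covered by any outage interval."""
    total = (period_end - period_start).total_seconds()
    if total <= 0:
        raise ValueError("period_end must be after period_start")
    down = 0.0
    for start, end in merge_intervals(outages):
        # Clip each outage to the reporting period before counting it.
        start = max(start, period_start)
        end = min(end, period_end)
        if end > start:
            down += (end - start).total_seconds()
    return 100.0 * (total - down) / total
```

For example, an outage seen by external probes from 01:40 to 02:10 and an incident record covering 02:00 to 02:30 would count as a single 50-minute window rather than 60 minutes.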
See issue 339.