Content Authenticity Initiative

The CAI is developing standards and specifications for a "simple, extensible and distributed media provenance solution". Its initial mission is to develop the industry standard for content attribution. It is being developed by Adobe, with input from the BBC, Microsoft, the New York Times and Witness (Sam Gregory).

CAI uses [[Extensible Metadata Platform (XMP)]] to embed a URL within a file that can point to a 'Claim', a JSON-based description of assertions.
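To make the XMP-to-Claim link concrete, here is a minimal Python sketch. It assumes a made-up XMP namespace (`http://example.org/ns/cai-sketch/1.0/`) and property name (`claimURL`); the real CAI field names and Claim schema may differ. It reads the Claim URL out of an XMP packet and loads a toy JSON Claim listing two assertions.

```python
import json
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical namespace and property name, for illustration only;
# the real CAI/XMP field names may differ.
CAI_NS = "http://example.org/ns/cai-sketch/1.0/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# A minimal XMP packet carrying a pointer to an externally hosted Claim.
XMP_PACKET = f"""<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="{RDF_NS}">
    <rdf:Description rdf:about="" xmlns:cai="{CAI_NS}"
        cai:claimURL="https://example.org/claims/1234.json"/>
  </rdf:RDF>
</x:xmpmeta>"""

def find_claim_url(xmp: str) -> Optional[str]:
    """Return the Claim URL stored in an XMP packet, or None if absent."""
    root = ET.fromstring(xmp)
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        url = desc.get(f"{{{CAI_NS}}}claimURL")
        if url is not None:
            return url
    return None

# The document the URL points at: a JSON Claim listing assertions
# (hypothetical structure).
claim = json.loads("""{
  "assertions": [
    {"label": "camera", "data": {"make": "ExampleCam"}},
    {"label": "copyright", "data": {"holder": "Jane Doe"}}
  ]
}""")

print(find_claim_url(XMP_PACKET))
```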

CAI uses the term 'provenance' to describe the metadata surrounding the creation of media. The goal is not to "provide value judgments about whether a given set of provenance data is 'good' or 'bad', merely whether the data can be verified as associated with the underlying asset, correctly formed, and free from tampering."

The CAI vision is that metadata is captured at the point of creation (e.g. when a photograph is taken) and then persists through subsequent image editing (e.g. in Photoshop), publishing via a CMS, and sharing on social media. "CAI data will travel with the asset and any user who sees the asset posted by any other user, will be able to investigate the source and original context of the asset." Part of the project's ambition is to support fact-checking and limit deepfakes.

The white paper is:

Data structure

As described in the white paper's “Workflows” section, each actor that creates or processes an asset produces one or more assertions about what it did, when it did it, and (if possible) on whose behalf. An assertion is typically a JSON-based data structure representing a declaration made by an actor about an asset at a specific time. Some actors are human and add human-generated information (e.g. copyright), while others are machines (software or hardware) providing information they generated (e.g. camera type or device time). Each type of assertion is either defined in the CAI specification, defined by another metadata standard such as XMP, or custom data for a particular actor or workflow.

Assertions are cryptographically hashed, and their hashes are gathered together into a claim. A claim is a digitally signed data structure that represents a set of assertions along with one or more cryptographic hashes of the asset's data. The signature ensures the integrity of the claim and makes the system tamper-evident. A claim can be embedded in the asset either directly or indirectly as the asset moves through its life.

Each time the asset reaches a key point in its lifecycle, such as initial creation, completion of some editing operations, or publication to social media, a new set of assertions and a new claim are created. Each new claim refers to the previous one, creating a chain of provenance for the asset.
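The hash-then-sign chain described above can be sketched with only the Python standard library. This is a sketch under stated assumptions, not the CAI format: HMAC-SHA256 with a hypothetical shared key stands in for the public-key signature a real claim carries, and the field names (`assertion_hashes`, `asset_hash`, `previous_claim`) are invented for illustration. Verification checks that each claim's signature is intact, that it matches its version of the asset and its assertions, and that it links to the previous claim.

```python
import hashlib
import hmac
import json
from typing import Optional

# Hypothetical shared key. Real CAI claims carry public-key digital
# signatures; HMAC-SHA256 stands in here to keep the sketch stdlib-only.
SIGNING_KEY = b"hypothetical-demo-key"

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def canonical(obj) -> bytes:
    # Deterministic JSON encoding so equal objects always hash the same.
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()

def sign(body: dict) -> str:
    return hmac.new(SIGNING_KEY, canonical(body), hashlib.sha256).hexdigest()

def make_claim(assertions: list, asset: bytes, previous: Optional[dict] = None) -> dict:
    """Hash each assertion, bind the set to the asset and the previous claim, then sign."""
    body = {
        "assertion_hashes": [sha256_hex(canonical(a)) for a in assertions],
        "asset_hash": sha256_hex(asset),
        # Reference the prior claim by hashing its full signed form.
        "previous_claim": sha256_hex(canonical(previous)) if previous is not None else None,
    }
    return {"body": body, "signature": sign(body)}

def verify_chain(claims: list, assertions_per_claim: list, assets: list) -> bool:
    """Walk the chain oldest-first; fail on any tampering or broken link."""
    if not (len(claims) == len(assertions_per_claim) == len(assets)):
        return False
    prev = None
    for claim, assertions, asset in zip(claims, assertions_per_claim, assets):
        body = claim["body"]
        if not hmac.compare_digest(sign(body), claim["signature"]):
            return False  # the claim itself was altered
        if body["asset_hash"] != sha256_hex(asset):
            return False  # claim does not belong to this version of the asset
        if body["assertion_hashes"] != [sha256_hex(canonical(a)) for a in assertions]:
            return False  # an assertion was altered, added or dropped
        expected_prev = sha256_hex(canonical(prev)) if prev is not None else None
        if body["previous_claim"] != expected_prev:
            return False  # the provenance chain is broken
        prev = claim
    return True

# Usage: a capture claim followed by an edit claim that points back to it.
original, edited = b"raw pixels", b"cropped pixels"
a1 = [{"label": "capture", "device": "ExampleCam"}]
a2 = [{"label": "edit", "action": "crop"}]
c1 = make_claim(a1, original)
c2 = make_claim(a2, edited, previous=c1)
print(verify_chain([c1, c2], [a1, a2], [original, edited]))  # True
```

Because each claim hashes the whole previous claim, including its signature, changing anything earlier in the chain invalidates every later link.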


There are 'trust lists', i.e. lists of verified providers, similar to the EU Trusted Lists under the eIDAS regulation. These relate to the hardware or software manufacturer rather than the individual, so they can be used to offer pseudonymity for content provenance (the white paper gives the example of a human rights activist). Identity is provided by [[Decentralized Identifiers]] (DID) URIs.
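A trust-list lookup might look like the following Python sketch. The DIDs, organisation names and the `SignedClaim` shape are hypothetical, and cryptographic signature verification is assumed to have happened already. The point it illustrates is that trust is decided by the signing hardware or software identity, while the person can still appear under a pseudonym.

```python
from dataclasses import dataclass

# Hypothetical trust list mapping signer DIDs to the organisations behind
# them. "did:example:" is the placeholder DID method used in W3C examples.
TRUST_LIST = {
    "did:example:camera-maker-123": "Example Camera Co.",
    "did:example:editing-app-456": "Example Photo Editor",
}

@dataclass
class SignedClaim:
    signer_did: str    # identity of the hardware/software that signed the claim
    author_label: str  # how the person is shown; may be a pseudonym

def check_trust(claim: SignedClaim, trust_list: dict) -> str:
    """Look up the signer; assumes the signature itself was verified elsewhere."""
    org = trust_list.get(claim.signer_did)
    if org is None:
        return "untrusted signer"
    return f"signed by {org}; author shown as {claim.author_label!r}"

print(check_trust(SignedClaim("did:example:camera-maker-123", "activist-7"), TRUST_LIST))
# signed by Example Camera Co.; author shown as 'activist-7'
```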

Redaction of assertions. The system allows assertions to be removed by later processes, either because publishing them would be problematic (e.g. the identity of the person who captured a video) or because they are no longer valid (e.g. an earlier thumbnail showing something that has since been cropped out).
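Since a signed claim stores only hashes of its assertions, an assertion body can be dropped without breaking the claim's signature; a validator then needs some way to tell a deliberate redaction from a missing or lost assertion. The Python sketch below assumes, for illustration only, that the redacting process records the hashes of the removed assertions; the functions `redact` and `consistent_with_claim` are hypothetical, not CAI API.

```python
import hashlib
import json

def assertion_hash(assertion: dict) -> str:
    # Deterministic JSON so identical assertions always hash identically.
    return hashlib.sha256(json.dumps(assertion, sort_keys=True).encode()).hexdigest()

def redact(assertions: list, labels: set) -> tuple:
    """Drop assertions whose label is in `labels`; return the kept ones and
    the hashes of the removed ones (a hypothetical redaction record)."""
    kept = [a for a in assertions if a["label"] not in labels]
    removed = {assertion_hash(a) for a in assertions if a["label"] in labels}
    return kept, removed

def consistent_with_claim(claim_hashes: list, assertions: list, redacted: set) -> bool:
    """Each hash in the signed claim must match a present assertion or a declared redaction."""
    present = {assertion_hash(a) for a in assertions}
    return all(h in present or h in redacted for h in claim_hashes)

original = [
    {"label": "creator", "name": "Jane Doe"},
    {"label": "thumbnail", "data": "uncropped-preview"},
    {"label": "capture", "device": "ExampleCam"},
]
claim_hashes = [assertion_hash(a) for a in original]  # as recorded in the signed claim

kept, redacted = redact(original, {"creator", "thumbnail"})
print(consistent_with_claim(claim_hashes, kept, redacted))  # True
print(consistent_with_claim(claim_hashes, kept, set()))     # False: unexplained gap
```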

ClaimReview. Fact-checking of metadata claims can be carried out using the schema.org ClaimReview vocabulary.
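Using the schema.org `ClaimReview` and `Rating` types (with properties such as `claimReviewed`, `reviewRating`, `ratingValue` and `alternateName`), a fact-check could be expressed as JSON-LD. The reviewer name, URL, claim text and 1 to 5 rating scale below are placeholders, and how such a review would be attached to CAI provenance data is an assumption left open here.

```python
import json

def claim_review(claim_text: str, reviewer: str, rating: int, verdict: str, review_url: str) -> dict:
    """Build a schema.org ClaimReview object as a JSON-LD-ready dict."""
    return {
        "@context": "https://schema.org",
        "@type": "ClaimReview",
        "url": review_url,
        "claimReviewed": claim_text,
        "author": {"@type": "Organization", "name": reviewer},
        "reviewRating": {
            "@type": "Rating",
            "ratingValue": rating,
            "bestRating": 5,           # placeholder 1-5 scale
            "worstRating": 1,
            "alternateName": verdict,  # human-readable verdict
        },
    }

# Placeholder values for illustration only.
doc = claim_review(
    claim_text="Photo was taken in City X on 1 May",
    reviewer="Example Fact Check",
    rating=1,
    verdict="False",
    review_url="https://example.org/fact-checks/42",
)
print(json.dumps(doc, indent=2))
```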
