Coming out of the Google-funded [[Content Blockchain]] project, the ISCC's goal is to "establish content as the subject of transactions in decentralized and networked environments." In other words to use a 'multi-faceted lightweight fingerprint' standardised and used as an identifier.
ISCC has some similarities with hashing - a deterministic (always produces the same value), one-way, natural identifier for data of fixed length. Hashing can be:
Titusz Pan identified 6 layers of content identification from abstract to concrete commonly used and discussed:
1. Abstract creation / collection (ie a Journal & all its issues, a TV show & its episodes) 2. Meaning - Semantic Field (e.g. RDF tripples -`photo <contains> clown`) 3. Generic Manifestation (ie the inherent content, file2.jpg ≠ file1.jpg) 4. Media specific Manifestation (ie file.jpg ≠ file.png) 5. Exact Representation (ie file.jpg = file.jpg) 6. Individual Copy (ie print 27 of 100 ≠ print 28 of 100)
![[Pasted image 20210602190249.png]] "It might look complex but I have coded this in 500 lines of Python code. It's just functions, no classes"
ISCC is made up of four 13 character codes, ranging from abstract to concrete:
CC7YpFEVFZm8K - CYLXYDN5KREJi - CDUVSFKUtkrv5 - CR1RWa4MVvKvQ Meta-ID - Content-ID - Data-ID - Instance-ID
they can be separasted with hyphens (55chars) or as one long string (52chars)
Meta-Code (Layer 1) - abstract creation Metadata is a Title - max 128 bytes; and an extra info field - max 800 bytes. Seed Metadata is metadata that stays froxzen and immutable thru its existence. Floating metadata is mutable metadata that is managed in context with an ISCC.
Content-Code (Layer 3) - generic manifestation Recognises the inherent qualities of the media, separate from its encoding. There will be some variation in the content with transcoding - but a threshold of up to 10 bits in 64 is suggested as safely indicating the same file after transcoding. Conversely the divergence begins when a threshold of difference is passed - for e.g. changing one paragraph in a document with five paragraphs will appear very different to changing one document in a doc with 100,000 paragraphs, which might appear largely similar or even the same. "The senesitivity is in relation to the whole data". For images - the image is reduced to a fixed size, made black and white and then given a pixel by pixel analysis.
Data-Code (Layer 4) - Media Specific Manifestation Uses content defined chunking, or shift resistence chunking to recognise the same file, used in de-duplication systems. (guess - when media is transferred it's broken up and repackaged, and this overlooks changes from that process?)
Instance-Code (Layer 5) - Exact Representation A cryptographic hash of the file; the root of a Merkle Tree from the media object's raw data. Verifies data integrity. Is published as a 13 char code - but the metadata includes the full 256bit hash.
NB it handles Syntactical similarity, ie structural similarity. Semantic similarity (level 2) – ie meaning – isn't implemented.
When a files metadata changes - all the other identifiers stay the same.
Compared with other decentralised identifiers
'short, globally unique, persistent, resolvable, owned, verifiable, authenticated IDs' - pointing to an ISCC ID, and resolvable on a blockchain.
Open\ Source\ trailer\ 9.mp4 ISCC:CCjZXG82Vbut3-CVFDwoetYJkav-CDXq7fdDjZD4F-CRTbppbwJQZsQ
Open\ Source\ trailer\ 10\ -\ no\ soundtrack.mp4 ISCC:CCt8VWtRBFZzT-CVFDwoetYJkav-CD7jT3LUfqTwa-CRvepmivq1TLw