Metadata serves two distinct purposes in the [[ZKP/Data Marketplace/Tokenized Datasets/Comprehensive Mechanisms of Tokenized Datasets|tokenized datasets]] framework:

## Public Metadata

Basic descriptive information such as schema definitions, creation timestamps, and general categorization is stored on-chain in [[Patricia Tries]] in plaintext to enable discovery and validation. This information is hashed using SHA-512 to ensure integrity, but the hash is stored alongside the plaintext data.

## Privacy-Sensitive Metadata

Statistical summaries, detailed provenance information, and other potentially sensitive metadata can be [[ZKP/Data Marketplace/Tokenized Datasets/Encryption and ZKP Ownership Verification|encrypted]] and stored [[ZKP/Data Marketplace/High-Level Overview/Off-Chain Storage with IPFS|off-chain]] through off-chain workers, with only hash references maintained [[ZKP/ZKP Base Layer/ZKP Blockchain/Storage Layer/On-Chain Metadata Storage|on-chain]]. For this sensitive metadata, [[ZKP/ZKP Base Layer/Core Concepts/Zero-Knowledge Proofs|zero-knowledge proofs]] can selectively verify properties without revealing the underlying information.

This dual approach balances the need for discoverability with [[ZKP/Data Marketplace/Technical Basis/Cryptographic Foundations/zk-SNARKs and Privacy Framework|privacy protection]], acknowledging that hashing alone does not provide confidentiality for on-chain information.

_**The marketplace implements a standardized metadata schema that balances comprehensiveness with efficiency.
The schema includes:**_

- **Dataset identification:** Basic information like title, description, version, and creation timestamp
- **Technical specifications:** Format, size, encoding, compression method, and schema definition
- **Quality indicators:** Completeness, consistency metrics, update frequency, and last verification date
- **Domain-specific attributes:** Field-relevant indicators like resolution for images, sampling rate for audio, or collection methodology for surveys
- **Usage terms:** License type, attribution requirements, and permitted use categories

This structured approach enables efficient discovery and evaluation of datasets while ensuring that critical information is consistently available across the marketplace.

The metadata serves several crucial functions in the tokenization process:

- It enables efficient dataset discovery without requiring access to the full data
- It provides the basis for quality assessment and validation before purchase
- It facilitates provenance tracking and attribution for regulatory compliance
- It documents the technical requirements for utilizing the dataset effectively

_Consider a 500 MB dataset of weather records: its metadata might include the source, schema (e.g., columns for temperature, humidity, wind speed), statistical summaries (e.g., average temperature of 15°C), and a timestamp. The metadata is serialized, hashed with SHA-512 to produce a fixed-length 512-bit digest, and linked to the dataset's IPFS content identifier (CID). This allows consumers to verify the dataset's authenticity and relevance before purchase._

See also: [[ZKP/Data Marketplace/Tokenized Datasets/Smart Contract for Dataset Tokenization/DatasetToken Smart Contract|DatasetToken Smart Contract]]
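
The serialize-then-hash step from the weather-records example can be sketched in Python as follows. This is a minimal illustration, not the marketplace's actual implementation: the field names, the sample values, the JSON canonicalization (sorted keys, compact separators), and the `link` record layout are all assumptions made for this sketch, and the CID is a placeholder rather than a real identifier.

```python
import hashlib
import hmac
import json

# Hypothetical metadata record for the weather example. Field names and
# values are illustrative only, not the marketplace's real schema.
metadata = {
    "title": "Weather records",
    "source": "example-station-network",           # placeholder source
    "schema": ["temperature", "humidity", "wind_speed"],
    "summary": {"avg_temperature_c": 15},
    "created_at": "2024-01-01T00:00:00Z",          # placeholder timestamp
}


def serialize_metadata(record: dict) -> bytes:
    """Serialize deterministically so equal records always hash equally.

    Assumption: canonical form is JSON with sorted keys and no extra
    whitespace, encoded as UTF-8.
    """
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


def metadata_digest(record: dict) -> str:
    """Return the SHA-512 digest of the serialized record as a hex string."""
    return hashlib.sha512(serialize_metadata(record)).hexdigest()


def verify_metadata(record: dict, expected_digest: str) -> bool:
    """Recompute the digest and compare it with the stored one."""
    return hmac.compare_digest(metadata_digest(record), expected_digest)


digest = metadata_digest(metadata)

# Hypothetical record tying the metadata digest to the dataset's CID.
link = {"cid": "<dataset CID placeholder>", "metadata_sha512": digest}
```

A consumer holding the plaintext metadata could call `verify_metadata(metadata, link["metadata_sha512"])` before purchase; any change to the record yields a different digest. Deterministic serialization matters here because two parties must produce identical bytes from the same record for the comparison to succeed.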