## [[Off-Chain Data Management with IPFS and Filecoin|IPFS Storage]] and [[ZKP/ZKP Base Layer/ZKP Blockchain/Storage Layer/Data Retrieval and Verification|PoSp Security]]

The tokenization process begins with data ingestion, where [[ZKP/Data Marketplace/Tokenized Datasets/Comprehensive Mechanisms of Tokenized Datasets|datasets]] are [[ZKP/Data Marketplace/User Interactions/Interactions of Dataset Consumers|uploaded]] to [[ZKP/Data Marketplace/High-Level Overview/Off-Chain Storage with IPFS|IPFS]], a content-addressable storage network that enables decentralized data retrieval, coordinated through [[Proof Pods in the Data Marketplace|off-chain workers]] [87]. The upload yields a Content Identifier (CID), [[ZKP/ZKP Base Layer/ZKP Blockchain/Storage Layer/On-Chain Metadata Storage|a cryptographic hash serving as an address and integrity check]]. For redundancy and fault tolerance, the system uses erasure coding as described in the base layer specifications, so data can be reconstructed even when some shards are unavailable. [[Proof of Space (PoSp)]] builds on this by requiring nodes to submit storage proofs through custom pallets, ensuring data persistence in a manner akin to a distributed vault with redundancy [79].

The IPFS implementation in the [[ZKP/Data Marketplace/Intro|Data Marketplace]] relies on several key features of the protocol to improve security and efficiency:

- Content addressing through cryptographic hashing means that a retrieval request specifies exactly what is being requested, not where it is located. This removes the need to trust specific storage providers and lets the recipient verify content on receipt.
- Deduplication through content addressing automatically eliminates redundant storage of identical data, improving efficiency across the network. If two datasets contain overlapping information, only the unique portions consume additional storage resources.
- The Merkle Directed Acyclic Graph (DAG) structure of IPFS divides data into blocks linked by cryptographic hashes, enabling efficient verification, transfer, and caching of dataset components. This structure allows incremental verification and transfer of large datasets, enhancing performance for multi-gigabyte AI training datasets.

To ensure long-term storage reliability, PoSp requires nodes to periodically demonstrate possession of the dataset through cryptographic challenges managed by pallets [79]. [[ZKP/ZKP Base Layer/ZKP Blockchain/Strategic Rationale/Energy Efficiency/Energy Efficiency and Performance|Unlike Proof of Work, which relies on computational effort]], PoSp leverages disk space, aligning with the marketplace's goal of resource efficiency. This mechanism discourages data loss or tampering, as nodes are incentivized through rewards to maintain data integrity.

See also: [[ZKP/Data Marketplace/Tokenized Datasets/Encryption and ZKP Ownership Verification|Encryption and ZKP Ownership Verification]]
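
The content addressing, deduplication, and possession challenges described in this section can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not the marketplace's implementation: blocks are keyed by a raw SHA-256 digest rather than a real IPFS CID (which adds multihash and multibase encoding), the Merkle tree is a plain binary tree rather than the IPFS DAG layout, and the challenge is a simple Merkle inclusion check standing in for the PoSp proofs handled by pallets. All names (`BlockStore`, `merkle_levels`, `prove`, `verify`, `challenge`) are hypothetical.

```python
import hashlib
import secrets

BLOCK_SIZE = 4  # tiny on purpose, for illustration; real chunk sizes are far larger


def h(data: bytes) -> bytes:
    """SHA-256 digest, used here as a stand-in for a content identifier."""
    return hashlib.sha256(data).digest()


def chunk(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Split a dataset into fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class BlockStore:
    """Toy content-addressed store: each block is keyed by its own hash."""

    def __init__(self) -> None:
        self.blocks: dict[bytes, bytes] = {}

    def put(self, block: bytes) -> bytes:
        key = h(block)
        self.blocks.setdefault(key, block)  # identical blocks are stored once
        return key

    def add_dataset(self, data: bytes) -> list[bytes]:
        """Store every block of a non-empty dataset; return the block keys in order."""
        return [self.put(b) for b in chunk(data)]


def _padded(level: list[bytes]) -> list[bytes]:
    # Duplicate the last node on odd-sized levels (a simplification for this toy).
    return level + [level[-1]] if len(level) % 2 else level


def merkle_levels(leaves: list[bytes]) -> list[list[bytes]]:
    """Build all tree levels from the leaf hashes up; the last level holds the root."""
    levels = [leaves]
    while len(levels[-1]) > 1:
        cur = _padded(levels[-1])
        levels.append([h(cur[i] + cur[i + 1]) for i in range(0, len(cur), 2)])
    return levels


def prove(levels: list[list[bytes]], index: int) -> list[tuple[bytes, bool]]:
    """Prover side: collect (sibling hash, node_is_left) pairs from leaf to root."""
    path = []
    for level in levels[:-1]:
        level = _padded(level)
        path.append((level[index ^ 1], index % 2 == 0))
        index //= 2
    return path


def verify(root: bytes, block: bytes, path: list[tuple[bytes, bool]]) -> bool:
    """Verifier side: recompute the root from the returned block and its path."""
    node = h(block)
    for sibling, node_is_left in path:
        node = h(node + sibling) if node_is_left else h(sibling + node)
    return node == root


def challenge(num_blocks: int) -> int:
    """Pick an unpredictable block index for the storage node to prove."""
    return secrets.randbelow(num_blocks)
```

In this sketch, a verifier that holds only the Merkle root calls `challenge(n)`, the storage node answers with the requested block plus `prove(levels, index)`, and `verify` accepts only if the recomputed root matches. Adding two datasets that share blocks to the same `BlockStore` grows `store.blocks` only by the blocks that are new, which is the deduplication effect described in the list above.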