@victorkanu1.near [Posted on DevHub](/devgovgigs.near/widget/Post?id=3289) {"title":"Rustling Up Efficiency: Serialization Strategies for Merkle Trees","subtitle":"","description":"","date":"2024-05-01T22:00:36.503Z","content":"\nSerialization plays a crucial role in Rust smart contracts, particularly when it comes to the storage and manipulation of Merkle trees. While the merkletree crate is a powerful tool for constructing them, they do not implement serialization methods or functions. So, let’s take a deeper look into the significance of serialization and how it impacts how these data structures are stored and utilized within smart contracts.\n\n### Serialization in Rust Smart Contracts\n \nWhen working with Merkle trees in Rust, how the tree nodes are serialized can profoundly affect the efficiency and security of the smart contract. Serialization involves converting the in-memory representation of the Merkle tree into a format that can be stored, transmitted, or hashed. The choice of serialization format and the specific implementation details can significantly influence the size of the data, the speed of serialization and deserialization processes, and the overall performance of the smart contract.\n\nWhile the near_sdk’s near macro supports serialization (and deserialization) with borsh and serde, it's crucial to understand the implications of this abstraction. By relying solely on the derived implementations, developers may overlook the opportunity to optimize based on the specific requirements of their smart contract, where custom serialization logic may sometimes be necessary to achieve better performance, reduce storage costs, or ensure compatibility with external systems. This understanding empowers developers to decide on the best project serialization strategy.\n \n### Custom vs. Standard\n \nUsing libraries like serde and borsh for serialization in Rust provides a standardized, flexible approach with support for various formats and a suite of advanced features, which can streamline integration and reduce boilerplate code. However, the convenience of these general-purpose libraries can come with trade-offs, including potential performance and binary size overhead. These drawbacks might be critical in resource-constrained or high-throughput environments, and developers must know them to make the right project choices.\n\nCustom serialization can be a game-changer. By implementing serialization logic tailored to the data structure, developers can optimize for performance and gain fine-grained control over the process. In the case of a Merkle tree, a custom serialization function can be remarkably efficient. By bypassing the overhead of a generic library, custom serialization can outperform existing libraries in terms of speed and memory usage. However, this approach also increases the developer's responsibility for the code's correctness and security, as it lacks the reliability of libraries like serde or borsh. While offering flexibility for specific use cases, custom serialization can complicate maintainability and adaptability, potentially increasing the codebase's complexity and effort required for future modifications.\n\nIn the following section of this article, we will explore custom serialization. By examining the code and discussing some of the trade-offs, we aim to help developers decide if and when this approach is a better option, empowering them to make informed decisions based on their specific project requirements and constraints.\n \n### Compressing NFTs with a Merkle Tree\n \nMerkle trees have emerged as a powerful data structure in smart contracts, offering a novel approach to data compression and efficient state management. By leveraging the inherent properties of Merkle trees, smart contracts can significantly reduce storage requirements and optimize data verification processes. To illustrate this concept, let's explore a smart contract that compresses NFT data to minimize gas costs during minting.\n \n**For this example, we wrote the CompressedDataContract smart contract:**\n \n- Utilizes the NEAR protocol's SDK and leverages the merkletree crate to construct and manage the Merkle tree \n- Employs the sha2 crate for hashing purposes. \n- A custom `Sha256Digest hasher` ensures compatibility with the Merkle tree implementation. \n- The contract also provides essential functionality for creating new instances, updating and verifying the Merkle root, and performing data-related operations. \n \n**The contract implements serialization and deserialization methods to facilitate the storage and transmission of the Merkle tree data:**\n\n \n- Serialization takes the Merkle tree data and converts it into a compact binary format suitable for storage. \n- Deserialization reverses the process, taking the serialized binary data and reconstructing the Merkle tree structure in memory.\n- They are optimized for the specific requirements of the Merkle tree data structure, and consider the characteristics of the Merkle tree, such as the fixed-size hashes and the tree's hierarchical structure, to ensure efficient storage and retrieval operations.\n\nUltimately, the objective is to significantly reduce storage costs and gas consumption using custom serialization techniques. We want the compact binary representation of the Merkle tree data to minimize the amount of data that needs to be stored on the blockchain, resulting in lower storage fees.\n\n\n \n**Let's take a closer look at a custom serialization method for merkle trees in Rust:**\n \n```rust\npub fn serialize_merkle_tree_data(&self) -> Vec<u8> {\n let mut serialized_data = vec![0u8; 32 * self.merkle_tree_data.len()];\n let mut start = 0;\n let mut end = 32;\n \n for element in &self.merkle_tree_data {\n serialized_data[start..end].copy_from_slice(element);\n start += 32;\n end += 32;\n }\n \n serialized_data\n}\n```\n\nIn this example, we have a `serialize_merkle_tree_data` method that takes a reference to `self` and returns a `Vec<u8>` representing the serialized Merkle tree data.\n\nThe method begins by creating a mutable vector called `serialized_data` with a length equal to 32 times the number of elements in `self.merkle_tree_data`. This ensures that the serialized data will have enough capacity to store all the elements.\n\nNext, we initialize two variables, `start` and `end`, to keep track of the current range within serialized_data, where each element will be copied.\n\nWe iterate over each `element` in `self.merkle_tree_data` using a for loop. We use the `copy_from_slice` method for each element to copy its bytes into the corresponding range of `serialized_data`, specified by `start..end`.\n\nAfter copying each `element`, we increment `start` and `end` by 32 to move to the next range in serialized_data for the subsequent element.\n \n**Finally, we return the `serialized_data` vector containing the serialized Merkle tree data.**\n \nAs we can see, this custom serialization approach has several advantages:\n\n**1. Performance:** By directly copying the bytes of each element into a pre-allocated vector, we avoid the overhead of generic serialization libraries, resulting in faster serialization.\n\nTo quantify the performance benefits, let's consider an example scenario. Suppose we have a Merkle tree with 1,000,000 elements, where each element is a 32-byte hash. Using custom serialization, we can directly copy the bytes of each element into a pre-allocated vector, resulting in a serialized output size of exactly 32,000,000 bytes (32 MB). On the other hand, if we used a generic serialization library like borsh, there would be additional overhead. Borsh adds a 4-byte length prefix to the vector and may include other metadata. \n \n**Let's consider a typical NFT data structure to estimate overhead and analyze the potential overhead introduced by borsh serialization.**\n \nExample NFT Data Structure:\n\n```rust\nuse borsh::{BorshSerialize, BorshDeserialize};\n\n#[derive(BorshSerialize, BorshDeserialize)]\npub struct NFT {\n pub id: u64,\n pub owner: String,\n pub metadata: String,\n pub approved_accounts: Vec<String>,\n}\n```\n\nIn this example, the NFT struct contains the following fields:\n- id: a unique identifier of type u64.\n- owner: a string representing the owner of the NFT.\n- metadata: a string containing metadata associated with the NFT.\n- approved_accounts: a vector of strings representing the approved accounts for the NFT.\n\n \n### Borsh Serialization Overhead\n \nWhen serializing the NFT struct using borsh, the following overhead is introduced:\n\n**Field Type Information:**\n- id: Borsh uses 8 bytes to represent a u64 value.\n- owner and metadata: Borsh adds a 4-byte length prefix to each string to indicate the length of the string.\n- approved_accounts: Borsh adds a 4-byte length prefix to the vector to indicate the number of elements.\n\n**Field Values:**\n- id: The actual value of the id field is serialized as is, without additional overhead.\n- owner and metadata: The string contents are serialized after the length prefix.\n- approved_accounts: Each string in the vector is serialized with its 4-byte length prefix followed by the string contents.\n\n**Assuming the following values for the NFT struct:**\n- id: 42\n- owner: \"0x1234567890\" (12 bytes)\n- metadata: \"{ \"name\": \"Example NFT\", \"description\": \"This is an example NFT\" }\" (64 bytes)\n- approved_accounts: [\"0xabcdef1234\", \"0x9876543210\"] (2 elements, 12 bytes each)\n\n**The serialized output using borsh would consist of the following:**\n- id: 8 bytes\n- owner: 4 bytes (length prefix) + 12 bytes (string contents) = 16 bytes\n- metadata: 4 bytes (length prefix) + 64 bytes (string contents) = 68 bytes\n- approved_accounts: 4 bytes (length prefix) + 2 * (4 bytes (length prefix) + 12 bytes (string contents)) = 36 bytes\n\nThe total serialized size using borsh would be 8 + 16 + 68 + 36 = 128 bytes.\n \n### Custom Serialization\n \nIf we implement custom serialization for the NFT struct, we could reduce the overhead by eliminating the need for length prefixes and field type information. The custom serialization would only serialize the actual field values without additional metadata.\n\n**Using custom serialization, the serialized size would be:**\n- id: 8 bytes\n- owner: 12 bytes\n- metadata: 64 bytes\n- approved_accounts: 2 * 12 bytes = 24 bytes\n\nThe total serialized size using custom serialization is 8 + 12 + 64 + 24 = 108 bytes.\n \n**In this example, the overhead introduced by borsh serialization is approximately:**\n \n```\n(128 bytes - 108 bytes) / 108 bytes * 100% ≈ 18.5%\n```\n\nTherefore, custom serialization could reduce the serialized size by around 18.5% compared to borsh serialization for this specific NFT data structure.\n\nIt's important to note that the actual overhead may vary depending on the specific data structure, the size of the field values, and any additional optimizations the serialization library applies. The example is a simplified representation and may only reflect the exact overhead in some scenarios.\n\n**2. Simplicity:** The code is straightforward to understand, making it maintainable and less prone to bugs.\n\n**3. Control:** We have complete control over the serialization process, allowing us to optimize it for our specific use case and data structure.\n \n**However, it's important to note that custom serialization also comes with some trade-offs:**\n \n**1. Lack of flexibility:** The custom serialization method is tailored to the Merkle tree data structure and may not easily adapt to other data types or formats.\n\n**2. Maintenance:** As the codebase evolves, maintaining custom serialization logic across multiple components can become challenging and may lead to duplication of effort.\n\n**3. Interoperability:** Custom serialization formats may not be compatible with other systems or libraries, limiting interoperability.\n \n### A quick note about the Merkle Tree implementation\n \nThe contract showcases the dynamic nature of Merkle trees within smart contract operations. The `get_merkle_tree` method constructs a Merkle tree on-the-fly based on the contract's underlying data, while the `update_merkle_tree` and `update_merkle_tree_after_mint` methods play a vital role in maintaining the integrity of the Merkle tree. Finally, by leveraging Merkle proofs, the contract can verify the inclusion of specific data elements within the Merkle tree without retrieving and processing the entire dataset. This optimization significantly reduces the computational overhead associated with data verification, enhancing the contract's performance and scalability. \n \n### Decisions…\n \nCustom serialization can provide a fast and efficient solution in scenarios where performance is critical, and the data structure is simple, like Merkle trees. However, a well-established serialization library like `borsch` and `serde` may be better for more complex data structures or when interoperability is a priority. Here are some ideas about when it is more convenient to opt for custom or standard serialization:\n \n### Custom Serialization\n \n- When building systems that require ultra-fast data processing, such as real-time trading platforms.\n- Minimizing binary size is crucial in embedded systems or IoT devices where resources are limited.\n- In security-sensitive applications, precise control over data handling is required to ensure data integrity and confidentiality.\n- When serialization and deserialization speed are critical, such as in high-frequency data logging applications.\n \n### Standard Serialization\n \n- In applications requiring support for multiple data formats and the ability to easily switch between them without significant code changes.\n- When developing software that must be interoperable with other systems that use standard serialization formats.\n- For projects that benefit from the community support, documentation, and updates established libraries like Serde and Borsch offer.\n- A robust error-handling mechanism is essential in complex systems with highly variable data structures.\n- When ease of use and reduced development time are prioritized over absolute performance optimization.\n \nUltimately, the decision depends on your project's specific requirements and constraints. By understanding the trade-offs and choosing the appropriate serialization approach, you can ensure that your Rust code is efficient, maintainable, and suited to your needs.\n\nI hope you enjoyed the article. Keep an eye on our X account @NEARDevHub for Part II, where we will guide you through the entire implementation of the compressed NFT standard.\n \nYou can find more information about the NEAR SDK [here](https://docs.near.org/sdk/rust/introduction)\nFor information about the `merkletree` crate and its documentation, you can [click here](https://crates.io/crates/merkletree) \nAnd for anything else regarding the NEAR ecosystem, please [visit DevHub’s website](https://www.neardevhub.org) and [follow us on X](https://twitter.com/NEARDevHub) \n","author":"","category":"news"}