Abstract |
Polypeptides, which consist of amino acid sequences, are suitable for high-density information storage, and recent advances in peptide synthesis and sequencing have laid the foundation for high-density peptide data storage. However, the lack of encoding systems that accommodate the characteristics of polypeptide synthesis, storage, and sequencing impedes the application of polypeptides to large-scale digital data storage. To address this, we developed a reliable and efficient coding system based on 16 amino acids. The coding system compresses data, corrects the loss of amino acid (AA) chains, corrects errors within AA chains, eliminates homopolymers, and applies pseudo-random encryption. The system is divided into two modules: a cascaded error correction framework that combines RaptorQ codes with Reed-Solomon (RS) codes, and a functional coding module. The cascaded framework makes full use of the error correction capabilities of both the RaptorQ code and the RS code. The functional coding module has three parts: (1) a data compression module, which compresses the data so that fewer amino acids encode more information; (2) a balanced homopolymer module, which reduces synthesis difficulty and sequencing errors; and (3) a priority module, which ranks amino acids by synthesis difficulty so that the encoded sequences favor amino acids that are easier to synthesize. The hexadecimal polypeptide-based system may provide a new approach to highly reliable and highly efficient data storage. |
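
As a rough illustration of what "hexadecimal" means here, a 16-letter alphabet lets each amino acid carry 4 bits, so one byte maps to two residues. The sketch below assumes a hypothetical alphabet of 16 one-letter amino acid codes (the abstract does not say which amino acids are used) and omits compression, encryption, and the RaptorQ and RS layers entirely. The helper `longest_homopolymer` is likewise an illustrative name; it only measures repeated-residue runs, which is the property a homopolymer-balancing step would try to limit.

```python
# Hypothetical 16-letter alphabet (one-letter amino acid codes).
# Assumption: the abstract does not list the 16 amino acids it uses.
ALPHABET = "ACDEFGHIKLMNPQRS"
INDEX = {aa: i for i, aa in enumerate(ALPHABET)}


def encode(data: bytes) -> str:
    """Map each byte to two residues: high nibble first, then low nibble."""
    return "".join(ALPHABET[b >> 4] + ALPHABET[b & 0x0F] for b in data)


def decode(peptide: str) -> bytes:
    """Invert encode(): combine each pair of residues back into one byte."""
    if len(peptide) % 2:
        raise ValueError("peptide length must be even")
    return bytes(
        (INDEX[peptide[i]] << 4) | INDEX[peptide[i + 1]]
        for i in range(0, len(peptide), 2)
    )


def longest_homopolymer(peptide: str) -> int:
    """Return the length of the longest run of one repeated residue."""
    longest = run = 0
    prev = None
    for aa in peptide:
        run = run + 1 if aa == prev else 1
        prev = aa
        longest = max(longest, run)
    return longest


if __name__ == "__main__":
    peptide = encode(b"Hi")
    print(peptide, decode(peptide), longest_homopolymer(peptide))
```

Under this mapping a payload of n bytes becomes 2n residues before any redundancy is added. A plain nibble mapping offers no protection against homopolymers: a byte such as `0x00` encodes to two identical residues, which is why a separate balancing step is needed.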