Deterministic Concurrency Control Based Multi-writer Transaction Processing over Cloud-native Databases

Corresponding author:

Lu Wei, E-mail: lu-wei@ruc.edu.cn

CLC number:

TP311

Funding:

National Natural Science Foundation of China (61972403, 61732014)

    Abstract:

    Cloud-native databases offer out-of-the-box functionality, elastic scalability, and pay-as-you-go pricing, and are a research hotspot in both academia and industry. Currently, cloud-native databases support only “single writer and multiple readers”: read-write transactions are concentrated on a single read-write node, while read-only transactions are distributed across multiple read-only nodes. Concentrating read-write transactions on a single node limits the system’s capacity to process them and makes it difficult to meet the demands of read-write-intensive workloads. To address this, this study proposes the D3C (deterministic concurrency control cloud-native database) architecture. By designing a cloud-native database transaction processing mechanism based on deterministic concurrency control, D3C breaks through the “single writer and multiple readers” limitation and supports concurrent execution of read-write transactions on multiple read-write nodes. D3C splits each transaction into sub-transactions and executes them independently on each node according to a predetermined global order, which ensures serializability of transaction execution across the read-write nodes. In addition, this study proposes mechanisms such as asynchronous batch data persistence based on multi-versioning to sustain transaction processing performance, and a consistency-point-based fault recovery mechanism to achieve high availability. Experimental results show that D3C meets the key requirements of cloud-native databases while achieving up to 5.1 times the performance of the “single writer and multiple readers” architecture in write-intensive scenarios.
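
The idea of executing sub-transactions independently on each node under a predetermined global order can be illustrated with a toy sketch. This is not the D3C implementation: the partitioning rule `owner`, the operation format, and all function names are hypothetical, and the sketch leaves out cross-node reads, multi-versioning, persistence, and recovery. It only shows that when every node applies its own sub-transactions in the same global sequence, the combined result matches a serial execution in that sequence.

```python
from collections import defaultdict

NUM_NODES = 3  # hypothetical number of read-write nodes


def owner(key: str) -> int:
    """Toy partitioning rule (an assumption): map a key to the node that owns it."""
    return sum(key.encode()) % NUM_NODES


def apply_op(value: int, op: tuple) -> int:
    """Apply one key-local operation; these operations do not commute, so order matters."""
    kind, arg = op
    if kind == "set":
        return arg
    if kind == "add":
        return value + arg
    if kind == "mul":
        return value * arg
    raise ValueError(f"unknown operation: {kind}")


def run_serial(txns: list) -> dict:
    """Reference result: run whole transactions one after another in list order."""
    state = {}
    for writes in txns:
        for key, op in writes.items():
            state[key] = apply_op(state.get(key, 0), op)
    return state


def run_deterministic(txns: list) -> dict:
    """Split transactions into per-node sub-transactions, then let each node
    apply its own queue in global sequence order, independently of the others."""
    queues = defaultdict(list)
    for seq, writes in enumerate(txns):  # seq is the predetermined global order
        parts = defaultdict(dict)
        for key, op in writes.items():
            parts[owner(key)][key] = op
        for node, sub in parts.items():
            queues[node].append((seq, sub))

    state = {}
    # Nodes are visited in reverse on purpose: node scheduling does not affect the result.
    for node in sorted(queues, reverse=True):
        for _seq, sub in sorted(queues[node], key=lambda item: item[0]):
            for key, op in sub.items():
                state[key] = apply_op(state.get(key, 0), op)
    return state


txns = [
    {"a": ("set", 10), "b": ("set", 1)},
    {"a": ("add", 5), "c": ("set", 7)},
    {"a": ("mul", 2), "b": ("add", 3)},
]
print(run_deterministic(txns) == run_serial(txns))
```

Because each key belongs to exactly one node and every node follows the same sequence numbers, no locking or cross-node coordination is needed during execution in this simplified setting.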

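Cite this article

Hong YH, Zhao HY, Wang YL, Shi XY, Lu W, Yang S, Du S. Deterministic concurrency control based multi-writer transaction processing over cloud-native databases. Ruan Jian Xue Bao/Journal of Software, 2025, 36(3): 995–1021 (in Chinese with English abstract).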

History
  • Received: 2024-05-27
  • Last revised: 2024-07-16
  • Published online: 2024-09-13