
Introduction to TCC Theory and Design Implementation Guide


Fescar 0.4.0 introduces the TCC mode, contributed by the Ant Financial team. You are welcome to try it out.
Sample address: [https://github.com/fescar-group/fescar-samples/tree/master/tcc](https://github.com/fescar-group/fescar-samples/tree/master/tcc)
The project roadmap is given at the end of this article; you are welcome to follow it.

I. Introduction to TCC

In the two-phase commit protocol (2PC), each resource manager (RM) provides three operations: "prepare", "commit" and "rollback". The transaction manager (TM) coordinates all resource managers in two phases. In the first phase it asks every resource manager whether "prepare" succeeded. If all resources prepared successfully, it performs "commit" on all resources in the second phase; otherwise it performs "rollback" on all resources in the second phase. This ensures that all resources end in the same final state: either all committed or all rolled back.
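The two phases described above can be sketched in a few lines of Java. This is a minimal illustration only: `ResourceManager`, `TwoPhaseCoordinator` and `RecordingResource` are hypothetical names invented for this sketch and are not part of Fescar's API, and a real transaction manager would also handle timeouts, retries and persistence.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical interface: the three operations a resource manager exposes in 2PC.
interface ResourceManager {
    boolean prepare();   // phase 1: returns true if the resource is ready to commit
    void commit();       // phase 2, success path
    void rollback();     // phase 2, failure path
}

// Hypothetical transaction manager that drives the two phases.
class TwoPhaseCoordinator {
    // Returns true if every resource was committed, false if every resource was rolled back.
    static boolean execute(List<? extends ResourceManager> resources) {
        // Phase 1: ask each resource to prepare; stop at the first failure.
        boolean allPrepared = true;
        for (ResourceManager rm : resources) {
            if (!rm.prepare()) {
                allPrepared = false;
                break;
            }
        }
        // Phase 2: the same decision is applied to all resources.
        for (ResourceManager rm : resources) {
            if (allPrepared) rm.commit(); else rm.rollback();
        }
        return allPrepared;
    }
}

// Illustrative resource that only records the calls it receives.
class RecordingResource implements ResourceManager {
    final List<String> calls = new ArrayList<>();
    private final boolean canPrepare;

    RecordingResource(boolean canPrepare) { this.canPrepare = canPrepare; }

    public boolean prepare() { calls.add("prepare"); return canPrepare; }
    public void commit() { calls.add("commit"); }
    public void rollback() { calls.add("rollback"); }
}

class TwoPhaseDemo {
    public static void main(String[] args) {
        RecordingResource a = new RecordingResource(true);
        RecordingResource b = new RecordingResource(false);
        boolean committed = TwoPhaseCoordinator.execute(Arrays.asList(a, b));
        // prints "false [prepare, rollback] [prepare, rollback]"
        System.out.println(committed + " " + a.calls + " " + b.calls);
    }
}
```

Note that in this sketch a resource listed after the failing one receives "rollback" without ever having received "prepare", so a resource must tolerate that call order.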

A resource manager can be implemented in many ways, and TCC (Try-Confirm-Cancel) is a service-based implementation of one. TCC is a relatively mature distributed transaction solution that addresses data consistency for business operations spanning databases and services. Because TCC's Try, Confirm and Cancel methods are all implemented in business code, TCC can be called a service-based resource manager.

Try is the first-phase operation and is responsible for checking and reserving resources. Confirm is the second-phase commit operation and executes the real business. Cancel is the second-phase rollback operation: it releases the reserved resources and returns them to their initial state.

As shown in the figure below, once a user implements a TCC service, that service becomes one of the resources of the distributed transaction and participates in the whole transaction. The transaction manager coordinates the TCC services in two phases: it calls the Try method of every TCC service in the first phase, and executes the Confirm or Cancel method of every TCC service in the second phase. In the end, all TCC services are either all committed or all rolled back.

image.png

II. TCC Design

When adopting TCC, most of the work goes into implementing the TCC service. Based on years of TCC use at Ant Financial, the main design and implementation considerations are summarised below:

1. Business operation is completed in two stages

Before adopting TCC, a business operation can be completed in a single step. After adopting TCC, it has to be split into two phases: resource checking and reservation go into the first-phase Try operation, and the execution of the real business operation goes into the second-phase Confirm operation.

Below is an example of how the business model can be designed in two phases. Example scenario: "Account A has a balance of $100, of which $30 needs to be deducted".

Before adopting TCC, the user could write the SQL "update account table set balance = balance - 30 where account = A" to complete the deduction in one step.

After adopting TCC, the deduction has to be split into two steps:

  • Try operation: checks and reserves resources;

In the deduction scenario, the Try operation checks whether account A has a sufficient balance and then freezes the $30 to be deducted (the reserved resource). No real deduction happens at this stage.

  • Confirm operation: commits the real business operation;

In the deduction scenario, the real deduction happens in the Confirm phase, which deducts the $30 already frozen in account A.

  • Cancel operation: releases the reserved resources;

In the deduction scenario, the deduction is cancelled: the Cancel operation releases the $30 frozen by the Try operation and returns account A to its initial state.

image.png
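The deduction example above can be sketched as an account that keeps a separate frozen amount. This is only an illustrative model under assumptions: the class `FrozenBalanceAccount` and its method names are hypothetical, amounts are whole dollars, and a real service would persist this state in a database rather than in memory.

```java
// Hypothetical account model: "frozen" holds funds reserved by Try.
class FrozenBalanceAccount {
    private long balance;   // total funds in the account
    private long frozen;    // part of the balance reserved by Try

    FrozenBalanceAccount(long balance) { this.balance = balance; }

    // Try: check that enough unfrozen funds exist, then reserve them.
    // No real deduction happens here.
    synchronized boolean tryDeduct(long amount) {
        if (balance - frozen < amount) return false;
        frozen += amount;
        return true;
    }

    // Confirm: perform the real deduction using the funds frozen by Try.
    synchronized void confirmDeduct(long amount) {
        balance -= amount;
        frozen -= amount;
    }

    // Cancel: release the funds frozen by Try, restoring the initial state.
    synchronized void cancelDeduct(long amount) {
        frozen -= amount;
    }

    synchronized long balance() { return balance; }
    synchronized long frozen() { return frozen; }
    synchronized long available() { return balance - frozen; }
}

class FrozenBalanceDemo {
    public static void main(String[] args) {
        FrozenBalanceAccount a = new FrozenBalanceAccount(100);
        a.tryDeduct(30);       // balance 100, frozen 30
        a.confirmDeduct(30);   // balance 70, frozen 0
        System.out.println(a.balance() + " " + a.frozen()); // prints "70 0"
    }
}
```

If the second phase had called `cancelDeduct(30)` instead, the account would end with a balance of 100 and nothing frozen.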

2. Concurrency control

Users should consider concurrency issues when implementing TCC and minimise lock granularity to maximise concurrency in distributed transactions.

Take the deduction from account A as the example again: "Account A holds $100. Transaction T1 needs to deduct $30 and transaction T2 also needs to deduct $30, and the two run concurrently".

In the first-phase Try operation, distributed transactions T1 and T2 each freeze their own part of the funds without interfering with each other. In the second phase, whether T1 commits or rolls back has no effect on T2, so T1 and T2 execute in parallel on the same piece of business data.

image.png
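One way to picture this is an account that tracks the frozen amount per transaction, so that each transaction only ever touches its own reservation. The sketch below is an assumption-laden illustration: `ReservingAccount` and the transaction IDs are hypothetical, and the in-memory `synchronized` methods stand in for whatever short-lived row lock a real database would use. The point is that a lock is held only during each individual call, never across the gap between the first and second phase.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical account that records frozen funds per distributed transaction.
class ReservingAccount {
    private long balance;
    private final Map<String, Long> frozenByTx = new HashMap<>();

    ReservingAccount(long balance) { this.balance = balance; }

    private long frozenTotal() {
        long sum = 0;
        for (long v : frozenByTx.values()) sum += v;
        return sum;
    }

    // Try: freeze this transaction's own share; other reservations are untouched.
    synchronized boolean tryDeduct(String txId, long amount) {
        if (balance - frozenTotal() < amount) return false;
        frozenByTx.put(txId, amount);
        return true;
    }

    // Confirm: deduct only what this transaction froze.
    synchronized void confirm(String txId) {
        Long amount = frozenByTx.remove(txId);
        if (amount != null) balance -= amount;
    }

    // Cancel: release only what this transaction froze.
    synchronized void cancel(String txId) {
        frozenByTx.remove(txId);
    }

    synchronized long balance() { return balance; }
    synchronized long available() { return balance - frozenTotal(); }

    public static void main(String[] args) throws InterruptedException {
        ReservingAccount a = new ReservingAccount(100);
        // T1 and T2 run their Try operations concurrently.
        Thread t1 = new Thread(() -> a.tryDeduct("T1", 30));
        Thread t2 = new Thread(() -> a.tryDeduct("T2", 30));
        t1.start(); t2.start();
        t1.join(); t2.join();
        a.cancel("T1");    // T1 rolls back ...
        a.confirm("T2");   // ... without affecting T2's commit
        System.out.println(a.balance() + " " + a.available()); // prints "70 70"
    }
}
```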

3. Allow empty rollback

As shown in the following figure, when the transaction coordinator invokes the first-phase Try operation of the TCC service, the call may time out because of network packet loss. The transaction manager then triggers the second-phase rollback and invokes the Cancel operation of the TCC service, and this Cancel call does not time out.

The TCC service thus receives a Cancel request without ever having received the Try request. This scenario is called an empty rollback. Empty rollbacks occur often in production environments, so a TCC service should allow them, i.e. return success when it receives an empty rollback.

image.png
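A minimal sketch of a Cancel method that tolerates an empty rollback is shown below. It assumes the service can look up whether a Try was recorded for a given transaction ID; the class `EmptyRollbackService`, its in-memory map and its method names are hypothetical and only serve this illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical TCC service whose Cancel accepts a request with no preceding Try.
class EmptyRollbackService {
    private long frozen;
    private final Map<String, Long> triedByTx = new HashMap<>();

    // Try: record the reservation under the transaction ID.
    synchronized boolean tryFreeze(String txId, long amount) {
        triedByTx.put(txId, amount);
        frozen += amount;
        return true;
    }

    // Cancel: release the reservation if there is one.
    synchronized boolean cancel(String txId) {
        Long amount = triedByTx.remove(txId);
        if (amount == null) {
            // Empty rollback: no Try was recorded for this transaction, so there
            // is nothing to release. Report success so the caller can finish.
            return true;
        }
        frozen -= amount;
        return true;
    }

    synchronized long frozen() { return frozen; }
}
```

Returning success rather than an error matters here: if Cancel reported a failure, the rollback would have no way to complete for a transaction whose Try never arrived.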

4. Anti-suspension control

As shown in the figure below, when the transaction coordinator calls the first-phase Try operation of the TCC service, the call may time out because of network congestion. The transaction manager then triggers the second-phase rollback and calls the Cancel operation of the TCC service, and this Cancel call does not time out. Afterwards, the first-phase Try packet that was stuck in the congested network finally reaches the TCC service. The second-phase Cancel request has therefore been executed before the first-phase Try request. After executing this late Try, the TCC service will never receive a second-phase Confirm or Cancel, so the resources it reserved are left suspended.

When implementing a TCC service, you should allow empty rollbacks but refuse to execute a Try request that arrives after the empty rollback, in order to avoid suspension.

image.png
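One possible way to refuse a late Try is to have Cancel always record the transaction ID, even for an empty rollback, and have Try check that record first. The sketch below makes that assumption; `SuspensionGuardService` and its in-memory collections are hypothetical, and a real service would keep this record in durable storage.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical TCC service that rejects a Try arriving after its Cancel.
class SuspensionGuardService {
    private long frozen;
    private final Map<String, Long> triedByTx = new HashMap<>();
    private final Set<String> cancelledTx = new HashSet<>();

    // Try: refuse to reserve anything for a transaction that was already cancelled.
    synchronized boolean tryFreeze(String txId, long amount) {
        if (cancelledTx.contains(txId)) {
            return false;   // late Try after (empty) rollback: do not reserve
        }
        triedByTx.put(txId, amount);
        frozen += amount;
        return true;
    }

    // Cancel: remember the transaction ID even when there is nothing to release.
    synchronized boolean cancel(String txId) {
        cancelledTx.add(txId);
        Long amount = triedByTx.remove(txId);
        if (amount != null) frozen -= amount;
        return true;        // an empty rollback still succeeds
    }

    synchronized long frozen() { return frozen; }
}
```

With this guard, the sequence Cancel("T1") followed by a delayed Try("T1") leaves nothing frozen, while other transactions are unaffected.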

5. Idempotent control

Both network packet retransmission and compensating execution of abnormal transactions can cause the Try, Confirm or Cancel operation of a TCC service to be executed more than once. Users therefore need to make the TCC service idempotent, i.e. executing Try, Confirm or Cancel many times must produce the same business result as executing it once.

image.png
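A common way to get this property is to store a status per transaction and turn repeated calls into no-ops. The following sketch assumes such a status record; `IdempotentAccount`, the `Status` enum and the method names are hypothetical, and the state would normally live in the same database as the business data so that both change atomically.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical account whose Try, Confirm and Cancel are safe to repeat.
class IdempotentAccount {
    enum Status { TRIED, CONFIRMED, CANCELLED }

    private long balance;
    private long frozen;
    private final Map<String, Status> statusByTx = new HashMap<>();
    private final Map<String, Long> amountByTx = new HashMap<>();

    IdempotentAccount(long balance) { this.balance = balance; }

    synchronized boolean tryDeduct(String txId, long amount) {
        Status s = statusByTx.get(txId);
        if (s != null) return s != Status.CANCELLED;  // repeated Try: never freeze twice
        if (balance - frozen < amount) return false;
        frozen += amount;
        statusByTx.put(txId, Status.TRIED);
        amountByTx.put(txId, amount);
        return true;
    }

    synchronized boolean confirm(String txId) {
        Status s = statusByTx.get(txId);
        if (s == Status.CONFIRMED) return true;       // repeated Confirm: no-op
        if (s != Status.TRIED) return false;
        long amount = amountByTx.get(txId);
        balance -= amount;
        frozen -= amount;
        statusByTx.put(txId, Status.CONFIRMED);
        return true;
    }

    synchronized boolean cancel(String txId) {
        Status s = statusByTx.get(txId);
        if (s == Status.CANCELLED) return true;       // repeated Cancel: no-op
        if (s == Status.CONFIRMED) return false;
        if (s == Status.TRIED) frozen -= amountByTx.get(txId);
        statusByTx.put(txId, Status.CANCELLED);       // also covers an empty rollback
        return true;
    }

    synchronized long balance() { return balance; }
    synchronized long frozen() { return frozen; }
}
```

For example, calling `tryDeduct("T1", 30)` twice and then `confirm("T1")` twice on an account holding 100 leaves a balance of 70 with nothing frozen, the same as calling each once.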

Roadmap

We have currently released version 0.4.0 and will go on to release versions 0.5 through 1.0. These releases will continue to improve and enrich the AT and TCC modes and will address high availability on the server side. After version 1.0, this open source product will meet the standard required for production environments.

image1.png