
· 13 min read

Fescar 0.4.0 introduces the TCC model, contributed by the Ant Financial team. You are welcome to try it out. The end of this article also provides the project's follow-up roadmap.

Preface: Application scenarios based on TCC model


1.png

The TCC distributed transaction model acts directly on the service layer. It is not coupled to any specific service framework, is independent of the underlying RPC protocol and storage medium, allows the locking granularity of business resources to be chosen flexibly, shortens the time resource locks are held, and scales well. It can be said to be designed for independently deployed SOA services.

I. TCC model advantages

In my view, applying the TCC distributed transaction model in business scenarios is significant in two respects.

1.1 Distributed transaction across services

The splitting of services can also be thought of as horizontal scaling of resources, only in a different direction.

Horizontal extensions may go along two directions:

  1. functional scaling, where data is grouped according to function and different functional groups are distributed over multiple different databases, which is effectively servitisation under the SOA architecture.
  2. data sharding, which adds a new dimension to horizontal scaling by splitting data across multiple databases within functional groups.

The following figure briefly illustrates the horizontal data scaling strategy:

2.png

Therefore, one of the roles of TCC is to ensure the transaction property of multi-resource access when scaling resources horizontally by function.

1.2 Two-stage splitting

Another effect of TCC is that it splits a transaction into two independent phases that are linked by business-level resource locking. The advantage of business resource locking is that it neither blocks other transactions from continuing to use the same resources after the first phase nor affects the correct execution of this transaction's second phase.

The traditional model of concurrent transactions:
3.png

Concurrent transactions for the TCC model:
4.png

How does this benefit the business? Take Alipay's secured transaction scenario. The simplified case involves only two services, the transaction service and the accounting service. The transaction service is the master business service, and the accounting service is the slave business service, which provides the Try, Commit, and Cancel interfaces:

  1. The Try interface deducts the user's available funds and transfers them to pre-frozen funds. Pre-frozen funds are the business locking scheme: in the second phase each transaction can use only its own pre-frozen funds, and once the first phase has executed, other concurrent transactions can continue to process the user's available funds.
  2. The Commit interface deducts the pre-frozen funds and increases the funds available in the intermediate account (secured transactions do not immediately credit the merchant, so an intermediate account is needed to hold the funds).

Assuming there is only one intermediate account, every call to the Commit interface of the payment service locks that account, which makes it a performance hotspot. However, in the secured transaction scenario the funds are transferred from the intermediate account to the merchant only after seven days, and the intermediate account does not need to be shown externally. Therefore, once the first phase of the payment service has executed, the payment part of the transaction can be considered complete and a successful payment result can be returned to the user and the merchant. The second-phase Commit interface of the payment service does not need to run right away; it can wait for an off-peak period and then be worked through gradually and executed asynchronously.
5.png

This is the two-phase asynchronisation feature of the TCC distributed transaction model: once the first phase of the slave business services has executed successfully, the master business service can commit and complete, and the framework then executes the second phase of each slave business service asynchronously.
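The asynchronous second phase described above can be sketched in a few lines. This is a minimal in-memory illustration under stated assumptions, not Fescar's API: the class `AsyncSecondPhaseDemo`, its fields, and its method names are all hypothetical, and a plain queue stands in for whatever the framework uses to schedule second-phase work.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch: phase one runs synchronously, phase two is queued
// and drained later (for example during an off-peak period).
class AsyncSecondPhaseDemo {
    long userAvailable = 100;      // user's available funds
    long userFrozen = 0;           // pre-frozen funds (the business lock)
    long intermediateAccount = 0;  // hotspot account, touched only in phase two
    final Queue<Runnable> pendingCommits = new ArrayDeque<>();

    // Try: move funds from available to pre-frozen, then queue the Commit.
    boolean tryPay(long amount) {
        if (amount <= 0 || userAvailable < amount) {
            return false;
        }
        userAvailable -= amount;
        userFrozen += amount;
        pendingCommits.add(() -> commit(amount));
        return true; // the payment can already be reported as successful
    }

    // Commit: deduct the pre-frozen funds and credit the intermediate account.
    private void commit(long amount) {
        userFrozen -= amount;
        intermediateAccount += amount;
    }

    // Stand-in for the framework executing queued second phases later.
    void drainPendingCommits() {
        Runnable task;
        while ((task = pendingCommits.poll()) != null) {
            task.run();
        }
    }
}
```

After `tryPay` returns, the user's available funds are already reduced, while the intermediate account is only touched when `drainPendingCommits` runs, which keeps the hotspot account out of the synchronous path.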

General-purpose TCC solution

The generic TCC solution is the most typical implementation of the TCC distributed transaction model, where all the slave business services need to participate in the decision making of the master business service.
6.png

Applicable scenarios

Since the slave business services are invoked synchronously and their results affect the decisions of the master business service, the generic TCC distributed transaction solution is suitable for businesses with deterministic and short execution times, such as the three most core services of an Internet financial enterprise: transaction, payment, and accounting:
7.png

When a user initiates a transaction, the transaction service is accessed first to create the transaction order; then the transaction service calls the payment service to create the payment order for the transaction and performs the collection action, and finally, the payment service calls the billing service to record the account flow and bookkeeping.

In order to ensure that the three services work together to complete a transaction, either succeed or fail at the same time, you can use a general-purpose TCC solution that puts the three services in a distributed transaction, with the transaction as the master service, the payment as the slave service, and the billing as the nested slave service of the payment service, and the atomicity of the transaction is guaranteed by the TCC model.
8.png

The Try interface of the payment service creates the payment order, opens a nested distributed transaction, and calls the Try interface of the billing service; the billing service freezes the buyer's funds in its Try interface. After the first-phase calls complete, the transaction service finishes and commits its local transaction, and the TCC framework then completes the second phase of the distributed transaction for each slave business service.

The second phase of the payment service first calls the Confirm interface of the billing service to deduct the buyer's frozen funds and increase the seller's available funds. After that call succeeds, the payment service changes the payment order to the completed state and the payment is complete.

When both payment and billing service phase 2 are finished, the whole distributed transaction is finished.

Asynchronous guaranteed TCC solution

The direct slave service of the asynchronous guaranteed TCC solution is the reliable messaging service, while the real slave service is decoupled by the messaging service and executes asynchronously as a consumer of the messaging service.
9.png

The reliable messaging service needs to provide three interfaces: Try, Confirm, and Cancel. The Try interface pre-sends the message and is only responsible for persisting the message data; the Confirm interface confirms the sending, which is when the actual delivery of the message begins; and the Cancel interface cancels the sending and deletes the message data.

The message data of the message service is stored independently and scaled independently, which reduces the coupling between the business service and the messaging system and achieves eventual consistency of the distributed transaction, provided that the message service is reliable.

This solution adds the cost of maintaining the message service, but because the message service implements the TCC interfaces in place of the slave business service, the slave business service needs no modification and the cost of adoption is very low.
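The three interfaces of the reliable messaging service can be sketched as below. This is only an in-memory stand-in to show the state transitions; the class name `ReliableMessageService`, its method names, and the use of a transaction id as the message key are assumptions for illustration, and a real service would persist the prepared messages durably.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a reliable messaging service.
class ReliableMessageService {
    private final Map<String, String> prepared = new HashMap<>(); // txId -> payload
    final List<String> delivered = new ArrayList<>();             // what consumers see

    // Try: pre-send; only stores the message, nothing is delivered yet.
    void tryPrepare(String txId, String payload) {
        prepared.put(txId, payload);
    }

    // Confirm: start the actual delivery of the stored message.
    // A repeated Confirm finds nothing to deliver and does nothing.
    void confirm(String txId) {
        String payload = prepared.remove(txId);
        if (payload != null) {
            delivered.add(payload);
        }
    }

    // Cancel: delete the stored message so that it is never delivered.
    void cancel(String txId) {
        prepared.remove(txId);
    }
}
```

The master business service only ever talks to these three methods, so the real consumer behind the queue needs no TCC-specific code.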

Application scenario

Since the slave business service consumes messages asynchronously, the execution time is uncertain, which may widen the inconsistency time window. Therefore, the asynchronous guaranteed TCC distributed transaction solution is only applicable to passive businesses that are less sensitive to the time taken to reach eventual consistency (the processing result of the slave business service does not affect the decision of the master business service; it only passively receives the master's decision). For example, the member registration service and the email sending service:
10.png

When a user registers for a membership successfully, an email needs to be sent to the user to tell the user that the registration was successful and to prompt the user to activate the membership. But pay attention to two points:

  1. If the user's registration succeeds, an email must be sent to the user;
  2. If the user's registration fails, an email must not be sent to the user.

So again, this requires the membership service and the mail service to ensure atomicity, either both are executed or neither is executed. The difference is that the mail service is only a passive business, it does not affect whether the user can register successfully or not, it only needs to send an email to the user after the user has registered successfully, and the mail service does not need to be involved in the decision making of the activities of the membership service.

For this kind of business scenario, you can use the asynchronous guaranteed TCC distributed transaction solution, as follows:
11.png


The reliable messaging service decouples the member and mail services, and the member service and the messaging service comprise the TCC transaction model, which ensures the atomicity of transactions. Then through the reliable feature of the message service, it ensures that the message can definitely be consumed by the mail service, so that the member and the mail service are in the same distributed transaction. At the same time, the mail service will not affect the execution process of the member service, and will only passively receive the request to send mail after the member service is executed successfully.

Compensated TCC solution

The compensated TCC solution is similar in structure to the generic TCC solution, and its slave services also need to participate in the decision making of the master business service. The difference is that a slave service in the compensated solution only needs to provide two interfaces, Do and Compensate, while one in the generic solution needs to provide three.
12.png

The Do interface directly executes the real, complete business logic and completes the business processing, and the result of the business execution is visible externally. The Compensate operation is used for business compensation: it offsets or partially offsets the business result of the forward business operation, and it needs to be idempotent.

Compared with the general-purpose solution, the compensated solution does not need to transform the original business logic of the slave business service; it only needs to add Compensate rollback logic, so less business transformation is required. However, it is important to note that the business executes the entire business logic in one phase and cannot achieve effective transaction isolation. When a rollback is required, the compensation may fail, so additional exception handling mechanisms, such as manual intervention, are also required.

Applicable scenarios

Because rollback compensation can fail, the compensated TCC distributed transaction solution is only applicable to businesses with few concurrent conflicts or businesses that need to interact with external services. These external services are not passive businesses: their execution results affect the decision of the master business service. An example is the ticket booking service of an air ticket agency:
13.png

This air ticket service provides multi-destination air ticket booking service, which can book air tickets for multiple itinerary flights at the same time, e.g., to travel from Beijing to St. Petersburg, it is necessary to take the first journey from Beijing to Moscow, as well as the second journey from Moscow to St. Petersburg.

When a user books a ticket, he/she would definitely want to book tickets for both flights at the same time, and booking only one flight does not make sense for the user. Therefore, such a business service also imposes the atomicity requirement that if the booking for one of the flights fails, the other flight needs to be able to be cancelled.

However, it is extremely difficult to push the airlines to change as they are external to the ticket agents and only provide booking and cancellation interfaces. Therefore, for this type of business service, a compensated TCC distributed transaction solution can be used, as follows:
14.png

The gateway service adds the Compensate interface on top of the original logic, which is responsible for calling the cancellation interface of the corresponding airline.

When the user initiates a ticket booking request, the ticket service first calls the booking interface of each airline through the Do interface of the gateway. If all flights are booked successfully, the whole distributed transaction succeeds. Once the booking for any flight fails, the distributed transaction is rolled back: the TCC transaction framework calls the Compensate interface of each gateway, which in turn calls the corresponding airline's cancellation interface. In this way, the atomicity of the multi-leg ticket booking service can also be guaranteed.
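The Do/Compensate flow for the two-leg booking can be sketched as follows. This is a simplified illustration, not the framework's behaviour: `AirlineGateway` and `TicketService` are hypothetical names, the `soldOut` flag simulates a failed booking, and here the ticket service itself plays the role of the TCC framework and compensates only the legs that were already booked.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical gateway in front of one airline's booking and cancellation interfaces.
class AirlineGateway {
    final Set<String> booked = new HashSet<>();
    private final boolean soldOut; // simulates an airline that rejects the booking

    AirlineGateway(boolean soldOut) {
        this.soldOut = soldOut;
    }

    // Do: executes the real booking; the result is visible immediately.
    boolean doBook(String orderId) {
        if (soldOut) {
            return false;
        }
        booked.add(orderId);
        return true;
    }

    // Compensate: cancels the booking. Removing an absent id is a no-op,
    // so repeated calls are idempotent.
    void compensate(String orderId) {
        booked.remove(orderId);
    }
}

class TicketService {
    // Books every leg; if one leg fails, compensates the legs already booked.
    static boolean bookAll(String orderId, List<AirlineGateway> legs) {
        List<AirlineGateway> done = new ArrayList<>();
        for (AirlineGateway leg : legs) {
            if (!leg.doBook(orderId)) {
                for (AirlineGateway bookedLeg : done) {
                    bookedLeg.compensate(orderId);
                }
                return false;
            }
            done.add(leg);
        }
        return true;
    }
}
```

Because `doBook` takes effect immediately, another reader could observe the first leg as booked before the compensation runs, which is the isolation gap the compensated solution accepts.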

V. Summary

For today's Internet applications, horizontal scaling of resources provides more flexibility and is a relatively easy to implement outward scaling solution, but at the same time, it also significantly increases the complexity and introduces some new challenges, such as data consistency issues between resources.

Horizontal data scaling can be done both by data sharding and by function. The TCC model ensures the transactional properties of multi-resource access while scaling resources horizontally by function.

Besides supporting distributed transactions across services, the TCC model also splits a transaction into two phases. Through business resource locking it allows the second phase to be executed asynchronously, and asynchronous execution is one of the tools for solving the concurrency performance problems of hotspot data.

Roadmap

Currently, we have released 0.4.0. We will release 0.5 through 1.0 to continue improving and enriching the AT and TCC modes and to solve the high-availability problem of the server side. After 1.0, this open source product will meet the standard for production use.


image1.png

· 7 min read

Fescar 0.4.0 introduces the TCC mode, contributed by the Ant Financial team. You are welcome to try it out.
Sample address: [https://github.com/fescar-group/fescar-samples/tree/master/tcc](https://github.com/fescar-group/fescar-samples/tree/master/tcc)
At the end of this article, we also provide the roadmap of the project.

I. Introduction to TCC

In the two-phase commit protocol (2PC), the resource manager (RM) needs to provide three operations: "prepare", "commit", and "rollback". The transaction manager (TM) coordinates all resource managers in two phases. In the first phase it asks every resource manager whether "prepare" succeeded. If all resources prepared successfully, it performs the "commit" operation on all resources in the second phase; otherwise it performs the "rollback" operation on all resources in the second phase. This ensures that the final state of all resources is the same: either all commit or all roll back.

There are many implementations of the resource manager, and TCC (Try-Confirm-Cancel) is a service-based one. TCC is a relatively mature distributed transaction solution that can be used to solve the data consistency problem of cross-database and cross-service business operations. Its Try, Confirm, and Cancel methods are all implemented by business code, which is why TCC can be called a service-based resource manager.

The Try operation of TCC is the first phase, which is responsible for checking and reserving resources. Confirm is the second-phase commit operation, which executes the real business. Cancel is the second-phase rollback operation, which cancels the reservation and returns the resources to their initial state.

As shown in the figure below, once the user implements a TCC service, that service becomes one of the resources of the distributed transaction and participates in the whole distributed transaction. The transaction manager coordinates the TCC services in two phases: it calls the Try method of all TCC services in the first phase and the Confirm or Cancel method of all TCC services in the second phase. In the end, all TCC services are either all committed or all rolled back.
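The coordination described above can be sketched as a small loop over the participating resources. This is a simplified, single-process illustration and not Fescar's transaction manager: the interface `TccResource`, the class `TwoPhaseCoordinator`, and the helper `LoggingResource` are hypothetical names, and real coordination also has to deal with timeouts and retries.

```java
import java.util.List;

// Hypothetical service-level resource with the three TCC methods.
interface TccResource {
    boolean tryReserve();
    void confirm();
    void cancel();
}

// Hypothetical coordinator: phase one calls Try on every resource,
// phase two calls Confirm on all of them or Cancel on all of them.
class TwoPhaseCoordinator {
    static boolean execute(List<TccResource> resources) {
        boolean allPrepared = true;
        for (TccResource resource : resources) {
            if (!resource.tryReserve()) {
                allPrepared = false;
                break;
            }
        }
        for (TccResource resource : resources) {
            if (allPrepared) {
                resource.confirm();
            } else {
                resource.cancel(); // may reach a resource whose Try never ran
            }
        }
        return allPrepared;
    }
}

// Hypothetical resource that only records the calls it receives.
class LoggingResource implements TccResource {
    private final String name;
    private final boolean tryResult;
    private final List<String> log;

    LoggingResource(String name, boolean tryResult, List<String> log) {
        this.name = name;
        this.tryResult = tryResult;
        this.log = log;
    }

    public boolean tryReserve() { log.add(name + ".try"); return tryResult; }
    public void confirm() { log.add(name + ".confirm"); }
    public void cancel() { log.add(name + ".cancel"); }
}
```

With resources `a` (Try succeeds), `b` (Try fails), and `c`, the recorded calls are `a.try`, `b.try`, `a.cancel`, `b.cancel`, `c.cancel`: every resource ends in the same rolled-back state, including `c`, which never saw a Try.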

image.png

II. TCC Design

When users adopt TCC, most of the work is in implementing the TCC services. Based on Ant Financial's years of experience with TCC, the main points of TCC design and implementation are summarised below:

1. Business operation is completed in two stages

Before adopting TCC, the business operation could be completed in a single step. After adopting TCC, we need to consider how to split it into two phases: the resource check and reservation go into the Try operation in the first phase, and the execution of the real business operation goes into the Confirm operation in the second phase.

Below is an example of how the business model can be designed in two phases. Example scenario: "Account A has a balance of $100, of which $30 needs to be deducted";

Before accessing TCC, the user could write SQL: "update account table set balance = balance - 30 where account = A" to complete the deduction operation in one step.

After connecting to TCC, you need to consider how to split the debit operation into 2 steps:

  • Try operation: checking and reserving resources;

In the deduction scenario, what Try operation has to do is to check whether the balance of A account is enough, and then freeze the $30 to be deducted (reserved resources); no real deduction will happen at this stage.

  • Confirm operation: performs the submission of the real operation;

In the deduction scenario, the Confirm phase takes place when the real deduction occurs, deducting the $30 already frozen in A's account.

  • Cancel operation: releases the reserved resources;

In the deduction scenario, Cancel cancels the deduction: it releases the $30 that was frozen by the Try operation, returning Account A to its initial state.
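The three operations on Account A can be sketched as follows. This is a minimal in-memory illustration under stated assumptions: the class `TccAccount`, its fields, and the idea of keying reservations by a transaction id are hypothetical, and a real service would keep the balance and frozen amount in the database and update them inside local transactions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory account; a real service would persist these fields.
class TccAccount {
    private long balance;  // total funds, including frozen ones
    private long frozen;   // funds reserved by Try operations
    private final Map<String, Long> reservations = new HashMap<>(); // txId -> amount

    TccAccount(long balance) {
        this.balance = balance;
    }

    // Try: check the available balance and freeze the amount; no real deduction yet.
    boolean tryDebit(String txId, long amount) {
        if (reservations.containsKey(txId)) {
            return true; // this transaction has already reserved its funds
        }
        if (amount <= 0 || balance - frozen < amount) {
            return false;
        }
        frozen += amount;
        reservations.put(txId, amount);
        return true;
    }

    // Confirm: deduct exactly the amount frozen by this transaction.
    void confirm(String txId) {
        Long amount = reservations.remove(txId);
        if (amount != null) {
            balance -= amount;
            frozen -= amount;
        }
    }

    // Cancel: release the amount frozen by this transaction.
    void cancel(String txId) {
        Long amount = reservations.remove(txId);
        if (amount != null) {
            frozen -= amount;
        }
    }

    long balance() { return balance; }
    long available() { return balance - frozen; }
}
```

Because each transaction freezes only its own amount, two transactions can each reserve $30 from the same $100 account, and confirming or cancelling one of them leaves the other's reservation untouched.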

image.png

2. Concurrency Control

Users should consider concurrency issues when implementing TCC and minimise lock granularity to maximise concurrency in distributed transactions.

The following is still an example of deducting money from account A. "There is $100 on account A. Transaction T1 has to deduct $30 of it, and transaction T2 also has to deduct $30, and there is concurrency".

In the first-phase Try operation, distributed transactions T1 and T2 each freeze their own portion of the funds without interfering with each other. As a result, in the second phase, whether T1 commits or rolls back has no impact on T2, so T1 and T2 execute in parallel on the same piece of business data.

image.png

3. Allow empty rollback

As shown in the following figure, when the transaction coordinator invokes the first-phase Try operation of the TCC service, the call may time out because of network packet loss. The transaction manager then triggers the second-phase rollback and invokes the Cancel operation of the TCC service, and this Cancel call does not time out.

The TCC service thus receives a Cancel request without having received the Try request. This scenario is called an empty rollback. Empty rollbacks often occur in production environments, so users should allow for them when implementing TCC services, i.e., return success when an empty rollback is received.

image.png

4. Anti-suspension control

As shown in the figure below, when the transaction coordinator calls the first-phase Try operation of the TCC service, the call may time out because of network congestion. The transaction manager then triggers the second-phase rollback and calls the Cancel operation of the TCC service, and the Cancel call does not time out. Afterwards, the first-phase Try packet that was stuck in the congested network finally reaches the TCC service, so the second-phase Cancel request is executed before the first-phase Try request. After executing the late Try, the TCC service will never receive a second-phase Confirm or Cancel, which leaves the TCC service suspended.

When implementing a TCC service, you should allow empty rollbacks but refuse to execute a Try request that arrives after an empty rollback, in order to avoid suspension.
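One possible way to handle both empty rollback and anti-suspension is to record, per transaction, which phase the service has already seen. The sketch below assumes that approach; `TccBranchGuard` and its method names are hypothetical, and a real service would store this record in the database together with the business change rather than in memory.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical guard that records, per transaction id, which phase has been seen.
class TccBranchGuard {
    enum State { TRIED, CANCELLED }

    private final Map<String, State> states = new HashMap<>();

    // Returns true if Try may run, false if a Cancel has already arrived
    // for this transaction (the late Try is refused to avoid suspension).
    synchronized boolean beginTry(String txId) {
        if (states.get(txId) == State.CANCELLED) {
            return false;
        }
        states.put(txId, State.TRIED);
        return true;
    }

    // Records the Cancel. Returns true if reserved resources must be released,
    // false for an empty rollback (no Try seen) or a repeated Cancel.
    // In both cases the service still reports success to the coordinator.
    synchronized boolean beginCancel(String txId) {
        State previous = states.put(txId, State.CANCELLED);
        return previous == State.TRIED;
    }
}
```

A Cancel that arrives first leaves a `CANCELLED` marker behind, so the delayed Try for the same transaction is rejected instead of reserving resources that nobody will ever confirm or release.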

image.png

5. Idempotent control

Both network packet retransmission and compensating execution of abnormal transactions can cause the Try, Confirm, or Cancel operation of a TCC service to be executed repeatedly. Users need to consider idempotent control when implementing a TCC service, i.e., the business result of executing Try, Confirm, or Cancel once must be the same as that of executing it many times.
image.png
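Idempotent control can be sketched as a filter that remembers which operation has already been applied for which transaction. This is an illustrative assumption rather than the method used by Fescar: `IdempotentExecutor` and `runOnce` are hypothetical names, and in a real service the "already processed" record would be written in the same local transaction as the business change.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical idempotency filter keyed by transaction id and operation name.
class IdempotentExecutor {
    private final Set<String> processed = new HashSet<>();

    // Runs the action only the first time a given (txId, operation) pair is seen.
    // Returns true if the action ran, false if the call was a duplicate.
    synchronized boolean runOnce(String txId, String operation, Runnable action) {
        String key = txId + ":" + operation;
        if (!processed.add(key)) {
            return false; // duplicate delivery: the business result stays unchanged
        }
        try {
            action.run();
        } catch (RuntimeException e) {
            processed.remove(key); // allow a retry if the action itself failed
            throw e;
        }
        return true;
    }
}
```

Calling `runOnce("tx1", "confirm", ...)` twice with an action that deducts $30 leaves the balance reduced by $30, not $60.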

Roadmap

Currently we have released version 0.4.0. We will release versions 0.5 through 1.0 to continue improving and enriching the AT and TCC modes and to solve the high-availability problem of the server side. After version 1.0, this open source product will meet the standard for production use.

image1.png

· 3 min read

Use case

The business logic for a user purchasing commodities. The whole business logic is powered by 3 microservices:

  • Storage service: deduct storage count on given commodity.
  • Order service: create order according to purchase request.
  • Account service: debit the balance of user's account.

Architecture

Architecture

StorageService

public interface StorageService {

    /**
     * deduct storage count
     */
    void deduct(String commodityCode, int count);
}

OrderService

public interface OrderService {

    /**
     * create order
     */
    Order create(String userId, String commodityCode, int orderCount);
}

AccountService

public interface AccountService {

    /**
     * debit balance of user's account
     */
    void debit(String userId, int money);
}

Main business logic

public class BusinessServiceImpl implements BusinessService {

    private StorageService storageService;

    private OrderService orderService;

    /**
     * purchase
     */
    public void purchase(String userId, String commodityCode, int orderCount) {
        // Deduct the stock first, then create the order (which also debits the account).
        storageService.deduct(commodityCode, orderCount);

        orderService.create(userId, commodityCode, orderCount);
    }
}

public class StorageServiceImpl implements StorageService {

    private StorageDAO storageDAO;

    @Override
    public void deduct(String commodityCode, int count) {
        Storage storage = new Storage();
        storage.setCount(count);
        storage.setCommodityCode(commodityCode);
        storageDAO.update(storage);
    }
}

public class OrderServiceImpl implements OrderService {

    private OrderDAO orderDAO;

    private AccountService accountService;

    public Order create(String userId, String commodityCode, int orderCount) {

        // calculate(...) is not shown in this example; it returns the order amount.
        int orderMoney = calculate(commodityCode, orderCount);

        accountService.debit(userId, orderMoney);

        Order order = new Order();
        order.userId = userId;
        order.commodityCode = commodityCode;
        order.count = orderCount;
        order.money = orderMoney;

        return orderDAO.insert(order);
    }
}

Distributed Transaction Solution with Seata


We just need to add the @GlobalTransactional annotation to the business method:


@GlobalTransactional
public void purchase(String userId, String commodityCode, int orderCount) {
    ......
}

Example powered by Dubbo + Seata

Step 1: Setup database

  • Requirement: MySQL with InnoDB engine.

Note: In fact, there should be 3 databases for the 3 services in the example use case. However, for simplicity we can create just one database and configure 3 data sources.

Modify Spring XML with the database URL/username/password you just created.

dubbo-account-service.xml dubbo-order-service.xml dubbo-storage-service.xml

    <property name="url" value="jdbc:mysql://x.x.x.x:3306/xxx" />
    <property name="username" value="xxx" />
    <property name="password" value="xxx" />

Step 2: Create UNDO_LOG table for Seata

UNDO_LOG table is required by Seata AT mode.

CREATE TABLE `undo_log` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `branch_id` bigint(20) NOT NULL,
  `xid` varchar(100) NOT NULL,
  `rollback_info` longblob NOT NULL,
  `log_status` int(11) NOT NULL,
  `log_created` datetime NOT NULL,
  `log_modified` datetime NOT NULL,
  `ext` varchar(100) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_unionkey` (`xid`,`branch_id`)
) ENGINE=InnoDB AUTO_INCREMENT=159 DEFAULT CHARSET=utf8;

Step 3: Create tables for example business


DROP TABLE IF EXISTS `storage_tbl`;
CREATE TABLE `storage_tbl` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `commodity_code` varchar(255) DEFAULT NULL,
  `count` int(11) DEFAULT 0,
  PRIMARY KEY (`id`),
  UNIQUE KEY (`commodity_code`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;


DROP TABLE IF EXISTS `order_tbl`;
CREATE TABLE `order_tbl` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `user_id` varchar(255) DEFAULT NULL,
  `commodity_code` varchar(255) DEFAULT NULL,
  `count` int(11) DEFAULT 0,
  `money` int(11) DEFAULT 0,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;


DROP TABLE IF EXISTS `account_tbl`;
CREATE TABLE `account_tbl` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `user_id` varchar(255) DEFAULT NULL,
  `money` int(11) DEFAULT 0,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Step 4: Start Seata-Server

  • Download server package, unzip it.
  • Start Seata-Server
sh seata-server.sh $LISTEN_PORT $PATH_FOR_PERSISTENT_DATA

e.g.

sh seata-server.sh 8091 /home/admin/seata/data/

Step 5: Run example

· 17 min read

Fescar has been released for a while, and distributed transactions have always been a highly focused area in the industry. Fescar received nearly 5000 stars within a month of its release, indicating its popularity. Of course, before Fescar, there were already relatively mature open-source distributed transaction solutions, such as the 2PC non-intrusive transaction of LCN, which has now evolved to version 5.0 and supports TXC transactions similar to Fescar's transaction model. Other implementations of TCC transactions include hmily and tcc-transaction. In the current era of microservice architecture, and given Alibaba's extensive background in open source, the release of Fescar has undoubtedly sparked a new wave of research into distributed transactions. Fescar originated from Alibaba Cloud's commercial distributed transaction service GTS, a model that has undergone rigorous testing in online environments. The TXC distributed transaction model of Fescar is similar to the traditional XA transaction model; the main difference is where the resource manager sits, at the application layer in one and at the database layer in the other. The author believes that Fescar's TXC model implementation is of significant research value, so today we will thoroughly explore the Fescar project's code. This article is lengthy and will take about 30-60 minutes to read and understand.

Project Address

Fescar: https://github.com/alibaba/fescar

The code discussed in this blog post is from the 0.1.2-SNAPSHOT version of Fescar. As Fescar evolves, the project structure and module implementations might change significantly.

Fescar's TXC Model

The above image is an official schematic of the TXC model created by Fescar. The quality of visuals produced by large companies is indeed impressive. From the schematic, we can see the overall implementation of TXC. TXC is implemented through three components, as depicted in the three dark yellow sections in the image, with the following roles:

  1. TM: Global Transaction Manager, which starts the Fescar distributed transaction on the server side and sends the global transaction to the TC (Transaction Coordinator) for management.
  2. TC: Transaction Coordinator, which controls the global transaction's commit or rollback. This component requires independent deployment and maintenance, currently only supporting a single-machine version. Future iterations plan to include a clustered version.
  3. RM: Resource Manager, mainly responsible for reporting branch transactions and managing local transactions.

A brief description of its implementation process: The initiating service starts a global transaction and registers it with the TC. When a cooperating service is called, its branch transaction completes the first phase by committing or rolling back its local transaction and generates an undo_log for transaction rollback, then registers the current cooperating service with the TC and reports its transaction status, merging it into the global transaction of the same business. If no issues arise, the flow proceeds to the next cooperating service call. If any branch transaction of a cooperating service rolls back, it notifies the TC, which then notifies all branch transactions of the global transaction that have completed the first phase to roll back. If all branch transactions proceed normally, the TC is notified when control returns to the global transaction initiator, and the TC notifies all branches of the global transaction to delete their rollback logs. To solve write isolation and read isolation issues during this process, global locks managed by the TC are involved.
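The role of the undo log in this flow can be illustrated with a toy model. The sketch below is not Fescar's code: Fescar derives its rollback data by parsing SQL and stores it in the `undo_log` table, whereas here the hypothetical class `UndoLogBranch` simply remembers the previous value of each key in an in-memory map that stands in for a database table.

```java
import java.util.AbstractMap;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;

// Hypothetical branch transaction over an in-memory "table" (key -> value).
class UndoLogBranch {
    private final Map<String, Integer> table;
    // Each entry remembers the value a key had before this branch changed it.
    private final Deque<Map.Entry<String, Integer>> undoLog = new ArrayDeque<>();

    UndoLogBranch(Map<String, Integer> table) {
        this.table = table;
    }

    // Phase one: record the before image, then apply the change locally.
    void update(String key, int newValue) {
        undoLog.push(new AbstractMap.SimpleEntry<>(key, table.get(key)));
        table.put(key, newValue);
    }

    // Global rollback: restore the before images in reverse order.
    void rollback() {
        while (!undoLog.isEmpty()) {
            Map.Entry<String, Integer> before = undoLog.pop();
            if (before.getValue() == null) {
                table.remove(before.getKey()); // the key did not exist before
            } else {
                table.put(before.getKey(), before.getValue());
            }
        }
    }

    // Global commit: the local change is already applied, so only drop the log.
    void commit() {
        undoLog.clear();
    }
}
```

The asymmetry is the point of the design: a global commit is cheap because the data was already written in phase one, while a global rollback has to replay the recorded before images.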

The goal of this blog post is to delve into the code details and explore how its basic ideas are implemented. We will first outline the role of each module from the project's structure, then investigate the entire distributed transaction implementation process using the official examples.

Project Structure Analysis

After pulling the project and opening it with an IDE, the directory structure is as follows. Let's take a look at the implementation of each module:

  • common: Common components, providing commonly used utility classes, static variables, extension mechanism class loaders, and defining global exceptions, etc.
  • config: Configuration loading and parsing module, providing basic interfaces for configuration. Currently, only file configuration is implemented, with plans for implementations of configuration centers like Nacos.
  • core: The core module mainly encapsulating RPC-related content for communication between TM, RM, and TC.
  • dubbo: The Dubbo module mainly adapts the Dubbo communication framework, using Dubbo's filter mechanism to pass global transaction information to branches.
  • examples: A simple example module that we will explore to understand the implementation.
  • rm-datasource: The resource management module, a core module that proxies some JDBC classes to parse SQL, generate rollback logs, and coordinate local transactions. Personally, I think naming this module "core" would be more appropriate.
  • server: The TC component, mainly coordinating and managing global transactions, responsible for committing or rolling back global transactions, and maintaining global locks.
  • spring: The module integrated with Spring, mainly consisting of AOP logic, serving as the entry point for the entire distributed transaction, and a good starting point for studying Fescar.
  • tm: The global transaction management module, managing the boundaries of global transactions, and controlling the initiation and rollback points of global transactions.

Viewing the Effects through the [examples] Module

First, start the TC (Server) module by running its main method directly. The default server port is 8091.

Second, go to the examples module and configure the configuration files for the order, business, account, and storage services, mainly the MySQL data source and Zookeeper connection address. Note that the default Dubbo Zookeeper registry dependency is missing, and starting it will throw a class not found exception. Add the following dependency:

<dependency>
    <groupId>com.101tec</groupId>
    <artifactId>zkclient</artifactId>
    <version>0.10</version>
    <exclusions>
        <exclusion>
            <artifactId>slf4j-log4j12</artifactId>
            <groupId>org.slf4j</groupId>
        </exclusion>
    </exclusions>
</dependency>

Third, place a breakpoint at the simulated exception in BusinessServiceImpl. Start OrderServiceImpl, StorageServiceImpl, AccountServiceImpl, and BusinessServiceImpl services one by one. After hitting the breakpoint, check the account_tbl table in the database; the amount has been reduced by 400 yuan, to 599 yuan. Then, release the breakpoint to trigger the simulated exception in the BusinessServiceImpl module. The global transaction rolls back, and the account_tbl table amount returns to 999 yuan.

As shown above, we have experienced the control capability of Fescar transactions. Next, let's look at how it controls transactions in detail.

Analysis of Fescar Transaction Process

First, Analyze the Configuration File

This is a golden rule: when integrating any technology or framework, the configuration file is the natural entry point. From the above example, we saw that the configuration file in the examples module declares an instance of a global transaction scanner, as follows:

<bean class="com.alibaba.fescar.spring.annotation.GlobalTransactionScanner">
    <constructor-arg value="dubbo-demo-app"/>
    <constructor-arg value="my_test_tx_group"/>
</bean>

This instance scans all beans when the project starts. The specific implementation can be found in the [spring] module: methods marked with the @GlobalTransactional annotation are woven into the logic of GlobalTransactionalInterceptor's invoke method. When the application starts, instances of TM (TmRpcClient) and RM (RmRpcClient) are also initialized, connecting the service with the TC (Transaction Coordinator). Going further involves the TransactionalTemplate class in the TM module.
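To make the interception idea concrete, here is a minimal sketch that uses a JDK dynamic proxy instead of Spring AOP. The annotation GlobalTx, the BusinessService interface, and the EVENTS list are all hypothetical stand-ins invented for this illustration; they are not Fescar classes, and the real interceptor delegates to TransactionalTemplate rather than recording strings.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

class ScannerSketch {
    // Hypothetical stand-in for @GlobalTransactional.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface GlobalTx {}

    // Hypothetical business interface; only purchase() is transactional.
    interface BusinessService {
        @GlobalTx
        String purchase(String userId);

        String ping();
    }

    static class BusinessServiceImpl implements BusinessService {
        public String purchase(String userId) { return "purchased:" + userId; }
        public String ping() { return "pong"; }
    }

    // Records what the interceptor did so the effect is observable.
    static final List<String> EVENTS = new ArrayList<>();

    // Wraps the target so that annotated methods run inside begin/commit/rollback.
    @SuppressWarnings("unchecked")
    static <T> T wrap(T target, Class<T> iface) {
        InvocationHandler handler = (proxy, method, args) -> {
            if (!method.isAnnotationPresent(GlobalTx.class)) {
                return invoke(method, target, args); // not annotated: pass through
            }
            EVENTS.add("begin");
            try {
                Object result = invoke(method, target, args);
                EVENTS.add("commit");
                return result;
            } catch (Throwable t) {
                EVENTS.add("rollback");
                throw t;
            }
        };
        return (T) Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, handler);
    }

    // Unwraps reflection's wrapper so callers see the original exception.
    private static Object invoke(Method m, Object target, Object[] args) throws Throwable {
        try {
            return m.invoke(target, args);
        } catch (InvocationTargetException e) {
            throw e.getCause();
        }
    }

    public static void main(String[] args) {
        BusinessService service = wrap(new BusinessServiceImpl(), BusinessService.class);
        System.out.println(service.purchase("U100") + " " + EVENTS);
    }
}
```

Only the annotated method is wrapped; ping() goes straight to the target, which mirrors how unannotated methods are left alone.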

[TM] Module Starts Global Transactions

The opening, committing, and rolling back of global transactions are encapsulated in the TransactionalTemplate. The code is as follows:


public Object execute(TransactionalExecutor business) throws TransactionalExecutor.ExecutionException {
    // 1. get or create a transaction
    GlobalTransaction tx = GlobalTransactionContext.getCurrentOrCreate();
    // 2. begin transaction
    try {
        tx.begin(business.timeout(), business.name());
    } catch (TransactionException txe) {
        throw new TransactionalExecutor.ExecutionException(tx, txe,
            TransactionalExecutor.Code.BeginFailure);
    }
    Object rs = null;
    try {
        // Do Your Business
        rs = business.execute();
    } catch (Throwable ex) {
        // 3. any business exception, rollback.
        try {
            tx.rollback();
            // 3.1 Successfully rolled back
            throw new TransactionalExecutor.ExecutionException(tx, TransactionalExecutor.Code.RollbackDone, ex);
        } catch (TransactionException txe) {
            // 3.2 Failed to rollback
            throw new TransactionalExecutor.ExecutionException(tx, txe,
                TransactionalExecutor.Code.RollbackFailure, ex);
        }
    }
    // 4. everything is fine, commit.
    try {
        tx.commit();
    } catch (TransactionException txe) {
        // 4.1 Failed to commit
        throw new TransactionalExecutor.ExecutionException(tx, txe,
            TransactionalExecutor.Code.CommitFailure);
    }
    return rs;
}

The more detailed implementation in the [TM] module is divided into two classes, as follows:

DefaultGlobalTransaction: Responsible for the specific actions of starting, committing, and rolling back global transactions.

DefaultTransactionManager: Responsible for using TmRpcClient to send commands to the TC control center, such as starting a global transaction (GlobalBeginRequest), committing (GlobalCommitRequest), rolling back (GlobalRollbackRequest), and querying status (GlobalStatusRequest).

The above are the core contents of the TM module. After the TM module completes the global transaction start, we then look at how the global transaction ID, xid, is passed and how the RM component intervenes.

Passing Global Transaction xid with [dubbo]

First is the transmission of the xid. Propagation is currently implemented for microservice architectures built on the Dubbo framework, and it would be easy to implement for others such as Spring Cloud and Motan. Using the filter mechanism that RPC frameworks generally provide, the xid is passed from the node that initiates the global transaction to the downstream service nodes. Each downstream node binds the received xid to its current thread context, which is later used to decide whether a branch transaction should join the global transaction when it executes SQL. Fescar's implementation can be seen in the [dubbo] module as follows:

@Activate(group = { Constants.PROVIDER, Constants.CONSUMER }, order = 100)
public class TransactionPropagationFilter implements Filter {

    private static final Logger LOGGER = LoggerFactory.getLogger(TransactionPropagationFilter.class);

    @Override
    public Result invoke(Invoker<?> invoker, Invocation invocation) throws RpcException {
        String xid = RootContext.getXID();
        String rpcXid = RpcContext.getContext().getAttachment(RootContext.KEY_XID);
        if (LOGGER.isDebugEnabled()) {
            LOGGER.debug("xid in RootContext[" + xid + "] xid in RpcContext[" + rpcXid + "]");
        }
        boolean bind = false;
        if (xid != null) {
            RpcContext.getContext().setAttachment(RootContext.KEY_XID, xid);
        } else {
            if (rpcXid != null) {
                RootContext.bind(rpcXid);
                bind = true;
                if (LOGGER.isDebugEnabled()) {
                    LOGGER.debug("bind[" + rpcXid + "] to RootContext");
                }
            }
        }
        try {
            return invoker.invoke(invocation);

        } finally {
            if (bind) {
                String unbindXid = RootContext.unbind();
                if (LOGGER.isDebugEnabled()) {
                    LOGGER.debug("unbind[" + unbindXid + "] from RootContext");
                }
                if (!rpcXid.equalsIgnoreCase(unbindXid)) {
                    LOGGER.warn("xid in change during RPC from " + rpcXid + " to " + unbindXid);
                    if (unbindXid != null) {
                        RootContext.bind(unbindXid);
                        LOGGER.warn("bind [" + unbindXid + "] back to RootContext");
                    }
                }
            }
        }
    }
}

When rpcXid is not null, it is added to the ContextCore of RootContext. Let's delve into this a bit. ContextCore is an extensible interface, and the default implementation is ThreadLocalContextCore, which maintains the current xid based on ThreadLocal. Fescar provides an extension mechanism implemented in the [common] module: a custom class loader, EnhancedServiceLoader, loads the service classes that need to be extended. Adding the @LoadLevel annotation with a high order attribute to an extension class lets a custom implementation replace the default one.
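The ThreadLocal-based context and the bind/unbind discipline of the filter can be sketched in a few lines. XidContextSketch and callWithXid are hypothetical names made up for this illustration; this is not Fescar's RootContext or ThreadLocalContextCore, only the same pattern in miniature.

```java
import java.util.function.Supplier;

class XidContextSketch {
    // One xid slot per thread, analogous in spirit to a ThreadLocal-backed context core.
    private static final ThreadLocal<String> XID = new ThreadLocal<>();

    static void bind(String xid) {
        XID.set(xid);
    }

    static String current() {
        return XID.get();
    }

    static String unbind() {
        String xid = XID.get();
        XID.remove();
        return xid;
    }

    // Provider-side pattern: bind the incoming xid, run the call, always unbind afterwards.
    static <T> T callWithXid(String incomingXid, Supplier<T> call) {
        boolean bound = false;
        if (current() == null && incomingXid != null) {
            bind(incomingXid);
            bound = true;
        }
        try {
            return call.get();
        } finally {
            if (bound) {
                unbind();
            }
        }
    }

    public static void main(String[] args) {
        String seen = callWithXid("192.168.7.77:8091:3958193", XidContextSketch::current);
        System.out.println("inside call: " + seen);
        System.out.println("after call: " + current());
    }
}
```

Unbinding in a finally block matters because RPC worker threads are usually pooled; a leaked xid would make an unrelated later request look like part of the old global transaction.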

Intervention of Local Resource Management in the [RM] Module

Fescar implements proxy classes for local transaction-related interfaces through a proxy mechanism, such as DataSourceProxy, ConnectionProxy, and StatementProxy. This can be seen in the configuration file, indicating that to use Fescar distributed transactions, the proxy data source provided by Fescar must be configured. For example:

After configuring the proxy data source, starting from DataSourceProxy, we can freely control all local operations on the database. From the xid transmission above, we know that the xid is saved in RootContext. Now, look at the following code to see it clearly:

First, look at a piece of code from StatementProxy:

Then, look at the code in ExecuteTemplate:

Similar to the transaction management template class TransactionalTemplate in the [TM] module, the crucial logic proxy here is encapsulated in the ExecuteTemplate template class. By overriding Statement with StatementProxy implementation, the execute logic of ExecuteTemplate is called when the original JDBC executeUpdate method is executed. Before the SQL is actually executed, it checks whether the current context in RootContext contains xid, i.e., whether it is a global distributed transaction. If not, the local transaction is used directly. If it is, RM adds some distributed transaction-related logic. Fescar has encapsulated five different executors to handle different types of SQL, namely UpdateExecutor, DeleteExecutor, InsertExecutor, SelectForUpdateExecutor, and PlainExecutor. The structure is as follows:
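As a rough illustration of the dispatch just described, the sketch below returns the name of the executor that would handle a statement. It is an assumption-laden simplification: ExecutorDispatchSketch, classify, and chooseExecutor are hypothetical, and the keyword matching stands in for real SQL parsing, which Fescar does with a proper SQL recognizer.

```java
import java.util.Locale;

class ExecutorDispatchSketch {
    enum Kind { INSERT, UPDATE, DELETE, SELECT_FOR_UPDATE, PLAIN }

    // Naive keyword-based classification; an assumption made for brevity.
    static Kind classify(String sql) {
        String s = sql.trim().toUpperCase(Locale.ROOT);
        if (s.startsWith("INSERT")) return Kind.INSERT;
        if (s.startsWith("UPDATE")) return Kind.UPDATE;
        if (s.startsWith("DELETE")) return Kind.DELETE;
        if (s.startsWith("SELECT") && s.endsWith("FOR UPDATE")) return Kind.SELECT_FOR_UPDATE;
        return Kind.PLAIN;
    }

    // With no xid in the context the statement runs as a plain local operation.
    static String chooseExecutor(String xid, String sql) {
        if (xid == null) {
            return "local";
        }
        switch (classify(sql)) {
            case INSERT: return "InsertExecutor";
            case UPDATE: return "UpdateExecutor";
            case DELETE: return "DeleteExecutor";
            case SELECT_FOR_UPDATE: return "SelectForUpdateExecutor";
            default: return "PlainExecutor";
        }
    }

    public static void main(String[] args) {
        String xid = "192.168.7.77:8091:3958193";
        System.out.println(chooseExecutor(null, "update storage_tbl set count = 98 where id = 10"));
        System.out.println(chooseExecutor(xid, "update storage_tbl set count = 98 where id = 10"));
        System.out.println(chooseExecutor(xid, "select count from storage_tbl where id = 10"));
    }
}
```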

PlainExecutor:

The native JDBC interface implementation, without any processing, is used for ordinary select queries in global transactions.

UpdateExecutor, DeleteExecutor, InsertExecutor:

The three DML (Data Manipulation Language) executors for update, delete, and insert analyze the SQL statement before and after execution and implement the following two abstract methods:

protected abstract TableRecords beforeImage() throws SQLException;

protected abstract TableRecords afterImage(TableRecords beforeImage) throws SQLException;

During this process, the undo_log for rollback operation is generated by analyzing the SQL, and the log is currently saved in MySQL, sharing the same transaction with the business SQL operation. The table structure is as follows:

The rollback_info column contains the detailed information of the undo_log, which is of type longblob. The structure is as follows:

{
  "branchId":3958194,
  "sqlUndoLogs":[
    {
      "afterImage":{
        "rows":[
          {
            "fields":[
              {
                "keyType":"PrimaryKey",
                "name":"ID",
                "type":4,
                "value":10
              },
              {
                "keyType":"NULL",
                "name":"COUNT",
                "type":4,
                "value":98
              }
            ]
          }
        ],
        "tableName":"storage_tbl"
      },
      "beforeImage":{
        "rows":[
          {
            "fields":[
              {
                "keyType":"PrimaryKey",
                "name":"ID",
                "type":4,
                "value":10
              },
              {
                "keyType":"NULL",
                "name":"COUNT",
                "type":4,
                "value":100
              }
            ]
          }
        ],
        "tableName":"storage_tbl"
      },
      "sqlType":"UPDATE",
      "tableName":"storage_tbl"
    }
  ],
  "xid":"192.168.7.77:8091:3958193"
}


Here is an example of an update operation. The undo_log records very detailed information. It associates the branch ID with the global transaction xid, records the table name, the operation field names, and the records before and after the SQL execution. For instance, this record shows table name = storage_tbl, before SQL execution ID = 10, count = 100, after SQL execution ID = 10, count = 98. If the entire global transaction fails and needs to be rolled back, it can generate the following rollback SQL statement:

update storage_tbl set count = 100 where id = 10;
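How a before image turns into such a statement can be sketched as follows. UndoSqlSketch and buildUndoUpdate are hypothetical names, the row is modelled as a plain map rather than Fescar's TableRecords, and values are inlined into the SQL text only for readability; production code should bind them as PreparedStatement parameters.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class UndoSqlSketch {
    // Builds a compensating UPDATE that restores one row to its before image.
    static String buildUndoUpdate(String table, String pkName, Map<String, Object> beforeRow) {
        StringJoiner sets = new StringJoiner(", ");
        for (Map.Entry<String, Object> field : beforeRow.entrySet()) {
            // Every non-key column is reset; the primary key goes into the WHERE clause.
            if (!field.getKey().equalsIgnoreCase(pkName)) {
                sets.add(field.getKey() + " = " + field.getValue());
            }
        }
        return "UPDATE " + table + " SET " + sets + " WHERE " + pkName + " = " + beforeRow.get(pkName);
    }

    public static void main(String[] args) {
        // Values taken from the beforeImage of the undo_log record shown above.
        Map<String, Object> before = new LinkedHashMap<>();
        before.put("ID", 10);
        before.put("COUNT", 100);
        System.out.println(buildUndoUpdate("storage_tbl", "ID", before));
    }
}
```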

SelectForUpdateExecutor:

In Fescar's AT mode, the default global isolation level, which sits above the local transaction, is read uncommitted. Through SelectForUpdateExecutor, however, read committed can be supported. The code is as follows:

@Override
public Object doExecute(Object... args) throws Throwable {
    SQLSelectRecognizer recognizer = (SQLSelectRecognizer) sqlRecognizer;

    Connection conn = statementProxy.getConnection();
    ResultSet rs = null;
    Savepoint sp = null;
    LockRetryController lockRetryController = new LockRetryController();
    boolean originalAutoCommit = conn.getAutoCommit();

    StringBuffer selectSQLAppender = new StringBuffer("SELECT ");
    selectSQLAppender.append(getTableMeta().getPkName());
    selectSQLAppender.append(" FROM " + getTableMeta().getTableName());
    String whereCondition = null;
    ArrayList<Object> paramAppender = new ArrayList<>();
    if (statementProxy instanceof ParametersHolder) {
        whereCondition = recognizer.getWhereCondition((ParametersHolder) statementProxy, paramAppender);
    } else {
        whereCondition = recognizer.getWhereCondition();
    }
    if (!StringUtils.isEmpty(whereCondition)) {
        selectSQLAppender.append(" WHERE " + whereCondition);
    }
    selectSQLAppender.append(" FOR UPDATE");
    String selectPKSQL = selectSQLAppender.toString();

    try {
        if (originalAutoCommit) {
            conn.setAutoCommit(false);
        }
        sp = conn.setSavepoint();
        rs = statementCallback.execute(statementProxy.getTargetStatement(), args);

        while (true) {
            // Try to get global lock of those rows selected
            Statement stPK = null;
            PreparedStatement pstPK = null;
            ResultSet rsPK = null;
            try {
                if (paramAppender.isEmpty()) {
                    stPK = statementProxy.getConnection().createStatement();
                    rsPK = stPK.executeQuery(selectPKSQL);
                } else {
                    pstPK = statementProxy.getConnection().prepareStatement(selectPKSQL);
                    for (int i = 0; i < paramAppender.size(); i++) {
                        pstPK.setObject(i + 1, paramAppender.get(i));
                    }
                    rsPK = pstPK.executeQuery();
                }

                TableRecords selectPKRows = TableRecords.buildRecords(getTableMeta(), rsPK);
                statementProxy.getConnectionProxy().checkLock(selectPKRows);
                break;

            } catch (LockConflictException lce) {
                conn.rollback(sp);
                lockRetryController.sleep(lce);

            } finally {
                if (rsPK != null) {
                    rsPK.close();
                }
                if (stPK != null) {
                    stPK.close();
                }
                if (pstPK != null) {
                    pstPK.close();
                }
            }
        }

    } finally {
        if (sp != null) {
            conn.releaseSavepoint(sp);
        }
        if (originalAutoCommit) {
            conn.setAutoCommit(true);
        }
    }
    return rs;
}

The key code is as follows:

TableRecords selectPKRows = TableRecords.buildRecords(getTableMeta(), rsPK);
statementProxy.getConnectionProxy().checkLock(selectPKRows);

The lockKeys are derived from the selectPKRows table records, and the TC is then asked whether those keys are globally locked. If they are, the query is retried until the lock is released, and only then is the query result returned.
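The check-and-retry loop can be reduced to the following sketch. LockRetrySketch and waitForLock are hypothetical names, the lock check is an arbitrary BooleanSupplier rather than a call to the TC, and the bounded number of attempts and fixed sleep interval are assumptions of this sketch rather than values taken from Fescar.

```java
import java.util.function.BooleanSupplier;

class LockRetrySketch {
    // Polls lockFree until it reports true; returns the attempt number that succeeded.
    static int waitForLock(BooleanSupplier lockFree, int maxAttempts, long sleepMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (lockFree.getAsBoolean()) {
                return attempt;
            }
            try {
                Thread.sleep(sleepMillis); // back off before asking again
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for the global lock", e);
            }
        }
        throw new IllegalStateException("global lock still held after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) {
        int[] checks = {0};
        // Simulated lock that becomes free on the third check.
        int attempts = waitForLock(() -> ++checks[0] >= 3, 5, 10L);
        System.out.println("lock obtained on attempt " + attempts);
    }
}
```

Bounding the retries keeps a reader from waiting forever on a global transaction that never finishes; the price is that the caller must handle the timeout.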

Registration and Reporting of Branch Transactions

Before the local transaction is committed, Fescar registers and reports information related to the branch transaction. This can be seen in the commit part of the ConnectionProxy class code:

@Override
public void commit() throws SQLException {
    if (context.inGlobalTransaction()) {
        try {
            register();
        } catch (TransactionException e) {
            recognizeLockKeyConflictException(e);
        }

        try {
            if (context.hasUndoLog()) {
                UndoLogManager.flushUndoLogs(this);
            }
            targetConnection.commit();
        } catch (Throwable ex) {
            report(false);
            if (ex instanceof SQLException) {
                throw (SQLException) ex;
            } else {
                throw new SQLException(ex);
            }
        }
        report(true);
        context.reset();

    } else {
        targetConnection.commit();
    }
}

From this code, we can see that it first checks whether it is a global transaction. If it is not, it commits directly. If it is, it first registers the branch transaction with the TC controller. For write isolation, it involves obtaining global locks on the TC side. Then, it saves the undo_log used for rollback operations and finally commits the local transaction. Lastly, it reports the transaction status to the TC controller. At this point, the first phase of the local transaction is complete.
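The write isolation provided by global locks can be pictured as a table on the TC side that maps row keys to the xid holding them. The sketch below is a deliberately small, hypothetical model (GlobalLockSketch, acquire, and releaseAll are invented names, and the "table:pk" key format is an assumption); it only shows why a second global transaction cannot register a branch that touches the same row.

```java
import java.util.HashMap;
import java.util.Map;

class GlobalLockSketch {
    // Row key (here "table:pk") mapped to the xid of the global transaction holding it.
    private final Map<String, String> owners = new HashMap<>();

    // All-or-nothing: fails without side effects if any key belongs to another xid.
    synchronized boolean acquire(String xid, String... rowKeys) {
        for (String key : rowKeys) {
            String owner = owners.get(key);
            if (owner != null && !owner.equals(xid)) {
                return false;
            }
        }
        for (String key : rowKeys) {
            owners.put(key, xid);
        }
        return true;
    }

    // Called once the global transaction has finished its second phase.
    synchronized void releaseAll(String xid) {
        owners.values().removeIf(xid::equals);
    }

    public static void main(String[] args) {
        GlobalLockSketch locks = new GlobalLockSketch();
        System.out.println(locks.acquire("tx1", "storage_tbl:10"));
        System.out.println(locks.acquire("tx2", "storage_tbl:10"));
        locks.releaseAll("tx1");
        System.out.println(locks.acquire("tx2", "storage_tbl:10"));
    }
}
```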

Coordinating the Global Transaction with the [server] Module

For the server module, we can focus on the DefaultCoordinator class. This is the default implementation of the AbstractTCInboundHandler controller handler. It mainly implements interfaces for starting, committing, rolling back, querying the status of global transactions, registering branch transactions, reporting, and checking locks, such as:

Returning to the TransactionalTemplate at the beginning, if the entire distributed transaction fails and needs to be rolled back, TM first initiates a rollback instruction to TC. After TC receives it and parses the request, it is routed to the doGlobalRollback method of the default controller class. The code executed on the TC controller side is as follows:

@Override
public void doGlobalRollback(GlobalSession globalSession, boolean retrying) throws TransactionException {
    for (BranchSession branchSession : globalSession.getReverseSortedBranches()) {
        BranchStatus currentBranchStatus = branchSession.getStatus();
        if (currentBranchStatus == BranchStatus.PhaseOne_Failed) {
            continue;
        }
        try {
            BranchStatus branchStatus = resourceManagerInbound.branchRollback(XID.generateXID(branchSession.getTransactionId()), branchSession.getBranchId(),
                branchSession.getResourceId(), branchSession.getApplicationData());

            switch (branchStatus) {
                case PhaseTwo_Rollbacked:
                    globalSession.removeBranch(branchSession);
                    LOGGER.error("Successfully rolled back branch " + branchSession);
                    continue;
                case PhaseTwo_RollbackFailed_Unretryable:
                    GlobalStatus currentStatus = globalSession.getStatus();
                    if (currentStatus.name().startsWith("Timeout")) {
                        globalSession.changeStatus(GlobalStatus.TimeoutRollbackFailed);
                    } else {
                        globalSession.changeStatus(GlobalStatus.RollbackFailed);
                    }
                    globalSession.end();
                    LOGGER.error("Failed to rollback global[" + globalSession.getTransactionId() + "] since branch[" + branchSession.getBranchId() + "] rollback failed");
                    return;
                default:
                    LOGGER.info("Failed to rollback branch " + branchSession);
                    if (!retrying) {
                        queueToRetryRollback(globalSession);
                    }
                    return;

            }
        } catch (Exception ex) {
            LOGGER.info("Exception rollbacking branch " + branchSession, ex);
            if (!retrying) {
                queueToRetryRollback(globalSession);
                if (ex instanceof TransactionException) {
                    throw (TransactionException) ex;
                } else {
                    throw new TransactionException(ex);
                }
            }

        }

    }
    GlobalStatus currentStatus = globalSession.getStatus();
    if (currentStatus.name().startsWith("Timeout")) {
        globalSession.changeStatus(GlobalStatus.TimeoutRollbacked);
    } else {
        globalSession.changeStatus(GlobalStatus.Rollbacked);
    }
    globalSession.end();
}

As seen from the above code, during rollback, each branch transaction is iterated from the global transaction session, and then each branch transaction is notified to rollback. When the branch service receives the request, it is first routed to the doBranchRollback method in RMHandlerAT, and then the branchRollback method in RM is called. The code is as follows:

@Override
public BranchStatus branchRollback(String xid, long branchId, String resourceId, String applicationData) throws TransactionException {
    DataSourceProxy dataSourceProxy = get(resourceId);
    if (dataSourceProxy == null) {
        throw new ShouldNeverHappenException();
    }
    try {
        UndoLogManager.undo(dataSourceProxy, xid, branchId);
    } catch (TransactionException te) {
        if (te.getCode() == TransactionExceptionCode.BranchRollbackFailed_Unretriable) {
            return BranchStatus.PhaseTwo_RollbackFailed_Unretryable;
        } else {
            return BranchStatus.PhaseTwo_RollbackFailed_Retryable;
        }
    }
    return BranchStatus.PhaseTwo_Rollbacked;
}

On the RM branch transaction side, the UndoLogManager's undo method is ultimately executed. It retrieves the rollback log from the database using xid and branchId to complete the data rollback operation. The entire process is completed synchronously. If the global transaction is successful, TC will have a similar coordination process as mentioned above, but it will be asynchronous, clearing the undo_log related to the global transaction. At this point, the two-phase commit or rollback is completed, thus achieving complete control of the global transaction.
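The coordination logic of the rollback path, stripped of sessions and RPC, looks roughly like the sketch below. RollbackOrderSketch and its enums are hypothetical simplifications of the BranchStatus and GlobalStatus handling in doGlobalRollback; the resource manager is reduced to a function from branch ID to outcome.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class RollbackOrderSketch {
    enum Outcome { ROLLBACKED, FAILED_RETRYABLE, FAILED_UNRETRYABLE }

    enum Result { ROLLBACKED, RETRY_QUEUED, ROLLBACK_FAILED }

    // Rolls back branches newest-first and stops at the first failure.
    static Result rollback(List<Long> branchIds, Function<Long, Outcome> rm, List<Long> done) {
        for (int i = branchIds.size() - 1; i >= 0; i--) {
            Long branchId = branchIds.get(i);
            Outcome outcome = rm.apply(branchId);
            if (outcome == Outcome.FAILED_UNRETRYABLE) {
                return Result.ROLLBACK_FAILED; // give up: manual intervention needed
            }
            if (outcome == Outcome.FAILED_RETRYABLE) {
                return Result.RETRY_QUEUED;    // a background task would retry later
            }
            done.add(branchId);                // branch finished, drop it from the session
        }
        return Result.ROLLBACKED;
    }

    public static void main(String[] args) {
        List<Long> done = new ArrayList<>();
        Result result = rollback(Arrays.asList(1L, 2L, 3L), id -> Outcome.ROLLBACKED, done);
        System.out.println(result + " in order " + done);
    }
}
```

Undoing the most recently registered branch first means later writes are reverted before the earlier ones they may depend on.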

Conclusion

If you've made it this far, thank you very much for patiently taking the time to learn amidst your busy schedule. I believe the time spent was worthwhile. By thoroughly reading and understanding this article, you likely have a comprehensive understanding of the main flow of Fescar's implementation. This article took approximately one person-day from conception to completion. During this process, I also gained a deeper understanding of Fescar's implementation. Due to space constraints, I did not delve into every detail of the implementation, such as how SQL parsing is done, but instead focused on the key points of the implementation process of Fescar's TXC model. This article has been proofread, but due to personal knowledge limitations and finite energy, errors or misunderstandings may inevitably occur. Corrections are welcome.

About the Author:

Chen Kailing joined Kaijing Technology in May 2016. Formerly a senior developer and project manager, currently the head of the Architecture & Operations Department of Kaijing Technology's R&D Center. PMP project management certification, Alibaba Cloud MVP. Enthusiastic about open source, having open-sourced several popular projects. Passionate about sharing technical insights, and author of the independent blog KL Blog (http://www.kailing.pub).

· 2 min read

Recall the description in the overview: a distributed global transaction as a whole follows the two-phase commit model. A global transaction consists of several branch transactions, each of which must meet the requirements of the two-phase commit model by having its own:

  • Phase-one prepare behavior
  • Phase-two commit or rollback behavior

Overview of a global transaction

According to how the two phases behave, we divide branch transactions into Automatic (Branch) Transaction Mode (AT) and Manual (Branch) Transaction Mode (MT).

The AT mode is based on a relational database that supports local ACID transactions:

  • Phase-one prepare behavior: in a local transaction, the business data update and the corresponding rollback log record are committed together.
  • Phase-two commit behavior: finishes successfully right away; the rollback logs are cleaned up automatically in asynchronous batches.
  • Phase-two rollback behavior: a compensation operation is generated automatically from the rollback log to complete the data rollback.

Accordingly, the MT mode does not rely on transaction support for the underlying data resources:

  • Phase-one prepare behavior: call the custom prepare logic.
  • Phase-two commit behavior: call the custom commit logic.
  • Phase-two rollback behavior: call the custom rollback logic.

In other words, MT mode brings custom branch transactions under the management of global transactions.
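A minimal sketch of what such a custom branch might look like is given below. The ManualBranch interface, the Account class, and the run method are hypothetical and do not correspond to a Fescar API; the coordinator is reduced to a local loop, and only branches whose prepare succeeded are rolled back, which is a simplification.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MtModeSketch {
    // Hypothetical contract for a manually implemented branch.
    interface ManualBranch {
        boolean prepare();
        void commit();
        void rollback();
    }

    // Example branch: prepare freezes funds, commit deducts them, rollback unfreezes them.
    static class Account implements ManualBranch {
        int balance;
        int frozen;
        final int amount;

        Account(int balance, int amount) {
            this.balance = balance;
            this.amount = amount;
        }

        public boolean prepare() {
            if (balance - frozen < amount) {
                return false;
            }
            frozen += amount;
            return true;
        }

        public void commit() {
            balance -= amount;
            frozen -= amount;
        }

        public void rollback() {
            frozen -= amount;
        }
    }

    // Phase one prepares every branch; phase two commits all or rolls back those prepared.
    static boolean run(List<? extends ManualBranch> branches) {
        List<ManualBranch> prepared = new ArrayList<>();
        for (ManualBranch branch : branches) {
            if (!branch.prepare()) {
                for (int i = prepared.size() - 1; i >= 0; i--) {
                    prepared.get(i).rollback();
                }
                return false;
            }
            prepared.add(branch);
        }
        for (ManualBranch branch : prepared) {
            branch.commit();
        }
        return true;
    }

    public static void main(String[] args) {
        Account rich = new Account(999, 400);
        Account poor = new Account(100, 400);
        System.out.println(run(Arrays.asList(rich, poor)) + ", balance " + rich.balance);
        System.out.println(run(Arrays.asList(rich)) + ", balance " + rich.balance);
    }
}
```

Unlike AT mode, nothing here is derived from SQL: the author of each branch decides what "prepared" means and how to undo it.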