Tuesday, September 11, 2018

Live upgrade with data transfer


Figure 1. Data and software upgrades


A. Data upgrade process
As shown in Figure 1, both the executive server and the mirror server receive live data during a software upgrade, while data is transferred from the executive server to the mirror side in parallel.
The process can be described as follows:
1. Disconnect the mirror side from live traffic
2. Load new software or data on the mirror server
3. Start the upgrade: timestamped data is transferred from the executive server to the mirror side while live data arrives at the mirror server at the same time.
- The executive server keeps offering services to users during the data upgrade
- Live-traffic data arriving at the mirror server has higher priority than the timestamped data updates coming from the executive server, i.e. if a data collision occurs, the live data is what gets written to the mirror database.
The data update proceeds in an orderly fashion on the executive server, while live data arrives from users randomly and unpredictably.
After the data transfer from the executive server to the mirror server is complete, operators run a command (or the system does so automatically) to switch the mirror server to the executive role. Software and data are then transferred from the new executive server to the new mirror server. Both sides operate normally once software and data are completely upgraded.
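
The collision rule in step 3 can be sketched roughly as follows. This is only an illustration, not the actual implementation of any product: the mirror database is modelled as a plain dictionary, and the names Record, apply_live_write and apply_transfer are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Record:
    value: str
    timestamp: float
    from_live: bool = False  # True if live traffic wrote this record on the mirror


def apply_live_write(mirror, key, value, timestamp):
    """Live traffic has priority: it always overwrites the mirror record."""
    mirror[key] = Record(value, timestamp, from_live=True)


def apply_transfer(mirror, key, value, timestamp):
    """Apply one timestamped record sent by the executive server.

    Returns True if the record was written, False if it was skipped
    because live traffic had already written the same key (a collision).
    """
    existing = mirror.get(key)
    if existing is not None and existing.from_live:
        return False  # collision: keep the live data
    mirror[key] = Record(value, timestamp, from_live=False)
    return True
```

With this rule, a transferred record never replaces something a user has just written on the mirror, while a later live write can still replace a transferred record.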

B. Exception

There is a case in which live-traffic data on the mirror server is incomplete or incorrect, because it is dynamic data.

For example, a user is making a call, and that call cannot be set up properly on the upgraded mirror server because there is no physical connection to the phones. However, the data was written properly on the executive side. If those data fields had already been transferred to the mirror server during the timestamped data update, the mirror server has missed some live data.

To save some of this dynamic data, the software upgrade process could record the data fields that have been newly created or written on the executive server but have not yet been written on the mirror server. The executive server then transfers that data to the mirror server. This is the second pass of data transfer.

Because live traffic never stops, the upgrade process must stop after the second or third pass. We must accept some data loss.
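
A minimal sketch of these bounded extra passes, under my own assumptions: both databases are dictionaries, the executive server marks every key it writes during the upgrade, and the names DirtyTracker, catch_up and during_pass are hypothetical. The during_pass callback only exists to simulate live writes that arrive while a pass is running.

```python
class DirtyTracker:
    """Remembers keys written on the executive server but not yet on the mirror."""

    def __init__(self):
        self._dirty = set()

    def mark(self, key):
        self._dirty.add(key)

    def drain(self):
        """Return the current dirty keys and start a fresh set."""
        keys, self._dirty = self._dirty, set()
        return keys

    def pending(self):
        return set(self._dirty)


def catch_up(executive, mirror, tracker, max_passes=2, during_pass=None):
    """Run at most max_passes extra transfer passes.

    Returns the keys that are still not on the mirror when we stop,
    i.e. the data loss we decide to accept.
    """
    for pass_no in range(1, max_passes + 1):
        keys = tracker.drain()
        if not keys:
            break  # nothing left to transfer
        for key in keys:
            mirror[key] = executive[key]
        if during_pass is not None:
            during_pass(pass_no)  # simulated live writes during this pass
    return tracker.pending()
```

Because new writes keep arriving during every pass, the returned set is rarely empty on a busy system, which is why the number of passes has to be capped.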

C. Additional pass of data transfer

The operating system could run another screening loop over the data to be transferred between the executive server and the mirror server. If the data on the executive server is newer according to its timestamp, it is copied to the mirror server again.

However, this is not a bulletproof method, because data keeps being updated and the screening loop and data copy take some time.

I have heard that IBM and Ericsson came up with a better strategy, in which the operating system transfers the missing data in parallel. I think this should be a better method.
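
The screening loop described above could look something like the sketch below. It assumes each database maps a key to a (value, timestamp) pair; the function name screen_and_copy is made up for illustration.

```python
def screen_and_copy(executive, mirror):
    """Copy every executive record that is newer than the mirror's copy.

    Both arguments map key -> (value, timestamp).
    Returns the list of keys that were copied.
    """
    copied = []
    # Iterate over a snapshot so concurrent-looking updates to the
    # executive dictionary do not break the loop.
    for key, (value, ts) in list(executive.items()):
        mirror_entry = mirror.get(key)
        if mirror_entry is None or ts > mirror_entry[1]:
            mirror[key] = (value, ts)
            copied.append(key)
    return copied
```

The snapshot taken at the start of the loop is exactly where the weakness lies: anything written on the executive server after the snapshot is not seen until the next loop.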
