Multinode Setup for High Availability
The TeamCity server can be configured to use multiple nodes (or servers) for high availability and flexible load distribution. You can set up a cluster of TeamCity nodes where each node is responsible for different tasks, such as processing data from builds or collecting changes from VCS repositories. Alternatively, you can keep one main node that does all the work and a secondary node that provides a read-only interface. If the main node goes down, all data processing can be switched to the secondary node with minimal downtime.
As the main use case of a multinode setup is to achieve high availability (HA), this article will focus on configuring an HA cluster. However, you can use the same methods for any setup with multiple TeamCity nodes: for example, to distribute load between several machines.
Main vs. Secondary Node
A TeamCity cluster can have one main node and multiple secondary nodes. The main node is the "preferred" one. By default, it receives all incoming HTTP requests. It also performs critical background tasks, such as starting builds. A secondary node mostly serves as a backup server, necessary for failover. For better load distribution and performance optimization, it can also be granted additional responsibilities.
High-Availability Setup
Prerequisites
A basic HA setup must include the following components:
Dedicated database server: an external database server from the list of supported databases.
Dedicated server for the TeamCity data directory: the data directory should be shared with the nodes over the network, via NFS or SMB.
At least two servers for TeamCity nodes with the mounted shared data directory: both servers should have the same or comparable hardware. Otherwise, if a secondary node is less performant, you can experience a significant performance drop in case of a failover.
Local storage on the TeamCity nodes: necessary for the TeamCity installation and storing logs and local caches. To estimate the size of caches, see the size of the <TeamCity data directory>/system/caches directory in your current installation.
Dedicated reverse HTTP proxy server.
The minimum number of server machines necessary for a high-availability setup is 5: a database, a network storage server, two TeamCity nodes, and one server for the reverse HTTP proxy. A simpler load-balancing solution might be achieved with fewer machines.
Shared Data Directory
The main TeamCity node and secondary nodes must be able to access and share the same TeamCity Data Directory.
Here are the main recommendations for setting up the shared Data Directory:
For a high-availability setup, store it on a separate and well-performing machine, so it is accessible even when the main node goes down.
All TeamCity node machines must be able to access it in read/write mode.
The typical Data Directory mounting options are SMB and NFS. TeamCity uses the Data Directory as a regular file system, so all basic file system operations must be supported.
The storage or mounting option should not limit the number of I/O operations or the I/O volume.
Make sure to review performance guidelines for your storage solution. For example, increasing MTU for the network connection between the server and the storage usually increases the artifact transfer speed.
Disable Network Client Caches on Data Directory Mounts
It is important that all the nodes "see" the current state of the shared Data Directory without delay. If this is not the case, it is likely to result in unstable behavior and frequent build log corruptions.
If TeamCity nodes run on Windows with Data Directory shared via SMB protocol, make sure all the registry keys mentioned in this article are set to 0 on all the TeamCity nodes.
If the Data Directory is shared via NFS, make sure all nodes have the following option in their mount settings: lookupcache=positive.
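To verify the NFS setting on each node, you can check the options field of the corresponding /etc/fstab entry. The following is a minimal sketch, not an official tool: the has_positive_lookupcache helper, the host name, and the paths are hypothetical placeholders. It accepts both spellings of the mode documented in nfs(5), "pos" and "positive".

```shell
# Hypothetical /etc/fstab entry for the shared Data Directory
# (host name and paths are placeholders, not values from this article).
fstab_line="nfs-server:/export/teamcity-data /mnt/teamcity-data nfs rw,hard,lookupcache=positive 0 0"

# Succeeds if a comma-separated NFS option list enables positive-only
# lookup caching. nfs(5) accepts "pos" and "positive" for this mode.
has_positive_lookupcache() {
    case ",$1," in
        *,lookupcache=positive,*|*,lookupcache=pos,*) return 0 ;;
        *) return 1 ;;
    esac
}

# The mount options are the fourth whitespace-separated field of an fstab entry.
opts=$(printf '%s\n' "$fstab_line" | awk '{print $4}')

if has_positive_lookupcache "$opts"; then
    echo "OK: lookupcache=positive is configured"
else
    echo "WARNING: lookupcache=positive is missing"
fi
```

The check inspects the configured options only; remount the share after changing them so that the new setting takes effect.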
Configuring HA Setup
This section relies on the following assumptions:
All the system requirements are met.
TeamCity is installed on both nodes.
TeamCity Data Directory and database are already initialized.
Now, we can proceed with the transition from a single server setup to a high-availability cluster setup.
Used terms:
TeamCity Data Directory: a path to a mount point of the shared data directory.
TeamCity Home Directory: a path to the TeamCity server installation directory.
Node-specific Data Directory: a path to each node's local storage where the node keeps its specific configuration and caches.
<node_ID>: a unique identifier of a TeamCity node, which can be used in the UI and configuration files.
<node_root_URL>: a direct URL of a node, usually http://<node_hostname>. This URL is required for inter-node communications; a firewall should be configured to allow connections to this URL from one node to another.
To configure a TeamCity cluster consisting of two nodes, follow these steps:
Check that the version of the TeamCity server installed on both nodes is the same and corresponds to the version of the Data Directory.
Select an ID for each of the nodes: for example, a short ID based on a hostname.
Create the TEAMCITY_SERVER_OPTS environment variable on each node. This variable should have the following arguments:
-Dteamcity.server.nodeId=<node_ID> -Dteamcity.server.rootURL=<node_root_URL> -Dteamcity.data.path=<TeamCity Data Directory> -Dteamcity.node.data.path=<Node-specific Data Directory>
Start both nodes using regular TeamCity scripts or via the TeamCity service.
Open the TeamCity Administration | Nodes Configuration page on either of the two servers and enable the "Main TeamCity node" responsibility for the node you want to make the main one.
Proceed with configuring the reverse HTTP proxy.
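The environment variable from the steps above could be set on a Linux node roughly as follows. This is a sketch under stated assumptions: the node ID, URL, and both paths are made-up placeholders, and only the four -D arguments come from this article.

```shell
# Placeholder values: replace with your own node ID, URL, and paths.
NODE_ID="node1"
NODE_ROOT_URL="http://node1.example.com"
DATA_DIR="/mnt/teamcity-data"           # mount point of the shared Data Directory
NODE_DATA_DIR="/var/lib/teamcity-node"  # local, node-specific storage

# Compose the arguments listed in the configuration steps.
TEAMCITY_SERVER_OPTS="-Dteamcity.server.nodeId=${NODE_ID} -Dteamcity.server.rootURL=${NODE_ROOT_URL} -Dteamcity.data.path=${DATA_DIR} -Dteamcity.node.data.path=${NODE_DATA_DIR}"
export TEAMCITY_SERVER_OPTS

# Print the result so it can be reviewed before the node is started.
echo "$TEAMCITY_SERVER_OPTS"
```

The variable has to be visible to the process that starts the server, so for a service it would typically go into the service environment rather than an interactive shell. Each node needs its own NODE_ID and NODE_ROOT_URL, while DATA_DIR points at the same shared directory on every node.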
Proxy Configuration
The reverse HTTP proxy serves as a single endpoint for TeamCity users and for build agents. This is also a good place to configure HTTPS connection settings for the entire cluster.
When used with TeamCity, a proxy server also acts as a load balancer for incoming requests. It can determine where the request should be sent and route it to the corresponding upstream server. In this article, we provide examples of the proxy configuration for the most popular proxy servers: NGINX, NGINX Plus and HAProxy.
Choosing the Right Proxy Server
TeamCity can work with different types of proxy servers. However, HAProxy and NGINX Plus are preferable since these servers support active health checks and sticky sessions. These features are essential for the round-robin of user requests among different nodes (supported by TeamCity 2023.05+).
In comparison, the regular NGINX proxy with standard modules lacks these features and cannot be used to configure the round-robin.
Additionally, the regular NGINX requires you to explicitly distinguish main and secondary nodes in the configuration file (the main node should be first in the list). This requirement forces you to manually update node roles when a main node transfers its responsibilities to a secondary node (for example, in case of a failover). HAProxy and NGINX Plus do not require manual configuration updates in this scenario.
Finally, configuration files of HAProxy and NGINX Plus proxy servers are easier to maintain, especially when adding new nodes to the cluster.
Matching Proxy Version with Server
Sample proxy configurations above set a special X-TeamCity-Proxy header. This header notifies TeamCity that a request comes through a properly configured proxy.
The X-TeamCity-Proxy header also defines a version of the proxy config. If this version is incompatible with the current TeamCity version, TeamCity shows the "Proxy configuration version mismatch" health report. Check your proxy settings and ensure they are similar to the configuration illustrated in the Proxy Configuration section.
Round-Robin
Starting with version 2023.05, the main TeamCity node and every secondary node with the Handling UI actions and load balancing user requests responsibility participate in a round-robin.
The proxy randomly routes the first browser request to any of the participating nodes. Subsequent HTTP requests are sent to the same node (the "sticky session"). This behavior distributes the load produced by user requests among different nodes, potentially allowing TeamCity to handle more simultaneous HTTP requests. In addition, round-robin improves user experience in case of a failover or a planned node restart. For instance, if a single node has to be restarted, only users assigned to this node are affected. As soon as the proxy server detects that the node is no longer available, it distributes these users among the other nodes.
If a request must be handled by one specific node rather than randomly assigned to any node, use the node selector in the TeamCity UI footer or add the __nodeId=<id of a node> request parameter to the request query string.
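As an illustration of the query string form, the sketch below appends the parameter to a URL. The with_node_id helper, the host name, the page paths, and the node ID are all hypothetical; only the __nodeId parameter name comes from this article.

```shell
# Appends the __nodeId parameter to a URL, using "?" or "&" depending on
# whether the URL already contains a query string.
# Simplification: URLs with a "#fragment" part are not handled.
with_node_id() {
    url=$1
    node_id=$2
    case "$url" in
        *\?*) printf '%s&__nodeId=%s\n' "$url" "$node_id" ;;
        *)    printf '%s?__nodeId=%s\n' "$url" "$node_id" ;;
    esac
}

# Hypothetical proxy address, page paths, and node ID.
with_node_id "https://teamcity.example.com/favorite/projects" "node2"
with_node_id "https://teamcity.example.com/project?mode=builds" "node2"
```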
To add a node to the round-robin list or remove it from the list, change the state of its Handling UI actions and load balancing user requests responsibility. No changes in the proxy configuration are required.
Domain Isolation Proxy Configuration
The domain isolation mode requires configuring a dedicated domain for serving artifacts. Make sure to add a new record in your DNS pointing from this domain to the proxy address: for example, it could be a CNAME record pointing to the proxy URL.
Failover
In the case of a failover, when the main node is no longer available either because of a crash or during maintenance, a secondary node can be granted the "Main TeamCity node" responsibility and act as a main node temporarily or permanently.
Each secondary node tracks the activity of the current main node and, if the main node is inactive for several (3 by default) minutes, shows the corresponding health report. The "Main TeamCity Node" responsibility can be reassigned to another node via the TeamCity UI or REST API. However, if this inactivity has not been planned (that is, the main node has crashed), it is important to verify that no TeamCity server processes are left running on the inactive node. If you detect such processes, you need to stop them before reassigning the "Main TeamCity node" responsibility.
If you use the standard NGINX proxy server, update node IDs and hostnames in the reverse proxy configuration and reload the proxy server configuration after switching the "Main TeamCity Node" responsibility. HAProxy and NGINX Plus proxy servers do not require manual configuration updates.
To sum up, follow these steps in case of a failover:
Wait for a server health report about the main node unavailability on the secondary node.
Make sure the main node is stopped.
Switch the "Main TeamCity node" responsibility to the secondary node.
(if regular NGINX is used) Update the main node ID in the reverse proxy configuration and reload the config.
Monitoring and Managing Nodes via REST API
Starting from TeamCity 2022.10, you can use REST API to check the node status and reassign node responsibilities.
To get the list of all online nodes, use:
To assign the Main node responsibility to a secondary node, use:
The same example with curl:
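A rough sketch of both calls with curl is given below. The endpoint paths are assumptions and should be checked against the REST API reference for your TeamCity version: the sketch supposes that nodes are exposed under /app/rest/server/nodes and that a responsibility is enabled by sending the plain-text body "true" to .../enabledResponsibilities/MAIN_NODE. The server URL, the node ID, the TEAMCITY_TOKEN variable, and the helper function names are hypothetical. The nodes request here is not filtered by state, so restricting it to online nodes would need an additional locator.

```shell
# Placeholders: replace with your proxy or node URL and the target node ID.
TEAMCITY_URL="https://teamcity.example.com"
NODE_ID="node2"

# URL builders. The paths below are assumptions, see the note above.
nodes_url() {
    printf '%s/app/rest/server/nodes\n' "$1"
}
main_responsibility_url() {
    printf '%s/app/rest/server/nodes/id:%s/enabledResponsibilities/MAIN_NODE\n' "$1" "$2"
}

# Lists nodes. Expects an access token in the TEAMCITY_TOKEN variable.
list_nodes() {
    curl -sS \
        -H "Authorization: Bearer ${TEAMCITY_TOKEN}" \
        -H "Accept: application/json" \
        "$(nodes_url "$TEAMCITY_URL")"
}

# Assigns the Main node responsibility to the node whose ID is passed as $1.
assign_main_node() {
    curl -sS -X PUT \
        -H "Authorization: Bearer ${TEAMCITY_TOKEN}" \
        -H "Content-Type: text/plain" \
        --data-raw "true" \
        "$(main_responsibility_url "$TEAMCITY_URL" "$1")"
}

# Print the URLs that would be requested. Run list_nodes or
# assign_main_node "$NODE_ID" against a real server to send the requests.
nodes_url "$TEAMCITY_URL"
main_responsibility_url "$TEAMCITY_URL" "$NODE_ID"
```

Before calling assign_main_node after an unplanned outage, make sure no TeamCity server processes are left running on the previous main node, as described in the Failover section.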
Nodes Configuration and Usage
Authentication and License
The main and secondary nodes operate under the same license.
Secondary nodes use the same authentication settings as the main node. However, users might be asked to log in again after they are routed to a secondary node. This happens in two cases: (1) a user did not select the "Remember Me" option on the login screen and (2) SSO authentication is not configured.
Global Settings
Secondary nodes use the same global settings (path to artifacts, version control settings, and so on) as the main node.
Responsibilities
By default, a newly started secondary node provides a read-only user interface and does not perform any background activity. You can assign the following additional responsibilities to each secondary node in Administration | Server Administration | Nodes Configuration:
To allow users to perform the most common actions on builds and agents, enable the Handling UI actions and load balancing user requests responsibility.
You can enable and disable responsibilities for nodes at any moment.
Processing Data Produced by Builds on Secondary Node
It is possible to use one or more secondary nodes to process traffic from the TeamCity agents. This moves the related load from the main TeamCity node to a separate machine, which improves TeamCity performance when handling hundreds of concurrent and actively logging builds.
In general, you do not need a separate node for running builds unless you have more than 400 agents connected to a single node. Using a secondary node allows you to significantly increase the number of agents which the setup can handle.
Once you assign a secondary node to the Processing data produced by builds responsibility for the first time, all* newly started builds will be routed to this node. The existing running builds will continue being executed on the main node. When you disable the responsibility, only the newly started builds will be switched to the main node. The builds that were already running on the secondary node will continue running there.
If you assign more than one secondary node to this responsibility, builds will be distributed equally between these nodes.
* You can control how many builds can be run by each node.
To do this, find the required node in the list of available nodes and click Edit next to its Processing data produced by running builds responsibility. The Limit builds dialog will open. Here, you can enter a relative limit of builds allowed to run on this node. We suggest that you choose this limit depending on the node's hardware capabilities.
VCS Repositories Polling on Secondary Node
Initially, only the main TeamCity node polls VCS repositories for new commits. Enabling the "VCS Repositories Polling" responsibility on multiple nodes allows you to distribute this potentially slow activity and reduce the latency for starting new builds.
Commit hooks configured on your main node do not require any changes and remain functional after you delegate the VCS polling to secondary node(s).
Processing Triggers on Secondary Node
In setups with many build agents, a significant amount of the main node's CPU is allocated to constant processing of build triggers. By enabling the Processing build trigger responsibility for one or more secondary nodes, you can distribute the trigger processing tasks and CPU load between the main node and the responsible secondary ones. TeamCity distributes the triggers automatically but you can see what triggers are currently assigned to each node.
Handling UI Actions and Load Balancing User Requests
This responsibility allows user actions on a secondary node. It is especially useful when the main node is down or undergoing maintenance.
Enabling this responsibility also adds the node to the list of nodes participating in the round-robin.
Main Node Responsibility
You can assign a secondary node to the Main TeamCity node responsibility. By default, this responsibility belongs to the current main node, but it becomes vacant if this node becomes unavailable. After you assign any secondary node to this responsibility, it becomes the main node and additionally receives the "Handling UI Actions and Load Balancing User Requests" responsibility. All the running builds will continue their operations without interruption. If a proxy is configured in your setup, build agents will seamlessly reconnect to the new main node.
When the previous main node starts again, it becomes a secondary node, as the Main TeamCity node responsibility is already occupied by another node. If necessary, you can repeat the procedure above to switch roles between these nodes.
Internal Properties
All TeamCity nodes rely both on common internal properties, stored in the shared data directory, and specific properties, stored in the node-specific data directory. The node-specific properties have a higher priority and overwrite the common values in case of a conflict.
To disable any common property on a given secondary node, pass it using the following syntax: -<property_name>.
Secondary Node Memory Settings
The secondary node requires the same memory settings as the main node. If you have already configured the TEAMCITY_SERVER_MEM_OPTS environment variable for the main node, make sure to use the same variable for the secondary node. If your main node uses a 64-bit JVM, the secondary node must use a 64-bit JVM as well.
Project Import
You can import projects to the main node only. Secondary nodes will detect the imported data at runtime, without restarting.
Clean-up
The TeamCity clean-up task can be scheduled on any TeamCity node, but it executes on the main node only. In a multi-node configuration, as well as in a single node configuration, the task can run while secondary nodes are handling their operations.
Using Plugins
A secondary node has access to all plugins enabled on the main node. It also watches for newly uploaded plugins. If a secondary node detects that a plugin has been uploaded to the main node, it will show the respective notification on its Administration | Plugins page. If the plugin supports it, it can be reloaded at runtime with no need to restart the secondary node; the respective hint will be displayed in the UI.
Installing Additional Secondary Node
To install a secondary node, follow these steps on the secondary node machine:
Install the TeamCity software as usual: download the distribution package and follow the installation wizard. Please note: when installing a secondary node using the installation wizard, it is important not to start the TeamCity Server service until the environment variables in steps 2 and 3 are configured.
Provide the path to the shared Data Directory via the TEAMCITY_DATA_PATH environment variable.
Add additional arguments to the TEAMCITY_SERVER_OPTS environment variable:
TEAMCITY_SERVER_OPTS = -Dteamcity.server.nodeId=<node_ID> -Dteamcity.node.data.path=<path_to_node_data_directory> -Dteamcity.server.rootURL=<node_URL>
where
<node_ID> is the ID of the node that will be displayed on the Administration | Nodes Configuration page.
<path_to_node_data_directory> is the path to the <Node-specific Data Directory>.
<node_URL> is the secondary node root URL. It should be accessible from the main node and agents.
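Because the service must not start before these variables are in place, a small pre-start check can help. The sketch below is a hypothetical helper, not part of TeamCity: check_node_opts only verifies that the three arguments listed above are present in an option string, and all values shown are placeholders.

```shell
# Succeeds only if the option string passed as $1 contains all three -D
# arguments required for a secondary node. It checks presence only and
# does not validate the values.
check_node_opts() {
    for key in teamcity.server.nodeId teamcity.node.data.path teamcity.server.rootURL; do
        case " $1 " in
            *" -D${key}="*) ;;
            *) echo "missing -D${key}"; return 1 ;;
        esac
    done
    echo "all required arguments present"
}

# Placeholder values for a hypothetical secondary node.
TEAMCITY_DATA_PATH="/mnt/teamcity-data"
TEAMCITY_SERVER_OPTS="-Dteamcity.server.nodeId=node2 -Dteamcity.node.data.path=/var/lib/teamcity-node -Dteamcity.server.rootURL=http://node2.example.com"
export TEAMCITY_DATA_PATH TEAMCITY_SERVER_OPTS

check_node_opts "$TEAMCITY_SERVER_OPTS"
```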
Upgrade/Downgrade
It is recommended that the main TeamCity node and all secondary nodes have the same version. In certain cases, the main node and secondary nodes can be running different versions for a short period, for example, during the minor upgrade of the main node. When the versions of a secondary node and the main node are different, the corresponding health report will be displayed on both nodes.
When upgrading to a minor version (a bugfix release), the main and the secondary nodes should be running without issues as the TeamCity data format stays the same. You can upgrade the main TeamCity node and then the secondary nodes manually, or with the automatic update.
When upgrading the main node to a major version, its TeamCity data format changes. Because of that, as soon as the main node of a new version starts, the secondary nodes detect the main node has a different data format and switch to the read-only mode. It may take some time for the secondary nodes to become read-only. During this time the main node will wait until it ensures that all the other nodes do not change data anymore.
To upgrade nodes in a multinode setup to a major version of TeamCity, follow these steps:
Start the upgrade on the main TeamCity node as usual.
Proceed with the upgrade.
Verify that everything works properly and agents are connecting to the main node (the agents will reroute the data that was supposed to be routed to the secondary nodes to the main node).
Upgrade TeamCity on the secondary nodes to the same version.
To downgrade nodes in a multinode setup, follow these steps:
Shut down the main node and the secondary nodes.
Restore the data from backup (only if the data format has been changed during the upgrade).
Downgrade the TeamCity software on the main node.
Start the main TeamCity node and verify that everything works properly.
Downgrade the TeamCity software on the secondary nodes to the same version as the main node.
Start the secondary nodes.
TeamCity agents will perform upgrade/downgrade automatically.
Start/Stop
Any TeamCity node can be started/stopped using regular TeamCity scripts (teamcity-server.bat or teamcity-server.sh) or Windows services. The environment variables TEAMCITY_DATA_PATH, TEAMCITY_SERVER_MEM_OPTS, and TEAMCITY_SERVER_OPTS are supported by all types of nodes.
All nodes use the same approach to logging events. You can check the state of the startup in the <TeamCity Home Directory>/logs/teamcity-server.log file. Or, you can open <node root URL> in your browser to see the TeamCity startup screens.
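For example, on a Linux node the end of the server log can be inspected as sketched below. TC_HOME is a variable local to this sketch (not one that TeamCity reads), and /opt/teamcity is a hypothetical installation path.

```shell
# Hypothetical TeamCity Home Directory; override TC_HOME to match your installation.
TC_HOME="${TC_HOME:-/opt/teamcity}"
log_file="${TC_HOME}/logs/teamcity-server.log"

# Show the last lines of the server log if it is readable on this machine.
if [ -r "$log_file" ]; then
    tail -n 20 "$log_file"
else
    echo "log file not found: $log_file"
fi
```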
A secondary node, as well as the main node, can be stopped or restarted while builds are running. The builds will continue running on agents and will either be reassigned to another node by the proxy or wait until their assigned node starts again.
Backup/Restore
You can start a backup from the TeamCity UI on any node that has the Handling UI actions and load balancing user requests responsibility. You can also start a backup on any node from the command line. While a backup is running, the other nodes do not allow scheduling clean-up, importing projects, or starting another backup process.
The restore operation can be performed on any of the nodes, but only if all nodes using the TeamCity database and data directory are stopped.
Multinode Setup Health Reports
See the related reports in the dedicated article.
Common Issues
Trouble Accessing Data Directory from Windows Service
Note that if TeamCity is running as a service on Windows, it might not be able to access the TeamCity Data Directory via a mapped network drive. This happens because Windows services cannot work with mapped network drives, and TeamCity does not support the UNC format (\\host\directory) for the Data Directory path. To work around this problem, you can use mklink as it allows making a symbolic link on a network drive:
Make sure remote-to-local symbolic link evaluations are enabled in your OS:
To enable them, use the following command: