Using AWS Aurora Database Cluster

This page provides details on using an Amazon Aurora cluster as the TeamCity database server.

Overview

When using an AWS Aurora cluster with TeamCity pointing to the cluster end-point as the database server, it is important to understand what happens when an AWS Aurora cluster fails over.

Both AWS Aurora DB instances are rebooted (so for a short period of time TeamCity entirely loses connection to the cluster) and

the original DB instance is started with innodb_read_only flag set (the new reader instance);
the former failover instance is the new writer and the cluster endpoint DNS record is changed to point to the new writer instance.

By default, TeamCity JVM caches DNS name lookups, which essentially means that TeamCity will stay connected to the original DB instance until DNS cache expires. This in turn leads to the database connection pool on the TeamCity side to be populated with the connections to the new reader.

It will take some time for the JVM-specific cache in TeamCity to expire and for the invalid connections to be evicted from the pool.

General Recommendations

When working with a failover cluster, it is recommended you decrease the JVM-specific DNS caching on TeamCity by setting the TTL to 60:

Add the
-Dsun.net.inetaddr.ttl=60
JVM option to the environment variable.
Restart TeamCity for the changes to take effect.
You may also wish to manually flush your OS-specific DNS cache, but this is the action to be taken each time Aurora fails over.

Forcing TeamCity to Connect to New Writer

You can force TeamCity to connect to the new writer either manually or automatically.

To force the connection manually:

reboot the new DB reader instance again: the reboot will take up to 2 minutes which is sufficient for the DNS cache to expire as well as for the invalid connections to be evicted from the pool
alternatively, restart TeamCity manually and all the connections will be created anew.

To force TeamCity to automatically connect to the new primary instance as soon as it's up and running:

Configure the database connection pool to use a special validation query, so that the connections to the DB instance are tested before and/or after use and if a connection to the read-only database is detected, they are evicted from the pool.

Add the following lines to the <TeamCity Data Directory>/config/ file:
testOnBorrow=true testOnReturn=true testWhileIdle=true timeBetweenEvictionRunsMillis=60000 validationQuery=select case when @@read_only + @@innodb_read_only \= 0 then 1 else (select table_name from information_schema.tables) end as `1`
Restart TeamCity. Once you do that, the following SQL query:
select case when @@read_only + @@innodb_read_only = 0 then 1 else (select table_name from information_schema.tables) end as `1`
will be executed for all connections whenever they are borrowed from or returned to the pool, and also every 1 minute (60000 milliseconds) for idle connections, raising error 1242 (ER_SUBSELECT_NO_1_ROW ) for each connection to the read-only database.

Last modified: 20 April 2023