How to Run All UI Tests on a Merge Request Without Getting Older | by Yulia Mittova | May, 2023

I am continuing to post translations of the most interesting articles from habr.com. Today’s article is about reducing the time it takes to run all UI tests on a PR from 6 hours to 30 minutes. I know from my own experience that almost every organisation with complex Android apps is facing this problem. If you find this article useful, it would be the best reward for me.
I hope you find something interesting or useful here. And I would really appreciate any comments and suggestions to make my work better. ❤

Today’s author: https://habr.com/ru/users/Hukumister/
The original post: https://habr.com/ru/articles/725778/

Sooner or later, a large product faces the problem of the number of tests, or rather, how long it takes to run them. At the same time, not all teams are ready to spend a lot of effort on optimising this process. It is much easier to solve the problem with many emulators or resources on CI.

In this long read, I want to talk about how we reduced the time of running UI tests on a merge request (called a PR from here on) from 6 hours to 30 minutes, the approaches to solving the problem, and how to build your own Test Impact Analysis.

Imagine you are a small startup that just started developing its own app. Most likely, you won’t think a lot about tests at the early stage because even manual testing takes no more than one day.

Over time, the application will grow, the number of developers in the team will increase, and regression testing will take a significant amount of time. To keep release time from growing, you introduce automated testing and start covering features with unit and UI tests to make the process as automated as possible.

Sooner or later, you will find yourself at the point where there is a huge amount of unit and UI tests. There are no particular problems with unit tests: they work fast, and even if the developer didn’t run them locally, CI will run them on a pull request (PR). But it’s not that easy with UI tests: there are so many of them that it’s no longer possible to run them on every PR.

We found ourselves in such a situation. There are about 5000 UI and 6500 unit tests in our project. The run of all UI tests has long exceeded 6 hours. The only solution that can be implemented quickly is to run tests not on every PR but only on nightly builds.

Test run on nightly build in CI

We solved the problem of how long a PR takes to pass, but a new one appeared. The developer gets feedback only a day after the changes have been merged. In such a system, tests break often, and teams cannot always find time to fix them because everyone has urgent features that need to be rolled out quickly.

1. Test parallelisation at the CI level is one of the first ideas that comes to mind. More than half of our tests are located in feature modules, so we can combine several modules into one Job and run only that Job.

This way the tests are executed in parallel rather than sequentially and, with a proper split, each Job even stays within the timeout.

In theory, everything is fine, but in practice, there are two problems:

  • Jobs will not always be executed in parallel. CI is busy even without tests, and Jobs will queue up. Because of that, running the tests can take up to 2–3 hours. With frequently created PRs, it can go beyond all reasonable limits.
This is what our CI looks like after three PRs are created simultaneously.
  • There is a limitation on the number of emulators in the infrastructure. With a small number of emulators, tests will run for an unimaginably long time, and if we take a large number of emulators, there will be delays associated with them as well.

What we have in the end:

➕ No changes in the code, just change the pipelines a little.
➖ Heavy load on CI and the pool of emulators.

Conclusion: CI parallelisation can be used but as part of a more complex solution.

2. Analysis of changed files is an idea that arises in response to the question, “Do I need to run all tests on every PR?” Infrastructure constraints make running everything impractical, so a more selective approach is needed.

The simplest implementation of such a system looks like this:
For every PR from Git, we get a list of names of changed files; after that, from Gradle, we can get modules in which the changed files were. After that, we run tests only on these modules.

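// cmd, isModule, moduleNameByPath and getTestTask are project-specific helpers.
tasks.register("computeImpactedTest") {
    // Names of the files changed on this branch relative to main.
    val gitDiff = cmd("git diff --name-only origin/main...HEAD")

    val testTasks = gitDiff
        .filter { filePath -> isModule(filePath) }
        .map { filePath -> moduleNameByPath(filePath) }
        .distinct()
        .map { module -> getTestTask(module) }
        .toTypedArray()

    dependsOn(*testTasks)
}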

The implementation is simple but with a bug. Let’s say there is module A that uses module B. We changed something in module B and ran tests on it. The changes could also affect module A, in which you also need to run tests.

Such a problem is easy to solve. Gradle knows the whole graph, so after getting the modules in which we changed something, we can also get the modules that depend on them and run all the tests there too. Do you see the catch? If the dependent module is the main module, which has a huge number of tests, and terribly slow ones at that, then we haven’t solved the problem very well.

val testTasks = gitDiff
    .filter { filePath -> isModule(filePath) }
    .map { filePath -> moduleNameByPath(filePath) }
    // getDependenciesTestTask is assumed to return a list of test tasks.
    .flatMap { moduleName -> listOf(getTestTask(moduleName)) + getDependenciesTestTask(moduleName) }
    .distinct()
    .toTypedArray()
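
How the dependent modules might be collected is not shown in the article. As a minimal sketch, assuming the module graph has already been exported from Gradle into a plain map (the `ModuleGraph` type and `impactedModules` function are hypothetical names), the traversal could look roughly like this:

```kotlin
// Hypothetical module graph: each module maps to the modules it depends on directly.
typealias ModuleGraph = Map<String, Set<String>>

// Returns the changed modules plus every module that depends on them,
// directly or transitively.
fun impactedModules(graph: ModuleGraph, changed: Set<String>): Set<String> {
    // Invert the edges so that we can walk from a module to its dependents.
    val dependents = mutableMapOf<String, MutableSet<String>>()
    for ((module, dependencies) in graph) {
        for (dependency in dependencies) {
            dependents.getOrPut(dependency) { mutableSetOf() }.add(module)
        }
    }

    // Breadth-first walk over the inverted graph, starting from the changed modules.
    val impacted = changed.toMutableSet()
    val queue = ArrayDeque(changed)
    while (queue.isNotEmpty()) {
        val current = queue.removeFirst()
        for (dependent in dependents[current].orEmpty()) {
            // add() returns false for modules we have already visited.
            if (impacted.add(dependent)) queue.addLast(dependent)
        }
    }
    return impacted
}
```

With a graph where `app` depends on `feature` and `common`, and `feature` depends on `common`, a change in `common` pulls in all three modules. That is exactly the situation described above, where the main module ends up in almost every run.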

Pros and cons of the approach:

➕ Simplicity of the implementation. There are no changes on the infrastructure side, and the code can be written within a couple of days of leisurely coding.
➖ We run lots of tests that we don’t need to. It’s hard to tell from a filename which tests should be run, so you have to run all the tests in the module.

Conclusion: this approach can be used, and it is implemented by Avito (https://avito-tech.github.io/avito-android/ci/ImpactAnalysis/ ) and Dropbox (https://github.com/dropbox/AffectedModuleDetector ). If the tests in your project are evenly distributed across the codebase, your dependency graph isn’t tangled, and you are fine with running extra tests, the solution will suit you well. Our project could not boast of such a thing, and a more advanced solution was required.

3. Code markup that lets you match the test code with the project code is the third possible solution. You can add a special annotation to a PageObject that indicates which screen the PageObject belongs to. Then, when the listed classes change, the corresponding tests are run (the ones in which the bound PageObject is used).

@UiDsl
@LinkedWith([AccountsFragment::class, AccountsRepository::class])
class AccountsPage {

    companion object {

        const val TAG = "Accounts list screen"

        inline operator fun invoke(
            block: AccountsPage.() -> Unit
        ) = AccountsPage().block()
    }

    //...
}

You can also create a config file. It’s not so important how it will be implemented — the essence is important. This solution is not the most complex, but we must fully trust the developers.

In addition, over time, a huge number of classes will accumulate in the annotation or file. This needs to be supported: classes will be removed, renamed, moved to another package, and so on. Such a system is prone to bugs and human error.

Pros and cons of this approach:

➕ Simplicity of implementation.
➖ Human factor.

Conclusion: this approach can be used if you need to run tests only when the UI changes. In all other cases, the method is inferior to the previous ones.

Now is the era of the development of ML systems, and why not fantasise about applying a neural network to this task? This is what industry giants like Google and Facebook do. But this option is out of the question for two reasons: people are needed to support such a system, and it is unknown where to get the labelled data for training.

We need a solution that can be realistically implemented in a reasonable time. An approach that would run exactly those tests that are affected by code changes, regardless of which module they are in. A solution that works automatically and eliminates developer error. Such a solution exists and is called Test Impact Analysis (TIA).

The idea of TIA is not new; like many things in the industry, it was invented long before us. Martin Fowler wrote about impact analysis in his article.

The essence of the concept is that code coverage is already implemented for most programming languages. We run the tests, which, in turn, run our code. After that, a file is generated that contains information about which methods and code branches were executed and which were not. The IDE can use this information to highlight, right in the editor, which code is covered by tests and which is not.

A quite simple idea arises: why not use this Code Coverage in reverse? Create a Code Coverage by running all the tests on a nightly run and save it somewhere. Then we take the line numbers from Git and, based on the Code Coverage, we get the numbers or names of the tests that call these lines. That’s the whole concept, fitting into one paragraph.
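
To make "coverage in reverse" concrete, here is a minimal sketch of the lookup. The data model (`TestCoverage`, file names as keys, test ids as strings) is an assumption made for illustration and not the format used in our project:

```kotlin
// Hypothetical data model: the lines one test executed, grouped by source file.
data class TestCoverage(val testId: String, val linesByFile: Map<String, Set<Int>>)

// Inverts per-test coverage into file -> line -> tests that executed that line.
fun buildIndex(coverages: List<TestCoverage>): Map<String, Map<Int, Set<String>>> {
    val index = mutableMapOf<String, MutableMap<Int, MutableSet<String>>>()
    for (coverage in coverages) {
        for ((file, lines) in coverage.linesByFile) {
            val byLine = index.getOrPut(file) { mutableMapOf() }
            for (line in lines) {
                byLine.getOrPut(line) { mutableSetOf() }.add(coverage.testId)
            }
        }
    }
    return index
}

// Looks up the tests to run for the changed lines taken from the diff.
fun impactedTests(
    index: Map<String, Map<Int, Set<String>>>,
    changedLines: Map<String, Set<Int>>
): Set<String> =
    changedLines.flatMap { (file, lines) ->
        lines.flatMap { line -> index[file]?.get(line).orEmpty() }
    }.toSet()
```

The index is built once from the nightly run; on a PR only `impactedTests` is called. A changed line that no test executed contributes nothing, so an empty result means no tests need to run.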

Pros and cons of this approach:

➕ We run only the necessary tests. Moreover, we will run them not only in the modified module but also in all dependent modules. It doesn’t matter how the tests are divided into modules — we won’t run anything extra.
➖ Complex implementation, which involves infrastructure change. It can’t be done with just one script. This is a complex task, starting with special runners for tests and ending with the architecture of our CI. Code Coverage for a large project will be quite large, so the company must have tools that allow you to quickly save and download large files.

Let’s look closer at one of the implementations of this idea, which we implemented in our project.

The first task that needs to be solved is getting Code Coverage. For Android development, there is not much choice of tools, and JaCoCo is the obvious one. It’s already built into Gradle, time-tested, and has an API that allows us to manage the collection of Coverage ourselves.

JaCoCo can work in two modes. The first instruments classes on the fly through a Java agent (online mode). The second instruments the bytecode at build time (offline mode).

Android uses the build-time bytecode instrumentation approach, most probably because the Android runtime differs from a regular JVM and it is not so easy to run a Java agent there.

Unfortunately, we can’t use JaCoCo as is. By default, it collects one Coverage for all tests. Having received a single file, we cannot tell which test calls this or that line. We’ll have to change this behaviour a bit. To do that, let’s see how JaCoCo methods are called in UI tests.

The test framework provides an InstrumentationRunListener that allows us to track the start and end of a test. In addition, it has a method that is called when all tests are completed. It is used by JaCoCo.

Since JaCoCo is already integrated into the test framework, one of the standard listeners (which is CoverageListener) is responsible for collecting Coverage. All that’s left to do is dig into it and see how it does its job. The essence of the method is given here:

class CoverageListener : InstrumentationRunListener() {

    private val EMMA_RUNTIME_CLASS = "com.vladium.emma.rt.RT"

    override fun instrumentationRunFinished(
        streamResult: PrintStream?,
        resultBundle: Bundle?,
        junitResults: Result?
    ) {
        val coverageFile = File("coverageFilePath")
        try {
            val classLoader = instrumentation.targetContext.classLoader
            val emmaRTClass = Class.forName(
                EMMA_RUNTIME_CLASS,
                true,
                classLoader
            )
            val dumpCoverageMethod = emmaRTClass.getMethod(
                "dumpCoverageData",
                coverageFile.javaClass,
                Boolean::class.javaPrimitiveType,
                Boolean::class.javaPrimitiveType
            )
            dumpCoverageMethod.invoke(null, coverageFile, false, false)
        } catch (ex: Exception) {
            reportEmmaError("Is Emma/JaCoCo jar on classpath?", ex)
        }
    }
}


The listener calls the JaCoCo API through reflection. The reason why is not quite clear to me. There is the following comment in the source code:

Uses reflection to call emma dump coverage method, to avoid always statically compiling against the JaCoCo jar. The test infrastructure should make sure the JaCoCo library is available when collecting the coverage data.

Most likely, if we linked against the JaCoCo jar directly, it would itself be instrumented during compilation and stop working. In any case, this is more than enough to create your own Listener with blackjack and, well, you understand.

open class CollectCoverageRunListener : InstrumentationRunListener() {

    private val EMMA_RUNTIME_CLASS = "com.vladium.emma.rt.RT"
    private val baseDir = File("/sdcard/tia")

    override fun testFinished(description: Description) = with(description) {
        val allureId = getAnnotation(AllureId::class.java)?.value.orEmpty()
        val fileName = "$className$methodName#$allureId.exec"
        val resultCoverageFile = baseDir.resolve(fileName)

        // Same reflective call as in CoverageListener, but made after every test.
        val classLoader = instrumentation.targetContext.classLoader
        val emmaRTClass = Class.forName(EMMA_RUNTIME_CLASS, true, classLoader)
        val dumpCoverageMethod = emmaRTClass.getMethod(
            "dumpCoverageData",
            resultCoverageFile.javaClass,
            Boolean::class.javaPrimitiveType,
            Boolean::class.javaPrimitiveType
        )
        dumpCoverageMethod.invoke(null, resultCoverageFile, false, false)
        super.testFinished(description)
    }
}

Please note that we put the “allure id” of the test in the file name. This is necessary in order to understand which test this or that Coverage belongs to. Instead of the “allure id”, you can use the method name or any other identifier you like. All that remains is to pass our Listener to the runner. This is very convenient, as in all libs from Google:

class TestImpactAnalysisCollectorAllureRunner : AllureAndroidJUnitRunner() {

    override fun onCreate(arguments: Bundle) {
        // Append our listener to any listeners already passed in the arguments.
        val listenerArg = listOfNotNull(
            arguments.getCharSequence("listener"),
            CollectCoverageRunListener::class.java.name
        ).joinToString(separator = ",")
        arguments.putCharSequence("listener", listenerArg)
        super.onCreate(arguments)
    }
}

All that remains is to extract the necessary files using Adb. Having received Coverage files in exec format, we can get information about which lines of code this or that test calls. All this information is stored in the JaCoCo format, which is inconvenient for processing. We need to pull out the necessary information and move it to the format in which it would be convenient to work further.

To get the data, you can use the API of JaCoCo itself. This time there is no need for reflection, because the Coverage has already been collected.

fun loadCoverage(execFile: File, sourcesDir: File): List<ImpactedFile> {
    val loader = ExecFileLoader().also { it.load(execFile) }
    val result = mutableListOf<ImpactedFile>()
    Analyzer(loader.executionDataStore) { classCoverage ->
        // coveredLines is a project-specific helper, not part of the JaCoCo API.
        val coveredLines = classCoverage.coveredLines
        // handleCoveredLines(coveredLines)
    }.analyzeAll(sourcesDir)
    return result
}

We need to pass the API not only the exec file but also a folder with the compiled classes. We can get the needed directory from Gradle. We learn which test the Coverage belongs to from the name of the Coverage file itself. It looks pretty simple, but there is a problem.
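
As a small illustration of reading the test identifier back from the file name: the sketch below assumes the `<class><method>#<allureId>.exec` naming produced by the listener above. The function name is hypothetical.

```kotlin
// Extracts the allure id from a coverage file name such as "FooTestopensScreen#123.exec".
// Returns null when the name carries no id (no '#', or nothing after it).
fun allureIdFromCoverageFile(fileName: String): String? =
    fileName.removeSuffix(".exec")
        .substringAfterLast('#', missingDelimiterValue = "")
        .ifEmpty { null }
```

Because the class and method names are concatenated without a separator, only the part after `#` can be recovered reliably. This is one reason to put a stable identifier there.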

Let’s say we build this functionality as a Gradle plugin that we attach to a specific module. We can get the sources of this module, and hence the information from the exec file that applies only to this module. But we also need the coverage of all the sources this module depends on. Otherwise, we will not be able to understand which tests to run if, for example, we change files in those modules (such as “common” or any module with domain logic).

So we need the sources not only of the module we attached our plugin to, but also of all the modules it depends on, and we pass all of them to the API to get data from the exec file.

fun computeImpactedFiles(
    execFile: File,
    sourcesDir: File,
    dependenciesSourcesDirs: List<File>
): List<ImpactedFile> {
    return (dependenciesSourcesDirs + sourcesDir)
        .flatMap { dir -> loadCoverage(execFile, dir) }
}

Once we got the needed data we can convert it to any convenient format, for example JSON. It is the easiest to work with and easy to debug. We save this JSON to the file. As a result, we will have a file for each project module that contains tests. Information about dependent modules will also be in this file.

Next, we merge all the files into one big JSON, removing duplicates. We are merging in order not to worry about building a dependency graph and keeping in mind which exact JSON file we need.
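
A minimal sketch of such a merge, assuming each per-module file has already been parsed into a `file -> line -> test ids` map (the `CoverageIndex` alias and the function name are made up for this example):

```kotlin
// Assumed shape of one parsed per-module file: source file -> line -> test ids.
typealias CoverageIndex = Map<String, Map<Int, Set<String>>>

// Merges several indexes into one; using sets drops duplicate test ids
// that appear when the same shared module is covered from several modules.
fun mergeIndexes(indexes: List<CoverageIndex>): CoverageIndex {
    val merged = mutableMapOf<String, MutableMap<Int, MutableSet<String>>>()
    for (index in indexes) {
        for ((file, byLine) in index) {
            val target = merged.getOrPut(file) { mutableMapOf() }
            for ((line, tests) in byLine) {
                target.getOrPut(line) { mutableSetOf() }.addAll(tests)
            }
        }
    }
    return merged
}
```

Since the merged structure is keyed by file and line, the lookup on a PR does not need to know which module a changed file belongs to.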

Next, we need to solve the problem of where to store the calculated Coverage information. We usually use our own S3 storage for this, but the choice here depends on the limitations that exist in your company.

Let’s move on to the most interesting. We have a freshly made Coverage, and we need to apply it to the PR.

So, we have a JSON file in which, by class and line of code we can find a test or list of tests which are running this line of code. This means that the first thing we need is to get the changed lines.

The simplest way to do this is through Git integration. There are two options: use the JGit library or implement it yourself via CLI calls. JGit is considered the official solution, but its documentation leaves much to be desired, while with the plain CLI you have to deal with regular expressions. The choice is yours.

Regardless of whether you use JGit or a custom implementation, there can be a problem. When you ask for a diff, Git by default shows three lines of context around each change. Here is an example:

--- a/android/samples/android-lib/src/main/java/ru/tinkoff/moba/sample/library/FeatureInteractor.kt
+++ b/android/samples/android-lib/src/main/java/ru/tinkoff/moba/sample/library/FeatureInteractor.kt
@@ -5,7 +5,7 @@ import ru.tinkoff.moba.sample.kotlin.KotlinInteractor
 class FeatureInteractor {
     fun compute(): Int {
-        return 4
+        return 2 + 2
     }

     fun libCompute(): Int {

We made a change to one line, and diff gave a bunch of others. In this case, it is difficult to find the number of the modified line. In the example above, line 8 has been changed, and it’s not entirely obvious how to get 8 from the given numbers. You can create a math solution based on the numbers in the header, or you can simplify the task.

Git has a --unified=<n> option that specifies how many lines of context, besides the modified ones, should be shown. With zero, the diff contains only the lines that have been changed, without any context. Here is the same example but with --unified=0:

--- a/android/samples/android-lib/src/main/java/ru/tinkoff/moba/sample/library/FeatureInteractor.kt
+++ b/android/samples/android-lib/src/main/java/ru/tinkoff/moba/sample/library/FeatureInteractor.kt
@@ -8 +8 @@ class FeatureInteractor {
- return 4
+ return 2 + 2

This “diff” is much easier to analyse, and we immediately get the number of the changed line. We are looking for matches in it, first by class (file name), then by lines. If we find matches, we get the specified tests and add them to a separate file. All you need to do afterwards is to run all tests from this file. If the file is empty, then the changes made do not affect the tests in any way, so there is no point in running them.
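
As a rough sketch of the CLI route, the function below parses the text printed by `git diff --unified=0` into line ranges per file. The `Hunk` type and the function names are hypothetical, and the sketch ignores edge cases such as renames or an added content line that itself happens to start with `+++ `.

```kotlin
// One hunk of a zero-context diff: line ranges on the old ("-") and new ("+") side.
data class Hunk(val oldLines: IntRange, val newLines: IntRange)

// Matches headers such as "@@ -8 +8 @@" or "@@ -10,2 +12,3 @@"; an omitted count means 1.
val hunkHeader = Regex("""^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@""")

// A count of 0 means that side of the hunk has no lines (pure addition or deletion).
fun lineRange(start: Int, count: Int): IntRange =
    if (count == 0) IntRange.EMPTY else start until start + count

// Groups the hunks by the file path taken from the "+++ b/<path>" line.
fun parseZeroContextDiff(diff: String): Map<String, List<Hunk>> {
    val result = linkedMapOf<String, MutableList<Hunk>>()
    var currentFile: String? = null
    for (line in diff.lineSequence()) {
        if (line.startsWith("+++ ")) {
            // "+++ /dev/null" marks a deleted file, which has no new-side path.
            currentFile = line.removePrefix("+++ ").removePrefix("b/")
                .takeUnless { it == "/dev/null" }
            continue
        }
        val match = hunkHeader.find(line) ?: continue
        val file = currentFile ?: continue
        // Unmatched optional groups come back as empty strings.
        val (oldStart, oldCount, newStart, newCount) = match.destructured
        result.getOrPut(file) { mutableListOf() }.add(
            Hunk(
                oldLines = lineRange(oldStart.toInt(), oldCount.ifEmpty { "1" }.toInt()),
                newLines = lineRange(newStart.toInt(), newCount.ifEmpty { "1" }.toInt())
            )
        )
    }
    return result
}
```

For the example above, the result contains a single hunk for `FeatureInteractor.kt` with line 8 on both sides. Which side to match against the Coverage is a design choice the article does not spell out. One reasonable assumption is to use the old-side numbers, since the Coverage was recorded for the code before the change.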

This is how we solved the pain point and reduced the UI test runs from 6 hours to 30 minutes on average. The median value of the number of tests run is about 100. This is 50 times less than a full run.

Such a system has a huge number of potential improvements. You can store the Coverage data in a binary format instead of JSON to speed things up. You can add exclusions by class so that, for example, analytics code is not taken into account in the Coverage. You can configure the depth of analysis by dependencies, and a whole lot more.

There are two problems with this solution:

1️⃣ If you make changes to a base component, all tests will run. On the one hand, this is good because changes in core modules should trigger a regression run; on the other hand, we are back at the beginning of this article. The solution to this problem deserves a separate article, so this one may have a second part.

2️⃣ Coverage is updated once a day. This means that if a developer writes new functionality together with tests for it and then changes that code before the next nightly run, the impact analysis will not pick up the new tests, because the new code is not in the Coverage yet. Of course, developers rarely write tests that quickly, but the problem can be solved by updating the Coverage more cleverly, for example, when merging to the “master” branch.

The concept can be extended in whatever direction a particular project needs. As a result, we get a system where developers will definitely not forget to run the necessary tests and will be forced to fix them right away.

The system picks tests more accurately than developers who are sure they know which tests to run after their changes.
