Reboot Yoga

I work at Agoda now, and we on the Mobile team write and run plenty of tests here. Some tests need a real device to run, so we keep a pool of them. The devices are integrated into CI and connected to build agents through OpenSTF (you can read a great article by Anton Malinskiy about how we did it). In the current configuration, any test run uses about 50 devices in parallel.

The CI infrastructure around these devices has lots of components (the device itself, the USB hub, the USB controller and kernel driver, the adb daemon, the components of OpenSTF, Kubernetes as the deployment environment, etc.), and an unlucky combination of bugs, corner cases and architectural weaknesses leads from time to time to the “disappearance” of a device. Rebooting the device is a rough but effective way to restore the connection: the device, adb and the state in OpenSTF all return to a clean initial state. So every morning started for me with some simple yoga: if I see a device without tests, reboot it. It was wasting my time, so I decided to automate it.

Let’s think about how this can be implemented. A reboot is most valuable when we have already lost control of the device. Therefore the solution has to be an application on the device itself, so that it can reboot even in the worst case. Next, the main customer of the device is the build agent, and it does not matter why the agent cannot access the device. There is no active way to check the availability of the device, but there is a passive one: something on the build agent can ping the device, and a long absence of pings means the connection has disappeared, which is the reason to reboot.

Diagram 1

It looks like a generic watchdog, nothing more. So let’s implement it.

The easiest way to implement the ping is to send broadcast intents from the build agent with the adb command. It is safe because intents do not interfere with the execution of other applications and are ignored if there is no recipient application on the device. Here’s a simple script I run from crontab:
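A minimal sketch of such a ping script, assuming a hypothetical action name com.example.watchdog.PING (the real one is whatever action the watchdog app listens for) and assuming that every device visible to adb on the build agent should be pinged. The ADB variable exists only so that the binary can be swapped out.

```shell
#!/bin/sh
# Ping every connected device with a broadcast intent.
# ACTION is a hypothetical name; use the action your watchdog app registers for.
ADB="${ADB:-adb}"
ACTION="${ACTION:-com.example.watchdog.PING}"

# Print the serials of devices that adb reports in the "device" state,
# skipping the header line and anything offline or unauthorized.
list_devices() {
    "$ADB" devices 2>/dev/null | awk 'NR > 1 && $2 == "device" { print $1 }'
}

# Send one broadcast per device; output is discarded to keep cron mail quiet.
ping_all() {
    for serial in $(list_devices); do
        "$ADB" -s "$serial" shell am broadcast -a "$ACTION" >/dev/null 2>&1
    done
}

ping_all
```

From crontab the script could be run once a minute, for example with an entry such as "* * * * * /opt/ci/ping_devices.sh", where the path is hypothetical. The interval has to be comfortably shorter than the watchdog timeout on the device.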

Receiving the intent and working with the timer is done strictly according to the documentation:

Each received ping restarts the timer. If no ping arrives within the specified time, the timer triggers the reboot.

The most interesting part of this story is the reboot.

Android allows you to create business and embedded applications. Such applications may require additional permissions that are not available to ordinary applications. This is described in the “Build for Enterprise” chapter of the documentation. We are interested in the app category “Device Policy Controller” and the role “Device owner”. A Device owner can reboot the device; this feature appeared in API 24, driven by the growth of Android-based embedded solutions.

Despite all the variety of categories and roles, management applications are technically quite uniform. To obtain these rights, the application must declare a Broadcast Receiver in the manifest, protected by the permission android.permission.BIND_DEVICE_ADMIN. The documentation recommends using the DeviceAdminReceiver class to create this receiver; it parses the intents and calls the corresponding callbacks. Out of curiosity, you can add an intent filter and watch the events that arrive, but we will not use them in our task.

This receiver should also have a metadata file with a list of additional permissions. This list is the device management policy used by a Device administrator, which is another kind of management app. In our case the list will be empty, but the file must be present anyway. The path to the file is res/xml/device_admin.xml, and the receiver declaration in the manifest links to this file.

There are a few ways to grant the Device owner permission to the app; the easiest for us is to use the dpm utility:
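A minimal sketch of that call, assuming a hypothetical package com.example.watchdog whose admin receiver class is named AdminReceiver; substitute the component name of your own DeviceAdminReceiver subclass.

```shell
#!/bin/sh
# Grant the Device owner role to the watchdog app on one device.
# ADMIN is a hypothetical component name (package/receiver class).
ADB="${ADB:-adb}"
ADMIN="${ADMIN:-com.example.watchdog/.AdminReceiver}"

# $1 is the device serial as printed by `adb devices`.
set_device_owner() {
    "$ADB" -s "$1" shell dpm set-device-owner "$ADMIN"
}

# Run only when a serial is passed on the command line.
if [ "$#" -gt 0 ]; then
    set_device_owner "$1"
fi
```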

When this command is executed, there must not be any registered accounts on the device.

That solves the problem: once the application has become the Device owner, it can reboot the device:

Note the variable admin. As I said, the component name of the receiver is needed to use the Device owner permission.

To test the application’s ability to reboot the device, you can send an Intent directly to RebootReceiver. If the device reboots, then everything works correctly.
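Such a check could look like the sketch below. It assumes a hypothetical package name com.example.watchdog, and it assumes that RebootReceiver is exported so that the adb shell user is allowed to address it directly.

```shell
#!/bin/sh
# Send an explicit broadcast straight to the receiver to check that reboot works.
# The package name is hypothetical; RebootReceiver is the receiver in the app.
ADB="${ADB:-adb}"
RECEIVER="${RECEIVER:-com.example.watchdog/.RebootReceiver}"

# $1 is the device serial; -n addresses the component explicitly.
test_reboot() {
    "$ADB" -s "$1" shell am broadcast -n "$RECEIVER"
}

# Run only when a serial is passed on the command line.
if [ "$#" -gt 0 ]; then
    test_reboot "$1"
fi
```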

An application that has received the Device owner permission can be updated, but cannot be uninstalled. If you plan to remove the application someday, make sure it is a test build (launched from Android Studio, or with the testOnly flag explicitly set in the manifest). In that case you can revoke the Device owner permission with the dpm utility and then uninstall the app. The documentation says that all user accounts will be deleted along with the app automatically, so be ready for that (although I have seen that this is not always true).
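The removal can be sketched as follows, assuming a testOnly build and the hypothetical names com.example.watchdog and AdminReceiver. The dpm remove-active-admin command only works for apps that declare testOnly.

```shell
#!/bin/sh
# Revoke Device owner from a testOnly build, then uninstall it.
# PACKAGE and ADMIN are hypothetical names used for illustration.
ADB="${ADB:-adb}"
PACKAGE="${PACKAGE:-com.example.watchdog}"
ADMIN="${ADMIN:-$PACKAGE/.AdminReceiver}"

# $1 is the device serial; uninstall runs only if the revoke step succeeded.
remove_watchdog() {
    "$ADB" -s "$1" shell dpm remove-active-admin "$ADMIN" &&
        "$ADB" -s "$1" uninstall "$PACKAGE"
}

# Run only when a serial is passed on the command line.
if [ "$#" -gt 0 ]; then
    remove_watchdog "$1"
fi
```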

Thanks to this simple tool, devices no longer disappear from our testing pool. Even if a reboot is triggered merely by a pause in test runs, it does not cause problems.

Links