Reboot Yoga
I’m at Agoda now, and we, the Mobile team, write and run plenty of tests here. Some tests need a real device to run, so we have a pool of them. The devices are integrated into CI and connected to build agents through the OpenSTF system (you can read a great article by Anton Malinskiy about how we did it). In the current configuration, any test run uses about 50 devices in parallel.
The CI infrastructure that uses the devices has lots of components (the device itself, USB hub, USB controller and kernel driver, adb daemon, components of OpenSTF, Kubernetes as a deployment environment, etc.), and an unlucky combination of bugs, corner cases and architectural weaknesses leads from time to time to the “disappearance” of a device. Rebooting the device is a rough but effective way to restore the connection: the device, adb and the state in OpenSTF all return to a clean initial state. So every morning started for me with simple yoga: if I see a device without tests, reboot it. It was wasting my time, so I decided to automate it.
Let’s think about how it can be implemented. A reboot is most valuable when we have already lost control of the device. Therefore the solution must be an application on the device itself, so that it can reboot even in the worst case. Next, the main customer of the device is the build agent, and it does not matter why the agent cannot access the device. The device has no active way to check its own availability, but there is a passive one: something can ping the device from the build agent, and a long absence of pings will mean that the connection has disappeared, which is the reason to reboot.
Looks like a generic watchdog, nothing more. So, let’s implement it.
The easiest way to implement the ping is to send broadcast intents from the build agent with the adb command. It is safe because the intents do not interfere with the execution of other applications and are ignored if there is no recipient application on the device. I run a simple script for this from crontab.
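A minimal sketch of such a ping script, not necessarily the exact one from our crontab. The action com.example.watchdog.PING and the component com.example.watchdog/.PingReceiver are hypothetical names, so substitute whatever your watchdog app declares. Passing an explicit component with -n is an assumption on my side: newer Android versions restrict implicit broadcasts to receivers declared in the manifest.

```shell
#!/bin/sh
# Hypothetical names: adjust them to the receiver your watchdog app declares.
ADB="${ADB:-adb}"
PING_ACTION="${PING_ACTION:-com.example.watchdog.PING}"
PING_COMPONENT="${PING_COMPONENT:-com.example.watchdog/.PingReceiver}"

# Print the serial of every device that adb reports in the "device" state.
list_devices() {
    "$ADB" devices | awk '$2 == "device" { print $1 }'
}

# Send one ping broadcast to every attached device.
ping_all() {
    for serial in $(list_devices); do
        "$ADB" -s "$serial" shell am broadcast \
            -a "$PING_ACTION" -n "$PING_COMPONENT" >/dev/null 2>&1 \
            || echo "ping failed for $serial" >&2
    done
}

# Ping only when invoked with the "run" argument, e.g. from crontab.
if [ "${1:-}" = "run" ]; then
    ping_all
fi
```

A crontab entry of the form "* * * * * /path/to/ping-devices.sh run" (the path is a placeholder) would then send one ping per minute to every connected device.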
On the device, the application receives the intent and works with the timer strictly according to the documentation. Each received ping restarts the timer; if no ping arrives within the specified time, the reboot is triggered.
The most interesting part of this story is the reboot.
Android allows you to create business and embedded applications. Such applications may require additional permissions that are not available to ordinary applications. This is described in the chapter “Build for Enterprise”. We are interested in the app category “Device Policy Controller” and the role “Device owner”. A Device owner can reboot the device; this feature appeared in API 24, driven by the growth of Android-based embedded solutions.
Despite all the variety of categories and roles, management applications are technically quite uniform. For the application to obtain these rights, it must declare a Broadcast Receiver in the manifest, protected by the android.permission.BIND_DEVICE_ADMIN permission. The documentation recommends using the DeviceAdminReceiver class to create this receiver; it parses the intents and calls the corresponding callbacks. Out of curiosity, you can add an intent filter and watch the events that arrive, but we will not use them in our task.
This receiver should also have a metadata file with a list of additional permissions. This list is the Device management policy used by a Device administrator, which is another kind of management app. In our case the list will be empty, but the file must be present anyway. The path to the file is res/xml/device_admin.xml, and the receiver’s declaration in the manifest refers to it.
There are a few ways to grant the Device owner permission to the app; the easiest for us is the dpm utility. While this command runs, there must not be any registered accounts on the device.
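As a sketch, the dpm call can be wrapped like this. The component name com.example.watchdog/.AdminReceiver is hypothetical: use your own package and the name of your DeviceAdminReceiver subclass.

```shell
#!/bin/sh
ADB="${ADB:-adb}"
# Hypothetical component name: <package>/<DeviceAdminReceiver subclass>.
ADMIN_COMPONENT="${ADMIN_COMPONENT:-com.example.watchdog/.AdminReceiver}"

# Grant the Device owner role on one device, identified by its adb serial.
# dpm refuses if any account is still registered on the device.
set_device_owner() {
    "$ADB" -s "$1" shell dpm set-device-owner "$ADMIN_COMPONENT"
}
```

Calling set_device_owner with a serial taken from the adb devices output runs the command on that one device.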
We solved the problem: once the application has become the Device owner, it can reboot the device by calling DevicePolicyManager.reboot(admin).
Note the admin argument. As I said, the component name of the receiver is needed to use the Device owner permission.
To test the application’s ability to reboot the device, you can send an Intent directly to RebootReceiver. If the device reboots, then everything works correctly.
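One possible way to send that Intent from the build agent, as a sketch. The package name com.example.watchdog is hypothetical, and I assume here that RebootReceiver is exported so that a broadcast from the adb shell can reach it.

```shell
#!/bin/sh
ADB="${ADB:-adb}"
# Hypothetical package name; RebootReceiver is the receiver named in the text.
# Assumption: the receiver is exported, so the adb shell user may address it.
REBOOT_COMPONENT="${REBOOT_COMPONENT:-com.example.watchdog/.RebootReceiver}"

# Deliver an explicit broadcast straight to RebootReceiver on one device.
trigger_reboot() {
    "$ADB" -s "$1" shell am broadcast -n "$REBOOT_COMPONENT"
}
```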
An application that has received the Device owner permission can be updated, but cannot be deleted. If you are going to delete this application someday, make sure it is a test build (launched from the Studio, or with the testOnly flag explicitly set in the manifest). In that case you can revoke the Device owner permission with the dpm utility and then uninstall the app. The documentation says that all user accounts will be deleted together with the app automatically, so be ready for that (though I have seen that this is not always true).
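The revoke-and-uninstall sequence might look like the sketch below. The component name is again hypothetical, and the package name is derived from it by cutting everything from the slash onward. The sketch relies on adb passing the exit status of the remote command back to the caller, which older adb versions do not do.

```shell
#!/bin/sh
ADB="${ADB:-adb}"
# Hypothetical component name: <package>/<DeviceAdminReceiver subclass>.
ADMIN_COMPONENT="${ADMIN_COMPONENT:-com.example.watchdog/.AdminReceiver}"

# Revoke the Device owner role of a testOnly app, then uninstall the app.
# The uninstall step runs only if dpm reports success.
remove_owner_and_uninstall() {
    serial="$1"
    "$ADB" -s "$serial" shell dpm remove-active-admin "$ADMIN_COMPONENT" \
        && "$ADB" -s "$serial" uninstall "${ADMIN_COMPONENT%%/*}"
}
```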
Thanks to this simple tool, devices no longer disappear from our testing pool. Even if a reboot is triggered merely by a pause in test runs, it does not cause problems.