Working with A/B tests in the devtodev interface
Last updated
A/B testing is the best way to challenge your hypotheses. It is essentially an experiment where you show your users different variants of the app at random and then analyze the results to determine which variant performed better.
In devtodev, you can work with A/B testing in the ‘A/B Testing’ section on the app level. All tests are stored in one table. Besides basic information about each test, you can see its status. There are five types of status:
Draft – the test has been created but has not been launched yet.
Stopped – the test was stopped before completion. It’s not possible to restart the test. If you want to restart it, make a copy of the test and launch it.
In progress – the test is currently in progress.
Finished: No winner – test results determined that there was no winner.
Finished: Success – test results determined a winner group.
Before creating a test in the devtodev interface, you need to set variables through the SDK and use the methods for launching A/B tests. Use the dedicated classes to set the variables and their default values. If a variable is involved in a test, its value will vary depending on the group assigned by the server. If the app is offline and cannot present the test to the user, the user will see the default values and the app will continue to function correctly.
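The default-value fallback described above can be sketched as follows. This is an illustrative Python model of the pattern, not the real devtodev SDK API; the class and variable names are invented:

```python
class RemoteConfigStub:
    """Illustrative sketch of the default-value pattern; not the devtodev API."""

    def __init__(self, defaults):
        self.defaults = dict(defaults)   # values used offline / outside any test
        self.overrides = {}              # values delivered for the user's test group

    def apply_group(self, group_values):
        self.overrides.update(group_values)

    def get(self, name):
        # A group value wins when present; otherwise fall back to the default
        return self.overrides.get(name, self.defaults[name])

config = RemoteConfigStub({"button_color": "blue", "button_size": 14})
offline_color = config.get("button_color")    # default is used while offline
config.apply_group({"button_color": "red"})   # server assigned a test group
test_color = config.get("button_color")       # group value is used from now on
```

Because the defaults live in the app itself, the app keeps working even when no test configuration can be fetched.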
Here you can find more information about SDK configuration (about setting variables and methods for launching A/B tests).
To go to the test creation wizard, open the desired devtodev project, navigate to the ‘A/B Testing’ tab and click the ‘+ Add new A/B test’ button.
This opens the test creation wizard, which consists of five steps:
Enter a unique name of the test and its description. Try to make the description of the test as detailed as possible: its hypothesis, audience, description of the test groups, target metric, desired outcome, etc. Or simply insert a link to the test description. To learn more about test planning, open this link.
In this section, you need to create test assignment rules and define the audience size. Use the ‘Filter your audience’ and ‘Triggering event’ sections to set the assignment rules.
Filter your audience – use this option to define the user properties required for inclusion in the test. The devtodev SDK integrated into the application will use these filters to select the audience whose current properties match the test requirements.
In this example, all paying users will participate in the test.
If you use several filters at once, only users or devices (this depends on the selected user identification method) that meet all conditions will participate in the test.
Set a triggering event if you want your test to include only the users who performed a certain event.
In this example, the devtodev SDK will include the user or device (this depends on the selected user identification method) in the test when the SDK receives information that the said user or device reached the fourth level.
You can add extra parameters related to each trigger event. Different events have different lists of additional parameters (see Events). Please note that you can’t use an event sent to devtodev via the API as a trigger event.
The selected filters and trigger events cannot be altered after the start of the experiment.
The filters and trigger events become available for audience configuration after at least one event/property is received via the SDK, processed and accounted for in the devtodev analytics.
When applying both filters and trigger events, all conditions have to be met for the user/device to be included into the test.
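The combined inclusion rule can be expressed as a short sketch. This is illustrative logic, not SDK source; the user structure and event names are invented:

```python
def eligible(user, filters, trigger_event, received_events):
    # A user/device enters the test only when every filter matches
    # AND the trigger event has been received by the SDK
    return all(f(user) for f in filters) and trigger_event in received_events

# Hypothetical example: paying users who reached level 4
is_payer = lambda u: u.get("payments", 0) > 0
user = {"payments": 2}

included = eligible(user, [is_payer], "level_4_reached", {"level_4_reached"})
excluded = eligible(user, [is_payer], "level_4_reached", {"level_1_reached"})
```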
Audience fraction – use this option to define the percentage or the absolute number of users, out of those selected by the filter and/or those who completed the trigger event, who will participate in the test. If the initial audience size is not enough for drawing firm conclusions, you can change it even after the test begins.
Please note that one user can participate in only one test at a time.
If you need to run several tests in parallel, then:
You need to create non-overlapping test audiences
If your audience overlaps, you need to configure the audience fraction so that part of the audience will be included into each test (e.g. 50% of the overlapping audience gets included into each of the two tests).
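A 50/50 split of an overlapping audience can be done with any stable rule that routes each user to exactly one test. A minimal sketch, assuming numeric user IDs (not devtodev's actual mechanism):

```python
def split_between_tests(user_id, tests):
    # Give each parallel test an equal share of the overlapping audience
    # by alternating on the user ID; the same user always lands in the
    # same test, so the audiences never overlap
    return tests[user_id % len(tests)]

first = split_between_tests(2, ["Test 1", "Test 2"])
second = split_between_tests(3, ["Test 1", "Test 2"])
```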
If you know the number of users that you need to achieve a statistically significant result, insert it in the ‘Max number of observation’ cell.
In this example, 100% of users who completed more than three levels at the time of sign-up will be included in the test.
A user can be excluded from the test only in two cases:
The test time is up.
The test is stopped.
In this section, you can define the goal of your A/B test – metrics for analysis, criteria for stopping the experiment for a group, and the duration of the experiment.
You can set up one ‘Primary metric’ and no more than five ‘Secondary metrics’ for each test. The ‘Primary metric’ is used to assess the test result and to calculate statistical significance of the obtained result. ‘Secondary metrics’ are optional and do not take part in the final result assessment. However, they can improve the quality of the analysis and prove that the implemented changes did not influence other key metrics of the app.
For example, you can select one of the following as a secondary metric:
One of the fundamental metrics (ARPU, Paying conversion, Day-N retention, etc.)
User engagement with an event.
Average number of times an event was completed (per user).
Below, you can set the ‘Estimated experiment duration’ (days). The test will be stopped after the set number of days. You can also automatically stop the test execution in case the winning group is defined – simply check the ‘Stop the experiment when there is a winning group’ box.
If continuing the test no longer makes sense, or if you want to change the group settings and restart it, you are free to change the test duration or stop the test at any time during its course.
To calculate the size of the test audience, use any of the Size Calculators. To use them, first estimate the current value of the Primary metric and then define the result that you expect to get from the tested changes. In addition, you can set up several user experience funnels and display their results in the test report. This will give you additional information about how successful the test has been.
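For a conversion-style Primary metric, the required audience size can be estimated with the standard two-proportion sample-size formula. This is a generic statistical sketch, not devtodev's exact calculator:

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(p1, p2, alpha=0.05, power=0.8):
    # Users needed per group to detect a change from conversion p1 to p2
    # with significance level alpha and the given statistical power
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
    z_b = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_a + z_b) ** 2 * variance / (p2 - p1) ** 2)

# e.g. detecting a lift in paying conversion from 5% to 6%
n = sample_size_per_group(0.05, 0.06)
```

Note how the required audience shrinks rapidly as the expected effect grows: small expected lifts demand much larger audiences.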
The main goal of the test above is to improve conversion to payment. However, you can use the same method to test conversion to trial or ARPU.
One of the most crucial steps is setting up test groups and variables. You create a config containing various groups and their parameters. After the devtodev SDK reports that the user has been successfully included in the experiment, one of the groups becomes available in the app via the SDK.
By default, two groups are available: Group A and a control group. You can increase the number of groups to a maximum of four per test by clicking the ‘+Add group’ button.
The control group usually sees the app as it currently is. This way, you can test the other variants and compare their metrics against the current baseline. For each group, you need to define a set of variables, each consisting of a name and a value. The variables have to be handled inside your app – they are what gives your users different experiences.
In the above example, you can see three groups: the control group (it has default parameters) and two more groups that have other parameters for the button_color and button_size variables. The test will be focused on defining the most favorable size and color of the button. If one of the groups wins, it may lead to the change of interface for all users of the app.
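The group configuration from this example could be recreated as plain data. The concrete values below are invented for illustration; only the variable names button_color and button_size come from the example:

```python
# Hypothetical recreation of the example config: a control group with
# default parameters and two variants changing one variable each
ab_test_config = {
    "control": {"button_color": "blue", "button_size": 14},  # current defaults
    "group_a": {"button_color": "red",  "button_size": 14},  # tests a new color
    "group_b": {"button_color": "blue", "button_size": 18},  # tests a new size
}

def variables_for(group):
    # The app only ever sees one group's variables at a time
    return ab_test_config[group]
```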
When the app launches, the SDK determines which test the user will participate in. The user is then randomly assigned to a test group, and the SDK applies all the variables defined for that group.
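Random-but-stable group assignment is commonly implemented by hashing the user ID, so the same user always lands in the same group across sessions. A sketch of this general technique (not devtodev's actual algorithm):

```python
import hashlib

def assign_group(user_id, groups):
    # Hash the user ID to pick a group deterministically: the assignment
    # looks random across users but is stable for any single user
    digest = hashlib.sha256(str(user_id).encode()).hexdigest()
    return groups[int(digest, 16) % len(groups)]

group = assign_group("user-42", ["control", "group_a", "group_b"])
```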
To make sure that all the test groups are set up correctly and that the app handles the selected variables properly, we highly recommend checking the current test settings. In this section, you can check how the A/B test configuration runs on test devices, manually assign the relevant groups to them, and see how the design and app behavior change from group to group.
Click ‘+Add test device’ to add a test device using an Advertising ID / User ID / devtodev ID or select it from the list of test devices in the current devtodev Space. After that, select a user group that you want to test on and click ‘Start checking’.
The settings of the selected group will be applied to the selected test device. From this moment on, the test device runs the test for the selected group without waiting to meet the entry conditions specified above.
Test devices do not save the information about the active test or the group. After you successfully finish testing one group, select the next one for the same test device and click ‘Restart checking’.
To be able to access the active test and its group on a test device after restarting the app, call the DTDRemoteConfig.cacheTestExperiment() method before initializing the SDK.
After you check all group settings on the test devices, you can launch the test for the entire selected audience or save it as a draft.
A maximum of 8 A/B tests can be run simultaneously in one project.
The test can have several outcomes. Let's look at them in more detail.
Force stop and test deletion
It may happen that you’ve launched an incorrectly configured test and need to stop it. To do this, select the required test by clicking on it in the list of experiments.
To stop the test (full stop, no chance to resume) – open the A/B test report and click on the edit icon in the upper right corner. The test editing wizard will open. Click on the Stop Test button at the bottom of the wizard page.
To remove the A/B test from the list – open the test editing wizard. At the bottom of the wizard page, click the Delete Test button. Please note that you can delete only the tests that were stopped or completed. The created A/B tests stay in the project until the user deletes them.
The SDK updates the A/B test config only during initialization. If the deleted experiment was previously activated on any device, it will still be available for activation when the SDK is initialized. After the config gets updated, the SDK removes this test from its database but does not report it via external interfaces.
This is intended to avoid changing the app settings that were received from the experiment config at the beginning of the session (e.g., the UI that the user is currently interacting with). The next time the app starts, the test will be deleted during SDK initialization.
Do not update the app interface and behavior while the user is interacting with it. Do not use the network to receive default parameters – it is better to define them in the app.
Test completion using the specified criteria
If you check the ‘Stop the experiment when there is a winning group’ box at the third step of the A/B test creation process, the test will automatically stop if the ‘Probability to be best’ of one of the groups exceeds 95%. This metric is calculated using a Bayesian approach; its value is computed automatically for each group based on the selected Primary metric. If you want to stop the test only at a higher threshold (e.g. 99%), you can extend the test duration and continue its execution until you reach the desired outcome.
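For a conversion metric, a ‘Probability to be best’-style value can be estimated with Monte Carlo sampling from Beta posteriors. This is a generic Bayesian sketch of the idea, not devtodev's exact computation:

```python
import random

def probability_to_be_best(successes, totals, samples=20000, seed=42):
    # For each group, sample a plausible conversion rate from its
    # Beta(1 + successes, 1 + failures) posterior, and count how often
    # each group produces the highest rate across many simulations
    rng = random.Random(seed)
    wins = [0] * len(successes)
    for _ in range(samples):
        draws = [rng.betavariate(1 + s, 1 + (n - s))
                 for s, n in zip(successes, totals)]
        wins[draws.index(max(draws))] += 1
    return [w / samples for w in wins]

# Hypothetical data: control converts 50/1000 users, the variant 70/1000
probs = probability_to_be_best([50, 70], [1000, 1000])
```

The returned probabilities sum to 1 across groups; a group crossing the 95% threshold would trigger the automatic stop described above.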
The test can also finish when it reaches the end of the period specified at the ‘Goals’ step in the ‘Estimated experiment duration → duration days’ section. For example, if you set ‘duration days’ to 3, the audience has only three days from the test creation to be included in the test and participate in it. When the app is launched and the SDK is initialized, the SDK compares the current time with the end time of the test. If the experiment time is up, the SDK erases the test from its database and users no longer receive any test data. If the test time runs out while the app is in use, the SDK does not react immediately.
This is intended to avoid changing the app settings that were received from the experiment config at the beginning of the session (e.g., the UI that the user is currently interacting with). The next time the app starts, the test will be deleted as described above.
During the test execution, a report on the test results will be built and updated in real time. In the upper block, you can find all the basic information about the current state:
Current test status
Name of the group with the highest ‘Probability to be best’ and its value
Number of groups, total number of users in the test, and number of users in each group
Experiment time frame
Below you can see a graph. Its horizontal axis represents calendar days starting from the test start date, while the vertical axis represents the value of the selected metric for each of the groups. You can select the displayed metric in the top left corner of the graph. These are:
The Primary and Secondary metrics selected in the wizard
Probability to be best
A/B test audience
Underneath, you can find a table with aggregated values for each of the metrics in each test group and their fluctuations relative to the control group. The ‘Probability to be best’ is also calculated for each metric, including the Secondary ones. This way you can make sure that the tested changes do not influence other metrics in a negative way.
After that, you can see the funnels configured in the A/B test creation wizard. They contain data on the number of users at each funnel stage, the conversion rate from one stage to another, and the ‘Probability to be best’ for conversion from the first to the last stage.
When a user gets included into a test group, they are automatically marked with the ID of this group.
If you want to drill down even more and understand a subtle difference in metrics and behavior of the users, you can use these user groups as filters in any devtodev reports. Simply go to the report filters, open the ‘Segments’ tab, and select the required segment.
A/B test segment values are saved as a separate user property (‘A/B test groups’) in the user card at the moment the user gets included into an A/B test group.
A user cannot enroll in two A/B tests at the same time. They can be included in one test and, after it completes, be included in another. In this case, the ‘A/B test groups’ field will contain the names of both groups.