EB Anrs Android Framework
embrace.io
Executive summary
We recently released an eBook about why the complex mobile ecosystem – and
the Android mobile ecosystem in particular – makes it so difficult to identify,
prioritize, and solve Application Not Responding (ANR) errors. We explored their
causes and their impact on your app and business. In this eBook, we’ll dive deeper
into the technical causes of ANRs so your team can rapidly identify and solve
them.
You likely know that an ANR is “officially” triggered when the main thread has
been blocked for at least 5 seconds. But, do you know what the source code
actually says about ANRs for specific Android components such as activities,
services, and broadcast receivers? (Hint: It’s not always 5 seconds!) Do you
know exactly how the Google Play Console collects and reports ANR data
compared to Firebase Crashlytics, and why it matters during troubleshooting?
How the Android framework measures ANRs
Quick recap: Why ANRs matter
The Google Play Store is extremely particular about user experience. It sets a strict threshold requiring that a
maximum of only 0.47% of daily active users can experience an ANR. (They’ve also recently added a second bad
behavior threshold, where only 8% of daily active users on a single device model can experience an ANR.) If you
exceed either of these thresholds, you can expect:
• Negative reviews, which impact your business by making user acquisition increasingly difficult.
• User churn, resulting from frustration associated with a frozen app experience.
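To make the two thresholds concrete, here is a minimal Java sketch that compares an ANR rate against the 0.47% and 8% limits quoted above. The class and method names are hypothetical, and the rate formula (affected users divided by daily active users) is a simplifying assumption; this is not a Google Play API and not necessarily how Google computes the metric.

```java
// Hypothetical helper: names are illustrative and this is not a Google Play API.
final class AnrThresholdCheck {
    // Limits quoted in the text above, in percent of daily active users.
    static final double OVERALL_LIMIT_PERCENT = 0.47;
    static final double PER_DEVICE_LIMIT_PERCENT = 8.0;

    /** Percentage of daily active users who experienced at least one ANR (assumed formula). */
    static double anrRatePercent(long usersWithAnr, long dailyActiveUsers) {
        if (dailyActiveUsers <= 0) {
            throw new IllegalArgumentException("dailyActiveUsers must be positive");
        }
        return 100.0 * usersWithAnr / dailyActiveUsers;
    }

    /** True when the rate across all users is above the overall limit. */
    static boolean exceedsOverallLimit(long usersWithAnr, long dailyActiveUsers) {
        return anrRatePercent(usersWithAnr, dailyActiveUsers) > OVERALL_LIMIT_PERCENT;
    }

    /** True when the rate on a single device model is above the per-device limit. */
    static boolean exceedsPerDeviceLimit(long usersWithAnrOnModel, long dailyActiveUsersOnModel) {
        return anrRatePercent(usersWithAnrOnModel, dailyActiveUsersOnModel) > PER_DEVICE_LIMIT_PERCENT;
    }

    public static void main(String[] args) {
        // 600 of 100,000 daily active users saw an ANR: 0.6%, above the 0.47% limit.
        System.out.println(exceedsOverallLimit(600, 100_000));   // true
        // 50 of 1,000 users on one device model saw an ANR: 5%, below the 8% limit.
        System.out.println(exceedsPerDeviceLimit(50, 1_000));    // false
    }
}
```

Note how small the margin is: in this example, fewer than 500 affected users out of 100,000 is the difference between staying under the overall limit and exceeding it.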
The consequences of exceeding Google’s ANR threshold can cascade throughout your whole organization.
When you have negative reviews, don’t show up in search results, or provide a poor user experience, your user
engagement and revenue are impacted. You’ll have fewer purchases (both in-app and at checkout, in the case of
e-commerce apps). Your users will engage less as they experience frustration. And you’ll have fewer installs, which
can taint the rest of your brand: a compounding issue that can slow momentum for other apps in your company’s
portfolio.
Here’s a quick snapshot of how high ANR rates can impact your business:
• Lower Google Play Store ranking: lower visibility and fewer organic installs.
• Negative app store reviews: poor brand perception and impact on your other apps.
According to the Android documentation, Google will report an ANR with an associated stack trace if the main
thread is blocked for 5 seconds*. We know it can be helpful to see examples from the real world, so here’s an
example of what an ANR and crash look like in an e-commerce app.
E-commerce app crash flow
A user goes to the checkout screen and the app crashes.
THE CAUSE: A bug in the checkout code causes the crash due to an uncaught exception or a C signal.
THE RESULT: Google provides a stack trace, which makes it easy to identify the cause of the crash and
address the issue.
E-commerce app ANR flow
THE CAUSE: The ANR could result from many possible root causes.
THE RESULT: At the end of 5 seconds of non-responsiveness, Google will provide you with a stack trace
to indicate an ANR. Unfortunately, that stack trace alone is rarely sufficient to get at the root cause of the
ANR.
* While Android documentation states an ANR is triggered after 5 seconds, Embrace engineers have found exceptions to this “rule” within Google’s own
source code. We’ll explain those exceptions and bust the myth around the 5-second ANR trigger in the following section.
How Google reports ANRs
The Android operating system monitors your main thread from a background thread. Once it detects that the main
thread is blocked, it starts an ANR timer on the background thread. According to the Android documentation, if that
timer exceeds a particular time threshold – 5 seconds – the Android OS triggers an ANR.
Because the Android OS treats ANRs the same way it treats crashes, it generates a single stack trace at the time
the ANR is triggered. Unfortunately, this means you don’t have insight into what has been happening from the
moment the ANR begins. We’ll dive further into why this is an issue later in this eBook.
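The monitoring loop described above can be sketched in plain Java. This is a simulation under stated assumptions, not Android's implementation: a single-thread executor stands in for the main thread, the class SingleTraceWatchdog and its methods are hypothetical names, and the timeout is a parameter so the example runs quickly. It shows why a single stack trace captured at the moment the timer expires says nothing about what ran earlier in the blocked interval.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical simulation: a single-thread executor plays the role of the main thread.
final class SingleTraceWatchdog {
    private volatile Thread watched;
    private final ExecutorService mainThread = Executors.newSingleThreadExecutor(task -> {
        Thread t = new Thread(task, "simulated-main");
        t.setDaemon(true);   // let the JVM exit even if the thread is still blocked
        watched = t;         // remember the thread so its stack can be captured later
        return t;
    });

    /** Queues work on the simulated main thread. */
    void runOnMain(Callable<Void> work) {
        mainThread.submit(work);
    }

    /**
     * Posts a no-op ping behind any queued work and waits for it to run.
     * Returns null if the ping ran within timeoutMs; otherwise returns ONE stack
     * trace of the simulated main thread, taken at the moment the timeout expired.
     */
    StackTraceElement[] check(long timeoutMs) {
        Future<?> ping = mainThread.submit(() -> { });
        try {
            ping.get(timeoutMs, TimeUnit.MILLISECONDS);
            return null;
        } catch (TimeoutException blocked) {
            ping.cancel(false);
            return watched.getStackTrace();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("ping failed", e);
        }
    }

    public static void main(String[] args) {
        SingleTraceWatchdog watchdog = new SingleTraceWatchdog();
        CountDownLatch release = new CountDownLatch(1);
        // Simulate blocking work on the main thread.
        watchdog.runOnMain(() -> { release.await(); return null; });
        StackTraceElement[] trace = watchdog.check(200);
        System.out.println(trace == null ? "responsive" : "blocked; one trace captured");
        release.countDown();   // unblock the simulated main thread
        System.out.println(watchdog.check(2_000) == null ? "responsive" : "blocked; one trace captured");
    }
}
```

The trace returned by check reflects only whatever the thread happens to be doing when the timeout fires, which is the limitation the rest of this eBook returns to.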
Here are some more insights into the differences between the two main tools Google uses to provide ANR data:
Firebase Crashlytics and the Google Play Console.
Firebase Crashlytics
• Type of data collected: Records the “exit reason” for a process on the device each time an ANR occurs
• Android versions supported: Android 11+
Google Play Console
• Type of data collected: Utilizes Android OS’s built-in trace file mechanism and records on the device’s file system
• Android versions supported: All Android versions
The truth about ANRs and the limitations of Google Play Console and Crashlytics
What really triggers ANRs in Android and why it matters for you
If we gave you a pop quiz and asked, “what triggers ANRs?”, you’d technically be correct if you answered, “when
the main thread has been blocked for 5 seconds.” After all, it’s what the Android documentation says.
But at Embrace, we believe in a healthy skepticism. We are driven by a deep curiosity and a desire to help our
customers optimize their user experience. So, our engineers went beyond the documentation and closely studied the
Android source code. They found that what triggers ANRs in the Android environment is far more nuanced than what
the documentation says.
INSTEAD OF A UNIVERSAL 5-SECOND ANR TRIGGER, THERE ARE SEPARATE CHECKS AND SEPARATE TIMING THRESHOLDS FOR EACH
ANDROID COMPONENT.
In fact, there are specific components and situations that are quite different from the “universal” 5-second rule.
That’s why we created this eBook – so you’ll have an easy reference manual and can understand the important
nuances.
As a quick refresher, you know there are four main components in Android: activities, services, broadcast receivers,
and content providers. Every time one of these components performs work on the main thread, the Android
framework creates a timer on a monitor thread. Instead of a universal 5-second ANR trigger, there are separate
checks and separate timing thresholds for each component. Knowing the real timing will help you optimize how you
code your apps, and how best to understand what might really be causing an ANR.
To help, let’s dive into a few of these components to see how the Android source code helped us bust the myth of
the 5-second ANR trigger.
When do activities trigger an ANR? Activities trigger an ANR if input dispatching takes more than 5 seconds. The
Android OS interprets actions like touches, entries, or taps on the screen or keyboard as an “input event.” The OS
places these input events onto an input dispatch queue and processes them on the main thread, where a watchdog
thread checks whether the processing takes more than 5 seconds. If the main thread is blocked or busy, the input
dispatch queue will not empty within 5 seconds. At this point, the watchdog thread will trigger an ANR.
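The key detail here is when the timer starts. The sketch below models it with explicit timestamps so the behavior is easy to follow. It is a simplified illustration, not the framework's code: the class InputDispatchTimer and its methods are hypothetical, it tracks only the oldest unhandled input event, and the 5-second value is the threshold stated in the text.

```java
// Hypothetical model of the activity check: the timer starts at input dispatch,
// not at the moment the main thread becomes blocked.
final class InputDispatchTimer {
    static final long INPUT_DISPATCH_TIMEOUT_MS = 5_000; // threshold stated in the text

    private long pendingSinceMs = -1; // -1 means no input event is waiting

    /** Called when an input event is dispatched to the app. */
    void onInputDispatched(long nowMs) {
        if (pendingSinceMs < 0) {
            pendingSinceMs = nowMs; // the timer runs from the oldest unhandled event
        }
    }

    /** Called when the main thread has finished processing the pending input. */
    void onInputHandled() {
        pendingSinceMs = -1;
    }

    /** True when an input event has been waiting for more than the timeout. */
    boolean isAnr(long nowMs) {
        return pendingSinceMs >= 0 && nowMs - pendingSinceMs > INPUT_DISPATCH_TIMEOUT_MS;
    }

    public static void main(String[] args) {
        InputDispatchTimer timer = new InputDispatchTimer();
        // The main thread has been blocked since t=0, but no input has arrived.
        System.out.println(timer.isAnr(8_000));   // false: no timer is running
        timer.onInputDispatched(8_000);           // the user taps while the thread is blocked
        System.out.println(timer.isAnr(12_000));  // false: only 4 s since the tap
        System.out.println(timer.isAnr(13_001));  // true: more than 5 s since the tap
    }
}
```

In this model the main thread is frozen for 13 seconds in total, yet the ANR is attributed to the 5 seconds that follow the tap.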
MYTH: There is a 5-second threshold for an ANR to be triggered for activities, counted from the moment the main
thread is blocked.
REALITY: There is a 5-second threshold for ANRs to be triggered, but the timer only starts after an input event has
been dispatched!
Busting the myths about services & ANRs
Recap: A service is an Android component that does not provide a user interface and typically performs
long-running operations in the background. Examples include a service that plays music while the user is in a
different app, tracks location, performs a consistency check on the database at regular intervals, or fetches data
over the network without blocking user interaction with an activity.
When do services trigger an ANR? There are two conditions under which a service can trigger an ANR. If a
foreground service does not call startForeground in under 10 seconds, the Android OS will trigger an ANR. As a
reminder, a foreground service is one that typically performs work in the foreground, like music playback. If
the service is in the background and doesn’t start or bind in under 20 seconds, it will trigger an ANR. This
threshold is longer because a background service doesn’t have as high a priority on the CPU.
We believe the thresholds are higher than 5 seconds for services because users aren’t interacting with services
the same way they are with activities. Unlike the activities component, there’s no user input event required for
the services component.
What happens when the Android framework detects an ANR in any component?
The Android framework schedules work in the component, and then starts a timer to continually check whether the
work is completed within the allotted time period.
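The schedule-then-time pattern for services can be written down as a small lookup. The sketch below is an illustration only: the enum ServiceTimeout and its constant names are hypothetical, and the limits are the 10-second and 20-second values stated in this section, not values copied from the Android source.

```java
// Hypothetical summary of the service thresholds described in this section.
enum ServiceTimeout {
    FOREGROUND_START(10_000),          // startForeground must be called in under 10 s
    BACKGROUND_START_OR_BIND(20_000);  // start or bind must complete in under 20 s

    final long limitMs;

    ServiceTimeout(long limitMs) {
        this.limitMs = limitMs;
    }

    /** True when work scheduled at scheduledAtMs has not completed in under limitMs. */
    boolean isExceeded(long scheduledAtMs, long nowMs) {
        return nowMs - scheduledAtMs >= limitMs;
    }

    public static void main(String[] args) {
        // A foreground service that is 9.999 s into startup is still within its limit.
        System.out.println(FOREGROUND_START.isExceeded(0, 9_999));          // false
        // A background service that has taken 20 s to start has hit its limit.
        System.out.println(BACKGROUND_START_OR_BIND.isExceeded(0, 20_000)); // true
    }
}
```

Unlike the activity check, nothing here waits for user input: the timer starts as soon as the work is scheduled.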
When do broadcast receivers trigger an ANR? Broadcast receivers
trigger an ANR if they take longer than 10 seconds to process a
message. However, there is an interesting exception to this: when
the Android OS is booting, a lot of CPU work is underway, so ANRs
reported for early broadcasts could be false positives.
This can make ANR error reports highly misleading when viewed
in isolation. Typical Android apps can have dozens of components
running at the same time. This means debugging the true cause
of ANRs in the real world is often even more difficult than in our
relatively simple example above. Stack traces provided by Google
aren’t helpful because they don’t let you see what is happening
from the moment the ANR begins and throughout the duration of
the ANR.
[Figure: main thread timeline, from the user input event to the point where the ANR trace file is generated]
As a visual example, picture the components running on the main thread as a series of cars driving on a road.
Google’s ANR stack trace would highlight the last blue car as the cause of the ANR. In reality, the freeze started
during the first red car. Collecting stack traces as soon as the app freezes is crucial for getting to the actual root
cause.
How Embrace makes solving ANRs faster and easier
An alternative solution for ANRs
Embrace is a data-driven toolset to help mobile engineers build better experiences. Because of this mobile-centric
approach, we have a very different method of data collection.
Unlike event-based monitoring solutions which limit data collection and can only help you solve known issues,
Embrace collects 100% of the data from every user session to provide capabilities that were previously impossible.
With full visibility across every user experience (including both foreground and background sessions), Embrace
enables you to see complete technical and behavioral data, giving you the context to solve both known and
unknown issues.
To accurately triage and resolve ANRs for good, it’s important to understand what code was running from the
moment the ANR is triggered to the end of the ANR interval. With Embrace intelligent ANR Reporting, teams can
auto-capture and surface a stack trace as soon as the main thread is blocked for 1 second, followed by auto-
collecting main thread stack traces every 100ms until the app recovers, force quits, or the ANR dialog appears. By
capturing these additional stack traces engineers can gain deep insights into the code execution and its evolution
throughout the ANR interval to get to the true root cause, all without introducing any unnecessary overhead. This
level of detail empowers engineers to quickly spot and resolve both fatal and non-fatal ANRs and ultimately drive
better user experiences.
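To see how much more data this sampling cadence yields than a single trace, the following sketch computes the offsets at which stack traces would be taken. It is an arithmetic illustration of the 1-second and 100ms figures given above, not Embrace SDK code; the class AnrSampleSchedule and its method are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the sampling cadence described above.
final class AnrSampleSchedule {
    static final long FIRST_SAMPLE_MS = 1_000; // first trace after 1 s of blocking
    static final long INTERVAL_MS = 100;       // then one trace every 100 ms

    /** Offsets (ms since the main thread became blocked) at which a trace would be sampled. */
    static List<Long> sampleOffsets(long blockedForMs) {
        List<Long> offsets = new ArrayList<>();
        for (long t = FIRST_SAMPLE_MS; t <= blockedForMs; t += INTERVAL_MS) {
            offsets.add(t);
        }
        return offsets;
    }

    public static void main(String[] args) {
        System.out.println(sampleOffsets(900));           // [] : blocked for less than 1 s
        System.out.println(sampleOffsets(1_250));         // [1000, 1100, 1200]
        System.out.println(sampleOffsets(5_000).size());  // 41 samples over a 5 s freeze
    }
}
```

Under these assumptions, a freeze that lasts 5 seconds produces 41 samples spread across the interval, compared with the single trace generated when the ANR is triggered.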
02 Embrace filters out noisy ANRs with intelligent grouping
Finding which stack traces to focus on can also be challenging when you’re basing your decision solely on the
final stack trace. Oftentimes, the stack traces exhibit enough differences that identifying the root cause becomes
challenging. Because Embrace auto-collects stack traces throughout the entire ANR interval, teams gain access
to a broader range of stack trace samples, providing deeper insights into the underlying cause. Teams can filter
sample stack traces by “Most Representative”, “First Sample” or “Ad SDK” to help visualize the data in different
ways and quickly surface the right ANRs. When you select “Most Representative”, Embrace will scan and analyze
sample stack traces for you and group them by the most relevant method to identify the code sections likely
contributing to the ANR. Embrace then ranks them by volume, sessions impacted, or users impacted and maps
issues to a category (Ads or Concurrency) to help you cut out the noise and focus on what matters most for you and
your team.
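As a rough intuition for grouping by a representative method, the sketch below picks the frame that appears in the largest number of sampled traces. This is a stand-in heuristic chosen for illustration and is not Embrace's grouping algorithm, which the text does not specify; the class name, method name, and the frame names in the example are all hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

// Hypothetical heuristic: NOT Embrace's algorithm, only an illustration of the idea.
final class RepresentativeFrame {
    /**
     * Returns the frame present in the most samples, or null for empty input.
     * Each sample lists frames from the top of the stack downward; ties go to the
     * frame seen first, which favors frames nearer the top of the stack.
     */
    static String mostFrequentFrame(List<List<String>> samples) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (List<String> sample : samples) {
            // Count each frame at most once per sample.
            for (String frame : new LinkedHashSet<>(sample)) {
                counts.merge(frame, 1, Integer::sum);
            }
        }
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > bestCount) {
                best = entry.getKey();
                bestCount = entry.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<List<String>> samples = Arrays.asList(
            Arrays.asList("Db.query", "Cart.load", "Looper.loop"),
            Arrays.asList("Json.parse", "Cart.load", "Looper.loop"),
            Arrays.asList("Db.query", "Cart.load", "Looper.loop"));
        // Cart.load and Looper.loop both appear in all three samples;
        // Cart.load was seen first, so it is chosen.
        System.out.println(mostFrequentFrame(samples)); // Cart.load
    }
}
```

The example shows why a range of samples helps: the final trace alone might end in Db.query or Json.parse, while the shared caller only becomes visible across several traces.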
Selecting a method will take you to the ANR Method Troubleshooting graph. This allows you to see all the code
paths that lead to the selected problematic method (“callers”) and the code paths following the method (“callees”)
to help you nail down the line of code that may be contributing to the ANR.
04 Embrace highlights what contributed to the ANR with complete out-of-the-box user session context
Capturing multiple stack traces may not be enough to pinpoint every ANR root cause. ANRs can also stem
from unpredictable factors in the device and from the user that you may need to take into account when
troubleshooting. Factors like failed network calls, low connectivity, heavy view rendering and more can lead to
an ANR. Embrace gives you the ability to easily pivot from a high priority sample stack trace in the flame graph or
summary view directly into sample affected user sessions (‘Sample Sessions’) to help you understand the ANR in
more depth.
With User Session Insights, Embrace collects all behavioral and technical user activity leading up to the ANR,
completely out-of-the-box, so engineers can quickly and accurately reproduce the ANR and understand how other
factors may be contributing to it. All this without wasting time cobbling together log, ANR, and product analytics
data. If you find that low connectivity, heavy view rendering, and bad code all contributed to the ANR, you can
quickly share session details with the right teams using out-of-the-box Jira and Slack integrations.
05 Embrace identifies patterns and trends in real-time with high-level ANR overview and proactive alerting
Embrace continuously analyzes millions of data points across your applications to help you proactively spot critical
ANRs before your users do. Our real-time alerting can help you separate important ANRs from the noise with
context-rich alerts that identify spikes and drops for critical ANR indicators. You can set up alerts for important
user flows, like payment flows, add to cart, or during paid advertisements to optimize user experiences for
revenue-generating moments.
With ANR Summary, anyone can get an out-of-the-box overview of critical ANR metrics like ANR-Free Sessions and
ANR-Free Users, Total ANRs, Affected Users and more in a single view. Teams can surface patterns and anomalies
by version and deployment across your user base.
Closing thoughts
ANRs aren’t just annoying for users to encounter and for developers to debug.
They’re critical errors that have an outsized effect on user engagement,
acquisition, and retention, and can ultimately drag down your bottom line.
In this eBook, we’ve taken you beyond the Android documentation, busted the
myth of the universal 5-second ANR trigger, and provided key insights to help
you stay above Google’s stringent bad behavior thresholds.
While we hope this eBook serves as a valuable reference guide for future ANR
debugging, Embrace can provide even more support through the use of our
platform.
From providing flame graphs that help you quickly get at the root cause of an
ANR, to the ability to stitch multiple sessions together for deeper insight and
analysis, Embrace is the best option for Android developers who care about
providing superior mobile experiences.
Learn how Embrace can help you get a handle on ANRs and optimize your app
for greater visibility today.