
Startups

Augmented Reality in Reality

Haibin Ling
HiScene

Editor: Tao Mei, Microsoft Research Asia

1070-986X/17/$33.00 ©2017 IEEE. Published by the IEEE Computer Society.

The study of augmented reality (AR) can be traced back to Ivan Sutherland’s invention of the head-mounted display (HMD) in the 1960s,[1,2] though it wasn’t until 1990 that Tom Caudell, a former Boeing researcher, coined the term.[3] For many years, AR was confined to academia and sci-fi films. Early AR systems were mostly experimental and focused on specific tasks, such as maintenance and repair.[4] More recently, AR has started to step out of the laboratory and appear in a wide range of applications, including advanced driver-assistance systems, advertising, education, entertainment, defense, manufacturing, medicine, smart cities, social networking, and tourism.

In particular, AR first caught the attention of a more general audience with the 2013 launch of Google Glass. Despite reluctance from mainstream consumers, Google Glass ignited a new enthusiasm for AR among certain industries and techies. Since then, tech giants have been actively developing and investing in AR technologies (see the “Investing in AR Hardware” sidebar), and the progress is by no means limited to hardware innovation.

AR on smartphones has been soaring to new heights thanks to breakthroughs in AR algorithms. In July 2016, Niantic and Nintendo released Pokémon Go (Figure 1a), triggering millions of downloads in one week.[5] One month later, social media giant Tencent organized a virtual Olympic torch relay on smartphones (Figure 1b),[6] encouraging 100 million users to try AR techniques provided by HiScene. The trend became clearer in 2017, with Snapchat’s release of World Lenses[7] (Figure 1c) and the popularity of Meitu’s facial-up app (Figure 1d). These apps each have hundreds of millions of active users.

This rise in AR use stems from four main developments:

■■ the pervasiveness of low-cost visual sensors, such as phone cameras, which created the foundation for AR consumerism;
■■ progress in environmental perception algorithms, such as visual simultaneous localization and mapping (SLAM), which has provided key guidance for fusing virtual information and reality;
■■ advances in optics, which largely pushed the development of consumer-level AR displays; and
■■ the maturity of multimedia techniques, which enriched the content and styles of AR applications.

In considering these developments, here I review contributions from academia driving AR’s newfound commercial potential, and I consider the trends being exploited by industry to advance software and hardware development and emerging applications. I then discuss how startups hoping to leverage these advances can compete against established tech giants in today’s ultracompetitive environment.

Sidebar: Investing in AR Hardware

The following tech giants are actively developing or investing in AR technologies:

■■ Microsoft has released its Hololens head-mounted AR device (www.microsoft.com/en-us/hololens),
■■ Google has launched Tango (https://get.google.com/tango),
■■ Facebook recently announced an ambitious AR plan,[1] and
■■ Apple just showed off its new ARKit as the “largest AR platform in the world.”[2]

Meanwhile, startup companies form another force for AR development. Many of them, such as Atheer (atheerair.com), Daqri (daqri.com), Magic Leap (magicleap.com), Meta (meta.com), Osterhout Design Group (osterhoutgroup.com), and my own company, HiScene (hiscene.com), have presented or announced their own AR headsets. Numerous AR headsets have been presented to the public, three of which are shown in Figure A: a pioneering headset (Google Glass), a product from a tech giant (Microsoft’s Hololens), and a counterpart from a startup (HiScene’s HiAR Glasses).

FIGURE A. Examples of head-mounted AR devices: (1) the pioneering Google Glass, (2) Microsoft’s Hololens, which received the Red Dot Award 2016, and (3) HiScene’s HiAR Glasses, which received the Red Dot Award 2017.

Sidebar References
1. G. Sloane, “Watch Out, Snap: Mark Zuckerberg Outlines Facebook’s AR Ambitions,” Advertising Age, 18 Apr. 2017; http://adage.com/article/digital/mark-zuckerberg-confident-facebook-lead-snapchat-ar/308724.
2. L. Matney, “Apple Enters the Augmented Reality Fray with ARKit for iOS,” TechCrunch, 5 June 2017; https://techcrunch.com/2017/06/05/apple-enters-the-augmented-reality-fray-with-arkit-for-ios/.

Breakthroughs of Enabling Algorithms
Although industry efforts are driving AR’s recent popularity, it’s important to note the pioneering contributions of academia. It’s beyond this article’s scope to elaborate on all related breakthrough algorithms; instead, I sample a few important ones from the past decade that have significantly influenced today’s AR. A more comprehensive survey of milestones in AR appears elsewhere.[8]

2D Recognition and Tracking
In early AR systems, recognition and tracking were mostly based on 2D binary markers (essentially a 2D extension of barcodes), which are inconvenient (because markers must be attached to a real scene) or inappropriate (because they appear unnatural) for many scenarios. This issue was largely alleviated when 2D pictures (movie posters, for example) started to replace binary markers, as demonstrated in Daniel Wagner and his colleagues’ seminal work.[9]

Visual SLAM
For years, breaking the 2D restriction was a key challenge in AR because of limited computing resources. Then, in 2007, Georg Klein and David Murray developed a SLAM system running in real time on a smartphone.[10] Since then, their Parallel Tracking and Mapping (PTAM)[10] framework has influenced modern SLAM algorithms on smartphones, including the recently popularized ORB-SLAM.[11] Another line of work toward effective SLAM or 3D reconstruction fuses multimodal sources,[12,13] an approach that is common in newly developed AR devices.

Efficient Computer Vision Algorithms
General computer vision algorithms were not extensively adopted in AR until the recent emergence of fast yet reasonably accurate solutions, including binary local features,[14] fast general tracking algorithms,[16] and efficient pose estimation.[15]
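
To illustrate why binary local features suit resource-limited AR devices, the following minimal Python sketch matches binary descriptors by Hamming distance, the kind of comparison used with ORB-style features. It is an illustration under stated assumptions rather than the method of any cited system: the descriptors are made-up 8-bit values (real ORB descriptors have 256 bits), and the names `hamming` and `match_descriptors` and the `max_distance` threshold are hypothetical.

```python
from typing import List, Optional, Tuple


def hamming(a: int, b: int) -> int:
    """Count the differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def match_descriptors(
    query: List[int], train: List[int], max_distance: int = 2
) -> List[Tuple[int, Optional[int]]]:
    """Brute-force nearest-neighbor matching under Hamming distance.

    Returns (query_index, train_index) pairs. train_index is None when the
    closest train descriptor is farther than max_distance (a hypothetical
    threshold chosen only for this toy example).
    """
    matches = []
    for qi, q in enumerate(query):
        best_index, best_distance = None, max_distance + 1
        for ti, t in enumerate(train):
            d = hamming(q, t)
            if d < best_distance:
                best_index, best_distance = ti, d
        matches.append((qi, best_index))
    return matches


# Toy 8-bit descriptors, made up for illustration.
poster = [0b10110010, 0b01101100, 0b11110000]  # features of a reference image
frame = [0b10110011, 0b00001111]               # features seen in a camera frame
print(match_descriptors(frame, poster))        # [(0, 0), (1, None)]
```

Each comparison reduces to an XOR followed by a bit count, which is far cheaper than the floating-point distances needed for descriptors such as SIFT or SURF. That low cost is one plausible reason such features became practical on phones.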

 July–September 2017 11

FIGURE 1. Four representative mobile AR products: (a) Pokemon Go, (b) QQ-AR (for a virtual Olympic torch relay), (c) Snapchat AR, and (d) Meitu AR (meitu.com).

Deep Learning for Semantic Understanding
Deep learning’s recent jumpstart in computer vision has mainly been driven by the AlexNet network for image classification, which was published in 2012.[17] Now, deep-learning solutions are beating the competition in most vision problems, such as object detection, semantic segmentation, face recognition, and visual tracking. Although deep-learning solutions still face computational limitations, their rapid advancement is putting them in a good position for future use in AR.

AR Trends
The algorithmic breakthroughs have paid off, thanks to aggressive industrial follow-up and huge market potential. Digi-Capital projects that VR/AR market revenue, driven largely by mobile AR, could hit US$108 billion by 2021.[18] AR’s fast growth will be both horizontal and vertical. Horizontally, the AR market will broaden its use to numerous application domains while exploiting the wide spectrum of emerging techniques. Vertically, in-depth exploration of AR techniques for specific industries (such as manufacturing) will greatly speed up advances in the field.

Here, I consider trends in AR development from two different perspectives: first from the perspective of techniques, and then from the perspective of applications.

Advancing Software and Hardware Techniques
In the near future, I expect to see advances in both software and hardware techniques. In particular, the following areas are receiving a significant amount of attention.

3D environment perception. Automatic 3D perception of the environment around an AR device (such as a smartphone or AR headset) has long been at the heart of AR. Many existing systems still rely on 2D planar patterns due to the lack of practical 3D solutions. This issue has been alleviated recently by multisensor fusion, by more effective algorithms, or by a combination of the two. Without a doubt, reliable 3D perception will greatly broaden the territory of AR and elevate relevant user experiences.

Semantic perception. The fast evolution of vision-based semantic understanding algorithms (such as those for face analysis and scene parsing) has a natural outlet in AR, providing content-aware guidance to augment a live visual stream. An early example can be found in Google’s Translate app. Semantic information has not been extensively used in AR, mainly due to the lack of practical vision-based semantic inference algorithms. With this situation improving rapidly, it’s easy to forecast the popularity of semantic information in future AR products.

12 IEEE MultiMedia www.computer.org/multimedia


Virtual-real fusion. Although numerous AR headsets have surfaced recently, they are still far from offering a perfect user experience. Issues related to the field of view (FOV), comfort level, and cost are expected to be mitigated in the near future. Furthermore, breakthroughs in AR headsets will facilitate AR’s role as the next-generation human-computer interface. In addition, new AR hardware systems, such as projectors and heads-up displays (HUDs), will continue to progress toward consumer markets. On the software side, breakthroughs such as eye-tracking algorithms could further refine rendering systems.

Smart human-computer interaction. With the advances in environment perception and virtual-real fusion, smart and natural human-computer interaction (HCI) needs to catch up as an important upgrade to AR. Hand-gesture recognition and speech recognition, and the fusion of the two, are likely to be the most important techniques along these lines.

Expanding Application Areas
It is safe to say that AR’s territory is expanding to include many other fields, and exciting AR applications are in the works.

From strong to weak virtual-real fusion. Traditionally, AR has focused heavily on high-quality, strong virtual-real fusion, which is built on top of an accurate geometric understanding of the input stream. For example, in Figure 2a, a fluent integration of the virtual mask with the real face could not be achieved without precise and fast localization of facial landmarks. In practice, such a requirement is usually expensive but not always necessary.

In many scenarios, rough localization of targets of interest is sufficient. For example, when augmenting a surveillance video, once a target’s face is located, it typically suffices to show related information around the target (see Figure 2b) or even at a fixed position (for example, in the corner of the image). The weak virtual-real fusion in such cases needs only the rough location of the targets, and is much more casual than that in traditional AR.

This kind of weak fusion not only largely reduces the burden of accurate environment perception, but also allows strong filtering techniques to alleviate the notorious jittering effect. An important driving force behind such weak fusion is the advancement of semantic perception techniques, which enables reliable and efficient acquisition of semantic information.

FIGURE 2. Examples of (a) strong virtual-real fusion and (b) weak virtual-real fusion.

From general to domain-specific AR. With AR’s fast penetration into many application domains, deep integration of AR with specific domains might prove more competitive than general AR. For example, AR in clinical intervention might be closely related to medical image analysis, while AR for tourism typically integrates GPS-based location analysis and geographical information systems.

From technique- to application-driven products. As an emerging technology, AR has remained mysterious to most potential users. It’s often an AR expert, instead of a domain expert, who initiates the development of an AR product. With the fast spread of AR to a general audience, and given the maturity of AR techniques, domain experts are expected to be more familiar and comfortable with AR than ever before, and thus they could eventually initiate or even lead the development of an AR product.

Opportunities for Startups
As described, AR has recently stimulated a tremendous amount of interest and effort from industry, ranging from established tech giants like Microsoft to young unicorns like Magic Leap. Company involvement varies, but here I focus on those developing AR techniques in particular, as opposed to those using AR.

If you work for an AR startup, you’ve likely been asked how you plan to survive while competing against all those tech giants. Every startup has its own strengths, expertise, resources, culture, and focuses, which work together to form a unique growth path. That said, some common guidelines still exist.

Work around Technique Limitations
Every technology has its inadequacies, and AR is no exception. As AR techniques continue to evolve, so do their limitations. For example, real-world AR used to be restricted to tracking 2D binary markers, but recent AR products typically work well with natural image patterns such as posters. While a big company can afford to work on far-reaching, long-term problems, most startups can’t. That said, startups can still stay up to date on the latest AR techniques and can work around their limitations.

A startup should know the limitations of AR techniques, be able to predict how these limitations will change in the near future, and be willing to push the limits. Taking 3D object tracking as an example, a startup should know that this technology is still in its infancy, despite the many research papers on the topic, to avoid making over-aggressive promises to potential clients. In addition, the startup should have a rough projection of when, and to what degree, the technique will mature, to help plan ahead. Finally, startups that require 3D object tracking should devote serious time and effort to pushing the limits.

Use a To-Business (2B) Model
For an AR startup, it is much safer to select the 2B rather than the to-consumer (2C) business model, for two reasons. First, 2C relies heavily on channels that reach a large user base, and such channels are seldom available to AR startups. Note that this doesn’t mean a startup can’t contribute to a 2C product. In fact, this can be accomplished through collaboration with channel owners via the 2B2C model. For example, HiScene contributed to the Tencent QQ virtual Olympic torch relay by providing the underlying AR tracking algorithm.

Second, a high-quality user experience is often crucial in 2C products, and current techniques often need improvements. This is particularly true for AR headsets, which are still under active development for better FOV, lighter weight, and longer battery life. Unlike in most 2C scenarios, users of 2B applications are relatively easy to satisfy, because they usually care more about functional enhancement, productivity, and efficiency than about particular aspects of the user experience. For example, when a mechanical engineer wearing AR glasses is repairing a specific part, he or she cares more about the part-relevant information (that is, the model specifics) than about the precise location in the picture where the information is presented. In other words, weak virtual-real fusion is sufficient for such 2B applications.

Foster Deep Integration with Specific Industries
Because AR is still in its early stages, finding universal solutions for all applications is extremely hard, if not impossible. Consequently, deep integration with a specific industry is a more practical way for a startup to build a solid protective barrier. Furthermore, big companies will find it more difficult to pursue such deep integration because of their historical burdens.

We’re witnessing the rise of AR thanks to the technical advances in relevant areas as well as an increase in demand. I foresee the emergence and maturity of a rich set of enabling technologies and the prosperity of a wide spectrum of AR applications, providing tremendous opportunities for startups to make significant contributions to this growing field.

References
1. D. Schmalstieg and T. Höllerer, Augmented Reality: Principles and Practice, 1st ed., Addison-Wesley Professional, 2016.



2. I.E. Sutherland, “A Head-Mounted Three Dimensional Display,” Proc. Am. Federation of Information Processing Societies (AFIPS), 1968, pp. 757–764.
3. K. Lee, “Augmented Reality in Education and Training,” Techtrends: Linking Research & Practice to Improve Learning, vol. 56, no. 2, 2012, pp. 13–21.
4. S. Henderson and S. Feiner, “Augmented Reality for Maintenance and Repair (ARMAR),” 2017; http://graphics.cs.columbia.edu/projects/armar.
5. B. Molina, “‘Pokémon Go’ Fastest Mobile Game to 10M Downloads,” USA Today, 20 July 2016; www.usatoday.com/story/tech/gaming/2016/07/20/pokemon-go-fastest-mobile-game-10m-downloads/87338366.
6. “Tencent QQ Drives Popularity of AR Among Internet Users in China on the Occasion of 2016 Olympic Games,” PR Newswire, 2016; www.prnewswire.co.uk/news-releases/tencent-qq-drives-popularity-of-ar-among-internet-users-in-china-on-the-occasion-of-2016-olympic-games-588631091.html.
7. C. Newton, “Snapchat Adds World Lenses to Further Its Push into Augmented Reality,” The Verge, 18 Apr. 2017; www.theverge.com/2017/4/18/15333130/snapchat-world-lenses-something-new-for-facebook-to-copy.
8. C. Arth et al., The History of Mobile Augmented Reality, tech. report ICG–TR–2015-001, Graz University of Technology, Inst. for Computer Graphics and Vision, May 2015.
9. D. Wagner et al., “Real-Time Detection and Tracking for Augmented Reality on Mobile Phones,” IEEE Trans. Visualization and Computer Graphics, vol. 16, no. 3, 2010, pp. 355–368.
10. G. Klein and D. Murray, “Parallel Tracking and Mapping for Small AR Workspaces,” IEEE Int’l Symp. Mixed and Augmented Reality (ISMAR), 2007; doi:10.1109/ISMAR.2007.4538852.
11. R. Mur-Artal, J.M.M. Montiel, and J.D. Tardós, “ORB-SLAM: A Versatile and Accurate Monocular SLAM System,” IEEE Trans. Robotics, vol. 31, no. 5, 2015, pp. 1147–1163.
12. M. Li, B.H. Kim, and A. Mourikis, “Real-Time Motion Tracking on a Cellphone Using Inertial Sensing and a Rolling-Shutter Camera,” IEEE Int’l Conf. Robotics and Automation (ICRA), 2013, pp. 4712–4719.
13. R.A. Newcombe et al., “Kinectfusion: Real-Time Dense Surface Mapping and Tracking,” IEEE Int’l Symp. Mixed and Augmented Reality (ISMAR), 2011, pp. 127–136.
14. E. Rublee et al., “ORB: An Efficient Alternative to SIFT or SURF,” Proc. IEEE Int’l Conf. Computer Vision (ICCV), 2011, pp. 2564–2571.
15. V. Lepetit, F. Moreno-Noguer, and P. Fua, “EPnP: An Accurate O(n) Solution to the PnP Problem,” Int’l J. Computer Vision, vol. 81, no. 2, 2009; doi:10.1007/s11263-008-0152-6.
16. J.F. Henriques et al., “High-Speed Tracking with Kernelized Correlation Filters,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 37, no. 3, 2015, pp. 583–596.
17. A. Krizhevsky, I. Sutskever, and G.E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” Proc. 25th Int’l Conf. Neural Information Processing Systems (NIPS), 2012, pp. 1097–1105.
18. “After Mixed Year, Mobile AR to Drive $108 Billion VR/AR Market by 2021,” blog, Jan. 2017; www.digi-capital.com/news/2017/01/after-mixed-year-mobile-ar-to-drive-108-billion-vrar-market-by-2021/#.WUh0P-3yucx.

Haibin Ling is a professor at Temple University. He is currently on leave and working as the chief scientist at HiScene Information Technologies, a startup company that he co-founded with Chunyuan Liao and Rongxing Tang. Contact him at linghb@hiscene.com.

