BA Insight SharePoint 2013 Enterprise Search Guide

The Essential Guide to Enterprise Search in SharePoint 2013
Everything You Need to Know to Get the Most Out of Search and Search-based Applications
ABOUT THE AUTHORS
Jeff Fried, CTO, BA Insight

Jeff is a long-standing search nerd. He was the VP of Products for semantic search company LingoMotors, VP of Advanced Solutions for FAST Search, and technical product manager for all Microsoft enterprise search products. He is also a frequent writer, who has authored 50 technical papers and co-authored two new books on SharePoint and search. He holds over 15 patents, and routinely speaks at industry events.
Agnes Molnar, MVP

Agnes is a Microsoft SharePoint MVP and a Senior Solutions Consultant for BA Insight. She has also co-authored and contributed to several SharePoint books. She is a regular speaker at technical conferences and symposiums around the world.
Michael Himelstein, vTSP

Michael has more than 20 years of practical experience developing, deploying, and architecting search-based applications. In this role he has advised hundreds of the largest companies around the world on unified information access. He was previously a Technology Solutions Manager in the Enterprise Search Group at Microsoft.
Tony Malandain
Tony Malandain is a co-founder of BA Insight. Tony architected and built the first version of the product, which gained significant momentum on Microsoft Office SharePoint Server (MOSS) and positioned BA Insight as the leading Enhanced Search vendor for SharePoint. Tony was awarded a patent for the core AptivRank technology, which monitors the usage behavior of search users to influence relevancy automatically.
Eric Moore
Eric Moore is the lead for BA Insight's Search Interactions and Content Enrichment teams. He is accustomed to living at the leading edge of search, and has deep experience with multimedia search, XML search, and content enrichment. Prior to BA Insight, Eric worked for five years at FAST and on the Microsoft Search Platform team. Eric has developed state-of-the-art products, algorithms, and platforms for specialized information workers.
SHAREPOINT 2013: THE ESSENTIAL GUIDE TO ENTERPRISE SEARCH
WHAT'S IN THIS E-BOOK?
There's a lot to say about SharePoint 2013, and about search in SharePoint 2013. This e-book is focused only on search, and is meant to give you a working understanding of the new features so that you can get oriented with them and think about how you will deploy and use them. It does not try to cover everything, nor is it meant to be a hands-on guide.
In this book we will be covering five key areas as they relate to search. These key areas are color coded, and represented by the blocks below. Each section contains short chapters that can be read independently or continuously. The goal is to enable readers to focus on the information they need to learn about at the moment.
User Experience
Working with Queries & Results
Working with Content
Architecture, Deployment & Operations
Applications & Development
Not every area of search has changed in SharePoint 2013, and those who are already familiar with search won't be lost at sea. For example, the deployment model, services architecture, and crawling and connector subsystems are pretty much the same as with SharePoint 2010. End users will see a dramatically different search UI, but they will be able to use it with no training (it's quite intuitive). If you have built up a competency in search, you'll be able to take it further in many ways, which we highlight throughout this e-book.
Deeper Dives:
TechNet: What's new in SharePoint 2013 search
Blog article from Microsoft Search Group
TechNet landing page, refreshed weekly with articles on SharePoint 2013
Highlights of Search in SharePoint 2013
WHAT'S IN THIS E-BOOK?
Highlights and Key Take-Aways

User Experience
WHAT'S NEW?
The face of search is totally revamped, not just in keeping with the new SharePoint UX overall, but with deep refinements, better display for results using Result Blocks, a hover panel with previews, and more.
BENEFITS
The search experience is easy, clean, and fast.

Working with Queries & Results
WHAT'S NEW?
In SharePoint 2013, search scopes, federated locations, and best bets are deprecated in favor of result sources, query rules, and result templates.
BENEFITS
SharePoint 2013 is light-years ahead of other search platforms in this area. Result sources, query rules, and result templates offer remarkable control over search presentation. These are brand-new concepts, well worth learning; they arm site administrators and site collection administrators with the tools to field powerful, effective search.

Working with Content
WHAT'S NEW?
Crawling is the area that has changed least with SharePoint 2013, but there are still some great enhancements, including continuous crawling. Business Connectivity Services has continued to evolve and now supports claims tokens through the BDC. The Content Processing and Linguistics capabilities in SharePoint 2013 search are very strong and extensible. There are lots of new capabilities, including a completely new file parsing mechanism.
BENEFITS
With continuous crawling, users get fresher content faster. Complex security scenarios are more tractable (though still hard). This platform offers a lot of power to developers, as well as providing some key capabilities end users will notice.

Architecture, Deployment & Operations
WHAT'S NEW?
Under the hood, there is a new architecture, a new search core, and many new modules that are the culmination of the FAST acquisition: not just a combination of the best of FAST and SharePoint search, but some significant innovations from a continued investment in search.
BENEFITS
Search deployment and management is different, and largely better. Making search hum for O365 (fully multi-tenant, smoothly scalable and fault-tolerant, and manageable at multiple levels) was a key goal for this release, and there are big benefits for on-premise deployments too.

Applications & Development
WHAT'S NEW?
There's a new development model for SharePoint 2013 generally, and for Search specifically. There's a new Content Enrichment Web Service (CEWS) that opens up content processing for extension. Search is used pervasively throughout the SharePoint 2013 platform, and powers the new web content management (WCM) and e-discovery capabilities, topic pages, the content-by-search web part, myTasks, mySiteView, and more, along with great enterprise search, people search, and site search.
BENEFITS
This makes extending search much more accessible, and will foster a lot of exciting search-based applications. A lot of great possibilities are now open to developers. Your users will get more done and enjoy a variety of applications, both built in and tailored, all powered by search.
TABLE OF CONTENTS
Introduction
  SharePoint 2013 Search is Here
Chapter 1: User Experience. The New Face of Search in SharePoint 2013
  Raising the Bar: The SharePoint 2013 User Experience
  First Class Search Interactions: More to Love
  The SharePoint 2013 Search Center Overview
  Refiners and Faceted Navigation
  Search Center Setup
Chapter 2: Working with Queries and Results. New Mechanisms in SharePoint 2013
  Query Processing: The Search Engine's Automatic Transmission
  Query Rules and Query Suggestions
  Result Types and Result Templates
Chapter 3: Working with Content. Crawling, Connectors, and Content Processing
  Content Capture
  Content Processing
  Linguistics Processing
Chapter 4: Architecture, Deployment, and Operations. Getting under the Hood
  New Architecture, Single Search Engine Core
  Indexing and Partitions
  Analytics
  Federation and Result Sources
  Search in Exchange
  Search Administration
  Upgrade and Migration
Chapter 5: Applications and Development. New Models for Search-Based Applications
  The New Development Model in SharePoint 2013
  The Content Enrichment Web Service (CEWS)
  Search-Based Applications in SharePoint 2013
Conclusion
INTRODUCTION
SharePoint 2013 Search is Here

There's a New Search in Town
SharePoint 2013 has arrived, and it is chock full of new capabilities and features. This is a release with major architectural changes, built for the next 15 years, and it is very different from SharePoint 2010. With SharePoint 2013, the enterprise search capabilities are dramatically different and very exciting. Search has a new face, a new development model, and some remarkable built-in features. For search Jedis, this new platform has a lot to love. It is:
Clean, fast, and easy to use.
Straightforward to install, administer, and scale.
Provides very powerful high-end search features.
Makes creating search-based applications simpler than ever.
For search Jedi apprentices, this release will change your world. Search is "the Force": it is used pervasively throughout SharePoint 2013 and has the power to transform the way your business uses SharePoint.
What is intriguing about this release is that it's very clear that Microsoft's investment and innovation around search hasn't stopped; it has accelerated. They've hit a key design target (easy, powerful search that runs on premise or in the cloud) right on the money. Since this release is a key architectural change for SharePoint and a huge architectural change for search specifically, there are also many new features to build on. Peeking under the hood, there is evidence that there's more innovation to come in future releases: powerful new mechanisms which aren't fully used yet.
This isn't a perfect release. There are some things that take getting used to, some areas that still need sanding, and some situations where you need to write code or turn to partners to boost the power of your search capabilities. We'll point out some of these areas where you can turbocharge your search in this e-book.
Search technology (and basically all software that does sophisticated things around human language) is extremely hard in general. High-end search is very powerful, and can be applied in a myriad of situations, so covering everything is at odds with making search easy. The approach of providing hooks for extensibility and encouraging partners and customers to use them works, and Microsoft has a great set of partners to pull this off. Search is still hard; don't let the easy, simple user experience fool you into thinking otherwise. But Microsoft has done a remarkable job making this high-end technology accessible and easy for the mainstream. You will get enormous benefit from this release, so get to know it.
CHAPTER 1
User Experience
The New Face of Search in SharePoint 2013
Raising the Bar: The SharePoint 2013 User Experience

User experience broadly characterizes the way that people (users) work through user interfaces, information, and product-specific concepts to get work done. SharePoint's users can broadly be placed in two groups:
Business End Users: regular line-of-business users who use SharePoint for specific tasks and projects.
IT Users: IT professionals who manage, configure, and customize SharePoint for business users.
For any new generation of a product, user experience goals are straightforward: make it easier for the user to get work done faster, cheaper, and better. A simple, intuitive, attractive design also helps. Consumers expect ease of use and a certain amount of slickness when it comes to interacting with products; the bar is high when it comes to how they can get work done.
With SharePoint 2013, there are several developments surrounding user experience that business users can look forward to:
Modern UI/Windows 8 Look and Feel: The new look and feel confronts users with the most radical update in 20 years (UI news link below) to prepare for a multi-device world. This look and feel for the Windows operating system supports mobile and has the ability to boost productivity for an increasingly mobile work force.
Mobile and Tablet Deployment: Support for fluid layouts, touch, and voice interaction means that using SharePoint on Microsoft's Surface tablet and the Apple iPad is much easier and smoother. Users can access information anywhere, at any time, with the same ease of use they're familiar with from their desktop.
SharePoint 2013 and Applications: The bar is also going up when it comes to ease of access to information. SharePoint 2013 can field mobile and search-driven experiences, for both customer-facing and employee-only sites. There are a variety of full-fledged applications that run on your desktop, in your browser, and on leading mobile devices; they present new ways to access and interact with SharePoint information, further enhancing the user experience and productivity.
Open for Designers

If you're familiar with SharePoint, you know that you can customize your interface to make it look nearly any way you want to, but you also know that the vast majority of business users leave the look and feel as the default and never change it. With SharePoint 2013, you no longer use PowerPoint to create themes in a proprietary format. It's easy to theme sites using HTML (including support for HTML5) as shown below. This opens up SharePoint design to a much wider range of customization by designers, and will result in a lot of very attractive SharePoint sites.
Mobile Challenges and Opportunities: Windows 8 and Metro

Windows 8 devices have a new interaction flow. The desktop, charms, apps, and tiles are distinctly different from the familiar Windows 7 desktop. This represents an opportunity for application developers to create truly engaging user experiences that work across many devices. However, it also poses a challenge for developers, who must learn the Windows 8 stack, and a learning curve for users. Touch is highly intuitive and highly engaging; that said, the question will be how and when users gain their first experience and confidence with idiomatic Windows 8. Will learning be amortized in the context of your project, or someone else's?
Metro, as seen so far in SharePoint 2013, is a sparer, less dense way of presenting information, which is good from a user experience perspective. It also means there is less information displayed per page of results, and that decrease may trouble users who rely on recall over precision in their browsing and scanning. The solution to this problem may be to present information more effectively, to make the "less" more. Providing richer results will require the design and development of processing and enrichment processes, and with them new skills from the SharePoint application developer.
The SharePoint 2013 user experience is a platform-wide update, ready for a new generation of interaction. Changes in the underlying presentation tier, service architecture, Object Model (OM), and Office Apps all further the goal of making it easier to configure and deploy valuable applications in this new delivery environment.
Deeper Dives
TechNet on mobile devices and SharePoint 2013
Blog with highlights of Design Features in SharePoint 2013
Article on Windows 8 UI
SharePoint 2013 UI blog
First Class Search Interactions: More to Love

SharePoint 2013 has revamped the user experience overall (not just for search), and offers nice user experience improvements for everyone. Highlights of the previous release, SharePoint 2010, included the rollout of the ribbon across all of Office and SharePoint, and the first rollout of Office Web Applications.
Search-specific developments for the end user on the SharePoint 2013 platform include a flatter, cleaner, and more responsive interface. The flatness comes from a top-down design that makes the transitions between SharePoint Views (sites and document libraries), Search Views (search sites), and Detail Views (snippet and document) invisible. The improved responsiveness comes from the new architecture of the SharePoint 2013 presentation tier, which makes extensive use of modern HTML, JavaScript, and AJAX-style interactions with responsive SharePoint search and metadata services.
But that's not all, folks. There's a lot more to appreciate about the new search user experience in SharePoint 2013:
Document Previews: Office documents are rendered in the page for easy viewing, so there's less interruption going from one view to the next.
Interactive Elements: Fly-outs or hover card patterns are implemented quickly and cleanly. Search results fly in, and additional information about what you are looking for is available with a flick of the mouse.
Transitions Across SharePoint Tasks: The disjunction between contextual search and search sites is gone in SharePoint 2013. There are fewer obvious differences between apps; this version of SharePoint does not feel stitched together like previous versions. New developments include the seamless flow between functions such as people search and search verticals.
Productivity: Search helps users quickly return to important sites and documents by remembering what they have previously searched for and clicked. Previously searched and clicked items are displayed as query suggestions at the top of the results page.
Search Mechanisms Under the Hood: Queries, interpreting queries, returning relevant results, and the presentation of those results are pervasive across SharePoint 2013. It's not always obvious that search is there, but search technologies are used across the SharePoint 2013 platform, and key new interfaces lower the complexity of the customization that IT professionals and application developers need to do in order to support business users.
Search powers a number of areas which may or may not be obvious as search:
Upgrades to People Search and Social Features: These make it easy to explore and find the people, expertise, and conversations that are important to the task at hand.
New Social Features: My Sites, Communities, Teams, and Conversations create dynamic content that is quickly indexed via constant incremental crawls and returned through SharePoint 2013 search.
Personalization Features: Search suggestions are personalized, and include visited documents, as described in the chapter on query rules and query suggestions. These show up as if by magic, and many users enjoy them without thinking about search at all.
Overall, the search interfaces are clearer and brighter, and all the different parts of SharePoint apps seem to work better together. It is also much easier to customize search-driven experiences in SharePoint 2013 than with any other enterprise search platform.
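To make the idea of personalized suggestions concrete, here is a minimal Python sketch of how previously searched and clicked items could be turned into suggestions. It is not SharePoint's implementation or API; the `SearchHistory` class, its methods, and the sample URLs are all hypothetical names chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class SearchHistory:
    """Hypothetical per-user log of (query, clicked URL) pairs."""

    clicks: Counter = field(default_factory=Counter)

    def record_click(self, query: str, url: str) -> None:
        # Normalize the query so "Expense Report" and "expense report" match.
        self.clicks[(query.strip().lower(), url)] += 1

    def suggest(self, prefix: str, limit: int = 3) -> list:
        """Return URLs the user clicked for queries starting with `prefix`,
        most frequently clicked first."""
        prefix = prefix.strip().lower()
        totals = Counter()
        for (query, url), count in self.clicks.items():
            if query.startswith(prefix):
                totals[url] += count
        ranked = sorted(totals.items(), key=lambda pair: -pair[1])
        return [url for url, _ in ranked[:limit]]


# Made-up history for illustration only.
history = SearchHistory()
history.record_click("Expense Report", "/forms/expense.xlsx")
history.record_click("expense report", "/forms/expense.xlsx")
history.record_click("expense policy", "/hr/policy.docx")
history.record_click("benefits", "/hr/benefits.aspx")
suggestions = history.suggest("expense")
```

In this sketch the document clicked twice ranks ahead of the one clicked once, and the unrelated "benefits" click is ignored. A real system would also weigh recency and security trimming, which are left out here.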
The New Face of Search

Search in SharePoint 2013 has a completely different look and feel from previous versions; the UI has been largely rewritten. The new face of search in SharePoint 2013 is easy to use, clean, and intuitive; it offers easy exploration and navigation of information while presenting information in an actionable format. This is a far cry from the "ten blue links" concept that the industry has been living with for nearly 20 years. There are also a number of changes that have been made to enhance ease of access to information, supporting both productivity and mobility.
While the out-of-the-box interface is clean, and we view this as a positive enhancement, it is not as information-dense as heavy search users demand. There are a number of search-based applications that can bridge user requirements surrounding information access and analysis, and we will provide several next steps and options for review at the end of this e-book.
Deeper Dives
Search User Interfaces book by Marti Hearst
The SharePoint 2013 Search Center Overview

The SharePoint 2013 Search Center has inherited the new look of SharePoint 2013 overall: it is clean, modern, and dynamic. As you can see from the screenshot below, it is quite different from what you are used to seeing. The familiar tabbed interface is apparent, but it has a more streamlined look and feel and includes some new out-of-the-box tabs such as videos. There are also more actions that can be done directly from the search interface, including a hover panel. Some of the capabilities from FAST show up in this release as well, deep refiners and document previews in particular. These have been taken to the next level with additional features such as the ability to show histograms for dates and to search inside the refiners. While both capabilities are welcome, they are somewhat limited, whetting your appetite for more.
*Note: The refiner counts are turned off by default, but they appear with one click in the web part configuration panel.
Document Previews and the Hover Panel

One of the most exciting new features added to SharePoint 2013 is the integration of document previews right within the search results. This feature leverages a new standalone server that hosts Office Web Applications. With Office Web Apps, users can now open a document in a web client environment with reasonably high fidelity while preserving format, fonts, sizing, etc.
A key component within the document preview display is the "take a look inside" functionality. This provides the ability to jump directly to a relevant section of the document, based on extraction of sections for several document types. For example, because it is likely that the slide titles in a PowerPoint presentation were designed by the presenter to summarize the content of each slide, these titles are extracted and shown as links. This feature is also available for Word documents and Excel documents (focused on graphs and named tables) as well as SharePoint sites (top sub-sites and document libraries).
There are some limitations to the document preview features in SharePoint 2013. Previews are relatively slow and are missing functionality that other preview products take for granted, including search term hit highlighting, the ability to jump immediately to the most relevant page of the document, and copy and paste from within the preview. Breadth of content types is another area where SharePoint 2013 previews fall short: they are only available for content hosted in SharePoint, and only for a limited set of file formats (for example, Word and PowerPoint, but not PDF). This preview technology was not designed for documents to be consumed via this interface, but rather to help you determine whether this is the particular document you have been looking for. Notwithstanding these limitations, though, document previews are a boon to the user and a great addition to search.
The hover panel paradigm works well in the Search Center. It can be customized and may vary based on content type or tab. Default actions with document preview include the Edit, Send, and View Library features, as well as Follow, a social feature. They also allow some actions directly from the search page, including editing content directly in Office Web Apps.
People Search
People Search is another strong part of the Search Center. As with SharePoint 2010, people search lights up with actions when used together with Lync, and phonetic search is available by default. The new hover panel provides a great way to show profiles and content, in addition to social connections.
For many applications, people will want to customize the search center, because it is not as information-dense as heavy search users or search-based applications demand. This type of customization is easy to do, and we'll cover it later in the chapters about query rules, result sources, and the development model.
Overall, the SharePoint 2013 Search Center interface is better than any other search UI we've seen on the market. It appears to be very robust, and holds true to Microsoft's "works anywhere" commitment. It functions smoothly both in the cloud with Office 365 and on premise, as well as in all of the major browsers (Internet Explorer, Mozilla Firefox, Chrome), and the experience on tablets like the iPad is pretty good. A word to the wise: just don't let a sexy demo or quick test drive lull you into thinking that it just magically works. As with all search products, the navigation depends on having decent metadata.
Overall, the out-of-the-box interface is clean, fast, and provides relevant results, so the basic "must have" elements of great search are covered. There are also a lot of exciting capabilities that make exploration easier, give users insight, and enable action directly from search.
Of course, everything works better with search when all of the products that are part of the search machine are Microsoft. For example, People Search lights up with actions when used with Lync; myTasks works with Project Server; and previews work only for documents stored in SharePoint in recent Office formats, and require a separate OWA server. If you don't have servers that run these other products, the additional features associated with them simply don't show up. However, search still works very well even without them. When you have all these parts in place, though, they work extremely well together: a big accomplishment for Microsoft, with strong productivity benefits to the end user.
Deeper Dives
TechNet creating a search center in SharePoint 2013
Intro to the hover panel
Longitude Search Overview
Refiners and Faceted Navigation

Less than ten years ago, the idea of using faceted metadata for flexible search and navigation was just being hatched in an academic research project called the Flamenco project. Now it is de rigueur: it has proven to be effective, and enterprise search without it is subpar.
Microsoft added search refinement in SharePoint 2010, with the refiners populated by whatever content is in the associated managed properties. SharePoint 2010 created refiners out of the top N results (called shallow refiners; the top 50 results was the default), and FAST Search for SharePoint created deep refiners out of the entire result set, even if it was millions of items.
With SharePoint 2013, there are now two different modes for the refiner web part: standard search results, and faceted navigation. For standard search results, refiners are generated as they were with FAST Search for SharePoint. You can now define display templates to use for rendering different kinds of refinements, which is a big win over SharePoint 2010. All refiners are now deep refiners.
Faceted navigation is more dynamic. It is used in conjunction with term sets (served from the term store), which are also used for navigation in document libraries. With faceted navigation, a term from the term store filters what kind of data should display. If the managed property is refinable, the refiners that show can depend on the term. This is handy in many search scenarios, including the online store scenario which inspired it.
For example, users can use faceted navigation in an online store to find products more easily. The scenario below uses the term store terms Camera and Laptop and the managed properties Megapixel Count, Color, and Manufacturer. So, with faceted navigation your terms would look like this:
For the term Camera, add refiners for Megapixel Count and Manufacturer.
For the term Laptop, add refiners for Color and Manufacturer.
The refiners that show up now are based on that term, which can be set based upon a page or catalog hierarchy, so that you get the following whether you navigate or search to laptops:
Configuring these refiners via the term store is convenient, and there are built-in tools that make it easy to create a hierarchy, customize the refiners within the hierarchy, and set up a very dynamic experience, as shown below.
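To make the Camera and Laptop example concrete, here is a minimal Python sketch of term-driven refiners. It only illustrates the idea; it does not use the SharePoint term store or any SharePoint API, and `REFINERS_BY_TERM`, `refiners_for_term`, and the catalog items are hypothetical.

```python
from collections import Counter

# Hypothetical stand-in for term-store configuration: each navigation term
# lists the managed properties to show as refiners (taken from the example).
REFINERS_BY_TERM = {
    "Camera": ["Megapixel Count", "Manufacturer"],
    "Laptop": ["Color", "Manufacturer"],
}


def refiners_for_term(term, items):
    """Return {managed property: {value: count}} for items tagged with `term`.

    Only the refiners configured for that term are computed, which mirrors
    the idea that the visible refiners depend on where the user navigated.
    """
    tagged = [item for item in items if item.get("term") == term]
    refiners = {}
    for prop in REFINERS_BY_TERM.get(term, []):
        counts = Counter(item[prop] for item in tagged if prop in item)
        refiners[prop] = dict(counts)
    return refiners


# Made-up catalog items for illustration only.
CATALOG = [
    {"term": "Camera", "Megapixel Count": "16", "Manufacturer": "Contoso", "Color": "Black"},
    {"term": "Camera", "Megapixel Count": "20", "Manufacturer": "Fabrikam", "Color": "Red"},
    {"term": "Laptop", "Color": "Silver", "Manufacturer": "Contoso"},
    {"term": "Laptop", "Color": "Black", "Manufacturer": "Contoso"},
]

laptop_refiners = refiners_for_term("Laptop", CATALOG)
camera_refiners = refiners_for_term("Camera", CATALOG)
```

Note that the cameras in this sketch carry a Color value, yet Color is not offered as a refiner under Camera because the term's configuration does not list it.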
Navigation and Search Unified

Hierarchy is also used to create results pages, as part of the WCM part of SharePoint 2013. Navigation settings are based on the same hierarchy, so that users can search, navigate, or refine their way to their result. Navigation controls also have built-in customization, as shown below.
As you can see, Faceted Navigation is quite a powerful capability. Refiners are available everywhere, they adjust dynamically, and they can be configured to an exact design, all controlled by metadata. All refiners used in Faceted Navigation are deep refiners, so there are no gaps caused by a missed item in the deeper result set.
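The difference between shallow and deep refiners can be shown with a small Python sketch. This is an illustration under simple assumptions (an in-memory ranked list and a hypothetical `refiner_counts` helper), not how the SharePoint index computes refiners.

```python
from collections import Counter


def refiner_counts(ranked_results, prop, top_n=None):
    """Count the values of `prop` across results.

    top_n=None counts the whole result set (a deep refiner); a number
    counts only the first top_n ranked results (a shallow refiner).
    """
    considered = ranked_results if top_n is None else ranked_results[:top_n]
    return Counter(r[prop] for r in considered if prop in r)


# Made-up result set: 50 Word documents ranked ahead of 10 PDFs.
results = [{"FileType": "docx"} for _ in range(50)]
results += [{"FileType": "pdf"} for _ in range(10)]

shallow = refiner_counts(results, "FileType", top_n=50)
deep = refiner_counts(results, "FileType")
```

With the shallow setting, the PDF value never appears as a refinement because all ten PDFs rank below the cutoff; the deep count includes them. That is the kind of gap the text refers to.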
Deeper Dives
TechNet managed metadata overview
TechNet configure faceted navigation in SharePoint 2013
Search Center Setup

For the IT Professional, SharePoint 2013 offers more control over the logic of search applications, and it exposes that control in a clear, consistent, and logical model. We've outlined key concepts as they relate to Search Center Setup and how they are used to deliver search results.
Query Configuration: Query Rules are used to control ranking, query intent classifications, synonyms, and query rewriting in SharePoint 2013.
Presentation Configuration: Query Rules and Display Templates determine what result snippet gets shown for what class of query, what type of document, and for what category of user. The integration of search query processing across the platform means that display templates and query rules are applicable throughout the application.
Faceted Navigation: Metadata used for top-down navigation (Faceted Navigation) and metadata exposed as search results for bottom-up refinement are now both managed through the term store.
On the Premise or In the Cloud? Get Going Faster

Setting up a new search center is pretty straightforward. As illustrated below, site administrators can easily set up a SharePoint 2013 search center to run on premise or in the cloud.
The Search Center itself is a site template, and the good news is that with this latest release some of the rough edges from SharePoint 2010 have been removed. For example, this template now inherits design elements from a master page, so you don't need to jump through hoops to make it match your design. This does not mean that you don't still need to think about how to manage the universal search center (which may serve many site collections with different themes and designs), but you now have easier control.
Changes to Sites and Site Templates

There are a number of changes to sites and site templates overall in SharePoint 2013. The facilities for sharing (requesting and granting site permissions) are completely revamped and considerably improved, as shown in the screenshot below.
The Document Workspace site template has been removed in SharePoint 2013, simplifying the list of templates available when a new site collection is created. This will be a big change to users since this template was a workhorse in SharePoint 2010. Most Meeting Workspace site templates from 2010 have also been discontinued in SharePoint 2013, including the basic, blank, decision, and social meeting workspace templates and the multipage meeting template. They have been replaced by features from other parts of SharePoint and from OneNote and Lync, which all support collaborative work, live conferences, smaller meetings, note-taking, and storage of notes and other conference-generated commentary. The benefit is that projects with multiple contributors and collaboration across geographically distributed teams are streamlined.
The facilities for web content management (covered in the Applications section) are remarkably improved and totally driven by search. This makes creating externally-facing sites and applications much more effective. If you have responsibility for explaining and exposing a service or product to a market inside your company, the business-focused features that are new in SharePoint 2013 are a strong proposition for inside-the-firm audiences. For example, if you provide consulting services internally for a legal practice area, then recommendations, customization of the search experience based on queries, personalized interaction, and so on enable users to find relevant information more quickly.
Deeper Dives
TechNet creating a search center in SharePoint 2013
Blog on using the Content by Search Web Part
CHAPTER 2
Working with Queries and Results
New Mechanisms in SharePoint 2013
Query Processing: The Search Engine's Automatic Transmission

The search experience involves many different processes, so creating a great search experience requires covering everything from the moment information is pulled from the source systems to the moment it is presented to the user in search results. SharePoint historically had strong coverage on the crawling side via its Business Connectivity Services and Protocol Handler framework, and strong coverage on the presentation side via its XSLT-driven core results web parts. FAST Search for SharePoint on the 2010 platform then brought coverage of the content processing area via its pipeline extensibility framework as well as its built-in entity extractors. SharePoint 2013 completes the coverage by providing a strong query processing framework, shown below.
But what does query processing mean exactly? If you're familiar with SharePoint 2010, think of query processing as the evolution of search scopes, federated locations, and best bets. With intranet search indexes now frequently reaching tens of millions of items, formulating the right query is more and more critical to finding relevant information. Fortunately, there are a number of techniques you can use to reformulate the query by understanding the intent behind it. You can leverage information such as:
Where the query originated from. For example, if you run a search from your company's helpdesk intranet site, you are likely to be looking for FAQs, how-tos, or IT specialists. The search engine can now capture that intent to provide more targeted results.
Who launched the query. If you are based in the United States and searching for employee benefits, you are more likely looking for U.S. employee benefits than for those of Canada or the United Kingdom.
What concepts or entities can be recognized in the query. For example, if you search for an expense report form, the search engine will return the Excel spreadsheet, InfoPath form, or web page which enables you to file your expense report.
Query Processing in Action

An example of query processing techniques combined would be a search for a weather forecast on Bing. The very first result you'll get at the top is the weather forecast for your location. Bing automatically understands the concept behind the query, and then correlates it with information about you, the user (in this case, your location), to provide you with the forecast. It is also worth noting that this answer is not displayed like the other results on screen. It is instead carefully rendered in a visual format to enable you to quickly make a decision based on that information.
Query processing in SharePoint 2013 is intended exactly for these scenarios: to enable a smart, targeted search experience which understands what the user is searching for, and to provide the optimal result straight from the search page. This is a very exciting new capability in SharePoint 2013, as it will open up many opportunities to rapidly build new applications driven by search which will look nothing like the standard list of ten blue links.
Getting in Gear: Result Sources, Query Rules, and Result Types

So let's dive now into the details of what SharePoint 2013 offers for query processing. We referred to query processing earlier as the evolution of search scopes and best bets. We meant it literally! In SharePoint 2013, search scopes, federated locations, and best bets are now deprecated in favor of result sources, query rules, and result blocks.
Result Sources enable you to focus searches on a subset of the total information accessible in your organization by applying extra conditions to the search queries on behalf of the end user. Stated as such, they sound very much like 2010 search scopes. The key difference here is that the extra conditions enabled in 2013 go far beyond what 2010 could do. SharePoint 2013 comes with a strong query builder to apply conditions based on the user, the search page URL (or any parameter found in it), the site, or the current date. Result sources can also be used to return results from remote content, much like federated locations in SharePoint 2010. (The result sources construct is covered in greater detail in the Federation chapter of this e-book.)
Query Rules allow conditional transformation of queries and results based on custom logic. Imagine you want to simplify searching for budget spreadsheets in your organization. Using query rules, you can type simple search queries such as "budget spreadsheet project X", and behind the scenes the request can be transformed into something much more elaborate. The query rule could recognize the terms "budget" and "spreadsheet" in the search query and rewrite the query so that the document content type must be budget, the file type Excel, and the file content must match the project name you specified in the search keywords. Additionally, the results would be sorted by most recently modified file so that the freshest information is returned first. It is worth noting that the same query builder functionality used for Result Sources is also available here as a means to define conditions on query rules or transform user queries.
The last major new feature introduced for query processing in SharePoint 2013 is the Result Type construct. A result type supports the presentation of results in a tailored way, and the result block contains a small subset of results that are related in a specified manner. For instance, you can create several result blocks for sales collateral, knowledge base articles, documentation, etc., so that when a user searches for a specific product you can make sure to always return the top two or three pieces of sales collateral or knowledge base articles matching this query.
In spite of the enhanced capabilities these tools provide, you may run into scenarios where they are not suitable or flexible enough. For example, geo-searches (ranking or filtering search results based on distance), personalized queries (complex query changes based on who executes the query), synonym expansion, etc. are not supported. In these scenarios you can still rely on the Search API to build your own web part or search application that implements the appropriate logic. The API is for the most part comparable with the version seen in SharePoint 2010, with a few exceptions. The main exceptions include the removal of the FulltextSqlQuery class and syntax, which have been deprecated, and the appearance of the SearchExecutor class, which allows you to execute multiple related queries in one shot.
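The budget spreadsheet rewrite described in this section can be sketched in a few lines of Python. This is only an illustration of the idea of a query transform, not how SharePoint implements query rules (which are configured in the UI or via PowerShell). The function name and the property names used in the rewritten query (ContentType, FileExtension, LastModifiedTime) are assumptions chosen for the example.

```python
from typing import Dict, Optional

# Hypothetical trigger words for the illustrative rule.
TRIGGER_WORDS = ("budget", "spreadsheet")


def rewrite_budget_query(user_query: str) -> Dict[str, Optional[str]]:
    """Rewrite 'budget spreadsheet <project>' into a stricter KQL-style query.

    Queries that do not contain both trigger words are passed through unchanged.
    """
    terms = user_query.split()
    lowered = [t.lower() for t in terms]
    if not all(word in lowered for word in TRIGGER_WORDS):
        return {"query": user_query, "sort": None}

    # Everything that is not a trigger word is treated as the project name.
    remaining = [t for t in terms if t.lower() not in TRIGGER_WORDS]
    parts = ['ContentType:"Budget"', "FileExtension:xlsx"]
    if remaining:
        parts.append('"' + " ".join(remaining) + '"')

    # Freshest documents first, as in the scenario described in the text.
    return {"query": " ".join(parts), "sort": "LastModifiedTime:descending"}


if __name__ == "__main__":
    print(rewrite_budget_query("budget spreadsheet project X"))
```

The point of the sketch is that the user types three or four plain words, and the condition (both trigger words present) decides whether the stricter query and the sort order are applied on their behalf.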
No Speed Limits
Microsoft has made it very easy to create search pages using this new functionality. In fact, you don't need programming experience to create pages, as all the functionality is user friendly and has point-and-click interfaces. Microsoft made it even easier by pushing this functionality not only to site collection administrators, but to site administrators as well. That's right, farm-level privileges are not required: as long as you own a site (such as your personal site), you can use these capabilities to build your own search center.
Two examples of applications you can build using these new features:
A manufacturing dashboard that displays everything about a specific part based on its part number. Information could include the inventory level, the last orders for that part, the instructions on how to use that part, and forum discussions from your customers about that part.
A knowledge portal that enables you to share FAQs, knowledge base articles, documentation, or tutorials to empower your support or helpdesk team.
Powering your applications via search has never been easier. The chapter on Search-Based Applications has many more examples, and we encourage you to explore what's possible, and even to try building some of your own.
Deeper Dives
TechNet on query processing
Blog overview of search in SharePoint 2013
List of terms for query builder
New KQL syntax in SharePoint 2013
Query Rules and Query Suggestions

In the last chapter we introduced Query Processing and several new key concepts including Query Rules and Query Suggestions. Now, let's go into greater detail on these and some other query features like spell check and rank management. Working with these features, you can customize search to a great degree, without writing code.
Query Rules
Query Rules are a brand new feature in SharePoint 2013. They are designed to enable you to act upon the intent of a query, and they provide a remarkable amount of control and configurability. The Query Rules framework is composed of three top-level components: Query Conditions, Query Actions, and Publishing Options. These are all configurable via PowerShell, or via the UI shown to the right.
Query Conditions are rule sets that are meant to determine the intent of the query (does the query meet a rule?). Options for this include:
Query contains a specific word or words
Query contains a word in a specific dictionary
Query contains an action word that matches a specific phrase or term set
Query is common in a different source (like the Videos result source)
Results include a common result type (like file type)
Advanced rules which can match across a set of terms, dictionary, regular expression, etc.
If the query is against a particular result source (see the Result Source chapter in this book) or category, result source conditions can also be applied. If the Query Condition is met, Query Actions are then triggered.
Query Actions specify a series of actions that take place once a query condition is met (what to do if the rule is met). These actions include:
Assign a promoted result: This replaces the Best Bet and a former FAST Search for SharePoint 2010 feature known as Visual Best Bets. The configuration of the promoted result allows you to specify whether it should be treated as a best bet (hyperlink) or as a fully formatted HTML block (Visual Best Bet).
Create and assign a results block: When a condition is met, one or more results blocks can be triggered. Result blocks specify an additional query to run and how to display results. This feature includes a full query designer so you can build and test queries before finalizing them. You can also place the results above those returned by core results, or interleave them by ranking. Additionally, you can choose custom display templates instead of the default for the result or results block.
Change the ranked results by changing the query: This allows you to assign additional parameters and weighting values (XRANK boosts) to the query (Query Transforms, for those familiar with FAST). For example, if the condition of the rule is met, apply an XRANK constant boost of x points. XRANK is a FAST capability that allows you to override the default relevancy ranking by boosting the relevancy score for particular results at query time.
Publishing Options determine when a query rule is active (when to do this?). A rule may be active in a specific time interval (start date, end date) or always active (the default). You can also configure a review date (which triggers an e-mail reminder to review the rule).
The power of query rules is not only in the flexibility they provide, but also the richness and complexity that can be derived from them. Imagine a single Query Condition being met, which then triggers a visual best bet, a results block from a remote SharePoint site, a results block from a cloud source, and a query transform that will boost results coming from the cloud. In addition, publishing options would determine that these actions are only taken between November 25th and December 26th. An example of how this would work in an intranet scenario would be a query rule that is active only during insurance open enrollment windows.
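To make the condition / action / publishing split concrete, here is a minimal Python sketch of a rule that boosts results with an XRANK constant boost (the KQL `XRANK(cb=...)` operator) only while the rule is published. The class, its field names, the trigger words, and the boosted expression are all hypothetical; real query rules are defined in the search administration UI or via PowerShell, not in code like this.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class QueryRule:
    """Illustrative model of a query rule, not a SharePoint API."""
    trigger_words: FrozenSet[str]   # Query Condition: query contains one of these words
    boost_expression: str           # Query Action: results matching this get boosted
    boost_points: int               # constant boost passed to XRANK(cb=...)
    start: Optional[date] = None    # Publishing Options: active window (None = open-ended)
    end: Optional[date] = None

    def is_active(self, today: date) -> bool:
        if self.start is not None and today < self.start:
            return False
        if self.end is not None and today > self.end:
            return False
        return True

    def matches(self, query: str) -> bool:
        return any(word in self.trigger_words for word in query.lower().split())

    def apply(self, query: str, today: date) -> str:
        """Return the transformed query if the rule fires, else the original query."""
        if self.is_active(today) and self.matches(query):
            return f"({query}) XRANK(cb={self.boost_points}) {self.boost_expression}"
        return query


# Hypothetical open enrollment rule, active November 25th to December 26th.
open_enrollment_rule = QueryRule(
    trigger_words=frozenset({"insurance", "enrollment"}),
    boost_expression='ContentType:"Benefits"',
    boost_points=100,
    start=date(2013, 11, 25),
    end=date(2013, 12, 26),
)

if __name__ == "__main__":
    print(open_enrollment_rule.apply("insurance plans", date(2013, 12, 1)))
```

Outside the publishing window, or for queries that do not meet the condition, the query passes through untouched, which mirrors how an inactive or unmatched rule has no effect on results.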
Query Suggestions
Query suggestions enable users to ask better questions, and make it simpler to search for information. This feature was sorely lacking in SharePoint 2010. In SharePoint 2013, Query Suggestions are supercharged, thanks in part to the addition of the Analytics Processing Component and the Analytics Reporting Database. These components provide for analytics aggregation and persistent storage of these analytics. Some key new features include:
My Queries: a Personal Query Log (in the Analytics database), which factors your personal SharePoint activity into the query suggestions.
My Sites: this capability tracks sites you have visited, and factors them into the query suggestions.
Our Terms: this feature uses information related to the most frequent queries across all users that match the search terms.
Query Suggestions now take two forms: Pre-Query Suggestions and Post-Query Suggestions. Both of these help the user ask better questions by showing what others have asked before; they differ in when they are displayed and how people use them. Pre-Query Suggestions include both a list of queries from other users and a list of items you have clicked on before, as shown in the screenshot below.
Pre-Query Suggestions occur prior to a query being executed. The goal of pre-query suggestions is to aid users in selecting a query to help them find information, and to assist them in writing better queries. These suggestions are provided in two forms:
1 A list of items that others are typing for their queries.
2 A list of items you have clicked on before from your personal query log.
A key aspect of this feature is that it will never provide a suggestion for a search that did not yield a click-through (someone clicking on the document), and it will never provide a suggestion if the results would lead to a dead end (zero-result query).
Post-Query Suggestions are provided after a query is executed and when results are displayed. These suggestions are based upon the results that you have clicked on at least twice. They provide a quick means to go back to a document that you regularly review or select. They are similar to the Related Queries provided with SharePoint 2010. Suggestions can also be tuned (inclusions and exclusions) within the Service Application Admin Pages. It is also important to note that these are not tuned at the site collection level, but only at the Search Service Application (SSA) level.
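The two guarantees described above for pre-query suggestions (no suggestion without a click-through, and no suggestion that leads to zero results) amount to a filter over the query log. The sketch below shows that filter in Python under simplifying assumptions: the log record shape, the prefix matching, and the popularity ordering are all invented for illustration and are not taken from SharePoint's analytics components.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple


class LoggedQuery(NamedTuple):
    """Hypothetical query log record."""
    text: str
    result_count: int
    clicked: bool


def suggest(prefix: str, log: Iterable[LoggedQuery], limit: int = 5) -> List[str]:
    """Suggest past queries that start with `prefix`, most frequent first.

    Queries that never produced a click-through, or that returned zero
    results, are never counted and therefore never suggested.
    """
    counts = Counter(
        entry.text.lower()
        for entry in log
        if entry.clicked and entry.result_count > 0
    )
    prefix = prefix.lower()
    candidates = [(q, n) for q, n in counts.items() if q.startswith(prefix)]
    # Sort by descending frequency, then alphabetically for a stable order.
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return [q for q, _ in candidates[:limit]]


if __name__ == "__main__":
    sample_log = [
        LoggedQuery("expense report", 12, True),
        LoggedQuery("expense report", 12, True),
        LoggedQuery("expense policy", 4, True),
        LoggedQuery("expense rport", 0, False),   # dead end: zero results
        LoggedQuery("expenses 2009", 3, False),   # results, but nobody clicked
    ]
    print(suggest("exp", sample_log))
```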
Working Wonders with Queries

The mechanisms for query rules, query suggestions, and query spell checking are new with SharePoint 2013, and they may take some getting used to. Previously, SharePoint 2010 had some capabilities that processed queries, such as the keyword features that applied to synonyms, best bets, and promotions/demotions; these are now replaced by query rules. Once you become familiar with these new features, you will find you can work wonders.
In spite of all of the obvious pluses, there are some limitations with query rules. You can't call a program from a query rule, which blocks a variety of use cases. For example, synonym expansion is done on full queries, and on pre-built synonyms. This makes expansion easy to understand but has been a big annoyance to many search administrators in SharePoint 2010. This limitation can be addressed, but only through applications available through the Microsoft partner ecosystem, not via query rules. Calling applications based on query patterns (for example, pulling up an ATM location app when users search for "bank branch near me") is feasible in SharePoint 2013, but not directly from query rules.
However, these limitations are important only for a specialized set of search applications. The power of query rules, and query processing generally, in SharePoint 2013 is light-years ahead of other search platforms. Learn to use these mechanisms, and you will be in a great position to dazzle your business users with the power of search.
Query Spell Correction

Spell correction is a familiar and very useful feature, since humans are prone to misspelling and of course fat-fingering. SharePoint 2013 provides spell correction by default, as shown below:
In SharePoint 2010, spell correction was implemented as a series of XML files that defined inclusion and exclusion items for the dictionary. In SharePoint 2013, Query Spell Correction is managed from within the term store of the Managed Metadata Service, where Query Spellcheck Exclusions and Inclusions appear as nodes, as shown below. Dynamic dictionary creation is still supported, but is now managed from within the term store.
Within the user interface for search, Query Spell Corrections can be configured to use Did You Mean type functionality for query transforms.
Deeper Dives
Good blog post on query rules
TechNet on query processing
List of terms for query builder
New KQL syntax in SharePoint 2013
Result Types and Result Templates

There's another new concept in SharePoint 2013 search, called Result Types. Result types let you control how search results will be displayed, and let you display different content in different formats. For example, if you have e-mails, documents, and database records in the same result set, you may want to use different formats for each and display different managed properties for each. With SharePoint 2010, this meant creating complex XSLT, and there was no easy way to group similar results together for presentation. With SharePoint 2013, wizards ease configuration of displayed results, and HTML and JavaScript enable you to add finishing touches if needed. The screenshot below has multiple result types, presented in result blocks. Videos, documents, personal recommendations, and a visual best bet (though it's no longer called that) all have their own presentation and their own result template.
Results Framework Redux

The Results framework is composed of three parts (as shown below):
Rules Engine: a list of rules to determine if the result type should be triggered.
Property List: associates the rule to document type, content type, or other managed property within SharePoint search.
Rendering Template: defines how that particular result will be displayed.
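The interplay of the three parts can be sketched as a small Python model: a rule decides whether a result type applies, a property list limits which managed properties are available, and a template with placeholder tokens produces the HTML. This is a simplified illustration, not SharePoint's display template engine. The `ResultType` structure and the `render` function are hypothetical, and the placeholders here take a bare property name between `_#=` and `=#_` as an assumption made for the sketch.

```python
import html
import re
from typing import Callable, Dict, List, NamedTuple, Optional


class ResultType(NamedTuple):
    """Illustrative result type: rule + property list + rendering template."""
    name: str
    rule: Callable[[Dict[str, str]], bool]   # Rules Engine
    properties: List[str]                    # Property List
    template: str                            # Rendering Template (HTML with tokens)


# Matches tokens such as "_#= Title =#_" and captures the property name.
PLACEHOLDER = re.compile(r"_#=\s*(\w+)\s*=#_")


def render(result: Dict[str, str], result_types: List[ResultType]) -> Optional[str]:
    """Render `result` with the first result type whose rule matches.

    Only properties named in the result type's property list can be
    substituted; anything else renders as an empty string.
    """
    for result_type in result_types:
        if result_type.rule(result):
            allowed = {
                name: html.escape(result.get(name, ""))
                for name in result_type.properties
            }
            return PLACEHOLDER.sub(
                lambda match: allowed.get(match.group(1), ""),
                result_type.template,
            )
    return None  # no result type matched; a default template would apply


spec_type = ResultType(
    name="Specification",
    rule=lambda r: r.get("ContentType") == "spec documents",
    properties=["Title", "Author"],
    template="<div class='spec'>_#= Title =#_ by _#= Author =#_</div>",
)

if __name__ == "__main__":
    item = {"ContentType": "spec documents", "Title": "Pump X200", "Author": "A. Lee"}
    print(render(item, [spec_type]))
```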
Result Types Unleashed

The power of Result Types really becomes evident when looking at a real-world scenario. In the scenario below, you have multiple documents that have been assigned content types (such as specification documents and data sheets).
Within Result Types you can:
1 Specify a rule based upon specific criteria. The rules can contain fairly advanced features, such as Boolean logic (AND, OR, NOT), equality (= or !=), or comparison (< or >). These rules can also be applied to managed properties. For example, the rule might be ContentType = spec documents.
2 Specify which managed properties you would like to have returned once rule conditions have been met. You must specify at least one managed property before it can be used in a rendering template.
3 Specify where you would like the requested property list items to be displayed, using a tagging convention such as (_#= contenttype =#_) in a Rendering template.
The Rendering template is composed of HTML and might contain JavaScript. Within this simple-to-edit template (unlike editing XSLT in SharePoint 2010) you can call specific graphics (icons, etc.) and style it in any way that you would normally style HTML.
Result types may seem complex to master, but once you become familiar with them you will appreciate how powerful they are. There are impressive tools in SharePoint 2013 that facilitate ease of use, and formatting is done using any tool you are familiar with. (SharePoint Designer has dropped the ability to do this kind of formatting, which will be annoying to some, but there are lots of great tools available to work with HTML and JavaScript.) With SharePoint 2010, very few people actually did the kind of formatting and result templating that was possible; it was too complex and arcane to use. With SharePoint 2013, you will quickly find that result types and result templates are enjoyable to work with, and you'll discover that you use them naturally to make search results look great and work well for users.
Deeper Dives
Customizing search results via Result Types and Display Templates
TechNet query variables
CHAPTER 3
Crawling, Connectors, and Content Processing
Content Capture
Capturing content is fundamental to search: if it's not crawled and indexed, you can't find it! The process of connecting to content sources, crawling them to get content, and making that content searchable is far more complex than most people realize. It was also one of the most frustrating areas to manage with SharePoint 2010. As a quick orientation, the basic function of a crawler is shown in the figure below. The concept is simple enough: the crawler connects securely to a given content source, maps the content from the source system to the crawled properties of the search engine, and feeds the engine in either a full crawl or an incremental crawl (which finds any changes). What makes content capture different from one search engine to the next is the breadth of connectors, the coverage of different security models and data types, the performance (both throughput and latency), the robustness, and the ease of administration. SharePoint 2013 does well on all counts, although most connectors are supplied by Microsoft's partners, not Microsoft.
SharePoint 2013 supports multiple crawl components, crawl databases, and content sources, as shown below. There are a number of connectors included out of the box:
SharePoint
HTTP (web crawler)
File Share
Business Data Connectivity (BDC) Framework
The out-of-the-box set also includes these connectors, which are built on the BDC framework:
Exchange Public Folders
Lotus Notes
Documentum Connector
Taxonomy Connector (connects to MMS)
People Profile Connector
Connector and Crawling Changes

For the most part, these connectors are essentially the same as the connectors in SharePoint 2010. The connector and crawler infrastructure are the part of SharePoint 2013 taken most directly from SharePoint search, so they have the fewest changes. While few, there are still some notable changes. The web crawler has some nice updates that address previous headaches. These changes include:
Anonymous Crawl for HTTP: Anonymous authentication allows any user on a web site to access any public content without being challenged for a user name and password. SharePoint 2013 allows you to get at these web sites without associating the crawl with a user account. This is handy for general web crawling and makes the setup of web crawls simpler. SharePoint 2010 used the spsearch account to log into sites, which stymied many people trying to crawl SharePoint sites with anonymous access, public web sites, and the like. Previously there were workarounds, but they were painful. The updated functionality now offers a pain-free way to perform this task.
Asynchronous Web Part Crawl: A common way to improve performance of SharePoint sites has been to load web parts asynchronously, which dramatically speeds up the first display of the page. However, crawling these pages for search delivered incomplete information. In SharePoint 2013 search, the crawler now gets a full rendering of the page in order to index it. This doesn't work for all asynchronous pages, just for most out-of-the-box web part content. But it takes care of the vast majority of problems in this area.
Overall, the most noticeable change in content capture is Continuous Crawling. This is a new method of ensuring you have the most current data in your search index, and is available only for SharePoint content. Rather than living with a latency of several minutes and with full crawls that might take many minutes to start populating content in the index, you'll see content within seconds! When you enable continuous crawls (using the UI shown below), a crawl schedule no longer applies: you are running crawls in parallel, and the crawler gets changes from SharePoint sites every N minutes (set to 15 minutes by default, but this parameter is changeable). Continuous crawls do not stop for errors, but rather note the error and continue to crawl content. Continuous crawls can occur while other crawls (full or incremental) are active or starting, whereas incremental crawls need to wait for other incremental crawls to complete before they start. With this capability you can now keep content fresh, and won't experience mysterious delays when additional content sources are added.
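The scheduling difference described above can be illustrated with a toy timeline in Python. This is a deliberately simplified model, not a description of the crawler's internals: it assumes every crawl takes the same fixed duration, that a waiting incremental crawl simply starts when the previous one finishes, and that continuous crawls start on every interval regardless of what is still running. The function names are invented for the example.

```python
from typing import List


def continuous_starts(interval_min: int, horizon_min: int) -> List[int]:
    """Start times (in minutes) when a new crawl begins at every interval,
    even if an earlier crawl is still running."""
    return list(range(0, horizon_min, interval_min))


def incremental_starts(interval_min: int, duration_min: int, horizon_min: int) -> List[int]:
    """Start times (in minutes) when each scheduled crawl must wait for the
    previous crawl to finish before it can begin."""
    starts: List[int] = []
    scheduled = 0   # next scheduled slot
    free_at = 0     # time at which the previous crawl completes
    while scheduled < horizon_min:
        start = max(scheduled, free_at)
        if start >= horizon_min:
            break
        starts.append(start)
        free_at = start + duration_min
        scheduled += interval_min
    return starts


if __name__ == "__main__":
    # 15-minute interval, crawls that take 40 minutes, observed for one hour.
    print("continuous :", continuous_starts(15, 60))
    print("incremental:", incremental_starts(15, 40, 60))
```

With a 15-minute interval and crawls that each take 40 minutes, the parallel model begins four crawls in an hour while the waiting model begins only two, which is the kind of "mysterious delay" that continuous crawling is meant to remove.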
The Taxonomy connector is new in SharePoint 2013, and you will see it at work even when you don't use it explicitly, since the term store is much more integrated with search. As you will read in other chapters of this book, you can now create entity extractors directly from term sets, set up WCM page hierarchies using the term store, define faceted navigation using taxonomies, and much more.
Building New Connectors

When you start getting into search, you quickly find that you want to get at more and more kinds of content from more and more places. Data silos are everywhere, and search lets you bridge these silos easily and securely. In order to do this, you need a connector for each content source, and many organizations have dozens of systems that require connectors well beyond what comes out of the box. Luckily there are two options: leverage a rich set of partner-built connectors or (if you are a developer) create new ones yourself.
SharePoint 2013 will still support existing protocol handlers (which are custom interfaces often written in unmanaged C++ code), using an interface used since MOSS 2003 and deprecated since SharePoint 2010. These can still be good for high performance or particular tasks. But the primary way to create connectors is through the BDC Framework, which was introduced in SharePoint 2010 as part of Business Connectivity Services (BCS). BCS is an umbrella term for a set of technologies that brings data from external systems into SharePoint Server 2013 and Office 2013 (shown in the figure below). As with SharePoint 2010, you can make new connectors pretty simply. For systems with static schemas, straightforward security, and moderate performance needs, this is not a huge job.
There are some great improvements in Business Connectivity Services as a whole. For example, there's tooling specifically to create External Content Types against OData sources, there are Representational State Transfer (REST) and Client Side Object Model (CSOM) interfaces, and External Content Types can be scoped to a single SharePoint app. Unfortunately, none of these apply to search: creating an indexing connector for search is not the same as creating an External Content Type.
The Business Data Connectivity (BDC) framework is largely the same in SharePoint 2013 as it was in SharePoint 2010, when it comes to search. There is one notable change, though: Claims tokens are supported through the BDC. Previously, only Active Directory (AD)-format Access Control Lists (ACLs) were supported, which made it nearly impossible to cover some complex security scenarios. With Claims support, many of these scenarios are tractable, though still very much the domain of experts.
One warning: you shouldn't underestimate the effort involved in connector development, deployment, and maintenance. Don't fear connector development, but watch out for the classic quicksand trap. Too often a development project gets to basic connectivity quickly but then struggles to get security right and to get high performance and scale. If and when this is successful, the project is then dragged further down in troubleshooting and maintenance, since things change every time the source system changes. Plan your development carefully to avoid this trap. The best way to avoid it is to consider pre-built connectors for any complex system; that way you don't have to build your own from scratch, and you don't have to maintain it.
Changes from FAST Search for SharePoint

If you are used to FAST, there are a number of changes you will notice. These changes are all a byproduct of moving to a common, single search engine. First, there is no way to push content to the index in SharePoint 2013. (With FAST, there was a mechanism called the Content API.) There are also three connectors that you will notice are gone:
Lotus Notes, which had performance, security, and flexibility features beyond the Notes connector included with SharePoint 2010 and 2013.
The Enterprise Web Crawler, which rendered dynamic sites, had high performance, and several high-end features.
The Java Database Connectivity (JDBC) connector, which supported direct SQL access to databases.
Though these may seem like big gaps, there are ways to cover this functionality in SharePoint 2013, either with different mechanisms (many cases covered by the JDBC connector can be done via the BDC), or with pre-built connectors from Microsoft Partners.
From Crawl to Index

Many of the most significant content capture changes you'll see with SharePoint 2013 search don't actually result from the connector and crawling components. For example, the content processing component adds some remarkable capabilities that show up to the end user looking like better content. The Indexer has lower latency and is much more robust, which is one key to continuous crawling and also alleviates many of the weird issues people encountered with crawling after outage events with SharePoint 2010 (which could cause the crawler and index to be out of sync). Additionally, improvements in schema management make mapping content much simpler with SharePoint 2013. All of these areas are covered in other chapters of this book, but they contribute to the improvements discussed above to provide robust, scalable, and high-performance content capture. This is a great foundation to build on for any search deployment or search-based application.
Deeper Dives
TechNet on managing continuous crawls
MSDN on searching new content with SharePoint 2013
Longitude Connectors Overview
Content Processing
Content Processing is an essential pillar of search quality, but it is typically invisible to the end user. The development of content processing in SharePoint 2013 is focused on implementing platform-wide capabilities, and integrating and supporting built-in search-based applications such as WCM and e-discovery. In order to support the wide range of scenarios that depend on search, Microsoft provided extensibility, so that customers and partners can leverage the new search platform and hook into content processing. The Content Processing component is brand new with SharePoint 2013. It takes content from the crawler and prepares it for indexing, as shown below. With SharePoint 2013, there is also a new Analytics Processing component that feeds information into Content Processing.
New Content Processing Subsystem With a Heritage

To understand the changes in the content processing structure within SharePoint 2013, it is useful to look at the heritage of this release, especially for the content processing component. Those familiar with the final version of a stand-alone search engine offering from Microsoft, FAST Search for Internet Sites 2010 (aka FSIS), understand that FSIS was composed of three main structural components, as shown below. They were:
Core FAST search engine (FAST ESP 5.3), in red in the figure below, which was a complete search engine, to which new components were added on the content side and query side.
Content Transformation Services (CTS), which was responsible for content processing and ingestion and introduced the concept of processing flows. Flows are much more dynamic and expressive than the straight linear pipeline architecture found in FAST Search for SharePoint 2010.
Interaction Management Services (IMS), which managed all query and result processing, using processing flows.
In SharePoint 2013, the underlying dataflow engine for content processing, which was first introduced as CTS, has been extended and enriched to host the content processing tasks for the entire SharePoint platform. Successful integration of a new content processing flow for search and enrichment for the whole SharePoint platform is a significant investment and engineering achievement. The benefits are potential scale-out, improved management, cloud-ready system architecture, and an improvement to Microsoft's ability to integrate new content enrichment features inside the SharePoint platform.
- New format handlers implement document parsing. They replace IFilters for OOB document metadata.
- Higher throughput for Office document types and for PDF.
- Automatic content-based file format detection removes dependencies on file extensions.
- Content processing throughput and error reporting (which is tied to crawl reporting) is comprehensive and far simpler to understand.
- Search analytics processing (which we cover in more depth in the chapter on Analytics) is an important new platform capability.

The analytics module feeds information back into Content Processing for a variety of purposes, for example, to improve search relevance based on user behavior. Usage and search action events (document exposures and document click-throughs) are recorded into a new SharePoint 2013 analytics store. They are then processed into a form that enables search relevance to account for, for example, popular content, relevant query terms, or, in the context of recommendations, boosts for related user/related item results. This also supports search history boosts.
New Capabilities for the IT Pro

For the IT Professional concerned with how content is processed, enriched, and made ready for search, these SharePoint 2013 content processing feature areas stand out (we cover these in more depth in the chapter on Linguistics):
- Linguistics features, in particular around phonetic search for person names, continue to improve in scope and depth. Cross-lingual name search (via People Search), for example, is a remarkable feature that makes it easy to find people (since human names are notoriously hard to spell right).
- Entity Extraction management, which was previously done via a set of separate files and ad hoc PowerShell scripting, has moved into the Term Store. This is a big win because there is now a good UI and a robust set of tools with it.
Hooks for the Savvy Developer

For developers familiar with the extensibility of FAST Search for SharePoint, SharePoint 2013 offers similar mechanisms. However, the content processing flow and search index are not as open as with previous FAST platforms; they are more of a streamlined and closed utility. You will enjoy how easy it is to set up and operate these capabilities, and how little head-scratching you do in development, but you will be frustrated at how little you can get at. This is a sensible tradeoff in the context of a major platform upgrade and in accommodation of a hosted multi-tenant deployment model (O365). The capabilities and the ability to extend them are still there, but they feel limited. There are times when it takes sophistication and inventiveness to do what you want with the hooks provided.

The extension point for content processing is the Content Enrichment Web Service (CEWS). This is a new mechanism to enable content processing, called from a content processing flow at a single point, as shown below. We will cover CEWS in more depth in its own chapter, and touch on its applications in the chapter on Linguistics.
* Note the CEWS call-out is not part of O365 and is only available for the Enterprise Edition of SharePoint 2013. SharePoint's management of content processing is highly scalable and streamlined. SP2013 content processing straddles the on-premises deployment of SharePoint and the deployment of SharePoint in hosted form via O365. If content enrichment beyond what is provided in SharePoint 2013 is important for your application, especially for content you already have, prepare to look for custom solutions that leverage the Content Enrichment Web Service.
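To make the idea of a single-point enrichment callout concrete, here is a minimal conceptual sketch in Python. It is not the CEWS API (the real service is a web service contract that is covered in its own chapter); the function name `enrich` and the property names `Title`, `Body`, and `WordCount` are assumptions chosen purely for illustration. The shape is what matters: the flow hands over a set of properties for one item, and the callout returns only the properties it wants to add or overwrite.

```python
def enrich(input_properties):
    """Hypothetical enrichment callout (illustrative only, not the CEWS contract).

    Takes the properties sent by the content processing flow for one item and
    returns only the properties that should be added or overwritten.
    """
    output = {}
    body = input_properties.get("Body", "")
    title = input_properties.get("Title", "").strip()

    if not title:
        # Fall back to the first non-empty line of the body as a title.
        for line in body.splitlines():
            if line.strip():
                output["Title"] = line.strip()[:80]
                break

    # Add a simple derived property (hypothetical property name).
    output["WordCount"] = len(body.split())
    return output


item = {"Title": "", "Body": "Quarterly report\nRevenue grew in all regions."}
enriched = {**item, **enrich(item)}  # the flow merges the returned properties
print(enriched["Title"], enriched["WordCount"])
```

Because the callout sees one item at a time and returns a small set of properties, it stays easy to reason about, which fits the "streamlined and closed" character of the platform described above.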
Deeper Dives
MSDN Section on Content Enrichment Web Service (CEWS)
TechNet content processing description
Linguistics Processing
Linguistic processing, which aims to leverage the meaning of documents or words, is the special sauce of search and one of its most mysterious and difficult to understand areas. Human language is a tricky thing, and algorithms aimed at understanding it are complex and imperfect, yet this is what makes it seem like search "just works" for end users. Linguistic tools, such as spellchecking of queries or grammatical normalizing of content or queries, can greatly simplify the user's search experience. Covering a wide variety of languages (SharePoint 2013 search covers 85 languages, from Afrikaans to Zulu) also means that you can find content that is generated by users across geographic boundaries.
In preparing content for indexing, linguistics are applied in stages, each one building on the previous one. The figure below gives an overview of these steps in what is often called the pipeline. (The steps in gray are not OOB, but illustrate some of what is possible by adding third-party components.)
Linguistic processing is applied to both content and queries (as shown above), using a similar framework under the hood. As mentioned in other chapters, the content processing and query processing components have a heritage from modules called CTS and IMS, and they share an underlying framework for processing flows.
First, files must be parsed, teasing the indexable text out of PowerPoint, OneNote, PDF, etc. During this process the language is detected, since processing English is different from processing Japanese. Words and patterns (dates, times, URLs, etc.) are found, based on the text and language. Next, the magic begins: a variety of Text Analytics technologies are applied. Stemming or lemmatization (which allows forms of the same base word to be matched, for example "sing", "singing", and "sung", or "incorporate" and "incorporating"), synonyms (matching, for example, "car" and "auto"), and concept detection of various forms deal with the wide variety of ways humans say essentially the same thing. Entity extraction, which is a key linguistic capability for SharePoint 2013 search, and techniques like categorization, relationship extraction, and sentiment analysis add metadata
that greatly improves the ability to find and explore information.

Microsoft has the deepest natural language processing development capability on earth, because it has labs around the planet. This was strengthened with the FAST acquisition, since one of FAST's specialties was linguistics applied to search. Strong language processing features show up in SharePoint 2013 search, which has continued a tradition of steady improvement in this area and has some extremely strong linguistic technology, including many improvements over SharePoint 2010. Some of the changes will be directly apparent to the end user, but many of them show up in subtle ways, and some are only relevant to specialists handling unusual situations.

For those coming from SharePoint 2010 search, there are some remarkable new capabilities and improvements. For those coming from a FAST-based platform, the capabilities are familiar, but are now much easier to work with. There are some capabilities you are used to from FAST which are no longer there; we mention the major ones as we cover each area. Some changes in SharePoint 2013 will be noticeable to nearly all search deployments: document parsing is foremost, but also synonym management and custom entity extractors. Some changes will only be apparent or available to those extending search, and some will be visible only to a specialized group of deployments.
Changes in Document Parsing

SharePoint 2013 introduces a completely new document parsing facility, with some big improvements. These changes include:
- Automatic file format detection no longer relies on file extensions, eliminating the kind of errors that happened when users or applications did creative things like making .memo files.
- Deep link extraction works like a table of contents generator and allows you to click into previews for Word and PowerPoint formats.
- Metadata extraction for titles, authors, and dates provides better metadata and is much easier to understand than the techniques used in SharePoint 2010 (where Optimistic Title extraction was one of the top sources of user confusion).
- High-performance format handlers for HTML, DOCX, PPTX, TXT, Image, XML and PDF formats mean faster crawls and indexing.

The new parsing facility is enabled by default and supports 55 of the most common file formats, including things like Montage, Visio and OneNote. By comparison, the 2010 Microsoft Filter Pack supported 15 formats, and the Advanced Filter Pack (available for FAST only) supported 422. For most deployments, this means you will no longer have to seek out third-party IFilters, though the IFilter API is still supported and there is a rich assortment of IFilters on the market that cover file types beyond the OOB 55.
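Content-based format detection, as described in this section, generally works by inspecting the leading bytes of a file rather than trusting its extension. The sketch below illustrates that general technique in Python; it is not SharePoint's detector, and the function name `detect_format` and the short signature table are assumptions made for the example. The byte signatures themselves are the widely documented ones for PDF, ZIP containers (used by DOCX, PPTX, and XLSX), OLE2 compound files (legacy Office formats), and PNG.

```python
# Well-known leading byte signatures ("magic numbers") for a few formats.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "zip"),                          # DOCX, PPTX, XLSX containers
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2"),   # legacy DOC, XLS, PPT
    (b"\x89PNG\r\n\x1a\n", "png"),
]


def detect_format(data: bytes) -> str:
    """Return a format label based on the first bytes of the content."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"


# A PDF saved with a creative extension such as "notes.memo" is still a PDF.
file_name = "notes.memo"
content = b"%PDF-1.7\n%rest of the file"
print(file_name, "->", detect_format(content))
```

A production detector would look deeper (for example, inside a ZIP container to tell DOCX from PPTX), but the principle is the same: the content decides the format, not the name.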
Other Changes You'll Notice

Language detection has changed with SharePoint 2013. In SharePoint 2010, language detection was done chunk-wise on document parts like paragraphs. Now a much larger part of the document is used. The advantage is that language detection is generally better: the more language you can look at, the more reliably you can tell what language it's in. There is a downside to this approach, however: documents that have mixed languages (partly in English and partly in French, for example) aren't handled as well.

The Term Store (MMS) is well integrated with search now, which provides a number of big benefits. Customizations to Query Spelling Correction, both inclusions and exclusions, are now managed in the term store (shown below).

Property Extraction (previously a FAST-only feature) is also manageable in the term store (shown below). However, only company names are available; if you were using property extraction for people names or place names, you'll need to find a third-party alternative.

Some things are still managed outside of the term store: synonyms via a UI or PowerShell, custom extractors via PowerShell, and spell correction via a dynamic dictionary based on content in the index or a static OOB dictionary.

Offensive Content Filtering was a feature that could be enabled in FAST Search for SharePoint. This feature made it easy to shield users from the obscenities and profane language that are found in content (even business content) remarkably often. However, it is no longer supported with SharePoint 2013, so you'll need to find a third-party alternative if this is important to you.

Substring search, another FAST-only feature, was also removed. This provided n-gram matching without taking word boundaries into consideration, which was good for applications like part numbers.

Changes in Extensibility

There are notable changes in how you can extend linguistics processing with SharePoint 2013. These include:
- Custom Extractors (previously FAST only) are more powerful, and you can have more of them (12 rather than the five allowed with FAST Search for SharePoint). These allow you to provide a list of terms (via PowerShell) and match them in the content, populating managed properties with consistent metadata, which is the lifeblood of information discovery.
- Custom Word-Breaking now requires only one language-independent dictionary, rather than the one-dictionary-per-language facility in SharePoint 2010.
- Customized stemming (done via registry settings in SharePoint 2010) is no longer supported. Third-party specialists will find ways to customize this level of linguistics and handle specialized cases.
- The biggest change is the availability of the Content Enrichment Web Service (CEWS). This provides a way to add linguistic processing of any type, such as the examples in gray in the pipeline figure above (concept extraction, relationship extraction, geo-tagging, summarization, etc.). With FAST Search for SharePoint, it was possible to extend the content processing pipeline through a sandboxed application, but this was both slow and limited in the information it could access. SharePoint 2013 introduces a much more open API which makes it possible to add specialized linguistics at lower levels as well as sophisticated text analytics. CEWS is covered in more depth in a separate chapter.
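The custom extractor idea described above (a list of terms matched in content, with the matches written to a managed property) can be illustrated with a small dictionary matcher. This is a sketch of the general technique only, not how SharePoint implements it; the helper `build_extractor`, the sample term list, and the property name `ProductNames` are all hypothetical.

```python
import re


def build_extractor(terms):
    """Compile a case-insensitive, whole-word matcher for a list of dictionary terms."""
    # Try longer terms first so multi-word terms win over shorter overlapping ones.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = r"\b(?:" + "|".join(re.escape(term) for term in ordered) + r")\b"
    regex = re.compile(pattern, re.IGNORECASE)
    canonical = {term.lower(): term for term in terms}

    def extract(text):
        found = []
        for match in regex.finditer(text):
            # Normalize every spelling variant to the dictionary's canonical form.
            term = canonical[match.group(0).lower()]
            if term not in found:
                found.append(term)
        return found

    return extract


extract_products = build_extractor(["SharePoint", "FAST ESP", "Office 365"])
doc = "We migrated from fast esp to SharePoint; sharepoint search is now the default."
managed_properties = {"ProductNames": extract_products(doc)}
print(managed_properties)
```

The normalization step is the important part: whatever casing authors use, the property ends up holding one consistent value, which is what makes refiners and filters built on it reliable.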
Putting Linguistics to work

All of these cool capabilities come into their own when developing more specialized search-based applications. This has become much more powerful with the application development hooks and tooling available, and you should expect to see some amazing applications built on SharePoint 2013 using these capabilities.
Deeper Dives
TechNet article on linguistic search features in SP 2013
MSDN Section on Custom Word Breakers
Longitude AutoClassifier Overview
CHAPTER 4
Architecture, Deployment, and Operations

Getting Under the Hood
New Architecture, Single Search Engine Core

The first and foremost change to search within SharePoint 2013 is that there is only one search engine core. The idea that you would use the FAST engine for content and the SharePoint engine for people is completely eliminated in this release. There is now only one search engine within the SharePoint 2013 platform, which you can think of as bringing FAST to all search tiers. Powerful indexing, linguistics, extraction, and query expressiveness that are the heritage of FAST are now evident throughout the platform.

To appreciate the evolution from SharePoint 2010, it's worth looking at the history in this area. The acquisition of FAST Search and Transfer in 2008 was regarded by the industry as a major step forward in taking the lead in the enterprise search marketplace. The incorporation of FAST within the overall SharePoint 2010 architecture allowed organizations to leverage enterprise-class search capabilities in a platform that was within the cost and budget requirements of today's enterprises. Unfortunately, the acquisition occurred midway between release cycles. This forced Microsoft to determine which features would be available in the wave 14 release (SharePoint 2010), and which features would need to be included in the next release.
FAST Search for SharePoint is a very powerful product but there are numerous rough edges due primarily to the lack of time in the previous development cycle. The timeline also resulted in a hybrid architecture, with separate SharePoint and FAST farms, as shown below. This could be awkward and confusing to work with.
With the release of SharePoint 2013, the full realization of Microsoft's investment in FAST Search and Transfer is now evident. The capabilities now available take enterprise search to a whole new level. They are the result of a new search architecture. The architecture, shown below, is relatively simple, though much of it is new.
There is a good walkthrough of the components on TechNet, which we won't repeat here. Each of the components shown is covered in at least one chapter of this book as well. However, before we move forward, there are a few essential things to understand:
- Search is fully integrated into SharePoint, and there is no longer a separate Search Server. Certainly, a SharePoint 2013 server or services farm can be used only for search. To do this, you do want to have the MMS (term store) and User Profile service at minimum, much as you did in SharePoint 2010.
- There are four different databases, each independent of the others. All of them can be partitioned, mirrored, and managed. The Crawl database scales with the amount of content crawled, so this is typically the database that has multiple instances in a large search deployment.
- Every component can be scaled out for capacity and for fault tolerance. Previously, there could be only one Search administration component, which meant you had to do creative workarounds to create truly fault-tolerant configurations.
- Search is now multitenant, except for a few things such as the CEWS API. Much more administration can be done at the site collection (or tenant) level.
Not just a Merger of FAST and SharePoint

You can think of this architecture as bringing FAST to every tier of SharePoint, but it is much more than that. This is not a mere merging of FAST and SharePoint: nearly every component in this architecture is new. Just as SharePoint 2013 is a major architectural release overall, search is in many ways a radical re-architecture. The computational platform underlying the search-based interaction for SharePoint 2013 is a powerful distributed dataflow engine (called NodeRunner). An illustration that underscores this is shown below. This is the same architecture, though not using the official terminology. The Crawl and OOB connectors (aka crawl component) are the least changed part of search in SharePoint 2013, and they retain the mssearch.exe name under the hood. The Content Processing Framework and Interaction Management Framework (aka Query Processing Component) are running flows, similar to CTS and IMS in the FAST Search for Internet Sites 2010 product (see the Content Processing chapter). These are running under NodeRunner. So is the search core, which is neither the FAST ESP core nor the SharePoint Search core.
It's a new, next-gen search core that was the result of a decade of research and development at FAST, hardened through the Microsoft development process. Also new in this architecture is the Analyzer (aka Analytics Processing Component), which we cover in the chapter on Analytics. The content processing component writes information about links and URLs to the link database. In turn, the analytics processing component writes information related to the relevance of these links and URLs to the search index via the content processing component. This enables some powerful capabilities like recommendations and usage-based relevance enhancement.

If you look inside the search service, you will find several search processes. These include MSSearch.exe (for the crawl component), NodeRunner.exe (which hosts search components), and a Host Controller (a Windows Service that supervises NodeRunner processes). The Host Controller monitors NodeRunner processes, detects failures, and restarts processes if they do fail. There can be multiple NodeRunner instances on the same server, each hosting one search component. On a default single-server install there will be five instances of the NodeRunner.exe process, as shown to the left.

Although there is a fascinating dataflow engine and a next-gen search core, those are not exposed to developers; the only points of configuration for interaction are ResultSources, QueryRules, and CEWS. In SharePoint 2013, configuration alternatives are circumscribed to assure that no configuration would result in excessive resource consumption for that instance relative to other instances that may be running through the same service. So, QueryRules run effectively in a sandbox that restricts calls to non-SharePoint services.
Full Range of Search Topologies

When people think of architecture, some think of deployment topologies: machines, nodes, and processes. There is lots of good material on this physical architecture, which we will not repeat here. But we'll give you a flavor. As with SharePoint 2010, the minimum configuration is just one node, and the minimum configuration with fault tolerance is two nodes (FAST Search for SharePoint required two and four nodes respectively). Scaling from there to ultra-scale search (including the scale of O365) is possible, and you can grow incrementally. The medium farm topology, shown below based on the TechNet recommendation at www.microsoft.com/en-us/download/details.aspx?id=30383, is capable of supporting approximately 40 million items in the index. Note that we expect the density (items per node) of SharePoint 2013 search to go up dramatically over time, just as FAST Search for SharePoint density did. The initial focus has been on scale-out in order to support O365, not on density.
What Matters Is That It Works

As a developer or user you don't really need to know about the underlying algorithms or dataflow engine used in search. In fact, the search algorithms used by almost all search cores are a complex combination of linguistics and statistics, tuned heuristically. You can enjoy the result, and learn how to operate it well, apply it to different applications, and develop on top of it.

What will you notice about this architecture? There are many things beyond the capabilities that meet the eye. For example:
- The core engine is different, so relevance is different. Since Microsoft has a lot of data with which to tune relevance, you'll notice first that the relevance is better OOB. But if you had customized relevance or spent time focused on it, you may have some work to do, or you may have a pleasant surprise.
- Indexing is atomic in the new search core. That has some very interesting implications, but mostly you'll notice that it's more robust and that you can do normal backup and restore. For nearly all search engines it's a dirty secret that data can occasionally get lost in indexing (so one in a million items may go missing), and an outage can result in needing a full re-index, but this core will be different.
- Scale-out is possible on a huge scale: big enough to run O365, and big enough for any challenge you can throw at it. FAST was always great at large scale, but this is a different level; there should be less black art to building out big or high-throughput systems.

Ultimately, what matters is that it works. Other than the dogfooding done at Microsoft (which is pretty big), there isn't much production experience with SharePoint 2013 yet, but every indication is that this is an architecture that is extremely solid for both SharePoint generally and search specifically.
Indexing and Partitions

In SharePoint 2013, there is a brand new search indexing core that is optimized for high-volume throughput and overall scalability. The index component is the core of search; it accepts and administers both content and queries. Content data is indexed and stored in index partitions while the index component simultaneously handles queries and generates results. Like many other features of SharePoint 2013, the Index Component and related architecture resemble FAST, with the ability to separate indexes into partitions for query loads and data volumes alike. This is a significant improvement over SharePoint 2010. The index is completely contained in these partitions and stored in the file system, without requiring a separate dip into SQL for metadata or for security entitlements. This is another huge improvement over SharePoint 2010, where the merge of results and security prevented deep refinement and could also slow performance to a snail's pace.
Deeper Dives
SharePoint 2013 Search Logical Architecture
TechNet Search technical diagrams
TechNet on Planning for SharePoint 2013
Index partitions are separate, which provides a lot of flexibility. They can be stored individually on disk in a file set. Alternatively, they can be further divided into discrete sections, each containing a unique index component.
Microsoft has also developed a new nomenclature to describe the structure of the index. In FAST Search for SharePoint 2010, the structure of the index and configuration was described in terms of rows and columns. Adding columns increased the amount of content you could index, and adding rows increased query volume throughput and redundancy. SharePoint 2013 adopts a Partition/Replica model to define functions within the overall search index, as shown below. Partitions are logical divisions of the overall search index. The entire index is composed of the aggregation of all the primary replicas across the logical partitions. When content is sent to the indexing component, a transaction is generated to acknowledge receipt of the content. Each partition then indexes the content from this transaction log. Secondary replicas are created as read-only copies of the primary replica, for scaling query volume or adding redundancy to the overall architecture.
Within a partition, there is only one primary replica that is responsible for writing data in the partition. Each partition can be served by one or more replicas of the index. The indexing component is responsible for managing and distributing the index across partitions. If an additional partition is added, the indexing component is responsible for the redistribution of data across all the partitions. It is important to note that you can add additional partitions without re-indexing the data, but removal of a partition will force a complete re-indexing of all content.

A Simpler, More Robust Approach

This new structure of the search index in SharePoint 2013 allows for a fully redundant, scalable means of indexing content. The fact that you are not copying index files from server to server and row to row means there is considerably lower latency in making search indexes replicated and available. This also significantly reduces the server-to-server chatter that existed in previous versions. Each partition operates independently, thereby increasing throughput and performance of the overall search sub-system. In a nutshell, the benefits of this approach are:
1. Better indexing throughput
2. Less network chatter
3. Faster availability of the search index

As previously mentioned, the indexer is now atomic, which is a major breakthrough in search technology. Though the change is invisible to you, you'll notice that it's more robust and that you can do normal backup and restore. Indexing and partitioning are deep stuff, and this is a new core capability done well.
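The partition/replica model described above can be sketched as a toy data structure. This is a conceptual illustration under stated assumptions, not the SharePoint index implementation: the class names `Partition` and `Index`, the in-memory dictionaries standing in for replicas, and the hash-based routing rule are all hypothetical. It shows the three ideas from the text: content is first acknowledged in a transaction log (journal), only the primary replica in each partition is written, and secondary replicas are read-only copies of the primary.

```python
import zlib


class Partition:
    """One logical slice of the index: a journal, one primary, read-only secondaries."""

    def __init__(self, secondary_count=1):
        self.journal = []      # transaction log acknowledging received content
        self.primary = {}      # the only replica that is written directly
        self.secondaries = [{} for _ in range(secondary_count)]

    def submit(self, doc_id, text):
        self.journal.append((doc_id, text))  # acknowledge receipt

    def apply_journal(self):
        # Index the journaled content into the primary replica...
        for doc_id, text in self.journal:
            self.primary[doc_id] = text
        self.journal.clear()
        # ...then refresh the secondaries as copies of the primary.
        for replica in self.secondaries:
            replica.clear()
            replica.update(self.primary)


class Index:
    def __init__(self, partition_count, secondary_count=1):
        self.partitions = [Partition(secondary_count) for _ in range(partition_count)]

    def route(self, doc_id):
        # Illustrative routing rule (an assumption): a stable hash of the document id.
        return zlib.crc32(doc_id.encode("utf-8")) % len(self.partitions)

    def add(self, doc_id, text):
        self.partitions[self.route(doc_id)].submit(doc_id, text)

    def commit(self):
        for partition in self.partitions:
            partition.apply_journal()

    def total_items(self):
        # The whole index is the aggregation of all primary replicas.
        return sum(len(p.primary) for p in self.partitions)


index = Index(partition_count=2, secondary_count=1)
for i in range(10):
    index.add(f"doc-{i}", f"text of document {i}")
index.commit()
print(index.total_items())
```

In this toy model, adding query capacity means adding secondaries to a partition, while adding content capacity means adding partitions, which mirrors the old rows-and-columns vocabulary in the new terms.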
Analytics
Analytics is an often overlooked area, but it has a crucial role in search, both in providing insight into user behavior and system operations and in improving the user experience. SharePoint 2013 has a new analytics architecture, which merges web analytics (where people click and navigate) and search analytics (what people search for and what results they get). This is a great improvement over SharePoint 2010, where the web analytics service application was quite limited in both capability and scale. The result is called the Web Analytics Platform, which has been completely redesigned and integrated into the search service application of SharePoint 2013. The analytics architecture consists of the analytics processing component, the analytics reporting database, and the link database (as shown below). The analytics processing component analyzes crawled items (search analytics) and how users interact with search results (usage analytics). It uses the information to improve search relevance, and to create search reports, recommendations, and deep links.
The Analytics Processing Component extracts two kinds of information:
- Search analytics information, such as links, anchor text, information related to people, and metadata, from items that it receives via the content processing component. It stores this information in the link database.
- Usage analytics information, such as the number of times an item is viewed, from the front end via the event store.

The analytics processing component analyzes both types of information. The results are then returned to the content processing component to be included in the search index. Results from usage analytics are also stored in the analytics reporting database for reporting purposes. The analytics component updates the SharePoint search index at time intervals set via a timer job, so it is independent of the crawl schedule. This can be confusing if you are trying to understand why search relevance changed. There is an extension point for custom events, but the analytics processing and search index update data flows are sealed from enrichment updates outside the SharePoint 2013 crawl.

The results are most visible to the user as reports and recommendations. But there are several other ways that analytics shows up:
- Search relevance is enhanced based on user behavior (views, click-throughs, etc.).
- Popularity of content and of topics in discussion threads, which is driven by the number of views as well as the number of unique users who view an item, can be viewed directly.
- Popularity can also be used to create views through the Content by Search (CBS) Web Part.
- Usage analytics in WCM are particularly important, since they provide essential insight into the effectiveness of your web site. These analytics are search driven, built to scale (scaling was a weakness in SharePoint 2010), and open for extension. A Top Pages web part is included by default.
- Some data, like view counts, is also pushed into the index so it can be included in search results, sorted on (i.e. what's most viewed), etc.
- Personalized search queries and personal query suggestions in SharePoint 2013 are based on analytics data and usage information for each user.
- Recommendations (both item-to-item and popularity based) are available through this approach, as shown below. The "recommended for you" list is simply a preconfigured Content by Search web part; it looks like a static list but it's generated dynamically by search.

The addition of both the Link database and the Analytics Reporting database provides for a great deal more personalization, analysis, and relevancy within the engine. The Analytics reporting database has been added to keep track of all forms of analytics. Search Analytics analyze crawled items and how users interact with search results. These actions are stored
in the event store within the Web Front End (WFE) server and are regularly pushed to the analytics processing component, where the actions are analyzed and reconciled. They are then pushed into the analytics reporting database and made available to the query and processing components. This allows search to keep track of user actions, queries, and trends to provide the user with better search results and suggestions. This database now powers features such as personal and engine-wide query suggestions, favorites, and other search personalization components not found in any other enterprise search platform today.

Within the analytics system, there are five parts:
- Event: Each item comes into the system as an event with certain parameters
- Filtering and Normalization: Each event is looked at for special handling, normalization, and filtering; some are filtered out
- Custom Events: You can configure up to 12 custom events in addition to what comes OOB
- Calculation: Sum or average across events
- Reports: A number of default reports are available, including top queries, most popular documents in a library or site, and historic usage of an item (view counts)
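The event, filtering, calculation, and report stages listed above can be illustrated with a few lines of Python. This is only a sketch of the general flow, not the analytics processing component itself; the event tuple layout, the function name `summarize`, and the rule of ignoring a named set of users (for example a crawler account) are assumptions made for the example.

```python
from collections import defaultdict


def summarize(events, ignored_users=frozenset()):
    """Aggregate raw usage events into (item, view_count, unique_users) rows.

    events: iterable of (event_type, item_id, user_id) tuples (hypothetical layout).
    """
    views = defaultdict(int)
    users = defaultdict(set)

    for event_type, item_id, user_id in events:
        # Filtering: drop events from ignored users and non-view events.
        if user_id in ignored_users:
            continue
        if event_type.strip().lower() != "view":
            continue
        # Normalization: treat item ids case-insensitively.
        item = item_id.strip().lower()
        # Calculation: sum views and collect unique viewers per item.
        views[item] += 1
        users[item].add(user_id)

    # Report: most viewed first, ties broken by unique users, then by name.
    report = [(item, views[item], len(users[item])) for item in views]
    report.sort(key=lambda row: (-row[1], -row[2], row[0]))
    return report


events = [
    ("View", "/docs/a.docx", "ann"),
    ("view", "/docs/A.docx", "bob"),
    ("View", "/docs/b.pptx", "ann"),
    ("View", "/docs/a.docx", "ann"),
    ("View", "/docs/b.pptx", "crawler"),
    ("Click", "/docs/b.pptx", "bob"),
]
for row in summarize(events, ignored_users={"crawler"}):
    print(row)
```

Keeping both the view count and the unique-user count matters because, as noted earlier, popularity is driven by the number of views as well as the number of unique users.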
48
The figure below shows an overview of the data flow for usage analytics, usage events, and recommendations.
Note that 2010 web analytics aren't supported when running in 14 mode, so running in 14 mode means running without any analytics.

Better Analytics Mean Better Search

The quality of search results has a direct correlation to the quality of the query and the volume of information that you provide to the search engine. In SharePoint 2013, the addition of the analytics reporting database significantly increases the quality and quantity of information that is provided to the search engine. Knowledge about the person asking the question and the community asking the question greatly improves the quality of results provided by the engine, as well as improving the quality of queries the user issues.

Deeper Dives
TechNet overview of Analytics in SharePoint 2013

Federation and Result Sources

Federation has been present in SharePoint since Microsoft released Search Server 2008 and Service Pack 2 for MOSS 2007. In a nutshell, this is the ability to query multiple search indexes on behalf of the user and to return all of these results together in a single view. Thanks to federation, users no longer have to use multiple search centers in order to search all content accessible within their organization. Instead they can go to a single search page and get all results available in one place.
Federating or Indexing?
Whenever someone is newly introduced to federation, the immediate next questions are: how does federation relate to indexing? Why should I continue to index remote systems if I can federate them? The truth is that indexing, if possible, is always better. If you index the content you can control relevancy, freshness, performance, faceted navigation, and filtering for the end users (among other things). When you federate across search indices, you essentially relinquish control of these and become dependent on what the other system is capable of. With federation, your page will also be as slow as the slowest search engine queried and as relevant as the weakest search engine queried. So federating results must be done carefully.
Federation has proven very useful for scenarios where indexing may not be desirable or even feasible. For instance, your content may be spread across multiple offices with low-bandwidth connections, making any remote crawl last for days. In such conditions, you would not be able to keep your index fresh enough for your end users. Another scenario is when you have so much content to index that it may not fit within a farm. Imagine, for instance, a 50,000-employee company wanting to search across SharePoint and e-mail. Even at a low estimate of 10,000 items per mailbox (that's roughly six months for an information worker), this would represent half a billion items to index! Finally, the remote source may not allow for crawling, technically or through license restrictions (imagine a secured deep-web content provider). In these cases federation is pretty much the only way to go.
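The two quantitative points above can be checked with a short sketch. It is a back-of-the-envelope model only: the function names are made up for illustration, and the latency function assumes that all federated sources are queried in parallel and that the page waits for every one of them.

```python
def total_items(employees: int, items_per_mailbox: int) -> int:
    """Items a crawler would need to index if every mailbox were crawled."""
    return employees * items_per_mailbox


def federated_page_latency(source_latencies_s: list[float]) -> float:
    """Page latency under the assumption that sources are queried in
    parallel and the page waits for all of them: the slowest source wins."""
    return max(source_latencies_s)


if __name__ == "__main__":
    # The 50,000-employee example from the text.
    print(total_items(50_000, 10_000))               # 500000000
    # Three hypothetical sources; the 2.5 s source sets the page latency.
    print(federated_page_latency([0.3, 0.4, 2.5]))   # 2.5
```

The same reasoning explains why a single slow remote index is enough to degrade an otherwise fast search page.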
Result Sources for Federation in SharePoint 2013

SharePoint 2013 offers improved federation capabilities via a functionality called Result Source. On top of the Open Search protocol already supported in MOSS 2007 and SharePoint 2010, you can now federate results from remote SharePoint farms via result sources. This allows SharePoint 2013 to better cover distributed organizations. A result source is quite easy to configure, as shown below.
While the options in SharePoint 2010 for providing organization-wide search were limited to a multi-search-center setup or a published centralized search service, SharePoint 2013 lets you federate across farms. You can now have one farm per region or office location and federate results across farms using result sources. You can do the same between your intranet and extranet farms. While simple on the surface, this functionality fills a serious gap that existed in the overall scalability of SharePoint 2010. In the marketplace, FAST and SharePoint were being criticized for not having a global systems architecture. The approach was to tell users to centrally index all content in a large central farm, if the latency allowed. For global organizations, this was often not feasible.
There are limitations to the remote result source construct. It is limited to SharePoint 2013 and requires that all federated farms be upgraded to SharePoint 2013. Results are not interleaved, which is what users typically expect; rather, they are provided in result blocks. Refiners are also not combined in any way. Overcoming these limitations is an exercise left to partners. But despite these limitations, remote result sources are a major step forward and a great feature to use.
Result sources also take over the function of scopes in SharePoint 2010. They are a more powerful tool than both scopes and federation, and are worth getting to know.
Security via OAuth

SharePoint 2013 can also provide security-trimmed results in a much more streamlined way. The Kerberos protocol is no longer a prerequisite to providing security-trimmed results. Instead, SharePoint 2013 offers strong security support through federation by leveraging the claims-based authentication mechanism built into SharePoint 2013 or by using the single sign-on/Secure Store Service. A trust must be established between the farms using a new method called OAuth, which allows the passing of the current user's claims to the remote farm when making the search request. This is similar in concept to establishing a trust between servers to consume service applications. OAuth is a new methodology replacing Kerberos shared authentication. When combining result sources and result blocks, administrators can offer their users a single list of results comprising both local and remote results. The remote results are shown as result blocks (one per source), either above all results or merged within the local results returned. Note, however, that faceted navigation and property filtering are still driven by local content only and do not reflect any filters or facets available from the remote indexes.
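To make the result-block behavior concrete, here is a minimal conceptual sketch. It is not SharePoint code: `ResultBlock`, `assemble_page`, and the sample data are hypothetical names chosen for illustration. It only shows that each remote source contributes one block that is placed as a unit, rather than having its hits interleaved with local hits by rank.

```python
from dataclasses import dataclass


@dataclass
class ResultBlock:
    source: str          # hypothetical result source name
    hits: list[str]      # remote hits, kept together as one unit


def assemble_page(local_hits: list[str],
                  remote_blocks: list[ResultBlock],
                  insert_at: int = 0) -> list:
    """Place every remote block, whole, at position insert_at among the
    local hits. insert_at=0 puts the blocks above all local results."""
    page: list = list(local_hits[:insert_at])
    page.extend(remote_blocks)              # one block per remote source
    page.extend(local_hits[insert_at:])
    return page


if __name__ == "__main__":
    local = ["local-1", "local-2", "local-3"]
    blocks = [ResultBlock("Remote farm (hypothetical)", ["remote-1", "remote-2"])]
    for entry in assemble_page(local, blocks, insert_at=1):
        print(entry)
```

In this model the refiners would be computed from `local_hits` alone, which mirrors the limitation noted above that facets and filters reflect local content only.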
Exchange 2013 Result Source

SharePoint 2013 also allows administrators to federate results between SharePoint and Exchange, providing a unified search experience where users can search both SharePoint content and their mailboxes through a single search center without SharePoint having to index the mailboxes. Exchange remains in control of indexing the mailboxes, and users can search across systems using federation with no additional hardware requirement. This is possible because Exchange 2013 has the same underlying search core (see the Exchange Search chapter).
Office 365 and SharePoint Online

Office 365 has made organizations more agile by enabling them to consume SharePoint as a service without having to worry about capacity, backup, or maintenance. However, it also created a new challenge: organizations migrating to the cloud were now facing siloed data, with some content available online and some content available within the organization network only. There was no single place from which to search both sets of content. SharePoint 2013 solves this by enabling Remote SharePoint result sources to also support SharePoint Online, so that SharePoint Online can federate with the on-premises search engine or vice versa. Result sources represent a key piece of technology to help organizations migrate to SharePoint Online.
The figure below shows an overview of the Exchange Search architecture in Exchange 2010. Full-text indexes are not stored in your Exchange databases. The search index data for a particular mailbox database is stored in a directory that resides in the same location as the database files. In Exchange 2013, the exsearch capability is replaced with a new search engine and index.
Deeper Dives
Microsoft's comparison of indexing vs. federating
TechNet configuring result sources
Federation Use Cases
Federation vs. Indexing
Search in Exchange
Search in Exchange 2013 has been given a facelift. Pull back the curtain, and it is the same new search core used with SharePoint 2013, optimized for large volumes of e-mail. For comparison, Microsoft Exchange Server 2010 Search allows users to perform full-text searches across documents and attachments in messages that are stored in their mailboxes. Exchange Search (also known as full-text indexing) creates the initial index by crawling all messages in mailboxes within an Exchange 2010 database. As new messages arrive, Exchange 2010 Search updates the index based on notifications from the Microsoft Exchange Information Store service. The new search core in Exchange 2013 provides a much more powerful and effective search for Exchange users, available through Outlook and Outlook Web Access alike. Another significant outcome of this change is that Exchange 2013 can appear as a result source to SharePoint 2013, as introduced in the chapter on Federation. This opens up a number of scenarios combining e-mail and other documents. In previous versions of SharePoint, you could connect to and index Exchange public folders but not personal inboxes. That remains the same with SharePoint 2013 (unless third-party connectors are used), but now there is an ability to federate to Exchange.
The key concept to understand in regard to this functionality is that each system handles the data resident within its silo (e-mail, tasks, and contacts in Exchange 2013; documents and lists in SharePoint 2013). As discussed in the Federation chapter, there is some downside to this approach: federation does not provide the same content processing, relevance, or performance as indexing. But this level of integration between SharePoint and Exchange is a wonderful feature that will help many users. You can get a single view across Exchange and SharePoint, as shown below. One of the key new features in SharePoint 2013 that relies heavily upon this tight integration between SharePoint 2013 and Exchange 2013 is the new Enterprise Content Management (ECM) stack and the associated e-Discovery components. From the e-Discovery perspective, the integration of SharePoint and Exchange allows for in-place preservation of information within SharePoint and Exchange. The e-Discovery console provides a dashboard view of integrated, enterprise-wide case management.
A Unified View is a Better View

Since the first release of SharePoint, there has always been a desire to be able to support searching your personal inbox to provide a more holistic view of your information. In previous versions of SharePoint, there was support for indexing content from Microsoft Exchange, but only in public folders. With the release of SharePoint 2013, and because Exchange 2013 uses the same search infrastructure, it is now possible to provide federated access to personal inbox results within SharePoint 2013. The primary benefits of this approach are:
1. Exchange 2013 and SharePoint 2013 leverage the same core search subsystem.
2. Federated personal inbox results from Exchange 2013 can be included.
3. There is no need to re-index all inbox data within SharePoint 2013.
Deeper Dives
TechNet What's new in Exchange 2013
Overview of eDiscovery and In-Place Holds (SharePoint 2013)
Simpler Architecture Means Simpler Administration

SharePoint 2013 search is simpler to administer on many levels than SharePoint 2010 was. Part of this is that there is only one search engine core, and no hybrid architecture (see the One Search Core chapter). For FAST Search for SharePoint, you had to install two farms (a FAST farm and a SharePoint farm) and make them work together, including creating multiple search service applications. There was extra work in installation, extra work in configuration, and extra work in reconfiguring sites. There was also more troubleshooting because the architecture was more complex. There is now only one search core, only one installation, and only one search service application. The architecture is much simpler, as shown below. As a result, SharePoint 2013 is much simpler to install, configure, and troubleshoot.
Search Administration
There tends to be a preconception that search requires no administration. This is due in part to the simplicity of the search interface and the general lack of awareness of how search works. But it is also due to people's experience of web search, where they don't have to do any upkeep. Little do they realize that Google.com has over 4,000 people administering search full time! Administering enterprise search doesn't take that much work, but it does need to be someone's job (even if not a full-time job). There are two main levels of administration: system administration (installation, configuration, topology management), and search administration (rules, best bets, looking for no-results searches).
Multiple Administration Components

As mentioned in a previous chapter, the Search Administration Component is now fault tolerant, a big advantage for SharePoint 2013. The administration database now contains only configuration and log information (it also held security entitlements in SharePoint 2010). There are new tools to export and import configuration information, including PowerShell commands, so there are some very cool things you can do in configuration management.
New in SharePoint 2013, you now have site collection level search administration too. It's pretty similar to central administration, naturally with a few limitations. Site collection administrators can set up and manage App catalogs, do term store management and User Profile Management, as shown in the screenshot below. Site collection administrators also have the power to manage some search settings in their site collections, which is a huge step forward.
Administration at Multiple Levels

Central Administration is still your friend with SharePoint 2013, and still the place where you create search service applications. You will find some new service applications in SharePoint 2013 (such as the Machine Translation Service) and improvements to existing ones. But many of the operations will be familiar. The screenshot below shows an example list of service applications. Search administration at this level is pretty comprehensive, as you can see just by looking at the search administration screen below (note that this is from Office 365, where it's called tenant administration).
It is natural that this level of administration was introduced in SharePoint 2013 because of the emphasis on running multitenant in the cloud. Site collection administrators can start crawls, create result sources, and much more. This includes creating managed properties, which could only be done via central administration in SharePoint 2010, despite the fact that site collection or site administrators typically understand their content and crawled properties much better than central IT. Site administrators also have much more power with SharePoint 2013. Unlike site collection administrators, they cannot create managed properties, but they have significant control over the search settings that apply to their sites. The table below shows some examples of what Site Collection and Site Administrators can do.
Administering the New Mechanisms

In other chapters we described new mechanisms like Query Rules, Result Types, and Result Sources. These are very powerful for the administrator. A search service application administrator can create result sources, and site collection administrators, site owners, and site designers can also create and configure result sources in order to give powerful search options to their end users. Query Rules and Result Types can be managed down to the site level. These have a wizard for configuration (for example, the query builder interface) with a built-in preview of what the results look like. Result Sources are easy to manage, as shown below.
There are very significant improvements in Analytics, resulting from the new Analytics module. There are also better crawl reports, and process reports (see below). Since the Host Controller (described in the One Search Core chapter) is monitoring all NodeRunner processes, it can give the administrator a lot of insight into the system operations.
PowerShell is Your Friend

PowerShell support was added to SharePoint 2010 and many administrators fell in love with it, for good reason. There are even more PowerShell options in SharePoint 2013. This includes more PowerShell commands for search: general search administration, crawling, search service application, querying, metadata, and topology. In SharePoint 2013, PowerShell can now manage content sources and crawlers, not just report status. There are new options for creating a new search topology based on an XML configuration file, along with export and import commands. This means you will be able to create the same search topology in your development, test, staging, and production environments. This can be very useful for performance testing, custom development, creating standardized configurations, etc. Ranking models are still configured via PowerShell as in SharePoint 2010, but in 2013 site collection administrators now have the ability to call a specific ranking model defined by the SSA admin from within query components at the site level. This means that site collection administrators can do much more with relevance control and ranking, choosing from a library created by the central administrator.
PowerShell is available at all levels: central, site collection, and site administration, which gives you much more power. For example, you can create a PowerShell script that configures all your search settings from the very beginning: creating a search service application, modifying its settings, creating the content sources, and so on. PowerShell can also retrieve, create, or modify query results. In addition, PowerShell can get keywords, modify ranking models, and more. If you haven't learned PowerShell already, you will definitely want to learn it now.
Big Advances in Search Administration

Search administration is still a complex task in SharePoint 2013, but Microsoft made the job much easier in this new version. The new single search core provides the power of FAST with a much simpler configuration than FAST Search for SharePoint. Search topology administration is still complex, but the topology can be much bigger and much more powerful. There are improvements to administration on all sides: crawling, content processing, query processing, analytics, and user experience. This is a search that administrators can learn to love.
Upgrade and Migration

You will love the capabilities of SharePoint 2013, and you probably own them already. But how painful is it to move to them? Many organizations endured a very painful move from SharePoint 2003 to SharePoint 2007 and are still wary, despite a generally smooth move from SharePoint 2007 to SharePoint 2010. The good news is that this release has put a lot of focus on upgrades and there is a lot of good material. In order to move customers on O365 to the new release, Microsoft had to develop techniques for doing this more smoothly than ever before. The bad news is that upgrades are still tricky, especially for large and highly customized SharePoint farms. Even though the upgrade itself is fairly straightforward, there are usually lots of factors besides the software itself: the hardware necessary to handle an upgrade (there are no in-place upgrades to the new version), user awareness and education, and the work needed to take advantage of new features. There are techniques that can reduce the risk and pain of upgrades, especially for search. These include cross-version federation and search-first migration. But let's start with a look at the standard basic upgrade.
Deeper Dives
TechNet Index of Windows PowerShell cmdlets for SharePoint 2013
TechNet search topology in SharePoint Server 2013
SharePoint 2013 Developer Dashboard
TechNet Manage the search schema in SharePoint 2013
TechNet View search diagnostics in SharePoint Server 2013
Database Attach Upgrade

The only upgrade method for going from SharePoint 2010 to SharePoint 2013 is a Database Attach Upgrade. (In-place upgrades are now only for build-to-build changes). This works for both content databases and services databases.
The search databases have changed significantly with SharePoint 2013. The search administration database supports a database attach upgrade, but the search index databases do not. As with essentially all search engines, to do an upgrade you will need to recrawl your content. One very nice advantage with SharePoint 2013 is that you can use PowerShell to make this happen with much less effort. The Database Attach method does help a lot with search. Content sources, server mappings, schema, federated locations, scopes, best bets, and the like are all preserved and upgraded. As mentioned in the search administration chapter, there are tools for configuration import and export as well as PowerShell commands that can do very interesting things, including automating and tailoring the upgrade process.
Deferred Site Collection Upgrade

The visual upgrade available in SharePoint Server 2010 has been replaced by a deferred site collection upgrade in SharePoint 2013. This allows existing 2010 site collections to work unchanged in SharePoint 2013. No SharePoint 2010 installation is required; SharePoint 2013 has all of the required SharePoint 2010 files included. This process is much safer, because it is deeply backwards compatible. It is the default for all site collections upon a database upgrade, which then automatically run in 14 mode on SharePoint 2013 servers. There is a new facility for health checks along with the upgrade, and a cool capability to create Upgrade Evaluation Sites. Essentially, this makes a side-by-side copy of an existing site collection, and allows users to preview an existing site in 15 mode. Deferred site collection upgrade permits use of SharePoint 2010's UI with fewer operational hassles, while retaining the master page, JScript, SPF, and CSS applications of SharePoint 2010. This is an expensive operation, so you probably don't want to use it everywhere, but it is a great facility to allow for safe, well-managed upgrades from both the software perspective and the user perspective.
With search, an upgrade of search centers generates result templates that include the hover panel, and which have previews (when a separate Office Web Apps server or set of servers is available). Scopes are upgraded but can't be changed; they are replaced by the new Result Sources, but the corresponding result sources aren't generated automatically.
Working Across Versions: Search-First Migration

Since search is greatly improved in SharePoint 2013, it may be worth considering a search-first upgrade. This lets you get the benefit of the new features and capabilities without needing to do everything at once. You can upgrade your content farms at any pace that works for you, while serving everything from SharePoint 2013 search. This pattern uses something called SharePoint 2013 Federated Services. Only a few services can be federated this way: Search, Profile, Social, Secure Store, Managed Metadata, and BCS. But this is everything you need to do a search-first migration.
There are several steps to a search-first migration, as shown below.
In sequence, the steps are as follows:
1. Deploy and configure a new SharePoint 2013 services farm, including a search center. Migrate the search settings from the SharePoint 2010 farm. When the search-first migration is complete, this farm provides search functionality to end users who are still working in the SharePoint Server 2010 farm.
2. Crawl all content in the SharePoint Server 2010 farm by using a crawler (or multiple crawlers) in the SharePoint 2013 services farm. Continue to crawl this content regularly.
3. Configure the SharePoint Server 2010 farm to consume search from the 2013 services farm, using federated services. Some things will be best consumed by doing redirects (for example, using a new search center with the new functionality can't be done via federated services).
The search-first migration pattern opens the door to a much wider set of possibilities: hybrid solutions.
Hybrid Solutions
When you talk about the upgrade of Search from SharePoint 2010 to SharePoint 2013 there is the potential for some hybrid solutions using different versions of SharePoint or using cloud and on-premise SharePoint instances in the same company. Generally, hybrid means a combination of on-prem and cloud content in a single view. There are several ways to accomplish this, including indexing and federation as mentioned in the Federation chapter. The figure below illustrates the various permutations of hybrid configurations.
Crawling and indexing content from the cloud (such as from O365) is a very solid way to create a unified view, and has the benefits that indexing generally has: unified content processing, solid and consistent relevance and navigation, and consistently fast performance. Although this scenario is not supported by OOB connectors with SharePoint 2013, there are partner-built connectors that accommodate it. With SharePoint 2013, the remote result source construct means that a view can be created using federation, specifically between O365 and on-prem SharePoint. There are limitations to the remote result source construct. It is limited to SharePoint 2013 and requires that all federated farms be upgraded to SharePoint 2013. Results are not interleaved, which is what users typically expect; rather, they are provided in result blocks. And refiners are not combined in any way. Overcoming these limitations is an exercise left to partners. But despite these limitations, remote result sources are a major step forward and a great feature to use.
The same idea applies to more general scenarios. When you have more than one SharePoint farm, you can handle cross-version scenarios. You can have search on SharePoint 2013, while you have content and other applications on SharePoint 2010 or even on SharePoint 2007. You can have SharePoint 2013 in the cloud with SharePoint 2010 on-prem. You can include other content in the cloud that should be crawled, such as Microsoft CRM Online or Salesforce.com. With these techniques, it's possible to field a very broad search solution with different versions and different options. This helps with many things, including migration. Federation applies well to cross-version scenarios. Although SharePoint 2013 only supports same-version remote result sources, it is feasible for partners to create federation across multiple versions, which appears as a result source. A configuration like the one shown below provides many benefits. With respect to upgrades and migration, it means that legacy search systems can be left in place and federated into a SharePoint 2013 search center. While this is not as good as combining all content into a common index, it is a very useful technique that allows you to upgrade or migrate complex systems a piece at a time.
Cross-version Configurations
How do these scenarios help with upgrade and migration? If you extend them to cross-version configurations, it becomes clear. Search-first migration is an example of crawling on-prem content from on-prem search (the upper left scenario in the figure above), but across versions. By crawling SharePoint 2010 content from SharePoint 2013 search, you can provide an upgrade path that can be done a step at a time, maximizing the benefit to users while minimizing initial effort.
Cross-version Hybrid Configurations

A variant of this is support for cross-version hybrid configurations. Specifically, you may wish to adopt SharePoint 2013 online via O365 while your on-prem farms remain on an earlier version. You may not actually have a choice in the matter, since O365 will shift to SharePoint 2013 fairly quickly, faster than you may be ready to upgrade your on-prem SharePoint farms. But the remote result source mechanism in SharePoint 2013 is very powerful, and has solved many of the toughest aspects of managing hybrid configurations with O365, such as security and single sign-on. It is feasible (though not OOB) to apply this to a cross-version hybrid configuration as shown below.
Migration From Previous Search Versions

The changes in SharePoint 2013 search are powerful and far reaching. Fielding a single new search core resolves a historical challenge with Microsoft search (a complex product lineup with many different versions). One tricky aspect of this change is that migrating from previous search versions depends on the flavor of search you are migrating from.
The case of migrating from SharePoint 2010 search to SharePoint 2013 search is the best supported one. There are some gotchas in this migration, as mentioned throughout this e-book, but the process is generally smooth and well covered by Microsoft. Going from SharePoint 2010 search to SharePoint 2013 is a step up in nearly every way, so there aren't that many rough spots to consider. If you are migrating from FAST Search for SharePoint, many of the same tools and techniques apply, but there are more corner cases and more feature changes to consider. If you are moving from FAST ESP or FAST Search for Internet Sites, there are significantly more considerations. The migration patterns and techniques still apply, but you are more likely to have a heavily customized search deployment that uses special FAST features which have been supplanted by other mechanisms. There is help available, however. Microsoft has a big ecosystem of partners, and some have a specific focus, tools, and techniques for this kind of migration. You may not get direct support from Microsoft, but you can tap into this ecosystem for help.
Summary Options for Upgrade and Migration

Upgrading to SharePoint 2013 can be seamless, and there are valuable tools and processes provided OOB. The only supported approach is a database attach upgrade, so you should expect to provide extra hardware resources for your upgrade. But the deferred site collection upgrade facility provides a safe approach to upgrades and lets you delegate the work for each site collection to the appropriate owner if you like. Upgrading search is part of upgrading SharePoint, and the standard upgrade process from SharePoint 2010 to SharePoint 2013 covers search well. But search poses special challenges: the more complex and customized your search configuration is, the more challenging the upgrade will be. Search also offers solutions to many upgrade challenges for SharePoint as a whole. Since search bridges information silos, it can bridge across different farms, across different versions, and across on-prem and cloud instances. Techniques such as search-first migration, crawling remote farms from SharePoint 2013, and use of federation are available, not OOB but through Microsoft's ecosystem. A unified view across these different dimensions provides users a great experience while allowing you to upgrade or migrate one piece at a time.
Deeper Dives
Services upgrade overview for SharePoint Server 2013
SharePoint Online administration
BA Insight resources for Integrating O365 content
TechNet SharePoint Server 2010 deprecated search features
TechNet FAST Search Server 2010 for SharePoint deprecated features
CHAPTER 5
Applications and Development

New Models for Search-Based Applications
The New Development Model in SharePoint 2013

With a new development model for SharePoint 2013 and for search, the capability to extend search is much more accessible. We think this development will foster a lot of exciting search-based applications. Development with SharePoint 2013 emphasizes standard web technologies such as JavaScript and HTML, client-side programming, and remote calls. There's a focus on running applications in the cloud, and there are several options for extending the out-of-the-box capabilities of the product. There is also the option to build business solutions with little or no server-side code. JavaScript and modern HTML and CSS know-how are important for the UI designer and developer on SharePoint 2013. It should be easier for designers to use tools they are familiar with. Visual Studio 2012 offers strong tooling for both Office 2013 apps and SharePoint 2013 applications and solutions. A key goal of SharePoint 2013 for customization scenarios was to make developing applications for SharePoint much more like developing Facebook apps.
A New Programming Model

The figure below gives a bird's-eye view of the changes between the SharePoint 2010 and SharePoint 2013 programming models. In SharePoint 2010, your custom code ran either server-side in SharePoint (as fully trusted code or in a sandboxed solution), or via a Client Side Object Model (CSOM). The SharePoint 2010 CSOM is a Windows Communication Foundation (WCF) service with three different proxies to enable Silverlight, JavaScript, and .NET managed clients to call into SharePoint remotely. With SharePoint 2013, the server-side code runs off the SharePoint server farm via declarative hooks like apps, declarative workflow, and remote events, which then communicate back to SharePoint using CSOM or REST.
There are lots of advantages to this model. Traditional SharePoint development was heavy lifting and had a steep learning curve; the new SharePoint 2013 model is much more manageable, which will open up SharePoint to a much wider audience of developers. Server-side code can impact the performance of SharePoint, be complex to install and upgrade, and can't be run on public cloud services. The CSOM in SharePoint 2013 is much more powerful: you can do almost everything the server-side APIs did in SharePoint 2010. In addition, it now supports OData, the leading industry protocol for performing CRUD (Create, Read, Update, and Delete) operations against data, as shown below. Depending on your deployment scenario, you can still use sandbox and farm solutions to push server-side code to SharePoint 2013; however, Microsoft recommends that developers follow the new app model as the preferred way of building their custom applications for SharePoint 2013. The message is: don't make any new sandbox solutions, and build new farm solutions only if you absolutely have to.
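As a rough illustration of how CRUD operations map onto OData-style REST requests, the sketch below builds request descriptions for list items without sending anything over the network. The site URL and list title are hypothetical, the helper name `crud_request` is made up for this example, and authentication details (such as a request digest or OAuth token on write operations) are deliberately left out.

```python
def crud_request(operation: str, site_url: str, list_title: str,
                 item_id=None) -> dict:
    """Describe the HTTP request for one CRUD operation on list items.

    Nothing is sent; the function only returns method, URL, and headers.
    Authentication headers are omitted in this sketch.
    """
    items = f"{site_url.rstrip('/')}/_api/web/lists/getbytitle('{list_title}')/items"
    one_item = f"{items}({item_id})" if item_id is not None else None

    if operation == "create":
        return {"method": "POST", "url": items, "headers": {}}
    if operation == "read":
        # Read a single item when an id is given, otherwise the collection.
        return {"method": "GET", "url": one_item or items, "headers": {}}
    if operation in ("update", "delete"):
        if one_item is None:
            raise ValueError(f"{operation} needs an item_id")
        # Updates and deletes are tunnelled through POST with an override header.
        verb = "MERGE" if operation == "update" else "DELETE"
        return {"method": "POST", "url": one_item,
                "headers": {"X-HTTP-Method": verb, "IF-MATCH": "*"}}
    raise ValueError(f"unknown operation: {operation}")


if __name__ == "__main__":
    site = "https://intranet.example.com"   # hypothetical site
    for op, item in [("create", None), ("read", 7), ("update", 7), ("delete", 7)]:
        print(op, crud_request(op, site, "Tasks", item))
```

Keeping the mapping in one small function makes it easy to see that only the verb and the override header change between operations, while the resource URL stays the same.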
Apps for SharePoint

There's a new way of packaging and deploying code in SharePoint 2013, aimed at the development of lightweight apps. Apps for SharePoint don't live in SharePoint. They execute in the browser client or on a remote web server; they're granted permission into SharePoint sites via OAuth (a standard for providing delegated authorization to apps); and they communicate over the new SharePoint 2013 CSOM APIs. There are three types of apps you can build for SharePoint 2013:
1 SharePoint-Hosted App (which runs within the browser)
2 Provider-Hosted App (which runs on another web server in the datacenter or cloud)
3 Azure Auto-Hosted App (which runs in an Azure instance that is invisibly provisioned by Office 365)
Apps are simple and powerful, but they have a number of limitations, and there are still many cases where SharePoint solutions are called for instead. Anything that uses server-side code, does farm-level work, has a high level of complexity, or has installation coupling or dependencies calls for a SharePoint solution rather than a SharePoint App.
What's Special for Search

The SharePoint 2013 Search CSOM opens most (but not all) of the Query object model functionality for online, on-premises, and mobile development; the search results data is in JavaScript Object Notation (JSON). Queries support two language syntaxes: KQL (Keyword Query Language) and FQL (FAST Query Language); SQL is no longer supported. In addition to the CSOM, there is a REST (Representational State Transfer) service, so you can remotely execute queries against the SharePoint 2013 Search service from client applications by using any technology that supports REST web requests. The Search REST service exposes two endpoints, query and suggest, and supports both GET and POST operations. Results are returned in either XML or JSON format.
At one level, a search app is just another SharePoint app, and a search solution is just another SharePoint solution. This is revolutionary enough: it means you can use search via a REST interface, include it in an Office App, and use it easily in combination with other parts of SharePoint. But customizing search also means creating or customizing connectors using BCS or a protocol handler (see the content capture chapter), customizing linguistics using the Content Enrichment Web Service (CEWS) (see the content processing, linguistics, and CEWS chapters), working with other service applications, and more. There are numerous search-specific web parts, including the new Content by Search web part (shown below), which is a powerful Swiss Army knife of a tool.
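To make the REST route concrete, here is a minimal sketch that builds GET URLs for the query and suggest endpoints of the Search REST service. It does not send the request. The site URL and the `build_search_url` helper are hypothetical; doubling single quotes follows the general OData string-literal convention and is an assumption here, as is the remark about the default response format.

```python
from urllib.parse import quote


def build_search_url(site_url, query_text, endpoint="query", row_limit=None):
    """Build a GET URL for the Search REST service's query or suggest endpoint."""
    if endpoint not in ("query", "suggest"):
        raise ValueError("endpoint must be 'query' or 'suggest'")
    # Double any single quotes (OData string-literal convention), then URL-encode.
    literal = quote(query_text.replace("'", "''"), safe="")
    url = f"{site_url.rstrip('/')}/_api/search/{endpoint}?querytext='{literal}'"
    if endpoint == "query" and row_limit is not None:
        url += f"&rowlimit={int(row_limit)}"
    return url


# Ask for JSON; without an Accept header the service is assumed to answer in XML.
JSON_HEADERS = {"Accept": "application/json;odata=verbose"}
```

Any HTTP client can then issue the request, for example by passing the URL and `JSON_HEADERS` to `urllib.request.Request`, along with whatever authentication the farm requires.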
Search combines well with other parts of SharePoint: with content management, with workflows, with BI, and with sites. It can also be used with several of the new service applications in SharePoint 2013. These include the Machine Translation Service, which supports automated language translation of files (think multilingual search), and the Work Management Service, which provides task aggregation functionality. If you are doing query-side-only work, you might be able to use the app model. But for the most part, developing sophisticated search-based applications will remain the domain of SharePoint solutions with SharePoint 2013. Several things (connectors and pipeline extensibility) are still per Search service application (SSA) and not per tenant.
Building Search-based Applications

SharePoint 2013 is a great platform for building search-based applications. These run a wide gamut, from configuring departmental centers using query rules and result blocks, through extending content processing to add domain-specific entity extraction, to creating brand new user experiences. They are often specific to role, industry, and topic, and they usually have a strong and measurable business value because end users use them for specific purposes. Wherever there is an identifiable group with a need to work with unstructured content (or a mix of structured and unstructured content), there's a need for a search-based application. We discuss this more in the chapter on Search-Based Applications.
Nothing is perfect, and there are still challenges with the search development model. Some of the limits of SharePoint 2013 include:
On the content side, there is no push API for content, nor an ability to do partial updates. Continuous crawls are limited to SharePoint content only. There's no mechanism for getting external data indexed into O365. Developers who want to approximate these from the outside have to live with limited performance and either build very complex structures or use third-party frameworks.
Many of the mechanisms inside search are sealed and can't be extended. Update groups, query flows, analytics processing, and web crawling are examples. It's completely understandable that these are protected from meddling developers, and some of them can be influenced safely using partner products. But it's frustrating to see these mechanisms and not be able to touch them.
The SharePoint App model and SharePoint Marketplace are aimed at lightweight, simple apps, not something you would use for a full business application today.
Developing with search is still hard. Intrinsically, areas like content processing and relevance are imperfect, since we're dealing with human language and subjective opinions of the right answer. There are no joins or aggregations internal to search, so there are limits to combining structured and unstructured content.
But SharePoint 2013 is far ahead of any other search platform in terms of available capabilities, performance, ease, and safety of development. And there is a strong ecosystem with available building blocks and complementary capabilities to use in creating great applications with search.
Deeper Dives
Book chapter on developing with search from Wrox Professional SharePoint 2010 Development
MSDN overview on developing with SharePoint 2013
MSDN section on building search queries with SharePoint 2013
SharePoint 2013 Developer Dashboard
The Content Enrichment Web Service (CEWS)

As mentioned in the Content Processing chapter, one of the biggest changes in SharePoint 2013 is the availability of the Content Enrichment Web Service (CEWS). This provides a way to add linguistic processing of any type, such as concept extraction, relationship extraction, geo-tagging, and summarization. With FAST Search for SharePoint, it was possible to extend the content processing pipeline through a sandboxed application, but this was both slow and limited in the information it could access. SharePoint 2013 introduces a much more open API, which makes it possible to add specialized linguistics at lower levels as well as sophisticated text analytics. CEWS is the key extension point for content processing; in fact, it is the only extension point other than changing the content itself or modifying it in a custom connector. There is no Content API in SharePoint 2013 for updating metadata in the search index independently of a crawl. CEWS calls an external web service using SOAP via a proxy, as shown below.
Big Changes in Pipeline Extensibility

CEWS replaces FAST Search for SharePoint's pipeline extensibility stage, which had a number of shortcomings. With FAST Search for SharePoint, an executable was run within a sandbox near the end of the content processing pipeline; this was a major performance bottleneck and deployment headache. Some crawled properties were available, but derived properties were not. No properties could be modified; you could return values in new properties, but only to a limited extent. And the executable was called for all content, so filtering logic was needed outside the pipeline, and the performance penalty of calling it was incurred for all content.
With SharePoint 2013, things have changed dramatically. Using a web service callout opens up many options and removes some of the difficulties in writing pipeline extension stages. The processing pipeline passes designated managed properties (including document text) to the remote service. There are hidden and read-only properties, but some managed properties (like Title) can be modified. The mechanism for CEWS is fairly simple:
The content processing component sends a SOAP RPC call to a configurable endpoint over HTTP. The payload contains an array of property objects.
The web service performs some custom logic on the array of property objects, and returns an array of modified or new property objects.
The web service must send a response to the web service client within a given timeout.
No specific authentication or encryption mechanisms are supported as part of the contract. You can, however, apply your own security on the transport mechanism.
A trigger condition is registered in the ContentEnrichmentConfiguration object, which allows control over when the content flow calls out to an external web service.
A set of PowerShell cmdlets is provided to control the configuration, and there are robust error-handling mechanisms built in.
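The request and response shape described above can be modeled in a few lines. The following is only a sketch of the logic, not the actual SOAP contract: the real service is a web service called by the content processing component, and everything here (the `trigger`, `enrich`, and `process_item` functions, the `Cities` property, the `body` property name, and the tiny gazetteer) is hypothetical.

```python
KNOWN_CITIES = ("Boston", "Oslo", "Seattle")  # hypothetical gazetteer


def trigger(props):
    """Stand-in for the trigger condition: only call out for items with body text."""
    return bool(props.get("body"))


def enrich(properties):
    """Stand-in for the web service: takes property objects, returns modified or new ones."""
    values = {p["Name"]: p["Value"] for p in properties}
    output = []
    title = values.get("Title", "")
    if title != title.strip():
        # Modify an existing managed property.
        output.append({"Name": "Title", "Value": title.strip()})
    body = values.get("body", "").lower()
    cities = [c for c in KNOWN_CITIES if c.lower() in body]
    if cities:
        # Return a new property ("Cities" is a made-up name).
        output.append({"Name": "Cities", "Value": cities})
    return output


def process_item(props, input_names=("Title", "body")):
    """Stand-in for the content processing component's side of the call-out."""
    if not trigger(props):
        return dict(props)  # the service is never called for this item
    # Only the designated input properties are sent to the service.
    payload = [{"Name": n, "Value": props[n]} for n in input_names if n in props]
    merged = dict(props)
    for prop in enrich(payload):
        merged[prop["Name"]] = prop["Value"]
    return merged
```

The point of the sketch is the division of labor: the trigger decides whether the call happens at all, the service sees only the designated input properties, and whatever it returns is merged back as modified or new properties.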
The advantage of this is that you can provide custom linguistics even at a fairly low level, and influence other aspects of the pipeline. The control afforded by this is wonderful and will be exciting to those wanting to address specific linguistic processing at a low level. The disadvantage is that you can't leverage the work done in the pipeline when you are doing external processing, as shown below. This not only means extra work for the developer, but also introduces the potential for linguistic processing to get out of sync.
What to Look Out For

If you are familiar with pipeline extensibility, you will find CEWS easy to use. However, there are a variety of limitations and gotchas to look out for. One key difference between CEWS and FAST Search for SharePoint is where it is called. Specifically, you get content and managed properties after document parsing but before word breaking, as shown below.
The extensibility call-outs are invoked synchronously, in line with the processing flow, so long-running enrichment tasks or batch-oriented processing tasks will require enrichment data flow management independent of and outside SharePoint 2013. Not all managed properties (and no crawled properties) are visible to CEWS, and less state (potentially useful for supplemental linguistics processing) is exposed than in FS4SP. Finally, CEWS is visible as a single logical endpoint to the potentially many content processing flow instances in SharePoint 2013. There is only one active ContentEnrichmentConfiguration object, only one trigger, and so on. This means that throughput management and support for multiple enrichment stages (more than one instance of taxonomy classifiers or custom entity extractors) need to be managed externally, which will pose some interoperability challenges if you are interested in doing multiple types of content enrichment.
* Note: The CEWS call-out is not part of O365 and is only available for the Enterprise Edition of SharePoint 2013.
CEWS is a new mechanism in SharePoint 2013. It has many nice aspects: it is a more standard, higher-performance mechanism than was available in the past. It also provides the ability to modify some managed properties, making it possible to address use cases that were nearly impossible with FAST Search for SharePoint. CEWS also has limitations, and using it will require special attention from developers. But all mechanisms have limitations. Overall, Microsoft has provided a strong and essential extensibility mechanism that lets you do magic things with content processing and linguistics.
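One way to picture the external management of multiple enrichment stages mentioned above is a single entry point that runs several stages in sequence and stops starting new ones once a time budget is spent. This is a sketch under stated assumptions, not a SharePoint API: `run_stages`, the stage names, and the budget value are all hypothetical.

```python
import time


def run_stages(props, stages, budget_seconds, clock=time.monotonic):
    """Run enrichment stages in order behind one endpoint, within a time budget.

    `stages` is a list of (name, function) pairs; each function takes a dict of
    properties and returns a dict of new or changed properties.
    """
    deadline = clock() + budget_seconds
    result = dict(props)
    skipped = []
    for name, stage in stages:
        if clock() >= deadline:
            # Out of time: skip the stage so the caller still gets a timely answer.
            skipped.append(name)
            continue
        result.update(stage(dict(result)))
    return result, skipped


# Hypothetical stages standing in for a classifier and an entity extractor.
example_stages = [
    ("classifier", lambda p: {"Category": "report" if "report" in p.get("Title", "").lower() else "other"}),
    ("extractor", lambda p: {"HasBody": bool(p.get("body"))}),
]
```

The budget is only checked between stages, so a single slow stage can still overrun it; a production design would also need per-stage timeouts and some record of which items were only partially enriched.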
Deeper Dives
MSDN Section on Content Enrichment Web Service (CEWS)
Search-Based Applications with SharePoint 2013

SharePoint 2013 is designed to support applications. Many parts of SharePoint operate out-of-the-box as applications (formerly called workloads, although this term doesn't seem to be used much with the new release). In addition to a new development model (covered in the previous chapter), a new App model and App marketplace, and an emphasis on running applications in the cloud, there are many capabilities to leverage in building new applications. Mobile applications, which played poorly with SharePoint 2010, are fully supported now. SharePoint is, more than ever, an application platform with a set of prebuilt applications and apps included.
Search-based applications are applications like any other, except that they take advantage of search technology in addition to other elements of SharePoint to create flexible and powerful user experiences. Because search is essential for dealing with diverse content, especially unstructured content, applications using search are found everywhere, and their importance is growing rapidly in step with the explosion of content volume. Yet search is generally not well understood or fully used by developers. Even though search is simple on the outside, it is complicated on the inside. Many people aren't comfortable with the notion of a search-driven application until they see one.
A Platform for Search-based Applications

SharePoint 2013 is explicitly meant to support search-based applications. As the figure below shows, search is built as an extensible platform. Both general-purpose search and some pre-built search-based applications are included, and search is also used pervasively throughout SharePoint, especially in WCM and MySites. Most importantly, there are great facilities for deploying apps and applications using search, with tooling and hooks specifically for application developers. So partners and customers can create search-based applications and deploy them on the same platform.
Out-of-the-box Applications

There are three general-purpose search applications included out-of-the-box with SharePoint 2013.
Intranet search, typically used by all employees to find content throughout the enterprise, benefits from personalized search results based on search history and from rich contextual previews.
People search (which includes the advances from SharePoint 2010 such as phonetic name search) is integrated with Lync, which provides presence information and makes it easy to connect with people directly from search results.
Site search (aimed at making public web sites easily navigable) is a big step up with this release as well.
There are also search facilities built into each site. For example, every document library now has a search box at the top that enables users to search across metadata and the full text of its documents, and the result list is presented as a standard SharePoint view rather than as a results page.
A video search-based application (SBA) is provided out-of-the-box, including a pre-built presentation format that makes it easy to recognize the video content you're looking for. There are significant enhancements in video support for SharePoint 2013 generally, including a built-in HTML5 video player. The use of video, including enterprise podcasts, will be on the rise, so video search is now an important facility.
Search Driven Web Content Management

Web content management makes extensive use of search in SharePoint 2013. Search makes it possible to create compelling user experiences, and drives several key features.
Content by search: The new Content by Search web part displays indexed content, letting you show content dynamically across multiple site collections. Users don't know this is search powered; it just looks like well-presented content, as illustrated in the screenshot below. For a case like online catalogs, this is an essential mechanism and one that works very well. There's an HTML-based presentation template model that makes it easy to fine-tune the look and feel, and there are built-in web part editors to set up the query driving the content presentation, as shown below. This doesn't require writing any code and is well within the reach of a business analyst. You see immediate previews of what the results will look like.
Metadata Navigation: As described in the chapter on refinement and faceted navigation, facets are available for users to drill into content. In addition to refiners (which are driven from the values in the content), metadata navigation defined from values in the term store is available.
Page hierarchies, URLs, and Topic Pages: Pages and page hierarchies are easily defined from the term store. You can also generate topic pages, which makes SEO straightforward. The figure below illustrates how this works; SharePoint now generates friendly URLs, which makes this process work like any normal site.
Recommendations: A new recommendation facility is included which can surface suggestions based either on popularity or on correlations between items (see the chapter on Analytics).
There are other exciting things about WCM with SharePoint 2013. Standard web design tools and workflows are supported; there are great facilities for content variations, including a built-in language translation service; you can publish easily across sites; and video and images are easily embedded and beautifully rendered across multiple devices and resolutions. The URLs generated are clean, and search-engine optimization is directly supported. You can use catalog-enabled sites for scenarios such as a content repository, knowledge base, or product catalog. But the heart of WCM in this release is search, which makes dynamic page generation and remarkable site experiences possible.
MySites Driven by Search

The social features in SharePoint, including MySites, are dramatically enhanced, building on the capabilities introduced in SharePoint 2010. SharePoint 2013 adds new features that improve and facilitate enterprise social activities within the organization: you can follow people as well as content, share personal documents easily and keep track of access, and keep up to date with activities of interest. Under the hood, there are two lists for providing social features: the Microfeed list and the Social list. Search drives several key social features in SharePoint 2013, even ones where it's not apparent that search is used under the hood. Clicking on a hash tag in a post or discussion shows a list of all conversations about that topic enterprise-wide. In MySites, users can access a list of all SharePoint tasks assigned to them, regardless of which sites the assignments are stored in. They can also see the documents they are following, as illustrated below. Another example is in My docs: shared with me, which shows you all the documents shared with you from everyone's My Documents. It looks like a form view but, in reality, it uses search underneath to aggregate content from all MySites across site collections. Behind the curtain, there's a query against a ShareWith field for your name, which also filters out docs shared with everyone. All of this is security trimmed, naturally.
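The "shared with me" query described above might be approximated in KQL as a property restriction plus an exclusion. The sketch below is illustrative only: it uses the ShareWith field name as given in the text, and the "Everyone" value, the `shared_with_me_query` helper, and the exact form of the built-in query are assumptions.

```python
def shared_with_me_query(user_name, everyone="Everyone"):
    """Build a KQL string: items shared with the user, minus items shared with all."""

    def quoted(value):
        # Drop embedded double quotes so the phrase stays well-formed.
        return '"' + value.replace('"', "") + '"'

    return f"ShareWith:{quoted(user_name)} NOT ShareWith:{quoted(everyone)}"
```

Security trimming is not part of the query text; the search service applies it to whatever results the query matches.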
e-Discovery Driven by Search

SharePoint 2013 has gone one giant step further toward fielding a full e-Discovery application. There is now unified discovery across Exchange, SharePoint, and Lync, as shown below. Exchange now has the same search infrastructure as SharePoint, which makes unifying the search much easier (Lync archives via Exchange). The Discovery Center in SharePoint uses this to provide a unified console, with in-place holds that don't impact the end user's ongoing work. There's more to e-Discovery than search, of course: preservation, holds, policy management, and export. But search is the cornerstone, and is what makes it possible to recall all the information needed to react to legal actions without getting irrelevant information that you have to sift through. The e-Discovery functionality in SharePoint Server 2013 is a big step up from SharePoint 2010, and is probably the first time you could consider this to be a full application. There are several parts to e-Discovery:
A site collection where you perform e-Discovery queries across multiple SharePoint farms and Exchange servers and preserve the items that are discovered.
In-place preservation of Exchange mailboxes and SharePoint sites (including SharePoint list items and SharePoint pages) while still allowing users to work with site content.
Support for searching and exporting content from file shares.
The ability to export discovered content from Exchange Server 2013 and SharePoint Server 2013.
The eDiscovery Center site template creates a portal for discovery cases and lets you conduct searches, place content on hold, and export content. For each case, you create a new collaboration site that uses the eDiscovery Case site template. You can export the results of an eDiscovery search for later import into a review tool.
SharePoint 2013 provides in-place holds: content that is put on hold is preserved, but users can still change it. The state of the content at the time of preservation is recorded. If a user changes the content or even deletes it, the original, preserved version is still available.
To implement eDiscovery across an enterprise, you configure SharePoint 2013 Search to crawl all file shares and websites that contain discoverable content, and configure the central Search service application to include results from Exchange Server 2013. Any content from SharePoint 2013, Exchange 2013, or a file share or website that is indexed by Search or by Exchange Server 2013 can be discovered from the eDiscovery Center.
Customize, Extend, and Create New Search-Based Applications

Search-based applications are found over a very wide range of roles, industries, and levels of sophistication. There are common patterns to these applications; the table below shows just a few of these application patterns. SharePoint 2013 provides models that span a spectrum from simple configuration, through extension of capabilities, to creation of new, sophisticated search-based applications. The new mechanisms in SharePoint 2013 for customizing user experience (query rules, result blocks, and result sources) and the ability to theme SharePoint easily provide a lot of power for customizing search experiences without any code at all. Many areas can be extended (connectors, content processing, relevance, query processing, and UI) with moderate effort and standard tools. Fully custom code is supported as well.
We find that the use of modular building blocks speeds the construction of search-based apps dramatically. Since these applications follow common patterns, a relatively small number of sophisticated modules can cover a large number of applications. If you undertake a sophisticated search-based application, consider what's available on the market as well as what you might build yourself, since pre-built building blocks can save substantial time and reduce risk.
Deeper Dives
Book chapter on developing with search from Wrox Professional SharePoint 2010 Development
Overview of eDiscovery and In-Place Holds (SharePoint 2013)
Blog on using the Content by Search Web Part
BA Insight Search as a Development Platform
TotalView Search-Based Applications
Acceleration of Search With SharePoint 2013

Microsoft has taken a big step forward in helping people do more with search:
It is far easier to own and use high-end search capabilities
Search is used pervasively
Some search-based applications are built-in
It is easier to create and operate tailored search-based applications
We expect many more interesting applications to emerge as a result.
Conclusion

This e-book has covered a lot of ground, since SharePoint 2013 has so many underlying changes, new capabilities, and new features. We've tried to cover everything in concise, readable chapters, across five major sections: User Experience; Working with Queries; Working with Content; Architecture, Deployment, and Operations; and Development and Applications.
This new platform has a lot to love:
Clean, fast, and easy to use
Straightforward to install, administer, and scale
Provides very powerful high-end search features
Makes creating search-based applications simpler than ever
Microsoft has done a remarkable job making this high-end technology accessible and easy for the mainstream. However, it is not a perfect platform, and there are still challenges with search. Search is, after all, a journey. BA Insight is entirely focused on the road that lies ahead for search and SharePoint 2013, and we stand ready to help you on your journey. As you learn more about SharePoint 2013 and search, here are some things to consider and some steps we'd suggest:
Things to Consider
SharePoint 2013 includes a very powerful new search engine.
There are new mechanisms in SharePoint 2013 (result sources, query rules, and result types) that replace familiar ones and take some getting used to. These are now in the hands of site collection administrators and site administrators, so there is much more control at that level.
Crawling and BCS have evolved further in SharePoint 2013, including a new continuous crawl feature; however, connectors are still largely left to partners.
The new search core in SharePoint 2013 is different from either FAST or SharePoint 2010, and you will notice improvements in relevance, performance, and robustness.
Hybrid configurations across on-prem SharePoint 2013 and O365 are supported out of the box (OOB) using result sources. Cross-version configurations are not supported OOB, but there are techniques and partner products for these cases.
Though SharePoint 2013 Search is great, there are still limitations and cases where the mechanisms don't cover what you wish to accomplish.
The term store is now an administrative center for entity extraction, query suggestions, faceted refinement, WCM page hierarchies, and more.
If you are coming from FAST, you will recognize a lot of concepts and powerful features. But you will also notice a number of things missing.
SharePoint 2013 has a new development model that is lightweight and available to a much wider range of developers.
Search in SharePoint 2013 is a powerful platform designed to support search-based applications.
Suggested Next Steps
Get to know the new release ASAP: download the bits, read about it, and confer with folks who know it.
Try to develop a champion among your site administrators, someone who learns the new tools. Set up a playpen system where people can get used to the new mechanisms.
Take stock of your current and future content sources and think about extending search to more content. Look at learning how to make simple connectors yourself, and at Microsoft connector partners for more complex systems.
Consider how quickly you can migrate to the new platform. Factor in techniques that allow you to upgrade a step at a time, such as search-first migration and cross-version federation.
Consider adopting O365 quickly, in ways that don't require you to do it all at once. Talk to Microsoft Partners about federation, cross-version configurations, and migration.
Look to the Microsoft partner ecosystem for training, components, and innovative solutions.
Get familiar with the term store. Find out where there are key lists in your organization (product names, project names, industries, etc.); you will be able to import these into the term store and use them for entity extraction.
Focus on the problem, not the specific mechanism; there's a way to get it solved with this platform. Turn to Microsoft Partners for products that round out all the possibilities.
Consider applying JavaScript developers to building SharePoint Apps.
Look around your organization for opportunities to apply search-based applications.
BA Insight is Social!
Read our blog: www.DoMoreWithSearch.com
Follow us on Twitter: @bainsight
LinkedIn Group: Microsoft Enterprise Search
Or find us on Facebook
BA Insight is a leader in agile information integration, enabling businesses to drive innovation by leveraging all knowledge and data across the enterprise. Offering a new-generation, cost-effective alternative to expensive systems integration, the company's award-winning technology provides a scalable foundation for liberating enterprise data, both structured and unstructured. Microsoft's go-to partner for advanced search technologies, BA Insight enables customers to leverage their investments in SharePoint, FAST, and other enterprise systems, and extend them with an overlay of easy-to-assemble, highly targeted business applications. Since 2004, more than three million users around the world have relied on BA Insight for low-cost, on-demand access to the information they need. To learn more about BA Insight, visit www.BAinsight.com.
