BA Insight SharePoint 2013 Enterprise Search Guide
Everything You Need to Know to Get the Most Out of Search and Search-based Applications
Tony Malandain
Tony Malandain is a co-founder of BA Insight. Tony architected and built the first version of the product, which gained significant momentum on Microsoft Office SharePoint Server (MOSS) and positioned BA Insight as the leading Enhanced Search vendor for SharePoint. Tony was awarded a patent for the core AptivRank technology, which monitors the usage behavior of search users to influence relevancy automatically.
Eric Moore
Eric Moore is the lead for BA Insight's Search Interactions and Content Enrichment teams. He is accustomed to living at the leading edge of search, and has deep experience with multimedia search, XML search, and content enrichment. Prior to BA Insight, Eric worked for five years at FAST and on the Microsoft Search Platform team. Eric has developed state-of-the-art products, algorithms, and platforms for specialized information workers.
There's a lot to say about SharePoint 2013, and about search in SharePoint 2013. This e-book focuses only on search, and is meant to give you a working understanding of the new features so that you can get oriented with them and think about how you will deploy and use them. It does not try to cover everything, nor is it meant to be a hands-on guide.
In this book we cover five key areas as they relate to search. These key areas are color-coded and represented by the blocks below. Each section contains short chapters that can be read independently or in sequence. The goal is to enable readers to focus on the information they need at the moment.
User Experience
Not every area of search has changed in SharePoint 2013, and those who are already familiar with search won't be lost at sea. For example, the deployment model, services architecture, and crawling and connector subsystems are pretty much the same as in SharePoint 2010. End users will see a dramatically different search UI, but they will be able to use it with no training (it's quite intuitive). If you have built up a competency in search, you'll be able to take it further in many ways, which we highlight throughout this e-book.
Deeper Dives:
TechNet: What's new in SharePoint 2013 search
Blog article from the Microsoft Search Group
TechNet landing page, refreshed weekly with articles on SharePoint 2013
Highlights of Search in SharePoint 2013
SHAREPOINT 2013 THE ESSENTIAL GUIDE TO ENTERPRISE SEARCH
The face of search is totally revamped, not just in keeping with the new SharePoint UX overall, but with deep refinements, better display for results using Result Blocks, a hover panel with previews, and more.
BENEFITS
In SharePoint 2013, search scopes, federated locations, and best bets are deprecated in favor of result sources, query rules, and result templates.
BENEFITS
SharePoint 2013 is light-years ahead of other search platforms in this area. Result sources, query rules, and result templates offer remarkable control over search presentation. These are brand-new concepts and well worth learning: they arm site administrators and site collection administrators with the tools to field powerful, effective search.
Crawling is an area that has changed least with SharePoint 2013, but there are still some great enhancements, including continuous crawling.
BENEFITS
Business Connectivity Services has continued to evolve, and now supports claims tokens through the BDC. The Content Processing and Linguistics capabilities in SharePoint 2013 search are very strong and extensible. There are lots of new capabilities, including a completely new file parsing mechanism.
Complex security scenarios are more tractable (though still hard). This platform offers a lot of power to developers, as well as providing some key capabilities end users will notice.
Under the hood, there is a new architecture, a new search core, and many new modules that are the culmination of the FAST acquisition: not just combining the best of FAST and SharePoint search, but adding significant innovations from continued investment in search.
BENEFITS
Search deployment and management is different, and largely better. Making search hum for O365 (fully multi-tenant, smoothly scalable and fault-tolerant, and manageable at multiple levels) was a key goal for this release, and there are big benefits for on-premises deployments too.
There's a new development model for SharePoint 2013 generally, and for Search specifically.
BENEFITS
This makes extending search much more accessible, and will foster a lot of exciting search-based applications. A lot of great possibilities are now open to developers. Your users will get more done and enjoy a variety of applications, both built-in and tailored, all powered by search.
There's a new Content Enrichment Web Service (CEWS) that opens up content processing for extension. Search is used pervasively throughout the SharePoint 2013 platform, and powers the new web content management (WCM) and e-discovery capabilities, topic pages, the Content by Search Web Part, myTasks, mySiteView, and more, along with great enterprise search, people search, and site search.
TABLE OF CONTENTS
Introduction
Chapter 1: User Experience: The New Face of Search in SharePoint 2013
  Raising the Bar: The SharePoint 2013 User Experience
  First Class Search Interactions: More to Love
  The SharePoint 2013 Search Center Overview
  Refiners and Faceted Navigation
  Search Center Setup
Chapter 2: Working with Queries and Results: New Mechanisms in SharePoint 2013
  Query Processing: the Search Engine's Automatic Transmission
  Query Rules and Query Suggestions
  Result Types and Result Templates
Chapter 3: Working with Content: Crawling, Connectors, and Content Processing
  Content Capture
  Content Processing
  Linguistics Processing
Chapter 4: Architecture, Deployment, and Operations: Getting under the Hood
  New Architecture, Single Search Engine Core
  Indexing and Partitions
  Analytics
  Federation and Result Sources
  Search in Exchange
  Search Administration
  Upgrade and Migration
Chapter 5: Applications and Development: New Models for Search-Based Applications
  The New Development Model in SharePoint 2013
  The Content Enrichment Web Service (CEWS)
  Search-Based Applications in SharePoint 2013
Conclusion
INTRODUCTION
CHAPTER 1
User Experience
The New Face of Search in SharePoint 2013
Mobile and Tablet Deployment
Support for fluid layouts, touch, and voice interaction means that using SharePoint on Microsoft's Surface tablet and the Apple iPad is much easier and smoother. Users can access information anywhere, at any time, with the same ease of use they're familiar with from their desktop.
SharePoint 2013 and Applications
The bar is also going up when it comes to ease of access to information. SharePoint 2013 is able to field experiences that are mobile and search-driven, for both customer-facing and employee-only sites. A variety of full-fledged applications run on your desktop, in your browser, and on leading mobile devices, presenting new ways to access and interact with SharePoint information and further enhancing the user experience and productivity.
The SharePoint 2013 user experience is a platform-wide update, ready for a new generation of interaction. Changes in the underlying presentation tier, service architecture, Object Model (OM), and Office Apps all further the goal of making it easier to configure and deploy valuable applications in this new delivery environment.
Deeper Dives
TechNet on mobile devices and SharePoint 2013
Blog with highlights of Design Features in SharePoint 2013
Article on Windows 8 UI
SharePoint 2013 UI blog
Transitions Across SharePoint Tasks
The disjunction between contextual search and search sites is gone in SharePoint 2013. There are fewer obvious differences between apps; this version of SharePoint does not feel stitched together like previous versions. New developments include the seamless flow between functions such as people search and search verticals.
Productivity
Search helps users quickly return to important sites and documents by remembering what they have previously searched for and clicked. Previously searched and clicked items are displayed as query suggestions at the top of the results page.
Search Mechanisms Under the Hood
Queries, query interpretation, relevant results, and the presentation of those results are pervasive across SharePoint 2013. It's not always obvious that search is there, but search technologies are used across the SharePoint 2013 platform, and key new interfaces lower the complexity of the customization IT professionals and application developers need to do in order to support business users. Search powers a number of areas which may or may not be obvious as search:
Upgrades to People Search and Social Features: making it easy to explore and find people, expertise, and conversations that are important to the task at hand.
New Social Features: My Sites, Communities, Teams, and Conversations create dynamic content that is quickly indexed via constant incremental crawls and returned through SharePoint 2013 search.
Personalization Features: search suggestions are personalized, and include visited documents, as described in the chapter on query rules and query suggestions. These show up as if by magic, and many users enjoy them without thinking about search at all.
Overall, the search interfaces are clearer and brighter, and all the different parts of SharePoint apps seem to work better together. It is also much easier to customize search-driven experiences in SharePoint 2013 than with any other enterprise search platform.
Deeper Dives
Search User Interfaces book by Marti Hearst
formats (for example, Word and PowerPoint, but not PDF). This preview technology was not designed for documents to be consumed via this interface, but rather to help you determine whether this is the particular document you have been looking for. Notwithstanding these limitations, document previews are a boon to the user and a great addition to search. The hover panel paradigm works well in the Search Center. It can be customized, and may vary based on content type or tab. Default actions with document preview include the Edit, Send, and View Library features, as well as Follow, a social feature. The hover panel also allows some actions directly from the search page, including editing content directly in Office Web Apps.
For many applications, people will want to customize the Search Center, because it is not as information-dense as heavy search users or search-based applications demand. This type of customization is easy to do, and we'll cover it later in the chapters about query rules, result sources, and the development model. Overall, the SharePoint 2013 Search Center interface is better than any other search UI we've seen on the market. It appears to be very robust, and holds true to Microsoft's "works anywhere" commitment. It functions smoothly both in the cloud with Office 365 and on premises, as well as in all major browsers (Internet Explorer, Mozilla Firefox, Chrome), and the experience on tablets like the iPad is pretty good. A word to the wise: just don't let a sexy demo or quick test drive lull you into thinking that it just magically works. As with all search products, the navigation depends on having decent metadata. Overall, the out-of-the-box interface is clean and fast, and it provides relevant results, so the basic must-have elements of great search are covered. There are also a lot of exciting capabilities that make exploration easier, give users insight, and enable action directly from search. Of course, search works best when all of the products that are part of the search machine are Microsoft products. For example, People Search lights up with actions when used with Lync; myTasks work with Project Server; and previews work only for documents stored in SharePoint in recent Office formats, and require a separate Office Web Apps (OWA) server. If you don't have servers that run these other products, the additional features associated with them simply don't show up. However, search still works very well even without them. When you have all these parts in place, though, they work extremely well together: a big accomplishment for Microsoft, with strong productivity benefits to the end user.
People Search
People Search is another strong part of the Search Center. As with SharePoint 2010, people search lights up with actions when used together with Lync, and phonetic search is available by default. The new hover panel provides a great way to show profiles and content, in addition to social connections.
Deeper Dives
TechNet: creating a search center in SharePoint 2013
Intro to the hover panel
Longitude Search Overview
and FAST Search for SharePoint created deep refiners out of the entire result set, even if it contained millions of items. With SharePoint 2013, there are now two different modes for the refiner web part: standard search results, and faceted navigation. For standard search results, refiners are generated as they were with FAST Search for SharePoint. You can now define display templates for rendering different kinds of refinements, which is a big win over SharePoint 2010. All refiners are now deep refiners. Faceted navigation is more dynamic. It is used in conjunction with term sets (served from the term store), which are also used for navigation in document libraries. With faceted navigation, a term from the term store filters what kind of data should display. If the managed property is refinable, the refiners that show can depend on the term. This is handy in many search scenarios, including the online store scenario which inspired it. For example, users can use faceted navigation in an online store to find products more easily. The scenario below uses the term store terms Camera and Laptop and the managed properties Megapixel Count, Color, and Manufacturer. So, with faceted navigation your terms would look like this:
For the term Camera, add refiners for Megapixel Count and Manufacturer.
For the term Laptop, add refiners for Color and Manufacturer.
The refiners that show up are now based on that term, which can be set based upon a page or catalog hierarchy, so that you get the following whether you navigate or search to laptops:
Configuring these refiners via the term store is convenient, and there are built-in tools that make it easy to create a hierarchy, customize the refiners within the hierarchy, and set up a very dynamic experience, as shown below.
As you can see, Faceted Navigation is quite a powerful capability. Refiners are available everywhere, they adjust dynamically, and they can be configured to an exact design, all controlled by metadata. All refiners used in Faceted Navigation are deep refiners, so there are no gaps caused by a missed item in the deeper result set.
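To make the camera/laptop example concrete, here is a minimal Python sketch of term-driven deep refiners. It is an illustration, not SharePoint's implementation: the `TERM_REFINERS` dictionary is a hypothetical stand-in for refiner settings stored in the term store, the property names and sample items are made up, and `refiners_for` is a hypothetical helper. Counts are taken over every matching item rather than only the top-ranked page, which is what makes a refiner "deep".

```python
from collections import Counter

# Hypothetical stand-in for refiner settings attached to term store terms.
TERM_REFINERS = {
    "Camera": ["MegapixelCount", "Manufacturer"],
    "Laptop": ["Color", "Manufacturer"],
}


def refiners_for(term, results):
    """Return {managed property: Counter(value -> count)} for the term's refiners.

    Counts are taken over the entire result set (deep refiners), so an item
    ranked far down the list still contributes its refiner value.
    """
    return {
        prop: Counter(item[prop] for item in results if prop in item)
        for prop in TERM_REFINERS.get(term, [])
    }


laptops = [
    {"Title": "Ultrabook 13", "Color": "Silver", "Manufacturer": "Contoso"},
    {"Title": "Workstation 17", "Color": "Black", "Manufacturer": "Fabrikam"},
    {"Title": "Netbook 10", "Color": "Silver", "Manufacturer": "Contoso"},
]

for prop, counts in refiners_for("Laptop", laptops).items():
    print(prop, dict(counts))
```

Switching the term from Laptop to Camera changes which refiners appear without touching the results, which mirrors how the term drives the refiner panel.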
Deeper Dives
TechNet: Managed metadata overview
TechNet: Configure faceted navigation in SharePoint 2013
Faceted Navigation
Metadata used for top-down navigation (Faceted Navigation) and metadata exposed as search results for bottom-up refinement are now both managed through the term store.
The Search Center itself is a site template, and the good news is that with this latest release some of the rough edges from SharePoint 2010 have been removed. For example, this template now inherits design elements from a master page, so you don't need to jump through hoops to make it match your design. This does not mean that you no longer need to think about how to manage the universal search center (which may serve many site collections with different themes and designs), but you now have easier control.
The Document Workspace site template has been removed in SharePoint 2013, simplifying the list of templates available when a new site collection is created. This will be a big change to users, since this template was a workhorse in SharePoint 2010. Most Meeting Workspace site templates from 2010 have also been discontinued in SharePoint 2013, including the basic, blank, decision, and social meeting workspace templates and the multipage meeting template. They have been replaced by features from other parts of SharePoint and from OneNote and Lync, which all support collaborative work, live conferences, smaller meetings, note-taking, and storage of notes and other conference-generated commentary. The benefit is that projects with multiple contributors and collaboration across geographically distributed teams are streamlined. The facilities for web content management (covered in the Applications section) are remarkably improved and totally driven by search. This makes creating externally facing sites and applications much more effective. If you are responsible for explaining and exposing a service or product to a market inside your company, the business-focused features that are new in SharePoint 2013 are a strong proposition for internal audiences. For example, if you provide consulting services internally for a legal practice area, recommendations, search experiences customized based on queries, and personalized interaction enable users to find relevant information more quickly.
Deeper Dives
TechNet: creating a search center in SharePoint 2013
Blog on using the Content by Search Web Part
CHAPTER 2
Working with Queries and Results
New Mechanisms in SharePoint 2013
But what does query processing mean exactly? If you're familiar with SharePoint 2010, think of query processing as the evolution of search scopes, federated locations, and best bets. With intranet search indexes now frequently reaching tens of millions of items, formulating the right query is more and more critical to finding relevant information. Fortunately, there are a number of techniques you can use to reformulate the query by understanding the intent behind it. You can leverage information such as:
Where the query originated. For example, if you run a search from your company's helpdesk intranet site, you are likely to be looking for FAQs, how-tos, or IT specialists. The search engine can now capture that intent to provide more targeted results.
Who launched the query. If you are based in the United States and searching for employee benefits, you are more likely looking for U.S. employee benefits than for those in Canada or the United Kingdom.
What concepts or entities can be recognized in the query. For example, if you were searching for an expense report form, the search engine can return the Excel spreadsheet, InfoPath form, or web page which enables you to file your expense report.
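The first two signals (where the query came from and who launched it) can be sketched as a function that adds restrictions to the user's text. This is a hypothetical illustration, not SharePoint's query pipeline: the `QueryContext` class, the site key, the `ContentType` and `Country` property names, and the KQL-style output are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class QueryContext:
    text: str          # what the user typed
    origin_site: str   # where the query came from (hypothetical site key)
    user_country: str  # who launched it, e.g. read from the user profile


# Hypothetical mapping from originating site to the content that site implies.
SITE_INTENT = {"helpdesk": "(ContentType:FAQ OR ContentType:HowTo)"}


def refine_query(ctx: QueryContext) -> str:
    """Append KQL-style restrictions derived from where and who the query came from."""
    parts = [ctx.text]
    if ctx.origin_site in SITE_INTENT:
        parts.append(SITE_INTENT[ctx.origin_site])
    if "benefits" in ctx.text.lower():
        parts.append(f'Country:"{ctx.user_country}"')
    return " ".join(parts)


print(refine_query(QueryContext("employee benefits", "hr", "United States")))
```

In SharePoint 2013 this kind of logic is configured with result sources and query rules rather than written as code; the sketch only shows how context turns a short query into a more targeted one.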
Query processing in SharePoint 2013 is intended for exactly these scenarios: to enable a smart, targeted search experience that understands what the user is searching for and provides the optimal result straight from the search page. This is a very exciting new capability in SharePoint 2013, as it will open up many opportunities to rapidly build new applications driven by search which look nothing like the standard list of ten blue links.
scopes. The key difference here is that the extra conditions enabled in 2013 go far beyond what 2010 could do. SharePoint 2013 comes with a strong query builder to apply conditions based on the user, the search page URL (or any parameter found in it), the site, or the current date. Result sources can also be used to return results from remote content, much like federated locations in SharePoint 2010. (The result sources construct is covered in greater detail in the Federation chapter of this e-book.)
Query Rules allow conditional transformation of queries and results based on custom logic. Imagine you want to simplify searching for budget spreadsheets in your organization. Using query rules, you can type simple search queries such as "budget spreadsheet project X", and behind the scenes the request can be transformed into something much more elaborate. The query rule could recognize the terms "budget" and "spreadsheet" in the search query and rewrite the query so that the document content type must be Budget, the file type must be Excel, and the file content must match the project name you specified in the search keywords. Additionally, the results would be sorted by modification date, most recent first, so that the freshest information is returned first. It is worth noting that the same query builder functionality used for Result Sources is also available here as a means to define conditions on query rules or transform user queries.
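The budget example can be sketched as a rule that fires only when both trigger words appear, strips them, and rewrites the remaining keywords. This is an illustrative Python model, not the SharePoint object model; the managed property names `ContentType`, `FileExtension`, and `LastModifiedTime`, and the `Budget` content type, are assumptions about how such a site might be configured. The rewritten text is shaped like a Keyword Query Language (KQL) query.

```python
TRIGGERS = {"budget", "spreadsheet"}


def apply_budget_rule(user_query):
    """Return (rewritten_query, sort_list) for a hypothetical 'budget spreadsheet' rule.

    If the condition is not met, the query passes through unchanged.
    """
    words = user_query.split()
    if not TRIGGERS.issubset(w.lower() for w in words):
        return user_query, None
    remaining = " ".join(w for w in words if w.lower() not in TRIGGERS)
    rewritten = "ContentType:Budget FileExtension:xlsx"
    if remaining:
        rewritten += f' "{remaining}"'  # remaining keywords kept as a phrase
    return rewritten, [("LastModifiedTime", "descending")]  # freshest files first


print(apply_budget_rule("budget spreadsheet project X"))
```

In SharePoint itself this transformation would be defined through the query builder, not in code; the sketch only shows the shape of the rewrite.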
The last major new feature introduced for query processing in SharePoint 2013 is the Result Type construct. A result type supports the presentation of results in a tailored way, while a result block contains a small subset of results that are related in a specified manner. For instance, you can create several result blocks for sales collateral, knowledge base articles, documentation, etc., so that when a user searches for a specific product, you can make sure to always return the top two or three pieces of sales collateral or knowledge base articles matching this query. In spite of the enhanced capabilities these tools provide, you may run into scenarios where they are not suitable or flexible enough. For example, geo-searches (ranking or filtering search results based on distance), personalized queries (complex query changes based on who executes the query), synonym expansion, etc. are not supported. In these scenarios you can still rely on the Search API to build your own web part or search application that implements the appropriate logic. The API is, for the most part, comparable to the version in SharePoint 2010, with a few exceptions. The main exceptions are the removal of the FullTextSqlQuery class and its syntax, which had been deprecated, and the addition of the SearchExecutor class, which allows you to execute multiple related queries in one shot.
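The result block behavior described above, always showing the top two or three matches from a narrower query, can be modeled as follows. This is a hypothetical sketch: `core` and `block` are plain lists of result dictionaries standing in for two query executions, `with_result_block` is an invented helper, and placing the block above the core results is only one of the display options available to query rules. Removing duplicates from the core list is a choice made for this sketch.

```python
def with_result_block(core, block, label, size=3):
    """Return one results page with a result block placed above the core results.

    The top `size` items from the block's own query are tagged with `label`;
    any of them that also appear in the core results are dropped there, so a
    document is not shown twice.
    """
    promoted = block[:size]
    promoted_urls = {item["url"] for item in promoted}
    page = [dict(item, block=label) for item in promoted]
    page += [item for item in core if item["url"] not in promoted_urls]
    return page


core = [{"url": "kb/101"}, {"url": "sales/brochure"}, {"url": "docs/setup"}]
block = [{"url": "sales/brochure"}, {"url": "sales/datasheet"}, {"url": "sales/pricing"}]
for row in with_result_block(core, block, "Sales collateral", size=2):
    print(row)
```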
No Speed Limits
Microsoft has made it very easy to create search pages using this new functionality. In fact, you don't need programming experience to create pages, as all the functionality is user-friendly and has point-and-click interfaces. Microsoft made it even easier by pushing this functionality not only to site collection administrators, but to site owners as well. That's right: farm-level privileges are not required. As long as you own a site (such as your personal site), you can use these capabilities to build your own search center. Two examples of applications you can build using these new features:
A manufacturing dashboard that displays everything about a specific part based on its part number. Information could include the inventory level, the last orders for that part, the instructions on how to use that part, and forum discussions from your customers about that part.
A knowledge portal that enables you to share FAQs, knowledge base articles, documentation, or tutorials to empower your support or helpdesk team.
Powering your applications via search has never been easier. The chapter on Search-Based Applications has many more examples, and we encourage you to explore what's possible, and even to try building some of your own.
Deeper Dives
TechNet on query processing
Blog overview of search in SharePoint 2013
List of terms for query builder
New KQL syntax in SharePoint 2013
Query Rules
Query Rules are a brand-new feature in SharePoint 2013. They are designed to enable you to act upon the intent of a query, and they provide a remarkable amount of control and configurability. The Query Rules framework is composed of three top-level components: Query Conditions, Query Actions, and Publishing Options. These are all configurable via PowerShell, or via the UI shown to the right. Query Conditions are rule sets that are meant to determine the intent of the query (does the query meet a rule?). Options for this include:
Query contains a specific word or words
Query contains a word in a specific dictionary
Query contains an action word that matches a specific phrase or term set
Query is common in a different source (like the Videos result source)
Results include a common result type (like file type)
Advanced rules, which can match across a set of terms, a dictionary, a regular expression, etc.
If the query is against a particular result source (see the Result Source chapter in this book) or category, result source conditions can also be applied. If the Query Condition is met, Query Actions are then triggered.
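Several of the condition types listed above can be modeled as small predicates over the query text. This Python sketch is an illustration under assumptions, not SharePoint's matching engine: the helper names, the dictionary entries, and the part-number regular expression are made up, and real query rules are configured through the UI or PowerShell instead.

```python
import re


def contains_words(*words):
    """Condition: the query contains any of the given words."""
    targets = {w.lower() for w in words}
    return lambda query: bool(targets & set(query.lower().split()))


def contains_dictionary_entry(entries):
    """Condition: the query contains an entry (possibly multi-word) from a dictionary."""
    phrases = [e.lower() for e in entries]
    return lambda query: any(f" {p} " in f" {query.lower()} " for p in phrases)


def matches_regex(pattern):
    """Condition: an advanced rule expressed as a regular expression."""
    compiled = re.compile(pattern, re.IGNORECASE)
    return lambda query: compiled.search(query) is not None


# Hypothetical rule set; names and contents are examples only.
CONDITIONS = {
    "expense": contains_words("expense", "reimbursement"),
    "product": contains_dictionary_entry(["Contoso Cloud", "Fabrikam"]),
    "part_number": matches_regex(r"\bPN-\d{5}\b"),
}


def matched_conditions(query):
    """Names of the conditions the query meets; their actions would then fire."""
    return [name for name, test in CONDITIONS.items() if test(query)]


print(matched_conditions("expense form for Contoso Cloud"))
```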
Query Actions specify a series of actions that take place once a query condition is met (what to do if the rule is met?). These actions include:
Assign a promoted result: This replaces Best Bets and Visual Best Bets, a former FAST Search for SharePoint 2010 feature. The configuration of the promoted result allows you to specify whether it should be treated as a best bet (hyperlink) or as a fully formatted HTML block (Visual Best Bet).
Create and assign a result block: When a condition is met, one or more result blocks can be triggered. Result blocks specify an additional query to run and how to display its results. This feature includes a full query designer so you can build and test queries before finalizing them. You can show the block's results above the core results, or interleave them by ranking. Additionally, you can choose custom display templates instead of the defaults for the result or result block.
Change the ranked results by changing the query: This allows you to assign additional parameters and weighting (XRANK boosts) to the query (Query Transforms, for those familiar with FAST). For example, if the condition of the rule is met, apply an XRANK constant boost of x points. XRANK is a FAST capability that allows you to override the default relevancy ranking by boosting the relevancy score for particular results at query time.
Publishing Options
Publishing options determine when a query rule is active (when to do this?). A rule may be active in a specific time interval (start date, end date) or always active (the default). You can also configure a review date (which triggers an e-mail reminder to review this rule).
The power of query rules lies not only in the flexibility they provide, but also in the richness and complexity that can be derived from them. Imagine a single Query Condition being met, which then triggers a visual best bet, a result block from a remote SharePoint site, a result block from a cloud source, and a query transform that boosts results coming from the cloud. In addition, rules would determine that these actions are only taken between November 25th and December 26th. In an intranet scenario, a comparable example would be a query rule that is active only during insurance open enrollment windows.
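To make the ranking action and the publishing window concrete, the sketch below models a rule that boosts cloud results with a KQL XRANK constant boost (`cb`) and is active only between two dates. The `QueryRule` class, the `Source` property, the boost value of 100, and the year chosen for the dates are hypothetical; only the general `match XRANK(cb=n) rank-expression` shape is taken from KQL. The review date is not modeled.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class QueryRule:
    """Hypothetical model of one query rule: a ranking action plus publishing options."""
    name: str
    boost_expression: str          # KQL restriction whose matches get boosted
    constant_boost: int            # XRANK constant boost (cb) value
    start: Optional[date] = None   # None: no start date
    end: Optional[date] = None     # None: no end date (always active if both are None)

    def is_active(self, today: date) -> bool:
        if self.start is not None and today < self.start:
            return False
        if self.end is not None and today > self.end:
            return False
        return True

    def transform(self, query: str, today: date) -> str:
        """Wrap the user's query in an XRANK clause while the rule is active."""
        if not self.is_active(today):
            return query
        return f"({query}) XRANK(cb={self.constant_boost}) {self.boost_expression}"


holiday = QueryRule("holiday-cloud-boost", 'Source:"Cloud"', 100,
                    start=date(2013, 11, 25), end=date(2013, 12, 26))
print(holiday.transform("gift cards", date(2013, 12, 1)))
```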
Query Suggestions
Query suggestions enable users to ask better questions, and make it simpler to search for information. This feature was sorely lacking in SharePoint 2010. In SharePoint 2013, Query Suggestions are supercharged, thanks in part to the addition of the Analytics Processing Component and the Analytics Reporting Database. These components provide analytics aggregation and persistent storage of these analytics. Some key new features include:
My Queries: a Personal Query Log (in the Analytics database), which factors your personal SharePoint activity into the query suggestions.
My Sites: this capability tracks sites you have visited, and factors them into the query suggestions.
Our Terms: this feature uses information about the most frequent queries across all users that match the search terms.
Query Suggestions now take two forms: Pre-Query Suggestions and Post-Query Suggestions. Both help users ask better questions by showing them what others have asked before; they differ in when they are displayed and how people use them. Pre-Query Suggestions include both a list of queries from other users and a list of items you have clicked on before, as shown in the screenshot below.
Pre-Query Suggestions occur before a query is executed. Their goal is to help users select a query that finds the information they need, and to assist them in writing better queries. These suggestions are provided in two forms:
1. A list of queries that others have typed.
2. A list of items you have clicked on before, from your personal query log.
A key aspect of this feature is that it will never provide a suggestion for a search that did not yield a click-through (someone clicking on the document), and it will never provide a suggestion if the results would lead to a dead end (a zero-result query).
Post-Query Suggestions are provided after a query is executed and results are displayed. These suggestions are based upon results that you have clicked on at least twice. They provide a quick means to go back to a document that you regularly review or select, and are similar to the Related Queries provided with SharePoint 2010. Suggestions can also be tuned (inclusions and exclusions) within the Service Application admin pages. It is also important to note that they are not tuned at the site collection level, but only at the Search Service Application (SSA) level.
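The suggestion rules described above (no pre-query suggestion without a click-through or for a zero-result query, and post-query suggestions only for results clicked at least twice) can be sketched over a simple query log. The log format, the sample entries, and the function names are hypothetical; SharePoint derives suggestions from its analytics components, not from code like this.

```python
from collections import Counter

# Hypothetical query log entries: (user, query, result_count, clicked_url or None).
LOG = [
    ("ann", "expense report", 42, "https://intranet/forms/expense.xlsx"),
    ("bob", "expense report", 42, None),
    ("ann", "expense policy", 0, None),
    ("bob", "expenses 2012", 17, None),
    ("ann", "travel policy", 8, "https://intranet/hr/travel.docx"),
    ("ann", "travel policy", 8, "https://intranet/hr/travel.docx"),
]


def pre_query_suggestions(prefix, log):
    """Logged queries starting with `prefix` that returned results and earned a click."""
    eligible = {q for _, q, hits, click in log if hits > 0 and click}
    return sorted(q for q in eligible if q.startswith(prefix.lower()))


def post_query_suggestions(user, log, min_clicks=2):
    """Documents this user has clicked at least `min_clicks` times."""
    clicks = Counter(click for u, _, _, click in log if u == user and click)
    return [url for url, n in clicks.items() if n >= min_clicks]


print(pre_query_suggestions("exp", LOG))
print(post_query_suggestions("ann", LOG))
```

Note how "expense policy" (zero results) and "expenses 2012" (no click) never surface, matching the dead-end and click-through rules.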
In SharePoint 2010, spell correction was implemented as a series of XML files that defined inclusion and exclusion items for the dictionary. In SharePoint 2013, Query Spell Correction is managed from within the term store of the Managed Metadata Service. Within the term store, Query Spellcheck Exclusions and Inclusions are nodes within the term store, as shown below. Dynamic dictionary creation is still supported, but is now managed from within the term store.
Within the user interface for search, Query Spell Corrections can be configured to use Did You Mean type functionality for query transforms.
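Inclusion and exclusion lists can be illustrated with a small "Did you mean" sketch. It uses Python's `difflib.get_close_matches` as a stand-in similarity measure, which is not SharePoint's spelling algorithm; the word lists are hypothetical examples of what the term store nodes might hold, and `did_you_mean` is an invented helper.

```python
import difflib

# Hypothetical term store contents.
INCLUSIONS = ["sharepoint", "benefits", "enrollment", "contoso"]  # valid words, offered as corrections
EXCLUSIONS = {"fabrikm"}  # never corrected, e.g. an internal product code


def did_you_mean(query):
    """Return a corrected query, or None when no word needs correcting."""
    corrected, changed = [], False
    for word in query.lower().split():
        if word in EXCLUSIONS or word in INCLUSIONS:
            corrected.append(word)
            continue
        match = difflib.get_close_matches(word, INCLUSIONS, n=1, cutoff=0.8)
        if match:
            corrected.append(match[0])
            changed = True
        else:
            corrected.append(word)
    return " ".join(corrected) if changed else None


print(did_you_mean("sharepiont benefits"))
```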
Deeper Dives
Good blog post on query rules
TechNet on query processing
List of terms for query builder
New KQL syntax in SharePoint 2013
Within Result Types you can:
1. Specify a rule based upon specific criteria. The rules can contain fairly advanced features, such as Boolean logic (i.e. AND, OR, NOT), equality (i.e. = or !=), or comparison (< or >). These rules can also be applied to managed properties; for example, the rule might be ContentType = spec documents.
2. Specify which managed properties you would like to have returned once rule conditions have been met. You must specify at least one managed property, and a property must be specified here before it can be used in a rendering template.
3. Specify where you would like the requested property values to be displayed, using a tagging convention such as (_#= contenttype =#_) in a rendering template.
The rendering template is composed of HTML and might contain JavaScript. Within this simple-to-edit template (unlike editing XSLT in SharePoint 2010) you can call specific graphics (icons, etc.) and style it in any way that you would normally style HTML. Result types may seem complex to master, but once you become familiar with them you will appreciate how powerful they are. There are impressive tools in SharePoint 2013 that facilitate ease of use, and formatting can be done using any tool you are familiar with. (SharePoint Designer has dropped the ability to do this kind of formatting, which will be annoying to some, but there are lots of great tools available to work with HTML and JavaScript.) With SharePoint 2010, very few people actually did the kind of formatting and result templating that was possible: it was too complex and arcane to use. With SharePoint 2013, you will quickly find that result types and result templates are enjoyable to work with, and you'll discover that you use them naturally to make search results look great and work well for users.
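The three steps above can be mimicked in a few lines: a rule that selects a result type, a set of requested managed properties, and a template whose `_#= ... =#_` tokens are filled from those properties. This is a simplified Python model for illustration only; real display templates are HTML files whose tokens hold JavaScript expressions evaluated by SharePoint, and the property names, rule, and templates below are hypothetical.

```python
import html
import re

TOKEN = re.compile(r"_#=\s*(\w+)\s*=#_")  # simplified: one bare property name per token

# Hypothetical result type: rule (Boolean AND, = and !=), requested properties, template.
RESULT_TYPES = [
    {
        "name": "Spec document",
        "rule": lambda item: (item.get("ContentType") == "Spec"
                              and item.get("FileExtension") != "pdf"),
        "properties": {"Title", "Author", "ContentType"},
        "template": '<div class="spec">_#= Title =#_ by _#= Author =#_ (_#= ContentType =#_)</div>',
    },
]
DEFAULT = {"properties": {"Title"}, "template": "<div>_#= Title =#_</div>"}


def render(item):
    """Render an item with the first result type whose rule matches, else the default."""
    rtype = next((rt for rt in RESULT_TYPES if rt["rule"](item)), DEFAULT)

    def fill(match):
        prop = match.group(1)
        # Only properties requested by the result type can be rendered.
        if prop not in rtype["properties"]:
            return ""
        return html.escape(str(item.get(prop, "")))

    return TOKEN.sub(fill, rtype["template"])


print(render({"Title": "API spec", "Author": "Eric", "ContentType": "Spec", "FileExtension": "docx"}))
```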
Deeper Dives
Customizing search results via Result Types and Display Templates
TechNet: query variables
CHAPTER 3
Content Capture
Capturing content is fundamental to search: if it's not crawled and indexed, you can't find it! The process of connecting to content sources, crawling them to get content, and making that content searchable is far more complex than most people realize. It was also one of the most frustrating areas to manage with SharePoint 2010. As a quick orientation, the basic function of a crawler is shown in the figure below. The concept is simple enough: the crawler connects securely to a given content source, maps the content from the source system to the crawled properties of the search engine, and feeds the engine in either a full crawl or an incremental crawl (which picks up only the changes). What makes content capture differ from one search engine to the next is the breadth of connectors, the coverage of different security models and data types, the performance (both throughput and latency), the robustness, and the ease of administration. SharePoint 2013 does well on all counts, although most connectors are supplied by Microsoft's partners, not by Microsoft.
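The difference between a full and an incremental crawl, and the mapping of source fields to crawled properties, can be sketched in a few lines of Python. This is an illustrative model only: SourceItem, FIELD_MAP, the Crawler class, and the integer "modified" timestamps are hypothetical, and deletions are ignored.

```python
from dataclasses import dataclass, field


@dataclass
class SourceItem:
    """Hypothetical item in a content source system."""
    url: str
    fields: dict
    modified: int  # logical timestamp of the item's last change


# Hypothetical mapping from source-system fields to crawled property names.
FIELD_MAP = {"subject": "Title", "owner": "Author", "body": "Body"}


@dataclass
class Crawler:
    index: dict = field(default_factory=dict)
    last_crawl: int = -1  # watermark recorded by the last completed crawl

    def _map(self, item):
        # Keep only fields that map to a crawled property.
        return {FIELD_MAP[k]: v for k, v in item.fields.items() if k in FIELD_MAP}

    def full_crawl(self, source, now):
        # Re-read every item and rebuild the index from scratch.
        self.index = {it.url: self._map(it) for it in source}
        self.last_crawl = now

    def incremental_crawl(self, source, now):
        # Pick up only items changed since the last crawl
        # (deletions are ignored here for brevity).
        changed = [it for it in source if it.modified > self.last_crawl]
        for it in changed:
            self.index[it.url] = self._map(it)
        self.last_crawl = now
        return len(changed)


source = [
    SourceItem("a", {"subject": "Spec", "owner": "Tony"}, modified=1),
    SourceItem("b", {"subject": "Notes", "color": "red"}, modified=2),
]
crawler = Crawler()
crawler.full_crawl(source, now=2)
source[1].fields["subject"] = "Meeting notes"
source[1].modified = 3
print(crawler.incremental_crawl(source, now=3))  # 1
print(crawler.index["b"])  # {'Title': 'Meeting notes'}
```

The unmapped "color" field never reaches the index, and the incremental pass touches only the one changed item, which is why incremental crawls are so much cheaper than full ones.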
SharePoint 2013 supports multiple crawl components, crawl databases, and content sources, as shown below. A number of connectors are included out of the box: SharePoint, HTTP (web crawler), File Share, and the Business Data Connectivity (BDC) Framework. SharePoint 2013 also includes these connectors that are built on the BDC framework: Exchange Public Folders, Lotus Notes, Documentum Connector, Taxonomy Connector (connects to MMS), and People Profile Connector.
Overall, the most noticeable change in content capture is Continuous Crawling. This is a new method of ensuring you have the most current data in your search index, and it is available only for SharePoint content. Rather than living with a latency of several minutes, and with full crawls that might take many minutes to start populating content in the index, you'll see content within seconds! When you enable continuous crawls (using the UI shown below), a crawl schedule no longer applies: crawls run in parallel, and the crawler gets changes from SharePoint sites every N minutes (15 minutes by default, but this parameter can be changed). Continuous crawls do not stop for errors; they note the error and continue to crawl content. Continuous crawls can occur while other crawls (full or incremental) are active or starting, whereas incremental crawls need to wait for other incremental crawls to complete before starting. With this capability you can now keep content fresh, and you won't experience mysterious delays when additional content sources are added.
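A simplified timing model shows why this matters. In the Python sketch below, continuous crawls start every interval regardless of whether earlier ones have finished, while scheduled incremental crawls must wait for the previous incremental crawl. The 15-minute interval comes from the text; the crawl durations are invented for illustration, and the real scheduler is more involved than this.

```python
def continuous_starts(durations, interval=15):
    # A new continuous crawl starts every `interval` minutes, even if earlier
    # ones are still running, so crawls may overlap.
    return [i * interval for i in range(len(durations))]


def incremental_starts(durations, interval=15):
    # A scheduled incremental crawl must wait for the previous incremental
    # crawl to finish before it can start.
    starts, free_at = [], 0
    for i, duration in enumerate(durations):
        start = max(i * interval, free_at)
        starts.append(start)
        free_at = start + duration
    return starts


durations = [40, 5, 5, 5]  # minutes; the first crawl is a slow one
print(continuous_starts(durations))   # [0, 15, 30, 45]
print(incremental_starts(durations))  # [0, 40, 45, 50]
```

One slow incremental crawl delays every crawl behind it, while the continuous schedule keeps picking up changes on time.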
The Taxonomy connector is new in SharePoint 2013, and you will see it at work even when you don't use it explicitly, since the term store is much more integrated with search. As you will read in other chapters of this book, you can now create entity extractors directly from term sets, set up WCM page hierarchies using the term store, define faceted navigation using taxonomies, and much more.
still be good for high performance or particular tasks. But the primary way to create connectors is through the BDC Framework, which was introduced in SharePoint 2010 as part of Business Connectivity Services (BCS). BCS is an umbrella term for a set of technologies that bring data from external systems into SharePoint Server 2013 and Office 2013 (shown in the figure below). As with SharePoint 2010, you can make new connectors fairly simply. For systems with static schemas, straightforward security, and moderate performance needs, this is not a huge job. There are some great improvements in Business Connectivity Services as a whole: for example, there's tooling specifically to create External Content Types against OData sources, there are Representational State Transfer (REST) and Client Side Object Model (CSOM) interfaces, and External Content Types can be scoped to a single SharePoint app. Unfortunately, none of these apply to search: creating an indexing connector for search is not the same as creating an External Content Type. When it comes to search, the Business Data Connectivity (BDC) framework is largely the same in SharePoint 2013 as it was in SharePoint 2010. There is one notable change, though: Claims tokens are supported through the BDC. Previously, only Active Directory (AD)-format Access Control Lists (ACLs) were supported, which made it nearly impossible to cover some complex security scenarios. With Claims support, many of these scenarios are tractable, though still very much the domain of experts.
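Why claims help can be seen in a toy security-trimming check. AD-style ACLs can only name Windows users and groups, whereas a claims-based ACL can name arbitrary attributes such as a role or a clearance level. The Python sketch below is a hypothetical model of query-time trimming; the claim strings, the allow/deny structure, and the functions are assumptions and do not show how the BDC framework actually represents claims.

```python
def is_visible(item_acl, user_claims):
    """Visible if the user holds at least one allowed claim and no denied claim."""
    allowed = item_acl.get("allow", set())
    denied = item_acl.get("deny", set())
    return bool(allowed & user_claims) and not (denied & user_claims)


def trim(results, user_claims):
    # Drop every result the user is not entitled to see.
    return [r["title"] for r in results if is_visible(r["acl"], user_claims)]


results = [
    # AD-style entry: a Windows group.
    {"title": "Org chart", "acl": {"allow": {"group:CONTOSO\\Staff"}}},
    # Claims-style entry: a role claim plus a deny on a clearance attribute.
    {"title": "Merger memo",
     "acl": {"allow": {"role:Legal"}, "deny": {"clearance:Contractor"}}},
]
alice = {"user:CONTOSO\\alice", "group:CONTOSO\\Staff", "role:Legal"}
bob = {"user:CONTOSO\\bob", "group:CONTOSO\\Staff", "role:Legal", "clearance:Contractor"}
print(trim(results, alice))  # ['Org chart', 'Merger memo']
print(trim(results, bob))    # ['Org chart']
```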
One warning: you shouldn't underestimate the effort involved in connector development, deployment, and maintenance. Don't fear connector development, but watch out for the classic quicksand trap. Too often a development project gets to basic connectivity quickly but then struggles to get security right and to achieve high performance and scale. If and when this is successful, the project is then dragged further down by troubleshooting and maintenance, since things change every time the source system changes. Plan your development carefully to avoid this trap. The best way to avoid it is to consider pre-built connectors for any complex system; that way you don't have to build your own from scratch, and you don't have to maintain it.
Java Database Connectivity (JDBC) connector which supported direct SQL access to databases. Though these may seem like big gaps, there are ways to cover this functionality in SharePoint 2013, either with different mechanisms (many cases covered by the JDBC connector can be done via the BDC), or with pre-built connectors from Microsoft Partners.
Deeper Dives
TechNet on managing continuous crawls
MSDN on searching new content with SharePoint 2013
Longitude Connectors Overview
Content Processing
Content Processing is an essential pillar of search quality, but it is typically invisible to the end user. The development of content processing in SharePoint 2013 is focused on implementing platform-wide capabilities, and integrating and supporting built-in search-based applications such as WCM and e-discovery. In order to support the wide range of scenarios that depend on search, Microsoft provided extensibility, so that customers and partners can leverage the new search platform and hook into content processing. The Content Processing component is brand new with SharePoint 2013. It takes content from the crawler and prepares it for indexing, as shown below. With SharePoint 2013, there is also a new Analytics Processing component that feeds information into Content Processing.
In SharePoint 2013, the underlying dataflow engine for content processing, which was first introduced as CTS, has been extended and enriched to host the content processing tasks for the entire SharePoint platform. Successfully integrating a new content processing flow for search and enrichment across the whole SharePoint platform is a significant investment and engineering achievement. The benefits are potential scale-out, improved management, a cloud-ready system architecture, and an improvement in Microsoft's ability to integrate new content enrichment features inside the SharePoint platform.
New format handlers implement document parsing, replacing IFilters for OOB document metadata. Throughput is higher for Office document types and for PDF. Automatic content-based file format detection removes dependencies on file extensions. Reporting of content processing throughput and errors (which is tied to crawl reporting) is comprehensive and far simpler to understand. Search analytics processing (which we cover in more depth in the chapter on Analytics) is an important new platform capability. The analytics module feeds information back into Content Processing for a variety of purposes, for example to improve search relevance based on user behavior. Usage and search action events (document exposures and document click-throughs) are recorded into a new SharePoint 2013 analytics store. They are then processed into a form that enables search relevance to account for, for example, popular content, relevant query terms, or, in the context of recommendations, boosts for related user/related item results. This also supports search history boosts.
will enjoy how easy it is to set up and operate these capabilities, and how little head-scratching you do in development, but you will be frustrated at how little you can get at. This is a sensible tradeoff in the context of a major platform upgrade and in accommodation of a hosted multi-tenant deployment model (O365). The capabilities, and the ability to extend them, are still there, but they feel limited. There are times when it takes sophistication and inventiveness to do what you want with the hooks provided. The extension point for content processing is the Content Enrichment Web Service (CEWS). This is a new mechanism for extending content processing, called from a content processing flow at a single point, as shown below. We will cover CEWS in more depth in its own chapter, and touch on its applications in the chapter on Linguistics.
* Note: the CEWS call-out is not part of O365 and is only available for the Enterprise Edition of SharePoint 2013.
SharePoint's management of content processing is highly scalable and streamlined. SP2013 content processing straddles the on-premise deployment of SharePoint and the deployment of SharePoint in hosted form via O365. If content enrichment beyond what is provided in SharePoint 2013 is important for your application, especially for content you already have, prepare to look for custom solutions that leverage the Content Enrichment Web Service.
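Conceptually, the CEWS call-out is a request/response exchange: for each item that matches a trigger condition, the content processing flow sends a configured set of input managed properties, and the service returns values for a configured set of output managed properties. The real service is a web service that SharePoint calls; the Python sketch below only models that contract, and the property names, the trigger, and the part-number format are assumptions chosen for illustration.

```python
import re

# Hypothetical configuration: which managed properties go in and come back.
INPUT_PROPERTIES = ("Title", "Body")
OUTPUT_PROPERTIES = ("PartNumbers",)

PART_NUMBER = re.compile(r"\b[A-Z]{2}-\d{4}\b")  # assumed part-number format


def trigger(item):
    # Only items that satisfy the trigger condition are sent to the service.
    return item.get("FileExtension") in {"docx", "pdf"}


def process_item(input_props):
    """Stand-in for the external web service: extract part numbers from text."""
    text = " ".join(str(input_props.get(p, "")) for p in INPUT_PROPERTIES)
    return {"PartNumbers": sorted(set(PART_NUMBER.findall(text)))}


def enrich(item):
    if not trigger(item):
        return item
    output = process_item({p: item.get(p) for p in INPUT_PROPERTIES})
    # Only the configured output properties are merged back before indexing.
    return {**item, **{k: v for k, v in output.items() if k in OUTPUT_PROPERTIES}}


doc = {"Title": "Pump spec", "Body": "Replace AB-1234 with CD-5678.", "FileExtension": "pdf"}
print(enrich(doc)["PartNumbers"])  # ['AB-1234', 'CD-5678']
```

Keeping the service stateless and limited to named inputs and outputs is what lets the platform call it from a single point in the flow without exposing the rest of the pipeline.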
Deeper Dives
MSDN section on Content Enrichment Web Service (CEWS)
TechNet content processing description
Linguistics Processing
Linguistic processing, which aims to leverage the meaning of documents or words, is the special sauce of search, and one of its most mysterious and difficult-to-understand areas. Human language is a tricky thing, and algorithms aimed at understanding it are complex and imperfect; yet this is what makes it seem like search just works for end users. Linguistic tools, such as spellchecking of queries or grammatical normalization of content or queries, can greatly simplify users' search experience. Covering a wide variety of languages (SharePoint 2013 search covers 85 languages, from Afrikaans to Zulu) also means that you can find content generated by users across geographic boundaries.
In preparing content for indexing, linguistics are applied in stages, each one building on the previous one. The figure below gives an overview of these steps in what is often called the pipeline. (The steps in gray are not OOB, but illustrate some of what is possible by adding third-party components.)
Linguistic processing is applied to both content and queries (as shown above), using a similar framework under the hood. As mentioned in other chapters, the content processing and query processing components have a heritage from modules called CTS and IMS, and they share an underlying framework for processing flows.
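The staged pipeline, and the reason the same processing is applied to both content and queries, can be illustrated with a toy Python version. Each stage consumes the previous stage's output, and because documents and queries pass through the same stages, "singing" in a query meets "sang" in a document. The lemma and synonym tables are tiny hypothetical lookups; real stemming, lemmatization, and language detection are far more sophisticated, and language detection is omitted here entirely.

```python
import re

# Hypothetical lookup tables standing in for real linguistic resources.
LEMMAS = {"singing": "sing", "sung": "sing", "sang": "sing",
          "incorporating": "incorporate", "incorporated": "incorporate"}
SYNONYMS = {"auto": "car", "automobile": "car"}  # map variants to one canonical term


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def lemmatize(tokens):
    return [LEMMAS.get(t, t) for t in tokens]


def normalize_synonyms(tokens):
    return [SYNONYMS.get(t, t) for t in tokens]


PIPELINE = [tokenize, lemmatize, normalize_synonyms]


def process(text):
    # Each stage builds on the output of the previous one.
    result = text
    for stage in PIPELINE:
        result = stage(result)
    return result


doc_terms = set(process("She sang while driving her automobile"))
query_terms = process("singing car")
print(all(t in doc_terms for t in query_terms))  # True
```

Mapping synonyms to a canonical term at both index and query time is only one design choice; expanding the query with all variants is a common alternative.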
First, files must be parsed, teasing the indexable text out of PowerPoint, OneNote, PDF, etc. During this process the language is detected, since processing English is different from processing Japanese. Words and patterns (dates, times, URLs, etc.) are found, based on the text and language. Next, the magic begins: a variety of Text Analytics technologies are applied. Stemming or lemmatization (which allows forms of the same base word to be matched, for example sing, singing, and sung, or incorporate and incorporating), synonyms (matching, for example, car and auto), and concept detection of various forms deal with the wide variety of ways humans say essentially the same thing. Entity extraction, which is a key linguistic capability for SharePoint 2013 search, and techniques like categorization, relationship extraction, and sentiment analysis add metadata that greatly improves the ability to find and explore information.
Microsoft has the deepest natural language processing development capability on earth, with labs around the planet. This was strengthened with the FAST acquisition, since one of FAST's specialties was linguistics applied to search. Strong language processing features show up in SharePoint 2013 search, which continues a tradition of steady improvement in this area and has some extremely strong linguistic technology, including many improvements over SharePoint 2010. Some of the changes will be directly apparent to the end user, but many of them show up in subtle ways, and some are only relevant to specialists handling unusual situations. For those coming from SharePoint 2010 search, there are some remarkable new capabilities and improvements. For those coming from a FAST-based platform, the capabilities are familiar, but are now much easier to work with. Some capabilities you are used to from FAST are no longer there; we mention the major ones as we cover each area. Some changes in SharePoint 2013 will be noticeable to nearly all search deployments: document parsing is foremost, but also synonym management and custom entity extractors. Some changes will only be apparent or available to those extending search, and some will be visible only to a specialized group of deployments.
Automatic file format detection no longer relies on file extensions, eliminating the kind of errors that happened when users or applications did creative things like making .memo files. Deep link extraction works like a table of contents generator and allows you to click into previews for Word and PowerPoint formats. Metadata extraction for titles, authors, and dates provides better metadata and is much easier to understand than the techniques used in SharePoint 2010 (where Optimistic Title extraction was one of the top sources of user confusion). High-performance format handlers for HTML, DOCX, PPTX, TXT, image, XML, and PDF formats mean faster crawls and indexing. The new parsing facility is enabled by default and supports 55 of the most common file formats, including things like Montage, Visio, and OneNote. By comparison, the 2010 Microsoft Filter Pack supported 15 formats, and the Advanced Filter Pack (available for FAST only) supported 422. For most deployments, this means you will no longer have to seek out third-party IFilters, though the IFilter API is still supported and there is a rich assortment of IFilters on the market that cover file types beyond the OOB 55.
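Content-based format detection generally works by inspecting the leading bytes ("magic numbers") of a file rather than trusting its name. The Python sketch below illustrates that general technique with a handful of well-known signatures; it is not SharePoint's actual detection logic, which is not documented here and certainly inspects more than a prefix.

```python
# Well-known file signatures; a real detector checks many more, and deeper.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole-compound"),  # legacy .doc/.xls/.ppt
    (b"PK\x03\x04", "zip"),  # OOXML files (.docx/.pptx/.xlsx) are ZIP packages
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"<?xml", "xml"),
]


def detect_format(data: bytes, filename: str = "") -> str:
    """Guess the format from leading bytes; the filename is deliberately ignored."""
    for magic, fmt in SIGNATURES:
        if data.startswith(magic):
            return fmt
    return "unknown"


print(detect_format(b"%PDF-1.7\n...", "quarterly.memo"))  # pdf
```

A PDF saved as quarterly.memo is still recognized as a PDF, which is exactly the class of error extension-based detection used to produce.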
detection was done chunk-wise on document parts like paragraphs. Now a much larger part of the document is used. The advantage of this is that language detection generally improves with more text: the more you can look at, the more reliably you can tell what language it's in. There is a downside to this approach, however: documents that mix languages (partly in English and partly in French, for example) aren't handled as well. The Term Store (MMS) is now well integrated with search, which provides a number of big benefits. Customizations to Query Spelling Correction, both inclusions and exclusions, are now managed in the term store (shown below).
Property Extraction (previously a FAST-only feature) is also manageable in the term store (shown below). However, only company names are available; if you were using property extraction for people names or place names, you'll need to find a third-party alternative. Some things are still managed outside of the term store: synonyms via a UI or PowerShell, Custom Extractors via PowerShell, and spell correction via a dynamic dictionary based on content in the index or a static OOB dictionary. Offensive Content Filtering was a feature that could be enabled in FAST Search for SharePoint. This feature made it easy to shield users from the obscenities and profane language that turn up in content (even business content) remarkably often. However, it is no longer supported in SharePoint 2013, so you'll need to find a third-party alternative if this is important to you. Substring search, another FAST-only feature, was also removed. It provided n-gram matching without regard to word boundaries, which was useful for applications like part numbers.
Changes in Extensibility
There are notable changes in how you can extend linguistics processing with SharePoint 2013. These include: Custom Extractors (previously FAST only) are more powerful, and you can have more of them (12 rather than the five allowed with FAST Search for SharePoint). These allow you to provide a list of terms (via PowerShell) and match them in the content, populating managed properties with consistent metadata, which is the lifeblood of information discovery. Custom Word-Breaking now requires only one language-independent dictionary, rather than the one-dictionary-per-language facility in SharePoint 2010.
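A dictionary-based custom extractor boils down to matching a supplied term list against the text and writing the canonical forms into a managed property. The Python sketch below shows that general technique under stated assumptions: the term list, the ProductEntities property name, and whole-word, case-insensitive matching are all illustrative choices, and loading the dictionary through PowerShell is not shown.

```python
import re

# Hypothetical term list: lowercase match key -> canonical form.
PRODUCT_TERMS = {"sharepoint": "SharePoint", "office 365": "Office 365", "exchange": "Exchange"}


def extract_entities(text, terms=PRODUCT_TERMS):
    """Return the canonical form of each dictionary term found as a whole word."""
    lowered = text.lower()
    found = [canonical for key, canonical in terms.items()
             if re.search(r"\b" + re.escape(key) + r"\b", lowered)]
    return sorted(found)


item = {"Body": "Migrating from SharePoint to Office 365 next year."}
# Populate a hypothetical managed property that a refiner could use.
item["ProductEntities"] = extract_entities(item["Body"])
print(item["ProductEntities"])  # ['Office 365', 'SharePoint']
```

Because every match is normalized to one canonical spelling, refiners built on the property show "SharePoint" once rather than once per variant spelling found in the content.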
Customized stemming (done via registry settings in SharePoint 2010) is no longer supported. Third-party specialists will find ways to customize this level of linguistics and handle specialized cases. The biggest change is the availability of the Content Enrichment Web Service (CEWS). This provides a way to add linguistic processing of any type, such as the examples in gray in the pipeline figure above (concept extraction, relationship extraction, geo-tagging, summarization, etc.). With FAST Search for SharePoint, it was possible to extend the content processing pipeline through a sandboxed application, but this was both slow and limited in the information it could access. SharePoint 2013 introduces a much more open API, which makes it possible to add specialized linguistics at lower levels as well as sophisticated text analytics. CEWS is covered in more depth in a separate chapter.
Deeper Dives
TechNet article on linguistic search features in SP 2013
MSDN section on custom word breakers
Longitude AutoClassifier Overview
CHAPTER 4
FAST Search for SharePoint is a very powerful product, but it has numerous rough edges, due primarily to the lack of time in the previous development cycle. The timeline also resulted in a hybrid architecture, with separate SharePoint and FAST farms, as shown below. This could be awkward and confusing to work with.
With the release of SharePoint 2013, the full realization of Microsoft's investment in FAST Search and Transfer is now evident. The capabilities now available take enterprise search to a whole new level. They are the result of a new search architecture. The architecture, shown below, is relatively simple, though much of it is new.
There is a good walkthrough of the components on TechNet, which we won't repeat here. Each of the components shown is also covered in at least one chapter of this book. However, before we move forward, there are a few essential things to understand: Search is fully integrated into SharePoint, and there is no longer a separate Search Server. A SharePoint 2013 server or services farm can certainly be dedicated to search; to do this, you will want to have the MMS (term store) and User Profile service at minimum, much as you did in SharePoint 2010. There are four different databases, each independent from the others. All of them can be partitioned, mirrored, and managed. The Crawl database scales with the amount of content crawled, so this is typically the database that has multiple instances in a large search deployment. Every component can be scaled out for capacity and for fault tolerance. Previously, there could be only one Search administration component, which meant you had to do creative workarounds to create truly fault-tolerant configurations. Search is now multitenant, except for a few things such as the CEWS API. Much more administration can be done at the site collection (or tenant) level.
It's a new, next-gen search core that was the result of a decade of research and development at FAST, hardened through the Microsoft development process. Also new in this architecture is the Analyzer (aka Analytics Processing Component), which we cover in the chapter on Analytics. The content processing component writes information about links and URLs to the link database. In turn, the analytics processing component writes information related to the relevance of these links and URLs to the search index via the content processing component. This enables some powerful capabilities like recommendations and usage-based relevance enhancement. If you look inside the search service, you will find several search processes. These include MSSearch.exe (for the crawl component), NodeRunner.exe (which hosts search components), and a Host Controller (a Windows service that supervises NodeRunner processes). The Host Controller monitors NodeRunner processes, detects failures, and restarts processes if they do fail. There can be multiple NodeRunner instances on the same server, each hosting one search component. On a default single-server install there will be 5 instances of the NodeRunner.exe process, as shown to the left. Although there is a fascinating dataflow engine and a next-gen search core, those are not exposed for developers: the only points of configuration for interaction are ResultSources, QueryRules, and CEWS. In SharePoint 2013, configuration alternatives are circumscribed to ensure that no configuration results in excessive resource consumption for one instance relative to other instances that may be running through the same service. So, QueryRules effectively run in a sandbox that restricts calls to non-SharePoint services.
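The Host Controller's job (monitor, detect failure, restart) is the classic supervisor pattern. The Python sketch below models one polling pass over hypothetical NodeRunner stand-ins; it does not manage real processes, a real supervisor would run this check repeatedly, and the five component names are our assumption of the default single-server layout (with the crawl component living in MSSearch.exe instead).

```python
from dataclasses import dataclass


@dataclass
class NodeRunner:
    """Hypothetical stand-in for a NodeRunner.exe process hosting one component."""
    component: str
    alive: bool = True
    restarts: int = 0


class HostController:
    """Sketch of a supervisor that detects failed processes and restarts them."""

    def __init__(self, components):
        self.processes = [NodeRunner(c) for c in components]

    def check_once(self):
        restarted = []
        for proc in self.processes:
            if not proc.alive:       # failure detected
                proc.alive = True    # stands in for restarting the process
                proc.restarts += 1
                restarted.append(proc.component)
        return restarted


# Assumed single-server layout: five NodeRunner-hosted components.
controller = HostController(
    ["Admin", "ContentProcessing", "AnalyticsProcessing", "Index", "QueryProcessing"]
)
controller.processes[3].alive = False  # simulate the index process crashing
restarted = controller.check_once()
print(restarted)  # ['Index']
```

Isolating each component in its own process is what makes this kind of targeted restart possible: one crashed component does not take the others down with it.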
that we expect the density (items per node) of SharePoint 2013 search to go up dramatically over time, just as FAST Search for SharePoint density did. The initial focus has been on scale-out in order to support O365, not on density.
apply it to different applications, and develop on top of it. What will you notice about this architecture? Beyond the capabilities, there is more here than meets the eye. For example: The core engine is different, so relevance is different. Since Microsoft has a lot of data with which to tune relevance, you'll notice first that relevance is better OOB. But if you had customized relevance or spent time focused on it, you may have some work to do, or you may have a pleasant surprise. Indexing is atomic in the new search core. That has some very interesting implications, but mostly you'll notice that it's more robust and that you can do normal backup and restore. For nearly all search engines it's a dirty secret that data can occasionally get lost in indexing (so one in a million items may go missing), and that an outage can result in needing a full re-index; this core will be different. Scale-out is possible on a huge scale: big enough to run O365, and big enough for any challenge you can throw at it. FAST was always great at large scale, but this is a different level; there should be less black art to building out big or high-throughput systems. Ultimately, what matters is that it works. Other than the dogfooding done at Microsoft (which is pretty big), there isn't much production experience with SharePoint 2013 yet, but every indication is that this is an architecture that is extremely solid for both SharePoint generally and search specifically.
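What "atomic indexing" buys you can be shown with a small all-or-nothing model: a batch of updates either becomes visible in full or not at all, so a failure partway through never leaves a half-applied batch. The Python sketch below uses copy-then-swap purely for clarity; it is an illustration of the property, not a description of how the SharePoint 2013 search core implements it (real engines typically rely on transaction logs rather than copying the index).

```python
class AtomicIndex:
    """Sketch: a batch of updates becomes visible all at once or not at all."""

    def __init__(self):
        self.committed = {}

    def apply_batch(self, updates):
        staged = dict(self.committed)  # stage changes on a copy, not the live index
        for doc_id, doc in updates:
            if doc is None:
                # Simulated failure partway through the batch.
                raise ValueError(f"cannot index {doc_id}")
            staged[doc_id] = doc
        self.committed = staged  # a single swap is the commit point


index = AtomicIndex()
index.apply_batch([("1", "alpha"), ("2", "beta")])
try:
    index.apply_batch([("3", "gamma"), ("4", None)])
except ValueError:
    pass  # the failed batch leaves no trace
print(sorted(index.committed))  # ['1', '2']
```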
Deeper Dives
SharePoint 2013 Search Logical Architecture
TechNet search technical diagrams
TechNet on Planning for SharePoint 2013
Index partitions are separate, which provides a lot of flexibility. They can be stored individually on disk in a file set. Alternatively, they can be further divided into discrete sections, each containing a unique index component.
Microsoft has also developed a new nomenclature to describe the structure of the index. In FAST Search for SharePoint 2010, the structure of the index and configuration was described in terms of rows and columns. Adding columns increased the amount of content you could index, and adding rows increased query throughput and redundancy. In SharePoint 2013, Microsoft has adopted a Partition/Replica model to define functions within the overall search index, as shown below. Partitions are logical divisions of the overall search index. The entire index is composed of the aggregation of all the primary replicas across the logical partitions. When content is sent to the indexing component, a transaction is generated to acknowledge receipt of the content. Each partition then indexes the content from this transaction log. Secondary replicas are created as read-only copies of the primary replica, for scaling query volume or adding redundancy to the overall architecture.
Within a partition, there is only one primary replica that is responsible for writing data in the partition. Each partition can be served by one or more replicas of the index. The indexing component is responsible for managing and distributing the index across partitions. If an additional partition is added, the indexing component is responsible for redistributing data across all the partitions. It is important to note that you can add partitions without re-indexing the data, but removing a partition forces a complete re-index of all content.
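The partition side of this model can be sketched in Python: documents are routed to a partition by a stable hash, writes go to the owning partition's primary replica, queries fan out to all partitions, and adding a partition redistributes already-indexed data rather than re-crawling it. The hashing scheme and class are hypothetical simplifications; secondary replicas, which only add query capacity and redundancy, are not modeled.

```python
import zlib


class SearchIndex:
    """Hypothetical, simplified partition model. Each dict stands for the
    primary replica of one partition; read-only secondary replicas are not modeled."""

    def __init__(self, partitions):
        self.partitions = [{} for _ in range(partitions)]

    def _partition_of(self, doc_id):
        # Stable hash, so routing is the same on every run.
        return zlib.crc32(doc_id.encode()) % len(self.partitions)

    def index(self, doc_id, doc):
        # Writes go only to the primary replica of the owning partition.
        self.partitions[self._partition_of(doc_id)][doc_id] = doc

    def search(self, predicate):
        # A query fans out to every partition and merges the matches.
        return sorted(d for p in self.partitions for d, doc in p.items() if predicate(doc))

    def add_partition(self):
        # Existing index data is redistributed; nothing is re-crawled.
        docs = [(d, doc) for p in self.partitions for d, doc in p.items()]
        self.partitions = [{} for _ in range(len(self.partitions) + 1)]
        for d, doc in docs:
            self.index(d, doc)


idx = SearchIndex(partitions=2)
for i in range(10):
    idx.index(f"doc{i}", {"n": i})
idx.add_partition()
print(len(idx.partitions), sum(len(p) for p in idx.partitions))  # 3 10
print(idx.search(lambda doc: doc["n"] < 2))  # ['doc0', 'doc1']
```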
Analytics
Analytics is an often-overlooked area, but it plays a crucial role in search, both in providing insight into user behavior and system operations and in improving the user experience. SharePoint 2013 has a new analytics architecture, which merges web analytics (where people click and navigate) and search analytics (what people search for and what results they get). This is a great improvement over SharePoint 2010, where the web analytics service application was quite limited in both capability and scale. The result is called the Web Analytics Platform, which has been completely redesigned and integrated into the search service application of SharePoint 2013. The analytics architecture consists of the analytics processing component, the analytics reporting database, and the link database (as shown below). The analytics processing component analyzes crawled items (search analytics) and how users interact with search results (usage analytics). It uses the information to improve search relevance, and to create search reports, recommendations, and deep links.
The Analytics Processing Component extracts two kinds of information: search analytics information, such as links, anchor text, information related to people, and metadata, from items that it receives via the content processing component, which it stores in the link database; and usage analytics information, such as the number of times an item is viewed, from the front end via the event store. The analytics processing component analyzes both types of information. The results are then returned to the content processing component to be included in the search index. Results from usage analytics are also stored in the analytics reporting database for reporting purposes. The analytics component updates the SharePoint search index at intervals set via a timer job, so it is independent of the crawl schedule. This can be confusing if you are trying to understand why search relevance changed. There is an extension point for custom events, but the analytics processing and search index update data flows are sealed from enrichment updates outside the SharePoint 2013 crawl. The results are most visible to the user as reports and recommendations, but there are several other ways that analytics shows up: Search relevance is enhanced based on user behavior (views, click-throughs, etc.). Popularity of content and of topics in discussion threads, which is driven by the number of views as well as the number of unique users who viewed them, can be viewed directly.
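Why both total views and unique viewers matter is easy to see with a small aggregation. The Python sketch below counts both from a list of hypothetical view events and ranks items by unique users first; that ranking rule is an illustrative assumption, not SharePoint's actual popularity formula.

```python
from collections import defaultdict

# Hypothetical view events as (user, item) pairs, as they might come from an event store.
events = [("ann", "spec.docx"), ("ann", "spec.docx"), ("bob", "spec.docx"),
          ("cid", "faq.aspx"), ("cid", "faq.aspx"), ("cid", "faq.aspx")]


def popularity(events):
    views = defaultdict(int)
    users = defaultdict(set)
    for user, item in events:
        views[item] += 1
        users[item].add(user)
    return {item: {"views": views[item], "unique_users": len(users[item])} for item in views}


stats = popularity(events)
print(stats["spec.docx"])  # {'views': 3, 'unique_users': 2}
print(stats["faq.aspx"])   # {'views': 3, 'unique_users': 1}

# Assumed ranking: unique users first, then raw views as a tie-breaker.
ranked = sorted(stats, key=lambda i: (stats[i]["unique_users"], stats[i]["views"]), reverse=True)
print(ranked)  # ['spec.docx', 'faq.aspx']
```

Both items have three views, but one user reloading a page three times says less about popularity than three views spread across two people.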
Popularity can also be used to create views through the Content by Search (CBS) Web Part. Usage analytics in WCM are particularly important, since they provide essential insight into the effectiveness of your web site. These analytics are search driven, built to scale (scaling was a weakness in SharePoint 2010), and open for extension. A Top Pages web part is included by default. Some data, like view counts, is also pushed into the index so it can be included in search results, sorted on (i.e. what's most viewed), etc. Personalized search queries and personal query suggestions in SharePoint 2013 are based on analytics data and usage information for each user. Recommendations (both item-to-item and popularity based) are available through this approach, as shown below. The "recommended for you" list is simply a preconfigured Content by Search web part: it looks like a static list, but it's generated dynamically by search. The addition of both the Link database and the Analytics Reporting database provides for a great deal more personalization, analysis, and relevancy within the engine. The Analytics Reporting database has been added to keep track of all forms of analytics. Search analytics analyze crawled items and how users interact with search results. These actions are stored in the event store within the Web Front End (WFE) server and are regularly pushed to the analytics processing component, where the actions are analyzed and reconciled. They are then pushed into the analytics reporting database and made available to the query and processing components. This allows search to keep track of user actions, queries, and trends to provide the user with better search results and suggestions. This database now powers features such as personal and engine-wide query suggestions, favorites, and other search personalization components not found in any other enterprise search platform today. Within the analytics system, there are five parts:
Event: Each item comes into the system as an event with certain parameters.
Filtering and Normalization: Each event is looked at for special handling, normalization, and filtering; some are filtered out.
Custom Events: You can configure up to 12 custom events in addition to what comes OOB.
Calculation: Sum or average across events.
Reports: A number of default reports are available, including top queries, most popular documents in a library or site, and historic usage of an item (view counts).
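The five parts can be chained into a small Python pipeline: raw events come in, are filtered and normalized, may include a custom event type, are aggregated by sum or average, and end up as report rows. Everything here is hypothetical (the event fields, the "Rating" custom event, the bot-filtering rule, and lowercasing URLs as the normalization step) and only illustrates the shape of the flow.

```python
from statistics import mean

# Hypothetical raw events; "Rating" stands in for a custom event type.
RAW = [
    {"type": "View", "item": "HTTP://Intranet/Spec.docx", "user": "ann", "value": 1},
    {"type": "View", "item": "http://intranet/spec.docx", "user": "bob", "value": 1},
    {"type": "View", "item": "http://intranet/spec.docx", "user": "crawler-bot", "value": 1},
    {"type": "Rating", "item": "http://intranet/spec.docx", "user": "ann", "value": 4},
    {"type": "Rating", "item": "http://intranet/spec.docx", "user": "bob", "value": 5},
]
KNOWN_TYPES = {"View", "Rating"}             # a built-in event plus one custom event
CALCULATION = {"View": sum, "Rating": mean}  # sum or average across events


def normalize_and_filter(events):
    for e in events:
        if e["type"] not in KNOWN_TYPES or e["user"].endswith("-bot"):
            continue  # filtered out
        yield {**e, "item": e["item"].lower()}  # normalize the item URL


def report(events):
    grouped = {}
    for e in normalize_and_filter(events):
        grouped.setdefault((e["item"], e["type"]), []).append(e["value"])
    return {key: CALCULATION[key[1]](values) for key, values in grouped.items()}


for (item, kind), value in sorted(report(RAW).items()):
    print(item, kind, value)
```

Normalization is what lets the two differently cased URLs count as one item, and filtering keeps the bot's view out of the totals.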
The figure below shows an overview of the data flow for usage analytics, usage events, and recommendations.
provided by the engine, as well as improving the quality of queries the user issues.
Deeper Dives
TechNet overview of Analytics in SharePoint 2013
Note that 2010 web analytics aren't supported when running in 14 mode, so running in 14 mode means running without any analytics.
Federating or Indexing?
Whenever someone is newly introduced to federation, the immediate next question that comes up is: how does federation relate to indexing? Why should I continue to index remote systems if I can federate them? The truth is that indexing, if possible, is always better. If you index the content, you can control relevancy, freshness, performance, faceted navigation, and filtering for end users (among other things). When you federate across search indices, you essentially relinquish control of these and become dependent on what the other system is capable of. With federation, your page will also be as slow as the slowest search engine queried and as relevant as the weakest search engine queried. So federating results must be done carefully. Federation has proven very useful for scenarios where indexing may not be desirable or even feasible. For instance, your content may be spread across multiple offices with low-bandwidth connections, making any remote crawl last for days. In such conditions, you would not be able to keep your index fresh enough for your end users. Another scenario is when you have so much content to index that it may not fit within a farm. Imagine, for instance, a 50,000-employee company wanting to search across SharePoint and e-mail. Even at a low estimate of 10,000 items per mailbox (that's roughly six months for an information worker), the mailboxes alone would represent half a billion items to index (50,000 × 10,000), before counting any SharePoint content! Finally, the remote source may not allow crawling, technically or through license restrictions (imagine a secured deep-web content provider). In these cases federation is pretty much the only way to go.
While the options in SharePoint 2010 for providing organization-wide search were limited to a multi-search center or a published centralized search service, SharePoint 2013 lets you federate across farms. You can now have one farm per region or office location and federate results across farms using result sources. You can do the same between your intranet and extranet farms. While simple on the surface, this functionality fills a serious gap that existed in the overall scalability of SharePoint 2010. In the marketplace, FAST and SharePoint were being criticized for not having a global systems architecture. The approach was to tell users to centrally index all content in a large central farm, if the latency allowed. For global organizations, this was often not feasible.
There are limitations to the remote result source construct. It is limited to SharePoint 2013 and requires that all federated farms be upgraded to SharePoint 2013. Results are not interleaved, which is what users typically expect; rather, they are provided in result blocks. Refiners are also not combined in any way. Overcoming these limitations is an exercise left to partners. But despite these limitations, remote result sources are a major step forward and a great feature to use. Result sources also take over the function of scopes in SharePoint 2010. They are a more powerful tool than both the scopes and the federated locations they replace, and are worth getting to know.
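Because remote result sources return separate result blocks and separate refiners, anyone wanting a unified view has to merge them. The Python sketch below shows one simple approach, assuming each source returns a ranked list plus a refiner-count dictionary: round-robin interleaving of results and summing of refiner counts. The data shapes and function names are hypothetical. Round-robin also sidesteps the harder problem that scores from different engines are not directly comparable.

```python
from collections import Counter
from itertools import zip_longest

def interleave(*ranked_lists):
    """Round-robin merge: rank 1 from each source, then rank 2, and so on."""
    merged = []
    for tier in zip_longest(*ranked_lists):
        merged.extend(item for item in tier if item is not None)
    return merged

def merge_refiners(*refiner_maps):
    """Combine refiner counts such as {'FileType': {'docx': 3}} by summing per value."""
    combined = {}
    for refiners in refiner_maps:
        for refiner, counts in refiners.items():
            combined.setdefault(refiner, Counter()).update(counts)
    return {name: dict(counts) for name, counts in combined.items()}

if __name__ == "__main__":
    farm_a = ["a1", "a2", "a3"]  # hypothetical ranked results from one farm
    farm_b = ["b1"]              # ... and from another
    print(interleave(farm_a, farm_b))
    print(merge_refiners({"FileType": {"docx": 3, "pdf": 1}},
                         {"FileType": {"docx": 2}, "Author": {"Eric": 1}}))
```

A production merge would also need to de-duplicate documents that appear in more than one source, which this sketch does not attempt.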
some content available within the organization network only. There was no single place to search both sets of content from. SharePoint 2013 solves this scenario by enabling Remote SharePoint result sources to also support SharePoint Online, which enables scenarios where SharePoint Online can federate with the on-premises search engine or vice versa. Result sources represent a key piece of technology to help organizations migrate to SharePoint Online.
The figure below shows an overview of the Exchange Search architecture in Exchange 2010. Full-text indexes are not stored in your Exchange databases; the search index data for a particular mailbox database is stored in a directory that resides in the same location as the database files. In Exchange 2013, the ExSearch capability is replaced with a new search engine and index.
Deeper Dives
Microsoft's comparison of indexing vs. federating
TechNet configuring result sources
Federation Use Cases
Federation vs. Indexing
Search in Exchange
Search in Exchange 2013 has been given a facelift. Pull back the curtain, and it is the same new search core used with SharePoint 2013, optimized for large volumes of e-mail. For comparison, Microsoft Exchange Server 2010 Search allows users to perform full-text searches across documents and attachments in messages that are stored in their mailboxes. Exchange Search (also known as full-text indexing) creates the initial index by crawling all messages in mailboxes within an Exchange 2010 database. As new messages arrive, Exchange 2010 Search updates the index based on notifications from the Microsoft Exchange Information Store service. The new core in Exchange 2013 provides a much more powerful and effective search for Exchange users, available through Outlook and Outlook Web Access alike. Another significant outcome of this change is that Exchange 2013 can appear as a result source to SharePoint 2013, as introduced in the chapter on Federation. This opens up a number of scenarios combining e-mail and other documents. In previous versions of SharePoint, you had the ability to connect to and index Exchange public folders, but not personal inboxes. That remains the same with SharePoint 2013 (unless third-party connectors are used), but now there is the ability to federate to Exchange.
The key concept to understand in regard to this functionality is that each system handles the data resident within its silo (e-mail, tasks, and contacts in Exchange 2013; documents and lists in SharePoint 2013). As discussed in the Federation chapter, there is some downside to this approach: federation does not provide the same content processing, relevance, or performance as indexing. But this level of integration between SharePoint and Exchange is a wonderful feature that will help many users. You can get a single view across Exchange and SharePoint, as shown below. One of the new key features in SharePoint 2013 that relies heavily upon this tight integration between SharePoint 2013 and Exchange 2013 is the new Enterprise Content Management (ECM) stack and the associated e-Discovery components. From the e-Discovery perspective, the integration of SharePoint and Exchange allows for in-place preservation of information within SharePoint and Exchange. The e-Discovery console provides a dashboard view of integrated, enterprise-wide case management.
In previous versions of SharePoint, there was support for indexing content from Microsoft Exchange, but only in public folders. With the release of SharePoint 2013, and because Exchange 2013 uses the same search infrastructure, it is now possible to provide federated access to personal inbox results within SharePoint 2013. The primary benefits of this approach are:
1. Exchange 2013 and SharePoint 2013 leverage the same core search subsystem.
2. Federated personal inbox results from Exchange 2013 can be included.
3. There is no need to re-index all inbox data within SharePoint 2013.
Deeper Dives
TechNet What's new in Exchange 2013
Overview of eDiscovery and In-Place Holds (SharePoint 2013)
Search Administration
There tends to be a preconception that search requires no administration. This is due in part to the simplicity of the search interface and the general lack of awareness of how search works. But it is also due to people's experience of web search, where they don't have to do any upkeep. Little do they realize that Google.com has over 4,000 people administering search full time! Administering enterprise search doesn't take that much work, but it does need to be someone's job (even if not a full-time job). There are two main levels of administration: system administration (installation, configuration, topology management) and search administration (rules, best bets, looking for no-results searches).
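One routine search-administration task named above is looking for no-results searches. The Python sketch below only illustrates the idea on a hypothetical query log, where the (query, result_count) tuple format is an assumption. It normalizes queries and ranks the most frequent zero-result searches so an administrator can decide where a best bet, synonym, or query rule would help.

```python
from collections import Counter

def top_no_result_queries(query_log, limit=10):
    """Return the most frequent queries that returned zero results.

    query_log: iterable of (query_text, result_count) pairs (hypothetical format).
    Queries are trimmed, lower-cased, and have whitespace collapsed so that
    "Expense  Policy" and "expense policy" are counted together.
    """
    misses = Counter(
        " ".join(query.lower().split())
        for query, result_count in query_log
        if result_count == 0
    )
    return misses.most_common(limit)

if __name__ == "__main__":
    log = [
        ("Expense  Policy", 0), ("expense policy", 0), ("vpn setup", 12),
        ("travel form", 0), ("holiday calendar", 4), ("EXPENSE POLICY", 0),
    ]
    for query, count in top_no_result_queries(log, limit=5):
        print(f"{count:3d}  {query}")
```

Normalizing before counting matters: without it, the same unmet need would be split across several spellings and would never rise to the top of the report.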
New in SharePoint 2013, you now have site collection level search administration too. It's pretty similar to central administration, naturally with a few limitations. Site collection administrators can set up and manage App catalogs, do term store management and User Profile Management, as shown in the screenshot below. Site collection administrators also have the power to manage some search settings in their site collections: a huge step forward.
It is natural that this level of administration was introduced in SharePoint 2013 because of the emphasis on running multitenant in the cloud. Site collection administrators can start crawls, create result sources, and much more. This includes creating managed properties, which could only be done via central administration in SharePoint 2010, even though site collection or site administrators typically understand their content and crawled properties much better than central IT. Site administrators also have much more power with SharePoint 2013. They cannot create managed properties, but they have significant control over search as it applies to their sites. The table below shows some examples of what Site Collection and Site Administrators can do.
sources in order to give powerful search options to their end users. Query Rules and Result Types can be managed down to the site level. These have a wizard for configuration (for example, the query builder interface) with a built-in preview of what the results look like. Result Sources are easy to manage, as shown below.
There are very significant improvements in analytics, resulting from the new Analytics module. There are also better crawl reports and process reports (see below). Since the Host Controller (described in the One Search Core chapter) monitors all NodeRunner processes, it can give the administrator a lot of insight into system operations.
PowerShell like in SharePoint 2010, but in 2013 site collection administrators now have the ability to call a specific ranking model defined by the SSA admin from within query components at the site level. This means that site collection administrators can do much more with relevance control and ranking, choosing from a library created by the central administrator. PowerShell is available at all levels (central, site collection, and site administration), which gives you much more power. For example, we can create a PowerShell script that configures all our search settings from the very beginning: creating a search service application, modifying its settings, creating the content sources, and so on. PowerShell can also retrieve, create, or modify query results. In addition, PowerShell can get keywords, modify ranking models, and more. If you haven't learned PowerShell already, you will definitely want to learn it now.
improvements in administration from all sides: crawling, content processing, query processing, analytics, and user experience. This is a search that administrators can learn to love.
Deeper Dives
TechNet Index of Windows PowerShell cmdlets for SharePoint 2013
TechNet Search topology in SharePoint Server 2013
SharePoint 2013 Developer Dashboard
TechNet Manage the search schema in SharePoint 2013
TechNet View search diagnostics in SharePoint Server 2013
The search databases have changed significantly with SharePoint 2013. The search administration database supports a database-attach upgrade, but the search index databases do not. As with essentially all search engines, to do an upgrade you will need to recrawl your content. One very nice advantage with SharePoint 2013 is that you can use PowerShell to make this happen with much less effort. The database-attach method does help a lot with search: content sources, server mappings, the schema, federated locations, scopes, best bets, and the like are all preserved and upgraded. As mentioned in the Search Administration chapter, there are tools for configuration import and export, as well as PowerShell commands that can do very interesting things, including automating and tailoring the upgrade process.
users to preview an existing site in 15 mode. Deferred site collection upgrade permits use of SharePoint 2010's UI with fewer operational hassles, while retaining the master page, JScript, SPF, and CSS applications of SharePoint 2010. This is an expensive operation, so you probably don't want to use it everywhere, but it is a great facility to allow for safe, well-managed upgrades from both the software perspective and the user perspective. With search, an upgrade of search centers generates result templates that include the hover panel, and which have previews (when a separate Office Web Apps server or set of servers is available). Scopes are upgraded but can't be changed; they are replaced by the new result sources, but the corresponding result sources aren't generated automatically.
Sequentially, the steps are as follows:
1. Deploy and configure a new SharePoint 2013 services farm, including a search center. Migrate the search settings from the SharePoint 2010 farm. When the search-first migration is complete, this farm provides search functionality to end users who are still working in the SharePoint Server 2010 farm.
2. Crawl all content in the SharePoint Server 2010 farm by using a crawler (or multiple crawlers) in the SharePoint 2013 services farm. Continue to crawl this content regularly.
3. Configure the SharePoint Server 2010 farm to consume search from the SharePoint 2013 services farm, using federated services. Some things will be best consumed by doing redirects (for example, using a new search center with the new functionality can't be done via federated services).
The search-first migration pattern opens the door to a much wider set of possibilities: hybrid solutions.
Hybrid Solutions
When you talk about the upgrade of Search from SharePoint 2010 to SharePoint 2013 there is the potential for some hybrid solutions using different versions of SharePoint or using cloud and on-premise SharePoint instances in the same company. Generally, hybrid means a combination of on-prem and cloud content in a single view. There are several ways to accomplish this, including indexing and federation as mentioned in the Federation chapter. The figure below illustrates the various permutations of hybrid configurations.
Crawling and indexing content from the cloud (such as from O365) is a very solid way to create a unified view, and has the benefits that indexing generally has: unified content processing, solid and consistent relevance and navigation, and consistently fast performance. Although this scenario is not supported by OOB connectors with SharePoint 2013, there are partner-built connectors that accommodate it. With SharePoint 2013, the remote result source construct means that a view can be created using federation, specifically between O365 and on-prem SharePoint. There are limitations to the remote result source construct. It is limited to SharePoint 2013 and requires that all federated farms be upgraded to SharePoint 2013. Results are not interleaved, which is what users typically expect; rather, they are provided in result blocks. And refiners are not combined in any way. Overcoming these limitations is an exercise left to partners. But despite these limitations, remote result sources are a major step forward and a great feature to use.
The same idea applies to more general scenarios. When you have more than one SharePoint farm, you can handle cross-version scenarios. You can have search on SharePoint 2013 while you have content and other applications on SharePoint 2010 or even SharePoint 2007. You can have SharePoint 2013 in the cloud with SharePoint 2010 on-prem. You can include other content in the cloud that should be crawled, such as Microsoft CRM Online or Salesforce.com. With these techniques, it's possible to field a very broad solution with different versions and different options. This helps with many things, including migration. Federation applies well to cross-version scenarios. Although SharePoint 2013 only supports same-version remote result sources, it is feasible for partners to create federation across multiple versions, which appears as a result source. A configuration like the one shown below provides many benefits. With respect to upgrades and migration, it means that legacy search systems can be left in place and federated into a SharePoint 2013 search center. While this is not as good as combining all content into a common index, it is a very useful technique that allows you to upgrade or migrate complex systems a piece at a time.
Cross-version Configurations
How do these scenarios help with upgrade and migration? If you extend them to cross-version configurations, it becomes clear. Search-first migration is an example of crawling on-prem content from on-prem search (the upper left scenario in the figure above), but across versions. By crawling SharePoint 2010 content from SharePoint 2013 search, you can provide an upgrade path that can be done a step at a time, maximizing the benefit to users while minimizing initial effort.
The case of migrating from SharePoint 2010 search to SharePoint 2013 search is the best supported one. There are some gotchas in this migration, as mentioned throughout this e-book, but the process is generally smooth and well covered by Microsoft. Going from SharePoint 2010 search to SharePoint 2013 is a step up in nearly every way, so there aren't many rough spots to consider. If you are migrating from FAST Search for SharePoint, many of the same tools and techniques apply, but there are more corner cases and more feature changes to consider. If you are moving from FAST ESP or FAST Search for Internet Sites, there are significantly more considerations. The migration patterns and techniques still apply, but you are more likely to have a heavily customized search deployment that uses special FAST features which have been supplanted by other mechanisms. There is help available, however. Microsoft has a big ecosystem of partners, and some of them have a specific focus, tools, and techniques for this kind of migration. You may not get direct support from Microsoft, but you can tap into this ecosystem for help.
each site collection to the appropriate owner if you like. Upgrading search is part of upgrading SharePoint, and the standard upgrade process from SharePoint 2010 to SharePoint 2013 covers search well. But search poses special challenges: the more complex and customized your search configuration is, the more challenging the upgrade will be. Search also offers solutions to many upgrade challenges for SharePoint as a whole. Since search bridges information silos, it can bridge across different farms, across different versions, and across on-prem and cloud instances. Techniques such as search-first migration, crawling remote farms from SharePoint 2013, and federation are available, not OOB, but through Microsoft's ecosystem. A unified view across these different dimensions provides users a great experience while allowing you to upgrade or migrate one piece at a time.
Deeper Dives
Services upgrade overview for SharePoint Server 2013
SharePoint Online administration
BA Insight resources for Integrating O365 content
TechNet SharePoint Server 2010 deprecated search features
TechNet FAST Search Server 2010 for SharePoint deprecated features
CHAPTER 5
In SharePoint 2010, CSOM is a Windows Communication Foundation (WCF) service with three different proxies that enable Silverlight, JavaScript, and .NET managed clients to call into SharePoint remotely. With SharePoint 2013, server-side code runs off the SharePoint server farm via declarative hooks such as apps, declarative workflow, and remote events, which then communicate back to SharePoint using CSOM or REST.
There are lots of advantages to this model. Traditional SharePoint development was heavy lifting and had a steep learning curve; the new SharePoint 2013 model is much more manageable, which will open up SharePoint to a much wider audience of developers. Server-side code can impact the performance of SharePoint, can be complex to install and upgrade, and can't be run on public cloud services. The CSOM in SharePoint 2013 is much more powerful: you can do almost everything the server-side APIs did in SharePoint 2010. In addition, it now supports OData, the leading industry protocol for performing CRUD (Create, Read, Update, and Delete) operations against data, as shown below. Depending on your deployment scenario, you can still use sandbox and farm solutions to push server-side code to SharePoint 2013; however, Microsoft recommends that developers follow the new app model as the preferred way of building their custom applications for SharePoint 2013. The message is: don't make any new sandbox solutions, and build new farm solutions only if you absolutely have to.
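To make the CRUD-over-OData point concrete, the Python sketch below builds (but does not send) the HTTP requests you would typically issue against the SharePoint 2013 REST endpoint for list items. The site URL, list title, entity type name (`SP.Data.TasksListItem`), and field values are placeholders. The conventions shown (form digest in `X-RequestDigest`, `IF-MATCH`, and `X-HTTP-Method` for MERGE and DELETE tunnelled over POST) follow the commonly documented SharePoint 2013 REST usage. Authentication is omitted, so check the details against Microsoft's REST documentation for your environment.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, Optional
from urllib.parse import quote

VERBOSE_JSON = "application/json;odata=verbose"

@dataclass
class RestRequest:
    """A request description only; sending and authenticating are out of scope."""
    method: str
    url: str
    headers: Dict[str, str] = field(default_factory=dict)
    body: Optional[str] = None

def items_url(site_url: str, list_title: str, item_id: Optional[int] = None) -> str:
    # OData string literals use single quotes; an embedded quote is doubled.
    title = quote(list_title.replace("'", "''"), safe="'")
    url = f"{site_url.rstrip('/')}/_api/web/lists/getbytitle('{title}')/items"
    return url if item_id is None else f"{url}({item_id})"

def _write_headers(digest: str, extra: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    # Write operations need a form digest (obtained by POSTing to /_api/contextinfo).
    headers = {"Accept": VERBOSE_JSON, "Content-Type": VERBOSE_JSON, "X-RequestDigest": digest}
    headers.update(extra or {})
    return headers

def read_items(site_url, list_title):
    """Read -> GET."""
    return RestRequest("GET", items_url(site_url, list_title), {"Accept": VERBOSE_JSON})

def create_item(site_url, list_title, entity_type, fields, digest):
    """Create -> POST with the item's entity type in __metadata."""
    body = json.dumps({"__metadata": {"type": entity_type}, **fields})
    return RestRequest("POST", items_url(site_url, list_title), _write_headers(digest), body)

def update_item(site_url, list_title, item_id, entity_type, fields, digest, etag="*"):
    """Update -> POST tunnelling MERGE; IF-MATCH '*' skips the concurrency check."""
    body = json.dumps({"__metadata": {"type": entity_type}, **fields})
    headers = _write_headers(digest, {"X-HTTP-Method": "MERGE", "IF-MATCH": etag})
    return RestRequest("POST", items_url(site_url, list_title, item_id), headers, body)

def delete_item(site_url, list_title, item_id, digest, etag="*"):
    """Delete -> POST tunnelling DELETE."""
    headers = _write_headers(digest, {"X-HTTP-Method": "DELETE", "IF-MATCH": etag})
    return RestRequest("POST", items_url(site_url, list_title, item_id), headers)

if __name__ == "__main__":
    site = "https://intranet.example.com/sites/team"  # placeholder site
    req = update_item(site, "Tasks", 7, "SP.Data.TasksListItem", {"Title": "Done"}, "<digest>")
    print(req.method, req.url)
    print(req.headers["X-HTTP-Method"], req.body)
```

The same four operations are available through CSOM. REST is shown here because it needs nothing beyond an HTTP client, which is what makes it attractive for apps that run off the SharePoint farm.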
3. Azure Auto-Hosted App (which runs in an Azure instance which is invisibly provisioned by Office 365)
Apps are simple and powerful, but they have a number of limitations, and there are still many cases where SharePoint solutions are called for instead. Anything that uses server-side code, does farm-level work, has a high level of complexity, or has installation coupling or dependencies calls for a SharePoint solution rather than a SharePoint App.
just another SharePoint solution. This is revolutionary enough: it means you can use search via a REST interface, include it in an Office App, and use it easily in combination with other parts of SharePoint. But customizing search also means creating or customizing connectors using BCS or a protocol handler (see the content capture chapter), customizing linguistics using the Content Enrichment Web Service (CEWS) (see the content processing, linguistics, and CEWS chapters), working with other service applications, and more. There are numerous search-specific web parts, including the new Content by Search web part (shown below), which is a powerful Swiss Army knife tool.
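Since search is exposed through REST, a query is just an HTTP GET against the `/_api/search/query` endpoint. The sketch below builds such a URL in Python. `querytext`, `sourceid` (selects a result source), `rankingmodelid`, `rowlimit`, `selectproperties`, and `refiners` are parameters of the SharePoint 2013 Search REST API. The host, GUIDs, and managed property names are placeholders, and the request is not sent.

```python
from urllib.parse import quote, urlencode

def odata_string(value):
    """Wrap a value as an OData string literal, doubling embedded single quotes."""
    return "'" + str(value).replace("'", "''") + "'"

def build_search_url(site_url, query_text, source_id=None, ranking_model_id=None,
                     row_limit=10, select_properties=(), refiners=()):
    """Build a GET URL for the SharePoint 2013 search REST endpoint (not sent here)."""
    params = {"querytext": odata_string(query_text), "rowlimit": row_limit}
    if source_id:          # GUID of a result source
        params["sourceid"] = odata_string(source_id)
    if ranking_model_id:   # GUID of a ranking model defined by the SSA admin
        params["rankingmodelid"] = odata_string(ranking_model_id)
    if select_properties:  # managed properties to return with each result
        params["selectproperties"] = odata_string(",".join(select_properties))
    if refiners:           # managed properties to compute refiners for
        params["refiners"] = odata_string(",".join(refiners))
    # Leave quotes and commas readable; percent-encode everything else (spaces -> %20).
    query_string = urlencode(params, quote_via=quote, safe="',")
    return f"{site_url.rstrip('/')}/_api/search/query?{query_string}"

if __name__ == "__main__":
    print(build_search_url(
        "https://intranet.example.com/sites/search",
        "quarterly report",
        source_id="00000000-0000-0000-0000-000000000000",  # placeholder GUID
        select_properties=("Title", "Path", "Author"),
        refiners=("FileType",),
    ))
```

When the URL is actually sent, an `Accept: application/json;odata=verbose` header asks for JSON rather than the default Atom XML. Passing `rankingmodelid` is one way a client can pick one of the ranking models mentioned in the Search Administration chapter.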
Search combines well with other parts of SharePoint: with content management, with workflows, with BI, and with sites. It can also be used with several of the new service applications in SharePoint 2013. These include the Machine Translation Service, which supports automated language translation of files (think multilingual search), and the Work Management Service, which provides task aggregation functionality.
If you are doing query-side-only work, you might be able to use an app model. But for the most part, developing sophisticated search-based applications will remain the domain of SharePoint solutions with SharePoint 2013. Several things (connectors and pipeline extensibility) are still per SSA and not per tenant.
are limited to SharePoint content only. There's no mechanism for getting external data indexed into O365. Developers who want to approximate these from the outside have to live with limited performance and build very complex structures, or use third-party frameworks. Many of the mechanisms inside search are sealed and can't be extended; update groups, query flows, analytics processing, and web crawling are examples. It's completely understandable that these be kept safe from meddling developers, and some of them can be influenced safely using partner products. But it's frustrating to see these mechanisms and not be able to touch them. The SharePoint App model and SharePoint Marketplace are aimed at lightweight, simple apps, not at something you would use for a full business application today.
Developing with search is still hard. Intrinsically, areas like content processing and relevance are imperfect, since we're dealing with human language and subjective opinions of the right answer. There are no joins or aggregation internal to search, so there are limits to combining structured and unstructured content. But SharePoint 2013 is far ahead of other search platforms in terms of available capabilities, performance, ease, and safety of development. And there is a strong ecosystem with available building blocks and complementary capabilities to use in creating great applications with search.
Deeper Dives
Book chapter on developing with search from Wrox Professional SharePoint 2010 Development
MSDN overview on developing with SharePoint 2013
MSDN section on building search queries with SharePoint 2013
SharePoint 2013 Developer Dashboard
The web service must send a response to the web service client within a given timeout. No specific authentication or encryption mechanisms are supported as part of the contract. You can, however, apply your own security on the transport mechanism. A trigger condition is registered in the ContentEnrichmentConfiguration object, which allows control of when the content flow calls out to an external web service. A set of PowerShell cmdlets is provided to control the configuration, and there are robust error-handling mechanisms built in.
The advantage of this is that you can provide custom linguistics even at a fairly low level and influence other aspects of the pipeline. The control this affords is wonderful and will be exciting to those wanting to address specific linguistic processing at a low level. The disadvantage is that you can't leverage the work done in the pipeline when you are doing external processing, as shown below. This not only means extra work as a developer, but also introduces the potential for linguistic processing to get out of sync.
The extensibility callouts are invoked synchronously, in line with the processing flow, so long-running enrichment tasks or batch-oriented processing tasks will require enrichment data flow management independent of and outside SharePoint 2013. Not all managed properties (nor any crawled properties) are visible to the CEWS, and less state (potentially useful for supplemental linguistic processing) is exposed than in FS4SP. Finally, the CEWS is visible as a single logical endpoint to the potentially many content processing flow instances in SharePoint 2013. There is only one ContentEnrichmentConfiguration object active, only one trigger, and so on. This means that throughput management and support for multiple enrichment stages (more than one instance of taxonomy classifiers or custom entity extractors) need to be managed externally, which will pose some interoperability challenges if you are interested in doing multiple types of content enrichment.
* Note: The CEWS callout is not part of O365 and is only available for the Enterprise Edition of SharePoint 2013.
CEWS is a new mechanism in SharePoint 2013. It has many nice aspects: it is a more standard, higher-performance mechanism than those available in the past. It also provides the ability to modify some managed properties, making it possible to address use cases that were nearly impossible with FAST Search for SharePoint. CEWS also has limitations, and using it will require special attention from developers. But all mechanisms have limitations. Overall, Microsoft has provided a strong and essential extensibility mechanism that lets you do magic things with content processing and linguistics.
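Because only one ContentEnrichmentConfiguration, one endpoint, and one trigger can be active, multiple enrichment stages have to be chained behind that single endpoint. The Python sketch below shows that dispatch logic in a language-neutral way. It is not the CEWS contract itself, which is a WCF/SOAP service typically written in .NET. The property names, the trigger predicate, the two stage functions, and the time budget are all hypothetical.

```python
import re
import time
from typing import Callable, Dict, List

Properties = Dict[str, object]
Stage = Callable[[Properties], Properties]

def extract_product_codes(props: Properties) -> Properties:
    """Hypothetical entity extractor: finds codes like 'PRD-1234' in the body text."""
    codes = sorted(set(re.findall(r"\bPRD-\d{4}\b", str(props.get("body", "")))))
    return {"ProductCodes": codes} if codes else {}

def classify_department(props: Properties) -> Properties:
    """Hypothetical taxonomy classifier keyed on words in the title."""
    title = str(props.get("Title", "")).lower()
    if "invoice" in title or "budget" in title:
        return {"Department": "Finance"}
    return {}

def should_enrich(props: Properties) -> bool:
    """Stand-in for the single trigger: only enrich Office documents in this sketch."""
    return props.get("FileExtension") in {"docx", "xlsx", "pptx"}

def enrich(props: Properties, stages: List[Stage], budget_s: float = 5.0) -> Properties:
    """Run the stages in order behind one endpoint and return only the output properties.

    The callout is synchronous, so remaining stages are skipped once the time
    budget is used up, keeping the response inside the configured timeout.
    """
    if not should_enrich(props):
        return {}
    deadline = time.monotonic() + budget_s
    output: Properties = {}
    for stage in stages:
        if time.monotonic() > deadline:
            break
        # Later stages see the original properties plus earlier stages' output.
        output.update(stage({**props, **output}))
    return output

if __name__ == "__main__":
    item = {"Title": "Q3 Budget review", "FileExtension": "docx",
            "body": "Covers PRD-1001 and PRD-2002; PRD-1001 again."}
    print(enrich(item, [extract_product_codes, classify_department]))
```

Anything slower than the time budget allows, such as batch classification, would have to move to an external, asynchronous flow, as the text above notes.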
Deeper Dives
MSDN Section on Content Enrichment Web Service (CEWS)
Out-of-the-box Applications
There are three general-purpose search applications included out-of-the-box with SharePoint 2013. Intranet search, typically used by all employees to find content throughout the enterprise, benefits from personalized search results based on search history and rich contextual previews. People search (which includes the advances from SharePoint 2010, such as phonetic name search) is integrated with Lync, which provides presence information and makes it easy to connect with people directly from search results. Site search (aimed at making public websites easily navigable) is a big step up with this release as well. There are also search facilities built into each site: for example, every document library now has a search box at the top that enables users to search across metadata and the full text of its documents, and the result list is presented as a standard SharePoint view rather than as a results page. A video search SBA is provided out-of-the-box, including a pre-built presentation format that makes it easy to recognize the video content you're looking for. There are significant enhancements in video support for SharePoint 2013 generally, including a built-in HTML5 video player. The use of video, including enterprise podcasts, will be on the rise, so video search is now an important facility.
Metadata Navigation
As described in the chapter on refinement and faceted navigation, facets are available for users to drill into content. In addition to refiners (which are driven from the values in the content), metadata navigation defined from values in the term store is available.
Page hierarchies, URLs, and Topic Pages
Pages and page hierarchies are easily defined from the term store. You can also generate topic pages, which makes SEO straightforward. The figure below illustrates how this works; SharePoint now generates friendly URLs, which makes this process work like any normal site.
There's an HTML-based presentation template model that makes it easy to fine-tune the look and feel, and built-in web part editors to set up the query driving the content presentation, as shown below. This doesn't require writing any code and is well within the reach of a business analyst. You see immediate previews of what the results will look like.
Recommendations
A new recommendation facility is included which can surface suggestions based either on popularity or on correlations between items (see the chapter on Analytics).
There are other exciting things about WCM with SharePoint 2013. Standard web design tools and workflows are supported; there are great facilities for content variations, including a built-in language translation service; you can publish easily across sites; and video and images are easily embedded and beautifully rendered across multiple devices and resolutions. The URLs generated are clean, and search-engine optimization is directly supported. You can use catalog-enabled sites for scenarios such as a content repository, knowledge base, or product catalog. But the heart of WCM in this release is search, which makes dynamic page generation and remarkable site experiences possible.
application. There is now unified discovery across Exchange, SharePoint, and Lync, as shown below. Exchange now has the same search infrastructure as SharePoint, which makes unifying the search much easier (Lync archives via Exchange). The Discovery Center in SharePoint uses this to provide a unified console, with in-place holds that don't impact the end users' ongoing work. There's more to e-Discovery than search, of course: preservation, holds, policy management, and export. But search is the cornerstone and is what makes it possible to recall all the information needed to react to legal actions, without getting irrelevant information that you have to sift through. The e-Discovery functionality in SharePoint Server 2013 is a big step up from SharePoint 2010, and is probably the first time you could consider this to be a full application. There are several parts to e-Discovery:
- A site collection where you perform e-Discovery queries across multiple SharePoint farms and Exchange servers and preserve the items that are discovered.
- In-place preservation of Exchange mailboxes and SharePoint sites, including SharePoint list items and SharePoint pages, while still allowing users to work with site content.
- Support for searching and exporting content from file shares.
- The ability to export discovered content from Exchange Server 2013 and SharePoint Server 2013.
The eDiscovery Center site template creates a portal for discovery cases and lets you conduct searches, place content on hold, and export content. For each case, you create a new collaboration site that uses the eDiscovery Case site template. You can export the results of an eDiscovery search for later import into a review tool. SharePoint 2013 provides in-place holds: content that is put on hold is preserved, but users can still change it. The state of the content at the time of preservation is recorded. If a user changes the content or even deletes it, the original, preserved version is still available. To implement eDiscovery across an enterprise, you configure SharePoint 2013 Search to crawl all file shares and websites that contain discoverable content, and configure the central Search service application to include results from Exchange Server 2013. Any content from SharePoint 2013, Exchange 2013, or a file share or website that is indexed by Search or by Exchange Server 2013 can be discovered from the eDiscovery Center.
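The in-place hold behaviour described above, where users keep editing but the state at the time of preservation stays recoverable, is essentially copy-on-write. The Python sketch below models it with an in-memory store. The class and method names are hypothetical, and SharePoint's actual preservation mechanism differs in detail.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class HoldableStore:
    """Toy document store that preserves originals of items under an in-place hold."""
    current: Dict[str, str] = field(default_factory=dict)
    on_hold: set = field(default_factory=set)
    preserved: Dict[str, str] = field(default_factory=dict)

    def place_hold(self, doc_id: str) -> None:
        self.on_hold.add(doc_id)

    def _preserve_if_needed(self, doc_id: str) -> None:
        # Copy-on-write: the first change after the hold captures the held state.
        if doc_id in self.on_hold and doc_id not in self.preserved and doc_id in self.current:
            self.preserved[doc_id] = self.current[doc_id]

    def save(self, doc_id: str, content: str) -> None:
        """Users keep working: saving always succeeds, even for held items."""
        self._preserve_if_needed(doc_id)
        self.current[doc_id] = content

    def delete(self, doc_id: str) -> None:
        self._preserve_if_needed(doc_id)
        self.current.pop(doc_id, None)

    def discover(self, doc_id: str) -> Optional[str]:
        """What e-Discovery sees: the preserved original if any, else the live copy."""
        return self.preserved.get(doc_id, self.current.get(doc_id))

if __name__ == "__main__":
    store = HoldableStore()
    store.save("contract.docx", "v1")
    store.place_hold("contract.docx")
    store.save("contract.docx", "v2")  # the user edits the held document
    store.delete("contract.docx")      # ... and then deletes it
    print(store.discover("contract.docx"))
```

Copying only on the first change keeps the cost of a hold close to zero for content that is never touched, which is why users can go on working normally.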
for customizing search experiences without any code at all. Many areas can be extended (connectors, content processing, relevance, query processing, and UI) with moderate effort and standard tools. Fully custom code is supported as well. We find that the use of modular building blocks speeds the construction of search-based apps dramatically. Since these applications follow common patterns, a relatively small number of sophisticated modules can cover a large number of applications. If you undertake a sophisticated search-based application, consider what's available on the market as well as what you might build yourself, since pre-built building blocks can save substantial time and reduce risk.
Deeper Dives
Book chapter on developing with search from Wrox Professional SharePoint 2010 Development
Overview of eDiscovery and In-Place Holds (SharePoint 2013)
Blog on using the Content by Search Web Part
BA Insight Search as a Development Platform
TotalView Search-Based Applications
Conclusion
This e-book has covered a lot of ground, since SharePoint 2013 has so many underlying changes, new capabilities, and new features. We've tried to cover everything in concise, readable chapters, across five major sections: User Experience; Working with Queries; Working with Content; Architecture, Deployment, and Operations; and Development and Applications.
Clean, fast, and easy to use
Straightforward to install, administer, and scale
Provides very powerful high-end search features
Microsoft has done a remarkable job making this high-end technology accessible and easy for the mainstream. However, it is not a perfect platform, and there are still challenges with search. Search is, after all, a journey. BA Insight is entirely focused on the road that lies ahead for search and SharePoint 2013, and we stand ready to help you on your journey. As you learn more about SharePoint 2013 and search, here are some things to consider and some steps we'd suggest:
Things to Consider
SharePoint 2013 includes a very powerful new search engine.
There are new mechanisms in SharePoint 2013 (result sources, query rules, and result types) that replace familiar ones and take some getting used to. These are now in the hands of site collection administrators and site administrators, so there is much more control at that level.
Crawling and BCS have evolved further in SharePoint 2013, including a new continuous crawl feature; however, connectors are still largely left to partners.
The new search core in SharePoint 2013 is different from both FAST and SharePoint 2010, and you will notice improvements in relevance, performance, and robustness.
Hybrid configurations across on-premises SharePoint 2013 and O365 are supported out of the box (OOB) using result sources. Cross-version configurations are not supported OOB, but there are techniques and partner products for these cases.
Though SharePoint 2013 Search is great, there are still limitations and cases where the built-in mechanisms don't cover what you wish to accomplish.
The term store is now an administrative center for entity extraction, query suggestions, faceted refinement, WCM page hierarchies, and more.
If you are coming from FAST, you will recognize a lot of concepts and powerful features, but you will also notice a number of things missing.
SharePoint 2013 has a new development model that is lightweight and available to a much wider range of developers.
Search in SharePoint 2013 is a powerful platform designed to support search-based applications.
Get to know the new release ASAP: download the bits, read about it, and confer with folks who know it.
Try to develop a champion among your site administrators who learns the new tools.
Set up a playpen system where people can get used to the new mechanisms.
Take stock of your current and future content sources, and think about extending search to more content.
Look into learning how to make simple connectors yourself, and to Microsoft connector partners for more complex systems.
Consider how quickly you can migrate to the new platform. Factor in techniques that allow you to upgrade a step at a time, such as search-first migration and cross-version federation.
Consider adopting O365 quickly, in ways that don't require you to do it all at once.
Talk to Microsoft Partners about federation, cross-version configurations, and migration.
Look to the Microsoft partner ecosystem for training, components, and innovative solutions.
Get familiar with the term store. Find out where there are key lists in your organization (product names, project names, industries, etc.); you will be able to import these into the term store and use them for entity extraction.
Focus on the problem, not the specific mechanism; there's a way to get it solved with this platform. Turn to Microsoft Partners for products that round out all the possibilities.
Consider putting JavaScript developers to work building SharePoint Apps.
Look around your organization for opportunities to apply search-based applications.
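The suggestion to load key lists into the term store for entity extraction rests on a simple idea: recognize known names when they appear in text. The Python sketch below illustrates dictionary-based extraction under stated assumptions: the product names are a hypothetical key list, and `build_extractor` and `extract_entities` are illustrative helpers, not a SharePoint API or a description of how SharePoint implements extraction internally.

```python
from __future__ import annotations

import re


def build_extractor(terms: list[str]) -> re.Pattern:
    """Compile a key list into one case-insensitive, whole-word pattern.

    Longer terms are tried first, so "Contoso Cloud" wins over its prefix
    "Contoso". Assumes each term starts and ends with a letter or digit,
    because word-boundary anchors do not work around symbols such as "C++".
    """
    ordered = sorted(set(terms), key=len, reverse=True)
    alternation = "|".join(re.escape(t) for t in ordered)
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)


def extract_entities(text: str, pattern: re.Pattern, terms: list[str]) -> list[str]:
    # Map each match back to the canonical spelling from the key list,
    # keeping first-seen order and dropping duplicates.
    canonical = {t.lower(): t for t in terms}
    found: list[str] = []
    for match in pattern.finditer(text):
        name = canonical.get(match.group(0).lower(), match.group(0))
        if name not in found:
            found.append(name)
    return found


products = ["Contoso", "Contoso Cloud", "Fabrikam"]  # hypothetical key list
pattern = build_extractor(products)
print(extract_entities("The contoso cloud pilot beat Fabrikam and Contoso.", pattern, products))
# ['Contoso Cloud', 'Fabrikam', 'Contoso']
```

Normalizing every match to the key list's canonical spelling is what makes the extracted values usable as refiners: "contoso cloud" and "Contoso Cloud" collapse into one facet value.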
BA Insight is Social!
Read our blog: www.DoMoreWithSearch.com
Follow us on Twitter: @bainsight
LinkedIn Group: Microsoft Enterprise Search
Or find us on Facebook
BA Insight is a leader in agile information integration, enabling businesses to drive innovation by leveraging all knowledge and data across the enterprise. Offering a new-generation, cost-effective alternative to expensive systems integration, the company's award-winning technology provides a scalable foundation for liberating enterprise data, both structured and unstructured. Microsoft's go-to partner for advanced search technologies, BA Insight enables customers to leverage their investments in SharePoint, FAST, and other enterprise systems, and to extend them with an overlay of easy-to-assemble, highly targeted business applications. Since 2004, more than three million users around the world have relied on BA Insight for low-cost, on-demand access to the information they need. To learn more about BA Insight, visit www.BAinsight.com.