SharePoint 2013 feels like a breath of fresh air, offering many new features and changes over older versions. In addition to a whole new social experience, a new development model called Apps, and native HTML5 support, SharePoint 2013 introduces a new and improved search engine. As the title of the book implies, this book is all about the new search engine. In this introductory chapter, we will get a taste of the new features SharePoint 2013 Search brings to the table and then deep-dive into the architecture that holds this system together.
In this chapter, we will cover the following topics:
New features of SharePoint 2013 Search
The new search architecture
The SharePoint 2013 Search engine is the most powerful enterprise search engine Microsoft has created to date. With this new release, Microsoft combined the best features of the legacy SharePoint Enterprise search engine with the best features of the FAST search engine, which Microsoft acquired back in 2008. The new features fall into the following four categories:
Search administration changes
UI changes and customization
Relevance and ranking features
New development methods
In previous versions, configuring the search experience was reserved for farm administrators. SharePoint 2013 changed that by bringing most of the search settings down from the farm level to site collections and sites (SPWebs). Because SharePoint 2013 is also offered as a cloud service (through Office 365), and cloud users have no access to settings at the farm-administration level, this was a welcome change that both cloud and on-premises site administrators can take advantage of.
Let's have a look at the settings that are available for us to administer; these are shown in the following screenshot:
We will discuss these settings in detail in Chapter 2, Using the Out of the Box Search Components, but for now just keep in mind that a site administrator can configure the search experience on their site in ways that were reserved exclusively for farm administrators in previous versions.
In addition, Microsoft introduces a new crawling mode, continuous crawl. Continuous crawl helps to keep the search index as fresh as possible by crawling SharePoint sites (and only SharePoint sites) every 15 minutes, by default; we can change this value using PowerShell, as shown in the following snippet:
$ssa = Get-SPEnterpriseSearchServiceApplication
$ssa.SetProperty("ContinuousCrawlInterval", <minutes>)
The value we use for <minutes> is the number of minutes between crawls.
When running, the crawler gets changes from the crawled SharePoint sites and pushes them to the content processing component, which will process the new content on the fly.
If there is one change in SharePoint 2013 Search that immediately catches the eye, it is the new and fresh user interface (UI). If you worked with SharePoint 2010 search, you'll remember the following screenshot, showing a search-results page:
By looking at the preceding screenshot, we can see that it sports a pretty simple UI. We have textual refinements on the left side, predefined search scopes for websites and people (All Sites and People) on top, and a main, simply styled results area without grouping or categorization of results.
To customize the way the results are shown, we had to use XSL/XSLT, which is quite a messy and unattractive way to design.
Fast forward to the present day. The following screenshot displays what the results page looks like in SharePoint 2013:
Take a look at the refinement panel on the left. While we still have textual refiners, we also have graphical ones, such as a scroller for dates.
We will discuss all of these new and exciting customization features in detail in Chapter 4, Customizing the Look.
As mentioned earlier, SharePoint 2013 Search took the best features of SharePoint Search and FAST and improved them. As such, SharePoint 2013 uses new and improved ranking models to determine which items are displayed and how they are ranked (the order in which they are displayed).
The key to successfully determining the relevancy of search results is to satisfy the intent of the person who issues the query. Let's explain this statement with an example; say I'm performing a search for Apple. Now, did I search for apple the fruit or Apple the technology company?
SharePoint 2013 Search continuously tracks and analyzes search usage to determine how content is connected, how often an item appears in search results, and which search results people click in order to continuously improve the relevance of items to the search query. So, if I clicked on a lot of fruit-related results, the search engine will assume I was looking for apple the fruit, and not the technology company.
With this new release of SharePoint, Microsoft made changes to the search-development model. The old SOAP web service (ASMX) has been deprecated alongside the SQL query syntax that we could use to query against SharePoint data.
But, just as the old saying goes, "out with the old and in with the new", we get some new features to play with to replace the ones that are gone.
A dedicated Representational State Transfer (REST) service that enables us to execute queries against the search service from client applications using libraries such as jQuery or RestSharp. The REST service supports all of the properties available in the CSOM object, but instead of working against objects, we use the URL's query string to send parameters to it.
An enhanced keyword query language with new and improved operators such as ONEAR and XRANK.
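As a rough illustration of the two items above, the following sketch builds a query URL for the search REST service and passes a keyword query that uses the ONEAR operator in the query string. The /_api/search/query endpoint and the querytext, rowlimit, and selectproperties parameters belong to the SharePoint 2013 search REST service; the site URL, the helper name build_search_url, and the sample query terms are assumptions made for this example. Python is used here only to keep the sketch short; the same URL could be requested from jQuery or RestSharp.

```python
from urllib.parse import quote


def build_search_url(site_url, kql, row_limit=10, select_properties=None):
    """Build a GET URL for the SharePoint 2013 search REST endpoint.

    String parameters are wrapped in single quotes and then
    percent-encoded so the URL is safe to send.
    (Hypothetical helper; it does not escape quotes inside the query.)
    """
    params = [
        ("querytext", "'" + kql + "'"),
        ("rowlimit", str(row_limit)),
    ]
    if select_properties:
        params.append(
            ("selectproperties", "'" + ",".join(select_properties) + "'")
        )
    # Keep the wrapping single quotes readable; encode everything else.
    query_string = "&".join(
        name + "=" + quote(value, safe="'") for name, value in params
    )
    return site_url.rstrip("/") + "/_api/search/query?" + query_string


# Hypothetical site URL and query: "sharepoint" within five terms before "search".
url = build_search_url(
    "http://server/sites/search",
    "sharepoint ONEAR(n=5) search",
    row_limit=5,
    select_properties=["Title", "Path"],
)
print(url)
```

Because every parameter travels in the query string, no client library beyond a plain HTTP client is needed to issue the request.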
Now that we have a general idea about what's new in SharePoint 2013 Search, let's go ahead and discuss the architecture that makes all of this happen.
SharePoint 2013 Search introduces a new search architecture that includes significant changes and new additions compared to previous versions. Since Microsoft consolidated FAST and SharePoint Search, the new search architecture has inherited components from both products while maintaining high scalability and performance.
The crawl component is responsible for crawling content sources. It is the first stop for data that is about to be indexed by the search engine. The crawl component invokes connectors (both out-of-the-box and custom ones) that interact with the content source in order to crawl it.
While crawling, the crawl component uses one or more crawl databases to temporarily store detailed tracking and historical information about the crawled items, such as the last time an item was crawled and the type of update during the last crawl.
Once an item is crawled, meaning both its data and its associated metadata are crawled, the crawl component delivers it to the content-processing component.
The rectangular blocks in the diagram represent stages that we cannot interact with. We won't be discussing them as they are quite self-explanatory. The curved rectangular blocks, however, represent stages that we can interact with during the processing flow.
The Web service callout stage is similar to the pipeline extensibility stage of FAST for SharePoint 2010, and allows you to add a callout from the content-processing component to a web service of your own so you can manipulate the crawled content before it gets indexed by the index component.
Unlike FAST's pipeline-extensibility stage, where code had to be executed in a sandbox, the web service callout accepts a web service endpoint, which is much easier and reduces the overhead involved in writing a console application to accompany the content-flow process.
Calling a web service during the processing stage can be useful in the following two scenarios:
Creating new refiners by extracting data from unstructured text using our own logic
Calculating new refiners based on the data of managed properties
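To make the first scenario more concrete, here is a minimal sketch of the kind of logic a callout service might run: it pulls a value out of unstructured text so that it can be returned as a new property and later used as a refiner. This is not the SharePoint callout contract, only the extraction step in isolation. The property names body and ProjectCode, the PRJ-1234 code format, and the enrich function are all hypothetical.

```python
import re

# Hypothetical convention: project codes look like "PRJ-" plus four digits.
PROJECT_CODE = re.compile(r"\bPRJ-\d{4}\b")


def enrich(item):
    """Return new property values derived from a crawled item's text.

    `item` is a plain dict standing in for the properties that the
    content-processing component would send to the callout.
    """
    # Deduplicate and sort so the output is stable across runs.
    codes = sorted(set(PROJECT_CODE.findall(item.get("body", ""))))
    # Only return a property when a value was actually found.
    return {"ProjectCode": codes} if codes else {}


sample = {"body": "Status for PRJ-0042 and PRJ-0007; see PRJ-0042 notes."}
print(enrich(sample))
```

In a real deployment, the returned values would be mapped to a managed property that is configured as refinable.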
You can find a great example of using the web service callout in Kathrine Hammervold's post, Customize the SharePoint 2013 search experience with a Content Enrichment web service, located at http://blogs.msdn.com/b/sharepointdev/archive/2012/11/13/customize-the-sharepoint-2013-search-experience-with-a-content-enrichment-web-service.aspx.
The next point of interaction is the word-breaking stage, which allows you to write your own custom word-breaking logic for the content processor. Please refer to the MSDN documentation on custom word breakers, located at http://msdn.microsoft.com/en-us/library/jj163981.aspx.
The web frontend is where the search process actually begins. A user can interact with the search service by either writing a search query in the search center (or a search box) or developing against the new public APIs: REST/OData services and the CSOM. Both the search center and public APIs are hosted on the frontend.
Once the user creates a query, the query is sent to the query-processing component for analysis. The query-processing component analyzes the query and forwards it to the index component. The index component returns the matching results to the query-processing component for another analysis and from there the results are forwarded to the web frontend to be displayed.
When the query-processing component receives a search query from the frontend, it analyzes it in an attempt to optimize its precision and relevance. A site administrator can interact with a query using different techniques such as query rules or result sources. We will discuss these techniques in detail in the next chapter, but for now it is important to understand that these manipulations are handled within the query-processing component. As part of its query handling, the query-processing component performs linguistic processes on the query, such as word-breaking and stemming.
Once the query is optimized, it is sent to the index component, which will process the optimized query and return a result set back to the query-processing component and from there to the search frontend.
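The flow described above can be imitated with a toy model: a query is word-broken and stemmed, then handed to an index that returns the matching items. The word breaker, the suffix-stripping stemmer, and the in-memory index below are deliberately naive stand-ins written for this illustration; they are assumptions and do not reflect how SharePoint implements its linguistics or its index.

```python
import re


def break_words(text):
    """Naive word breaker: lowercase and split on non-alphanumerics."""
    return [w for w in re.split(r"[^a-z0-9]+", text.lower()) if w]


def stem(word):
    """Toy stemmer: strip a couple of common English suffixes."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def process(text):
    """Linguistic processing: word-breaking followed by stemming."""
    return [stem(w) for w in break_words(text)]


def build_index(documents):
    """Stand-in for the index component: map each term to document ids."""
    index = {}
    for doc_id, text in documents.items():
        for term in process(text):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index, query):
    """Stand-in for query processing: normalize the query, then ask the
    index for the documents that contain every term."""
    terms = process(query)
    if not terms:
        return []
    matches = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(matches)


docs = {
    1: "Crawling SharePoint sites",
    2: "Site search settings",
    3: "Ranking models",
}
index = build_index(docs)
print(search(index, "crawl sites"))
```

Because the same processing is applied to the content and to the query, a query for "sites" also matches a document that only contains "Site".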
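The index component has the following two roles:
Receiving processed items from the content-processing component and writing them to the index
Receiving queries from the query-processing component and returning result sets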
How the index component saves and manages this index file is out of the scope of this book, but you can read more about this in the TechNet article Manage the index component in SharePoint Server 2013, located at http://technet.microsoft.com/en-us/library/jj862355.aspx.
The analytics architecture consists of three main parts, as follows:
The analytics-reporting database, which stores statistical information such as usage data.
The link database, which stores information about searches and crawled documents. In addition, the link database is shared with the content-processing component, which in turn stores links and anchors in it. The information that the content-processing component stores is later used by the analytics-processing component.
The analytics-processing component runs two types of analytics: search analytics and usage analytics. The search analytics analyzes content from the content-processing component for information such as links, information related to people, and recommendations. The usage analytics analyzes user actions on an item, such as the number of views it had or how many users clicked on it.
An important output of usage analytics is the recommendations. The recommendations analysis creates recommendations for an item based on how users have interacted with that item in the past. The analysis calculates an item-to-item relationship graph and updates it continuously based on search usage.
Keep in mind that the analytics-processing component is a "learning" component, which means it learns by usage. The more the search system is used, the better the analytics it provides.
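One simple way to picture an item-to-item relationship graph is to count how often two items were used by the same person. The sketch below does exactly that with a plain co-occurrence count. It is an assumption chosen for illustration, not the algorithm the analytics-processing component uses, and the user names, file names, and function names are made up.

```python
from collections import defaultdict
from itertools import combinations


def build_item_graph(sessions):
    """Count how often two items were used by the same user.

    `sessions` maps a user to the items they interacted with. The result
    maps each item to {related_item: co_occurrence_count}.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for items in sessions.values():
        # Each unordered pair of distinct items adds one to both edges.
        for a, b in combinations(sorted(set(items)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def recommend(graph, item, top=3):
    """Return the items most often seen together with `item`."""
    related = graph.get(item, {})
    # Sort by descending count, then by name for a stable order.
    ranked = sorted(related.items(), key=lambda pair: (-pair[1], pair[0]))
    return [name for name, _ in ranked[:top]]


usage = {
    "ann": ["a.docx", "b.docx", "c.docx"],
    "bob": ["a.docx", "b.docx"],
    "eve": ["b.docx", "c.docx"],
}
graph = build_item_graph(usage)
print(recommend(graph, "a.docx"))
```

Rebuilding or incrementally updating the counts as new usage arrives is what lets the recommendations improve over time.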
This chapter marks the beginning of our journey to create search-driven applications using SharePoint 2013. We started the chapter by discussing the new features of SharePoint 2013 Search and divided them into four categories: administration changes, UI changes, relevance and ranking changes, and new development methods. Once we had an idea about what's new in SharePoint 2013 Search, we went on and deep-dived into the new search architecture.
In the next chapter we will get our hands dirty, and once we understand how to work with the out-of-the-box search components, we will build our first search-driven application using them.