
Integrating with Google Apps - Google Search Appliance - Google Code http://code.google.com/intl/fr/apis/searchappliance/documentation/64/i...

Integrating with Google Apps


Google Search Appliance software version 6.4
Posted May 2010

By enabling indexing of Google Apps content, you can use your Google Search Appliance to search your domain's Google Apps
content. In a few clicks, you can enable your search appliance to crawl, index, and serve public Google Sites and Google
Docs.

Contents
About This Document
Example Google Apps Domain
Overview
Integrated Google Apps Services
Prerequisites for Indexing Google Apps Content
What Is Public Google Apps Content?
Enabling or Disabling Indexing of Google Apps Content
Enabling Indexing
Disabling Indexing
Writing URL Patterns for Google Apps Content
Example URL Patterns
Managing Crawl of Google Apps Content
How Is Content Counted Against the License Limit?
Controlling Crawl
Monitoring Crawl
Managing Serve of Google Apps Content
Google Apps Icons in Search Results
Managing Serve with Front Ends
Managing Serve with Collections
Preventing Google Apps Content from Appearing in Search Results
Biasing Google Apps Content in Search Results
Filtering Google Apps Content in Search Results
Removing Google Apps URLs from the Search Index
Creating Collections of Google Apps Content
Importing or Exporting Configurations for Indexing Google Apps Content
Exporting a Configuration File
Importing a Configuration File

About This Document



This document provides information about how to serve content from your Google Apps domain along with other content from a
Google Search Appliance. This document is intended for search appliance administrators who need to understand:
How the search appliance indexes Google Apps content
How to enable indexing of Google Apps content

This document also provides guidance for importing or exporting a search appliance configuration for indexing Google
Apps content, as well as controlling crawl and managing serving content in a Google Apps domain. The following table
lists the major sections in this document.

Section                                     Describes

Overview                                    How indexing Google Apps content enhances search results

Enabling or Disabling Indexing Google       How to enable a search appliance to index Google Apps content
Apps Content

Importing or Exporting Configurations for   How to export a search appliance configuration that includes
Indexing Google Apps Content                indexing Google Apps content and what happens to a search
                                            appliance when you import such a configuration

Writing URL Patterns for Google Apps        How to write valid URL patterns for managing crawl and serve of
Content                                     content in a Google Apps domain

Managing Crawl of Google Apps Content       How to control and monitor crawl of content in a Google Apps
                                            domain

Managing Serve of Google Apps Content       How to manipulate Google Apps URLs in search results and
                                            remove them from the search index

Example Google Apps Domain


Several sections in this document use a Google Apps example domain called nucleotraining.com. This domain is used by
the training group at a fictional company called Nucleo Worldwide Systems and contains the following content:
The NucleoTraining site, which is used for internal communication and collaboration by members of the Nucleo
training group, including teachers, course coordinators, and courseware developers
Documents created and used by training group members, including course descriptions and courseware modules
Presentations created and used by training group members, including course presentations
Spreadsheets used by the course registrar, including class registration information and teacher schedules

Overview
Google Apps provides your organization with tools for collaborating on documents, spreadsheets, presentations, and sites.
As a search appliance administrator, you can integrate your search appliance with Google Apps. This integration enables
the search appliance to crawl, index, and serve results from a Google Apps domain's public Google Docs and Google
Sites content.

The following steps provide an overview of the entire process of integrating a search appliance with Google Apps:
1. Your organization signs up for Google Apps and identifies a Google Apps domain and administrator.
2. Using Google Apps, users create documents, spreadsheets, presentations, and sites in the domain.
3. You, as search appliance administrator, enable indexing Google Apps content for the domain on a search appliance.
4. The search appliance crawls public content from Google Apps in the domain.
5. When a user enters a search query in a search appliance front end, the search appliance serves results that might
include content from the Google Apps domain, depending on the query.

Integrated Google Apps Services


The Google Search Appliance indexes content from a subset of Google Apps services, as listed in the following table.

Google Apps Service                                         Indexed?

Google Docs (documents, spreadsheets, and presentations)    Yes
Google Sites                                                Yes
Gmail                                                       No
Google Calendar                                             No
Google Video and other services                             No

Prerequisites for Indexing Google Apps Content


To index Google Apps content, a search appliance must be able to access Google.com and there must be a Google Apps
domain with content that the search appliance can crawl and index.

To sign up for Google Apps, visit the Google Apps editions page. When you sign up for Google Apps, you select:
A domain for your organization's Google Apps
A Google Apps administrator username
A Google Apps administrator password

Each Google Apps domain can have multiple administrators. As a search appliance administrator, you must have a domain
name, a Google Apps administrator username, and a password to enable indexing of Google Apps content.

What Is Public Google Apps Content?


When a Google Search Appliance integrates with Google Apps, the search appliance crawls and serves only public
content from a Google Apps domain. Any Google Apps content that has not been published is not crawled. The following
subsections describe how to make content in Google Docs and Google Sites public.

Making Content in Google Docs Public


To make a Google Doc public, you must publish the document, presentation, or spreadsheet as a web page.

To publish a document as a web page:


1. When viewing the document, select Share > Publish as web page.
2. Click Publish document.

To publish a presentation as a web page:


1. When viewing the presentation, select Share > Publish/embed.
2. Click Publish document.

To publish a spreadsheet as a web page:


1. When viewing the spreadsheet, select Share > Publish as a web page.
2. Click Start Publishing.

For more information about publishing Google Docs, refer to the Google Docs Help Center.

Making Content in Google Sites Public


To make a site public, do one of the following:
Share the site with everyone in the same domain
Make the site viewable by anyone in the world

To share a site with everyone in the same domain:


1. When viewing the site, select more actions > Site sharing.
2. On the Sharing page, check Anyone at <domain_name> may view/edit this document.

To make a new site public:


1. When viewing the site, select more actions > Site sharing.
2. On the Sharing page, check Also let anyone in the world view this site on the Create new site page.

To make an existing site public:


1. When viewing the site, select more actions > Site sharing.
2. On the Sharing page, check Anyone in the world may view this site.

Enabling or Disabling Indexing Google Apps Content


As a search appliance administrator, you do not need to perform any configuration tasks to index Google Apps content.
You only need to enable indexing Google Apps content by using the Google Apps > Indexing Google Apps Content
page in the Admin Console.

When you enable indexing Google Apps content, the search appliance configures itself to:
Authenticate to Google Apps
Crawl your domain's public Google Docs and Google Sites content

For information about using the Indexing Google Apps Content page, click Help Center > Google Apps > Indexing
Google Apps Content in the Admin Console.

Enabling Indexing
For one search appliance, you can enable indexing Google Apps content for one Google Apps domain. To enable indexing
Google Apps content, you need the following information for the Google Apps domain:
A domain name
An administrator username
An administrator password

For information about how to get this information, refer to Prerequisites for Indexing Google Apps Content.

Each time you enable indexing Google Apps content, the search appliance downloads the latest list of Google Apps
services and access control policies. Check this document for descriptions of the latest functionality.

Example: Enabling Indexing a Google Apps Domain


For example, suppose that you want to enable indexing the nucleotraining.com domain. Your Google Apps administrator
name is admin5@nucleotraining.com and you have the Google Apps Administrator password for the domain.

To enable indexing nucleotraining.com:


1. Click Google Apps > Indexing Google Apps Content.
The Indexing Google Apps Content page appears.

2. In the Domain box, type nucleotraining.com.


3. In the Username box, type admin5 (omitting @nucleotraining.com).
4. In the Password box, type the administrator password.
5. Click Enable.

Even if you change the administrator password or the administrator account is deleted, the indexing continues to work.
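
The Username box in step 3 takes only the part of the administrator address before the @ sign. As a minimal sketch, the following Python function derives that value from a full address. The helper name admin_username is hypothetical and is not part of any search appliance interface; the Admin Console simply expects you to type the short name.

```python
def admin_username(address: str, domain: str) -> str:
    """Return the part of an administrator address before '@domain'.

    Hypothetical helper for illustration only.
    """
    local, sep, addr_domain = address.partition("@")
    if not sep:
        # No '@' present: the value is already a bare username.
        return address
    if addr_domain.lower() != domain.lower():
        raise ValueError(f"{address!r} is not in the domain {domain!r}")
    return local


print(admin_username("admin5@nucleotraining.com", "nucleotraining.com"))
```

The check against the domain is a design choice for this sketch: it catches an address from a different Google Apps domain before it is typed into the page.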

Disabling Indexing
To disable indexing Google Apps content:
1. Click Google Apps > Indexing Google Apps Content.
The Indexing Google Apps Content page appears.

2. Click Disable.

To re-enable indexing, follow the procedure in Enabling Indexing.

Writing URL Patterns for Google Apps Content


When you enable indexing, the search appliance configures itself to crawl Google Apps content. It does not display URLs
for Google Docs or Google Sites in the Admin Console.

Do not enter any URLs for Google Apps in Start Crawling from the Following URLs or Follow and Crawl Only URLs
with the Following Patterns on the Crawl and Index > Crawl URLs page. Specifying Google Apps start and follow
URLs can cause the search appliance to perform unnecessary crawling and might cause Google to block your search
appliance. However, as a search appliance administrator, you may need to enter Google Apps URL patterns to manage
crawl or serve.
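
If you keep your start URLs in a text file before pasting them into the Admin Console, a small check can flag entries that point at Google Apps hosts. The following Python sketch is an illustration under assumptions: the function name google_apps_entries and the host intranet.example.com are made up for the example, and the host list contains only the Google Apps hosts named in this document.

```python
from urllib.parse import urlparse

# Hosts named in this document as serving Google Apps content.
GOOGLE_APPS_HOSTS = {"docs.google.com", "spreadsheets.google.com", "sites.google.com"}


def google_apps_entries(urls):
    """Return the entries whose host is one of the Google Apps hosts."""
    flagged = []
    for url in urls:
        # Add a scheme when it is missing so that urlparse can find the host.
        candidate = url if "://" in url else "http://" + url
        host = urlparse(candidate).hostname or ""
        if host.lower() in GOOGLE_APPS_HOSTS:
            flagged.append(url)
    return flagged


start_urls = [
    "http://intranet.example.com/",                   # hypothetical intranet host
    "http://sites.google.com/a/nucleotraining.com/",  # should not be a start URL
]
print(google_apps_entries(start_urls))
```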

Example URL Patterns



The following table lists example URL patterns for types of content in a Google Apps domain. For individual items in a
Google Apps domain, such as a specific document, copy the URL from a listing on a search results page and paste it in
the Admin Console page where you are working. Google Apps supports both public (http) and secure (https) sites.

Content                               URL Patterns

All documents, presentations, and     docs.google.com/
spreadsheets in a domain              and
                                      spreadsheets.google.com/

All documents in a domain             ^http://docs.google.com/a/domain_name.com/View
                                      and
                                      ^https://docs.google.com/a/domain_name.com/View

A specific document in a domain       The full URL of the document, for example:
                                      http://docs.google.com/a/domain_name.com/Doc?docid=dg4sgjw7_18cp3vsbfz&hl=en
                                      or
                                      https://docs.google.com/a/domain_name.com/Doc?docid=dg4sgjw7_18cp3vsbfz&hl=en

All presentations in a domain         ^http://docs.google.com/a/domain_name.com/SimplePresentationView
                                      and
                                      ^https://docs.google.com/a/domain_name.com/SimplePresentationView

A specific presentation in a domain   The full URL of the presentation, for example:
                                      ^http://docs.google.com/a/domain_name.com/Presentation?docid=dg4sgjw7_0d5m8vzgw&hl=en
                                      or
                                      ^https://docs.google.com/a/domain_name.com/Presentation?docid=dg4sgjw7_0d5m8vzgw&hl=en

All spreadsheets in a domain          spreadsheets.google.com/

A specific spreadsheet in a domain    The full URL of the spreadsheet, for example:
                                      ^http://docs.google.com/a/domain_name.com/ccc?key=pugnm4XXrq5DeFcreLXRibQ&hl=en
                                      or
                                      ^https://docs.google.com/a/domain_name.com/ccc?key=pugnm4XXrq5DeFcreLXRibQ&hl=en

All sites in a domain                 sites.google.com/

A specific site in a domain           The URL of the site, for example:
                                      ^http://sites.google.com/a/domain_name.com/site_name/Home
                                      or
                                      ^https://sites.google.com/a/domain_name.com/site_name/Home
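
The per-type patterns in the table differ only in the domain name, so they can be generated rather than typed by hand. The following Python sketch builds them for one domain. The function name google_apps_patterns is hypothetical, and the sketch covers only the "all items of a type" rows of the table, not the URLs of specific items.

```python
def google_apps_patterns(domain: str) -> dict:
    """Build the per-type URL patterns from the table for one domain."""
    # Path suffixes taken from the "All documents" and "All presentations" rows.
    paths = {
        "documents": "View",
        "presentations": "SimplePresentationView",
    }
    patterns = {}
    for content_type, path in paths.items():
        patterns[content_type] = [
            f"^{scheme}://docs.google.com/a/{domain}/{path}"
            for scheme in ("http", "https")
        ]
    # The table lists spreadsheets and sites by host only.
    patterns["spreadsheets"] = ["spreadsheets.google.com/"]
    patterns["sites"] = ["sites.google.com/"]
    return patterns


for content_type, values in google_apps_patterns("nucleotraining.com").items():
    print(content_type, values)
```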

Managing Crawl of Google Apps Content


The Google Search Appliance crawls content in a Google Apps domain the same way that it crawls other content. For
general information about how the Google Search Appliance crawls content, refer to Administering Crawl.

In continuous crawl mode, the search appliance crawls Google Apps (and other) content at all times, ensuring that newly
added or updated content is added to the index as quickly as possible. The starting point for crawling Google Apps is the
docs publish index, which is updated once a day.

The search appliance can automatically determine URLs that often change and should be crawled frequently and URLs
that seldom change and should be crawled infrequently. The search appliance can also crawl in scheduled crawl mode,
where it crawls content at a scheduled time.

For a search appliance to crawl content in a Google Apps domain, you do not need to specify any follow and crawl URL
patterns. In fact, the Google Apps crawl URLs are hidden and you cannot delete them. However, you can manage
crawling of Google Apps as described in the following sections:
Controlling Crawl
Monitoring Crawl


How Is Content Counted Against the License Limit?


Each document, presentation, and spreadsheet that is crawled is counted against the search appliance's license limit. For
sites, each page in a site that is crawled is counted against the license limit. Any public Google Docs that are embedded in
a Google Sites page are considered separate pages and are recrawled.
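
The counting rule can be worked through with a small calculation. The following Python sketch is an estimate only: the figures are invented for illustration, the names CrawledContent and license_count are hypothetical, and the sketch assumes that each embedded document adds one to the count in addition to the site page that embeds it.

```python
from dataclasses import dataclass


@dataclass
class CrawledContent:
    documents: int = 0
    presentations: int = 0
    spreadsheets: int = 0
    site_pages: int = 0      # every crawled page of every site
    embedded_docs: int = 0   # public docs embedded in site pages


def license_count(content: CrawledContent) -> int:
    """Each crawled item counts once; embedded docs count as separate pages."""
    return (content.documents + content.presentations + content.spreadsheets
            + content.site_pages + content.embedded_docs)


# Illustrative numbers, not measurements from a real domain.
example = CrawledContent(documents=120, presentations=30, spreadsheets=15,
                         site_pages=40, embedded_docs=5)
print(license_count(example))  # 210
```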

Controlling Crawl
As a search appliance administrator, you can control the content in a Google Apps domain that is crawled. To exclude
URLs from crawling, use the Crawl and Index > Crawl URLs page in the Admin Console.

Example: Excluding All Spreadsheets from a Crawl


For example, in the domain nucleotraining.com, the search appliance crawls content that is of interest to all members of
the training group. This content includes documents, presentations, and sites. However, because spreadsheets contain
information that is only of interest to course registrars, the search appliance should not crawl spreadsheets.

To exclude all spreadsheets from a crawl:


1. Click Crawl and Index > Crawl URLs.
The Crawl URLs page appears.

2. In the Do Not Crawl URLs with the Following Patterns box, type spreadsheets.google.com/.
You can also exclude an individual URL from a crawl by typing it in this box. For information about valid Google
Apps URLs, refer to Writing URL Patterns for Google Apps Content.

3. Click Save URLs to Crawl.

For more information about controlling crawl, refer to Administering Crawl. For information about using the Crawl and
Index > Crawl URLs page, click Help Center > Crawl and Index > Crawl URLs in the Admin Console.
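
To reason about which URLs a Do Not Crawl pattern would exclude, it can help to model the matching. The following Python sketch uses a deliberately simplified rule, which is an assumption of this example: a pattern that begins with ^ must match the start of the URL, and any other pattern matches if it appears anywhere in the URL. The search appliance's full pattern syntax is richer and is described in Administering Crawl. The function names and the spreadsheet URL are hypothetical.

```python
def matches(pattern: str, url: str) -> bool:
    """Simplified test: '^' anchors at the start, otherwise substring match."""
    if pattern.startswith("^"):
        return url.startswith(pattern[1:])
    return pattern in url


def crawlable(url: str, do_not_crawl: list) -> bool:
    """In this sketch, a URL is skipped when any Do Not Crawl pattern matches."""
    return not any(matches(pattern, url) for pattern in do_not_crawl)


do_not_crawl = ["spreadsheets.google.com/"]
urls = [
    "http://sites.google.com/a/nucleotraining.com/site_name/Home",
    # Hypothetical spreadsheet URL used only to exercise the pattern.
    "http://spreadsheets.google.com/a/nucleotraining.com/ccc?key=EXAMPLE",
]
for url in urls:
    print(url, "->", "crawl" if crawlable(url, do_not_crawl) else "skip")
```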

Monitoring Crawl
While the search appliance is crawling, you can monitor a crawl's history on the Status and Reports > Crawl
Diagnostics page in the Admin Console.

When this page first appears, it shows the crawl history for the current domain. From the domain level, you can navigate to
lower levels that show crawl history for Google Apps URLs. URLs for content in Google Apps domains follow the patterns
described in Writing URL Patterns for Google Apps Content.

For domain crawling, "Unknown" or "Crawled with empty body: Disallowed by robots.txt" crawl statuses do not indicate
errors.

Monitoring Information for Hierarchical Levels


The following table lists the hierarchical levels that you can navigate to and describes the information that the Status and
Reports > Crawl Diagnostics page displays at each level.

Level       Page Shows

Domain      The number of URLs that have been crawled in all Google Apps hosts in the domain plus other
            information. Hosts include docs.google.com and sites.google.com.

Host        The number of URLs that have been crawled in the selected Google Apps host plus other
            information. For example, this level shows information for http://sites.google.com.

Directory   The crawl status for the Google Apps directory (http://sites.google.com/a/) or subdirectories
            (http://sites.google.com/domain/...).

URL         Detailed information about the crawled URL and a crawl history for the URL. You can also use
            this page to recrawl the current URL.
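
The levels correspond to successively longer prefixes of a URL. The following Python sketch splits a Google Apps URL into those levels so that you know what to click through on the Crawl Diagnostics page. It is an illustration under assumptions: the function name diagnostic_levels is hypothetical, and the domain is taken to be the last two labels of the host name, which holds for the google.com hosts in this document but not for every host name.

```python
from urllib.parse import urlparse


def diagnostic_levels(url: str) -> dict:
    """Split a URL into the domain, host, directory, and URL levels."""
    parts = urlparse(url)
    host = parts.hostname or ""
    # Assumption: the domain is the last two labels of the host name.
    domain = ".".join(host.split(".")[-2:])
    segments = [segment for segment in parts.path.split("/") if segment]
    directories = []
    prefix = f"{parts.scheme}://{host}/"
    # Every path segment except the last one is treated as a directory.
    for segment in segments[:-1]:
        prefix += segment + "/"
        directories.append(prefix)
    return {"domain": domain, "host": host, "directories": directories, "url": url}


levels = diagnostic_levels(
    "http://sites.google.com/a/nucleotraining.com/NucleoTraining/Home"
)
for name, value in levels.items():
    print(name, value)
```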

Example: Monitoring the Crawl of a Site


For example, suppose you want to monitor crawling of the site NucleoTraining in the nucleotraining.com domain. To
monitor NucleoTraining, navigate to the Crawl Diagnostics page for the site's URL:


1. Click Status and Reports > Crawl Diagnostics.


The domain-level Crawl Diagnostics page appears.

2. Click sites.google.com in the Host Name column in the All Hosts table.
The host-level Crawl Diagnostics page appears.

3. Click http://sites.google.com in the Host Name column.


The directory-level Crawl Diagnostics page appears.

4. Click folders in the File/Directory column to navigate down the directory structure to the URL-level page for
/a/nucleotraining/NucleoTraining.
The URL-level Crawl Diagnostics page for http://sites.google.com/a/nucleotraining/NucleoTraining appears.

5. Use the URL-level Crawl Diagnostics page to monitor the crawl history for the NucleoTraining site.

For more information about monitoring a crawl, refer to Administering Crawl. For information about using the Status and
Reports > Crawl Diagnostics page, click Help Center > Status and Reports > Crawl Diagnostics in the Admin
Console.

Managing Serve of Google Apps Content


After a search appliance has indexed Google Apps content, it can return search results from a Google Apps domain to
users. The following figure illustrates search results from a Google Apps domain.

You can manage serving content from a Google Apps domain the same way you manage serving other crawled content.
For general information about how to manage serve, refer to Creating the Search Experience.

Google Apps Icons in Search Results


In listings, search results from Google Apps services are identified by icons, as illustrated in the following table.

Icon Identifies Result From

Google Docs--document

Google Docs--presentation

Google Docs--spreadsheet

Google Sites--site

Managing Serve with Front Ends


The framework for managing how the search appliance serves content from a Google Apps domain (or any source) to
users is the front end. There are several search appliance features associated with a front end, including features that
refine search results. With a single search appliance, you can create and deploy an unlimited number of front ends. This
means that you can customize how the search appliance serves content from a Google Apps domain to different types of
users.

To create a front end, use the Serving > Front Ends page in the Admin Console. For complete information about the
Front Ends page, click Help Center > Serving > Front Ends in the Admin Console. For more information about using
front ends, refer to Creating the Search Experience.

Example: Using Two Front Ends


For example, in a search appliance that has indexed nucleotraining.com, you might create two front ends:
A NucleoGroupMembers front end for members of the Nucleo training group
A NucleoStudents front end for students who attend the training

In the NucleoGroupMembers front end, the search appliance serves all content in the nucleotraining.com domain. In the
NucleoStudents front end, the search appliance only serves content that is appropriate for students, including course
descriptions, course modules, course presentations, and class schedules.

Using Front End Features to Manage Serve


The following table lists front end features that help you manage how content from a Google Apps domain is served to
users. For more information about using a feature to manage serve, refer to the section listed in the Reference column.

Front End Feature    Reference

Remove URLs          Preventing Google Apps Content from Appearing in Search Results
                     Removing Google Apps URLs from the Search Index

Result biasing       Biasing Google Apps Content in Search Results

Filters              Filtering Google Apps Content in Search Results

Managing Serve with Collections


A collection is another search appliance feature that can help you manage serve of Google Apps content. Collections are
independent of front ends. However, you can use a custom front end with a specific collection to help improve searches
and enhance results. For more information about using collections, refer to Creating Collections of Google Apps Content.

Preventing Google Apps Content from Appearing in Search Results


For any front end, you can prevent URLs that match specific patterns from appearing in search results. To prevent URLs
from appearing in search results, use the Serving > Front Ends > Remove URLs page in the Admin Console. Because
removing URLs is specific to a front end, it can be aimed at a specific type of user, as shown in the following example.

Example: Preventing Results for a Site from Appearing in Search Results


The NucleoTraining site contains sensitive information about internal team projects, issues, and events. In the
NucleoStudents front end, the URL for the NucleoTraining site should not appear; you need to prevent this URL from
appearing in search results. In the front end for team members, you do not need to remove this URL.

To prevent the URL for the NucleoTraining site from appearing in search results in the NucleoStudents front end:
1. Click Serving > Front Ends.
The Front Ends page appears.

2. Click the Edit link next to the NucleoStudents front end.


The Output Format page appears.

3. Click the Remove URLs tab.


The Remove URLs page appears.

4. In the Remove URLs matching the following patterns from all search results box, type the URL of the
NucleoTraining site.
For information about valid Google Apps URLs, refer to Writing URL Patterns for Google Apps Content.


5. Click Update List of Removed URLs.

For more information about removing URLs, refer to Creating the Search Experience. For information about using the
Serving > Front Ends > Remove URLs page, click Help Center > Serving > Front Ends > Remove URLs in the
Admin Console.

Biasing Google Apps Content in Search Results


You can affect the order of Google Apps content in search results by using source biasing, which is a type of result
biasing. Source biasing enables you to boost or weaken the relevancy score of content in the search index based on URL
patterns. Boosting a score usually moves a document up in the result listings. Weakening a score usually moves it down.

Set up source biasing by performing the following tasks:


1. Creating a result biasing policy by using the Serving > Result Biasing page in the Admin Console.
2. Configuring the result biasing policy by using the Serving > Result Biasing Edit page in the Admin Console.
3. Enabling the result biasing policy by using the Serving > Front Ends > Filters page in the Admin Console.

For more information about result biasing and source biasing, refer to Creating the Search Experience.
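
The effect of source biasing on result order can be pictured with a toy model. The following Python sketch is not the search appliance's ranking algorithm, which this document does not describe. The scores, the multiplier, and the function name apply_source_bias are all invented for illustration; the sketch only shows that raising the score of URLs that match a pattern can move them above results that previously ranked higher.

```python
def apply_source_bias(results, biases):
    """Rescore (url, score) pairs with (pattern, multiplier) biases and re-sort.

    Toy model: a multiplier above 1 boosts, a multiplier below 1 weakens.
    """
    rescored = []
    for url, score in results:
        for pattern, multiplier in biases:
            if pattern in url:
                score *= multiplier
                break  # the first matching pattern wins in this sketch
        rescored.append((url, score))
    return sorted(rescored, key=lambda item: item[1], reverse=True)


# Invented relevancy scores for two results.
results = [
    ("http://docs.google.com/a/nucleotraining.com/View?docid=EXAMPLE", 0.80),
    ("http://sites.google.com/a/nucleotraining.com/site_name/Home", 0.70),
]
biases = [("sites.google.com", 1.5)]  # boost sites; the value is illustrative
ranked = apply_source_bias(results, biases)
print([url for url, _ in ranked])
```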

Example: Creating a Result Biasing Policy


For example, in the nucleotraining.com domain, the team site contains important information that members of the team
collaborate on keeping up-to-date. In the NucleoGroupMembers front end, you might want to boost the relevancy scores
for sites. To do this, you might create and configure a result biasing policy named Site and then select it for use with the
NucleoGroupMembers front end. Because you can create unlimited front ends for a search appliance, you might have a
different result biasing policy for each front end.

To create the Site result biasing policy:


1. Click Serving > Result Biasing.
The Result Biasing page appears.

2. In the Result Biasing Name text box, type Site.


3. Click Create Policy.
The new policy's name, Site, appears in the list of result biasing policies and is selected.

For more information about using the Serving > Result Biasing page, click Help Center > Serving > Result Biasing.

Example: Configuring a Result Biasing Policy


To configure the Site result biasing policy:
1. On the Serving > Result Biasing page, click the Edit link next to the Site result biasing policy.
The Serving > Result Biasing Edit page appears.

2. Under How much influence should source biasing have?, click the highest setting.
3. In the URL Pattern boxes, type the URL pattern for sites, sites.google.com.
For information about valid Google Apps URLs, refer to Writing URL Patterns for Google Apps Content.

4. In the Strength boxes, select Strong Increase for sites.google.com.


5. Click Save Settings.

For more information about using the Serving > Result Biasing Edit page, click Help Center > Serving > Result
Biasing Edit.

Example: Enabling a Result Biasing Policy


To enable the Site result biasing policy, apply it to the NucleoGroupMembers front end by performing the following steps:
1. Click Serving > Front Ends.
The Front Ends page appears.

2. Click the Edit link next to the NucleoGroupMembers front end.


The Output Format page appears.

3. Click the Filters tab.


The Filters page appears.

4. In the Result Biasing Policy pull-down list, select the result biasing policy name, Site.

5. Click Save Settings.

For information about using the Serving > Filters page, click Help Center > Serving > Filters.

Filtering Google Apps Content in Search Results


As an administrator, you can create custom filters for a front end to ensure that the search appliance serves appropriate
results to users. The search appliance provides different types of filters, including domain, language, file type, and meta
tag. To ensure that the search appliance only serves results from a Google Apps domain with a front end, use a domain
filter. To create a domain filter, use the Serving > Front Ends > Filters page in the Admin Console.

Example: Filtering Content by Domain


For example, suppose nucleotraining.com is one of many domains at Nucleo Worldwide Systems. Other domains include
www.nucleoworldwidesystems.com and www.nucleoworldwidesystems.com.uk. However, with the
NucleoGroupMembers front end, you want the search appliance to serve results from nucleotraining.com only. To do this,
you create a domain filter. Because you can create unlimited front ends for a search appliance, you might create different
domain filters for different front ends.

To create a domain filter for nucleotraining.com:


1. Click Serving > Front Ends.
The Front Ends page appears.

2. Click the Edit link next to the NucleoGroupMembers front end.


The Output Format page appears.

3. Click the Filters tab.


The Filters page appears.

4. In the Domain Filter box, type nucleotraining.com.


5. Click Save Settings.

For more information about using filters, refer to Creating the Search Experience. For information about using the Serving
> Front Ends > Filters page, click Help Center > Serving > Front Ends > Filters in the Admin Console.

Removing Google Apps URLs from the Search Index


After a search appliance crawls a URL, it adds the URL to the search index, where it can be served to users in search
results. However, there might be one or more Google Apps URLs that you want to remove from the search index. To
remove a URL from the search index, use the Crawl and Index > Crawl URLs page in the Admin Console.

Example: Remove a Site from a Search Index


For example, suppose the nucleotraining.com domain contains an obsolete site called NucleoCoursewareAlpha and you
want to remove it from the search index.

To remove the NucleoCoursewareAlpha site from the search index:


1. Click Crawl and Index > Crawl URLs.
The Crawl URLs page appears.

2. In the Do Not Crawl URLs with the Following Patterns box, type the URL for the site,
http://sites.google.com/a/nucleotraining.com/nucleo-courseware-alpha/Home.
For information about valid Google Apps URLs, refer to Writing URL Patterns for Google Apps Content.

3. Click Save URLs to Crawl.

For more information about removing URLs from the search index, refer to Administering Crawl. For information about
using the Crawl and Index > Crawl URLs page, click Help Center > Crawl and Index > Crawl URLs in the Admin
Console.

Creating Collections of Google Apps Content


As a search appliance administrator, you can create collections, which are subsets of the search index. A collection lets
users:
Search over a specific part of the index
Narrow searches
Get relevant results more quickly and efficiently than searching the entire index

Example: Creating a Collection of Google Apps


For example, you might create a collection called GoogleApps that contains documents, presentations, spreadsheets, and
sites. To narrow a search to include only documents, presentations, spreadsheets, and sites, and exclude all other
content, a user could select the GoogleApps collection on the search page. All the results served from this collection
would be from a Google Apps domain. To create a collection, use the Crawl and Index > Collections page in the Admin
Console.

To create the GoogleApps collection:


1. Click Crawl and Index > Collections.
The Collections page appears.

2. Under Create New Collection, type GoogleApps in the Collection Name box.
Either leave the Use default configuration option selected or click the Import configuration from file option.

3. Click Create Collection.


The new collection's name, GoogleApps, appears in the list of collections and is selected.

4. Click the Edit link next to the GoogleApps collection.


The GoogleApps collection page appears.

5. In the Include Content Matching the Following Patterns box, type the following Google Apps URLs, pressing
Enter after each one:
docs.google.com/
sites.google.com/
For information about valid Google Apps URLs, refer to Writing URL Patterns for Google Apps Content.

6. Click Save Collection Definition.

For more information about collections, refer to Creating the Search Experience. For information about using the Crawl
and Index > Collections page, click Help Center > Crawl and Index > Collections in the Admin Console.
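
To illustrate which URLs the two patterns in step 5 would place in the GoogleApps collection, here is a simplified sketch. It assumes a pattern matches any URL that contains the pattern text; the appliance's real pattern syntax is richer than this, and the sample URLs are hypothetical.

```python
# The include patterns typed in step 5.
INCLUDE_PATTERNS = ["docs.google.com/", "sites.google.com/"]


def in_collection(url, patterns=INCLUDE_PATTERNS):
    """Return True if the URL contains any include pattern.

    Simplifying assumption: plain substring matching only.
    """
    return any(pattern in url for pattern in patterns)


# Hypothetical URLs, for illustration only.
samples = [
    "https://docs.google.com/a/example.com/document/d/abc",
    "https://sites.google.com/a/example.com/team",
    "http://intranet.example.com/wiki",
]
for url in samples:
    print(url, in_collection(url))
```

Under this assumption, the first two URLs fall inside the collection and the intranet URL does not.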

Importing or Exporting Configurations for Indexing Google Apps Content


You import or export a configuration for indexing Google Apps content by importing or exporting the configuration file for a
search appliance. A search appliance configuration file contains information about front end configuration, collections, and
general parameters in XML format. The default name for the configuration file is config.xml.

The search appliance configuration file contains the following information about indexed Google Apps content:
The Google Apps domain name
Crawl URLs that the Google Search Appliance configures for itself

To import or export a configuration file, use the Administration > Import/Export page in the Admin Console. For information
about using this page, click Help Center > Administration > Import/Export in the Admin Console.

Exporting a Configuration File


When the Google Search Appliance configures itself, it creates a Google Apps role account and password that it uses to
access Google Apps. An exported configuration file does not include Google Apps role account credentials.

To export a configuration file:


1. Click Administration > Import/Export.
The Import/Export page appears.

2. Under Export configuration, type a passphrase in the Enter Import/Export Passphrase box.
Usually, a passphrase is the same as the Admin Console password.

3. Retype the passphrase in the Confirm Passphrase box.


4. Click Export Configuration.
5. Save the configuration file by using the download options displayed by your browser.

Importing a Configuration File



When you import a configuration file, the current settings for indexing Google Apps content on a search appliance might or
might not be preserved, depending on settings in both the file and the search appliance.

To import a configuration file:


1. Click Administration > Import/Export.
The Import/Export page appears.

2. Under Import configuration, type a filename in the Filename box or click Browse to find the file on your network.
3. In the Import/Export Passphrase box, type the passphrase used for importing and exporting.
4. Click Import Configuration.

If your configuration is complex, the import process can be very slow. A configuration that contains multiple megabytes of
data, has hundreds of front ends, or creates hundreds of collections can require over an hour to import.

How Do Settings in a File Affect a Search Appliance?


Refer to the following table for information about how the Indexing Google Apps Content settings in a configuration file
affect the corresponding settings on a search appliance when you import the file.

In the Configuration File       On the Search Appliance         When You Import the Configuration File

Indexing Google Apps Content    Indexing Google Apps Content    The search appliance prompts you to
is enabled for a specific       is disabled.                    enable Indexing Google Apps Content.
domain, for example,
domain1.com.

Indexing Google Apps Content    Indexing Google Apps Content    Indexing Google Apps Content continues
is enabled for a specific       is enabled for the same         to be enabled.
domain, for example,            domain (domain1.com).
domain1.com.

Indexing Google Apps Content    Indexing Google Apps Content    The search appliance prompts you to
is enabled for a specific       is enabled for a different      enable Indexing Google Apps Content.
domain, for example,            domain (domain2.com).           The domain in the configuration file
domain1.com.                                                    overrides the domain on the search
                                                                appliance.

Indexing Google Apps Content    Indexing Google Apps Content    The search appliance disables Indexing
is disabled.                    is enabled.                     Google Apps Content.
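
The rows of the table can be read as a small decision function. The sketch below is only a restatement of the table, not appliance code. The function name and the convention of using `None` for "disabled" are illustrative assumptions, and the case where indexing is disabled in both places is not covered by the table.

```python
from typing import Optional


def import_outcome(file_domain: Optional[str],
                   appliance_domain: Optional[str]) -> str:
    """Describe what happens when a configuration file is imported.

    Each argument is the Google Apps domain for which indexing is
    enabled, or None if Indexing Google Apps Content is disabled.
    """
    if file_domain is None:
        if appliance_domain is None:
            return "not covered by the table"
        return "appliance disables Indexing Google Apps Content"
    if appliance_domain is None:
        return "appliance prompts you to enable indexing"
    if appliance_domain == file_domain:
        return "indexing continues to be enabled"
    return ("appliance prompts you to enable indexing; "
            "the file's domain overrides the appliance's domain")


print(import_outcome("domain1.com", "domain2.com"))
```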

