Integrating With Google Apps GSA
With the Indexing Google Apps Content feature, you can use your Google Search Appliance to search your domain's Google Apps
content. In a few clicks, you can enable your search appliance to crawl, index, and serve public Google Sites and Google
Docs.
Contents
About This Document
Example Google Apps Domain
Overview
Integrated Google Apps Services
Prerequisites for Indexing Google Apps Content
What Is Public Google Apps Content?
Enabling or Disabling Indexing of Google Apps Content
Enabling Indexing
Disabling Indexing
Writing URL Patterns for Google Apps Content
Example URL Patterns
Managing Crawl of Google Apps Content
How Is Content Counted Against the License Limit?
Controlling Crawl
Monitoring Crawl
Managing Serve of Google Apps Content
Google Apps Icons in Search Results
Managing Serve with Front Ends
Managing Serve with Collections
Preventing Google Apps Content from Appearing in Search Results
Biasing Google Apps Content in Search Results
Filtering Google Apps Content in Search Results
Removing Google Apps URLs from the Search Index
Creating Collections of Google Apps Content
Importing or Exporting Configurations for Indexing Google Apps Content
Exporting a Configuration File
Importing a Configuration File
This document provides information about how to serve content from your Google Apps domain along with other content from a
Google Search Appliance. This document is intended for search appliance administrators who need to understand:
How the search appliance indexes Google Apps content
How to enable indexing of Google Apps content
This document also provides guidance for importing or exporting a search appliance configuration for indexing Google
Apps content, as well as for controlling crawl and managing serve of content in a Google Apps domain. The following table
lists the major sections in this document.
Section | Describes
Enabling or Disabling Indexing Google Apps Content | How to enable a search appliance to index Google Apps content
Importing or Exporting Configurations for Indexing Google Apps Content | How to export a search appliance configuration that includes indexing Google Apps content and what happens to a search appliance when you import such a configuration
Writing URL Patterns for Google Apps Content | How to write valid URL patterns for managing crawl and serve of content in a Google Apps domain
Managing Crawl of Google Apps Content | How to control and monitor crawl of content in a Google Apps domain
Managing Serve of Google Apps Content | How to manipulate Google Apps URLs in search results and remove them from the search index
Overview
Google Apps provide your organization with tools for collaborating on documents, spreadsheets, presentations, and sites.
As a search appliance administrator, you can integrate your search appliance with Google Apps. This integration enables
the search appliance to crawl, index, and serve results from a Google Apps domain's public Google Docs and Google
Sites content.
The following steps provide an overview of the entire process of integrating a search appliance with Google Apps:
1. Your organization signs up for Google Apps and identifies a Google Apps domain and administrator.
2. Using Google Apps, users create documents, spreadsheets, presentations, and sites in the domain.
3. You, as search appliance administrator, enable indexing Google Apps content for the domain on a search appliance.
4. The search appliance crawls public content from Google Apps in the domain.
5. When a user enters a search query in a search appliance front end, the search appliance serves results that might
include content from the Google Apps domain, depending on the query.
The following Google Apps services are not integrated with the search appliance:
Gmail
Google Calendar
To sign up for Google Apps, visit the Google Apps editions page. When you sign up for Google Apps, you select:
A domain for your organization's Google Apps
A Google Apps administrator username
A Google Apps administrator password
Each Google Apps domain can have multiple administrators. As a search appliance administrator, you must have a domain
name, a Google Apps administrator username, and a password to enable indexing Google Apps content.
For more information about publishing Google Docs, refer to the Google Docs Help Center.
As a search appliance administrator, you do not need to perform any configuration tasks to index Google Apps content.
You only need to enable indexing Google Apps content by using the Google Apps > Indexing Google Apps Content
page in the Admin Console.
When you enable indexing Google Apps content, the search appliance configures itself to:
Authenticate to Google Apps
Crawl your domain's public Google Docs and Google Sites content
For information about using the Indexing Google Apps Content page, click Help Center > Google Apps > Indexing
Google Apps Content in the Admin Console.
Enabling Indexing
For one search appliance, you can enable indexing Google Apps content for one Google Apps domain. Each time you enable
indexing Google Apps content, the search appliance downloads the latest list of Google Apps services and access control
policies. Check this document for a description of the latest functionality. To enable indexing Google Apps content, you
need the following information for the Google Apps domain:
A domain name
An administrator username
An administrator password
For information about how to get this information, refer to Prerequisites for Indexing Google Apps Content.
Even if you change the administrator password or the administrator account is deleted, the indexing continues to work.
Disabling Indexing
To disable indexing Google Apps content:
1. Click Google Apps > Indexing Google Apps Content.
The Indexing Google Apps Content page appears.
2. Click Disable.
Do not enter any URLs for Google Apps in Start Crawling from the Following URLs or Follow and Crawl Only URLs
with the Following Patterns on the Crawl and Index > Crawl URLs page. Specifying Google Apps start and follow
URLs can cause the search appliance to perform unnecessary crawling and might cause Google to block your search
appliance. However, as a search appliance administrator, you may need to enter Google Apps URL patterns to manage
crawl or serve.
In continuous crawl mode, the search appliance crawls Google Apps (and other) content at all times, ensuring that newly
added or updated content is added to the index as quickly as possible. The starting point for crawling Google Apps is the
docs publish index, which is updated once a day.
The search appliance can automatically determine URLs that often change and should be crawled frequently and URLs
that seldom change and should be crawled infrequently. The search appliance can also crawl in scheduled crawl mode,
where it crawls content at a scheduled time.
For a search appliance to crawl content in a Google Apps domain, you do not need to specify any follow and crawl URL
patterns. In fact, the Google Apps crawl URLs are hidden and you cannot delete them. However, you can manage
crawling of Google Apps as described in the following sections:
Controlling Crawl
Monitoring Crawl
Controlling Crawl
As a search appliance administrator, you can control the content in a Google Apps domain that is crawled. To exclude
URLs from crawling, use the Crawl and Index > Crawl URLs page in the Admin Console.
2. In the Do Not Crawl URLs with the Following Patterns box, type sites.google.com/.
You can also exclude an individual URL from a crawl by typing it in this box. For information about valid Google
Apps URLs, refer to Writing URL Patterns for Google Apps Content.
For more information about controlling crawl, refer to Administering Crawl. For information about using the Crawl and
Index > Crawl URLs page, click Help Center > Crawl and Index > Crawl URLs in the Admin Console.
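To preview which URLs an exclusion pattern such as sites.google.com/ might affect, a rough sketch like the following can help. It assumes a simplified model in which a plain pattern matches any URL that contains it. The search appliance's actual pattern rules are richer, so treat this only as an illustration. The function name is_excluded and the docs.google.com URL are hypothetical.

```python
def is_excluded(url: str, do_not_crawl_patterns: list[str]) -> bool:
    """Return True if any pattern occurs in the URL.

    Assumption: a plain pattern is treated as a substring match.
    This is a simplified model, not the appliance's full pattern syntax.
    """
    return any(pattern in url for pattern in do_not_crawl_patterns)


# Pattern taken from the procedure above.
patterns = ["sites.google.com/"]

urls = [
    "http://sites.google.com/a/nucleotraining.com/nucleo-courseware-alpha/Home",
    "http://docs.google.com/a/nucleotraining.com/",  # hypothetical URL
]

for url in urls:
    status = "excluded" if is_excluded(url, patterns) else "crawled"
    print(url, "->", status)
```

Under this model, only the sites.google.com URL is reported as excluded. The docs.google.com URL is unaffected.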
Monitoring Crawl
While the search appliance is crawling, you can monitor a crawl's history on the Status and Reports > Crawl
Diagnostics page in the Admin Console.
When this page first appears, it shows the crawl history for the current domain. From the domain level, you can navigate to
lower levels that show crawl history for Google Apps URLs. URLs for content in Google Apps domains follow the patterns
described in Writing URL Patterns for Google Apps Content.
For domain crawling, "Unknown" or "Crawled with empty body: Disallowed by robots.txt" crawl statuses do not indicate
errors.
Domain: The number of URLs that have been crawled in all Google Apps hosts in the domain plus other information. Hosts include docs.google.com and sites.google.com.
Host: The number of URLs that have been crawled in the selected Google Apps host plus other information. For example, this level shows information for http://sites.google.com.
Directory: The crawl status for the Google Apps directory (http://sites.google.com/a/) or subdirectories (http://sites.google.com/domain/...).
URL: Detailed information about the crawled URL and a crawl history for the URL. You can also use this page to recrawl the current URL.
2. Click sites.google.com in the Host Name column in the All Hosts table.
The host-level Crawl Diagnostics page appears.
4. Click folders in the File/Directory column to navigate down the directory structure to the URL-level page for
/a/nucleotraining/NucleoTraining.
The URL-level Crawl Diagnostics page for http://sites.google.com/a/nucleotraining/NucleoTraining appears.
5. Use the URL-level Crawl Diagnostics page to monitor the crawl history for the NucleoTraining site.
For more information about monitoring a crawl, refer to Administering Crawl. For information about using the Status and
Reports > Crawl Diagnostics page, click Help Center > Status and Reports > Crawl Diagnostics in the Admin
Console.
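The host, directory, and URL levels described in this section can be illustrated by breaking a Google Apps URL into its parts. The following is a minimal sketch that uses Python's standard urllib.parse module. The function name diagnostics_levels is hypothetical, and the domain level is omitted because deriving it requires knowing which part of the host is the registered domain.

```python
from urllib.parse import urlsplit


def diagnostics_levels(url: str) -> dict:
    """Split a URL into the host, directory, and URL levels."""
    parts = urlsplit(url)
    # Path segments without empty strings, e.g. ["a", "nucleotraining", "NucleoTraining"].
    segments = [s for s in parts.path.split("/") if s]

    # Build each directory prefix above the final segment.
    directories = []
    prefix = f"{parts.scheme}://{parts.netloc}/"
    for segment in segments[:-1]:
        prefix += segment + "/"
        directories.append(prefix)

    return {"host": parts.netloc, "directories": directories, "url": url}


levels = diagnostics_levels("http://sites.google.com/a/nucleotraining/NucleoTraining")
print(levels["host"])
for directory in levels["directories"]:
    print(directory)
print(levels["url"])
```

For the example URL, the directory levels are http://sites.google.com/a/ and http://sites.google.com/a/nucleotraining/, which correspond to the folders you click through in the File/Directory column.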
You can manage serving content from a Google Apps domain the same way you manage serving other crawled content.
For general information about how to manage serve, refer to Creating the Search Experience.
Google Docs--document
Google Docs--presentation
Google Docs--spreadsheet
Google Sites--site
refine search results. With a single search appliance, you can create and deploy an unlimited number of front ends. This
means that you can customize how the search appliance serves content from a Google Apps domain to different types of
users.
To create a front end, use the Serving > Front Ends page in the Admin Console. For complete information about the
Front Ends page, click Help Center > Serving > Front Ends in the Admin Console. For more information about using
front ends, refer to Creating the Search Experience.
In the NucleoGroupMembers front end, the search appliance serves all content in the nucleotraining.com domain. In the
NucleoStudents front end, the search appliance only serves content that is appropriate for students, including course
descriptions, course modules, course presentations, and class schedules.
To prevent the URL for the NucleoTraining site from appearing in search results in the NucleoStudents front end:
1. Click Serving > Front Ends.
The Front Ends page appears.
4. In the Remove URLs matching the following patterns from all search results box, type the URL of the
NucleoTraining site.
For information about valid Google Apps URLs, refer to Writing URL Patterns for Google Apps Content.
For more information about removing URLs, refer to Creating the Search Experience. For information about using the
Serving > Front Ends > Remove URLs page, click Help Center > Serving > Front Ends > Remove URLs in the
Admin Console.
For more information about result biasing and source biasing, refer to Creating the Search Experience.
For more information about using the Serving > Result Biasing page, click Help Center > Serving > Result Biasing.
2. Under How much influence should source biasing have?, click the highest setting.
3. In the URL Pattern boxes, type the URL pattern for sites, sites.google.com.
For information about valid Google Apps URLs, refer to Writing URL Patterns for Google Apps Content.
For more information about using the Serving > Result Biasing Edit page, click Help Center > Serving > Result
Biasing Edit.
4. In the Result Biasing Policy pull-down list, select the result biasing policy name, Site.
5. Click Save Settings.
For information about using the Serving > Filters page, click Help Center > Serving > Filters.
For more information about using filters, refer to Creating the Search Experience. For information about using the Serving
> Front Ends > Filters page, click Help Center > Serving > Front Ends > Filters in the Admin Console.
2. In the Do Not Crawl URLs with the Following Patterns box, type the URL for the site,
http://sites.google.com/a/nucleotraining.com/nucleo-courseware-alpha/Home.
For information about valid Google Apps URLs, refer to Writing URL Patterns for Google Apps Content.
For more information about removing URLs from the search index, refer to Administering Crawl. For information about
using the Crawl and Index > Crawl URLs page, click Help Center > Crawl and Index > Crawl URLs in the Admin
Console.
Narrow searches
Get relevant results more quickly and efficiently than searching the entire index
2. In the Create New Collection section, type GoogleApps in the Collection Name box.
Either leave the Use default configuration option selected or click the Import configuration from file option.
5. In the Include Content Matching the Following Patterns box, type the following Google Apps URLs, pressing
Enter after each one:
docs.google.com/
sites.google.com/
For information about valid Google Apps URLs, refer to Writing URL Patterns for Google Apps Content.
For more information about collections, refer to Creating the Search Experience. For information about using the Crawl
and Index > Collections page, click Help Center > Crawl and Index > Collections in the Admin Console.
The search appliance configuration file contains the following information about indexed Google Apps content:
The Google Apps domain name
Crawl URLs that the Google Search Appliance configures for itself
To import or export a configuration file, use the Administration > Import/Export page in the Admin Console. For information
about using this page, click Help Center > Administration > Import/Export in the Admin Console.
2. Under Export configuration, type a passphrase in the Enter Import/Export Passphrase box.
Usually, a passphrase is the same as the Admin Console password.
2. Under Import configuration, type a filename in the Filename box or click Browse to find the file on your network.
3. In the Import/Export Passphrase box, type the passphrase used for importing and exporting.
4. Click Import Configuration.
If your configuration is complex, the import process can be very slow. A configuration that contains multiple megabytes of
data, has hundreds of front ends, or creates hundreds of collections can require over an hour to import.
In the Configuration File | On the Search Appliance | When You Import the Configuration File
Indexing Google Apps Content is enabled for a specific domain, for example, domain1.com. | Indexing Google Apps Content is disabled. | The search appliance prompts you to enable Indexing Google Apps Content.
Indexing Google Apps Content is enabled for a specific domain, for example, domain1.com. | Indexing Google Apps Content is enabled for the same domain (domain1.com). | Indexing Google Apps Content continues to be enabled.
Indexing Google Apps Content is disabled. | Indexing Google Apps Content is enabled. | The search appliance disables Indexing Google Apps Content.
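The import behavior in the table above can be summarized as a small decision function. This is only a sketch of the documented cases, not the appliance's implementation. The function name import_outcome and the use of None to mean "disabled" are assumptions made for illustration, and combinations that the table does not list are reported as errors rather than guessed.

```python
from typing import Optional


def import_outcome(config_domain: Optional[str], appliance_domain: Optional[str]) -> str:
    """Return the documented result of importing a configuration file.

    Each argument is the Google Apps domain for which indexing is enabled,
    or None if Indexing Google Apps Content is disabled (an assumption of
    this sketch).
    """
    if config_domain is not None and appliance_domain is None:
        return "The search appliance prompts you to enable Indexing Google Apps Content."
    if config_domain is not None and appliance_domain == config_domain:
        return "Indexing Google Apps Content continues to be enabled."
    if config_domain is None and appliance_domain is not None:
        return "The search appliance disables Indexing Google Apps Content."
    # The table does not describe any other combination.
    raise ValueError("combination not covered by the table")


print(import_outcome("domain1.com", None))
print(import_outcome("domain1.com", "domain1.com"))
print(import_outcome(None, "domain1.com"))
```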