Pre 5 Midterm Reviewer Nerfed

SEARCH ENGINE RESULTS PAGE (SERP): the list of links and other descriptive information about
webpages returned by a search engine in response to a search query

KEYWORDS: in the context of web search or SEO, words or phrases that describe the content on a
webpage. Search engines use keywords to match webpages with user search queries

Traffic from search engines comes from two basic sources:

1. organic search results are heavily influenced by website content and design features (e.g., the
technology used to create the site, webpage coding, etc.)
2. paid search advertisements/traffic is the result of ads that appear on SERPs

SEARCH ENGINE OPTIMIZATION (SEO): a collection of strategies and techniques designed to increase
the number of visitors to a website as a result of the website’s rank on search engine results pages.

BRANDED SEARCH QUERIES: search queries that include a brand name; for example, traffic derived from
consumers who included the word “Nike” in their search queries

SEARCH ENGINE: an application for locating webpages or other content (e.g., documents, media files)
on a computer network. Popular web-based search engines include Google, Bing and Yahoo

SPIDERS: also known as CRAWLERS, WEB BOTS OR SIMPLY “BOTS,” are small computer programs
designed to perform automated, repetitive tasks over the Internet. Used by search engines for scanning
webpages and returning information to be stored in a page repository

Different IR services for finding web content:

 CRAWLER SEARCH ENGINES rely on sophisticated computer programs called “spiders,”
“crawlers,” or “bots” that surf the Internet, locating webpages, links, and other content that are
then stored in the search engine’s page repository. The most popular commercial search
engines, Google and Bing, are based on crawler technology.
 WEB DIRECTORIES are categorized listings of webpages created and maintained by humans.
Examples of popular directories include Open Directory Project (dmoz.org), Best of the Web
(botw.org), and Looksmart.com.
 HYBRID SEARCH ENGINES combine the results of a directory created by humans and results
from a crawler search engine, with the goal of providing both accuracy and broad coverage of
the Internet. Yahoo.com, the third most popular commercial search tool, uses a hybrid
approach.
 META-SEARCH ENGINES compile results from other search engines. Dogpile.com is one example.
 SEMANTIC SEARCH ENGINES are designed to locate information based on the nature and
meaning of Web content, not simple keyword matches.

PAGE REPOSITORY: a data structure that stores and manages information from a large number of
webpages, providing a fast and efficient means for accessing and analyzing the information at a later
time

CRAWLER CONTROL MODULE: a software program that controls a number of “spiders” responsible for
scanning or crawling through information on the Web

Other components of a crawler search engine (description based on publications by Grehan (2002)
and Oak (2008)):

 The indexer module creates look-up tables by extracting words from the webpages and
recording the URL where they were found
 The collection analysis module creates utility indexes that aid in providing search results.
 The retrieval/ranking module determines the order in which pages are listed in a SERP
 The query interface is where users enter words that describe the kind of information they are
looking for
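
The look-up table built by the indexer module can be sketched as an inverted index. This is a minimal illustration, not how any commercial search engine is actually implemented; the page texts, the example.com URLs, and the function names build_index and search are all hypothetical.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Indexer sketch: map each word to the set of URLs where it was found."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Query sketch: return the URLs that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical page repository: URL -> page text
pages = {
    "http://example.com/shoes": "Running shoes for trail running",
    "http://example.com/boots": "Hiking boots for trail use",
}
index = build_index(pages)
print(search(index, "trail"))
print(search(index, "running shoes"))
```

A real retrieval/ranking module would also order the matching pages; this sketch only finds them.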

PETABYTE: a unit of measurement for digital data storage. A petabyte is equal to one million gigabytes.

ACCESS CONTROL: limiting access to certain documents or data.

Search technology impacts business in each of the following ways:

 Enterprise search—finding information within your organization
 Recommendation engines—presenting information to users without requiring them to conduct
an active search
 Search engine marketing (SEM)—getting found by consumers on the Web
 Web search—finding crucial business information online
ENTERPRISE SEARCH

 Enterprise search tools are used by employees to search for and retrieve information related to
their work in a manner that complies with the organization’s information-sharing and access
control policies

Data exist in two formats:

1. Structured data can be defined as “information with a high degree of organization, such that
inclusion in a relational database is seamless and readily searchable by simple, straightforward
search engine algorithms or other search operations”
2. Unstructured data, sometimes called messy data, refers to information that is not organized in a
systematic or predefined way. Examples of unstructured data include e-mails, articles, books,
and documents

Vendors can be broken down into the following three categories:

1. Specialized search vendors (for instance, Attivio, Endeca, Vivisimo): Software designed to target
specific user information needs
2. Integrated search vendors (for instance, Autonomy, IBM, Microsoft): Software designed to
combine search capabilities with information management tools
3. Detached search vendors (for instance, Google, ISYS): Software designed to target flexibility and
ease of use

RECOMMENDATION ENGINES

 represent an interesting twist on IR technology: they attempt to anticipate information that a
user might be interested in.

SEARCH ENGINE MARKETING (SEM):

 a collection of online marketing strategies and tactics that promote brands by increasing their
visibility in SERPs through optimization and advertising

People generally engage in three basic types of searches:

1. Informational search—using search engines to conduct research on a topic. This is the most
common type of search.
2. Navigational search—using a search engine to locate particular websites or webpages.
3. Transactional search—using a search engine to determine where to purchase a product or
service

SEM strategies and tactics produce two different but complementary outcomes:

1. Organic search listings are the result of content and website design features intended to
improve a site’s ranking on SERPs that result from specific keyword queries.
2. Paid search listings are a form of advertising and are purchased from search engine companies.
They are often referred to as pay-per-click (PPC) advertising.

SOCIAL MEDIA OPTIMIZATION refers to strategies designed to enhance a company’s standing on
various social media sites

CLICK-THROUGH RATE: the percentage of people who click on a hyperlinked area of a SERP or webpage.

Google features:

 Focused search: You can focus your search to information in different formats—webpages,
videos, images, maps, and the like—by selecting the appropriate navigation button on the SERP.
 Filetype: If you are looking specifically for information contained in a certain file format, you can
use the “filetype:[file extension]” command following your keyword query.
 Advanced search: From the advanced search page, you can narrow your search by setting a wide
range of parameters, including limiting the search to certain languages, dates, and even
reading level.
 Search tools button: Allows you to narrow your results to listings from specific locations or time
frames
 Search history: It’s possible to review your search history. It will show you not only your search
queries, but also the pages you visited following each query

Real-time search tools provide information about things as they happen:

 Google Trends—Trends (google.com/trends) will help you identify current and historical interest
in the topic by reporting the volume of search activity over time
 Google Alerts—Alerts (google.com/alerts) is an automated search tool for monitoring new Web
content, news stories, videos, and blog posts about some topic
 Twitter Search—You can leverage the crowd of over 650 million Twitter users to find
information as well as gauge sentiment on a wide range of topics and issues in real time.

SOCIAL BOOKMARKING SEARCH: social bookmarking sites (described in Chapter 7) provide a way for
users to save links they want to access at a later time.

VERTICAL SEARCH ENGINES are programmed to focus on webpages related to a particular topic and to
drill down by crawling pages that other search engines are likely to ignore.

BACKLINKS: external links that point back to a site.

KEYWORD CONVERSION RATES, or the likelihood that using a particular keyword to optimize a page will
result in conversions (i.e., when a website visitor converts to a buyer)

Ranking factors can be grouped into at least three different categories:

1. Reputation or Popularity: In simple terms, search engines attempt to provide links to good
websites—sites that contain high-quality content
o PageRank algorithm is perhaps one of the most well-known attempts to use popularity
to determine website quality
2. Relevance: In addition to popularity, search engines attempt to determine if the content on a
webpage is relevant to what the searcher is looking for
3. User Satisfaction: Like all successful businesses, search engines want their customers to be
satisfied.
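
The popularity idea behind PageRank can be sketched with a small iterative calculation: a page ranks higher when other pages, especially highly ranked ones, link to it. The sketch below is a simplified textbook-style version and does not claim to match what any search engine runs in production; the four-page link graph, the damping value of 0.85, and the function name pagerank are assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank. links maps each page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal rank everywhere
    for _ in range(iterations):
        # Every page keeps a small baseline share of rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank evenly to the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical link graph: "c" receives backlinks from "a", "b", and "d".
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph the page with the most backlinks ends up with the highest rank, and the page that nothing links to ends up with the lowest.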

INBOUND MARKETING: an approach to marketing that emphasizes SEO, content marketing and social
media strategies to attract customers. It is often viewed as an alternative to the traditional outbound
marketing strategies that companies have used historically, which are based on advertising (e.g., mass
media advertising) and personal selling.

BLACK HAT SEO TACTICS try to trick the search engine into thinking a website has high-quality content,
when in fact it does not.

Examples:

 Link spamming: generating backlinks for the primary purpose of SEO, not adding value to the
user.
 Keyword tricks: black hat SEOs embed several high-value keywords on pages with unrelated
content to drive up traffic statistics.
 Ghost text: this tactic involves adding text on a page that will affect page ranking, typically
while being hidden from human visitors (for example, by matching the background color).
 Shadow page: also called ghost pages or cloaked pages, this black hat tactic involves creating
pages that are optimized to attract lots of people, who are then typically redirected to a
different page.

There are four steps to creating a PPC advertising campaign on search engines:

1. Set an overall budget for the campaign.
2. Create ads—most search engine ads are text only.
3. Select keywords associated with the campaign.
4. Set up billing account information.

PPC advertisers use the following metrics to gauge the effectiveness of their campaigns:

 Click-through rates (CTRs)
 Keyword conversion
 Cost of customer acquisition (CoCA)
 Return on advertising spend (ROAS)
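
The four metrics can be worked through with a small calculation. The formulas below are commonly used definitions and are stated here as assumptions, since the course text may define them slightly differently; the campaign numbers and the function name ppc_metrics are made up for illustration.

```python
def ppc_metrics(impressions, clicks, conversions, ad_spend, revenue):
    """Compute common PPC metrics (assumed definitions, see comments)."""
    return {
        # CTR: share of people shown the ad who clicked on it
        "ctr": clicks / impressions,
        # Keyword conversion: share of clicks that turned into buyers
        "conversion_rate": conversions / clicks,
        # CoCA: advertising money spent per customer acquired
        "coca": ad_spend / conversions,
        # ROAS: revenue generated per unit of advertising spend
        "roas": revenue / ad_spend,
    }


# Hypothetical campaign figures
m = ppc_metrics(
    impressions=10_000, clicks=250, conversions=10, ad_spend=500.0, revenue=2_000.0
)
for name, value in m.items():
    print(name, value)
```

With these made-up figures, 2.5% of viewers clicked, 4% of clicks converted, each customer cost 50 in advertising, and each unit of ad spend returned 4 in revenue.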

Web will use:

1. Context defines the intent of the user.
2. Personalization refers to the user’s personal characteristics that impact how relevant the
content, commerce, and community are to an individual.
3. Vertical search focuses on finding information in a particular content area, such as travel,
finance, legal, and medical.

METADATA: data that describe and provide information about other data

SEMANTIC refers to the meaning of words or language

SEMANTIC WEB is one in which computers can interpret the meaning of content (data) by using
metadata.

RICH SNIPPETS: websites optimized for semantic technology with metadata produce richer, more
attractive listings on SERPs.

Evolution of the Web

 Web 1.0 (The Initial Web) A Web of Pages - Pages or documents are “hyperlinked,” making it
easier than ever before to access connected information.
 Web 2.0 (The Social Web) A Web of Applications - Applications are created that allow people to
easily create, share, and organize information.
 Web 3.0 (The Semantic Web) A Web of Data - Information within documents or pages is tagged
with metadata, allowing users to access specific information across platforms, regardless of the
original structure of the file, page, or document that contains it. It turns the Web into one giant
database

Additional languages that have been developed by the W3C:

 Resource Description Framework (RDF)
 Web Ontology Language (OWL)
 SPARQL Protocol and RDF Query Language (SPARQL)

Practical benefits that could result from semantic search technology:

1. Related searches/queries. The engine suggests alternative search queries that may produce
information related to the original query
2. Reference results. The search engine suggests reference material related to the query
3. Semantically annotated results. Returned pages contain highlighting of search terms, but also
related words or phrases that may not have appeared in the original query
4. Search on semantic/syntactic annotations. This approach would allow a user to indicate the
syntactic role the term plays.
5. Concept search. Search engines could return results with related concepts.
6. Ontology-based search. Ontologies define the relationships between data. An ontology is based
on the concept of “triples”: subject, predicate, and object
7. Semantic Web search. This approach would take advantage of content tagged with metadata as
previously described in this section.
8. Faceted search. Faceted search provides a means of refining results based on predefined
categories called facets
9. Clustered search. This is similar to a faceted search, but without the predefined categories
10. Natural language search. Natural language search tools attempt to extract words from
questions and create a semantic representation of the query
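
The “triples” idea from item 6 can be sketched as a list of (subject, predicate, object) facts that are queried by pattern. This is only an illustration in plain Python, not RDF or SPARQL syntax; the predicate names is_a and compiles_results_from and the helper name match are hypothetical.

```python
def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


# Facts taken from these notes, written as subject-predicate-object triples
triples = [
    ("Google", "is_a", "crawler search engine"),
    ("Bing", "is_a", "crawler search engine"),
    ("Dogpile", "is_a", "meta-search engine"),
    ("Dogpile", "compiles_results_from", "other search engines"),
]

# Which subjects are crawler search engines?
crawlers = [s for (s, _, _) in match(triples, predicate="is_a", obj="crawler search engine")]
print(crawlers)

# Everything known about Dogpile
print(match(triples, subject="Dogpile"))
```

Because each fact states a relationship explicitly, a query can ask for meaning (“what is a crawler search engine?”) rather than for pages that merely contain matching keywords.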

There are three widely used approaches to creating useful recommendations:

1. Content-based filtering recommends products based on the product features of items the
customer has interacted with in the past
2. Collaborative filtering makes recommendations based on a user’s similarity to other people
3. Hybrid recommendation engines develop recommendations based on some combination of these
methodologies
 Weighted hybrid: Results from different recommenders are assigned a weight and
combined numerically to determine a final set of recommendations.
 Mixed hybrid: Results from different recommenders are presented alongside each
other.
 Cascade hybrid: Recommenders are assigned a rank or priority
 Compound hybrid: This approach combines results from two recommender systems
from the same technique category (e.g., two collaborative filters), but uses different
algorithms or calculation procedures.
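
A weighted hybrid can be sketched by scoring items with two recommenders and combining the scores numerically. In the sketch below the content-based scores come from cosine similarity between product features and a user profile, while the collaborative scores are simply given as hypothetical numbers standing in for the output of a collaborative filter. The items, feature vectors, weights, and function names are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def content_scores(profile, item_features):
    """Content-based filtering: compare item features with the user's profile."""
    return {item: cosine(profile, feats) for item, feats in item_features.items()}


def weighted_hybrid(score_sets, weights):
    """Weighted hybrid: weight each recommender's scores, add them, then rank."""
    combined = {}
    for scores, weight in zip(score_sets, weights):
        for item, score in scores.items():
            combined[item] = combined.get(item, 0.0) + weight * score
    return sorted(combined, key=combined.get, reverse=True)


# Hypothetical product features: [outdoor, waterproof, summer]
item_features = {"boots": [1, 1, 0], "sandals": [0, 0, 1], "socks": [1, 0, 0]}
profile = [1, 1, 0]  # built from items the customer interacted with in the past

content = content_scores(profile, item_features)
# Hypothetical output of a collaborative filter (based on similar users)
collaborative = {"boots": 0.3, "sandals": 0.4, "socks": 0.8}

ranking = weighted_hybrid([content, collaborative], weights=[0.6, 0.4])
print(ranking)
```

Changing the weights changes the final order, which is the design choice a weighted hybrid exposes: how much to trust product features versus the behavior of similar users.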

Limitations of Recommendation Engines

 Cold start or new user—Making recommendations for a user who has not provided any
information to the system is a challenge for most systems since they require a starting point or
information about the user
 Sparsity—Collaborative systems depend on having information about a critical mass of users to
compare to the target user in order to create reliable or stable recommendations.
 Limited feature content—For content filter systems to work, there must be sufficient
information available about product features and the information must exist in a structured
format so it can be read by computers
 Overspecialization—If systems can only recommend items that are highly similar to a user
profile, then the recommendations may not be useful
