6 Oracle Secure Enterprise Search Advanced Information

This chapter contains the following topics:

Troubleshooting Sources
Tuning Crawl Performance
Tuning Search Performance
Using Backup and Recovery
Integrating with Google Desktop for Enterprise
Monitoring Oracle Secure Enterprise Search
Turning On Debug Mode
Restarting Oracle Secure Enterprise Search After Rebooting

Troubleshooting Sources

This section contains the following topics:

Tips for Using Table Sources
Tips for Using File Sources
Tips for Using Mailing List Sources
Tips for Using OracleAS Portal Sources
Tips for Using User-Defined Sources
Tips for Using Federated Sources

Tips for Using Table Sources

Oracle Secure Enterprise Search can crawl table sources in an Oracle database. To crawl non-Oracle databases, you must create a view in an Oracle database on the non-Oracle table. Then create the table source on the Oracle view. Oracle SES accesses databases using database links.

Limitations with Table Sources

Oracle SES cannot crawl tables inside the Oracle SES database.
Only one table or view can be specified for each table source. If data from more than one table or view is required, then first create a single view that encompasses all required data.
Table column mappings cannot be applied to LOB columns.
The following data types are supported for table sources: BLOB, BFILE, CLOB, CHAR, VARCHAR, VARCHAR2.

Limitations with Database Links

If the text column of the base table or view is of type BLOB or CLOB, then the table must have a ROWID column. A table or view might not have a ROWID column for various reasons, including the following:
- A view consists of a join of one or more tables.
- A view is based on a single table using a GROUP BY clause.
The best way to determine whether a table or view can be safely crawled by Oracle SES is to check for the existence of the ROWID column. To do so, run the following SQL statement against that table or view in SQL*Plus:
SELECT MIN(ROWID) FROM <table or view name>;
The base table or view cannot have text columns of type BFILE or RAW.

Tips for Using File Sources

This section contains the following topics:

Crawling File Sources with Non-ASCII
Crawling File Sources with Symbolic Links
Crawling File URLs

Crawling File Sources with Non-ASCII

For file sources in multibyte environments to be crawled and displayed successfully, the locale of the machine that starts the Oracle SES server must be the same as that of the target file system. This way, the Oracle SES crawler can "see" the multibyte files and paths.

If the locale is different in the installation environment, then Oracle SES should be restarted from the environment with the correct locale. For example, for a Korean environment, either set LC_ALL to ko_KR or set both LC_LANG and LANG to ko_KR.KSC5601. Then run searchctl restartall from either a command prompt on Windows or an xterm on UNIX.

Crawling File Sources with Symbolic Links

When crawling file sources on UNIX, the crawler resolves any symbolic link to its true directory path and enforces the boundary rule on the resolved path. For example, suppose directory /tmp/A has two children, B and C, where C is a link to /tmp2/beta. The crawl will have the following URLs:

/tmp/A
/tmp/A/B
/tmp2/beta
/tmp/A/C

If the boundary rule is /tmp/A, then /tmp2/beta will be excluded. The seed URL is treated as is.
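The following Python sketch illustrates the rule described above: each symbolic link is resolved first, and the boundary rule is then applied to the resolved path. It is not the crawler's implementation. To stay self-contained, it uses a hypothetical LINKS table in place of the real file system (on a real system, os.path.realpath would do the resolving), and it assumes that link targets are absolute paths.

```python
import posixpath

# Hypothetical link table standing in for the file system:
# maps a symbolic link to the absolute directory it points to.
LINKS = {"/tmp/A/C": "/tmp2/beta"}


def resolve(path: str, links: dict) -> str:
    """Resolve symbolic links in `path` one component at a time."""
    resolved = "/"
    for part in path.strip("/").split("/"):
        resolved = posixpath.join(resolved, part)
        seen = set()
        # Follow (possibly chained) links; `seen` guards against link cycles.
        while resolved in links and resolved not in seen:
            seen.add(resolved)
            resolved = links[resolved]
    return resolved


def within_boundary(path: str, boundary: str, links: dict) -> bool:
    """Apply a directory boundary rule such as /tmp/A to the resolved path."""
    real = resolve(path, links)
    boundary = boundary.rstrip("/")
    # Match the directory itself or anything below it, but not /tmp/AB.
    return real == boundary or real.startswith(boundary + "/")
```

With the boundary rule /tmp/A, the path /tmp/A/B stays in bounds, while /tmp/A/C resolves to /tmp2/beta and is excluded. The seed URL is treated as is, so it would not be passed through resolve.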

Crawling File URLs

If a file URL is to be used "as is", without going through Oracle SES to retrieve the file, then "file" in the URL should be uppercase "FILE". For example, FILE://localhost/... "As is" means that when a user clicks the link for the document in the search results, the browser tries to retrieve the file by using the specified file URL on the client machine. Otherwise, Oracle SES uses the file URL on the server machine and sends the document to the client machine through HTTP.

Tips for Using Mailing List Sources

The Oracle SES crawler is IMAP4 compliant. To crawl mailing list sources, you need an IMAP e-mail account. It is best to create an e-mail account that is used solely by Oracle SES to crawl mailing list messages. The crawler is configured to crawl one IMAP account for all mailing list sources. Therefore, all mailing list messages to be crawled must be in the Inbox of the specified e-mail account, and this account should be subscribed to all the mailing lists. New postings for all the mailing lists are sent to this single account and subsequently crawled.
Messages deleted from the global mailing list e-mail account are not removed from the Oracle SES index. In fact, the mailing list crawler itself deletes messages from the IMAP e-mail account as it crawls. The next time the IMAP account for mailing lists is crawled, the previous messages are no longer there. Any new messages in the account are added to the index (and consequently also deleted from the account). This keeps the global mailing list IMAP account clean, while the Oracle SES index serves as a complete archive of all the mailing list messages.

Tips for Using OracleAS Portal Sources

An OracleAS Portal source name cannot exceed 35 characters.
URL boundary rules are not enforced for URL items. A URL item is the metadata that resides on the OracleAS Portal server. Oracle SES does not touch the display URL or the boundary rules for URL items.

Tips for Using User-Defined Sources

If a plug-in is to return file URLs to the crawler, then the file URLs must be fully qualified. For example, file://localhost/.
If a file URL is to be used "as is" without going through Oracle SES for retrieving the file, then "file" in the URL should be upper case "FILE". For example, FILE://localhost/...
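A plug-in author might check returned URLs against the two rules above before handing them to the crawler. The Python sketch below is hypothetical and not part of any Oracle SES API. It assumes that "fully qualified" means the URL names a host, as in file://localhost/. Note that urllib.parse.urlsplit lowercases the scheme, so the "as is" test compares the raw prefix instead.

```python
from urllib.parse import urlsplit


def is_fully_qualified_file_url(url: str) -> bool:
    """True if `url` is a file URL that names a host, e.g. file://localhost/docs/a.htm."""
    parts = urlsplit(url)
    # urlsplit lowercases the scheme, so FILE:// and file:// both report "file".
    return parts.scheme == "file" and bool(parts.netloc) and parts.path.startswith("/")


def served_as_is(url: str) -> bool:
    """True if the browser should open the URL directly (uppercase FILE scheme)."""
    # Deliberately case-sensitive: only the literal prefix FILE:// means "as is".
    return url.startswith("FILE://")
```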

See Also:

"Crawling File URLs"

Tips for Using Federated Sources

The Oracle SES federator caches the federator configuration (that is, all federation-related parameters, including federated sources). As a result, any change in the configuration takes effect within 5 minutes.
Oracle SES supports 2-tier federated search. Federation of three or more tiers is not currently supported.
If you entered proxy settings on the Global Settings - Proxy Settings page, then make sure to add the Web Services URL for the federated source as a proxy exception.
If the federation endpoint instance is set to secure mode 3 (require login to search secure and public content), then all documents (ACL stamped or not) are secure. For secure federated search, create a trusted entity in the federation endpoint instance, then edit the federated source with the trusted entity user name and password.

Federated Search Characteristics

Federated search can improve performance by distributing query processing on multiple machines. It can be an efficient way to scale up search service by adding a cluster of Oracle SES instances.
The federated search performance depends on the network topology and throughput of the entire federated Oracle SES environment.

Federated Search Limitations

Cached documents on the federation endpoint can be displayed on the Oracle SES federation broker instance only if they are no larger than 200KB.
For infosource browse, if the source hierarchies for both local and federated sources under one source group start with the same top level folder, then a sequence number is added to the folder name belonging to the federated source to distinguish the two hierarchies on the Browse page.
For federated infosource browse, a federated source should be put under an explicitly created source group.
On the Oracle SES federation broker, there is no direct access to documents on the federation endpoint through the display URL in the search result list. Only the cached version of documents is accessible. Exception: There is direct access for Web source and OracleAS Portal source documents.

See Also:

"Setting Up Secure Federated Sources" if the federated source will be searching private content
Appendix A, "10.1.6 to 10.1.8 Upgrade"

Tuning Crawl Performance

Your Web crawling strategy can be as simple as identifying a few well-known sites that are likely to contain links to most of the other intranet sites in your organization. You could test this by crawling these sites without indexing them. After the initial crawl, you have a good idea of the hosts that exist in your intranet. You could then define separate Web sources to facilitate crawling and indexing on individual sites.

However, the process of discovering and crawling your organization's intranet, or the Internet, is generally an interactive one characterized by periodic analysis of crawling results and modification to crawling parameters. For example, if you observe that the crawler is spending days crawling one Web host, then you might want to exclude crawling at that host or limit the crawling depth.

This section contains the most common things to consider to improve crawl performance:

Register a Proxy
Check Boundary Rules
Check Dynamic Pages
Check Crawler Depth
Check Robots.txt Rule
Check Duplicate Pages
Check Redirected Pages
Check URL Looping
What to do Next

See Also:

"Monitoring the Crawling Process" for more information on crawling parameters

Register a Proxy

By default, Oracle SES is configured to crawl Web sites in the intranet. In other words, crawling internal Web sites requires no additional configuration. However, to crawl Web sites on the Internet (also referred to as external Web sites), Oracle SES needs the HTTP proxy server information. See the Global Settings - Proxy Settings page. If the proxy requires authentication, then enter the proxy authentication information on the Global Settings - Authentication page.

Check Boundary Rules

The seed URL you enter when you create a source is turned into an inclusion rule. For example, if www.example.com is the seed URL, then Oracle SES creates an inclusion rule so that only URLs containing the string www.example.com are crawled.

However, suppose that the example Web site includes URLs starting with www.exa-mple.com or ones that start with example.com (without the www). Many pages have a prefix on the site name. For example, the investor section of the site has URLs that start with investor.example.com.

Always check the inclusion rules before crawling, then check the log after crawling to see what patterns have been excluded.

In this case, you might add www.example.com, www.exa-mple.com, and investor.example.com to the inclusion rules. Or you might just add example.
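To test candidate inclusion rules against URLs collected from a crawl log, a minimal Python sketch of the substring matching described above might look like the following. It assumes simple (non-regular-expression) rules are plain, case-sensitive substring matches, and that an empty rule list places no restriction on the crawl.

```python
def is_included(url: str, inclusion_rules: list) -> bool:
    """A URL is crawled only if it contains at least one inclusion-rule string."""
    if not inclusion_rules:
        # No inclusion rules at all: nothing restricts the crawl (use with care).
        return True
    return any(rule in url for rule in inclusion_rules)


seed_rules = ["www.example.com"]
widened_rules = ["www.example.com", "www.exa-mple.com", "investor.example.com"]
```

With only the seed rule, investor.example.com is excluded. With the widened list, it is included. Under this substring assumption, a lone rule of example would not match www.exa-mple.com, because the hyphen breaks the substring, so check such cases with the Test URL feature before relying on a shortened rule.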

To crawl outside the seed site (for example, if you are crawling text.us.oracle.com, but you want to follow links outside of text.us.oracle.com to oracle.com), consider removing the inclusion rules altogether. Do so carefully: this could lead the crawler into a very large number of sites.

Notes for File Sources

For file sources, if no boundary rule is specified, then crawling is limited only by the underlying file system access privileges. Files accessible from the specified seed file URL are crawled, subject to the default crawling depth. The depth, which is 2 by default, is set on the Global Settings - Crawler Configuration page. For example, if the seed is file://localhost/home/user_a/, then the crawl picks up the files and directories under user_a that it has privileges to access. Documents in the directory /home/user_a/level1 are crawled, but documents in the /home/user_a/level1/level2 directory are at level 3 and are therefore excluded by the depth limit.
The file URL can be of UNC (universal naming convention) format. The UNC file URL has the following format: file://localhost///<LocalMachineName>/<SharedFolderName>.

For example, \\stcisfcr\docs\spec.htm should be specified as file://localhost///stcisfcr/docs/spec.htm.
On some machines, the path or file name could contain non-ASCII and multibyte characters. URLs are always represented using the ASCII character set. Non-ASCII characters are represented using the hex representation of their UTF-8 encoding. For example, a space is encoded as %20, and a multibyte character can be encoded as %E3%81%82.

For file sources, spaces can be entered in simple (not regular expression) boundary rules. Oracle SES automatically encodes these URL boundary rules. If (Home Alone) is specified, then internally it is stored as (Home%20Alone). Oracle SES does this encoding for the following:
- File source simple boundary rules
- Test URL strings
- File source seed URLs

Note:

Oracle SES does not alter the rule if it is a regular expression rule. It is the administrator's responsibility to make sure that the regular expression rule specified is against the encoded file URL. Spaces are not allowed in regular expression rules.
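The following Python sketch reproduces the UNC conversion and the percent-encoding examples from this section. It can help when writing a regular expression rule, because such a rule must match the encoded file URL. The set of characters left unencoded (/, :, and parentheses) is an assumption chosen to match the (Home Alone) example; the section does not document the exact set that Oracle SES uses.

```python
from urllib.parse import quote


def encode_file_rule(rule: str) -> str:
    """Percent-encode spaces and non-ASCII characters as UTF-8 hex, e.g. ' ' -> %20."""
    # Assumption: '/', ':' and parentheses are left as is, matching the
    # (Home Alone) -> (Home%20Alone) example above.
    return quote(rule, safe="/:()")


def unc_to_file_url(unc: str) -> str:
    r"""Convert \\machine\share\path to file://localhost///machine/share/path."""
    return "file://localhost/" + encode_file_rule(unc.replace("\\", "/"))
```

For example, encode_file_rule turns (Home Alone) into (Home%20Alone) and the multibyte character U+3042 into %E3%81%82. unc_to_file_url turns \\stcisfcr\docs\spec.htm into file://localhost///stcisfcr/docs/spec.htm.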

Check Dynamic Pages

Indexing dynamic pages can generate an excessive number of URLs. From the target Web site, manually navigate through a few pages to understand what boundary rules should be set to avoid crawling identical pages.

Check Crawler Depth

Setting the crawler depth very high (or unlimited) could lead the crawler into many sites. Without boundary rules, a depth of 20 will probably crawl the whole World Wide Web from most locations.

Check Robots.txt Rule

You can control which parts of your sites can be visited by robots. If robots exclusion is enabled (default), then the Web crawler traverses the pages based on the access policy specified in the Web server robots.txt file.

The following sample /robots.txt file specifies that no robots should visit any URL starting with /cyberworld/map/ or /tmp/ or /foo.html:

```
# robots.txt for http://www.example.com/

User-agent: *
Disallow: /cyberworld/map/
Disallow: /tmp/
Disallow: /foo.html
```

If the Web site is under the user's control, then a specific robots rule can be tailored for the crawler by specifying the Oracle SES crawler plug-in name, Oracle Secure Enterprise Search, in the User-agent line. For example:

```
User-agent: Oracle Secure Enterprise Search
Disallow: /tmp/
```
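To check how a robots.txt file like the two samples above is likely to be interpreted before a crawl, you can parse it locally. The following sketch uses Python's standard urllib.robotparser purely as a stand-in. The Oracle SES crawler has its own parser, and the sketch assumes it follows the same common convention: a crawler that has its own User-agent record ignores the * record, and a blank line ends a record.

```python
from urllib.robotparser import RobotFileParser

SES_AGENT = "Oracle Secure Enterprise Search"

# Both sample rule sets combined into one file. Records are separated by a
# blank line, so each User-agent line sits directly above its Disallow lines.
ROBOTS_TXT = """\
# robots.txt for http://www.example.com/
User-agent: Oracle Secure Enterprise Search
Disallow: /tmp/

User-agent: *
Disallow: /cyberworld/map/
Disallow: /tmp/
Disallow: /foo.html
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text locally, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def allowed(parser: RobotFileParser, agent: str, path: str) -> bool:
    """True if `agent` may fetch `path` on www.example.com."""
    return parser.can_fetch(agent, "http://www.example.com" + path)
```

Under this convention, the Oracle SES crawler is kept out of /tmp/ but may fetch /foo.html, because only its own record applies to it. Any other robot is kept out of /foo.html and /cyberworld/map/.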

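The robots meta tag can instruct the crawler not to index a Web page, not to follow the links within it, or both. For example, the following tag tells the crawler neither to index the page nor to follow its links:

```
<meta name="robots" content="noindex,nofollow">
```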

Check Duplicate Pages

If Oracle SES determines that a page is identical to one it has already seen, then it does not index it. If the page is reached through a URL that Oracle SES has already processed, then it does not index that either.

Check Redirected Pages

For redirected pages, the crawler crawls only the redirection target. For example, a Web site might have JavaScript redirecting users to another site with the same title. Only the redirected site is indexed.

Check the inclusion rules for redirects. Whether a redirected URL is checked against the inclusion rules depends on the type of redirect. There are three kinds of redirects defined in EQ$URL:

Temporary Redirect: A redirected URL is always allowed if it is a temporary redirection (HTTP status code 302 or 307). Temporary redirection indicates that the original URL should still be used in the future. Temporary redirects cannot be identified directly from the EQ$URL table; the only way to find them is to filter the other redirect types out of the log file.
Permanent Redirect: For permanent redirection (HTTP status 301), the redirected URL is subject to boundary rules. Permanent redirection means that the original URL is no longer valid and the user should start using the new (redirected) one. In EQ$URL, HTTP permanent redirect has the status code 954.
Meta Redirect: Metatag redirection is treated as a permanent redirect. Meta redirect has status code 954. This is always checked against boundary rules.
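The three cases above amount to a small decision table. The following Python sketch encodes it, for example to help when sorting redirects found in a crawler log. The function names are hypothetical. The status codes are the ones given in this section, and the sketch returns None for temporary redirects because the section states that they cannot be identified from EQ$URL.

```python
TEMPORARY_STATUSES = {302, 307}  # HTTP temporary redirects
PERMANENT_STATUS = 301           # HTTP permanent redirect
EQ_URL_REDIRECT_STATUS = 954     # EQ$URL status for permanent and meta redirects


def redirect_kind(http_status: int, meta_refresh: bool = False) -> str:
    """Classify a redirect into the three kinds described above."""
    if meta_refresh:
        return "meta"  # treated as a permanent redirect
    if http_status in TEMPORARY_STATUSES:
        return "temporary"
    if http_status == PERMANENT_STATUS:
        return "permanent"
    raise ValueError(f"not a redirect: HTTP {http_status}")


def follow_redirect(kind: str, target_in_boundary: bool) -> bool:
    """Temporary redirects are always allowed; the others must pass the boundary rules."""
    return kind == "temporary" or target_in_boundary


def eq_url_status(kind: str):
    """Status recorded in EQ$URL, or None where no distinct status is recorded."""
    return None if kind == "temporary" else EQ_URL_REDIRECT_STATUS
```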

Check URL Looping

URL looping refers to the scenario where a large number of unique URLs all point to the same document. One particularly difficult situation is where a site contains a large number of pages, and each page contains links to every other page in the site. Ordinarily this would not be a problem, because the crawler eventually analyzes all documents in the site.

However, some Web servers attach parameters to generated URLs to track information across requests. Such Web servers might generate a large number of unique URLs that all point to the same document.

For example, http://example.com/somedocument.html?p_origin_page=10 might refer to the same document as http://example.com/somedocument.html?p_origin_page=13 but the p_origin_page parameter is different for each link, because the referring pages are different. If a large number of parameters are specified and if the number of referring links is large, then a single unique document could have thousands or tens of thousands of links referring to it. This is an example of how URL looping can occur.
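One way to spot this pattern in a list of crawled URLs (for example, taken from the crawler log) is to strip the tracking parameter and count how many raw URLs collapse onto the same document. The following Python sketch is an analysis aid, not an Oracle SES feature. The TRACKING_PARAMS set is hypothetical and holds only the p_origin_page parameter from the example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: query parameters known to carry only tracking information.
TRACKING_PARAMS = {"p_origin_page"}


def canonical_url(url: str) -> str:
    """Drop tracking parameters, keeping every other query parameter in order."""
    parts = urlsplit(url)
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key not in TRACKING_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(query), parts.fragment))


def looping_candidates(urls, threshold: int) -> dict:
    """Map each canonical URL reached through >= `threshold` distinct raw URLs to that count."""
    variants = {}
    for url in set(urls):
        variants.setdefault(canonical_url(url), set()).add(url)
    return {canon: len(raw) for canon, raw in variants.items() if len(raw) >= threshold}
```

Both example URLs above reduce to http://example.com/somedocument.html, so that document would be flagged once it is reached through enough variants.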

Monitor the crawler statistics in the Oracle SES administration tool to determine which URLs and Web servers are being crawled the most. If you observe an inordinately large number of URL accesses to a particular site or URL, then you might want to do one of the following:

Exclude the Web Server: This prevents the crawler from crawling any URLs at that host. (You cannot limit the exclusion to a specific port on a host.)
Reduce the Crawling Depth: This limits the number of levels of referred links the crawler will follow. If you are observing URL looping effects on a particular host, then you should take a visual survey of the site to find out an estimate of the depth of the leaf pages at that site. Leaf pages are pages that do not have any links to other pages. As a general guideline, add three to the leaf page depth, and set the crawling depth to this value.
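The depth guideline above can be made concrete with a breadth-first search over a site's link structure. The following Python sketch assumes you have sampled the links into a hypothetical dictionary that maps each page to the pages it links to. It counts depth as link hops from the seed (seed = 0), which is an assumption about how depth is numbered.

```python
from collections import deque


def deepest_leaf_depth(links: dict, seed: str) -> int:
    """Return the depth of the deepest leaf page (a page with no links) reachable from `seed`."""
    depth = {seed: 0}
    queue = deque([seed])
    deepest = 0
    while queue:
        page = queue.popleft()
        children = links.get(page, [])
        if not children:
            deepest = max(deepest, depth[page])
        for child in children:
            if child not in depth:  # first visit in BFS gives the shortest hop count
                depth[child] = depth[page] + 1
                queue.append(child)
    return deepest


def suggested_crawl_depth(links: dict, seed: str) -> int:
    """The guideline above: leaf page depth plus three."""
    return deepest_leaf_depth(links, seed) + 3
```

For a small site where / links to /a and /b, and /a links to /a/1, the deepest leaf is /a/1 at depth 2, so the suggested crawling depth is 5.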

Restart the crawler after altering any parameters; your changes take effect only after the crawler restarts.

What to do Next

If you are still not crawling all the pages you think you should, then check which pages were crawled by doing one of the following:

Check the crawler log file. (There is a link to it on the Home - Schedules page, and the location of the full log is on the Home - Schedules - Status page.)
Create a source group (Search - Source Groups - Create New Source Group) that contains only one source. From the Search page, search that group (click the group name above the search box). Alternatively, from the Search page, click Browse Search Groups, and then click the group name to see a hierarchy, or click the number next to the group name for a list of the pages crawled.

Tuning Search Performance

This section contains suggestions on how to improve the response time and throughput performance of Oracle SES.

This section contains the most common things to consider to improve search performance:

Add Suggested Links or Suggested Content
Optimize the Index
Increase the Indexing Batch Size
Increase the Index Memory Size
Check the Search Statistics
Increase the JVM Heap Size
Increase the Oracle Undo Space

Add Suggested Links or Suggested Content

Suggested links let you direct users to a particular Web site for a given search string. For example, when users search for "Oracle Secure Enterprise Search documentation" or "Enterprise Search documentation" or "Search documentation", you could suggest http://www.oracle.com/technology. Suggested links appear at the top of the search result list. This feature is especially useful to provide links to important Web pages that are not crawled by Oracle Secure Enterprise Search. Set suggested links on the Search - Suggested Links page in the administration tool.

Suggested content lets you display real-time data content in the result list of the default query application. Oracle SES retrieves data from content providers and applies a style sheet to the data to generate an HTML fragment. The HTML fragment is displayed in the result list and is available through the Web Services API. For example, when an end user searches for contact information on a coworker, Oracle SES can fetch the content from the suggested content provider and return the contact information (e-mail address, phone number, and so on) for that person in the result list. Suggested content results appear under any suggested links and above the query results.

Configure suggested content on the Search - Suggested Content page in the administration tool. Enter the maximum number of suggested content results (up to 20) to be included in the Oracle SES result list. The results are rendered on a first-come, first-served basis.

Regular expressions (as supported in the Java regular expression API java.util.regex) are used to define query patterns for suggested content providers. The regular expression-based pattern matching is case-sensitive. For example, a provider with the pattern dir\s(\S+) is triggered on the query dir james but not on the query Dir James. To trigger on the query Dir James, the pattern could be defined either as [Dd][Ii][Rr]\s+(\S+) or as (?i)dir\s+(\S+). A provider with a blank query pattern is triggered on all queries.
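The case-sensitivity examples above can be checked quickly. The following Python sketch uses Python's re module as a stand-in for java.util.regex; the constructs used here (\s, \S, character classes, and the (?i) flag) behave the same way in both. It assumes the pattern is matched anywhere in the query (an unanchored search), which also makes a blank pattern trigger on every query. The section does not state whether Oracle SES anchors the pattern.

```python
import re


def provider_groups(pattern: str, query: str):
    """Return the match groups if `pattern` triggers on `query`, else None."""
    match = re.search(pattern, query)
    return list(match.groups()) if match else None
```

With these assumptions, dir\s(\S+) triggers on dir james and captures james, but does not trigger on Dir James. Both [Dd][Ii][Rr]\s+(\S+) and (?i)dir\s+(\S+) trigger on Dir James.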

The URL you enter for the suggested content provider can contain the following variables: $ora:q, $ora:lang, $ora:q1, ... $ora:qn and $ora:username.

$ora:q is the end user full query.
$ora:lang is the two-letter code for the browser language.
$ora:qn is the nth regular expression match group from the end user query. n starts from 1. If no nth group is matched, then the empty string replaces the variable.
$ora:username is the end user name.
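To preview the request that a provider URL template produces for a given query, the following Python sketch substitutes the variables listed above. It is an illustration, not the Oracle SES implementation. Two points are assumptions: substituted values are URL-encoded with quote_plus, and the query pattern is matched with an unanchored search.

```python
import re
from urllib.parse import quote_plus

VARIABLE = re.compile(r"\$ora:(q\d*|lang|username)")


def expand_provider_url(template, query, lang, username, pattern):
    """Fill the $ora:* variables in `template`; return None if `pattern` does not trigger."""
    match = re.search(pattern, query)
    if match is None:
        return None
    groups = match.groups()

    def value_for(var):
        name = var.group(1)
        if name == "lang":
            value = lang
        elif name == "username":
            value = username
        elif name == "q":
            value = query  # the full end user query
        else:
            n = int(name[1:])  # $ora:qn -> nth match group, counting from 1
            value = groups[n - 1] if 1 <= n <= len(groups) else None
        # An unmatched group becomes the empty string; values are URL-encoded (assumption).
        return quote_plus(value or "")

    return VARIABLE.sub(value_for, template)
```

Applied to the Google OneBox template in the example that follows (http://host.company.com/apps/directory.jsp?apiMaj=10&apiMin=1&oneboxName=app&query=$ora:q1) with the pattern dir\s(\S+) and the query dir james, this produces a URL ending in query=james.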

Enter an XSLT style sheet to define rules (for example, the size and style) for transforming XML content from a provider into an HTML fragment. This HTML fragment is displayed in the result list or returned over the Web Services API. If you do not enter an XSLT style sheet, then Oracle SES assumes that the suggested content provider returns HTML. If you do not enter an XSLT style sheet and the provider returns XML, then the result list displays the plain XML.

Note:

It is the administrator's responsibility to ensure that suggested content providers return valid and safe content. Corrupted or incomplete content returned by a suggested content provider can affect the formatting of the default query application results page.

There are three security options for how Oracle SES passes the end user's authentication information to the suggested content provider:

None: With this method (the default), no security policy is used.
Cookie: With this method, the end user first must be authenticated by the suggested content provider. A cookie is set for the user to maintain a session. Oracle SES must know which cookie the provider uses for authentication; this is specified when the suggested content provider is registered. When the user enters a query, Oracle SES takes the cookies from the user's request header and passes them to the provider. The provider must set the cookie scope to the common domain of the provider site and the Oracle SES site.

For example, suppose the provider site is http://provider.company.com and the Oracle SES site is http://ses.company.com. After the end user logs in to the provider site, the site could set the value of the security cookie loginCookie with domain scope .company.com. When the end user searches in Oracle SES, Oracle SES gets the loginCookie value from the end user's browser and forwards it to the provider site to get the suggested content (without the user logging in to the provider site again). However, if the provider site is accessed as http://provider or if the Oracle SES site is accessed as http://SES, then no domain cookie is available for sharing between the two sites, and this security mechanism does not work.

You can decide what happens when suggested content is available but the user is not logged in to the suggested content provider or the cookie for the provider is not available. For Unauthenticated User Action, if you select Ignore content, then content from that provider will not be displayed in the result list. If you select Display login message, then Oracle SES returns a message that there is content available from this provider but the user is not logged in. The message also provides a link to log in to that provider. Enter the link for the suggested content provider login in the Login URL field.
Service-to-Service: With this method, a one-way trusted relationship is established between Oracle SES and the suggested content provider. Any user already logged in to Oracle SES does not need to be authenticated by the provider again. The provider only authenticates the Oracle SES application and trusts the Oracle SES application to act as the end user. The end user identity is sent from Oracle SES to the provider site in the HTTP header ORA_S2S_PROXY_USER. The trusted entity could be a proxy user configured in the identity management system used by the provider, or it could be a name-value pair.
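The cookie method depends on both host names falling under the cookie's domain scope, as the provider.company.com and ses.company.com example shows. The following Python sketch checks that condition with a simplified version of the browser domain-matching rule from RFC 6265. It omits details such as the IP-address exclusion, and the function names are hypothetical.

```python
def domain_matches(host: str, cookie_domain: str) -> bool:
    """Simplified RFC 6265 domain match: is a cookie scoped to `cookie_domain` sent to `host`?"""
    host = host.lower()
    domain = cookie_domain.lower().lstrip(".")
    return host == domain or host.endswith("." + domain)


def can_share_cookie(provider_host: str, ses_host: str, cookie_domain: str) -> bool:
    """True if both sites receive a cookie scoped to `cookie_domain`."""
    return domain_matches(provider_host, cookie_domain) and domain_matches(ses_host, cookie_domain)
```

A cookie scoped to .company.com is shared by provider.company.com and ses.company.com. It is not shared when either site is reached by a short name such as provider or SES, which is why the mechanism fails in that case.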

Example: Configuring Google OneBox for Suggested Content

Existing OneBox providers can be configured for use as Oracle SES Suggested Content providers. For example, for a Google OneBox provider, the provider URL might be http://host.company.com/apps/directory.jsp and the trigger might be dir\s(\S+). When the user query is dir james, the provider receives the request with a query string similar to the following: apiMaj=10&apiMin=1&oneboxName=app&query=james.

With a Suggested Content provider, set the URL template as http://host.company.com/apps/directory.jsp?apiMaj=10&apiMin=1&oneboxName=app&query=$ora:q1. The provider pattern is the same: dir\s(\S+). The XSLT used for Google OneBox can be re-used with a minor change. Look for the line:

```
<xsl:template name="apps">
```

and change that line in your template to:

```
<xsl:template match="/OneBoxResults">
```

Optimize the Index

Optimizing the index reduces fragmentation, and it can significantly increase the speed of searches. Schedule index optimization on a regular basis. Also, optimize the index after the crawler has made substantial updates or if fragmentation is more than 50%. Make sure index optimization is scheduled during off-peak hours. Optimization of a very large index could take several hours.

See the fragmentation level and run index optimization on the Global Settings - Index Optimization page in the administration tool.

Increase the Indexing Batch Size

The indexing batch size sets a limit on how much crawled data accumulates in the cache directory: the data continues to accumulate until it reaches this limit, and then it is indexed. The bigger the batch size, the longer it takes to index each batch. Only indexed data can be searched; data in the cache cannot be searched.

The default indexing batch size is 250M. Increasing the size up to the index memory size (275M by default) can reduce index fragmentation. However, increasing the size more than the index memory size will not reduce fragmentation. You can change the index memory size manually.

Set the indexing batch size on the Global Settings - Crawler Configuration page in the administration tool.

Increase the Index Memory Size

A large index memory setting (even hundreds of megabytes) improves the speed of indexing and reduces the fragmentation of the final indexes. However, there will be a point where it is set so high that memory paging occurs and impacts indexing speed.

Follow these steps to increase the index memory size:

Launch SQL*Plus and connect as the eqsys user.
Run the following SQL statement to see the current indexing memory size:
```
SQL> SELECT par_value FROM ctx_parameters
2  WHERE par_name = 'DEFAULT_INDEX_MEMORY';
 
PAR_VALUE
-----------
288358400
```
This is the default value for indexing memory size. The unit is bytes. (288358400 bytes = 275M bytes)

To change the default indexing memory size to 500M (524288000 bytes), run the following procedure:

```
SQL> begin
2  ctxsys.ctx_adm.set_parameter('DEFAULT_INDEX_MEMORY','524288000');
3  end;
4  /

PL/SQL procedure successfully completed.

SQL> SELECT par_value FROM ctx_parameters
2  WHERE par_name = 'DEFAULT_INDEX_MEMORY';

PAR_VALUE
-----------
524288000
```

You can specify up to 2G for DEFAULT_INDEX_MEMORY. To allocate more than 1G, you must also change MAX_INDEX_MEMORY, because DEFAULT_INDEX_MEMORY cannot exceed MAX_INDEX_MEMORY and the default value for MAX_INDEX_MEMORY is 1G. The maximum size for MAX_INDEX_MEMORY is 2,147,483,647 bytes.
```
SQL> begin
2  ctxsys.ctx_adm.set_parameter('MAX_INDEX_MEMORY','2147483647');
3  end;
4  /
 
PL/SQL procedure successfully completed.
 
SQL> begin
2  ctxsys.ctx_adm.set_parameter('DEFAULT_INDEX_MEMORY','2147483647');
3  end;
4  /
 
PL/SQL procedure successfully completed.
```
You can change the memory size at any time. The next index synchronization uses the new memory size.

Note:

The indexing batch size determines when index synchronization runs. Even if DEFAULT_INDEX_MEMORY is large enough, Oracle SES does not use all of it if the indexing batch size is small. For example, if the indexing batch size is 10M, then index synchronization uses at most 10M of memory, even if you specify 1G for DEFAULT_INDEX_MEMORY.
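Before running the ctxsys.ctx_adm.set_parameter calls shown earlier, you can sanity-check the planned values with a short calculation. The following Python sketch converts the M values used in this section to bytes and applies the limits stated here: DEFAULT_INDEX_MEMORY may not exceed MAX_INDEX_MEMORY (1G by default), MAX_INDEX_MEMORY may not exceed 2,147,483,647 bytes, and a synchronization uses no more memory than the indexing batch size. It is a planning aid under those stated rules, not an Oracle utility.

```python
MB = 1024 * 1024
MAX_INDEX_MEMORY_CEILING = 2_147_483_647  # largest allowed MAX_INDEX_MEMORY, in bytes
DEFAULT_MAX_INDEX_MEMORY = 1024 * MB      # default MAX_INDEX_MEMORY (1G)


def mb_to_bytes(megabytes: int) -> int:
    """Convert an M value from this section to the byte value passed to set_parameter."""
    return megabytes * MB


def index_memory_problems(default_index_memory: int,
                          max_index_memory: int = DEFAULT_MAX_INDEX_MEMORY) -> list:
    """List the rules from this section that a planned setting would break."""
    problems = []
    if max_index_memory > MAX_INDEX_MEMORY_CEILING:
        problems.append("MAX_INDEX_MEMORY exceeds 2,147,483,647 bytes")
    if default_index_memory > max_index_memory:
        problems.append("DEFAULT_INDEX_MEMORY exceeds MAX_INDEX_MEMORY")
    return problems


def sync_memory_upper_bound(batch_size: int, default_index_memory: int) -> int:
    """A synchronization never uses more memory than the indexing batch size."""
    return min(batch_size, default_index_memory)
```

For example, 275M is 288358400 bytes and 500M is 524288000 bytes. Setting DEFAULT_INDEX_MEMORY to 2147483647 is flagged unless MAX_INDEX_MEMORY is raised first.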

Tip:

"Increase the Indexing Batch Size"

Check the Search Statistics

See the Home - Statistics page in the administration tool for lists of the most popular queries, failed queries, and ineffective queries. This information can lead to the following actions:

Refer users to a particular Web site for failed queries on the Search - Suggested Links page.
Fix common errors that users make in searching on the Search - Alternate Words page.
Make important documents easier to find on the Search - Relevancy Boosting page.

Relevancy Boosting

Relevancy boosting lets administrators influence the order of documents in the result list for a particular search. You might want to override the default results for the following reasons:

For a highly popular search, direct users to the best results
For a search that returns no results, direct users to some results
For a search that has no click-throughs, direct users to better results

In a search, each result is assigned a score that indicates how relevant the result is to the search; that is, how good a result it is. Sometimes there are documents that you know are highly relevant to some search. For example, your company Web site could have a home page for XML (http://example.com/XML-is-great.htm), which you want to appear high in the results of any search for "XML". You would boost the score of that home page (http://example.com/XML-is-great.htm) to 100 for an "XML" search.

There are two methods for locating URLs for relevancy boosting: locate by search or manual URL entry.

Note:

If you enter a search that is not one of the boosted queries, the document's score is still computed as usual.

Relevancy boosting, like end user searching, is case-insensitive. For example, a document with a boosted score for "Oracle" is boosted when you enter "oracle".

Increase the JVM Heap Size

If you expect heavy load on the Oracle SES server, then configure the Java virtual machine (JVM) heap size for better performance.

The heap size is defined in the $ORACLE_HOME/search/config/searchctl.conf file. By default, the following values are given:

max_heap_size = 1024 megabytes

min_heap_size = 512 megabytes

Increase the values of these parameters as appropriate. The maximum heap size should not exceed the physical memory size. Then restart the middle tier with searchctl restart.

Increase the Oracle Undo Space

Heavy query load should not coincide with heavy crawl activity, especially when there are large-scale changes on the target site. If it does (for example, when the crawl needs to be scheduled around the clock), then increase the size of the Oracle undo tablespace and raise the UNDO_RETENTION parameter, which specifies, in seconds, how long committed undo information is retained.

Using Backup and Recovery

A backup is a copy of configuration data that can be used to recover your configuration settings after a hardware failure. When a backup is performed on the Global Settings - Configuration Data Backup and Recovery page, Oracle SES copies the data to the binary metaData.bkp file. The location of that file is provided on the same page. When the backup successfully completes, you must copy this file to a different host. Back up after making configuration data changes, such as creating or editing sources.

Recovery can only be performed on a fresh installation. When the installation completes, copy the metaData.bkp file to the location provided in the administration tool. Sources need to be crawled again to see search results.

Some notes about backup and recovery:

You must stop all running schedules before doing the backup.
Secure search does not need to be re-enabled after recovery. If secure search is enabled in the backup instance, you do not need to re-register or re-activate the identity plug-in after recovery: if a plug-in was active when the instance was backed up, the same plug-in is activated in the recovered instance, using the same parameters.
If you have file or table sources residing on the same machine as the one running Oracle SES, and if you intend to use a different machine for recovery, then you must use the actual host name (not localhost) when creating the sources.
For database table sources, confirm that the remote tables exist.
For file sources, confirm that files and paths are valid after recovery.
During recovery, the mail archive directory settings for existing mailing list and e-mail sources are changed. After recovery, the location is <cache-dir>/mail, which is the default for new e-mail and mailing list sources. Any customized directory locations set before recovery are lost.

Integrating with Google Desktop for Enterprise

Oracle Secure Enterprise Search provides a plug-in (or connector) to integrate with Google Desktop for Enterprise (GDfE). You can include Google Desktop results in your Oracle SES hitlist. You can also link to Oracle SES from the GDfE interface.

See Also:

Google Desktop for Enterprise Readme at http://host:port/search/query/gdfe/gdfe_readme.html for details about how to integrate with GDfE

Monitoring Oracle Secure Enterprise Search

In a production environment, where a load balancer or other monitoring tools are used to ensure system availability, Oracle Secure Enterprise Search (SES) can also be easily monitored through the following URL: http://<host>:<port>/monitor/check.jsp. The URL should return the following message: Oracle Secure Enterprise Search instance is up.

Note:

This message is not translated to other languages, because system monitoring tools might need to byte-compare this string.

If Oracle SES is not available, then the URL returns either a connection error or the HTTP status code 503.
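If your monitoring tool can run a script, a check along the following lines polls the URL described above. This Python sketch is illustrative: host and port are placeholders for your instance. Because the section does not say whether the trailing period belongs to the message or to the sentence, the sketch matches the text without the period, and it accepts the text anywhere in the response body.

```python
import urllib.request

EXPECTED = "Oracle Secure Enterprise Search instance is up"


def is_up(status: int, body: str) -> bool:
    """Interpret a check.jsp response: HTTP 200 plus the untranslated status message."""
    return status == 200 and EXPECTED in body


def check_instance(host: str, port: int, timeout: float = 5.0) -> bool:
    """Poll http://<host>:<port>/monitor/check.jsp and report whether the instance is up."""
    url = f"http://{host}:{port}/monitor/check.jsp"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return is_up(resp.status, body)
    except OSError:
        # HTTPError (for example, 503), URLError (connection errors) and
        # timeouts are all OSError subclasses: treat them as "down".
        return False
```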

Turning On Debug Mode

Debug mode is useful for troubleshooting purposes. To turn on debug mode for Oracle SES administration tool, update the search.properties file located in the $ORACLE_HOME/search/webapp/config directory. Set debug=true and restart the Oracle SES middle tier with searchctl restart.

To turn off debug mode when you are finished troubleshooting, set debug=false and restart the middle tier with searchctl restart.

Note:

$ORACLE_HOME represents the directory where Oracle SES was installed.

Debug information can be found in the OC4J log file: $ORACLE_HOME/oc4j/j2ee/OC4J_SEARCH/log/oc4j.log.
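If you toggle debug mode from a script rather than by hand, the edit to search.properties is a single key change. The following Python sketch assumes the file uses simple key=value (or key: value) lines, as the debug=true setting suggests. It rewrites the first uncommented debug line, or appends one if none exists. Keep a copy of the original file, and restart the middle tier with searchctl restart afterward.

```python
import re


def set_property(text: str, key: str, value: str) -> str:
    """Set `key` to `value` in properties-style text, appending the line if absent."""
    # Match an uncommented line "key = old" (spaces optional, '=' or ':').
    pattern = re.compile(rf"^([ \t]*{re.escape(key)}[ \t]*[=:][ \t]*).*$", re.MULTILINE)
    if pattern.search(text):
        return pattern.sub(lambda m: m.group(1) + value, text, count=1)
    separator = "" if not text or text.endswith("\n") else "\n"
    return f"{text}{separator}{key}={value}\n"
```

For example, set_property(text, "debug", "true") turns a debug=false line into debug=true, while leaving comments and other keys untouched.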

Restarting Oracle Secure Enterprise Search After Rebooting

The tool for starting and stopping the search engine is searchctl. To restart Oracle SES (for example, after rebooting the host machine), navigate to the bin directory and run searchctl startall.

Note:

Users are prompted for a password when running searchctl commands on UNIX platforms. No password is required on Windows platforms. This is because Oracle SES installation on Windows requires a user with administrator privileges. When running commands to start or stop the search engine, no password is required as long as the user is a member of the administrator group.

See Also:

Startup / Shutdown lesson in the Oracle SES administration tutorial: http://st-curriculum.oracle.com/tutorial/SESAdminTutorial/index.htm