30025 |
{0}: Connection refused |
The Web site refuses the URL access request. |
Check the network setup environment of the machine running the crawler. |
30027 |
Not allowed URL: {0} |
A URL link violates boundary rules and is discarded. |
Confirm that the URL can safely be ignored. |
30030 |
Malformed URL: {0} |
The URL is not properly formed. |
Verify the URL. |
30031 |
Excluded by ROBOTS.TXT: {0} |
The robots.txt rule from the Web site of the URL does not allow the URL to be crawled. |
Configure the crawler to ignore robots rules only if you manage the target Web site. This is done on the Home - Sources - Crawling Parameters page. |
30040 |
Ignore URL: {0} |
Redirection to this URL is not allowed by the boundary rules. |
Confirm that the URL should indeed be ignored. |
30041 |
{0}: excluded by MIME type inclusion rule, URL is {1} |
The content type of the URL is not in the MIME type inclusion list. |
Check whether the specified content type should be included. |
30054 |
Excessively long URL: {0} |
The URL string is too long, and the URL is ignored. |
N/A |
30057 |
{0}: timeout reading document |
The target Web site is too slow to send the page content. |
Increase the crawler timeout threshold from the crawler configuration page. The default is 30 seconds. |
30083 |
{0}: Duplicate document ignored |
A document with the same content has been seen before in the same crawl session. This could indicate URL looping; that is, different generated URLs that point back to the same page. |
Check whether the URL is generated correctly. If necessary, disable indexing of dynamic URLs. This is done on the Home - Sources - Crawling Parameters page. |
30126 |
Binary document reported as text document: "{0}" |
A binary file has been sent by the Web site as a text document. In most cases, the file in question is not in a binary format that contains extractable text, such as PDF. |
Correct the Web site content type setting for the URL, if possible. |
30188 |
Login form not specified for "{0}" |
The crawler cannot perform HTML form login because the name of the form is not set. Normally, the crawler sets the form name automatically. |
Identify the URL of the login page, and check whether it is a regular HTML form login page or an SSO login page. Report the problem to Oracle Support. |
30199 |
Encountered an error while responding to the following HTTP authentication request: [{0}] |
Unable to authenticate through the target URL. |
Verify whether the authentication request uses basic authentication or digest authentication. Also confirm the provided authentication credentials. |
30201 |
Missing authentication credentials |
Authentication data is not available to access the URL. |
Check the type of authentication needed and provide the credentials on the source customization page. |
30206 |
Ignoring "{0}" due to host (or redirected host) connection problem |
The crawler is unable to contact the server of the URL. |
Verify that the Web site in question is up and try to re-crawl. |
30209 |
Document size ({0}) too big, ignored: {1} |
The document size exceeds the configured limit. The default limit is 10 megabytes. |
Increase the document size limit on the Global Settings - Crawler Configuration page. |
30215 |
Excluded by crawling depth limit({0}): {1} |
A previously crawled URL is excluded because the crawling depth limit has been reduced. |
Confirm that the depth limit is correct. |
30782 |
Invalid document attribute {0} - ignored |
An attribute picked up from the document is not defined for the source, so it is ignored. |
This is most likely safe to ignore, unless you know that this particular attribute should be defined for this source. In that case, contact Oracle Support. |
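
To make three of the causes above more concrete (30030, 30031, and 30083), the following Python sketch shows how a crawler might arrive at them. It is an illustration only and not the product's implementation: the function names, the `example-crawler` user agent, the sample robots.txt content, and the use of a SHA-256 content hash for duplicate detection are all assumptions made for this example.

```python
import hashlib
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for this illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def make_robots(text):
    """Build a robots.txt parser from in-memory text (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def classify(url, body, robots, seen_hashes, agent="example-crawler"):
    """Return a message code from the table, or None if the document passes.

    Only three checks are sketched; a real crawler applies many more.
    """
    try:
        parts = urlsplit(url)
    except ValueError:
        return 30030  # Malformed URL
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return 30030  # Malformed URL

    if not robots.can_fetch(agent, url):
        return 30031  # Excluded by ROBOTS.TXT

    # Assumption: duplicates are detected by hashing the document body.
    digest = hashlib.sha256(body).hexdigest()
    if digest in seen_hashes:
        return 30083  # Duplicate document ignored
    seen_hashes.add(digest)
    return None


robots = make_robots(ROBOTS_TXT)
seen = set()  # content hashes seen in this crawl session
for url, body in [
    ("not a url", b""),
    ("http://example.com/private/report", b"secret"),
    ("http://example.com/page?id=1", b"same content"),
    ("http://example.com/page?id=2", b"same content"),
]:
    print(url, "->", classify(url, body, robots, seen))
```

The last two URLs differ only in their query string but return identical content, which is the URL looping pattern described for message 30083.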