Oracle® Secure Enterprise Search Administrator's Guide
10g Release 1 (10.1.8)

Part Number B32259-01

B URL Crawler Status Codes

The crawler uses a set of codes to indicate the result of crawling a URL. In addition to the standard HTTP status codes, it uses its own codes for non-HTTP situations.

Only URLs with status 200 are indexed. If a record exists in EQ$URL with a status other than 200, then the crawler encountered an error trying to fetch the document. A status below 600 maps directly to the corresponding HTTP status code.
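As a hedged illustration (not part of Oracle SES itself), the rules above can be sketched as a small helper that classifies an EQ$URL status value as indexed, an HTTP error, or a crawler-internal condition. The function name and category labels are assumptions for this sketch only:

```python
def classify_status(code: int) -> str:
    """Coarsely classify an EQ$URL status code (illustrative sketch)."""
    if code == 200:
        return "indexed"            # only status 200 documents are indexed
    if code < 600:
        return "http-error"         # maps directly to the HTTP status code
    return "crawler-internal"       # non-HTTP situation (9xx, 1xxx codes)

# Examples based on the table below:
assert classify_status(200) == "indexed"
assert classify_status(404) == "http-error"
assert classify_status(902) == "crawler-internal"
```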

The following table lists the URL status codes, the document container codes used by the crawler plug-in, and the corresponding EQG codes.

Code | Description | Document Container Code | EQG Code
200 | URL OK | STATUS_OK_FOR_INDEX | N/A
400 | Bad request | STATUS_BAD_REQUEST | 30009
401 | Authorization required | STATUS_AUTH_REQUIRED | 30007
402 | Payment required |  | 30011
403 | Access forbidden | STATUS_ACCESS_FORBIDDEN | 30010
404 | Not found | STATUS_NOTFOUND | 30008
405 | Method not allowed |  | 30012
406 | Not acceptable |  | 30013
407 | Proxy authentication required | STATUS_PROXY_REQUIRED | 30014
408 | Request timeout | STATUS_REQUEST_TIMEOUT | 30015
409 | Conflict |  | 30016
410 | Gone |  | 30017
414 | Request URI too large |  | 30066
500 | Internal server error | STATUS_SERVER_ERROR | 10018
501 | Not implemented |  | 10019
502 | Bad gateway | STATUS_BAD_GATEWAY | 10020
503 | Service unavailable | STATUS_FETCH_ERROR | 10021
504 | Gateway timeout |  | 10022
505 | HTTP version not supported |  | 10023
902 | Timeout reading document | STATUS_READ_TIMEOUT | 30057
903 | Filtering failed | STATUS_FILTER_ERROR | 30065
904 | Out of memory error | STATUS_OUT_OF_MEMORY | 30003
905 | IOException in processing URL | STATUS_IO_EXCEPTION | 30002
906 | Connection refused | STATUS_CONNECTION_REFUSED | 30025
907 | Socket bind exception |  | 30079
908 | Filter not available |  | 30081
909 | Duplicate document detected |  | 30082
910 | Duplicate document ignored | STATUS_DUPLICATE_DOC | 30083
911 | Empty document | STATUS_EMPTY_DOC | 30106
951 | URL not indexed (this can happen if robots.txt specifies that a certain document should not be indexed) | STATUS_OK_BUT_NO_INDEX | N/A
952 | URL crawled | STATUS_OK_CRAWLED | N/A
953 | Metatag redirection |  | N/A
954 | HTTP redirection |  | 30000
955 | Black list URL |  | N/A
956 | URL is not unique |  | 31017
957 | Sentry URL (URL used as a placeholder) |  | N/A
958 | Document read error | STATUS_CANNOT_READ | 30173
959 | Form login failed | STATUS_LOGIN_FAILED | 30183
960 | Document size too big, ignored | STATUS_DOC_SIZE_TOO_BIG | 30209
962 | Document was excluded based on MIME type | STATUS_DOC_MIME_TYPE_EXCLUDED | 30041
964 | Document was excluded based on boundary rules | STATUS_DOC_BOUNDARY_RULE_EXCLUDED | 30258
1001 | Data type is not TEXT/HTML |  | 30001
1002 | Broken network data stream |  | 30004
1003 | HTTP redirect location does not exist |  | 30005
1004 | Bad relative URL |  | 30006
1005 | HTTP error |  | 30024
1006 | Error parsing HTTP header |  | 30058
1007 | Invalid URL table column name |  | 30067
1009 | Binary document reported as text document |  | 30126
1010 | Invalid display URL |  | 30112
1011 | Invalid XML from OracleAS Portal | PORTAL_XMLURL_FAIL | 31011
1020-1024 | URL is not reachable. The status starts at 1020 and increases by one with each retry. After five tries (when it would reach 1025), the URL is deleted. |  | N/A
1111 | URL remained in the queue even after a successful crawl. This indicates that the crawler had a problem processing this document. You can investigate by crawling the URL in a separate source and checking for errors in the crawler log. |  | N/A
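The retry numbering described for codes 1020-1024 can be sketched as a small model. This is an illustrative assumption of how the counter behaves, not code from the product; the names RETRY_BASE, DELETE_AT, and status_after_failures are hypothetical:

```python
RETRY_BASE = 1020   # status recorded after the first failed fetch
DELETE_AT = 1025    # reaching this value means the URL is deleted

def status_after_failures(failures: int):
    """Status code after `failures` consecutive failed fetch attempts.

    Returns None once the status would reach 1025, i.e. after five
    failed tries the URL is deleted (illustrative model only).
    """
    status = RETRY_BASE + (failures - 1)
    return None if status >= DELETE_AT else status

assert status_after_failures(1) == 1020   # first failure
assert status_after_failures(5) == 1024   # fifth failure
assert status_after_failures(6) is None   # URL deleted
```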