The Configure crawl and Add new crawl pages are essentially the same. Both contain setup options that determine the scope, frequency and behaviour of a crawl. To reach them, click the Configure button on the Crawl summary page of an existing crawl, or click the Add new crawl button on the Site overview page.
Each option has associated help text. Hover your mouse over the red question mark to display the help text.
Here is a summary of what each option does:
- Name: Each crawl can be named as required.
- URL: This is the designated starting point for the crawl.
- Alias: To limit the scope of a crawl to a specific section of a site, enter a crawl alias. For example, suppose a crawl starts at the URL http://example.com/news/listing.html and should index only pages within the news section. If each news item has a URL of the form http://example.com/news/news_item_name.html, set the alias to http://example.com/news/. The crawler will not follow any links to pages that do not match this URL pattern.
- Interval: This determines how often the crawl is performed. The interval can range from 1 minute to years. Pages that are updated frequently may need frequent crawling, whereas pages that rarely change can be crawled less often.
- Crawl level: If this is left blank, the default setting, Complete crawl, is applied. A level 1 crawl follows the links on the start page entered in the URL field above and indexes the pages one level below it. A level 2 crawl also indexes the pages one level further down, and so on.
- Active: Tick this box to activate the crawl.
- Index first page: This determines whether the content of the start page entered in the URL field above should be included in the search index, or whether the start page simply refers to content pages. Leaving the start page out of the index may be appropriate for index or directory pages (such as news front pages, A-Z pages, or Sitemap pages) that do not contain content that should be searchable, but which are useful as starting points for a crawl.
- Is this a Delete crawl?: Tick this box if this crawl should remove from the search index the pages that are linked from the designated start page. This lets you manage the content of the search index by creating a page with links to the pages that should be removed. Managing search index content this way has some advantages:
  - It reduces the need for complete site indexing, which reduces the demands on your web server resources.
  - The page that contains links to pages that should be removed can be generated using your content management system, making day-to-day management simple.
- Is this an Update crawl?: Tick this box to make this crawl an Update crawl. This type of crawl uses the same method as a Delete crawl: the URL field above points to an index page containing a list of links to pages that have been updated since the last crawl. This index page can be generated using your content management system, reducing the need for complete crawls.
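
To illustrate how the URL, Alias, Crawl level and Index first page options interact, here is a minimal Python sketch. It is not the crawler's actual implementation. It assumes that the alias is applied as a simple URL prefix match, that a blank crawl level means no depth limit, and that the site is a small hypothetical set of pages held in memory. The names LINKS and crawl are invented for this example.

```python
from collections import deque

# Hypothetical site used only for illustration: page URL -> URLs it links to.
LINKS = {
    "http://example.com/news/listing.html": [
        "http://example.com/news/item_a.html",
        "http://example.com/news/item_b.html",
        "http://example.com/about.html",
    ],
    "http://example.com/news/item_a.html": [
        "http://example.com/news/item_c.html",
    ],
    "http://example.com/news/item_b.html": [],
    "http://example.com/news/item_c.html": [],
    "http://example.com/about.html": [],
}


def crawl(start_url, alias=None, level=None, index_first_page=True):
    """Return the URLs that would be indexed, in breadth-first order.

    alias: assumed to be a URL prefix; links that do not start with it
           are not followed. None means no restriction.
    level: maximum link depth below the start page. None stands in for
           the default Complete crawl.
    index_first_page: whether the start page itself is indexed.
    """
    indexed = []
    seen = {start_url}
    queue = deque([(start_url, 0)])  # (url, depth below the start page)

    while queue:
        url, depth = queue.popleft()

        # The start page (depth 0) is indexed only if requested.
        if depth > 0 or index_first_page:
            indexed.append(url)

        # Stop following links once the crawl level is reached.
        if level is not None and depth >= level:
            continue

        for link in LINKS.get(url, []):
            if link in seen:
                continue
            # Assumed alias rule: skip links outside the alias prefix.
            if alias is not None and not link.startswith(alias):
                continue
            seen.add(link)
            queue.append((link, depth + 1))

    return indexed


if __name__ == "__main__":
    start = "http://example.com/news/listing.html"
    # Level 1 crawl of the news section, start page not indexed.
    for url in crawl(start, alias="http://example.com/news/", level=1,
                     index_first_page=False):
        print(url)
```

With these hypothetical pages, the call in the main block indexes only item_a.html and item_b.html: the listing page is used purely as a starting point, about.html falls outside the alias, and item_c.html lies two levels below the start page.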