FUEL CMS User Guide : Version 1.5.2


Search Module Documentation

This Search module documentation is for version 1.0.

Overview

The Search module can be used to index the content of your site and provides views for you to customize for your own search result pages.

Displaying Search Results

After installing, copy the fuel/modules/search/views/search.php file to fuel/application/views/search.php. Doing this allows you to make any necessary modifications to the view file outside of the search module. It is also a good idea to copy fuel/modules/search/config/search.php to fuel/application/config/search.php so that you can add configuration settings without changing the module's own config file at fuel/modules/search/config/search.php.

Alternatively, you can overwrite the view configuration value with an array syntax to point to a different module's view folder:

$config['search']['view'] = array('my_module' => 'search');

Query Type

You can specify the query type used for searching the index table with the query_type configuration setting. The value can be "like", which will do a %word% query; "match", which will use the MATCH / AGAINST syntax; or "match boolean", which will do a MATCH / AGAINST in boolean mode. It is recommended that you use "match boolean" or "like" if you have a small number of records.
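
As an illustration, a small site might switch to the simpler "like" query. This is a minimal sketch, assuming the setting is overridden in fuel/application/config/search.php in the same way as the other settings in this guide:

```php
<?php
// fuel/application/config/search.php (site-specific overrides)

// Use a %word% LIKE query; the guide suggests this for a small number of records.
// The other documented values are 'match' and 'match boolean'.
$config['search']['query_type'] = 'like';
```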

sitemap.xml

A search index creates a list of pages that can be easily used to generate a sitemap as well. To implement, create a route like so:

$route['sitemap.xml'] = 'search/sitemap';

Indexing

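There are three different options for crawling a site, which can be configured with the index_method configuration option: "crawl" will scan the site for local links to index, "sitemap" will use the sitemap if it exists, and AUTO will first check the sitemap (because it is faster) and then fall back to the crawl.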
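
For example, a site that already publishes a sitemap could tell the indexer to rely on it. This is only a sketch, assuming the override lives in fuel/application/config/search.php:

```php
<?php
// fuel/application/config/search.php (site-specific overrides)

// Index from the sitemap instead of following local links.
// 'crawl' is listed as the default value in the configuration reference.
$config['search']['index_method'] = 'sitemap';
```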

Excluding Pages

Often there are pages you want to exclude from the search index. You can use the exclude configuration parameter and provide an array of page locations that you'd like to exclude. It accepts a syntax similar to CodeIgniter routes, where you can use regular expressions as well as :any and :num to specify a range of pages. The search module also honors pages specified in the robots.txt file.
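
A hedged sketch of what an exclude list might look like follows. The page locations are hypothetical and only illustrate the :any and :num wildcards mentioned above; adjust them to your own site's URIs:

```php
<?php
// fuel/application/config/search.php (site-specific overrides)

// All paths below are made-up examples, not part of the search module.
$config['search']['exclude'] = array(
	'thank-you',          // a single page
	'news/archive/:any',  // everything under news/archive/
	'products/:num',      // product pages addressed by a numeric ID
);
```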

CLI

Indexing the site can also be done via the CLI like so:

>php index.php fuel/tools/search/index_site

If you run it via the CLI, you may need to change the search configuration's base_url value by creating a new configuration file at fuel/application/config/search.php with the base URL value of your site:

...
$config['search']['base_url'] = 'http://localhost/';

Using Delimiters

You can specify different delimiters for the crawler to use when parsing information from the site. To do so, create your own specific configuration file at fuel/application/config/search.php if you haven't already and add one or more of the configuration values specified below to overwrite the defaults. The delimiters can be HTML tags or xpath expressions. The default set is listed below. The first one, delimiters, is used to grab the general indexable content for the page. The second one, title_tag, is used to grab the title associated with the search index. The third, excerpt_tag, is used for the excerpt shown on the search results page. The fourth, language_tag, is the delimiter used for determining the language of the page (for multi-language sites):

...
// search page content delimiters. used for scraping page content. Can be an HTML node or xpath syntax (e.g. //div[@id="main"])
$config['search']['delimiters'] = array(
	'<div id="main">', 
	'//meta[@name="keywords"]/@content',
);

// search page title tag (e.g. "title, h1")
$config['search']['title_tag'] = array('title', 'h1');

// search page for appropriate tag to save as excerpt tag (e.g. "p", "meta[@name="description"]/@content")
$config['search']['excerpt_tag'] = array('p', '//meta[@name="description"]/@content');

// search page for appropriate language value using the meta values or html lang attribute (e.g. "p", "html[@lang]")
$config['search']['language_tag'] = array('html[@lang]/@lang');

To specify a new set of delimiters for your site, create a new configuration file at fuel/application/config/search.php if you haven't already. This will overwrite any values found in the fuel/modules/search/config/search.php file.
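
As a sketch of such an override, the fragment below points the crawler at a different content container. The element IDs and class names are hypothetical and stand in for whatever markup your own templates use:

```php
<?php
// fuel/application/config/search.php (site-specific overrides)

// Hypothetical markup: the page body lives in <div id="content">,
// and extra text sits in a div with the class "article-body".
$config['search']['delimiters'] = array(
	'<div id="content">',
	'//div[@class="article-body"]',
);

// Take the excerpt from the meta description only (same xpath as the default set).
$config['search']['excerpt_tag'] = array('//meta[@name="description"]/@content');
```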

Hooks

Indexing an entire site can be a time-consuming process, especially for bigger sites. To help with this issue, there is a search module hook that will run for any module that has specified a preview_path after a module record is edited or created. This will help keep your index up to date incrementally. To incorporate it, add the following to fuel/application/config/hooks.php:

// include hooks specific to FUEL
include(SEARCH_PATH.'config/search_hooks.php');

TIP: The search module provides the added bonus of sniffing out bad URLs.

Search Configuration

The following configuration parameters can be found in the modules/search/config/search.php configuration file. It is recommended that you copy the config file and place it in your fuel/application/config directory, which will override the defaults and make future updates easier.

Property: base_url
Default Value: ''
Description: The base URL in which to look. By default it will assume the same value as site_url().

Property: indexing_enabled
Default Value: TRUE
Description: Whether to enable search indexing.

Property: user_agent
Default Value: 'FUEL'
Description: The user agent used when indexing.

Property: query_type
Default Value: 'match boolean'
Description: The value can be "like", which will do a %word% query; "match", which will use the MATCH / AGAINST syntax; or "match boolean", which will do a MATCH / AGAINST in boolean mode. Use "match boolean" or "like" if you have a small number of records.

Property: delimiters
Default Value:
array(
'<div id="main">',
'meta[@name="keywords"]/@content',
)
Description: Search page content delimiters, used for scraping page content. Can be an HTML node or xpath syntax (e.g. div[@id="main"]).

Property: title_tag
Default Value: array('title', 'h1')
Description: The search page title tag (e.g. "title, h1").

Property: excerpt_tag
Default Value: array('p', 'meta[@name="description"]/@content')
Description: The tag to search the page for and save as the excerpt (e.g. "p", "meta[@name="description"]/@content").

Property: language_tag
Default Value: array('html[@lang]/@lang')
Description: The tag to search the page for to find the language value, using the meta values or html lang attribute (e.g. "p", "html[@lang]").

Property: exclude
Default Value: array()
Description: The URI locations of pages to exclude from the index. You can also add them to the "robots.txt" file for your site.

Property: index_method
Default Value: 'crawl'
Description: Can be AUTO, "crawl" or "sitemap". "crawl" will scan the site for local links to index. "sitemap" will use the sitemap if it exists. AUTO will first check the sitemap (because it is faster), then fall back to the crawl.

Property: index_modules
Default Value: TRUE
Description: Whether to automatically index modules that have a preview_path specified. The default is TRUE, which will automatically index all such modules. If an array is specified, only the modules in the array will be indexed.

Property: view
Default Value: 'search'
Description: The view file to use to display the results. An array can be used to point to a different module (e.g. array('my_module' => 'search')).

Property: min_length_search
Default Value: 3
Description: The minimum length of the search term.

Property: depth
Default Value: 0
Description: The maximum search depth. 0 means there is no limit.

Property: user_tmp_table
Default Value: 0
Description: Whether to use a temp table to store crawled information before switching to the live table.

Property: pagination
Default Value:
array(
'per_page' => 10,
'num_links' => 2,
'prev_link' => lang('search_prev_page'),
'next_link' => lang('search_next_page'),
'first_link' => lang('search_first_link'),
'last_link' => lang('search_last_link'),
)
Description: The pagination settings for the search results.
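
To tie the parameters together, here is a minimal sketch of a site-specific override file. It assumes, as described above, that values placed in fuel/application/config/search.php take precedence over the module defaults. The user agent string and module names are hypothetical:

```php
<?php
// fuel/application/config/search.php
// Only the settings you want to change need to appear here.

// Base URL for the crawler (useful when indexing from the CLI).
$config['search']['base_url'] = 'http://localhost/';

// Identify the crawler with a custom user agent (hypothetical value).
$config['search']['user_agent'] = 'MySiteIndexer';

// Only index records from these modules (hypothetical module names).
$config['search']['index_modules'] = array('news', 'events');

// Require at least 4 characters in a search term (the default is 3).
$config['search']['min_length_search'] = 4;

// Limit the search depth to 3; 0 would mean no limit.
$config['search']['depth'] = 3;
```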

Libraries