Search Module Documentation
This Search module documentation is for version 1.0.
Overview
The Search module can be used to index the content of your site and provides views for you to customize for your own search result pages.
Displaying Search Results
After installing, copy the fuel/modules/search/views/search.php file to fuel/application/views/search.php. Doing this allows you to make any necessary modifications to the view file outside of the search module. It is also a good idea to copy fuel/modules/search/config/search.php to fuel/application/config/search.php so that you can add configuration settings without changing the module's own config file.
Alternatively, you can overwrite the view configuration value with an array syntax to point to a different module's view folder:
$config['search']['view'] = array('my_module' =>'search');
Query Type
You can specify one of several query types in the configuration, which is used when searching the index table. The value can be "like", which does a %word% query; "match", which uses the MATCH / AGAINST syntax; or "match boolean", which does a MATCH / AGAINST in boolean mode. It is recommended that you use "match boolean" or "like" if you have a small number of records.
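To make the difference between the three query types concrete, the sketch below builds the kind of WHERE clause each one implies. It is illustrative only: build_search_where() is a hypothetical helper and the "content" column name is an assumption, so this is not the module's actual query code.

```php
<?php
// Hypothetical helper, NOT part of the Search module.
// The "content" column name is an assumption made for this example.
function build_search_where($query_type, $term)
{
    // Simplistic escaping for the example; real code should use the
    // database driver's own escaping.
    $safe = addslashes($term);

    switch ($query_type) {
        case 'like':
            // %word% substring match
            return "content LIKE '%" . $safe . "%'";
        case 'match':
            // MATCH / AGAINST full-text search
            return "MATCH(content) AGAINST ('" . $safe . "')";
        case 'match boolean':
            // MATCH / AGAINST in boolean mode
            return "MATCH(content) AGAINST ('" . $safe . "' IN BOOLEAN MODE)";
        default:
            throw new InvalidArgumentException('Unknown query type: ' . $query_type);
    }
}

// The query_type key is the one listed in the configuration table.
$config['search']['query_type'] = 'like';
echo build_search_where($config['search']['query_type'], 'fuel'), "\n";
```

In your own configuration file, only the $config['search']['query_type'] line is needed; the helper is there to show what each value means.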
sitemap.xml
A search index creates a list of pages that can be easily used to generate a sitemap as well. To implement, create a route like so:
$route['sitemap.xml'] = 'search/sitemap';
Indexing
There are three options for how the site is crawled, set with the index_method configuration option:
- crawl: will scan the site for local links to index
- sitemap: will use the sitemap.xml file if it exists
- AUTO: will first check the sitemap.xml (because it's faster), then fall back to the crawl
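As a minimal sketch, the option could be set in your application-level search config like this. The choice of 'sitemap' is only an example value.

```php
// fuel/application/config/search.php
// Index from sitemap.xml only; use 'crawl' to follow local links instead.
$config['search']['index_method'] = 'sitemap';
```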
Excluding Pages
Often, there are pages you want to exclude from the search index. You can use the exclude configuration parameter and provide an array of page locations that you'd like to exclude. It accepts a syntax similar to CodeIgniter routes, where you can use regular expressions and :any and :num to specify a range of pages. The search module also honors pages specified in the robots.txt file.
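The following sketch shows what an exclude list might look like and how route-style wildcards can be matched against a URI. The paths are made-up examples, is_excluded() is a hypothetical helper, and the wildcard expansion assumes CodeIgniter 2 style routing (:any becomes .+ and :num becomes [0-9]+); the module's own matching code may differ.

```php
<?php
// Hypothetical exclude list; the paths are examples only.
$config['search']['exclude'] = array(
    'thank-you',          // a single page
    'news/archive/:any',  // everything under news/archive/
    'products/:num',      // numeric pages such as products/42
);

// Hypothetical helper, NOT the module's implementation.
// Assumption: wildcards expand as in CodeIgniter 2 routes.
function is_excluded($uri, array $patterns)
{
    foreach ($patterns as $pattern) {
        $regex = str_replace(array(':any', ':num'), array('.+', '[0-9]+'), $pattern);
        // Anchor the pattern so it must match the whole URI.
        if (preg_match('#^' . $regex . '$#', $uri)) {
            return true;
        }
    }
    return false;
}

var_dump(is_excluded('products/42', $config['search']['exclude']));   // bool(true)
var_dump(is_excluded('products/list', $config['search']['exclude'])); // bool(false)
```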
CLI
Indexing the site can also be done via the CLI like so:
>php index.php fuel/tools/search/index_site
If you run it via the CLI, you may need to change the search configuration's base_url value by creating a new configuration file at fuel/application/config/search.php with the base URL value of your site:
... $config['search']['base_url'] = 'http://localhost/';
Using Delimiters
You can specify different delimiters for the crawler to use when parsing information from the site. To do so, create your own site-specific configuration file at fuel/application/config/search.php if you haven't already and add one or more of the configuration values specified below to overwrite the defaults. The delimiters can be HTML tags or XPath expressions. The default set is listed below. The first one, delimiters, is used to grab the general indexable content for the page. The second, title_tag, is used to grab the title associated with the search index. The third, excerpt_tag, is used for the excerpt on the search results page. The fourth, language_tag, is used to determine the language of the page (for multi-language sites):
...
// search page content delimiters. used for scraping page content. Can be an HTML node or xpath syntax (e.g. //div[@id="main"])
$config['search']['delimiters'] = array(
	'<div id="main">',
	'//meta[@name="keywords"]/@content',
);

// search page title tag (e.g. "title, h1")
$config['search']['title_tag'] = array('title', 'h1');

// search page for appropriate tag to save as excerpt tag (e.g. "p", "meta[@name="description"]/@content")
$config['search']['excerpt_tag'] = array('p', '//meta[@name="description"]/@content');

// search page for appropriate language value using the meta values or html lang attribute (e.g. "p", "html[@lang]")
$config['search']['language_tag'] = array('html[@lang]/@lang');
To specify a new set of delimiters for your site, create a new configuration file at fuel/application/config/search.php if you haven't already. Its values will overwrite those found in fuel/modules/search/config/search.php.
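If you are unsure what an XPath delimiter will select, it can help to try it against a sample page first. The sketch below uses PHP's standard DOMDocument and DOMXPath classes on a made-up HTML string. It is not the crawler's implementation; first_match() is a hypothetical helper, and the expressions are written as full XPath (with a leading //), which may not be exactly how the module normalizes its shorter default values.

```php
<?php
// Hypothetical helper: returns the text of the first node an XPath
// expression selects, or null if nothing matches.
function first_match(DOMXPath $xpath, $expression)
{
    $nodes = $xpath->query($expression);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

// Made-up sample page for the example.
$html = '<html lang="en"><head><title>About Us</title>'
      . '<meta name="description" content="Who we are."></head>'
      . '<body><div id="main"><h1>About</h1><p>We build websites.</p></div></body></html>';

$dom = new DOMDocument();
// Suppress warnings that libxml may emit for imperfect HTML.
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

echo first_match($xpath, '//title'), "\n";                              // page title
echo first_match($xpath, '//meta[@name="description"]/@content'), "\n"; // excerpt candidate
echo first_match($xpath, '//html[@lang]/@lang'), "\n";                  // page language
echo first_match($xpath, '//div[@id="main"]//p'), "\n";                 // first paragraph in main content
```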
Hooks
Indexing an entire site can be a time-consuming process, especially for bigger sites. To help with this, there is a search module hook that runs after a record is created or edited in any module that has specified a preview_path. This keeps your index up to date incrementally. To incorporate it, add the following to fuel/application/config/hooks.php:
// include hooks specific to FUEL
include(SEARCH_PATH.'config/search_hooks.php');
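If you only want some modules to be indexed this way, the index_modules setting accepts an array instead of TRUE. A minimal sketch, where 'news' and 'events' are placeholder module names:

```php
// fuel/application/config/search.php
// Only index records from these modules; 'news' and 'events' are
// placeholder names, so substitute your own modules.
$config['search']['index_modules'] = array('news', 'events');
```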
TIP: The search module provides the added bonus of sniffing out bad URLs.
Search Configuration
The following configuration parameters can be found in the modules/search/config/search.php configuration file. It is recommended that you copy the config file and place it in your fuel/application/config directory, which will override the defaults and make future updates easier.
Property | Default Value | Description |
---|---|---|
base_url | '' | the base URL in which to look. By default it will assume the same value as site_url() |
indexing_enabled | TRUE | whether to enable search indexing |
user_agent | 'FUEL' | the user agent used when indexing |
query_type | 'match boolean' | value can be "like", which will do a %word% query; "match", which will use the MATCH / AGAINST syntax; or "match boolean", which will do a match against in boolean mode. Use "match boolean" or "like" if you have a small number of records. |
delimiters | array( '<div id="main">', 'meta[@name="keywords"]/@content', ) | search page content delimiters. Used for scraping page content. Can be an HTML node or xpath syntax (e.g. div[@id="main"]) |
title_tag | array('title', 'h1') | search page title tag (e.g. "title, h1") |
excerpt_tag | array('p', 'meta[@name="description"]/@content') | search page for appropriate tag to save as excerpt tag (e.g. "p", "meta[@name="description"]/@content") |
language_tag | array('html[@lang]/@lang') | search page for appropriate language value using the meta values or html lang attribute (e.g. "p", "html[@lang]") |
exclude | array() | the URI locations of pages to exclude from the index. You can also add them to the "robots.txt" file for your site |
index_method | 'crawl' | can be AUTO, "crawl" or "sitemap". "crawl" will scan the site for local links to index; "sitemap" will use the sitemap if it exists; AUTO will first check the sitemap (because it's faster), then will default to the crawl |
index_modules | TRUE | whether to automatically index modules that have a preview_path specified. Default is TRUE and will automatically do it for all modules. If an array is specified, then it will only index those in the array |
view | 'search' | the view file to use to display the results. An array can be used to point to a different module (e.g. array('my_module' => 'search')) |
min_length_search | 3 | minimum length of the search term |
depth | 0 | maximum search depth. 0 means there is no limit |
user_tmp_table | 0 | use a temp table to store crawled information before switching to live table |
pagination | array( 'per_page' => 10, 'num_links' => 2, 'prev_link' => lang('search_prev_page'), 'next_link' => lang('search_next_page'), 'first_link' => lang('search_first_link'), 'last_link' => lang('search_last_link'), ) | pagination settings |
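As a rough sketch, an application-level override file might set only the parameters that differ from the defaults above. The values here are illustrative, not recommendations, and 'MySiteIndexer' is a made-up user agent string.

```php
// fuel/application/config/search.php
// Example overrides only; adjust or remove to suit your site.
$config['search']['user_agent'] = 'MySiteIndexer'; // hypothetical user agent string
$config['search']['min_length_search'] = 2;        // allow two-character search terms
$config['search']['depth'] = 3;                    // limit search depth; 0 means no limit
```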