4.3. Storage mode

Here we explain how to use storage mode of Mroonga

4.3.2. How to get search score

Note

In version 1.0.0 or before, Mroonga used a special column named _score to get search score. From version 1.0.0, it follows MySQL’s standard way to get search score.

We often want to display more relevant results first in full text search. We use search score in such case.

We can get search score by MySQL’s standard way [1], i.e. we use MATCH…AGAINST in one of columns in SELECT or ORDER BY.

Let’s try:

INSERT INTO diaries (content) VALUES ("It's fine today. It'll be fine tomorrow as well.");
-- Query OK, 1 row affected (0.00 sec)

INSERT INTO diaries (content) VALUES ("It's fine today. But it'll rain tomorrow.");
-- Query OK, 1 row affected (0.00 sec)

SELECT *,
       MATCH (content) AGAINST ("+fine" IN BOOLEAN MODE)
  FROM diaries
 WHERE MATCH (content) AGAINST ("+fine" IN BOOLEAN MODE)
 ORDER BY
       MATCH (content) AGAINST ("+fine" IN BOOLEAN MODE) DESC;
-- +----+--------------------------------------------------+---------------------------------------------------+
-- | id | content                                          | MATCH (content) AGAINST ("+fine" IN BOOLEAN MODE) |
-- +----+--------------------------------------------------+---------------------------------------------------+
-- |  3 | It's fine today. It'll be fine tomorrow as well. |                                                 2 |
-- |  1 | It'll be fine tomorrow.                          |                                                 1 |
-- |  4 | It's fine today. But it'll rain tomorrow.        |                                                 1 |
-- +----+--------------------------------------------------+---------------------------------------------------+
-- 3 rows in set (0.00 sec)

The result having the search word fine more, i.e. id = 3 message having the higher search score, is displayed first. And you also get search score by using MATCH AGAINST in SELECT phrase.

You can use AS to change the attribute name:

SELECT *,
       MATCH (content) AGAINST ("+fine" IN BOOLEAN MODE) AS score
  FROM diaries
 WHERE MATCH (content) AGAINST ("+fine" IN BOOLEAN MODE)
 ORDER BY
       MATCH (content) AGAINST ("+fine" IN BOOLEAN MODE) DESC;
-- +----+--------------------------------------------------+-------+
-- | id | content                                          | score |
-- +----+--------------------------------------------------+-------+
-- |  3 | It's fine today. It'll be fine tomorrow as well. |     2 |
-- |  1 | It'll be fine tomorrow.                          |     1 |
-- |  4 | It's fine today. But it'll rain tomorrow.        |     1 |
-- +----+--------------------------------------------------+-------+
-- 3 rows in set (0.00 sec)

4.3.4. How to specify the normalizer

Mroonga uses normalizer corresponding to the encoding of document. It is used when tokenizing text and storing table key.

It is used NormalizerMySQLGeneralCI normalizer when the encoding is utf8_general_ci or utf8mb4_general_ci.

It is used NormalizerMySQLUnicodeCI normalizer when the encoding is utf8_unicode_ci or utf8mb4_unicode_ci.

It isn’t used normalizer when the encoding is utf8_bin.

Here is an example that uses NormalizerMySQLUnicodeCI normalizer by specifying utf8_unicode_ci:

SET NAMES utf8;
-- Query OK, 0 rows affected (0.00 sec)

CREATE TABLE diaries (
  day DATE PRIMARY KEY,
  content VARCHAR(64) NOT NULL,
  FULLTEXT INDEX (content)
) Engine=Mroonga DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
-- Query OK, 0 rows affected (0.18 sec)

INSERT INTO diaries VALUES ("2013-04-23", "ブラックコーヒーを飲んだ。");
-- Query OK, 1 row affected (0.00 sec)

SELECT *
  FROM diaries
 WHERE MATCH (content) AGAINST ("+ふらつく" IN BOOLEAN MODE);
-- +------------+-----------------------------------------+
-- | day        | content                                 |
-- +------------+-----------------------------------------+
-- | 2013-04-23 | ブラックコーヒーを飲んだ。 |
-- +------------+-----------------------------------------+
-- 1 row in set (0.00 sec)

SELECT *
  FROM diaries
 WHERE MATCH (content) AGAINST ("+ブラック" IN BOOLEAN MODE);
-- +------------+-----------------------------------------+
-- | day        | content                                 |
-- +------------+-----------------------------------------+
-- | 2013-04-23 | ブラックコーヒーを飲んだ。 |
-- +------------+-----------------------------------------+
-- 1 row in set (0.00 sec)

Mroonga has the following syntax to specify Groonga’s normalizer:

FULLTEXT INDEX (content) COMMENT 'normalizer "NormalizerAuto"'

See Groonga’s documentation about normalizer for more details.

Here is an example that uses NormalizerAuto normalizer:

SET NAMES utf8;
-- Query OK, 0 rows affected (0.00 sec)

CREATE TABLE diaries (
  day DATE PRIMARY KEY,
  content VARCHAR(64) NOT NULL,
  FULLTEXT INDEX (content) COMMENT 'normalizer "NormalizerAuto"'
) Engine=Mroonga DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
-- Query OK, 0 rows affected (0.19 sec)

INSERT INTO diaries VALUES ("2013-04-23", "ブラックコーヒーを飲んだ。");
-- Query OK, 1 row affected (0.00 sec)

SELECT *
  FROM diaries
 WHERE MATCH (content) AGAINST ("+ふらつく" IN BOOLEAN MODE);
-- Empty set (0.00 sec)

SELECT *
  FROM diaries
 WHERE MATCH (content) AGAINST ("+ブラック" IN BOOLEAN MODE);
-- +------------+-----------------------------------------+
-- | day        | content                                 |
-- +------------+-----------------------------------------+
-- | 2013-04-23 | ブラックコーヒーを飲んだ。 |
-- +------------+-----------------------------------------+
-- 1 row in set (0.00 sec)

4.3.5. How to specify the token filters

Mroonga has the following syntax to specify Groonga’s token filters.:

FULLTEXT INDEX (content) COMMENT 'token_filters "TokenFilterStem"'

Here is an example that uses TokenFilterStem token filter:

SELECT mroonga_command('register token_filters/stem');
-- +------------------------------------------------+
-- | mroonga_command('register token_filters/stem') |
-- +------------------------------------------------+
-- | true                                           |
-- +------------------------------------------------+
-- 1 row in set (0.00 sec)

CREATE TABLE memos (
  id INT NOT NULL PRIMARY KEY,
  content TEXT NOT NULL,
  FULLTEXT INDEX (content) COMMENT 'normalizer "NormalizerAuto", token_filters "TokenFilterStem"'
) Engine=Mroonga DEFAULT CHARSET=utf8;
-- Query OK, 0 rows affected (0.18 sec)

INSERT INTO memos VALUES (1, "I develop Groonga");
-- Query OK, 1 row affected (0.00 sec)

INSERT INTO memos VALUES (2, "I'm developing Groonga");
-- Query OK, 1 row affected (0.00 sec)

INSERT INTO memos VALUES (3, "I developed Groonga");
-- Query OK, 1 row affected (0.00 sec)

SELECT *
  FROM memos
 WHERE MATCH (content) AGAINST ("+develops" IN BOOLEAN MODE);
-- +----+------------------------+
-- | id | content                |
-- +----+------------------------+
-- |  1 | I develop Groonga      |
-- |  2 | I'm developing Groonga |
-- |  3 | I developed Groonga    |
-- +----+------------------------+
-- 3 rows in set (0.01 sec)

See Groonga’s documentation about token filter documentation for more details.

Here is an example that uses TokenFilterStopWord token filter:

SELECT mroonga_command("register token_filters/stop_word");
-- +-----------------------------------------------------+
-- | mroonga_command("register token_filters/stop_word") |
-- +-----------------------------------------------------+
-- | true                                                |
-- +-----------------------------------------------------+
-- 1 row in set (0.00 sec)

CREATE TABLE terms (
  term VARCHAR(64) NOT NULL PRIMARY KEY,
  is_stop_word BOOL NOT NULL
) Engine=Mroonga COMMENT='tokenizer "TokenBigram", token_filters "TokenFilterStopWord"' DEFAULT CHARSET=utf8;
-- Query OK, 0 rows affected (0.12 sec)

CREATE TABLE memos (
  id INT NOT NULL PRIMARY KEY,
  content TEXT NOT NULL,
  FULLTEXT INDEX (content) COMMENT 'table "terms"'
) Engine=Mroonga DEFAULT CHARSET=utf8;
-- Query OK, 0 rows affected (0.17 sec)

INSERT INTO terms VALUES ("and", true);
-- Query OK, 1 row affected (0.00 sec)

INSERT INTO memos VALUES (1, "Hello");
-- Query OK, 1 row affected (0.00 sec)

INSERT INTO memos VALUES (2, "Hello and Good-bye");
-- Query OK, 1 row affected (0.00 sec)

INSERT INTO memos VALUES (3, "Good-bye");
-- Query OK, 1 row affected (0.00 sec)

SELECT *
  FROM memos
 WHERE MATCH (content) AGAINST ('+"Hello and"' IN BOOLEAN MODE);
-- +----+--------------------+
-- | id | content            |
-- +----+--------------------+
-- |  1 | Hello              |
-- |  2 | Hello and Good-bye |
-- +----+--------------------+
-- 2 rows in set (0.01 sec)

It’s used that specifying the lexicon table for fulltext search.

4.3.6. How to specify Groonga’s column flags

Mroonga has the following syntax to specify Groonga’s column flags:

content TEXT COMMENT 'flags "COLUMN_SCALAR|COMPRESS_ZLIB"'

Here is an example that uses COMPRESS_ZLIB flag:

CREATE TABLE entries (
  id INT UNSIGNED PRIMARY KEY,
  content TEXT COMMENT 'flags "COLUMN_SCALAR|COMPRESS_ZLIB"'
) Engine=Mroonga DEFAULT CHARSET=utf8;
-- Query OK, 0 rows affected (0.12 sec)

See Groonga’s documentation about column flags for more details.

4.3.8. How to get the record ID

Groonga assigns a unique number to identify the record when a record is added in the table.

To make the development of applications easier, you can get this record ID by SQL in Mroonga

To get the record ID, you need to create a column named _id when you create a table:

CREATE TABLE memos (
  _id INT,
  content VARCHAR(255),
  UNIQUE KEY (_id) USING HASH
) ENGINE = Mroonga;
-- Query OK, 0 rows affected (0.04 sec)

Tye typo of _id column should be integer one (TINYINT, SMALLINT, MEDIUMINT, INT or BIGINT).

You can create an index for _id column, but it should be HASH type.

Let’s add records in the table by INSERT. Since _id column is implemented as a virtual column and its value is assigned by Groonga, you cannot specify the value when updating. So you need to exclude it from setting columns, or you need to use null as its value:

INSERT INTO memos VALUES (null, "Saury for today's dinner.");
-- Query OK, 1 row affected (0.00 sec)

INSERT INTO memos VALUES (null, "Update mroonga tomorrow.");
-- Query OK, 1 row affected (0.00 sec)

INSERT INTO memos VALUES (null, "Buy some dumpling on the way home.");
-- Query OK, 1 row affected (0.00 sec)

INSERT INTO memos VALUES (null, "Thank God It's meat day.");
-- Query OK, 1 row affected (0.00 sec)

To get the record ID, you invoke SELECT with _id column:

SELECT * FROM memos;
-- +------+------------------------------------------+
-- | _id  | content                                  |
-- +------+------------------------------------------+
-- |    1 | Saury for today's dinner.                |
-- |    2 | Update mroonga tomorrow.                 |
-- |    3 | Buy some dumpling on the way home.       |
-- |    4 | Thank God It's meat day.                 |
-- +------+------------------------------------------+
-- 4 rows in set (0.00 sec)

By using last_insert_grn_id function, you can also get the record ID that is assigned by the last INSERT:

INSERT INTO memos VALUES (null, "Just one bottle of milk in the fridge.");
-- Query OK, 1 row affected (0.00 sec)

SELECT last_insert_grn_id();
-- +----------------------+
-- | last_insert_grn_id() |
-- +----------------------+
-- |                    5 |
-- +----------------------+
-- 1 row in set (0.00 sec)

last_insert_grn_id function is included in Mroonga as a User-Defined Function (UDF), but if you have not yet register it in MySQL by CREATE FUNCTION, you need to invoke the following SQL for defining a function:

CREATE FUNCTION last_insert_grn_id RETURNS INTEGER SONAME 'ha_mroonga.so';

As you can see in the example above, you can get the record ID by _id column or last_insert_grn_id function. It will be useful to use this value in the ensuing SQL queries like UPDATE:

UPDATE memos SET content = "So much milk in the fridge." WHERE _id = last_insert_grn_id();
-- Query OK, 1 row affected (0.00 sec)
-- Rows matched: 1  Changed: 1  Warnings: 0

4.3.9. How to get snippet (Keyword in context)

Mroonga provides functionality to get keyword in context. It is implemented as mroonga_snippet() UDF.

4.3.11. How to run Groonga command

In storage mode, Mroonga stores all your data into Groonga database. You can access Groonga database by SQL with Mroonga. SQL is very powerful but it is not good for some operations such as faceted search.

Faceted search is popular recently. Many online shopping sites such as amazon.com and ebay.com support faceted search. Faceted search refines the current search by available search parameters before users refine their search. And faceted search shows refined searches. Users just select a refined search. Users benefit from faceted search:

  • Users don’t need to think about how to refine their search. Users just select a showed refined search.

  • Users don’t get boared “not match” page. Faceted search showes only refined searches that has one or more matched items.

Faceted search needs multiple GROUP BY operations against searched result set. To do faceted search by SQL, multiple SELECT requests are needed. It is not effective.

Groonga can do faceted search by only one groonga command. It is effective. Groonga has the select command that can search records with faceted search. Faceted search is called as “drilldown” in Groonga. See Groonga’s documentation about select command for more details.

Mroonga provides mroonga_command() function. You can run Groonga command in SQL by the function. But you should use only select command. Other commands that change schema or data may break consistency.

Here is the schema definition for execution examples:

CREATE TABLE diaries (
  id INT PRIMARY KEY AUTO_INCREMENT,
  content VARCHAR(255),
  date DATE,
  year YEAR,
  `year_month` VARCHAR(9),
  tag VARCHAR(32),
  FULLTEXT INDEX (content)
) ENGINE = Mroonga DEFAULT CHARSET utf8;

Here is the sample data for execution examples:

INSERT INTO diaries (content, date, year, `year_month`, tag)
       VALUES ('Groonga is an open-source fulltext search engine and column store.',
               '2013-04-08',
               '2013',
               '2013-04',
               'groonga');
INSERT INTO diaries (content, date, year, `year_month`, tag)
       VALUES ('Mroonga is an open-source storage engine for fast fulltext search with MySQL.',
               '2013-04-09',
               '2013',
               '2013-04',
               'MySQL');
INSERT INTO diaries (content, date, year, `year_month`, tag)
       VALUES ('Tritonn is a patched version of MySQL that supports better fulltext search function with Senna.',
               '2013-03-29',
               '2013',
               '2013-03',
               'MySQL');

Each record has groonga or MySQL as tag. Each record also has year and year_month. You can use tag, year and year_month as faceted search keys.

Groonga calls faceted search as drilldown. So parameter key in Groonga is --drilldown. Groonga returns search result as JSON. So mroonga_command() also returns search result as JSON. It is not SQL friendly. You need to parse search result JSON by yourself.

Here is the example of faceted search by all available faceted search keys (result JSON is pretty printed):

SELECT mroonga_command("select diaries --output_columns _id --limit 0 --drilldown tag,year,year_month") AS faceted_result;
-- +-----------------------------+
-- | faceted_result              |
-- +-----------------------------+
-- | [[[3],                      |
-- |   [["_id","UInt32"]]],      |
-- |  [[2],                      |
-- |   [["_key","ShortText"],    |
-- |    ["_nsubrecs","Int32"]],  |
-- |   ["groonga",1],            |
-- |   ["MySQL",2]],             |
-- |  [[1],                      |
-- |   [["_key","Time"],         |
-- |    ["_nsubrecs","Int32"]],  |
-- |   [1356998400.0,3]],        |
-- |  [[2],                      |
-- |   [["_key","ShortText"],    |
-- |    ["_nsubrecs","Int32"]],  |
-- |   ["2013-04",2],            |
-- |   ["2013-03",1]]]           |
-- +-----------------------------+
-- 1 row in set (0.00 sec)

The first element [[3], [["_id","UInt32"]]] is normal search result. It’s not faceted search result.

The second, third and forth elements are faceted search results:

[[2],
 [["_key","ShortText"],
  ["_nsubrecs","Int32"]],
 ["groonga",1],
 ["MySQL",2]]
[[1],
 [["_key","Time"],
  ["_nsubrecs","Int32"]],
 [1356998400.0,3]]
[[2],
 [["_key","ShortText"],
  ["_nsubrecs","Int32"]],
 ["2013-04",2],
 ["2013-03",1]]

The order of faceted search results is corresponding to the value of --drilldown. In this example, we specified tag, year and year_month as --drilldown value. So the first faceted search result is for tag, the second one is for year and the third one is for year_month.

Each faceted search result uses the following format. This is the same as normal search result:

[[${THE_NUMBER_OF_RECORDS}],
 [[${OUTPUT_COLUMN_NAME_0}, ${OUTPUT_COLUMN_TYPE_0}],
  [${OUTPUT_COLUMN_NAME_1}, ${OUTPUT_COLUMN_TYPE_1}],
  ...,
  [${OUTPUT_COLUMN_NAME_N}, ${OUTPUT_COLUMN_TYPE_N}]]
 [${OUTPUT_COLUMN_VALUE_0_FOR_RECORD_0},
  ${OUTPUT_COLUMN_VALUE_1_FOR_RECORD_0},
  ...,
  ${OUTPUT_COLUMN_VALUE_N_FOR_RECORD_0}],
 [${OUTPUT_COLUMN_VALUE_0_FOR_RECORD_1},
  ${OUTPUT_COLUMN_VALUE_1_FOR_RECORD_1},
  ...,
  ${OUTPUT_COLUMN_VALUE_N_FOR_RECORD_1}],
 ...
 [${OUTPUT_COLUMN_VALUE_0_FOR_RECORD_M},
  ${OUTPUT_COLUMN_VALUE_1_FOR_RECORD_M},
  ...,
  ${OUTPUT_COLUMN_VALUE_N_FOR_RECORD_M}]]

The _key column value in faceted search result shows faceted search key.

For example, the first faceted search result (the faceted search result for tag) shows that matched records (all records in this case) have groonga and MySQL as tag value:

[[2],
 [["_key","ShortText"],
  ["_nsubrecs","Int32"]],
 ["groonga",1],
 ["MySQL",2]]

The _nsubrecs column value in faceted search result shows the number of records that have the corresponding faceted search key.

For example, the first faceted search result (the faceted search result for tag) shows that there are 1 record that has groonga as tag value and 2 records that have MySQL as tag value:

[[2],
 [["_key","ShortText"],
  ["_nsubrecs","Int32"]],
 ["groonga",1],
 ["MySQL",2]]

See Groonga’s documentation about select command for more details.

4.3.12. How to search by regular expression

In storage mode, you can use a Groonga’s functionality from Mroonga as described above. Thus, you can search records by using a regular expression via Groonga’s functionality.

There are some conditions to use regular expression in Mroonga.

  • Create an index with TokenRegexp tokenizer which is used in COMMENT

  • Use *SS pragma and @~ in WHERE MATCH ... AGAINST clause

Here is the example of search by regular expression.

CREATE TABLE paths (
  content text,
  FULLTEXT INDEX content_index (content) COMMENT 'tokenizer "TokenRegexp", normalizer "NormalizerAuto"'
) ENGINE=Mroonga DEFAULT CHARSET=utf8mb4;

INSERT INTO paths VALUES ('/usr/bin/groonga');
INSERT INTO paths VALUES ('/var/log/auth.log');
INSERT INTO paths VALUES ('/var/log/messages');
INSERT INTO paths VALUES ('/tmp/local/var/log/auth.log');

SELECT * FROM paths WHERE MATCH(content) AGAINST ('*SS content @~ "\\\\A/var/log/auth"' IN BOOLEAN MODE);
-- +-------------------+
-- | content           |
-- +-------------------+
-- | /var/log/auth.log |
-- +-------------------+
-- 1 row in set (0.024 sec)

By using *SS pragma, you can search the records which matches /var/log/auth.log with content @~ "\\\\A/var/log/auth". @~ is a Groonga’s operator which executes a regular expression search, and "\\\\A/var/log/auth" executes prefix search, so it matches to only /var/log/auth.log. /tmp/local/var/log/auth.log doesn’t match because it doesn’t begin with “/var/log/auth”.

See Groonga’s regular expression document for more syntax details.

4.3.13. Logging

Mroonga outputs the logs by default.

Log files are located in MySQL’s data directory with the filename groonga.log.

Here is the example of the log.

2010-10-07 17:32:39.209379|n|b1858f80|mroonga 1.10 started.
2010-10-07 17:32:44.934048|d|46953940|hash get not found (key=test)
2010-10-07 17:32:44.936113|d|46953940|hash put (key=test)

The default log level is NOTICE, i.e. we have important information only and we don’t have debug information etc.).

You can get the log level by mroonga_log_level system variable, that is a global variable. You can also modify it dynamically by using SET phrase:

SHOW VARIABLES LIKE 'mroonga_log_level';
-- +-------------------+--------+
-- | Variable_name     | Value  |
-- +-------------------+--------+
-- | mroonga_log_level | NOTICE |
-- +-------------------+--------+
-- 1 row in set (0.00 sec)

SET GLOBAL mroonga_log_level=DUMP;
-- Query OK, 0 rows affected (0.00 sec)

SHOW VARIABLES LIKE 'mroonga_log_level';
-- +-------------------+-------+
-- | Variable_name     | Value |
-- +-------------------+-------+
-- | mroonga_log_level | DUMP  |
-- +-------------------+-------+
-- 1 row in set (0.00 sec)

Available log levels are the followings.

  • NONE

  • EMERG

  • ALERT

  • CRIT

  • ERROR

  • WARNING

  • NOTICE

  • INFO

  • DEBUG

  • DUMP

See mroonga_log_level about details.

You can reopen the log file by FLUSH LOGS. If you want to rotate the log file without stopping MySQL server, you can do in the following procedure.

  1. change the file name of groonga.log (by using OS’s mv command etc.).

  2. invoke “FLUSH LOGS” in MySQL server (by mysql command or mysqladmin command).

4.3.14. Next step

Now, you can use Mroonga as storage mode! If you want Mroonga to be faster, see also Optimizations.

Footnotes