Mroonga blog

2013-04-29

Mroonga 3.03 has been released!

Mroonga 3.03 has been released!

Mroonga is a MySQL storage engine that supports fast fulltext search and geolocation search. It is CJK ready. It uses groonga as a storage and fulltext search engine.

How to install: Install

There are three topics for this release:

  • Official URL had been changed!
  • Supported Ubuntu 13.04 Raring Ringtail
  • Supported custom normalizer for FULLTEXT INDEX

Official URL had been changed!

The mroonga project uses mroonga.github.com as a official URL so far.

But we has moved to mroonga.org now.

If you access mroonga.github.com or mroonga.github.io, URL is just redirected to mroonga.org.

Supported Ubuntu 13.04 Raring Ringtail

Ubuntu 13.04 Raring Ringtail had been released Apr 25, 2013.

This release began to support Ubuntu 13.04 by providing deb packages.

Supported custom normalizer for FULLTEXT INDEX

This release began to support custom normalizer for FULLTEXT INDEX.

Here is the syntax:

FULLTEXT INDEX (column) COMMENT 'normalizer "NormalizerMySQLUnicodeCIExceptKanaCIKanaWithVoicedSoundMark"'

NormalizerMySQLUnicodeCIExceptKanaCIKanaWithVoicedSoundMark is newly added MySQL compatible normalizer provided by groonga-normalizer-mysql.

It enable you to distinct voiced sound mark and semi-voiced sound mark. So you can raise the accuracy about search results when you use utf8_unicode_ci.

Let's try concrete example.

CREATE TABLE test (
  id int(11) NOT NULL AUTO_INCREMENT,
  content varchar(255) NOT NULL COLLATE 'utf8_unicode_ci',
  content2 varchar(255) NOT NULL COLLATE 'utf8_unicode_ci',
  PRIMARY KEY (id),
  FULLTEXT INDEX (content),
  FULLTEXT INDEX (content2) COMMENT 'normalizer "NormalizerMySQLUnicodeCIExceptKanaCIKanaWithVoicedSoundMark"'
) ENGINE=mroonga DEFAULT CHARSET=utf8;

INSERT INTO test(content,content2) VALUES (_utf8 x'e38396e383a9e38383e382af', _utf8 x'e38396e383a9e38383e382af');
INSERT INTO test(content,content2) VALUES (_utf8 x'e381b5e38289e381a4e3818f', _utf8 x'e381b5e38289e381a4e3818f');

FULLTEXT INDEX is applied to content and content2 columns. The difference between content and content2 are custom normalizer is applied or not.

Note that 'e38396 e383a9 e38383 e382af' means 'black' in english, and 'e381b5e38289e381a4e3818f' means wobble in english.

Then search 'e38396 e383a9 e38383 e382af' and 'e381b5 e38289 e381a4 e3818f'.

SELECT * FROM test WHERE MATCH(content) AGAINST(_utf8 x'e38396e383a9e38383e382af' IN BOOLEAN MODE);

This query returns 'e38396 e383a9 e38383 e382af' and 'e381b5e38289e381a4e3818f' as the results.

SELECT * FROM test WHERE MATCH(content) AGAINST(_utf8 x'e381b5e38289e381a4e3818f' IN BOOLEAN MODE);

This query also returns 'e38396 e383a9 e38383 e382af' and 'e381b5e38289e381a4e3818f' as the results.

As you want to distinct 'e38396 e383a9 e38383 e382af' and 'e381b5 e38289 e381a4 e3818f', this is not what you want.

Then try to use custom normalizer.

SELECT * FROM test WHERE MATCH(content2) AGAINST(_utf8 x'e38396e383a9e38383e382af' IN BOOLEAN MODE);

This query returns 'e38396 e383a9 e38383 e382af' only.

SELECT * FROM test WHERE MATCH(content2) AGAINST(_utf8 x'e381b5e38289e381a4e3818f' IN BOOLEAN MODE);

This query returns 'e381b5 e38289 e381a4 e3818f' only.

This is the results what really you want.

By using custom normalizer, you can raise the accrary about the search results.

Conclusion

See Release 3.03 - 2013/04/29 about detailed changes since 3.02.

Let's search by mroonga!