wondering if there is any plugin for searching chinese

Creating and modifying plugins.
tianyi
Regular
Posts: 18
Joined: Thu Sep 11, 2008 11:02 am

Post by tianyi »

I am a user from China, and I need a plugin that can search Chinese text in my blog. Could anyone develop a plugin like this?
judebert
Regular
Posts: 2478
Joined: Sat Oct 15, 2005 6:57 am
Location: Orlando, FL

Re: wondering if there is any plugin for searching chinese

Post by judebert »

It was my impression that the Quicksearch plugin would search your blog regardless of language, as long as your database and blog use the same character encoding. Are you actually having problems with searching?
Judebert
---
Website | Wishlist | PayPal

Post by tianyi »

judebert wrote: It was my impression that the Quicksearch plugin would search your blog regardless of language, as long as your database and blog use the same character encoding. Are you actually having problems with searching?
Yes, I do have difficulty searching. xu-kaidotcom is my own blog, and the Quicksearch plugin is in the right-hand column. However, it cannot find any Chinese characters. This has been a problem for a long time. I guess nobody came here to report it and people simply modified the code instead:

/include/functions_entries.inc.php

Code: Select all

// Boolean-mode operators in the term select BOOLEAN MODE.
if (preg_match('@["\+\-\*~<>\(\)]+@', $term)) {
    $cond['find_part'] = "MATCH(title,body,extended) AGAINST('$term' IN BOOLEAN MODE)";
} else {
    $cond['find_part'] = "MATCH(title,body,extended) AGAINST('$term')";
}
Change it to:

Code: Select all

// Terms made up entirely of bytes 0x80-0xff (multi-byte text such as Chinese)
// skip the FULLTEXT index and use a substring match instead.
if (preg_match("/^[\x80-\xff]+$/", $term)) {
    $cond['find_part'] = "((e.title LIKE ('%" . addslashes($term) . "%')) or (e.body LIKE ('%" . addslashes($term) . "%')) or (e.extended LIKE ('%" . addslashes($term) . "%')))";
} else {
    // Everything else keeps the original FULLTEXT behaviour.
    if (preg_match('@["\+\-\*~<>\(\)]+@', $term)) {
        $cond['find_part'] = "MATCH(title,body,extended) AGAINST('$term' IN BOOLEAN MODE)";
    } else {
        $cond['find_part'] = "MATCH(title,body,extended) AGAINST('$term')";
    }
}
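
For anyone who wants to try the idea outside of Serendipity, here is a minimal sketch of the same LIKE fallback as a standalone function. The name build_find_part and the $escape parameter are made up for illustration and are not part of Serendipity; addslashes() is only a stand-in for the real database escaping routine. Note one deliberate difference from the code above: this sketch switches to LIKE when the term contains any non-ASCII byte, not only when the whole term is non-ASCII, and it also escapes the LIKE wildcards.

```php
<?php
// Hypothetical helper (not part of Serendipity): returns the WHERE fragment
// for a search term. $escape stands in for the real database escaping routine.
function build_find_part(string $term, callable $escape): string
{
    // Any byte >= 0x80 means the term contains multi-byte (e.g. Chinese) text.
    if (preg_match('/[\x80-\xff]/', $term)) {
        // Escape the LIKE wildcards first, then apply the SQL string escaping.
        $like = $escape(addcslashes($term, '%_\\'));
        $parts = [];
        foreach (['e.title', 'e.body', 'e.extended'] as $column) {
            $parts[] = "$column LIKE '%$like%'";
        }
        return '(' . implode(' OR ', $parts) . ')';
    }

    $safe = $escape($term);
    // Boolean-mode operators in the term select BOOLEAN MODE, as in the code above.
    if (preg_match('@["+\-*~<>()]+@', $term)) {
        return "MATCH(title,body,extended) AGAINST('$safe' IN BOOLEAN MODE)";
    }
    return "MATCH(title,body,extended) AGAINST('$safe')";
}

// Usage, with addslashes() as a stand-in escaper (an assumption for this demo only).
echo build_find_part("\xE4\xB8\xAD\xE6\x96\x87", 'addslashes'), "\n"; // "中文" in UTF-8
echo build_find_part('+php -perl', 'addslashes'), "\n";
```

If the term has already been escaped before it reaches this point, passing an identity function as $escape would avoid escaping it twice.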

Post by tianyi »

This method works, but it is not a good way.
Modifying the code is not convenient, and the result is not exactly what I want.
I hope there is a plugin that solves this problem.
garvinhicking
Core Developer
Posts: 30022
Joined: Tue Sep 16, 2003 9:45 pm
Location: Cologne, Germany

Re: wondering if there is any plugin for searching chinese

Post by garvinhicking »

Hi!

Are you using the most recent MySQL version? This one should have charset support, and if your database is in UTF-8 format, your Chinese characters should be included there and be no different than German umlauts?!

Regards,
Garvin
# Garvin Hicking (s9y Developer)
# Did I help you? Consider making me happy: http://wishes.garv.in/
# or use my PayPal account "paypal {at} supergarv (dot) de"
# My "other" hobby: http://flickr.garv.in/

Post by tianyi »

garvinhicking wrote: Hi!

Are you using the most recent MySQL version? This one should have charset support, and if your database is in UTF-8 format, your Chinese characters should be included there and be no different than German umlauts?!

Regards,
Garvin
MySQL version: 5.0.51a-community
I think it is a fairly new one, though not the latest. :shock:

Post by tianyi »

Hello, can anyone answer my question? :mrgreen:

Post by garvinhicking »

Hi!
tianyi wrote: Hello, anyone to answer my question? :mrgreen:
Sadly not; I only know that Quicksearch should work with MATCH AGAINST. Maybe MySQL-specific forums could help here.

Sadly I do not know any Chinese or Japanese, so I have no way of testing this...

Regards,
Garvin

Post by judebert »

Perhaps the obvious question: Are your blog and your database both using UTF-8?
Judebert

Post by tianyi »

judebert wrote: Perhaps the obvious question: Are your blog and your database both using UTF-8?
[Attachment: mysqlutf8.png]
ayamico
Posts: 1
Joined: Wed Feb 03, 2010 2:07 pm

Re: wondering if there is any plugin for searching chinese

Post by ayamico »

I have been using Serendipity for 5 years, and Chinese search has never worked. I decided to look into why it does not work.

It looks like the reason is that MySQL full-text search cannot handle Chinese or Japanese text.

What I have found is "For FULLTEXT searches, we need to know where words begin and end. With Western languages, this is rarely a problem because most (if not all) of these use an easy-to-identify word boundary — the space character. However, this is not usually the case with Asian writing. We could use arbitrary halfway measures, like assuming that all Han characters represent words, or (for Japanese) depending on changes from Katakana to Hiragana due to grammatical endings. However, the only sure solution requires a comprehensive word list, which means that we would have to include a dictionary in the server for each Asian language supported."

http://blogs.sun.com/soapbox/entry/full ... uages_with
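
The word-boundary problem described in that quote can be illustrated outside of MySQL. The following is only a rough analogy in PHP, not how MySQL's parser is actually implemented, and the function names are made up for illustration. It shows that splitting on spaces turns a Chinese phrase into a single "word", so a whole-word lookup for part of it fails while a substring lookup (what LIKE '%...%' does) succeeds.

```php
<?php
// Split on whitespace, roughly how a space-delimited tokenizer finds words.
function split_on_spaces(string $text): array
{
    return preg_split('/\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
}

// The "halfway measure" from the quote: treat every character as its own word.
function split_per_character(string $text): array
{
    return preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
}

$english = 'full text search';
$chinese = "\xE5\x85\xA8\xE6\x96\x87\xE6\x90\x9C\xE7\xB4\xA2"; // "全文搜索" in UTF-8
$needle  = "\xE6\x90\x9C\xE7\xB4\xA2";                         // "搜索" in UTF-8

echo count(split_on_spaces($english)), "\n";     // 3 words
echo count(split_on_spaces($chinese)), "\n";     // 1 "word": no spaces to split on
echo count(split_per_character($chinese)), "\n"; // 4 single-character "words"

// Whole-word lookup misses the phrase; substring lookup finds it.
var_dump(in_array($needle, split_on_spaces($chinese), true)); // bool(false)
var_dump(strpos($chinese, $needle) !== false);                // bool(true)
```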

I now understand why the following code modification was made:

Code: Select all

// Terms made up entirely of bytes 0x80-0xff (multi-byte text such as Chinese)
// skip the FULLTEXT index and use a substring match instead.
if (preg_match("/^[\x80-\xff]+$/", $term)) {
    $cond['find_part'] = "((e.title LIKE ('%" . addslashes($term) . "%')) or (e.body LIKE ('%" . addslashes($term) . "%')) or (e.extended LIKE ('%" . addslashes($term) . "%')))";
} else {
    // Everything else keeps the original FULLTEXT behaviour.
    if (preg_match('@["\+\-\*~<>\(\)]+@', $term)) {
        $cond['find_part'] = "MATCH(title,body,extended) AGAINST('$term' IN BOOLEAN MODE)";
    } else {
        $cond['find_part'] = "MATCH(title,body,extended) AGAINST('$term')";
    }
}
Looking at the current Serendipity code, the above change cannot be made in a plugin.

Any suggestions?