wondering if there is any plugin for searching chinese
I am a user from China, and I need a plugin for searching Chinese text in my blog. Could anyone develop a plugin like this?
Re: wondering if there is any plugin for searching chinese
It was my impression that the Quicksearch plugin would search your blog regardless of language, as long as your database and blog use the same character encoding. Are you actually having problems with searching?
Re: wondering if there is any plugin for searching chinese
judebert wrote: It was my impression that the Quicksearch plugin would search your blog regardless of language, as long as your database and blog use the same character encoding. Are you actually having problems with searching?
Yes, I do have difficulties with searching. xu-kaidotcom is my own blog, and the Quicksearch plugin is in the right column. However, it cannot find any Chinese characters. This has been a problem for a long time. I guess nobody came here to report it, and people just modified the code instead.
/include/functions_entries.inc.php
Code: Select all
// Original code: every search term goes through MySQL FULLTEXT.
if (preg_match('@["\+\-\*~<>\(\)]+@', $term)) {
    $cond['find_part'] = "MATCH(title,body,extended) AGAINST('$term' IN BOOLEAN MODE)";
} else {
    $cond['find_part'] = "MATCH(title,body,extended) AGAINST('$term')";
}
Code: Select all
// Modified code: a term made up only of bytes 0x80-0xff (for example
// multi-byte Chinese text) skips FULLTEXT and uses a LIKE substring match.
if (preg_match("/^[\x80-\xff]+$/", $term))
{
    $cond['find_part'] = "((e.title LIKE ('%" . addslashes($term) . "%')) or (e.body LIKE ('%" . addslashes($term) . "%')) or (e.extended LIKE ('%" . addslashes($term) . "%')))";
}
else
{
    if (preg_match('@["\+\-\*~<>\(\)]+@', $term)) {
        $cond['find_part'] = "MATCH(title,body,extended) AGAINST('$term' IN BOOLEAN MODE)";
    } else {
        $cond['find_part'] = "MATCH(title,body,extended) AGAINST('$term')";
    }
}
Re: wondering if there is any plugin for searching chinese
This workaround exists, but it is not a good approach.
Modifying the code is not convenient, and the result is not exactly what I want.
I hope there is a plugin that solves this problem.
-
- Core Developer
- Posts: 30022
- Joined: Tue Sep 16, 2003 9:45 pm
- Location: Cologne, Germany
- Contact:
Re: wondering if there is any plugin for searching chinese
Hi!
Are you using the most recent MySQL version? It should have charset support, and if your database is in UTF-8 format, your Chinese characters should be included there and be no different from German umlauts?!
Regards,
Garvin
# Garvin Hicking (s9y Developer)
# Did I help you? Consider making me happy: http://wishes.garv.in/
# or use my PayPal account "paypal {at} supergarv (dot) de"
# My "other" hobby: http://flickr.garv.in/
Re: wondering if there is any plugin for searching chinese
garvinhicking wrote: Hi!
Are you using the most recent MySQL version? This one should have charset support, and if your database is in UTF-8 format, your chinese characters should be included there and be no different than german umlauts?!
Regards,
Garvin
MySQL version: 5.0.51a-community
I think it is a recent version, though not the latest one.
Re: wondering if there is any plugin for searching chinese
Hello, can anyone answer my question?
Re: wondering if there is any plugin for searching chinese
tianyi wrote: Hello, can anyone answer my question?
Hi!
Sadly not; I only know that Quicksearch should work with MATCH AGAINST. Maybe MySQL-specific forums could help here.
Sadly I do not know any Chinese or Japanese, so I have no way of testing this...
Regards,
Garvin
Re: wondering if there is any plugin for searching chinese
Perhaps the obvious question: Are your blog and your database both using UTF-8?
Re: wondering if there is any plugin for searching chinese
judebert wrote:Perhaps the obvious question: Are your blog and your database both using UTF-8?
Re: wondering if there is any plugin for searching chinese
I have been using Serendipity for 5 years, and Chinese searching has never worked. I decided to look into why it does not work.
It looks like the reason is that MySQL full-text search cannot support Chinese / Japanese character searching.
What I have found is: "For FULLTEXT searches, we need to know where words begin and end. With Western languages, this is rarely a problem because most (if not all) of these use an easy-to-identify word boundary — the space character. However, this is not usually the case with Asian writing. We could use arbitrary halfway measures, like assuming that all Han characters represent words, or (for Japanese) depending on changes from Katakana to Hiragana due to grammatical endings. However, the only sure solution requires a comprehensive word list, which means that we would have to include a dictionary in the server for each Asian language supported."
http://blogs.sun.com/soapbox/entry/full ... uages_with
I can understand now why the following code modification was made:
Code: Select all
// A term made up only of bytes 0x80-0xff (for example multi-byte Chinese
// text) skips FULLTEXT and uses a LIKE substring match instead.
if (preg_match("/^[\x80-\xff]+$/", $term))
{
    $cond['find_part'] = "((e.title LIKE ('%" . addslashes($term) . "%')) or (e.body LIKE ('%" . addslashes($term) . "%')) or (e.extended LIKE ('%" . addslashes($term) . "%')))";
}
else
{
    if (preg_match('@["\+\-\*~<>\(\)]+@', $term)) {
        $cond['find_part'] = "MATCH(title,body,extended) AGAINST('$term' IN BOOLEAN MODE)";
    } else {
        $cond['find_part'] = "MATCH(title,body,extended) AGAINST('$term')";
    }
}
Looking at the current Serendipity code, the above change cannot be made in a plugin.
Any suggestions?
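One possible direction, as a rough sketch only: pull the LIKE / MATCH decision into a single helper so it is easier to test and to move later. The function name build_find_part and the $escape parameter are hypothetical and not part of Serendipity. Note that this sketch swaps the byte-range test for a Unicode script check, so a term that merely contains a Han, Hiragana, Katakana or Hangul character also falls back to LIKE. It assumes the term arrives as UTF-8, and it does not escape the LIKE wildcards % and _.

```php
<?php
// Hypothetical helper, NOT part of Serendipity: returns the WHERE fragment
// for one search term. $escape is assumed to be the string-escaping
// function of whatever database layer is in use.
function build_find_part(string $term, callable $escape): string
{
    $safe = $escape($term);

    // FULLTEXT cannot split CJK text into words, so any term containing
    // such a character falls back to a plain substring match.
    if (preg_match('/[\p{Han}\p{Hiragana}\p{Katakana}\p{Hangul}]/u', $term) === 1) {
        $like = "'%" . $safe . "%'";
        return "(e.title LIKE $like OR e.body LIKE $like OR e.extended LIKE $like)";
    }

    // Boolean operators in the term select BOOLEAN MODE, as in the existing code.
    if (preg_match('@["+\-*~<>()]+@', $term) === 1) {
        return "MATCH(title,body,extended) AGAINST('$safe' IN BOOLEAN MODE)";
    }

    return "MATCH(title,body,extended) AGAINST('$safe')";
}
```

It could be called as build_find_part($term, 'addslashes') to mirror the patch above. A LIKE pattern with a leading % cannot use an index, so this fallback may be slow on a blog with many entries.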