AnimeSuki Forums

Old 2009-11-26, 07:52   Link #21
npcomplete
Senior Member
 
 
Join Date: Dec 2008
Much better!! It was also very quick for me with no difference in search time. I have a few questions:

are all non-alphanumeric chars discarded? For example "ga-rei:zero" -> I was expecting this to only match threads like http://forums.animesuki.com/showthread.php?t=75903 in torrent submissions since I think that's the only place with that literal string containing the colon.

As far as the way you've configured it, would it be possible to leave that to the user by using special syntax? Like the above, if we want a more exact or literal match, can Sphinx be configured to do that by quoting? or using \char ?

And is it possible to leave it up to the user to use wildcards or not? The substring matches, like you've configured it now, can be immensely useful when you need them (I can't imagine searching torrents on TT without them) but can be a pain when you have to sift through a lot of unrelated / unintended results.

Are there boolean operators? And parentheses?

Finally, this may deal with vB I think: would it be possible to have an option to show the matching posts? useful for long threads of course. Normally I would work around this by finding which thread from the general search then search in the thread again.
npcomplete is offline   Reply With Quote
Old 2009-11-26, 10:47   Link #22
GHDpro
Administrator
*Administrator
 
 
Join Date: Jan 2001
Location: Netherlands
Age: 35
Quote:
Originally Posted by npcomplete View Post
Much better!! It was also very quick for me with no difference in search time. I have a few questions:
Yes, somehow Sphinx is able to perform pretty well even though the server is using 500MB swap (on Linux servers, any significant amount of swap space used usually kills performance).

Even when running on an underpowered server, Sphinx seems way faster than vBulletin's internal search for common keywords like "Gundam" (when searching the contents of posts; thread title search is fast).

Quote:
Originally Posted by npcomplete View Post
are all non-alphanumeric chars discarded? For example "ga-rei:zero" -> I was expecting this to only match threads like http://forums.animesuki.com/showthread.php?t=75903 in torrent submissions since I think that's the only place with that literal string containing the colon.
By default most non-alphanumeric characters are discarded. I've added the colon (":") to the list of characters that may be considered part of a word. This means the search you tried will probably work once the whole search index is rebuilt, which will take a few hours.

Quote:
Originally Posted by npcomplete View Post
As far as the way you've configured it, would it be possible to leave that to the user by using special syntax? Like the above, if we want a more exact or literal match, can Sphinx be configured to do that by quoting? or using \char ?
As far as I know this is not possible. I need to specify which characters are considered part of words and which are not (and discarded) in the configuration file manually. I'm not sure if it's a good idea to put every single possible character in that list, as it would cause the index to grow significantly.
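
To illustrate the idea, here is a rough Python sketch. It is not Sphinx's actual tokenizer; `tokenize` and `BASE_CHARS` are hypothetical names made up for this example. Characters on the word-character list stay inside a keyword, and everything else acts as a separator and is discarded.

```python
import string

# Hypothetical stand-in for a word-character list; not Sphinx's real configuration.
BASE_CHARS = set(string.ascii_lowercase + string.digits)

def tokenize(text, extra_chars=""):
    """Split text into keywords; any character outside the allowed set is a separator."""
    allowed = BASE_CHARS | set(extra_chars)
    tokens, current = [], []
    for ch in text.lower():
        if ch in allowed:
            current.append(ch)
        elif current:
            # A discarded character ends the current keyword.
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens

print(tokenize("Ga-Rei:Zero"))                    # ['ga', 'rei', 'zero']
print(tokenize("Ga-Rei:Zero", extra_chars="-:"))  # ['ga-rei:zero']
```

With the hyphen and colon on the list, "ga-rei:zero" is indexed as a single keyword, which is why the index has to be rebuilt after the list changes.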

Quote:
Originally Posted by npcomplete View Post
And is it possible to leave it up to the user to use wildcards or not? The substring matches, like you've configured it now, can be immensely useful when you need them (I can't imagine searching torrents on TT without them) but can be a pain when you have to sift through a lot of unrelated / unintended results.
I'm not sure. The extended search syntax (not yet enabled) does support an "exact form modifier" ("="), but that requires an extra option to be enabled that also will need a search index rebuild. I'll test again when it's done.

Quote:
Originally Posted by npcomplete View Post
Are there boolean operators? And parentheses?
Yes, using boolean or extended search syntax. The search is currently configured to use neither.

The reason why I tried to avoid these search options was because I initially experimented with Sphinx on the AnimeSuki v3 Beta site at a time when I had not yet added hyphens to the list of characters that should be considered part of words. As a result, when I tried searching for "ga-rei zero" it was interpreted as "ga AND NOT rei AND zero" and yielded completely different results from what I was expecting.
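
A toy Python sketch of that misreading, assuming a simplified boolean mode in which a "-" that is not a word character negates the keyword after it. The `interpret` helper is hypothetical and is not how Sphinx parses queries internally.

```python
import re

def interpret(query, word_chars="a-z0-9"):
    """Render a query as AND / AND NOT terms; a '-' outside word_chars negates the next keyword."""
    # Each match is an optional '-' operator followed by one keyword.
    pattern = re.compile(r"(-?)([" + word_chars + r"]+)")
    parts = []
    for negated, word in pattern.findall(query.lower()):
        parts.append(("NOT " if negated else "") + word)
    return " AND ".join(parts)

print(interpret("ga-rei zero"))                          # ga AND NOT rei AND zero
print(interpret("ga-rei zero", word_chars=r"a-z0-9\-"))  # ga-rei AND zero
```

Once the hyphen counts as part of a word, it is consumed by the keyword and never reaches the operator check.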

But I just tested (using the console) and in boolean or extended mode "ga-rei" finds the appropriate threads just fine.

Quote:
Originally Posted by npcomplete View Post
Finally, this may deal with vB I think: would it be possible to have an option to show the matching posts? useful for long threads of course. Normally I would work around this by finding which thread from the general search then search in the thread again.
There is a radio button below the search box that says "Show Threads" and "Show Posts". If you tick the latter, the search should return posts rather than whole threads.

But note: without extended mode enabled, any search for keywords that can be found in the thread title will match ALL posts from that thread.

Anyway, the search index is rebuilding now; after that's done I'll try enabling extended matching mode.
GHDpro is offline   Reply With Quote
Old 2009-11-26, 11:39   Link #23
GHDpro
Administrator
*Administrator
 
 
Join Date: Jan 2001
Location: Netherlands
Age: 35
I just did a little testing, and exact word matching is probably not going to work very well.

If this forum were almost entirely English-language topics, without much ambiguity about the exact spelling of titles and phrases, having to specify wildcards manually wouldn't be much of a problem.

But (for example) I want both "ga-rei" and "ga rei" to match all appropriate threads. That only works well with automatic wildcards, where "ga" and "rei" simply match the keyword "ga-rei" as much as the whole keyword does.
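
Roughly, in Python terms (a simplified sketch with hypothetical helper names; real substring matching in Sphinx works on the index, not by scanning keywords like this):

```python
def matches_substring(query, indexed_keywords):
    """True if every query term is a substring of at least one indexed keyword."""
    terms = query.lower().split()
    return all(any(term in kw for kw in indexed_keywords) for term in terms)

def matches_exact(query, indexed_keywords):
    """True only if every query term is itself an indexed keyword."""
    terms = query.lower().split()
    return all(term in indexed_keywords for term in terms)

thread_keywords = {"ga-rei", "zero", "discussion"}

print(matches_substring("ga rei", thread_keywords))  # True
print(matches_substring("ga-rei", thread_keywords))  # True
print(matches_exact("ga rei", thread_keywords))      # False
print(matches_exact("ga-rei", thread_keywords))      # True

# The downside: short terms also hit unrelated keywords.
print(matches_substring("ga", {"manga"}))            # True
```

The last line shows the trade-off mentioned earlier in the thread: substring matching finds both spellings, but it also pulls in unintended results.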

I've now enabled extended syntax. However, the index is still being rebuilt for the colon, so that won't work yet.

Previously in this thread you mentioned having trouble finding "ga-rei" in the Anime DVD sales thread. Now try the following query: ga-rei @title dvd sale and select "Show Posts". Works perfectly for me.

NOTE!!! As the search index is still being rebuilt, the field @body is still called "pagetext", so replace @body with @pagetext in the query above if you try it within 1-2 hours of the time of this post.

Edit: for some reason field renaming didn't work as expected. So post body is still called "@pagetext".
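
For anyone unsure how the field operator scopes keywords, here is a toy Python parser (hypothetical, not Sphinx's own). It assumes that "@field" restricts the keywords that follow it until the next field operator, and that keywords before any operator are matched against all fields.

```python
def split_by_field(query):
    """Group keywords by the most recent @field operator; '*' means any field."""
    groups = {}
    field = "*"
    for token in query.split():
        if token.startswith("@") and len(token) > 1:
            # A field operator changes the scope for the keywords that follow.
            field = token[1:]
        else:
            groups.setdefault(field, []).append(token)
    return groups

print(split_by_field("ga-rei @title dvd sale"))
# {'*': ['ga-rei'], 'title': ['dvd', 'sale']}
print(split_by_field("@pagetext ga-rei @title dvd sale"))
# {'pagetext': ['ga-rei'], 'title': ['dvd', 'sale']}
```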

Last edited by GHDpro; 2009-11-27 at 03:08.
GHDpro is offline   Reply With Quote
Old 2009-11-27, 16:24   Link #24
Dr. Casey
Senior Member
 
 
Join Date: Nov 2007
Location: Tennessee
Age: 27
Would it be possible to have full posts show up in search results, rather than just a line and a half or so as a preview?
Dr. Casey is offline   Reply With Quote
Old 2009-11-27, 17:55   Link #25
DragoZERO
Spoilaphobic
*IT Support
 
 
Join Date: Jan 2009
Location: USA
Age: 28
Quote:
Originally Posted by Dr. Casey View Post
Would it be possible to have full posts show up in search results, rather than just a line and a half or so as a preview?
That would be horrible. It would take up way too much space, plus you may not even find what you want. I think having the search results open in a new window, leaving the original search intact, would be a good addition.
DragoZERO is offline   Reply With Quote
Old 2009-11-27, 18:21   Link #26
Dr. Casey
Senior Member
 
 
Join Date: Nov 2007
Location: Tennessee
Age: 27
I think it would be a lot better, personally - quite a few forums do use that system and I find them much more convenient. It would be nice to be able to search for a certain specific word or term and then just have the full posts right there, no opening a post in a new tab and going 'Yeah, this post doesn't have what I'm looking for.' The fact that multiple posts can all have the exact same preview also contributes to the tedium; here's an example of this from the Code Geass forum, with four posts on this page having 'I haven't been following this conversation'... as the preview. And I really don't think the pages would be too long. The average AnimeSuki post isn't exactly lengthy, and the only ones that would be problematic are generally hidden behind spoiler tags (Fanfics being the prime offender). With 20 results on a page, a page of search results would basically be the exact same thing as a page from a forum thread, which to me wouldn't be too much at all.
Dr. Casey is offline   Reply With Quote
Old 2009-11-27, 21:30   Link #27
felix
sleepyhead
*Author
 
 
Join Date: Dec 2005
Location: event horizon
Quote:
Originally Posted by DragoZERO View Post
I think having the search results open in a new window, leaving the original search intact, would be a good addition.
Ctrl+Shift when clicking anything such as a link or button will open in new window; forms are no exception.
felix is offline   Reply With Quote
Old 2009-11-27, 22:45   Link #28
DragoZERO
Spoilaphobic
*IT Support
 
 
Join Date: Jan 2009
Location: USA
Age: 28
Quote:
Originally Posted by Cats View Post
Ctrl+Shift when clicking anything such as a link or button will open in new window; forms are no exception.
I right click and open in new tab on my own already. I was speaking for other people.
DragoZERO is offline   Reply With Quote
Old 2009-11-28, 02:22   Link #29
GHDpro
Administrator
*Administrator
 
 
Join Date: Jan 2001
Location: Netherlands
Age: 35
FYI: my recent modification only affects which search results are returned. That is, it hands over an ordered list of thread or post numbers to vBulletin, which then retrieves the appropriate info from the database to display the results.

In other words: any change to how the results are displayed would require a different kind of modification. And IMHO I don't see the point; there is not much wrong with how the results are displayed.
GHDpro is offline   Reply With Quote
Old 2009-11-28, 19:45   Link #30
SeijiSensei
AS Oji-kun
 
 
Join Date: Nov 2006
Location: Mucking about
Age: 64
I'd just like to thank you for introducing me to Sphinx Search. I've already replaced the engine I was building for my client with it. I'd give you more rep, but I can't do so at the moment!

How do you pass the results to vBulletin? I build a temporary database table with the document IDs, then do a LEFT JOIN to select the results. This seemed the simplest method to me; do you take a different route?
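
A minimal sketch of the temporary-table approach, using Python's sqlite3 with an in-memory database. The table names (`documents`, `hits`), column names, and sample rows are all made up for illustration; the real schema is not described in this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT);
    INSERT INTO documents VALUES (10, 'alpha'), (20, 'beta'), (30, 'gamma');
    CREATE TEMP TABLE hits (pos INTEGER PRIMARY KEY, doc_id INTEGER);
""")

# IDs as they might come back from the search engine, best match first.
# 99 stands for a document that no longer exists in the database.
ranked_ids = [30, 10, 99]
conn.executemany(
    "INSERT INTO hits (pos, doc_id) VALUES (?, ?)",
    list(enumerate(ranked_ids)),
)

# LEFT JOIN keeps every hit, in ranked order, even when the document is missing.
rows = conn.execute("""
    SELECT hits.doc_id, documents.title
    FROM hits LEFT JOIN documents ON documents.id = hits.doc_id
    ORDER BY hits.pos
""").fetchall()
print(rows)  # [(30, 'gamma'), (10, 'alpha'), (99, None)]
```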
SeijiSensei is offline   Reply With Quote
Old 2009-11-29, 04:26   Link #31
GHDpro
Administrator
*Administrator
 
 
Join Date: Jan 2001
Location: Netherlands
Age: 35
For caching and possibly sharing search results, vBulletin stores information (and an ordered list of thread or post ID numbers) in a row in the "vb3_search" table. My code simply does the same, except it gets the list from Sphinx. Also, my code works like a plugin, although for convenience all I do in the plugin is include the script I wrote.
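
As a rough illustration of the idea (not the actual vBulletin schema or my plugin code), here is a Python sqlite3 sketch. The `search_cache` and `post` tables, their columns, and both helper functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (postid INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE search_cache (searchid INTEGER PRIMARY KEY, orderedids TEXT);
    INSERT INTO post VALUES (1, 'first'), (2, 'second'), (3, 'third');
""")

def store_results(conn, ranked_ids):
    """Save the ranked ID list from the search engine as one cache row; return its key."""
    cur = conn.execute(
        "INSERT INTO search_cache (orderedids) VALUES (?)",
        (",".join(str(i) for i in ranked_ids),),
    )
    return cur.lastrowid

def load_results(conn, searchid):
    """Fetch the cached IDs, look up the rows, and restore the ranked order."""
    (ids_text,) = conn.execute(
        "SELECT orderedids FROM search_cache WHERE searchid = ?", (searchid,)
    ).fetchone()
    if not ids_text:
        return []
    ids = [int(i) for i in ids_text.split(",")]
    placeholders = ",".join("?" * len(ids))
    rows = conn.execute(
        f"SELECT postid, title FROM post WHERE postid IN ({placeholders})", ids
    ).fetchall()
    by_id = dict(rows)
    # The database returns rows in its own order, so re-sort by the cached ranking.
    return [(i, by_id[i]) for i in ids if i in by_id]

searchid = store_results(conn, [3, 1, 2])
print(load_results(conn, searchid))  # [(3, 'third'), (1, 'first'), (2, 'second')]
```

Because the ranking lives in the cached row, the same search ID can be reused for paging or shared without querying the search engine again.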

Right now I've created my own "do" hook into search.php, but I'm thinking about expanding the functionality so that it can handle most features of the "Advanced Search" as well, and then hijack the main "process" hook, completely replacing the old vBulletin search.

And yes, wherever I need good search functionality in any of my projects, I'm now inclined to use Sphinx as well. AnimeSuki v3 will use it, and so does Fanzub. There is just no point in building a search function myself when Sphinx can do it better and way faster.
GHDpro is offline   Reply With Quote
All times are GMT -5.