2011-09-08, 06:25   #42
GHDpro
Administrator
Join Date: Jan 2001
Location: Netherlands
Age: 45
A little update about Fanzub.com:

I'm currently working on a major rewrite of the site. Some features I hope to implement are:
- Counting of NZB downloads (preparation for a "Popular Downloads" feature)
- Caching of NZB files (no more time-outs on huge NZB files)
- Full browsing of all categories with pagination (so you can view more than just recent entries)
- Ability to download multiple files as one NZB (saves some clicking)
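
As a rough illustration of the "multiple files as one NZB" idea, here is a minimal Python sketch, not the site's actual code. It assumes the commonly used NZB layout (a root `nzb` element in the `http://www.newzbin.com/DTD/2003/nzb` namespace containing `file` elements); the function name `combine_nzbs` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: the usual NZB namespace; the post does not show the site's files.
NZB_NS = "http://www.newzbin.com/DTD/2003/nzb"


def combine_nzbs(nzb_documents):
    """Copy every <file> element from several NZB documents into one NZB."""
    # Serialize with NZB_NS as the default namespace (no prefix on tags).
    ET.register_namespace("", NZB_NS)
    combined = ET.Element(f"{{{NZB_NS}}}nzb")
    for document in nzb_documents:
        root = ET.fromstring(document)
        for file_element in root.findall(f"{{{NZB_NS}}}file"):
            combined.append(file_element)
    return ET.tostring(combined, encoding="unicode")
```

Each input is the text of one NZB; the result is a single NZB string that lists all of their files, so a newsreader would fetch everything in one go.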

However, the majority of the work is "under the hood" rather than visible: not only is some of the code being rewritten, but the entire database is being reconstructed as well. The reason is that the current site has poor article matching, which causes lots of duplicate articles in the current database and, as a result, lots of duplicate posts as well.

In addition to better article matching, I've also written a far more powerful post "scanner" (the script that tries to group a number of articles into one "post"). The old one was fairly simple: it chopped some parts off the subject line (certain extensions, part indicators, etc.) and tried to match different article subject lines mostly on filename. This worked okay for fansubs, where each post is a single file, but it completely failed for music posts, where each article may carry a completely different track number and title for a single album. The new post "scanner" handles music posts much better. It will also skip most spam by default, as each post must be of a certain size to be added (so single-file spam will be skipped).
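
To make the old approach concrete, here is a minimal Python sketch of filename-based grouping with a size cut-off. It is only an illustration: the regular expressions, the function names `subject_key` and `group_articles`, and the `min_post_bytes` parameter are all assumptions, since the actual rules of either scanner are not given here.

```python
import re
from collections import defaultdict

# Hypothetical patterns; the real scanner's rules are not described in detail.
PART_INDICATOR = re.compile(r"[\(\[]\d+/\d+[\)\]]")  # e.g. (03/25) or [1/4]
EXTENSION = re.compile(
    r"\.(part\d+\.rar|vol\d+\+\d+\.par2|rar|r\d{2}|par2|\d{3})\b", re.I
)


def subject_key(subject):
    """Reduce a subject line to a rough filename-based key."""
    key = PART_INDICATOR.sub("", subject)
    key = EXTENSION.sub("", key)
    return " ".join(key.lower().split())


def group_articles(articles, min_post_bytes):
    """Group (subject, size) pairs by key and drop posts below the size limit."""
    posts = defaultdict(list)
    for subject, size in articles:
        posts[subject_key(subject)].append((subject, size))
    return {
        key: members
        for key, members in posts.items()
        if sum(size for _, size in members) >= min_post_bytes
    }
```

With keys like these, the parts of a split archive collapse to one key, while two tracks of the same album (for example "Album - 01 - Intro.mp3" and "Album - 02 - Outro.mp3") end up with different keys, which is the failure mode described above for music posts.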

As the database is about 5 gigabytes (with 5+ million articles), this takes quite a long time to process on my home development machine, especially since, due to bugs, I have had to restart the process several times. I just had one of those moments: it turned out that using array_merge() to merge two lists of message-ids was a really bad idea.
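
The post does not say what exactly went wrong with array_merge(). Purely as an illustration of one alternative way to merge two message-id lists, here is a Python sketch that tracks already-seen ids in a set, so each id is kept once and membership checks stay cheap. The function name `merge_message_ids` is hypothetical, and this is not a claim about the actual fix.

```python
def merge_message_ids(existing, incoming):
    """Return existing ids followed by any incoming ids not seen before."""
    seen = set(existing)      # set lookups avoid rescanning the list per id
    merged = list(existing)   # copy so the caller's list is left untouched
    for message_id in incoming:
        if message_id not in seen:
            seen.add(message_id)
            merged.append(message_id)
    return merged
```

Order is preserved, which matters if the position of a message-id reflects its part number within a post.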

Anyway, this is just to let you know that I am working on the site. I will hopefully be able to put the reworked site online in a few weeks' time.