AnimeSuki.com Forum

AnimeSuki Forum (http://forums.animesuki.com/index.php)
-   Forum & Site Feedback (http://forums.animesuki.com/forumdisplay.php?f=2)
-   -   Ignore/Add to Buddy list (http://forums.animesuki.com/showthread.php?t=15863)

_Sin_ 2004-06-20 09:52

Ignore/Add to Buddy list
 
I have some questions about the forum that can probably be answered quite quickly. I want to know what effect it has if I add someone to my Buddy list or my Ignore list. For example, can I see the people on my Buddy list even if they are in Invisible mode? And do I stop getting PMs from people on my Ignore list?

EDIT: I feel so stupid >_< It's all in the FAQ :heh:
I guess the only question that remains is the last one at the bottom, and I'll look into the FAQ again to see if I can find any clues about that. /EDIT

Oh, and since I'm opening a thread for this, I might as well ask this too, although it's not very important: who/what are the Google Spiders on the Who's Online page? Are they guests who got directed here by Google or something?

Thanks :D

Superchop 2004-06-20 10:06

Quote:

Originally Posted by _Sin_
Who/what are the Google Spiders on the Who's Online page? Are they guests who got directed here by Google or something?

Thanks :D

lol, I've always wondered about those things as well... I just figured that since no one ever asked about it, either no one noticed it, or everyone else except me knew what they were :heh:

xris 2004-06-20 10:16

Quote:

Originally Posted by Superchop
lol, I've always wondered about those things as well... I just figured that since no one ever asked about it, either no one noticed it, or everyone else except me knew what they were :heh:

This is how online search engines (such as Google) get their data indexed. Spiders are programs that trawl the internet (they're called spiders because they crawl around the web, i.e. the WWW) and build up the index for the search engines. They download a web page, index every word they find and note the URL of the page. They also look for other URLs on the page so as to build a network of links (and find more pages to search). If you look at a site's log file to see who visits, spiders are common visitors.

Note: this is a simplified (and rough) explanation of how it sort of works; it's just meant to give a general overview. I've never noticed them here before, but I've seen the equivalent bots trawl through my own sites (by inspecting the access log files).
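The download-index-follow-links loop described above can be sketched in a few lines of Python. This is a toy illustration of the idea, not how Googlebot actually works (real spiders handle politeness delays, deduplication, robots.txt and much more); the sample URL and HTML are made up for the demo.

```python
# Toy sketch of a spider's core loop: scan a page, index its words,
# and collect outgoing links to visit next.
from html.parser import HTMLParser
from urllib.parse import urljoin

class PageScanner(HTMLParser):
    """Collects the words on a page and the href targets of its <a> tags."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []   # absolute URLs found on the page
        self.words = []   # lowercased word tokens from the page text

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL
                    self.links.append(urljoin(self.base_url, value))

    def handle_data(self, data):
        self.words.extend(data.lower().split())

def index_page(url, html, index):
    """Index every word of the page under its URL; return links to crawl next."""
    scanner = PageScanner(url)
    scanner.feed(html)
    for word in scanner.words:
        index.setdefault(word, set()).add(url)
    return scanner.links

# Demo on a hypothetical page (no network access involved).
index = {}
frontier = index_page(
    "http://example.com/a.html",
    '<p>Hello web</p><a href="/b.html">next page</a>',
    index,
)
```

A real spider would push the returned links onto a queue and repeat, which is how the "network of links" xris mentions gets explored.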

_Sin_ 2004-06-20 10:19

Quote:

Originally Posted by xris
This is how online search engines (such as Google) get their data indexed. Spiders are programs that trawl the internet (they're called spiders because they crawl around the web, i.e. the WWW) and build up the index for the search engines. They download a web page, index every word they find and note the URL of the page. They also look for other URLs on the page so as to build a network of links (and find more pages to search). If you look at a site's log file to see who visits, spiders are common visitors.

Thanks for the information :)

Superchop 2004-06-20 10:26

Xris - Ah, OK... the few times I ever checked the Who's Online page I always saw 2 or 3 of them browsing, but since nobody ever asked about it I just shrugged it off and left it alone :heh:

_Sin_ 2004-06-20 10:31

Quote:

Originally Posted by Superchop
Xris - Ah, OK... the few times I ever checked the Who's Online page I always saw 2 or 3 of them browsing, but since nobody ever asked about it I just shrugged it off and left it alone :heh:

I'm pretty sure I even saw one spider looking at the user profiles - good thing our e-mail addresses are somewhat obfuscated :uhoh:
:hmm: And they even search the forums like the Admins/Mods do - interesting.

xris 2004-06-20 10:33

Here are some examples of search engine bots (I only listed the ones requesting the robots.txt file, otherwise there would be too many to list):

crawler14.googlebot.com - - [19/Jun/2004:22:30:04 +0100] "GET /robots.txt HTTP/1.0" 200 155 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"

lj1141.inktomisearch.com - - [19/Jun/2004:17:42:58 +0100] "GET /robots.txt HTTP/1.0" 200 155 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"

webcachem08b.cache.pol.co.uk - - [19/Jun/2004:19:37:53 +0100] "GET /robots.txt HTTP/1.1" 200 155 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; MSIECrawler)"

x1crawler3-1-0.x-echo.com - - [19/Jun/2004:20:48:25 +0100] "GET /robots.txt HTTP/1.1" 200 155 "-" "Mozilla/4.0 (compatible; MSIE 5.0; Windows 95) VoilaBot BETA 1.2 (http://www.voila.com/)"

wfp2.almaden.ibm.com - - [20/Jun/2004:01:55:35 +0100] "GET /robots.txt HTTP/1.0" 200 155 "-" "http://www.almaden.ibm.com/cs/crawler [c01]"

msnbot64104.search.msn.com - - [20/Jun/2004:03:04:02 +0100] "GET /robots.txt HTTP/1.0" 200 155 "-" "msnbot/0.11 (+http://search.msn.com/msnbot.htm)"
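Entries like the ones above are in Apache's "combined" log format, which is regular enough to pick apart mechanically. Here's a rough sketch of pulling the hostname and user-agent out of such a line with Python; the regex is simplified (e.g. it doesn't handle a "-" response size) but it handles the lines quoted above.

```python
# Parse an Apache "combined" format access-log line (simplified regex).
import re

LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '   # host, ident, user, timestamp
    r'"(?P<request>[^"]*)" (?P<status>\d+) (?P<size>\d+) '  # request line, status, bytes
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'       # referer and user-agent strings
)

# One of the log lines quoted in this thread:
line = ('crawler14.googlebot.com - - [19/Jun/2004:22:30:04 +0100] '
        '"GET /robots.txt HTTP/1.0" 200 155 "-" '
        '"Googlebot/2.1 (+http://www.googlebot.com/bot.html)"')

m = LOG_RE.match(line)
```

With this, spotting spiders in a log is just a matter of matching each line and checking whether `m.group("agent")` mentions a known bot name.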

Note that sites can have a file called robots.txt, which the admin can optionally create to 'help' bots search the site. I assume a bot starts by fetching the robots.txt file and then proceeds from /

Here's part of a robots.txt file I have:
# Hello little robots

user-agent: *
disallow: /tiles
disallow: /status.htm
disallow: /preorder.htm
disallow: /updates.htm

It tells them not to search the directories/files listed (because I think they get updated too often to warrant repeated searches of them).
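Python's standard library happens to ship a parser for exactly these rules (a modern convenience; spiders in 2004 did this parsing themselves). The sketch below feeds it rules like the ones quoted above and asks whether a well-behaved bot may fetch a given URL; the bot name and example.com host are made up for the demo.

```python
# Check robots.txt rules with the stdlib parser.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /tiles
Disallow: /status.htm
Disallow: /preorder.htm
Disallow: /updates.htm
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A well-behaved bot must skip the disallowed paths...
blocked = parser.can_fetch("SomeBot/1.0", "http://example.com/status.htm")
# ...but may fetch everything else.
allowed = parser.can_fetch("SomeBot/1.0", "http://example.com/index.htm")
```

Note that robots.txt is purely advisory: it only works because the major search engine bots choose to honour it.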

[maven] 2004-06-23 09:50

Quote:

Originally Posted by xris
Here's part of a robots.txt file I have
Code:

# Hello little robots

user-agent: *
disallow: /tiles
disallow: /status.htm
disallow: /preorder.htm
disallow: /updates.htm

It tells them not to search the directories/files listed (because I think they get updated too often to warrant repeated searches of them).

status.htm - updated 27th January 2002 3:15 PM BST
preorder.htm - updated 27th January 2002 3:15 PM GMT
updates.htm - updated 1st August 2003

Sorry. Couldn't resist... :p


All times are GMT -5.

Powered by vBulletin® Version 3.8.7
Copyright ©2000 - 2014, vBulletin Solutions, Inc.