Change of Policy or Slip up?

Debate directly related to English Chess Federation matters.
Michael Farthing
Posts: 2069
Joined: Fri Apr 04, 2014 1:28 pm
Location: Morecambe, Europe

Post by Michael Farthing » Mon Oct 26, 2015 7:37 pm

The ECF Forum reported a minute or so ago:
Who is online

In total there are 8 users online :: 1 registered, 0 hidden and 7 guests (based on users active over the past 5 minutes)
Most users ever online was 96 on 13 May 2015, 05:14

Registered users: Bing [Bot]
Legend: Administrators, Global moderators
Who is this Bing? Is this a Bot I see? Intriguing.

Roger de Coverly
Posts: 21315
Joined: Tue Apr 15, 2008 2:51 pm

Re: Change of Policy or Slip up?

Post by Roger de Coverly » Mon Oct 26, 2015 7:46 pm

Michael Farthing wrote:Who is this Bing? Is this a Bot I see? Intriguing.
Bing is the Microsoft search engine.

It's here as well.

Who is online

In total there are 33 users online :: 8 registered, 2 hidden and 23 guests (based on users active over the past 5 minutes)
Most users ever online was 120 on Sat Oct 17, 2015 4:51 pm

Registered users: Baidu [Spider], Bing [Bot], Brendan O'Gorman, Exabot [Bot], Google [Bot], Michael Farthing, Roger de Coverly, Yahoo [Bot]
Legend: Administrators, Global moderators

Michael Farthing
Posts: 2069
Joined: Fri Apr 04, 2014 1:28 pm
Location: Morecambe, Europe

Re: Change of Policy or Slip up?

Post by Michael Farthing » Mon Oct 26, 2015 7:49 pm

Err.. yes Roger. The point was that supposedly they are not allowed on the ECF forum.

Bill Porter
Posts: 147
Joined: Sat Nov 07, 2009 12:20 pm

Re: Change of Policy or Slip up?

Post by Bill Porter » Mon Oct 26, 2015 7:53 pm

Michael Farthing wrote:Err.. yes Roger. The point was that supposedly they are not allowed on the ECF forum.
Robots are allowed to see the main page, but none of the forums or posts.

Angus French
Posts: 2151
Joined: Thu May 15, 2008 1:37 am

Re: Change of Policy or Slip up?

Post by Angus French » Mon Oct 26, 2015 7:57 pm

Bill Porter wrote:
Michael Farthing wrote:Err.. yes Roger. The point was that supposedly they are not allowed on the ECF forum.
Robots are allowed to see the main page, but none of the forums or posts.
But I think robots are (now) allowed to access the forum - as Michael indicates and according to my, possibly naive, reading of the robots.txt file.

Carl Hibbard
Posts: 6028
Joined: Fri Dec 08, 2006 8:05 pm
Location: Evesham

Re: Change of Policy or Slip up?

Post by Carl Hibbard » Mon Oct 26, 2015 8:14 pm

There is a bit more to it than all that, but I am not sure what their policy is now.
Cheers
Carl Hibbard

John Upham
Posts: 7218
Joined: Wed Apr 04, 2007 10:29 am
Location: Cove, Hampshire, England.

Re: Change of Policy or Slip up?

Post by John Upham » Mon Oct 26, 2015 8:30 pm

Here is the current robots file for "the other place" at the level above the forum sub-site (i.e. ../forum)
User-agent: *
Disallow: /images/
Disallow: /CBV/
Disallow: /FM_uploads/
Disallow: /PGN/
Disallow: /event-calendar/action~posterboard/
Disallow: /event-calendarr/action~agenda/
Disallow: /event-calendar/action~oneday/
Disallow: /event-calendar/action~month/
Disallow: /event-calendar/action~week/
Disallow: /event-calendar/action~stream/
and the one for this place is:
User-agent: *
and, by way of contrast here is the current one for the Talk Talk corporate site:
#
# robots.txt
#
# This file is to prevent the crawling and indexing of certain parts
# of your site by web crawlers and spiders run by sites like Yahoo!
# and Google. By telling these "robots" where not to go on your site,
# you save bandwidth and server resources.
#
# This file will be ignored unless it is at the root of your host:
# Used: http://example.com/robots.txt
# Ignored: http://example.com/site/robots.txt
#
# For more information about the robots.txt standard, see:
# http://www.robotstxt.org/robotstxt.html
#
# For syntax checking, see:
# http://www.frobee.com/robots-txt-check

User-agent: *
Crawl-delay: 10
# Directories
Disallow: /includes/
Disallow: /misc/
Disallow: /modules/
Disallow: /profiles/
Disallow: /scripts/
Disallow: /themes/
# Files
Disallow: /CHANGELOG.txt
Disallow: /cron.php
Disallow: /INSTALL.mysql.txt
Disallow: /INSTALL.pgsql.txt
Disallow: /INSTALL.sqlite.txt
Disallow: /install.php
Disallow: /INSTALL.txt
Disallow: /LICENSE.txt
Disallow: /MAINTAINERS.txt
Disallow: /update.php
Disallow: /UPGRADE.txt
Disallow: /xmlrpc.php
# Paths (clean URLs)
Disallow: /admin/
Disallow: /comment/reply/
Disallow: /filter/tips/
Disallow: /node/add/
Disallow: /search/
Disallow: /user/register/
Disallow: /user/password/
Disallow: /user/login/
Disallow: /user/logout/
# Paths (no clean URLs)
Disallow: /?q=admin/
Disallow: /?q=comment/reply/
Disallow: /?q=filter/tips/
Disallow: /?q=node/add/
Disallow: /?q=search/
Disallow: /?q=user/password/
Disallow: /?q=user/register/
Disallow: /?q=user/login/
Disallow: /?q=user/logout/
Enjoy!
British Chess News : britishchessnews.com
Twitter: @BritishChess
Facebook: facebook.com/groups/britishchess :D

Carl Hibbard
Posts: 6028
Joined: Fri Dec 08, 2006 8:05 pm
Location: Evesham

Re: Change of Policy or Slip up?

Post by Carl Hibbard » Mon Oct 26, 2015 8:33 pm

I believe they are still only a suggestion?
Cheers
Carl Hibbard

John Upham
Posts: 7218
Joined: Wed Apr 04, 2007 10:29 am
Location: Cove, Hampshire, England.

Re: Change of Policy or Slip up?

Post by John Upham » Mon Oct 26, 2015 8:35 pm

And for those who like to look wayback we have

http://web.archive.org/web/201509091435 ... .uk/Forum/

J.

Paul McKeown
Posts: 3735
Joined: Thu Apr 12, 2007 3:01 pm
Location: Hayes (Middx)

Re: Change of Policy or Slip up?

Post by Paul McKeown » Tue Oct 27, 2015 12:13 am

robots.txt will be obeyed by spiders from corporations with a reputation to defend - e.g. Microsoft or Google. It is as much use as the proverbial chocolate fireguard in fending off bots from script kiddies and worse from sniffing around your privates. The robots.txt for the other place provided by John Upham above allows indexing of forum contents.
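
As a rough check of that last claim, here is a minimal sketch using Python's standard urllib.robotparser. The rules are copied from John Upham's post above (including the "event-calendarr" spelling as quoted); the host example.org and the path /forum/viewtopic.php are placeholders, not the real site. It only shows what a well-behaved crawler would conclude from the file, not what any particular bot actually does.

```python
from urllib.robotparser import RobotFileParser

# Rules as quoted earlier in the thread for "the other place".
OTHER_PLACE_RULES = """\
User-agent: *
Disallow: /images/
Disallow: /CBV/
Disallow: /FM_uploads/
Disallow: /PGN/
Disallow: /event-calendar/action~posterboard/
Disallow: /event-calendarr/action~agenda/
Disallow: /event-calendar/action~oneday/
Disallow: /event-calendar/action~month/
Disallow: /event-calendar/action~week/
Disallow: /event-calendar/action~stream/
"""

# The one-line file quoted for "this place": a user-agent line with no rules.
THIS_PLACE_RULES = "User-agent: *\n"


def is_allowed(rules: str, agent: str, url: str) -> bool:
    """Return True if a crawler honouring `rules` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # example.org and the paths below are placeholders for illustration only.
    print(is_allowed(OTHER_PLACE_RULES, "Googlebot", "http://example.org/forum/viewtopic.php"))  # True
    print(is_allowed(OTHER_PLACE_RULES, "Googlebot", "http://example.org/images/logo.png"))      # False
    print(is_allowed(THIS_PLACE_RULES, "Googlebot", "http://example.org/forum/"))                # True
```

Nothing under /forum/ matches a Disallow prefix, so a compliant crawler treats it as fair game; a file with no Disallow lines at all permits everything.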

Michael Farthing
Posts: 2069
Joined: Fri Apr 04, 2014 1:28 pm
Location: Morecambe, Europe

Re: Change of Policy or Slip up?

Post by Michael Farthing » Tue Oct 27, 2015 8:25 am

robots.txt is a bit of an irrelevance. The bots are publicly declared now as registered users. That means they have a user id. That has to be sanctioned by the administrator.

Paul McKeown
Posts: 3735
Joined: Thu Apr 12, 2007 3:01 pm
Location: Hayes (Middx)

Re: Change of Policy or Slip up?

Post by Paul McKeown » Tue Oct 27, 2015 10:49 am

Michael, you misunderstand how the system works.

Legitimate spiders/crawlers/index bots are NOT registered users and they do not have user ids and are not sanctioned by the administrator.

Apache (or other web server) passes parameters to the application software creating the page to be returned to the web user. These parameters include information which allows standard spiders/crawlers/index bots to be recognised.

For instance, Google publishes how to recognise its index bots as follows: Google crawlers. So, for instance, Google's standard bots can be recognised by their User-Agent string of "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

You will further note from that page that Google provides the name for each of its index bots by which it should be known in robots.txt.

The name you see at the bottom of the page on this site is provided as a standard part of the Content Management System used to build and return pages. It recognises the standard bots that access its pages and accumulates statistics for them.
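
A minimal sketch of that kind of recognition, assuming the software simply looks for a known substring in the User-Agent header. The table KNOWN_BOTS and the function identify_bot are hypothetical names for illustration, not the forum software's actual code; the Googlebot string is the one quoted above, while the Bing and Baidu patterns are assumptions about what such a table might hold.

```python
import re
from typing import Optional

# Hypothetical lookup table: User-Agent pattern -> name shown in "Who is online".
# Only the Googlebot string comes from the post above; the rest are assumed.
KNOWN_BOTS = [
    (re.compile(r"Googlebot", re.IGNORECASE), "Google [Bot]"),
    (re.compile(r"bingbot", re.IGNORECASE), "Bing [Bot]"),
    (re.compile(r"Baiduspider", re.IGNORECASE), "Baidu [Spider]"),
]


def identify_bot(user_agent: str) -> Optional[str]:
    """Return the display name of a recognised bot, or None for anything else."""
    for pattern, display_name in KNOWN_BOTS:
        if pattern.search(user_agent):
            return display_name
    return None


if __name__ == "__main__":
    googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    browser = "Mozilla/5.0 (Windows NT 10.0; rv:41.0) Gecko/20100101 Firefox/41.0"
    print(identify_bot(googlebot))  # Google [Bot]
    print(identify_bot(browser))    # None
```

No account or password is involved: the name is attached to the request purely on the strength of a header the client sends, which is also why the header alone proves nothing about who is really calling.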

-- corrected grammar to clarify meaning
Last edited by Paul McKeown on Tue Oct 27, 2015 12:42 pm, edited 1 time in total.

Roger de Coverly
Posts: 21315
Joined: Tue Apr 15, 2008 2:51 pm

Re: Change of Policy or Slip up?

Post by Roger de Coverly » Tue Oct 27, 2015 10:57 am

John Upham demonstrated that the index page of the ECF's forum is sometimes archived, so it's possible to see the topic areas which had been discussed. It didn't archive the forum content.

That contrasts with this forum where if you can remember some keywords, there's a reasonable chance that a Google search will directly locate the original content.

The point, supposedly, was that if the ECF decided that particular posts were never made, they could be removed and no-one would be the wiser. Copies are often taken though.

Bill Porter
Posts: 147
Joined: Sat Nov 07, 2009 12:20 pm

Re: Change of Policy or Slip up?

Post by Bill Porter » Tue Oct 27, 2015 12:35 pm

The admin can allow or deny recognised spiders/crawlers/index bots permission to access any or all forums.

The wayback machine may not be included in the list or recognised otherwise, when it would be allowed the same access as a guest.

There are lots of ways of discouraging archiving; the ECF forum's approach might be described as "keep it complicated, clever".

John Upham
Posts: 7218
Joined: Wed Apr 04, 2007 10:29 am
Location: Cove, Hampshire, England.

Re: Change of Policy or Slip up?

Post by John Upham » Tue Oct 27, 2015 12:47 pm

Bill Porter wrote:the admin can allow or deny recognised spiders/crawlers/index bots permission to access any or all forums.

The wayback machine may not be included in the list or recognised otherwise, when it would be allowed the same access as a guest.

There are lots of ways of discouraging archiving; the EC forum's approach might be described as "keep it complicated, clever".
The standard method would be to include
User-agent: ia_archiver
Disallow: /
in the robots.txt

The Wayback Machine client advertises itself to HTTP servers with an HTTP User-Agent of ia_archiver.
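
To see the effect of that rule, here is a small sketch with Python's standard urllib.robotparser. The host example.org is a placeholder, and the helper may_fetch is a name made up for this example; it shows only how a crawler that honours robots.txt would read the file.

```python
from urllib.robotparser import RobotFileParser

# The two-line rule from the post above, one directive per line.
ARCHIVER_BLOCK = """\
User-agent: ia_archiver
Disallow: /
"""


def may_fetch(rules: str, agent: str, url: str) -> bool:
    """Return True if a crawler honouring `rules` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # example.org is a placeholder host for illustration only.
    print(may_fetch(ARCHIVER_BLOCK, "ia_archiver", "http://example.org/forum/"))  # False
    print(may_fetch(ARCHIVER_BLOCK, "Googlebot", "http://example.org/forum/"))    # True
```

The rule shuts out only the named agent; every other crawler is unaffected because no entry applies to it.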

Obviously the WBM cannot authenticate itself and generate queries to retrieve data from a web site whose content is primarily stored in a database such as Oracle or MySQL.

It is most successful for flat, non-dynamic sites.
Last edited by John Upham on Tue Oct 27, 2015 5:26 pm, edited 1 time in total.