Date: Sat, 13 Jan 2001 19:36:59 -0500
From: David Allen
To: gopher@complete.org
Subject: [gopher] Re: Gopher "robots.txt" (was Re: New V-2 WAIS database)

On Sat, Jan 13, 2001 at 04:28:12PM -0800, Cameron Kaiser wrote:
>
> >>Good point. I am actually trying to think of a way like the HTTP robots.txt
> >>that can more or less transparently tell V-2 what to stay out of. Suggestions?
>
> > I don't know well gopher yet, but you'd better find one before starting
> > to index my site, or your database will be filled with crap.
>
> I'm sure Dave and John will have some ideas, but for now just mail me offlist
> with some regexes that are off-limits and I'll hardcode them for the present.
> I appreciate it :-)

Personally, I don't see any reason not to just lift the robots.txt format
verbatim and add it, minus the User-Agent part, which gopher doesn't
really support.  (Or it could always be '*', in case web agents for some
reason ended up reading the file.)

So maybe we could do something like this:

Disallow: some_directory
Disallow: another_directory

Questions on these items though:

1.) Should "some_directory" be a selector string, or the portion of the
    URL after the host?  I.e. on my system, I have a selector,
    "1/Python Stuff".  Should it be listed as that, or as the portion of
    the URL after the host, which would be: "/11/Python%20Stuff"?

2.) Should gopher servers hide files by the name of robots.txt from the
    view of the client?  (I.e. should it be possible for a human user to
    come into a directory and see a robots.txt entry, or should it be
    automagically hidden?)

-- 
David Allen
http://opop.nols.com/
----------------------------------------
Atlee is a very modest man.  And with reason.  - Winston Churchill
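
To make question 1 concrete, here is a rough Python sketch of how an
indexer might read the Disallow lines proposed above and convert between
the two candidate spellings of an entry.  Nothing here is an agreed
format: the function names are made up for illustration, the rules are
assumed to hold raw selector strings, and prefix matching is borrowed
from the way HTTP robots.txt is usually applied.

```python
from urllib.parse import quote, unquote


def selector_to_url_path(item_type: str, selector: str) -> str:
    """Build the part of a gopher URL after the host:
    '/' + one-character item type + percent-escaped selector."""
    return "/" + item_type + quote(selector, safe="/")


def url_path_to_selector(path: str) -> str:
    """Inverse of the above: drop the leading '/' and the item type,
    then undo the percent-escaping."""
    return unquote(path[2:])


def parse_disallow(text: str) -> list[str]:
    """Collect the values of 'Disallow:' lines; other lines are ignored.
    (Assumption: no User-Agent grouping, as suggested in the message.)"""
    rules = []
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "disallow" and value.strip():
            rules.append(value.strip())
    return rules


def is_allowed(selector: str, rules: list[str]) -> bool:
    """Assumed semantics: a selector is off-limits if it starts with
    any Disallow value (simple prefix match)."""
    return not any(selector.startswith(rule) for rule in rules)


if __name__ == "__main__":
    rules = parse_disallow("Disallow: 1/Python Stuff\n"
                           "Disallow: another_directory\n")
    print(selector_to_url_path("1", "1/Python Stuff"))  # /11/Python%20Stuff
    print(is_allowed("1/Python Stuff/foo.py", rules))   # False
    print(is_allowed("0/README", rules))                # True
```

If the file listed URL paths instead of selectors, the indexer would run
url_path_to_selector() on each rule first.  Either spelling can be
converted to the other, so the choice mostly affects which one is easier
for a server admin to write by hand.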