Received: with LISTAR (v1.0.0; list gopher); Mon, 15 Jan 2001 08:00:38 -0600 (CST)
Return-Path: 
Delivered-To: gopher@complete.org
Received: from erwin.complete.org (cc695330-a.indnpls1.in.home.com [24.8.87.207])
	by pi.glockenspiel.complete.org (Postfix) with ESMTP id 31BCD3B912;
	Mon, 15 Jan 2001 08:00:37 -0600 (CST)
Received: (from jgoerzen@localhost)
	by erwin.complete.org (8.11.1/8.11.1/Debian 8.11.0-6) id f0FDvAp04219;
	Mon, 15 Jan 2001 08:57:10 -0500
X-Authentication-Warning: erwin.complete.org: jgoerzen set sender to jgoerzen@complete.org using -f
To: gopher@complete.org
Subject: [gopher] Re: Gopher "robots.txt" (was Re: New V-2 WAIS database)
References: <200101150544.VAA10768@stockholm.ptloma.edu>
From: John Goerzen
Date: 15 Jan 2001 08:57:10 -0500
In-Reply-To: <200101150544.VAA10768@stockholm.ptloma.edu>
Message-ID: <87lmscq5op.fsf@complete.org>
Lines: 35
User-Agent: Gnus/5.090001 (Oort Gnus v0.01) XEmacs/21.1 (Channel Islands)
MIME-Version: 1.0
Content-type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 8bit
X-archive-position: 104
X-listar-version: Listar v1.0.0
Sender: gopher-bounce@complete.org
Errors-to: gopher-bounce@complete.org
X-original-sender: jgoerzen@complete.org
Precedence: bulk
Reply-to: gopher@complete.org
X-list: gopher

Cameron Kaiser writes:

> You could instead put in something like
>
> iF1/stayout;1/dontindexFerror.hostF909
>
> in the menu that references these selectors.

Not quite good enough, I'm afraid.  I might have many menus that
reference that, and others out there in gopherspace might also
reference it.  While I'd be able to control my own usage (albeit with
some difficulty), I have no control over how others out there link to it.

> Mind you, I'd be happy with any approach that works on a per-menu level,
> just as long as the bot doesn't have to cache every server's particular
> robot policy and can determine the policy for a selector from the menu(s)
> that reference that selector. This is just one way I can think of.
Well, let's get back to this point.  I'm not sure that you must cache
it.  Assuming that a robot will traverse an entire server at once (am
I wrong in that assumption?), it would involve only one extra request
to ask for robots.txt before traversal.

If, OTOH, servers are hit in a more random pattern, I can see there
would be a problem.  Is it practical to cache this data then?  I might
be inclined to suggest that it is.

Given that only a few of the gopher servers out there are both
actively maintained and have a situation warranting this sort of
treatment, perhaps it is not so problematic a situation after all?

-- 
John Goerzen                        www.complete.org
Sr. Software Developer, Progeny Linux Systems, Inc.    www.progenylinux.com
#include
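[Archive editor's note: the one-extra-request scheme discussed above could be sketched roughly as below. This is a hypothetical illustration, not code from the thread: the robots.txt selector name and its Disallow-line format are assumptions borrowed from the web convention, and the fetch helper just speaks plain Gopher0 (send selector, read until close).]

```python
import socket

CRLF = "\r\n"

def fetch_selector(host, selector, port=70, timeout=10):
    """One Gopher0 request: send the selector, read until the server
    closes the connection (the protocol has no other framing)."""
    with socket.create_connection((host, port), timeout=timeout) as s:
        s.sendall((selector + CRLF).encode("ascii"))
        chunks = []
        while True:
            data = s.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", errors="replace")

def parse_robots(text):
    """Collect Disallow: selector prefixes from a robots.txt-style
    file (hypothetical format mirroring the web's robots.txt)."""
    disallowed = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line.lower().startswith("disallow:"):
            prefix = line.split(":", 1)[1].strip()
            if prefix:
                disallowed.append(prefix)
    return disallowed

def is_allowed(selector, disallowed):
    """A selector is off-limits if any disallowed prefix matches it."""
    return not any(selector.startswith(p) for p in disallowed)

# Per-server cache: ask for robots.txt once, before traversal, and
# consult the cached policy for every selector on that host afterward.
_policy_cache = {}

def allowed(host, selector):
    if host not in _policy_cache:
        try:
            _policy_cache[host] = parse_robots(fetch_selector(host, "robots.txt"))
        except OSError:
            _policy_cache[host] = []  # no policy reachable: assume allowed
    return is_allowed(selector, _policy_cache[host])
```

For a robot that walks a whole server at once, the cache holds exactly one entry at a time; for a robot hitting servers in a random pattern, it grows to one small prefix list per host, which seems cheap either way.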