Date: Wed, 30 Nov 2005 22:25:35 -0600
From: Chris
To: gopher@complete.org
Subject: [gopher] Re: Bot update

Yes, very much like WAIS, with each gopher server, for this particular
project (searching John's database), maintaining a part of the larger
data set. The idea is that you're searching a static database and each
link points to the cached database, so it should be faster than WAIS as
it was used and could still be used.
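
As a rough illustration of the split described above (each server holding part of the larger data set, with menu links pointing at whichever server caches a document), here is a minimal Python sketch. The host names, the /archive/ selector prefix, and the hash-based assignment are all assumptions made for illustration; nothing in the thread says how documents would actually be divided. Only the menu line layout (type character, display string, selector, host, port, separated by tabs) follows the Gopher protocol as described in RFC 1436.

```python
import hashlib

# Hypothetical participating servers; none of these names come from the thread.
SERVERS = [
    ("gopher1.example.org", 70),
    ("gopher2.example.org", 70),
    ("gopher3.example.org", 70),
]


def pick_server(doc_key, servers=SERVERS):
    """Map a document key to one server using a stable hash (an assumed scheme)."""
    digest = hashlib.sha1(doc_key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(servers)
    return servers[index]


def menu_line(item_type, display, selector, host, port):
    """Format one Gopher menu item: type+display, selector, host, port, tab-separated."""
    return f"{item_type}{display}\t{selector}\t{host}\t{port}\r\n"


def build_menu(doc_keys, servers=SERVERS):
    """Build a menu whose links point at the server assumed to hold each document."""
    lines = []
    for key in doc_keys:
        host, port = pick_server(key, servers)
        # "0" is the Gopher item type for a plain text file.
        lines.append(menu_line("0", key, "/archive/" + key, host, port))
    # A line holding a single period terminates a Gopher menu.
    return "".join(lines) + ".\r\n"


if __name__ == "__main__":
    print(build_menu(["readme.txt", "notes/2005.txt"]), end="")
```

Any gopher server could serve such a menu as a front page for the whole archive, since every link already names the host that holds the document. A plain hash like this reshuffles most documents when the server list changes, so a real deployment would probably want something steadier.
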
I feel there is a distinction between WAIS and using WAIS software to
work on one static database, even if done similarly, because with WAIS
there was a lot of lag and latency, which we could trim by selecting
servers and groups of servers dedicated to the project. And maybe we
should have some people running WAIS anyhow.

John had some ideas for making pygopherd directories easier for Google's
spider to traverse... What about a way to have the full text indexed by
Google, derived from pygopherd's HTML-ized pages? One would run a
pygopherd server with the 28G of data on it, let Google spider it, and
then make a gopher gateway to Google for http://gophersite:80/bigdataset.
That way Google does the work of the search and it's brought back "down"
into gopherland.

Just more thoughts..

Chris

On Wed, 30 Nov 2005 12:06:58 +0300
"R.A.Pavlov" wrote:

> On Tue, Nov 29, 2005 at 09:03:35PM -0600, Chris wrote:
> > Some other possibilities came to mind, things such as breaking it up into datasets and having various boxen here as well as at other gophers each maintain a dataset or sets.
>
> And this is close to the idea of WAIS searches where each server indexes
> its own content and other servers maintain lists of links to these WAIS
> servers. If you mean that each gopher server maintains a database
> of its own content.
>
> By the way I have some progress with WAIS and will show some results to
> the public very soon.
>
> > These were just some thoughts I had. Thanks John for getting it I think it's awesome and am excited to see what we can all do with it.
> > Chris
> > gopher://hal3000.cx
> >
> >
> > On Tue, 29 Nov 2005 17:20:06 -0600
> > John Goerzen wrote:
> >
> > > On Wed, Nov 16, 2005 at 10:04:17PM -0600, Jeff wrote:
> > > > On Sun, 30 Oct 2005 21:48:51 -0600, John Goerzen
> > > > wrote:
> > > >
> > > > > Here's an update on the gopher bot:
> > > > >
> > > > > There is currently 28G of data archived representing 386,315
> > > > > documents.
> > > > > 1.3 million documents remain to be visited, from
> > > > > approximately 20 very large Gopher servers. I believe, then, that the
> > > > > majority of gopher servers have been cached by this point. 3,987
> > > > > different servers are presently represented in the archive.
> > > >
> > > > Any news?
> > >
> > > Not really. The bot hit a point where its algorithm for storing page
> > > information was getting to be too slow, and there was also a problem
> > > with the database layer I'm using segfaulting. When I get some time, I
> > > will write a new layer.
> > >
> > > In the meantime, I'd like to talk about how to get this data to others
> > > that might be willing to host it, as well as how to store it out there
> > > for the public. Any ideas?
> > >
> >
> > --
> > Join FSF as an Associate Member at:
> >
>
> --
> Yours, etc.
> Roman A. Pavlov
>
> gopher://sdf.lonestar.org/1/users/rp
>

--
Join FSF as an Associate Member at: