26-Aug-96 20:22:00-GMT,4423;000000000001
Received: (from fdc@localhost) by watsun.cc.columbia.edu (8.7.5/8.7.3) id QAA04898; Mon, 26 Aug 1996 16:21:59 -0400 (EDT)
Date: Mon, 26 Aug 96 16:21:59 EDT
From: Frank da Cruz <fdc@watsun.cc.columbia.edu>
To: jaltman@columbia.edu
cc: Kai Uwe Rommel <rommel@ars.de>, Joe Doupnik <JRD@cc.usu.edu>,
        John Chandler <JCHBN@CUVMB.CC.COLUMBIA.EDU>
Subject: Re: Info-ZIP and C-Kermit
In-Reply-To: Your message of Mon, 26 Aug 96 10:33:41 EDT
Message-ID: <CMM.0.90.4.841090919.fdc@watsun.cc.columbia.edu>

> > I am thinking about a new file transfer technique:  SET FILE TYPE
> > INFOZIP which would first create a ZIP file, transfer it in BINARY
> > mode, and then UNZIP it.  This would take care of the inefficiencies
> > associated with transferring hundreds of small files that fit within one
> > packet.  
> 
> We would certainly approve. The requirements are fairly loose. If you
> use the DLL (by linking it in if it exists or not if not) or call the
> executable as a subprocess, you don't even have to care about
> copyrights. And if you integrate the source code directly, all you
> need to do is to put the proper statements into the docs.
> 
Thanks, Kai Uwe (and hello :-)

There are several problems with this scheme, though.  My greatest quibble
with the ZIP approach is that the results are not truly transportable.
If I have a directory containing a mixture of text and binary files, and I
ZIP it, and then I transfer the ZIP file to another computer that uses a
different text-file format, then either the binary files or the text files
will be in the wrong format.

Obviously, Kermit has the same problem.  The real solution here is for the
software (ZIP or Kermit) to be able to determine automatically, for each file,
whether it is text or binary.  But in most common settings, that would be
impossible -- DOS, Windows, UNIX, etc.  VMS lets us do this by inspecting the
record format (but even then, we are sometimes fooled -- e.g. VMS ZIP files
have a text-type record format, Stream_LF).

In DOS and Windows, I suppose we could go by filetype (e.g. ".TXT" is text),
but of course it is not reliable.  In UNIX we don't even *have* filetypes.
We do have the "file" program in UNIX, but it guesses wrong more often than
not.

So I am not sure that having Kermit use ZIP format at the presentation layer
is a good idea.  As for solving the other problem -- transferring a large
number of tiny files -- we already have a way to do this.  The challenge is
to make it work more efficiently.  The barrier to that challenge is the
Attribute refusal mechanism.

In the current scheme, we use sliding windows only during the data phase.
Now consider a wildcard transfer, with many files.  If we were to allow
sliding windows during *all* phases of the transfer, then the sender might
have sent up to 32 packets of file data (potentially up to 9K each) before
receiving the attribute refusal.  This is far LESS efficient than turning off
windowing during the filename-attribute phase.
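
A back-of-the-envelope sketch of the worst case described above, using only
the figures given in this message (32 window slots, packets of up to 9K).  The
function name and the idea of multiplying by the number of refused files are
illustrative assumptions, not anything taken from the Kermit sources.

```python
def wasted_bytes(refused_files: int,
                 window_slots: int = 32,
                 packet_bytes: int = 9 * 1024) -> int:
    """Upper bound on file data sent before attribute refusals arrive.

    Assumes the sender fills the whole window with maximum-length data
    packets for every file that the receiver ends up refusing.
    """
    return refused_files * window_slots * packet_bytes


# One refused file can cost a full window of data packets.
print(wasted_bytes(1))
# A wildcard transfer in which 100 files are refused.
print(wasted_bytes(100))
```

With windowing switched off during the filename-attribute phase, a refusal
costs one round trip instead of a window full of discarded data.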

When transferring between LIKE systems, ZIPping is a good way to solve this
problem, but otherwise it is likely to introduce more problems (corrupted
files) than it solves.

> > Also, Frank and I have always been troubled by the problems associated
> > with transferring directory trees from DOS/Unix to VMS since the
> > directory structures are so completely incompatible.  How does Info-ZIP
> > handle the recreation of directory trees between these systems?  
> 
> Transparently. The / separators in archives are treated as "canonical"
> and are translated into the local separators of the actual OS, such as
> \ on DOS or OS/2, / on Unix and [.] on VMS.
> 
I suppose we could do this too.  In VMS C-Kermit, we could offer a "UNIX" mode
for filenames and paths -- incoming and outgoing.  Ditto for Windows and OS/2,
where it's easy.  So much for the presentation layer.  We already support
automatic directory creation, so then it is simply a matter of adding the
ability to "recurse" to our filename expanders.

I'm not sure that this is the right way to solve the problem, because it
assumes that every other file system in the world can map to the UNIX file
system (a structural issue), and on a more simplistic level, that directory
names can never contain slashes (or if they can, then we must introduce a new
kind of quoting rule for pathnames).
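
A minimal sketch of the separator translation described in the quoted text,
assuming "/" is the canonical separator and that only relative paths are
handled.  The function name and the system labels are hypothetical; this is
not Info-ZIP or C-Kermit code.  It also shows the limitation raised above: a
name component that itself contains the separator (or, on VMS, a dot in a
directory name) cannot be represented without some extra quoting rule.

```python
def to_local(canonical: str, system: str) -> str:
    """Translate a canonical '/'-separated relative path to a local form."""
    parts = [p for p in canonical.split("/") if p]
    if not parts:
        raise ValueError("empty path")
    *dirs, name = parts
    if system == "unix":
        return "/".join(parts)
    if system in ("dos", "os2"):
        return "\\".join(parts)
    if system == "vms":
        # Relative VMS directory spec: [.dir1.dir2]name
        if not dirs:
            return name
        return "[." + ".".join(dirs) + "]" + name
    raise ValueError("unknown system: " + system)


for target in ("unix", "dos", "vms"):
    print(target, to_local("src/lib/ckcmai.c", target))
```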

Anyway, this is all stuff for the distant future, not for now.

- Frank

27-Aug-96 14:33:19-GMT,2733;000000000001
Received: from ars.de (firewall.ars.de [194.97.120.113]) by watsun.cc.columbia.edu (8.7.5/8.7.3) with SMTP id KAA12640 for <fdc@WATSUN.CC.COLUMBIA.EDU>; Tue, 27 Aug 1996 10:33:16 -0400 (EDT)
Received: from jonas.ars.de by ars.de (IBM OS/2 SENDMAIL VERSION 1.3.17/3.0ars)
	  id AA0160; Tue, 27 Aug 96 16:32:34 +0200
Received: by internal-host.ars.de (IBM OS/2 SENDMAIL VERSION 1.3.17/3.0ars)
	  id AA0489; Tue, 27 Aug 96 16:31:00 +0200
Message-Id: <9608271431.AA0489@internal-host.ars.de>
Date: Tue, 27 Aug 96 16:30:59 +0100
From: Kai Uwe Rommel <rommel@ars.de>
Subject: Re: Info-ZIP and C-Kermit
To: Zip-Bugs@LISTS.WKU.EDU, fdc@WATSUN.CC.COLUMBIA.EDU
In-Reply-To: <CMM.0.90.4.841090919.fdc@watsun.cc.columbia.edu> from "Frank da Cruz" at Aug 26 96 4:21 pm
X-Mailer: ELM [version 2.3 PL11] for OS/2

You (Frank da Cruz) wrote:
> Thanks, Kai Uwe (and hello :-)

Hi!

> There are several problems with this scheme, though.  My greatest quibble
> with the ZIP approach is that the results are not truly transportable.
> If I have a directory containing a mixture of text and binary files, and I
> ZIP it, and then I transfer the ZIP file to another computer that uses a
> different text-file format, then either the binary files or the text files
> will be in the wrong format.
> 
> Obviously, Kermit has the same problem.  The real solution here is to be able
> for the software (ZIP or Kermit) to determine automatically, for each file,
> whether it is text or binary.  But in most common settings, that would be
> impossible -- DOS, Windows, UNIX, etc.  VMS lets us do this by inspecting the
> record format (but even then, we are sometimes fooled -- e.g. VMS ZIP files
> have a text-type record format, Stream_LF).

My personal opinion is: DON'T DO text mode transmission/conversion.
Leave everything as is. Those who have to transfer texts between
systems with different representations of text files know how to deal
with that "problem".

> In DOS and Windows, I suppose we could go by filetype (e.g. ".TXT" is text),
> but of course it is not reliable.  

For example, German versions of Microsoft Word 5 used the extension
.TXT instead of .DOC for their (binary) document files.

> When transferring between LIKE systems, ZIPping is a good way to solve this
> problem, but otherwise it is likely to introduce more problems (corrupted
> files) than it solves.

I agree.

Kai Uwe

--
/* Kai Uwe Rommel      ARS Computer & Consulting GmbH, Muenchen, Germany *
 * rommel@ars.de             CompuServe 100265,2651, Fax +49 89 324 4524 *
 * rommel@leo.org (ftp://ftp.leo.org/pub/comp/os/os2 maintenance)        */

DOS ... is still a real mode only non-reentrant interrupt
handler, and always will be.                -Russell Williams

27-Aug-96 18:25:10-GMT,3759;000000000011
Received: from haven.uchicago.edu (haven.uchicago.edu [128.135.12.3]) by watsun.cc.columbia.edu (8.7.5/8.7.3) with ESMTP id OAA25239 for <fdc@watsun.cc.columbia.edu>; Tue, 27 Aug 1996 14:25:08 -0400 (EDT)
Received: from ellis.uchicago.edu (roe2@ellis.uchicago.edu [128.135.12.62]) by haven.uchicago.edu (8.7.5/8.7.3) with ESMTP id NAA26159; Tue, 27 Aug 1996 13:24:12 -0500 (CDT)
Received: (from roe2@localhost) by ellis.uchicago.edu (8.7.1/8.7.2) id NAA06405; Tue, 27 Aug 1996 13:26:17 -0500 (CDT)
Date: Tue, 27 Aug 1996 13:26:17 -0500 (CDT)
From: Cave Newt <roe2@midway.uchicago.edu>
Message-Id: <199608271826.NAA06405@ellis.uchicago.edu>
To: Zip-Bugs@LISTS.WKU.EDU, fdc@watsun.cc.columbia.edu, jaltman@columbia.edu
Subject: Re: Info-ZIP and C-Kermit
Cc: JCHBN@CUVMB.CC.COLUMBIA.EDU, JRD@cc.usu.edu

> Thanks, Kai Uwe (and hello :-)

Hi (and hi to Joe--we had e-mail years ago, but I don't remember what about),

> There are several problems with this scheme, though.  My greatest quibble
> with the ZIP approach is that the results are not truly transportable.
> If I have a directory containing a mixture of text and binary files, and I
> ZIP it, and then I transfer the ZIP file to another computer that uses a
> different text-file format, then either the binary files or the text files
> will be in the wrong format.

Not so.  As long as you can specify accurately which is which, UnZip can
do translations on the fly.  (Of course, since I currently have Kermit
pegged as the world's second most portable program and UnZip as third,
I'm assuming you have a few ports that we don't, and those could be a
problem.)

Note that Zip currently *tries* to identify text vs. binary, but it fails
every now and then (especially on Windoze DLLs--it's only about 70% accurate
there, vs. maybe 97% on most files--at least for me).  If you choose to make
use of Info-ZIP code directly, that's one area where you'd want to modify
the existing code.

> VMS lets us do this by inspecting the
> record format (but even then, we are sometimes fooled -- e.g. VMS ZIP files
> have a text-type record format, Stream_LF).

Hmm, I thought we had switched to fixed-512 for zipfiles.  In any case,
stream-anything is pretty rare in VMS and could be treated the same way
as DOS or Unix.

> We do have the "file" program in UNIX, but it guesses wrong more often than
> not.

Depends on which version you're using...  Darwin's is pretty good (if I
do say so myself :-) ).

> When transferring between LIKE systems, ZIPping is a good way to solve this
> problem, but otherwise it is likely to introduce more problems (corrupted
> files) than it solves.

Erm...well, nothing personal, but Kermit's "assume it's text unless told
otherwise" behavior has introduced plenty of problems and corrupted files
itself...

> I'm not sure that this is the right way to solve the problem, because it
> assumes that every other file system in the world can map to the UNIX file
> system (a structural issue), and on a more simplistic level, that directory
> names can never contain slashes (or if they can, then we must introduce a new
> kind of quoting rule for pathnames).

So far, the Unix filesystem is *the* most encompassing of any we've 
encountered.  The / issue does come up on Acorn RISC OS, but one could
flag those filenames specially and swap with dots (which are what they
use for directory separators).  The real problem comes when you move a
rich directory system to a lousy (DOS) or non-existent (TOPS-20) one
and have to deal with all sorts of translations, truncations and name-
collisions.  One way or the other, user input is necessary then.

--
Greg Roelofs              "Name an animal that's small and fuzzy."  "Mold."
newt@pobox.com     or     http://pobox.com/~newt/

27-Aug-96 18:44:18-GMT,1418;000000000011
Received: from GRUMPY.USU.EDU (grumpy.usu.edu [129.123.1.86]) by watsun.cc.columbia.edu (8.7.5/8.7.3) with ESMTP id OAA02418 for <FDC@WATSUN.CC.COLUMBIA.EDU>; Tue, 27 Aug 1996 14:44:17 -0400 (EDT)
Received: from cc.usu.edu by cc.usu.edu (PMDF V5.0-5 #11556)
 id <01I8RR0YMB0WCE2FIO@cc.usu.edu>; Tue, 27 Aug 1996 12:43:37 -0600 (MDT)
Date: Tue, 27 Aug 1996 12:43:37 -0600 (MDT)
From: Joe Doupnik <JRD@cc.usu.edu>
Subject: Re: Info-ZIP and C-Kermit
To: ROE2@MIDWAY.UCHICAGO.EDU
Cc: ZIP-BUGS@LISTS.WKU.EDU, FDC@WATSUN.CC.COLUMBIA.EDU,
        JCHBN@CUVMB.CC.COLUMBIA.EDU
Message-id: <01I8RR0YMESICE2FIO@cc.usu.edu>
X-VMS-To: ROE2@MIDWAY.UCHICAGO.EDU
X-VMS-Cc: 
 ZIP-BUGS@LISTS.WKU.EDU,FDC@WATSUN.CC.COLUMBIA.EDU,JCHBN@CUVMB.CC.COLUMBIA.EDU,JRD
MIME-version: 1.0
Content-type: TEXT/PLAIN; CHARSET=US-ASCII
Content-transfer-encoding: 7BIT

	Well, by everyone's admission filenames alone are insufficient to 
distinguish binary from text files. Looking at the first N bytes (whether
two for MZ or 512) is insufficient also (lines don't have to break at 80
columns). On machines with rich attributes those attributes are insufficient.
In short, what's text and what's binary is "in the eyes of the beholder",
and a particular beholder isn't predictable (or fixed) in advance.
	The conclusion is glaringly obvious, isn't it. People must choose 
because machines cannot. And that, I believe, is the end of this story.
	Joe D.

27-Aug-96 18:54:37-GMT,4069;000000000000
Received: (from fdc@localhost) by watsun.cc.columbia.edu (8.7.5/8.7.3) id OAA03531; Tue, 27 Aug 1996 14:50:13 -0400 (EDT)
Date: Tue, 27 Aug 96 14:50:12 EDT
From: Frank da Cruz <fdc@watsun.cc.columbia.edu>
To: Cave Newt <roe2@midway.uchicago.edu>
Cc: Zip-Bugs@LISTS.WKU.EDU, jaltman@columbia.edu, JCHBN@CUVMB.CC.COLUMBIA.EDU,
        JRD@cc.usu.edu
Subject: Re: Info-ZIP and C-Kermit
In-Reply-To: Your message of Tue, 27 Aug 1996 13:26:17 -0500 (CDT)
Message-ID: <CMM.0.90.4.841171812.fdc@watsun.cc.columbia.edu>

> Not so.  As long as you can specify accurately which is which, UnZip can
> do translations on the fly.  (Of course, since I currently have Kermit
> pegged as the world's second most portable program and UnZip as third,
> I'm assuming you have a few ports that we don't, and those could be a
> problem.)
> 
> Note that Zip currently *tries* to identify text vs. binary, but it fails
> every now and then (especially on Windoze DLLs--it's only about 70% accurate
> there, vs. maybe 97% on most files--at least for me).  If you choose to make
> use of Info-ZIP code directly, that's one area where you'd want to modify
> the existing code.
> 
But even if it tries (and misses sometimes) in the DOS/Windows environment,
what about UNIX, OS-9, QNX, AOS/VS, VOS, VM/CMS, MVS/TSO, CICS, RT-11, ... ?
There is no way on earth to embody all that knowledge in one piece of
software.  And then add in the fact that you might be looking at foreign
files rather than domestic ones.  It's hopeless with any degree of reliability.

(In the latest version of C-Kermit, we let the user build lists of files to
be transferred -- the list can be any length at all, and each member of the
list can be tagged as binary or text.)
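
The idea of a send list with a per-file mode tag can be sketched as follows.
This is an illustration only: the list layout, the names, and the choice of
CRLF as the on-the-wire line terminator for text are assumptions made for the
example, and the sending side is assumed to be a Unix-like system.  Character
set conversion, which text mode would also involve, is left out.

```python
from typing import List, Tuple

# Hypothetical send list: each entry pairs a filename with a transfer mode.
SendList = List[Tuple[str, str]]


def to_wire(data: bytes, mode: str) -> bytes:
    """Prepare file contents for sending, according to the file's tag."""
    if mode == "binary":
        # Binary means: no conversions of any kind.
        return data
    if mode == "text":
        # Normalize first so existing CRLF pairs are not doubled.
        return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")
    raise ValueError("unknown mode: " + mode)


send_list: SendList = [("readme.txt", "text"), ("archive.zip", "binary")]
for name, mode in send_list:
    print(name, mode, to_wire(b"line one\nline two\n", mode))
```

The person building the list makes the text/binary decision for each file, so
no guessing is needed at transfer time.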

> > When transferring between LIKE systems, ZIPping is a good way to solve this
> > problem, but otherwise it is likely to introduce more problems (corrupted
> > files) than it solves.
> 
> Erm...well, nothing personal, but Kermit's "assume it's text unless told
> otherwise" behavior has introduced plenty of problems and corrupted files
> itself...
> 
Exactly the point -- if you assume text, you corrupt the binary files; if you
assume binary, you corrupt the text files.  Kermit 95, by the way, uses binary
by default, since most Windows users are transferring only ZIP, GIF, or JPG
files.  But the real issue is what to do about a mixture of file types.

Many people say it is better to just always transfer in binary mode, because
at least then you don't corrupt the binary files (ZIP, GIF, tar.gz, etc), and
even though you do corrupt the text files, the corruption is recoverable by
knowledgeable people.  I don't share that point of view, however, because (a)
an ever-increasing percentage of computer users are *not* knowledgeable, and
(b) text-file transfer converts not only the record format, but also the
character-sets, a consideration that most English speakers (or non-users of
IBM mainframes) tend to overlook.

> So far, the Unix filesystem is *the* most encompassing of any we've 
> encountered.  The / issue does come up on Acorn RISC OS, but one could
> flag those filenames specially and swap with dots (which are what they
> use for directory separators).  The real problem comes when you move a
> rich directory system to a lousy (DOS) or non-existent (TOPS-20) one
>
Hey!  TOPS-20 was just like VMS.  Nonexistent would be HP-1000 or OS/360 :-)
(although granted, TOPS-20 does not exist *now* except in Mark Crispin's
basement...)

> and have to deal with all sorts of translations, truncations and name-
> collisions.  One way or the other, user input is necessary then.
> 
Right.  That's why this is such a hard problem -- and why we've been putting
it off for so long :-)

Maybe the UNIX model is adequate in the sense it is a superset of all others,
but I haven't done a broad enough study to reach that conclusion yet.  For
example, are there other kinds of directory organizations that are more
complex than a simple tree?  Certainly there are file organizations that are
WAY more complex than a stream.

- Frank

27-Aug-96 19:01:41-GMT,2345;000000000001
Received: (from fdc@localhost) by watsun.cc.columbia.edu (8.7.5/8.7.3) id PAA07527; Tue, 27 Aug 1996 15:00:52 -0400 (EDT)
Date: Tue, 27 Aug 96 15:00:52 EDT
From: Frank da Cruz <fdc@watsun.cc.columbia.edu>
To: Joe Doupnik <JRD@cc.usu.edu>
Cc: ROE2@MIDWAY.UCHICAGO.EDU, ZIP-BUGS@LISTS.WKU.EDU,
        JCHBN@CUVMB.CC.COLUMBIA.EDU
Subject: Re: Info-ZIP and C-Kermit
In-Reply-To: Your message of Tue, 27 Aug 1996 12:43:37 -0600 (MDT)
Message-ID: <CMM.0.90.4.841172452.fdc@watsun.cc.columbia.edu>

> 	Well, by everyone's admission filenames alone are insufficient to 
> distinguish binary from text files. Looking at the first N bytes (whether
> two for MZ or 512) is insufficient also (lines don't have to break at 80
> columns). On machines with rich attributes those attributes are insufficient.
> In short, what's text and what's binary is "in the eyes of the beholder",
> and a particular beholder isn't predictable (or fixed) in advance.
> 	The conclusion is glaringly obvious, isn't it. People must choose 
> because machines cannot. And that, I believe, is the end of this story.
> 
Absolutely, at least for most operating systems.  Personally, I think it would
be interesting to design a new file system in which files could be tagged as
text or binary -- by the application that created them, and also (as an
override) by their (human) owners.  When files were text, I think it would
also be a great boon to humankind if they could be tagged as to the character
set of encoding.  To my knowledge, no file system has ever provided such a
service in the file system itself.  "Binary" would mean, simply, that if the
file is to be transferred to another computer, no conversions should be done.
Conversely, "text" would mean that record-format and character-set conversions
should be done.

To reinforce Joe's point about inspection: a long time ago, somebody wrote a
Kermit program that decided if a file was text or binary based on whether it
contained any bytes with the 8th bit turned on.  Somebody else checked for lots
of control characters.  None of this works when a text file is Latin-1, CP850,
UNICODE, Hebrew, Russian, Japanese, etc.  On the other hand, I have seen Intel
executables composed of only printable ASCII characters.  (We have one in the
Kermit archives somewhere but I can't put my finger on it at the moment...)
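
The two inspection heuristics mentioned above can be sketched in a few lines,
along with inputs that defeat them.  This is a reconstruction for illustration,
not the code of either Kermit program; the function names, the 5% threshold,
and the sample data are all assumptions.

```python
def has_high_bit(data: bytes) -> bool:
    """True if any byte has the 8th bit set."""
    return any(b & 0x80 for b in data)


def control_fraction(data: bytes) -> float:
    """Fraction of bytes that are control characters other than TAB/LF/FF/CR."""
    allowed = {0x09, 0x0A, 0x0C, 0x0D}
    if not data:
        return 0.0
    ctrl = sum(1 for b in data if b < 0x20 and b not in allowed)
    return ctrl / len(data)


def guess_is_text(data: bytes, max_control: float = 0.05) -> bool:
    """Guess 'text' only for 7-bit data with few control characters."""
    return not has_high_bit(data) and control_fraction(data) <= max_control


plain = b"hello, world\r\n"
latin1 = "Gr\u00fc\u00dfe aus M\u00fcnchen\n".encode("latin-1")
printable_only = b"XYZ0123PQR"  # stands in for an all-printable executable

print(guess_is_text(plain))           # plain ASCII text
print(guess_is_text(latin1))          # Latin-1 text, misjudged as binary
print(guess_is_text(printable_only))  # judged text whatever it really is
```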

- Frank

27-Aug-96 14:33:19-GMT,2733;000000000001
Received: from ars.de (firewall.ars.de [194.97.120.113]) by watsun.cc.columbia.edu (8.7.5/8.7.3) with SMTP id KAA12640 for <fdc@WATSUN.CC.COLUMBIA.EDU>; Tue, 27 Aug 1996 10:33:16 -0400 (EDT)
Received: from jonas.ars.de by ars.de (IBM OS/2 SENDMAIL VERSION 1.3.17/3.0ars)
	  id AA0160; Tue, 27 Aug 96 16:32:34 +0200
Received: by internal-host.ars.de (IBM OS/2 SENDMAIL VERSION 1.3.17/3.0ars)
	  id AA0489; Tue, 27 Aug 96 16:31:00 +0200
Message-Id: <9608271431.AA0489@internal-host.ars.de>
Date: Tue, 27 Aug 96 16:30:59 +0100
From: Kai Uwe Rommel <rommel@ars.de>
Subject: Re: Info-ZIP and C-Kermit
To: Zip-Bugs@LISTS.WKU.EDU, fdc@WATSUN.CC.COLUMBIA.EDU
In-Reply-To: <CMM.0.90.4.841090919.fdc@watsun.cc.columbia.edu> from "Frank da Cruz" at Aug 26 96 4:21 pm
X-Mailer: ELM [version 2.3 PL11] for OS/2

You (Frank da Cruz) wrote:
> Thanks, Kai Uwe (and hello :-)

Hi!

> There are several problems with this scheme, though.  My greatest quibble
> with the ZIP approach is that the results are not truly transportable.
> If I have a directory containing a mixture of text and binary files, and I
> ZIP it, and then I transfer the ZIP file to another computer that uses a
> different text-file format, then either the binary files or the text files
> will be in the wrong format.
> 
> Obviously, Kermit has the same problem.  The real solution here is to be able
> for the software (ZIP or Kermit) to determine automatically, for each file,
> whether it is text or binary.  But in most common settings, that would be
> impossible -- DOS, Windows, UNIX, etc.  VMS lets us do this by inspecting the
> record format (but even then, we are sometimes fooled -- e.g. VMS ZIP files
> have a text-type record format, Stream_LF).

My personal opinion is: DON'T DO text mode transmissin/conversion.
Leave everything as is. Those who have to transfer texts between
systems with different representations of text files know how to deal
with that "problem".

> In DOS and Windows, I suppose we could go by filetype (e.g. ".TXT" is text),
> but of course it is not reliable.  

For example, german versions of Microsoft Word 5 used the extension
.TXT instead of .DOC for their (binary) document files.

> When transferring between LIKE systems, ZIPping is a good way to solve this
> problem, but otherwise it is likely to introduce more problems (corrupted
> files) than it solves.

I agree.

Kai Uwe

--
/* Kai Uwe Rommel      ARS Computer & Consulting GmbH, Muenchen, Germany *
 * rommel@ars.de             CompuServe 100265,2651, Fax +49 89 324 4524 *
 * rommel@leo.org (ftp://ftp.leo.org/pub/comp/os/os2 maintenance)        */

DOS ... is still a real mode only non-reentrant interrupt
handler, and always will be.                -Russell Williams

27-Aug-96 18:25:10-GMT,3759;000000000011
Received: from haven.uchicago.edu (haven.uchicago.edu [128.135.12.3]) by watsun.cc.columbia.edu (8.7.5/8.7.3) with ESMTP id OAA25239 for <fdc@watsun.cc.columbia.edu>; Tue, 27 Aug 1996 14:25:08 -0400 (EDT)
Received: from ellis.uchicago.edu (roe2@ellis.uchicago.edu [128.135.12.62]) by haven.uchicago.edu (8.7.5/8.7.3) with ESMTP id NAA26159; Tue, 27 Aug 1996 13:24:12 -0500 (CDT)
Received: (from roe2@localhost) by ellis.uchicago.edu (8.7.1/8.7.2) id NAA06405; Tue, 27 Aug 1996 13:26:17 -0500 (CDT)
Date: Tue, 27 Aug 1996 13:26:17 -0500 (CDT)
From: Cave Newt <roe2@midway.uchicago.edu>
Message-Id: <199608271826.NAA06405@ellis.uchicago.edu>
To: Zip-Bugs@LISTS.WKU.EDU, fdc@watsun.cc.columbia.edu, jaltman@columbia.edu
Subject: Re: Info-ZIP and C-Kermit
Cc: JCHBN@CUVMB.CC.COLUMBIA.EDU, JRD@cc.usu.edu

> Thanks, Kai Uwe (and hello :-)

Hi (and hi to Joe--we had e-mail years ago, but I don't remember what about),

> There are several problems with this scheme, though.  My greatest quibble
> with the ZIP approach is that the results are not truly transportable.
> If I have a directory containing a mixture of text and binary files, and I
> ZIP it, and then I transfer the ZIP file to another computer that uses a
> different text-file format, then either the binary files or the text files
> will be in the wrong format.

Not so.  As long as you can specify accurately which is which, UnZip can
do translations on the fly.  (Of course, since I currently have Kermit
pegged as the world's second most portable program and UnZip as third,
I'm assuming you have a few ports that we don't, and those could be a
problem.)

Note that Zip currently *tries* to identify text vs. binary, but it fails
every now and then (especially on Windoze DLLs--it's only about 70% accurate
there, vs. maybe 97% on most files--at least for me).  If you choose to make
use of Info-ZIP code directly, that's one area where you'd want to modify
the existing code.

> VMS lets us do this by inspecting the
> record format (but even then, we are sometimes fooled -- e.g. VMS ZIP files
> have a text-type record format, Stream_LF).

Hmm, I thought we had switched to fixed-512 for zipfiles.  In any case,
stream-anything is pretty rare in VMS and could be treated the same way
as DOS or Unix.

> We do have the "file" program in UNIX, but it guesses wrong more often than
> not.

Depends on which version you're using...  Darwin's is pretty good (if I
do say so myself :-) ).

> When transferring between LIKE systems, ZIPping is a good way to solve this
> problem, but otherwise it is likely to introduce more problems (corrupted
> files) than it solves.

Erm...well, nothing personal, but Kermit's "assume it's text unless told
otherwise" behavior has introduced plenty of problems and corrupted files
itself...

> I'm not sure that this is the right way to solve the problem, because it
> assumes that every other file system in the world can map to the UNIX file
> system (a structural issue), and on a more simplicistic level, that directory
> names can never contain slashes (or if they can, then we must introduce a new
> kind of quoting rule for pathnames).

So far, the Unix filesystem is *the* most encompassing of any we've 
encountered.  The / issue does come up on Acorn RISC OS, but one could
flag those filenames specially and swap with dots (which are what they
use for directory separators).  The real problem comes when you move a
rich directory system to a lousy (DOS) or non-existent (TOPS-20) one
and have to deal with all sorts of translations, truncations and name-
collisions.  One way or the other, user input is necessary then.

--
Greg Roelofs              "Name an animal that's small and fuzzy."  "Mold."
newt@pobox.com     or     http://pobox.com/~newt/

27-Aug-96 18:44:18-GMT,1418;000000000011
Received: from GRUMPY.USU.EDU (grumpy.usu.edu [129.123.1.86]) by watsun.cc.columbia.edu (8.7.5/8.7.3) with ESMTP id OAA02418 for <FDC@WATSUN.CC.COLUMBIA.EDU>; Tue, 27 Aug 1996 14:44:17 -0400 (EDT)
Received: from cc.usu.edu by cc.usu.edu (PMDF V5.0-5 #11556)
 id <01I8RR0YMB0WCE2FIO@cc.usu.edu>; Tue, 27 Aug 1996 12:43:37 -0600 (MDT)
Date: Tue, 27 Aug 1996 12:43:37 -0600 (MDT)
From: Joe Doupnik <JRD@cc.usu.edu>
Subject: Re: Info-ZIP and C-Kermit
To: ROE2@MIDWAY.UCHICAGO.EDU
Cc: ZIP-BUGS@LISTS.WKU.EDU, FDC@WATSUN.CC.COLUMBIA.EDU,
        JCHBN@CUVMB.CC.COLUMBIA.EDU
Message-id: <01I8RR0YMESICE2FIO@cc.usu.edu>
X-VMS-To: ROE2@MIDWAY.UCHICAGO.EDU
X-VMS-Cc: 
 ZIP-BUGS@LISTS.WKU.EDU,FDC@WATSUN.CC.COLUMBIA.EDU,JCHBN@CUVMB.CC.COLUMBIA.EDU,JRD
MIME-version: 1.0
Content-type: TEXT/PLAIN; CHARSET=US-ASCII
Content-transfer-encoding: 7BIT

	Well, by everyone's admission filenames alone are insufficient to 
distinguish binary from text files. Looking at the first N bytes (whether
two for MZ or 512) is insufficient also (lines don't have to break at 80
columns). On machines with rich attributes those attributes are insufficient.
In short, what's text and what's binary is "in the eyes of the beholder",
and a particular beholder isn't predictable (or fixed) in advance.
	The conclusion is glaringly obvious, isn't it. People must choose 
because machines cannot. And that, I believe, is the end of this story.
	Joe D.

27-Aug-96 18:54:37-GMT,4069;000000000001
Received: (from fdc@localhost) by watsun.cc.columbia.edu (8.7.5/8.7.3) id OAA03531; Tue, 27 Aug 1996 14:50:13 -0400 (EDT)
Date: Tue, 27 Aug 96 14:50:12 EDT
From: Frank da Cruz <fdc@watsun.cc.columbia.edu>
To: Cave Newt <roe2@midway.uchicago.edu>
Cc: Zip-Bugs@LISTS.WKU.EDU, jaltman@columbia.edu, JCHBN@CUVMB.CC.COLUMBIA.EDU,
        JRD@cc.usu.edu
Subject: Re: Info-ZIP and C-Kermit
In-Reply-To: Your message of Tue, 27 Aug 1996 13:26:17 -0500 (CDT)
Message-ID: <CMM.0.90.4.841171812.fdc@watsun.cc.columbia.edu>

> Not so.  As long as you can specify accurately which is which, UnZip can
> do translations on the fly.  (Of course, since I currently have Kermit
> pegged as the world's second most portable program and UnZip as third,
> I'm assuming you have a few ports that we don't, and those could be a
> problem.)
> 
> Note that Zip currently *tries* to identify text vs. binary, but it fails
> every now and then (especially on Windoze DLLs--it's only about 70% accurate
> there, vs. maybe 97% on most files--at least for me).  If you choose to make
> use of Info-ZIP code directly, that's one area where you'd want to modify
> the existing code.
> 
But even if it tries (and misses sometimes) in the DOS/Windows environment,
what about UNIX, OS-9, QNX, AOS/VS, VOS, VM/CMS, MVS/TSO, CICS, RT-11, ... ?
There is no way on earth to embody all that knowledge in one piece of
software.  And then add in the fact that you might be looking at foreign
files rather than domestic ones.  It's hopeless with any degree of reliability.

(In the latest version of C-Kermit, we let the user build lists of files to
be transferred -- the list can be any length at all, and each member of the
list can be tagged as binary or text.)

> > When transferring between LIKE systems, ZIPping is a good way to solve this
> > problem, but otherwise it is likely to introduce more problems (corrupted
> > files) than it solves.
> 
> Erm...well, nothing personal, but Kermit's "assume it's text unless told
> otherwise" behavior has introduced plenty of problems and corrupted files
> itself...
> 
Exactly the point -- if you assume text, you corrupt the binary files; if you
assume binary, you corrupt the text files.  Kermit 95, by the way, uses binary
by default, since most Windows users are transferring only ZIP, GIF, or JPG
files.  But the real issue is what to do about a mixture of file types.

Many people say it is better to just always transfer in binary mode, because
at least then you don't corrupt the binary files (ZIP, GIF, tar.gz, etc), and
even though you do corrupt the text files, the corruption is recoverable by
knowledgeable people.  I don't share that point of view, however, because (a)
an ever-increasing percentage of computer users are *not* knowledgeable, and
(b) text-file transfer converts not only the record format, but also the
character-sets, a consideration that most English speakers (or non-users of
IBM mainframes) tend to overlook.

> So far, the Unix filesystem is *the* most encompassing of any we've 
> encountered.  The / issue does come up on Acorn RISC OS, but one could
> flag those filenames specially and swap with dots (which are what they
> use for directory separators).  The real problem comes when you move a
> rich directory system to a lousy (DOS) or non-existent (TOPS-20) one
>
Hey!  TOPS-20 was just like VMS.  Nonexistent would be HP-1000 or OS/360 :-)
(although granted, TOPS-20 does not exist *now* except in Mark Crispin's
basement...)

> and have to deal with all sorts of translations, truncations and name-
> collisions.  One way or the other, user input is necessary then.
> 
Right.  That's why this is such a hard problem -- and why we've been putting
it off for so long :-)

Maybe the UNIX model is adequate in the sense it is a superset of all others,
but I haven't done a broad enough study to reach that conclusion yet.  For
example, are there other kinds of directory organizations that are more
complex than a simple tree?  Certainly there are file organizations that are
WAY more complex than a stream.

- Frank

27-Aug-96 19:01:41-GMT,2345;000000000001
Received: (from fdc@localhost) by watsun.cc.columbia.edu (8.7.5/8.7.3) id PAA07527; Tue, 27 Aug 1996 15:00:52 -0400 (EDT)
Date: Tue, 27 Aug 96 15:00:52 EDT
From: Frank da Cruz <fdc@watsun.cc.columbia.edu>
To: Joe Doupnik <JRD@cc.usu.edu>
Cc: ROE2@MIDWAY.UCHICAGO.EDU, ZIP-BUGS@LISTS.WKU.EDU,
        JCHBN@CUVMB.CC.COLUMBIA.EDU
Subject: Re: Info-ZIP and C-Kermit
In-Reply-To: Your message of Tue, 27 Aug 1996 12:43:37 -0600 (MDT)
Message-ID: <CMM.0.90.4.841172452.fdc@watsun.cc.columbia.edu>

> 	Well, by everyone's admission filenames alone are insufficient to 
> distinguish binary from text files. Looking at the first N bytes (whether
> two for MZ or 512) is insufficient also (lines don't have to break at 80
> columns). On machines with rich attributes those attributes are insufficient.
> In short, what's text and what's binary is "in the eyes of the beholder",
> and a particular beholder isn't predictable (or fixed) in advance.
> 	The conclusion is glaringly obvious, isn't it. People must choose 
> because machines cannot. And that, I believe, is the end of this story.
> 
Absolutely, at least for most operating systems.  Personally, I think it would
be interesting to design a new file system in which files could be tagged as
text or binary -- by the application that created them, and also (as an
override) by their (human) owners.  When files were text, I think it would
also be a great boon to humankind if they could be tagged as to the character
set of encoding.  To my knowledge, no file system has ever provided such a
service in the file system itself.  "Binary" would mean, simply, that if the
file is to be transferred to another computer, no conversions should be done.
Conversely, "text" would mean that record-format and character-set conversions
should be done.

To reinforce Joe's point about inspection: a long time ago, somebody wrote a
Kermit program that decided if a file was text or binary based on whether it
contained any bytes with the 8th bit turned on.  Somebody else checked for lots
of control characters.  None of this works when a text file is Latin-1, CP850,
UNICODE, Hebrew, Russian, Japanese, etc.  On the other hand, I have seen Intel
executables composed of only printable ASCII characters.  (We have one in the
Kermit archives somewhere but I can't put my finger on it at the moment...)
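
A minimal sketch of the two inspection heuristics just described, to show
where they go wrong.  The function names, the 10% threshold, and the sample
strings are illustrative assumptions, not taken from any actual Kermit
program.

```python
def looks_binary_by_high_bit(data: bytes) -> bool:
    """Heuristic 1: call the file binary if any byte has the 8th bit set."""
    return any(b & 0x80 for b in data)


def looks_binary_by_control_chars(data: bytes, threshold: float = 0.10) -> bool:
    """Heuristic 2: call the file binary if it has 'lots of' control characters.

    The threshold is an arbitrary assumption; tab, LF, FF and CR are
    treated as ordinary text formatting.
    """
    if not data:
        return False
    allowed = {0x09, 0x0A, 0x0C, 0x0D}
    controls = sum(1 for b in data if b < 0x20 and b not in allowed)
    return controls / len(data) > threshold


ascii_text = b"plain ASCII text\r\n"
latin1_text = b"Se\xf1or M\xfcller\r\n"   # Latin-1 text with two accented letters

print(looks_binary_by_high_bit(ascii_text))         # False
print(looks_binary_by_high_bit(latin1_text))        # True: text misjudged as binary
print(looks_binary_by_control_chars(latin1_text))   # False
```

The first heuristic rejects any 8-bit text, and both would accept an
executable made up only of printable ASCII, which is the failure in the
other direction.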

- Frank

27-Aug-96 22:35:23-GMT,3278;000000000011
Received: from haven.uchicago.edu (haven.uchicago.edu [128.135.12.3]) by watsun.cc.columbia.edu (8.7.5/8.7.3) with ESMTP id SAA23564 for <fdc@watsun.cc.columbia.edu>; Tue, 27 Aug 1996 18:35:22 -0400 (EDT)
Received: from ellis.uchicago.edu (roe2@ellis.uchicago.edu [128.135.12.62]) by haven.uchicago.edu (8.7.5/8.7.3) with ESMTP id RAA10648; Tue, 27 Aug 1996 17:29:22 -0500 (CDT)
Received: (from roe2@localhost) by ellis.uchicago.edu (8.7.1/8.7.2) id RAA04577; Tue, 27 Aug 1996 17:31:32 -0500 (CDT)
Date: Tue, 27 Aug 1996 17:31:32 -0500 (CDT)
From: Cave Newt <roe2@midway.uchicago.edu>
Message-Id: <199608272231.RAA04577@ellis.uchicago.edu>
To: JRD@cc.usu.edu, fdc@watsun.cc.columbia.edu
Subject: Re: Info-ZIP and C-Kermit
Cc: JCHBN@CUVMB.CC.COLUMBIA.EDU, zip-bugs@lists.wku.edu

[I got interrupted in writing this, and now I see there are other
 messages in the thread, so apologies if there's some duplication.]

> Kermit 95, by the way, uses binary
> by default, since most Windows users are transferring only ZIP, GIF, or JPG
> files.

Oh, cool.  I'll have to make note of that in our docs.

> Many people say it is better to just always transfer in binary mode, because
> at least then you don't corrupt the binary files (ZIP, GIF, tar.gz, etc), and
> even though you do corrupt the text files, the corruption is recoverable by
> knowledgeable people.  I don't share that point of view, however, because (a)
> an ever-increasing percentage of computer users are *not* knowledgeable, and
> (b) text-file transfer converts not only the record format, but also the
> character-sets, a consideration that most English speakers (or non-users of
> IBM mainframes) tend to overlook.

I don't consider non-conversion of text files to be "corruption"; it's
simply non-translation (or deferred translation, if you prefer).  On
the contrary, translation from an 8-bit (or more) character set to 7-bit
(or even to another 8-bit set) can introduce irreversible corruption
simply due to the fact that there's no one-to-one mapping from the first
set into the second.  You can never win in that regard; you simply do
your best and hope it's good enough.
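
A small illustration of the point about irreversible translation, using
Python's built-in codecs.  The sample string and the choice of '?' as the
substitute are assumptions for the sketch; a real translation table might
substitute something else, but the loss is the same.

```python
original = "na\u00efve caf\u00e9"   # contains i-diaeresis and e-acute

# Latin-1 has both characters, so this round trip is lossless.
latin1_bytes = original.encode("latin-1")
assert latin1_bytes.decode("latin-1") == original

# 7-bit ASCII has neither; each one is replaced by '?', and nothing in the
# result says which character used to be there.
ascii_bytes = original.encode("ascii", errors="replace")
round_tripped = ascii_bytes.decode("ascii")

print(round_tripped)              # na?ve caf?
print(round_tripped == original)  # False
```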

(Related anecdote:  there's no way for a text-mode utility to do such
translations under Linux since X provides a Latin-1 character set and
Linux virtual consoles [I'm pretty sure] use the IBM PC character set.
One or the other will always be wrong for those using extended/foreign 
characters.)

> Hey!  TOPS-20 was just like VMS.

Similar, but my understanding is that (1) there's only one level of 
subdirectories, and (2) you have to specify how many files will go 
into it when you create it.  Perhaps I'm mistaken about (1)?  Frank
somebody from WSMR was our source of info on TOPS-20.

> (although granted, TOPS-20 does not exist *now* except in Mark Crispin's
> basement...)

CompuServe and White Sands both canned theirs?

> For example, are there other kinds of directory organizations that are more
> complex than a simple tree?

I'm not aware of any (unless you think of VMS indexed-whatever files as
a weird kind of directory system).  There can be extra attributes associated
with directories, but zipfiles (can) store directories as separate entries
with separate "resource forks" of essentially arbitrary size.

Greg

27-Aug-96 22:49:14-GMT,1948;000000000001
Received: from haven.uchicago.edu (haven.uchicago.edu [128.135.12.3]) by watsun.cc.columbia.edu (8.7.5/8.7.3) with ESMTP id SAA25251 for <fdc@watsun.cc.columbia.edu>; Tue, 27 Aug 1996 18:49:13 -0400 (EDT)
Received: from ellis.uchicago.edu (roe2@ellis.uchicago.edu [128.135.12.62]) by haven.uchicago.edu (8.7.5/8.7.3) with ESMTP id RAA11228; Tue, 27 Aug 1996 17:46:02 -0500 (CDT)
Received: (from roe2@localhost) by ellis.uchicago.edu (8.7.1/8.7.2) id RAA06017; Tue, 27 Aug 1996 17:46:43 -0500 (CDT)
Date: Tue, 27 Aug 1996 17:46:43 -0500 (CDT)
From: Cave Newt <roe2@midway.uchicago.edu>
Message-Id: <199608272246.RAA06017@ellis.uchicago.edu>
To: JRD@cc.usu.edu, fdc@watsun.cc.columbia.edu
Subject: Re: Info-ZIP and C-Kermit
Cc: JCHBN@CUVMB.CC.COLUMBIA.EDU, zip-bugs@lists.wku.edu

> Absolutely, at least for most operating systems.  Personally, I think it would
> be interesting to design a new file system in which files could be tagged as
> text or binary -- by the application that created them, and also (as an
> override) by their (human) owners.  When files were text, I think it would
> also be a great boon to humankind if they could be tagged as to the character
> set of encoding.  To my knowledge, no file system has ever provided such a
> service in the file system itself.  "Binary" would mean, simply, that if the
> file is to be transferred to another computer, no conversions should be done.
> Conversely, "text" would mean that record-format and character-set conversions
> should be done.

Amen.  Please feel free to design it, patent it, sell it, whatever it takes.
That would be a dream file system for Zip/UnZip.

> On the other hand, I have seen Intel
> executables composed of only printable ASCII characters.  (We have one in the
> Kermit archives somewhere but I can't put my finger on it at the moment...)

Uuencode.  It's distributed as part of the c.b.i.p starter kit regularly
(or formerly regularly).  Very cute.

Greg

28-Aug-96  5:52:44-GMT,2136;000000000011
Received: from ars.de (firewall.ars.de [194.97.120.113]) by watsun.cc.columbia.edu (8.7.5/8.7.3) with SMTP id BAA26458 for <fdc@WATSUN.CC.COLUMBIA.EDU>; Wed, 28 Aug 1996 01:52:43 -0400 (EDT)
Received: from jonas.ars.de by ars.de (IBM OS/2 SENDMAIL VERSION 1.3.17/3.0ars)
	  id AA0151; Wed, 28 Aug 96 07:51:07 +0200
Received: by internal-host.ars.de (IBM OS/2 SENDMAIL VERSION 1.3.17/3.0ars)
	  id AA1296; Wed, 28 Aug 96 07:51:05 +0200
Message-Id: <9608280551.AA1296@internal-host.ars.de>
Date: Wed, 28 Aug 96 7:51:04 +0100
From: Kai Uwe Rommel <rommel@ars.de>
Subject: Re: Info-ZIP and C-Kermit
To: fdc@WATSUN.CC.COLUMBIA.EDU
Cc: roe2@midway.uchicago.edu, Zip-Bugs@LISTS.WKU.EDU, jaltman@columbia.edu,
        JCHBN@CUVMB.CC.COLUMBIA.EDU, JRD@cc.usu.edu
In-Reply-To: <CMM.0.90.4.841171812.fdc@watsun.cc.columbia.edu> from "Frank da Cruz" at Aug 27 96 2:50 pm
X-Mailer: ELM [version 2.3 PL11] for OS/2

You (Frank da Cruz) wrote:
> > > When transferring between LIKE systems, ZIPping is a good way to solve this
> > > problem, but otherwise it is likely to introduce more problems (corrupted
> > > files) than it solves.
> > 
> > Erm...well, nothing personal, but Kermit's "assume it's text unless told
> > otherwise" behavior has introduced plenty of problems and corrupted files
> > itself...
> > 
> Exactly the point -- if you assume text, you corrupt the binary files; if you
> assume binary, you corrupt the text files.  

I beg to differ. If you assume binary, you do _not_ corrupt the text
files. You just don't convert them, leaving this extra step to the
user (and extra tools).

> Kermit 95, by the way, uses binary
> by default, since most Windows users are transferring only ZIP, GIF, or JPG
> files.

I would vote for making binary the default for ALL platforms.

Kai Uwe

--
/* Kai Uwe Rommel      ARS Computer & Consulting GmbH, Muenchen, Germany *
 * rommel@ars.de             CompuServe 100265,2651, Fax +49 89 324 4524 *
 * rommel@leo.org (ftp://ftp.leo.org/pub/comp/os/os2 maintenance)        */

DOS ... is still a real mode only non-reentrant interrupt
handler, and always will be.                -Russell Williams

28-Aug-96 15:46:31-GMT,1886;000000000001
Received: (from fdc@localhost) by watsun.cc.columbia.edu (8.7.5/8.7.3) id LAA15123; Wed, 28 Aug 1996 11:46:00 -0400 (EDT)
Date: Wed, 28 Aug 96 11:46:00 EDT
From: Frank da Cruz <fdc@watsun.cc.columbia.edu>
To: Kai Uwe Rommel <rommel@ars.de>
Cc: roe2@midway.uchicago.edu, Zip-Bugs@LISTS.WKU.EDU, jaltman@columbia.edu,
        JCHBN@CUVMB.CC.COLUMBIA.EDU, JRD@cc.usu.edu
Subject: Re: Info-ZIP and C-Kermit
In-Reply-To: Your message of Wed, 28 Aug 96 7:51:04 +0100
Message-ID: <CMM.0.90.4.841247160.fdc@watsun.cc.columbia.edu>

> I would vote for making binary the default for ALL platforms.
> 
Surely not IBM mainframes?  :-)

Seriously, this has been proposed many times, and it makes more sense now
than before.  But in our limited experience with Kermit 95's new binary-mode
default, we are seeing new problems.  A common example is when people export
application (word processor, spreadsheet, database) files for transfer, and
then can't import them again because the records are in the wrong format.
Or they can import them, but the accented/non-Roman characters are wrong.

So, with the stupid file systems that are in use today -- and I maintain that
ALL of them are stupid -- indeed you can't win.  (Perhaps "stupid" is not the
right word -- the idea has always been more one of locking people in to a
particular environment, rather than promoting diversity and interchange.)

- Frank

P.S. MS-DOS Kermit 3.15, C-Kermit 6.0, and IBM Mainframe Kermit 4.2.2 have a
new feature which, at least, lets the two Kermit programs recognize that they
are running on "like" computers (if they are) and therefore to switch into
binary mode automatically, even if their default transfer mode is text.  So
this should go a long way towards satisfying many people.  In this regard,
Kermit has an advantage over ZIP programs, because ZIP programs never know
where the ZIP file might be unzipped.

28-Aug-96 16:37:52-GMT,1481;000000000001
Received: from barney.usu.edu (barney.usu.edu [129.123.1.89]) by watsun.cc.columbia.edu (8.7.5/8.7.3) with ESMTP id MAA23445 for <fdc@watsun.cc.columbia.edu>; Wed, 28 Aug 1996 12:37:51 -0400 (EDT)
Received: from cc.usu.edu by cc.usu.edu (PMDF V5.0-5 #11556)
 id <01I8T13PB1T2CE2UCT@cc.usu.edu>; Wed, 28 Aug 1996 10:37:37 -0600 (MDT)
Date: Wed, 28 Aug 1996 10:37:37 -0600 (MDT)
From: Joe Doupnik <JRD@cc.usu.edu>
Subject: Re: Info-ZIP and C-Kermit
To: fdc@watsun.cc.columbia.edu
Cc: ROE2@MIDWAY.UCHICAGO.EDU, JCHBN@CUVMB.CC.COLUMBIA.EDU,
        ZIPS-BUGS@LISTS.WKU.EDU, JALTMAN@COLUMBIA.EDU
Message-id: <01I8T13PB2RCCE2UCT@cc.usu.edu>
X-VMS-To: IN%"fdc@watsun.cc.columbia.edu"
X-VMS-Cc: 
 ROE2@MIDWAY.UCHICAGO.EDU,JCHBN@CUVMB.CC.COLUMBIA.EDU,ZIPS-BUGS@LISTS.WKU.EDU,JALTMAN@COLUMBIA.EDU,JRD
MIME-version: 1.0
Content-type: TEXT/PLAIN; CHARSET=US-ASCII
Content-transfer-encoding: 7BIT

	I remain surprised at the "text mode" bias in all this thinking.
That is, presuming that binary also means "stream of bytes." That's naive.
Anyone heard of byte order of numerical quantities? Have a look at the
ponderous NFS RPC material to see the trouble one gets into trying to
describe information, and that stuff covers only the easy items.
	Binary simply isn't transportable between unlike systems, in 
general, because the systems differ in representation. Big/little
endian, for example, not to mention "word" size.
	Those fixating on byte stream should not get into this game.
	Joe D.

28-Aug-96 18:11:45-GMT,3722;000000000011
Received: from haven.uchicago.edu (haven.uchicago.edu [128.135.12.3]) by watsun.cc.columbia.edu (8.7.5/8.7.3) with ESMTP id OAA08199 for <fdc@watsun.cc.columbia.edu>; Wed, 28 Aug 1996 14:11:42 -0400 (EDT)
Received: from ellis.uchicago.edu (roe2@ellis.uchicago.edu [128.135.12.62]) by haven.uchicago.edu (8.7.5/8.7.3) with ESMTP id NAA11827; Wed, 28 Aug 1996 13:10:39 -0500 (CDT)
Received: (from roe2@localhost) by ellis.uchicago.edu (8.7.1/8.7.2) id NAA04942; Wed, 28 Aug 1996 13:12:49 -0500 (CDT)
Date: Wed, 28 Aug 1996 13:12:49 -0500 (CDT)
From: Cave Newt <roe2@midway.uchicago.edu>
Message-Id: <199608281812.NAA04942@ellis.uchicago.edu>
To: fdc@watsun.cc.columbia.edu
Subject: Re: Info-ZIP and C-Kermit
Cc: JCHBN@CUVMB.CC.COLUMBIA.EDU, JRD@cc.usu.edu, rommel@ars.de

Frank wrote:

> Any Kermit program can have its default transfer mode be binary -- the user
> just puts a "set file type binary" command in her/his initialization file; or
> the site puts one in the site-wide initialization file.  Kermit is totally
> (and almost infinitely) customizable.

And you accuse *me* of a developer's bias? :-) * 0.5

> Not with Kermit.  Kermit is an extremely common method of accessing Linux,
> and it can easily be adjusted to either case.

I was wrong anyway.  Linux uses a Latin-1 code page even for its virtual 
consoles (which should have been obvious, given its origins).

> Frank Wancho?  (I managed TOPS-20 systems for about 10 years.)

Yes, Wancho.  Huh...wonder why he punted on the file-system info.  I
*know* he claimed you had to know the directory size at creation time.
But, as you say, it's irrelevant.  I guess there's no point in hanging
onto the partial TOPS-20 support in UnZip, either.

> But then do you have a standard and transportable way of mapping complex
> structural information between unlike systems?

No, of course not, but at least you can transport it and recreate it on
like systems.

> No matter; this discussion is mostly academic -- I don't see us going anywhere
> with it for a while, and by that time, maybe ALL file systems will have
> devolved to the simple DOS/UNIX model and then we won't have to worry about it
> any more, except for deep questions like how to turn backslashes around :-)

One can only hope (or fear)... :-)

> this should go a long way towards satisfying many people.  In this regard,
> Kermit has an advantage over ZIP programs, because ZIP programs never know
> where the ZIP file might be unzipped.

Au contraire!  You're right that Zip doesn't know where it will be 
unzipped, but UnZip knows where it was zipped, and that amounts to the
same thing.  (Indeed, that fact is the sole reason I took the steps that
eventually led to my leadership of the UnZip project six years ago...)
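
As a rough sketch of what "UnZip knows where it was zipped" looks like in
practice: each ZIP entry records a host-system code, which Python's standard
zipfile module exposes as ZipInfo.create_system.  The partial code table
below follows the PKWARE application note as I understand it; how UnZip
itself acts on the value is not shown here.

```python
import io
import zipfile

# Partial table of host-system codes; treat it as an assumption to verify
# against the ZIP specification.
HOST_NAMES = {0: "MS-DOS/FAT", 2: "VMS", 3: "Unix"}

# Build a one-entry archive in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("readme.txt", "hello\r\n")

# Read it back and look at where the entry says it was created.
with zipfile.ZipFile(buf) as zf:
    info = zf.getinfo("readme.txt")
    host = info.create_system

print(host, HOST_NAMES.get(host, "other"))
```

An extractor can compare this code with its own platform and decide per
entry whether any text conversion is worth attempting.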

Joe wrote:

> 	Binary simply isn't transportable between unlike systems, in 
> general, because the systems differ in representation. Big/little
> endian, for example, not to mention "word" size.

That's only true for poorly designed binary file formats.  There's really
no reason for any format designed since 1985 not to specify precisely what
byte order is used and what each of the field sizes is.  This is true even
for formats that may use different sizes and byte orders on each platform
for maximum I/O efficiency; there's no excuse for not including a fixed,
well-defined header that specifies the key parameters of the rest of the
file.  FORTRAN binary files are a prime example of how *not* to do it.
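
A minimal sketch of such a fixed, well-defined header, using Python's struct
module.  The magic string "DEMO" and the field layout are hypothetical and
chosen only for illustration; the point is that the byte order and every
field size are stated explicitly rather than inherited from the machine.

```python
import struct

# '<' means little-endian with no padding, whatever the host CPU uses.
# Fields: 4-byte magic, 16-bit version, 16-bit field width, 32-bit record count.
HEADER = struct.Struct("<4sHHI")


def pack_header(version: int, field_width: int, count: int) -> bytes:
    """Serialize the header in the fixed on-disk layout."""
    return HEADER.pack(b"DEMO", version, field_width, count)


def unpack_header(blob: bytes) -> tuple:
    """Parse the header; the result is the same on any platform."""
    magic, version, field_width, count = HEADER.unpack(blob[:HEADER.size])
    if magic != b"DEMO":
        raise ValueError("not a DEMO file")
    return version, field_width, count


blob = pack_header(1, 4, 1000)
print(blob.hex())           # 44454d4f01000400e8030000
print(unpack_header(blob))  # (1, 4, 1000)
```

The body of the file may then use whatever sizes and byte order the header
declares, and a reader on an unlike system still knows how to interpret it.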

For non-stream things like VMS database file formats, there's inherently
no portability, so it's not so much an issue there.

Greg

P.S.  I've removed zip-bugs from the recipient list since this has become
      largely an academic exercise in nitpicking. :-)

28-Aug-96 18:26:12-GMT,2336;000000000001
Received: (from fdc@localhost) by watsun.cc.columbia.edu (8.7.5/8.7.3) id OAA09997; Wed, 28 Aug 1996 14:25:19 -0400 (EDT)
Date: Wed, 28 Aug 96 14:25:18 EDT
From: Frank da Cruz <fdc@watsun.cc.columbia.edu>
To: Cave Newt <roe2@midway.uchicago.edu>
Cc: JCHBN@CUVMB.CC.COLUMBIA.EDU, JRD@cc.usu.edu, rommel@ars.de
Subject: Re: Info-ZIP and C-Kermit
In-Reply-To: Your message of Wed, 28 Aug 1996 13:12:49 -0500 (CDT)
Message-ID: <CMM.0.90.4.841256718.fdc@watsun.cc.columbia.edu>

> Yes, Wancho.  Huh...wonder why he punted on the file-system info.  I
> *know* he claimed you had to know the directory size at creation time.
> But, as you say, it's irrelevant.  I guess there's no point in hanging
> onto the partial TOPS-20 support in UnZip, either.
> 
Nevertheless, I always thought it would be a cool idea to have a "virtual
museum" on the Internet, where people could see some of the old and
influential operating systems -- TOPS-10, TOPS-20, MULTICS, ITS, etc.  (Even
RT-11, which CP/M and DOS were largely copied from :-)

Obviously impractical given the expense involved in maintaining the hardware,
let alone the access and security problems it would raise.  But an awful lot
of good ideas have gone out the window and soon nobody will remember anything
but MS Windows, and will think that is the definition of an operating system.

In my recent "travels", I've come across a few real gems that are still up and
running that will disappear forever the next time the hardware breaks because
nobody wants to pay to keep them going.  For example, just for the perverse
heck of it, I ported C-Kermit (and I use "port" in the real sense of doing a
lot of work, not the vulgar sense of "recompiling" :-) to Bell Research UNIX
V10, which I guarantee no more than two people in the world are still using.

> Au contraire!  You're right that Zip doesn't know where it will be 
> unzipped, but UnZip knows where it was zipped, and that amounts to the
> same thing.  (Indeed, that fact is the sole reason I took the steps that
> eventually led to my leadership of the UnZip project six years ago...)
> 
Ah, great minds...

> I've removed zip-bugs from the recipient list since this has become
> largely an academic exercise in nitpicking. :-)
> 
No, it's just having a little fun -- something increasingly hard to come by
these days...

- Frank

28-Aug-96 18:57:36-GMT,2575;000000000001
Received: from haven.uchicago.edu (haven.uchicago.edu [128.135.12.3]) by watsun.cc.columbia.edu (8.7.5/8.7.3) with ESMTP id OAA15730 for <fdc@watsun.cc.columbia.edu>; Wed, 28 Aug 1996 14:57:35 -0400 (EDT)
Received: from ellis.uchicago.edu (roe2@ellis.uchicago.edu [128.135.12.62]) by haven.uchicago.edu (8.7.5/8.7.3) with ESMTP id NAA14460 for <fdc@watsun.cc.columbia.edu>; Wed, 28 Aug 1996 13:56:42 -0500 (CDT)
Received: (from roe2@localhost) by ellis.uchicago.edu (8.7.1/8.7.2) id NAA09091; Wed, 28 Aug 1996 13:58:53 -0500 (CDT)
Date: Wed, 28 Aug 1996 13:58:53 -0500 (CDT)
From: Cave Newt <roe2@midway.uchicago.edu>
Message-Id: <199608281858.NAA09091@ellis.uchicago.edu>
To: fdc@watsun.cc.columbia.edu
Subject: Re: Info-ZIP and C-Kermit

Frank,

> Nevertheless, I always thought it would be a cool idea to have a "virtual
> museum" on the Internet, where people could see some of the old and
> influential operating systems -- TOPS-10, TOPS-20, MULTICS, ITS, etc.  (Even
> RT-11, which CP/M and DOS were largely copied from :-)

Hey, I used RT-11 on a 1978-vintage Terak (PDP8-based, I think) in college!
8-inch diskettes, funky screen with separate text and graphics memory, and 
a random-access Diablo daisy-wheel printer that we used to create six-foot-
long traces of magnetospheric phenomena.  It had 64KB (32KW) of RAM, as I 
recall, and was quite a step up from the 8KB Commodore PETs we had in high
school...  Amazingly enough, they *still* had Teraks in use at NASA Ames
ca. 1991.

> Obviously impractical given the expense involved in maintaining the hardware,
> let alone the access and security problems it would raise.  But an awful lot
> of good ideas have gone out the window and soon nobody will remember anything
> but MS Windows, and will think that is the definition of an operating system.

Shudder...

> For example, just for the perverse
> heck of it, I ported C-Kermit (and I use "port" in the real sense of doing a
> lot of work, not the vulgar sense of "recompiling" :-) to Bell Research UNIX
> V10, which I guarantee no more than two people in the world are still using.

Ooo, I'd love to take a crack at UnZip on a system like that. :-)  (Well,
maybe not.  It's not like there aren't a billion other projects waiting
for me.)

> Ah, great minds...

Or at least Dilbert-like ones...

> No, it's just having a little fun -- something increasingly hard to come by
> these days...

Ah, but I and many of the luna^H^H^H^Hengineers with whom I associate
*equate* academic nitpicking with fun--consider this sentence, for
example. ;-)

Regards,
  Greg