TStream descendant to compress/decompress gzip data from another stream

Michalis Kamburelis-3
Hi,

Like the subject says, I'm looking for a TStream implementation that
takes another TStream and compresses/decompresses data in gzip format. I
would like to read/write gzip data to a stream, any TStream (maybe
TFileStream, maybe TMemoryStream, maybe a stream from the network like
TSocketStream, etc.).

One would think that the ZStream unit would contain something like that,
but it does not. ZStream has:

- TCompressionStream and TDecompressionStream, but they use the "deflate
format", which is different from the "gzip format" (gzip wraps the
deflate data in its own header and trailer). So you can't use them to
directly read/write data from a xxx.gz file.

- TGZFileStream, which does use the desired "gzip format"... but it can
only work with local files (the constructor takes just a FileName
parameter). It cannot read from or write to another stream.
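
The difference between the formats in the list above can be sketched as
follows. This is only an illustration in Python, whose zlib module stands
in for the FPC units discussed here; the helper name deflate and the
sample data are made up for the example.

```python
import gzip
import zlib

data = b"hello gzip " * 100

def deflate(payload: bytes, wbits: int) -> bytes:
    """Compress payload with zlib; wbits selects the wrapper format."""
    comp = zlib.compressobj(6, zlib.DEFLATED, wbits)
    return comp.compress(payload) + comp.flush()

raw = deflate(data, -15)          # raw deflate (RFC 1951), no wrapper at all
zlib_stream = deflate(data, 15)   # zlib wrapper (RFC 1950)
gzip_stream = deflate(data, 31)   # gzip wrapper (RFC 1952), i.e. a .gz file

print(zlib_stream[:1].hex())      # 78
print(gzip_stream[:2].hex())      # 1f8b

# Only the gzip-wrapped variant is readable as a .gz file.
assert gzip.decompress(gzip_stream) == data

# A decoder expecting the zlib wrapper rejects gzip data.
try:
    zlib.decompress(gzip_stream)
except zlib.error as exc:
    print("zlib wrapper expected:", exc)
```

The compression algorithm is the same in all three cases; only the bytes
placed around the compressed data differ.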

Is there a stream class that satisfies both these conditions
simultaneously (uses "gzip format" and can read/write to another TStream)?

Offtopic: I would probably avoid the ZStream unit anyway, as it uses
PasZLib, which is a Pascal rewrite of the old zlib code. I would rather
use the ZLib unit (which links to the zlib library, usually a
so/dll/dylib) instead. The reasons are security (the C zlib library is
probably better maintained than the gory internals of paszlib) and that
I have to distribute the zlib library anyway (libpng depends on it, and
I use libpng).

Google found an old thread on the Lazarus mailing list about this ("FPC,
gzip and stream") but without any solution; everything mentioned there
has either the limitations of TCompressionStream/TDecompressionStream
(no gzip format) or of TGZFileStream (not able to work with ObjectPascal
streams).

Thanks in advance for answers,
Michalis
_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Re: TStream descendant to compress/decompress gzip data from another stream

José Mejuto
On 27/03/2013 19:22, Michalis Kamburelis wrote:

> Google found an old thread on lazarus mailing list about this ("FPC,
> gzip and stream") but without any solution, everything mentioned there
> has either the limitations of TCompressionStream/TDecompressionStream
> (no gzip format) or TGZFileStream (not able to work with ObjectPascal
> streams).
>

Hello,

.gz is quite a simple format, but it cannot be implemented as just a
TStream descendant, because a single .gz file may contain several files,
so something like the class that handles .zip files would be needed.

Parsing the .gz header is easy, and calling TCompressionStream and
TDecompressionStream over the payload should be easy too, but that only
covers the first file in the .gz.
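
A rough sketch of that approach, assuming the header layout from RFC 1952:
skip the header by hand, inflate the payload as raw deflate, then check
the trailer. It is written in Python only for illustration; the function
names are hypothetical and zlib.decompressobj plays the role that a
decompression stream would play in FPC.

```python
import struct
import zlib

# Bits of the FLG byte, as defined in RFC 1952.
FHCRC, FEXTRA, FNAME, FCOMMENT = 0x02, 0x04, 0x08, 0x10

def skip_gzip_header(buf: bytes) -> int:
    """Return the offset where the deflate payload of the first member starts."""
    magic, method, flags = struct.unpack_from("<HBB", buf, 0)
    if magic != 0x8B1F or method != 8:
        raise ValueError("not a gzip member using deflate")
    pos = 10  # fixed part: ID1 ID2 CM FLG MTIME(4) XFL OS
    if flags & FEXTRA:
        (xlen,) = struct.unpack_from("<H", buf, pos)
        pos += 2 + xlen
    if flags & FNAME:      # zero-terminated original file name
        pos = buf.index(b"\x00", pos) + 1
    if flags & FCOMMENT:   # zero-terminated comment
        pos = buf.index(b"\x00", pos) + 1
    if flags & FHCRC:      # 16-bit header checksum
        pos += 2
    return pos

def read_first_member(buf: bytes) -> bytes:
    """Decompress only the first member, verifying its CRC-32 and size."""
    start = skip_gzip_header(buf)
    inflater = zlib.decompressobj(-15)  # negative wbits: raw deflate, no wrapper
    data = inflater.decompress(buf[start:]) + inflater.flush()
    # Whatever follows the deflate stream begins with the 8-byte trailer.
    crc, size = struct.unpack_from("<II", inflater.unused_data, 0)
    if crc != zlib.crc32(data) or size != (len(data) & 0xFFFFFFFF):
        raise ValueError("gzip trailer mismatch")
    return data
```

Anything after the first trailer is simply left unread here, which matches
the "only the first file" limitation described above.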

Re: TStream descendant to compress/decompress gzip data from another stream

Michalis Kamburelis-3
José Mejuto wrote:

> Hello,
>
> .gz is a quite simple format, but it can not be implemented as a TStream
> (only) descendant because in a single .gz file many files could be added
> so something like the class to handle .zip files should be used.

The existing TGZFileStream in ZStream unit already shows that it can be
implemented, i.e. TGZFileStream reads gzip data and is a TStream
descendant. I assume that TGZFileStream simply returns the first file
inside a multi-file gzip (otherwise it would need to call gzopen
multiple times, as far as I understand).

This is also my use case: I know that my .gz files should contain only
one file (X3D standard guarantees it for me), I want to ignore the
remaining files (if any).
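
That use case (take the first member from any stream, ignore the rest)
might look roughly like this. Again Python is used purely as an
illustration; the function name and chunk size are arbitrary choices, and
src can be any object with a read method, much like "any TStream".

```python
import zlib

def read_first_member_from(src, chunk_size=4096):
    """Read one gzip member from a file-like object and stop there."""
    inflater = zlib.decompressobj(31)  # 16 + 15: let zlib parse the gzip wrapper
    out = bytearray()
    while not inflater.eof:            # eof becomes True at the end of the member
        chunk = src.read(chunk_size)
        if not chunk:
            raise EOFError("stream ended inside the gzip member")
        out += inflater.decompress(chunk)
    # Bytes belonging to later members end up in inflater.unused_data
    # and are deliberately ignored.
    return bytes(out)
```

Because the source is consumed in chunks, the same loop works for a file,
a memory buffer, or a socket-like stream.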

>
> Parsing the .gz header is easy, and calling TCompressionStream and
> TDecompressionstream over the payload should be easy, but only for the
> first file in the .gz
>

I will probably have to implement it myself then. I was hoping to find
something ready-made, since ZStream contains things that are *so*
close (but not quite) to what I want :)

Thanks,
Michalis
Re: TStream descendant to compress/decompress gzip data from another stream

Ewald-2
In reply to this post by José Mejuto

On 27 Mar 2013, at 19:54, José Mejuto wrote:

> Hello,
>
> .gz is a quite simple format, but it can not be implemented as a TStream (only) descendant because in a single .gz file many files could be added so something like the class to handle .zip files should be used.

Sorry to just drop in on this quite late, but isn't gzip a compression algorithm and not a file format as such? gzip (the command line utility) only compresses one file and *doesn't* put this in a multi-file container. To get `multi-file gzips`, you will first want to bundle the files and compress this bundle (files -> tar -> gzip) or compress the files separately and then bundle them together (files -> multiple separate gzipped files -> tar). Or are we talking about a different gzip here?

So basically (to answer the subject of this mail), you only need a stream that implements this algorithm, which you could quite easily write yourself. Take a look at zlibar; it's a project that, IIRC, implements an archive format that uses gzip. There are two functions in there that compress and decompress a stream, respectively, and return the result, also as a TStream descendant. If you cannot find them easily, just say so; I've got the code (or at the very least these two functions) lying around.
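
The shape of such a pair of functions can be sketched like this. This is
not the zlibar code, just an illustration in Python of "compress one
stream into another in gzip format"; the function names are made up, and
gzip.GzipFile with its fileobj parameter does the wrapping.

```python
import gzip
import io
import shutil

def gzip_stream(src, dst):
    """Copy everything from src to dst, writing gzip format to dst."""
    with gzip.GzipFile(fileobj=dst, mode="wb") as gz:
        shutil.copyfileobj(src, gz)
    # Closing the GzipFile writes the trailer but leaves dst open.

def gunzip_stream(src, dst):
    """Read gzip data from src and write the decompressed bytes to dst."""
    with gzip.GzipFile(fileobj=src, mode="rb") as gz:
        shutil.copyfileobj(gz, dst)

# Round trip through in-memory streams.
plain = b"X3D scene data\n" * 1000
packed = io.BytesIO()
gzip_stream(io.BytesIO(plain), packed)

packed.seek(0)
restored = io.BytesIO()
gunzip_stream(packed, restored)
assert restored.getvalue() == plain
```

Neither function cares what kind of stream it is given, which is the
property asked for in the original question.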

If I just said anything anyone has already mentioned, please forgive me, I've only just read this message and thought to elaborate a bit on it :-)


--
Ewald


Re: TStream descendant to compress/decompress gzip data from another stream

José Mejuto
On 28/03/2013 1:06, Ewald wrote:

>> Hello,
>>
>> .gz is a quite simple format, but it can not be implemented as a TStream (only) descendant because in a single .gz file many files could be added so something like the class to handle .zip files should be used.
>
> Sorry to just drop in on this quite late, but isn't gzip  a compression algorithm and not a file format as such? gzip (the command line utility) only compresses one file and *doesn't* put this in a multi-file container. To get `multi-file gzips`, you will first want to bundle the files and compress this bundle (files -> tar -> gzip) or compress the files separately and then bundle them together (files -> multiple separate gzipped files -> tar). Or are we talking about a different gzip here?
>

Hello,

Just quoting RFC 1952 on the .gz format:

--- http://tools.ietf.org/html/rfc1952 --------

2.2. File format

       A gzip file consists of a series of "members" (compressed data
       sets).  The format of each member is specified in the following
       section.  The members simply appear one after another in the file,
       with no additional information before, between, or after them.

-----------------------------------------------

So I think it is legal to concatenate several .gz files and get a final
.gz with several files inside.

On the other hand, yes, the usual practice is for a .gz to store only one file.

Re: TStream descendant to compress/decompress gzip data from another stream

etrusco
On Thu, Mar 28, 2013 at 8:51 AM, José Mejuto wrote:

> Hello,
>
> Just quoting the RFC1952 about .gz format:
>
> --- http://tools.ietf.org/html/rfc1952 --------
>
> 2.2. File format
>
>       A gzip file consists of a series of "members" (compressed data
>       sets).  The format of each member is specified in the following
>       section.  The members simply appear one after another in the file,
>       with no additional information before, between, or after them.
>
> -----------------------------------------------
>
> So I think it is legal to concatenate several .gz files and get a final .gz
> with several files inside.
>
> On the other hand, yes, the usual behavior in .gz is to store only one file.
>
> --

"Members" refer to each available section according to the flags.
Re-read the whole paragraph you posted and the few that follow it, and
you'll realize only one "file" is allowed in a gzip file/blob.

-Flávio
Re: TStream descendant to compress/decompress gzip data from another stream

Michalis Kamburelis-3
Flávio Etrusco wrote:
> "Members" refer to each available section according to the flags.
> Re-read this whole paragraph you posted and a few following you'll
> realize only one "file" is allowed in a gzip file/blob.
>

I think this confusion comes from the fact that
http://www.gnu.org/software/gzip/manual/gzip.html#Advanced-usage says
clearly that "Multiple compressed files can be concatenated". That
doesn't make gzip a good format for keeping a multiple-file archive (for
that you should use tar), as multiple compressed files will be
decompressed to a single stream on output.
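
A small demonstration of that behaviour, using Python's gzip module as a
stand-in for the gzip tool (the sample contents are made up):

```python
import gzip

first = gzip.compress(b"alpha\n")
second = gzip.compress(b"beta\n")

# Byte-wise concatenation of two complete .gz files.
combined = first + second

# Decompression walks through both members and joins their contents;
# nothing in the output marks where one "file" ended and the next began.
print(gzip.decompress(combined))  # b'alpha\nbeta\n'
```

The member boundaries exist in the compressed data but are lost on
output, which is why this is concatenation and not an archive.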

Michalis
Re: TStream descendant to compress/decompress gzip data from another stream

Ewald-2
In reply to this post by José Mejuto
Once upon a time, José Mejuto said:

> Hello,
>
> Just quoting the RFC1952 about .gz format:
>
> --- http://tools.ietf.org/html/rfc1952 --------
>
> 2.2. File format
>
>       A gzip file consists of a series of "members" (compressed data
>       sets).  The format of each member is specified in the following
>       section.  The members simply appear one after another in the file,
>       with no additional information before, between, or after them.
>
> -----------------------------------------------
>
> So I think it is legal to concatenate several .gz files and get a
> final .gz with several files inside.
>
> On the other hand, yes, the usual behavior in .gz is to store only one
> file.
>

Yeah, you're right indeed. Sorry for the noise.


--
Ewald

Re: TStream descendant to compress/decompress gzip data from another stream

Ewald-2
In reply to this post by etrusco
Once upon a time, Flávio Etrusco said:

> On Thu, Mar 28, 2013 at 8:51 AM, José Mejuto wrote:
>
>> Hello,
>>
>> Just quoting the RFC1952 about .gz format:
>>
>> --- http://tools.ietf.org/html/rfc1952 --------
>>
>> 2.2. File format
>>
>>       A gzip file consists of a series of "members" (compressed data
>>       sets).  The format of each member is specified in the following
>>       section.  The members simply appear one after another in the file,
>>       with no additional information before, between, or after them.
>>
>> -----------------------------------------------
>>
>> So I think it is legal to concatenate several .gz files and get a final .gz
>> with several files inside.
>>
>> On the other hand, yes, the usual behavior in .gz is to store only one file.
>>
>> --
> "Members" refer to each available section according to the flags.
> Re-read this whole paragraph you posted and a few following you'll
> realize only one "file" is allowed in a gzip file/blob.

Alright, now I am contradicting what I said earlier, but there can
apparently be more than one file in a gzipped `thing`: see
https://en.wikipedia.org/wiki/Gzip#File_format . It is not really all
files in one container, but more like concatenating several of these
one-file gzipped files, if you see what I mean. Anyway, I've never seen
it happen, so I'm going to leave it at that before I start sounding
silly ;-)

--
Ewald
