How to split file of whitespace separated numbers?

Bo Berglund
Is there a quick way to split a string of whitespace-separated values
into its separate members?
I have to create a function to process a number of big data files
where numbers are stored in lines of 4-6 values with whitespace in
between.
First I got a sample looking like this:
{code}
0.4167    0.3636    -14.1483    227.2260
{code}
Here the separators were 4 spaces so on each line I used (slDecode is
a TStringList):
{code}
  sLine := StringReplace(sLine, '    ', #13, [rfReplaceAll]);
  slDecode.Text := sLine;
{code}
Worked fine if a bit slow...
The stringlist items are then passed to a string to float function and
stored into a dynamic array.

But then it failed on a file containing lines like this:
{code}
   0.000    0.000    7.000    0.000  29.6628
{code}
Here there are 3 leading spaces plus one separator is only 2 spaces
wide. So I had to modify the code:
{code}
  sLine := Trim(sLine);
  sLine := StringReplace(sLine, '    ', #13, [rfReplaceAll]);
  sLine := StringReplace(sLine, '  ', #13, [rfReplaceAll]);
  slDecode.Text := sLine;
{code}

This works in this case, but now I realize I need something better
that can deal with a varying number of whitespace chars in between
the numbers.
The test files are very big, like half a million lines and up, so I
cannot introduce a lot of code in the loop, since processing time will
increase.

Is there any good and quick way to extract real data from a space
separated list without knowing beforehand the size of the whitespace
separators?

I guess that my next sample problem will be a file with TAB rather
than space or even mixed TAB and space...

--
Bo Berglund
Developer in Sweden

_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal

Re: How to split file of whitespace separated numbers?

Gabor Boros
On 2016-12-23 at 9:14, Bo Berglund wrote:
> Is there a quick way to split a string of whitespace separated values
> into the separate members?

Hi,

I don't know whether it is quick or not...

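program Project1;

{$mode objfpc}{$H+}

uses Classes;

var
   SL:TStringList;
   i:Integer;

begin
   SL:=TStringList.Create;
   try
     // With StrictDelimiter at its default (False), whitespace also
     // separates items, so runs of spaces of any length are handled.
     SL.DelimitedText:='   0.000    0.000    7.000    0.000  29.6628';
     for i:=0 to SL.Count-1 do
      begin
        WriteLn('*'+SL.Strings[i]+'*');
      end;
   finally
     SL.Free;
   end;
   ReadLn;
end.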

Gabor

Re: How to split file of whitespace separated numbers?

Bo Berglund
On Fri, 23 Dec 2016 10:04:09 +0100, Gabor Boros
<[hidden email]> wrote:

>On 2016-12-23 at 9:14, Bo Berglund wrote:
>> Is there a quick way to split a string of whitespace separated values
>> into the separate members?

>   SL.DelimitedText:='   0.000    0.000    7.000    0.000  29.6628';

Thanks,
I did not know that one could do this and get away with it. I believed
one had to set the delimiter first, and that it would not work because
the number of spaces varies.
But it seems that it does work!
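
A minimal sketch of why this works, assuming the behaviour of the FPC Classes unit: with StrictDelimiter left at False, characters up to and including the space (so TAB as well) act as separators in addition to Delimiter, and runs of them collapse. The helper name SplitFields and the sample line with a TAB are made up for illustration.

```pascal
program DelimDemo;
{$mode objfpc}{$H+}

uses
  Classes;

// Fills Items with the whitespace-separated fields of Line.
procedure SplitFields(const Line: string; Items: TStrings);
begin
  // Assumption: StrictDelimiter = False is the default; it is set
  // explicitly here so the intent is visible. In this mode runs of
  // characters <= ' ' (space, TAB, ...) separate items as well.
  Items.StrictDelimiter := False;
  Items.Delimiter := ' ';
  Items.DelimitedText := Line;
end;

var
  SL: TStringList;
  i: Integer;
begin
  SL := TStringList.Create;
  try
    // Made-up line: leading spaces, a TAB (#9), then 4 and 2 spaces.
    SplitFields('   0.000'#9'0.000    7.000  29.6628', SL);
    for i := 0 to SL.Count - 1 do
      WriteLn(i, ': [', SL[i], ']');
  finally
    SL.Free;
  end;
end.
```

Depending on the FPC version, trailing whitespace on a line may produce an empty last item, so trimming the line first or skipping empty items is a cheap precaution.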

I applied your method by removing all the code for handling this and
used only the following:
{code}
   ReadLn(F, sLine);
   slDecode.DelimitedText := sLine;
   ....

{code}

I timed both versions on a file of some 660000+ lines.
Result:
Original code takes 9.9 s to process the file.
Modified code takes 4.4 s

And I checked with the file containing the extra spaces and varying
size of whitespace. It too was processed correctly.
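
The complete read loop could look roughly like the sketch below. It is not the code from the actual program: the procedure name AppendLine, the file name infile.txt and the chunked array growth are assumptions made for illustration. It trims each line, skips tokens that do not parse, and pins the decimal separator to '.' so the result does not depend on the locale.

```pascal
program LoadNumbers;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

type
  TDoubleArray = array of Double;

// Splits one line on whitespace and appends the values to Data.
// Count tracks the used length so the array can grow in chunks.
procedure AppendLine(const sLine: string; slDecode: TStringList;
  var Data: TDoubleArray; var Count: Integer; const FS: TFormatSettings);
var
  i: Integer;
  Value: Double;
begin
  slDecode.DelimitedText := Trim(sLine);
  for i := 0 to slDecode.Count - 1 do
    if TryStrToFloat(slDecode[i], Value, FS) then
    begin
      if Count = Length(Data) then
        SetLength(Data, Count * 2 + 1024);
      Data[Count] := Value;
      Inc(Count);
    end;
end;

var
  F: TextFile;
  sLine: string;
  slDecode: TStringList;
  Data: TDoubleArray;
  Count: Integer;
  FS: TFormatSettings;
begin
  FS := DefaultFormatSettings;
  FS.DecimalSeparator := '.';    // the files use '.', whatever the locale
  Count := 0;
  slDecode := TStringList.Create;
  try
    AssignFile(F, 'infile.txt'); // hypothetical file name
    Reset(F);
    try
      while not Eof(F) do
      begin
        ReadLn(F, sLine);
        AppendLine(sLine, slDecode, Data, Count, FS);
      end;
    finally
      CloseFile(F);
    end;
  finally
    slDecode.Free;
  end;
  SetLength(Data, Count);        // shrink to the number of values read
  WriteLn('Values read: ', Count);
end.
```

Reusing one TStringList for every line and growing the array in chunks keeps allocations out of the inner loop, which matters with half a million lines.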

Thanks again!


Re: How to split file of whitespace separated numbers?

Howard Page-Clark
In reply to this post by Bo Berglund
On 23/12/16 08:14, Bo Berglund wrote:
> Is there a quick way to split a string of whitespace separated values
> into the separate members?
It is possible that a custom string parser (something along these lines)
might improve your processing speed:

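// Needs SysUtils (TryStrToFloat) and {$mode objfpc} or {$mode delphi}.
type
     TDoubleArray = array of Double;

function StrToDblArray(const aString: string): TDoubleArray;
var
   c: Char;
   prevNumeric: boolean = False;
   sNum: string = '';
   number: double;

   // True for any character that can be part of a number, including
   // a sign and an exponent marker, so that -14.1483 keeps its sign.
   function IsNumeric: boolean; inline;
   begin
     Exit(c in ['+', '-', '.', '0'..'9', 'E', 'e']);
   end;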

begin
   SetLength(Result, 0);
   for c in aString do begin
     case IsNumeric of
       False: if prevNumeric then begin
                 if TryStrToFloat(sNum, number) then begin
                   SetLength(Result, Length(Result) + 1);
                   Result[High(Result)]:=number;
                 end;
                 sNum:='';
                 prevNumeric:=False;
              end;
       True: begin
               sNum:=sNum + c;
               if not prevNumeric then
                 prevNumeric:=True;
             end;
     end;
   end;
   if prevNumeric and TryStrToFloat(sNum, number) then begin
     SetLength(Result, Length(Result) + 1);
     Result[High(Result)]:=number;
   end;
end;

Re: How to split file of whitespace separated numbers?

Graeme Geldenhuys-6
In reply to this post by Bo Berglund
On 2016-12-23 08:14, Bo Berglund wrote:
> Is there a quick way to split a string of whitespace separated values
> into the separate members?


That problem is perfectly suited for regular expressions. And a rather
simple one at that. FPC's FCL packages include a regex unit too,
which should suit your needs.


http://www.regex101.com/

http://www.regexplained.co.uk/

http://regex.info/
  Even the trial book (first chapter only) of "Mastering Regular
  Expressions" is invaluable for users new to regex, and will
  explain all you need to know to solve your problem.
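
As an illustration only, here is a sketch using the RegExpr unit (class TRegExpr) that ships with FPC. The pattern \S+ simply matches each run of non-whitespace characters. The procedure name ExtractTokens is made up, and this is one possible approach rather than a recommendation from the message above.

```pascal
program RegexSplit;
{$mode objfpc}{$H+}

uses
  Classes, RegExpr;

// Adds every run of non-whitespace characters in Line to Items.
procedure ExtractTokens(const Line: string; Items: TStrings);
var
  RE: TRegExpr;
begin
  Items.Clear;
  RE := TRegExpr.Create;
  try
    RE.Expression := '\S+';      // one or more non-whitespace characters
    if RE.Exec(Line) then
      repeat
        Items.Add(RE.Match[0]);  // Match[0] is the whole match
      until not RE.ExecNext;
  finally
    RE.Free;
  end;
end;

var
  SL: TStringList;
  i: Integer;
begin
  SL := TStringList.Create;
  try
    ExtractTokens('   0.000    0.000    7.000    0.000  29.6628', SL);
    for i := 0 to SL.Count - 1 do
      WriteLn('*', SL[i], '*');
  finally
    SL.Free;
  end;
end.
```

For a file with hundreds of thousands of lines, the TRegExpr instance would be created once outside the loop rather than once per line as in this sketch.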


Regards,
  Graeme

--
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

My public PGP key:  http://tinyurl.com/graeme-pgp

Re: How to split file of whitespace separated numbers?

Sven Barth-2

On 23.12.2016 12:54, "Graeme Geldenhuys" <[hidden email]> wrote:
>
> On 2016-12-23 08:14, Bo Berglund wrote:
> > Is there a quick way to split a string of whitespace separated values
> > into the separate members?
>
>
> That problem is perfectly suited for regular expressions. And a rather
> simple one at that. FPC's FCL packages include a regex unit too,
> which should suit your needs.
>
>
> http://www.regex101.com/
>
> http://www.regexplained.co.uk/
>
> http://regex.info/
>   Even the trial book (first chapter only) of "Mastering Regular
>   Expressions" is invaluable for users new to regex. And will
>   explain all you need to know to solve your problem.
>

Regular expressions usually have a higher overhead, however (as you might have noticed, Bo timed his code later on).
For example, at work I replaced a regular-expression-based parser for the lines of a log file with a simpler one, and the speedup was noticeable (I don't have exact numbers anymore, however).

Regards,
Sven



Re: How to split file of whitespace separated numbers?

greim
Hey Kids,

why so complicated?

Good old Niklaus Wirth has already done everything for you.
I have to cite one sentence from the last slide at his birthday colloquium:

"Reducing size and complexity is the triumph"

So READ is already quite clever: it doesn't care about whitespace,
carriage returns and linefeeds:


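PROGRAM readline;

VAR a  : ARRAY[0..1000000] OF double;
     infile : TEXT;
     lauf, lauf2 : longint;

BEGIN
lauf := 0;
assign(infile, 'infile.txt');
reset(infile);
{ SeekEof skips trailing blanks and line ends before testing for the end
  of the file, so a final newline does not yield a spurious extra value. }
WHILE NOT(seekeof(infile)) AND (lauf <= high(a)) DO

         BEGIN
         read(infile, a[lauf]);
         inc(lauf);
         END;

close(infile);

FOR lauf2 := 0 TO pred(lauf) DO

         BEGIN
         writeln('Index : ', lauf2, ' Value : ', a[lauf2]);
         END;


END.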


And here infile.txt:

  123.4   55.2 33.1 4
  12.1 1.1
1 2 3 4
     333.888 444.555

Regards

Markus

Re: How to split file of whitespace separated numbers?

Graeme Geldenhuys-6
In reply to this post by Sven Barth-2
On 2016-12-23 13:06, Sven Barth wrote:
> Regular expressions usually have a higher overhead

That is not always a given.

I remember years back we had a similar discussion, but then about syntax
highlighting large code units, e.g. the large OSX related unit in FPC
(can't remember how many MBs in size it was). Lazarus performed okay
syntax highlighting that, but other editors didn't. Everybody was told
that Lazarus did so well because it "understood the code and syntax".
jEdit, a Java based editor, implements all its syntax highlighting
(hundreds of them) via regex. jEdit was extremely fast, even on that
very large OSX related unit, even when you jump from the beginning of
the file straight to the end.

For many other things, plain code could be faster, but it often takes a
lot more effort and time to implement, whereas you could have written
a regex expression in under 10 seconds and accomplished the same task in
8 lines of code or less - very little effort required.

Regards,
  Graeme


Re: How to split file of whitespace separated numbers?

Sven Barth-2
On 23.12.2016 18:46, Graeme Geldenhuys wrote:
> On 2016-12-23 13:06, Sven Barth wrote:
>> Regular expressions usually have a higher overhead
>
> That is not always a given.

You are aware that I wrote "usually" there?

> For many other things, plain code could be faster, but often a lot more
> effort and time consuming to implement. Whereas you could have written
> a regex expression in under 10 seconds and accomplish the same task in 8
> lines of code or less - very little effort required.

But sometimes the effort vs performance trade-off is worth it. E.g.
opening a log file in 10 seconds vs nearly none makes a difference (as I
said, I don't remember the exact speedup anymore, but it was
significant; it was also not the only problematic point, as originally
the opening of a large enough log file took minutes :P ).

Regards,
Sven


Re: How to split file of whitespace separated numbers?

Marco van de Voort
In reply to this post by Graeme Geldenhuys-6
In our previous episode, Graeme Geldenhuys said:
> For many other things, plain code could be faster, but often a lot more
> effort and time consuming to implement. Whereas you could have written
> a regex expression in under 10 seconds and accomplish the same task in 8
> lines of code or less - very little effort required.

Writing or, even worse, reading/debugging regex is about the most intensive
effort there is IMHO.

Re: How to split file of whitespace separated numbers?

Graeme Geldenhuys-6
In reply to this post by Sven Barth-2
On 2016-12-23 18:04, Sven Barth wrote:
> E.g.
> opening a log file in 10 seconds vs nearly none make a difference

Again, it depends on the tool (editor) you use. Both jEdit and EditPad
Pro (implemented in Delphi) use regex for syntax highlighting. EditPad
Pro also uses it for file navigation, syntax highlighting of tool output,
output/code navigation etc. Both can handle massive text files, and both
open them instantly with everything highlighted from the word go. No
idea how they accomplish that, but that's another story. ;-)

Regards,
  Graeme


Re: How to split file of whitespace separated numbers?

Graeme Geldenhuys-6
In reply to this post by Marco van de Voort
On 2016-12-23 18:27, Marco van de Voort wrote:
> Writing or even worse, reading/debugging regex is about the most intensive
> effort there is IMHO.

So is standard programming code - if you don't know the syntax or how it
works. ;-)  Also the reason why I posted a couple of links to regex
sites to get the original poster started (in case he doesn't know
regex). Here is another link (by the author of EditPad Pro), who really
knows his regex!

  http://www.regular-expressions.info/tutorial.html

Regards,
  Graeme


Re: How to split file of whitespace separated numbers?

noreply
In reply to this post by Howard Page-Clark
On Fri, December 23, 2016 4:49 am, Howard Page-Clark wrote:

> On 23/12/16 08:14, Bo Berglund wrote:
>
>> Is there a quick way to split a string of whitespace separated values
>> into the separate members?
> It is possible that a custom string parser (something along these lines)
> might improve your processing speed:
>
> type TDoubleArray = array of Double;
>
>
> function StrToDblArray(const aString: string): TDoubleArray;
> var c: Char;

And as soon as char is involved, unicode gets screwed up....

Am I right, am I right...

But if he is not dealing with any unicode data and it is all simple chars,
it should be okay.


Re: How to split file of whitespace separated numbers?

Luiz Americo Pereira Camara-3
In reply to this post by Bo Berglund


On 23 Dec 2016 05:15, "Bo Berglund" <[hidden email]> wrote:
> Is there a quick way to split a string of whitespace separated values
> into the separate members?


Unit StrUtils:

WordCount + ExtractWord

or

ExtractSubstr in a loop
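
Both suggestions can be sketched as follows, assuming the FPC StrUtils signatures of WordCount, ExtractWord and ExtractSubstr. The separator set Blanks (space and TAB) and the sample line are choices made for this example.

```pascal
program StrUtilsSplit;
{$mode objfpc}{$H+}

uses
  SysUtils, StrUtils;

const
  // Separator set chosen for this sketch: space and TAB.
  Blanks: TSysCharSet = [' ', #9];

var
  Line, Token: string;
  i, P: Integer;
begin
  Line := '   0.000    0.000    7.000    0.000  29.6628';

  // Variant 1: WordCount + ExtractWord (word indices are 1-based).
  for i := 1 to WordCount(Line, Blanks) do
    WriteLn('*', ExtractWord(i, Line, Blanks), '*');

  // Variant 2: ExtractSubstr in a loop; P is advanced past each
  // token and the delimiters that follow it.
  P := 1;
  while P <= Length(Line) do
  begin
    Token := ExtractSubstr(Line, P, Blanks);
    if Token <> '' then          // a leading delimiter run yields ''
      WriteLn('*', Token, '*');
  end;
end.
```

ExtractWord rescans the string from the start on every call, which is harmless for 4-6 values per line; the ExtractSubstr loop makes a single pass.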



Luiz


Re: How to split file of whitespace separated numbers?

noreply
In reply to this post by Graeme Geldenhuys-6
On Fri, December 23, 2016 12:54 pm, Graeme Geldenhuys wrote:

> On 2016-12-23 18:27, Marco van de Voort wrote:
>
>> Writing or even worse, reading/debugging regex is about the most
>> intensive effort there is IMHO.
>
> So is standard programming code - if you don't know the syntax or how it
> works. ;-)  Also the reason why I posted a couple of links to regex sites
> to get the original poster started (in case he doesn't know regex). Here
> is another link (by the author of EditPad Pro), who really knows his
> regex!
>
> http://www.regular-expressions.info/tutorial.html
>

Next thing to do: implement Perl inside Pascal programs, compiled in Perl.
Then realize why you didn't originally want to go there.... ;-)

Re: How to split file of whitespace separated numbers?

Mark Morgan Lloyd-5
On 24/12/16 11:30, Lars wrote:

> On Fri, December 23, 2016 12:54 pm, Graeme Geldenhuys wrote:
>> On 2016-12-23 18:27, Marco van de Voort wrote:
>>
>>> Writing or even worse, reading/debugging regex is about the most
>>> intensive effort there is IMHO.
>>
>> So is standard programming code - if you don't know the syntax or how it
>> works. ;-)  Also the reason why I posted a couple of links to regex sites
>> to get the original poster started (in case he doesn't know regex). Here
>> is another link (by the author of EditPad Pro), who really knows his
>> regex!
>>
>> http://www.regular-expressions.info/tutorial.html
>>
>
> Next thing todo: implement PERL inside pascal programs, compiled in perl.
> Then, realize, why you didn't originally want to go there.... ;-)

Or even allow FPC to call Lua.

I know this is rare and probably wouldn't happen outside the "season of
goodwill", but I actually agree with Graeme here: regexes are useful.
BUT FFS DOCUMENT WHAT YOU'RE DOING FOR PEOPLE WHO DON'T UNDERSTAND THEM!


--
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk

[Opinions above are the author's, not those of his employers or colleagues]

Re: How to split file of whitespace separated numbers?

Sven Barth-2

On 24.12.2016 12:53, "Mark Morgan Lloyd" <[hidden email]> wrote:
>
> On 24/12/16 11:30, Lars wrote:
>>
>> On Fri, December 23, 2016 12:54 pm, Graeme Geldenhuys wrote:
>>>
>>> On 2016-12-23 18:27, Marco van de Voort wrote:
>>>
>>>> Writing or even worse, reading/debugging regex is about the most
>>>> intensive effort there is IMHO.
>>>
>>>
>>> So is standard programming code - if you don't know the syntax or how it
>>> works. ;-)  Also the reason why I posted a couple of links to regex sites
>>> to get the original poster started (in case he doesn't know regex). Here
>>> is another link (by the author of EditPad Pro), who really knows his
>>> regex!
>>>
>>> http://www.regular-expressions.info/tutorial.html
>>>
>>
>> Next thing todo: implement PERL inside pascal programs, compiled in perl.
>> Then, realize, why you didn't originally want to go there.... ;-)
>
>
> Or even allow FPC to to call Lua.

You realize that we already have language bindings for Lua somewhere? ;)

Regards,
Sven



Re: How to split file of whitespace separated numbers?

Luiz Americo Pereira Camara-3
In reply to this post by Marco van de Voort


2016-12-23 15:27 GMT-03:00 Marco van de Voort <[hidden email]>:
> In our previous episode, Graeme Geldenhuys said:
> > For many other things, plain code could be faster, but often a lot more
> > effort and time consuming to implement. Whereas you could have written
> > a regex expression in under 10 seconds and accomplish the same task in 8
> > lines of code or less - very little effort required.
>
> Writing or even worse, reading/debugging regex is about the most intensive
> effort there is IMHO.


I agree that regex carries extra mental overhead. This is why I kept away from it for a long time.

Early this year I needed to use it in one of my projects, so I decided to bite the bullet and read the Mastering Regular Expressions book.

Once you understand the reasoning behind regex, it's a lot less intimidating.

These days I use it occasionally.

By coincidence, yesterday I was writing code to parse raw text to extract some data.

Initially I did it manually, but when I needed to extract a new field I realized things would get even worse. Then I rewrote it with regex.


IMO much better.

I'm not saying that it is easy or that it should be used at will. But once you learn the basics, regex is a valuable tool.

For debugging I use http://regexr.com/ and rely on unit tests to ensure correctness.

Luiz


Re: How to split file of whitespace separated numbers?

Graeme Geldenhuys-6
In reply to this post by Mark Morgan Lloyd-5
On 2016-12-24 11:52, Mark Morgan Lloyd wrote:
> regexes are useful.

Very much so, and I’m far from being an expert. But I am finding more
and more uses for them, the more I use them.


> BUT FFS DOCUMENT WHAT YOU'RE DOING FOR PEOPLE WHO DON'T UNDERSTAND THEM!

The same can be said for standard code too. At least with regex, there
are plenty of tools that explain what they do, and what each part means.

For example:

http://rick.measham.id.au/paste/explain.pl?regex=%5Cb%5B-%2B%5D%3F%5B0-9%5D%2B%28%3F%3A%5C.%5B0-9%5D%2B%29%3F%5Cb


NODE                     EXPLANATION
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  [-+]?                    any character of: '-', '+' (optional
                           (matching the most amount possible))
--------------------------------------------------------------------------------
  [0-9]+                   any character of: '0' to '9' (1 or more
                           times (matching the most amount possible))
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (optional
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    \.                       '.'
--------------------------------------------------------------------------------
    [0-9]+                   any character of: '0' to '9' (1 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )?                       end of grouping
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char


That tells you exactly what each part of the following regex means:

   ⌜\b[-+]?[0-9]+(?:\.[0-9]+)?\b⌟

Note:
  I wrap the regex in ⌜ and ⌟ characters to denote the start and
  end of the regex. Some people get confused when you use double
  quotes.


I believe Perl or egrep or something can output the exact same “regex
explanation” information from the command line.


Regards,
  Graeme


Re: How to split file of whitespace separated numbers?

Bo Berglund
In reply to this post by Graeme Geldenhuys-6
On Fri, 23 Dec 2016 11:53:58 +0000, Graeme Geldenhuys
<[hidden email]> wrote:

>That problem is perfectly suited for regular expressions. And a rather
>simple one at that. FPC's FCL packages include a regex unit too,
>which should suit your needs.

I was away over Xmas so I have not seen all this regexp discussion for
my problem until now....

In past times I have come across solutions using regular expressions
for example in shellscripts or similar. In most cases I saw that they
worked but had a hard time understanding *how* they worked, the syntax
is too dense for me.

The actual problem I had was that a data processing program, which I did
not write myself, was taking an extremely long time just to load the
input data file, so I was looking at alternate ways to read that data in.

The program was written originally using RAD Studio 2007 by someone
else and I "ported" it to RAD Studio XE5 a few years ago. But I did
not get into the working code, just making the transfer to Unicode and
updating the GUI. All processing code was untouched (except for
changing string to ansistring where needed).
It is a very math intensive processing package and I am no
mathematician...

Anyway, the original author was no real coder but a scientist, so
things like file I/O were not optimal. This shows up when reading the
large actual data files, which could be hundreds of Mbytes in length.
In his code it takes minutes to do!
And this was the cause of my original question. Since it seemed rather
general in nature I posted both here and in the Embarcadero forum...

Now I am down to just seconds using the ReadLn +
StringList.DelimitedText way to parse the data.

My goal now is simply to create a utility that transforms these files
into binary format instead, and to add code to load the data into dynamic
float arrays.
In the tests I did, the conversion took a few seconds, and once the
binary files are created, loading the resulting binary data is
done in fractions of a second.
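
The binary format itself is not described in this thread, so the following is only a sketch under assumptions: a made-up layout of a 64-bit element count followed by the raw Double values in native byte order, with the hypothetical names SaveDoubles, LoadDoubles and numbers.bin.

```pascal
program BinaryRoundTrip;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

type
  TDoubleArray = array of Double;

// Writes a 64-bit element count followed by the raw Double values.
procedure SaveDoubles(const FileName: string; const Data: TDoubleArray);
var
  FS: TFileStream;
  N: Int64;
begin
  FS := TFileStream.Create(FileName, fmCreate);
  try
    N := Length(Data);
    FS.WriteBuffer(N, SizeOf(N));
    if N > 0 then
      FS.WriteBuffer(Data[0], N * SizeOf(Double));
  finally
    FS.Free;
  end;
end;

// Reads back a file written by SaveDoubles in one block.
function LoadDoubles(const FileName: string): TDoubleArray;
var
  FS: TFileStream;
  N: Int64;
begin
  Result := nil;
  FS := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    FS.ReadBuffer(N, SizeOf(N));
    SetLength(Result, N);
    if N > 0 then
      FS.ReadBuffer(Result[0], N * SizeOf(Double));
  finally
    FS.Free;
  end;
end;

var
  Sample, Loaded: TDoubleArray;
begin
  SetLength(Sample, 4);
  Sample[0] := 0.4167;
  Sample[1] := 0.3636;
  Sample[2] := -14.1483;
  Sample[3] := 227.2260;
  SaveDoubles('numbers.bin', Sample);   // hypothetical file name
  Loaded := LoadDoubles('numbers.bin');
  WriteLn(Length(Loaded), ' values, third = ', Loaded[2]:0:4);
end.
```

Loading is a single block read with no text parsing, which is consistent with the load times mentioned above. Such a file is only portable between machines that share byte order and Double layout.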

So I am pretty much done with this problem (without resorting to
regexp usage).

Thanks anyway for pointing out an alternate path!
