Unicode filenames

Unicode filenames

Vincent Snijders
How does the RTL support Unicode filenames (e.g. file names that
cannot be represented in the ANSI character set)?

For example, the FileExists function takes a string encoded in
the system character set. If the system character set is UTF-8, as on
most Linux systems and Mac OS X, there is no problem. But on Windows with
a Western European code page, I cannot check for the existence of a file
with Cyrillic characters in its name, even though I can enter them in
Windows Explorer and create such files.

Vincent
_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal

Re: Unicode filenames

Jonas Maebe-2

On 29 Jun 2008, at 09:27, Vincent Snijders wrote:

> How does the RTL support using unicode filenames (e.g. file names  
> that cannot be represented by the ansi char set)?

It doesn't.


Jonas

Re: Unicode filenames

Martin Schreiber
In reply to this post by Vincent Snijders
On Sunday 29 June 2008 09.27:24 Vincent Snijders wrote:

> How does the RTL support using unicode filenames (e.g. file names that
> cannot be represented by the ansi char set)?
>
> For example the FileExists function takes a string which is encoded in
> the system char set. If the system char set is UTF8, like most linuxes
> and Mac OS X, then there is no problem. But on windows using a western
> european charset, I cannot check for existence of a file with Cyrillic
> characters, even though I can enter them in the windows explorer and
> create such files.
>
In MSEgui I use my own set of widestring-based file utilities to overcome the
problem. They are located in lib/common/msefileutils.pas and the
system-specific msesysintf.pas. The Windows version currently converts the
MSEgui widestring filenames to the system encoding before making system calls.
I plan to call the *W versions of the system routines instead, if available
(after version 1.8).

Martin

Re: Unicode filenames

Jonas Maebe-2
In reply to this post by Jonas Maebe-2

On 29 Jun 2008, at 09:43, Jonas Maebe wrote:

> On 29 Jun 2008, at 09:27, Vincent Snijders wrote:
>
>> How does the RTL support using unicode filenames (e.g. file names  
>> that cannot be represented by the ansi char set)?
>
> It doesn't.

See also http://bugs.freepascal.org/view.php?id=7863


Jonas

Re: Unicode filenames

Marco van de Voort
In reply to this post by Vincent Snijders
> How does the RTL support using unicode filenames (e.g. file names that
> cannot be represented by the ansi char set)?

As said on IRC, afaik decisions about this have been continuously postponed.

There are some problems:

- If the boundary condition that w9x must remain supported persists, this
  becomes an order of magnitude more work. A way to deal with this has to be
  found (two win32 FPC releases, one advocated as D2..D2006 compatible + w9x,
  one as NT + Unicode/Tiburon?)

- Also, I have some doubts that using two different encodings is a good thing
  for a portable compiler, so a decision has to be made about that too (e.g.
  always support UTF-8 and allow overloading with UTF-16 if the platform
  defaults to it)

- Do we have a non-COM widestring on Windows?

- Tiburon compatibility. It doesn't make sense to roll our own slightly
  incompatible scheme. At least we should have a look at it before we
  decide whether we support it (if it can be folded into the multi-platform
  vision and it is sane from a native perspective)

What are the exact plans of Lazarus in this? Is there some wiki page with
how Lazarus plans to tackle this with all multi-platform concerns?
 
Btw I saw that Cygwin is giving up 9x support in the next release.


Re: Unicode filenames

Vincent Snijders
Marco van de Voort schreef:
> What are the exact plans of Lazarus in this? Is there some wiki page with
> how Lazarus plans to tackle this with all multi-platform concerns?
>

All strings in the LCL are UTF-8, see
http://wiki.lazarus.freepascal.org/LCL_Unicode_Support

For Windows this means all strings in the LCL are converted to ANSI if
the OS is Win9x, and to widestring if the OS is NT or higher.

See in particular:
http://wiki.lazarus.freepascal.org/LCL_Unicode_Support#Dealing_with_directory_and_filenames

IMHO there should be a better solution than converting file and
directory names to/from ANSI. I think this is the Achilles heel of the
Lazarus Unicode support: it depends on an RTL that has insufficient
Unicode support.

Vincent

Re: Unicode filenames

Marco van de Voort
> Marco van de Voort schreef:
> > What are the exact plans of Lazarus in this? Is there some wiki page with
> > how Lazarus plans to tackle this with all multi-platform concerns?
>
> All strings in the LCL are UTF8, see
> http://wiki.lazarus.freepascal.org/LCL_Unicode_Support
>
> For windows this means all strings in the LCL are converted to ansi, if
> the OS is win9X, and to widestring, if the OS is NT or higher.
>
> See in particular:
> http://wiki.lazarus.freepascal.org/LCL_Unicode_Support#Dealing_with_directory_and_filenames
 
> IMHO there should be a better solution than to convert file and
> directory to/from ansi. I think, this is the Achilles heel of the
> Lazarus unicode support, that it depends on a RTL that has insufficient
> unicode support.

Yes. And with the above question I meant more what you want long term, and
the reasons that we can't see (widgetset related, DB components related),
not which hacks you employ now to work around that :-)

If we want to make an informed decision, we have to put all requirements on
the table, e.g.

- Base principle for me: requiring too much hand-coding is not desirable. In
  an application (not RTL/FCL) I don't want to have to insert manual
  conversions for each string operation and/or parameter passing. Some of
  this must be automated; it is the Delphi way IMHO.

- Which encoding(s) to support (UTF-8 and/or UTF-16 mostly).
- The and/or in the question above: one primary encoding or two? If you have
  just one, you have to convert on some OSes to access the API and widgetset,
  and the headers translated for that OS must be redone with glue code doing
  the conversions. If you have two, each general-purpose string routine must
  be implemented twice (or face conversion chaos).
- Keep one Windows release per Windows target (win32, win64) or have two
  (ANSI + w9x compatible, Unicode + NT only)?
  This is because if we significantly step up Unicode API use, keeping
  runtime compatibility with w9x will require a lot of hackish code. Split
  them up, and you just have a few Unicode vs. ANSI include files, and far
  less glue code to make mistakes in.
- Do we (long term) let UTF-8 piggyback on ansistring, or do we have a
  distinct type for it, so that the compiler knows that type is UTF-8 (and
  consequently that ansistring isn't)? I have a feeling that this might be
  required, even if we decide on UTF-8 as the universal encoding, to avoid
  having to add too many checks and hand conversions.
 

Re: Unicode filenames

Graeme Geldenhuys-2
In reply to this post by Vincent Snijders
2008/6/29 Vincent Snijders <[hidden email]>:
> How does the RTL support using unicode filenames (e.g. file names that
> cannot be represented by the ansi char set)?
>
> For example the FileExists function takes a string which is encoded in the
> system char set. If the system char set is UTF8, like most linuxes and Mac
> OS X, then there is no problem. But on windows using a western european
> charset, I cannot check for existence of a file with Cyrillic characters,
> even though I can enter them in the windows explorer and create such files.

In fpGUI we use UTF-8 for everything. We have wrapper file access
functions which replace the RTL ones. The unit is called
gfx_utils.pas

eg:

function fpgFileExists(const FileName: TfpgString): Boolean;
begin
  Result := FileExists(fpgToOSEncoding(FileName));
end;

fpgToOSEncoding() is then implemented in platform dependent include
files (like FPC also does with many functions).


Linux & *BSD X11:
---------------------------
// Yes, we assume UTF-8. Only very old Linux versions don't use UTF-8,
// but that is very rare now.
function fpgToOSEncoding(aString: TfpgString): string;
begin
  Result := aString;
end;


Windows GDI:
-------
function fpgToOSEncoding(aString: TfpgString): string;
begin
  Result := Utf8ToAnsi(aString);
end;


Regards,
 - Graeme -


_______________________________________________
fpGUI - a cross-platform Free Pascal GUI toolkit
http://opensoft.homeip.net/fpgui/

Re: Unicode filenames

Vincent Snijders
Graeme Geldenhuys schreef:

> 2008/6/29 Vincent Snijders <[hidden email]>:
>> How does the RTL support using unicode filenames (e.g. file names that
>> cannot be represented by the ansi char set)?
>>
>> For example the FileExists function takes a string which is encoded in the
>> system char set. If the system char set is UTF8, like most linuxes and Mac
>> OS X, then there is no problem. But on windows using a western european
>> charset, I cannot check for existence of a file with Cyrillic characters,
>> even though I can enter them in the windows explorer and create such files.
>
> In fpGUI we use UTF-8 for everything. We have wrapper file access
> functions which replaces the RTL ones. The unit is called
> gfx_utils.pas
>
> eg:
> Windows GDI:
> -------
> function fpgToOSEncoding(aString: TfpgString): string;
> begin
>   Result := Utf8ToAnsi(aString);
> end;

I see you are crippled in the same way as the LCL, because you can only
handle ANSI filenames correctly.

Vincent

Re: Unicode filenames

Martin Schreiber
In reply to this post by Marco van de Voort
On Sunday 29 June 2008 13.10:33 Marco van de Voort wrote:

>
> - Which encoding(s) to support (utf-8 and/or utf-16 mostly)

To complement Graeme's mail: MSEgui uses widestrings for everything.

Martin

Re: Unicode filenames

Graeme Geldenhuys-2
In reply to this post by Vincent Snijders
2008/6/29 Vincent Snijders <[hidden email]>:
>
> I see you are crippled in the same way as the LCL, because you only can
> handle ansi filenames correctly.

fpGUI was tested under Windows with a Russian locale and Russian filenames.
Not tested by me, but by a co-developer (Vladimir). He reported that the
file dialogs and other file-related functions worked correctly. He
actually implemented the locale file handling.


Regards,
 - Graeme -



Re: Unicode filenames

Felipe Monteiro de Carvalho
In reply to this post by Marco van de Voort
On Sun, Jun 29, 2008 at 6:32 AM, Marco van de Voort <[hidden email]> wrote:
> - If the border condition that w9x must remain supported persists, this
>  becomes a magnitude more work. A way to deal with this has to be found
>  (two win32 FPC releases, one advocated as D2..D2006 compat + w9x, one as
>  NT + Unicode/Tiburon?)

procedure AnyFileRoutineInWin32(AFileName: widestring);
begin
  if UnicodeEnabledOS then SomeWin32APIW()
  else AnsiToWideString(SomeWin32ApiA())
end;

Not very hard to keep 9x support.
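
A slightly more concrete version of the dispatch above might look like the
following. It is only a sketch for Windows targets, not code from the RTL,
the LCL or any toolkit mentioned in this thread. WideFileExists is a
hypothetical name, UnicodeEnabledOS is assumed to be set once at startup,
and the 9x branch assumes that a lossy conversion to the ANSI code page is
acceptable there. Note that on the ANSI path the wide name is converted
before the *A routine is called.

```pascal
program dispatchsketch;
{$mode objfpc}{$H+}

uses
  Windows;

const
  // Value GetFileAttributesA/W return when the lookup fails.
  InvalidAttributes = DWORD($FFFFFFFF);

var
  // Assumption: set once at startup; True on NT-based Windows.
  UnicodeEnabledOS: Boolean = True;

// Hypothetical wrapper: picks the wide or the ANSI Win32 entry point
// at run time, depending on UnicodeEnabledOS.
function WideFileExists(const AFileName: WideString): Boolean;
var
  Attr: DWORD;
  AnsiName: AnsiString;
begin
  if UnicodeEnabledOS then
    Attr := GetFileAttributesW(PWideChar(AFileName))
  else
  begin
    // Lossy step: characters outside the ANSI code page cannot survive.
    AnsiName := AnsiString(AFileName);
    Attr := GetFileAttributesA(PAnsiChar(AnsiName));
  end;
  // A directory with that name does not count as an existing file.
  Result := (Attr <> InvalidAttributes) and
            ((Attr and FILE_ATTRIBUTE_DIRECTORY) = 0);
end;

var
  Name: WideString;
begin
  // Four Cyrillic letters plus '.txt', built from code points so the
  // source file itself stays plain ASCII.
  Name := WideChar($0444);
  Name := Name + WideChar($0430) + WideChar($0439) + WideChar($043B) + '.txt';
  WriteLn(WideFileExists(Name));
end.
```

The branch is cheap, but every wrapped routine needs both paths, which is
the glue code that Marco's two-release proposal tries to avoid.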

--
Felipe Monteiro de Carvalho

Re: Unicode filenames

Vincent Snijders
In reply to this post by Graeme Geldenhuys-2
Graeme Geldenhuys schreef:
> 2008/6/29 Vincent Snijders <[hidden email]>:
>> I see you are crippled in the same way as the LCL, because you only can
>> handle ansi filenames correctly.
>
> fpGUI was tested under Windows with Russian locale and filenames. Not
> tested by me, but by a co-developer (Vladimir). He reported that the
> file dialogs and other file related functions worked correctly. He
> actually implemented the locale file handling.

Yes, it works OK if all the characters used are part of the system
locale. For a Russian locale, that means the ASCII characters and the
Cyrillic characters.
Try using a path that contains characters from more than one code page, for
example Cyrillic characters in the file name and French accented
characters in the directory name. That is when you need real Unicode support.
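
The mixed-path case can be illustrated with a small check of whether a name
fits a single-byte character set. This is a simplified sketch: FitsLatin1 is
a hypothetical helper, and ISO-8859-1 (code points U+0000..U+00FF) stands in
for a Western European ANSI code page, which in reality differs in a few
positions. The same kind of check against a Cyrillic code page would reject
the accented directory name instead.

```pascal
program codepagecheck;
{$mode objfpc}{$H+}

// Simplifying assumption: ISO-8859-1 (U+0000..U+00FF) stands in for a
// Western European ANSI code page.
function FitsLatin1(const S: WideString): Boolean;
var
  I: Integer;
begin
  Result := True;
  for I := 1 to Length(S) do
    if Ord(S[I]) > $FF then
      Exit(False);
end;

var
  FrenchDir, RussianFile: WideString;
begin
  // An accented French word; U+00E9 is "e" with an acute accent.
  FrenchDir := 'r';
  FrenchDir := FrenchDir + WideChar($00E9) + 'sum' + WideChar($00E9);

  // Four Cyrillic letters (U+0444 U+0430 U+0439 U+043B) plus an extension.
  RussianFile := WideChar($0444);
  RussianFile := RussianFile + WideChar($0430) + WideChar($0439) +
                 WideChar($043B) + '.txt';

  // The directory name alone fits; the full path does not.
  WriteLn(FitsLatin1(FrenchDir));
  WriteLn(FitsLatin1(FrenchDir + '\' + RussianFile));
end.
```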

Vincent

Re: Unicode filenames

Felipe Monteiro de Carvalho
In reply to this post by Graeme Geldenhuys-2
On Sun, Jun 29, 2008 at 3:19 PM, Graeme Geldenhuys
<[hidden email]> wrote:
> fpGUI was tested under Windows with Russian locale and filenames. Not
> tested by me, but by a co-developer (Vladimir). He reported that the
> file dialogs and other file related functions worked correctly. He
> actually implemented the locale file handling.

What if you have a Russian directory on a Windows with a Western Latin locale?

What if you have a Chinese directory on a Linux with a non-UTF-8 locale?

We also support a Russian directory on a Russian Windows. The problem
only appears if the locale cannot represent the characters in the
file name.

--
Felipe Monteiro de Carvalho

Re: Unicode filenames

Felipe Monteiro de Carvalho
In reply to this post by Felipe Monteiro de Carvalho
> procedure AnyFileRoutineInWin32(AFileName: widestring);
> begin
>  if UnicodeEnabledOS then SomeWin32APIW()
>  else AnsiToWideString(SomeWin32ApiA())
> end;

If you want even more detail: you can initialize UnicodeEnabledOS very
easily by reading the operating system version and the operating system
type (NT/9x).
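
One way this initialization could be written is sketched below, for Windows
targets only. It assumes the SysUtils variables Win32Platform,
Win32MajorVersion and Win32MinorVersion, which the unit fills in at startup,
and the VER_PLATFORM_WIN32_NT constant from the Windows unit.
InitUnicodeEnabledOS is a hypothetical name.

```pascal
program detectsketch;
{$mode objfpc}{$H+}

uses
  Windows, SysUtils;

var
  UnicodeEnabledOS: Boolean = False;

// Hypothetical initializer: call once at program startup.
procedure InitUnicodeEnabledOS;
begin
  // NT-based systems implement the *W entry points natively.
  UnicodeEnabledOS := Win32Platform = VER_PLATFORM_WIN32_NT;
end;

begin
  InitUnicodeEnabledOS;
  WriteLn('Platform id: ', Win32Platform,
          ', version: ', Win32MajorVersion, '.', Win32MinorVersion);
  WriteLn('Use *W routines: ', UnicodeEnabledOS);
end.
```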

--
Felipe Monteiro de Carvalho

Re: Unicode filenames

Vincent Snijders
In reply to this post by Graeme Geldenhuys-2
Graeme Geldenhuys schreef:

> 2008/6/29 Vincent Snijders <[hidden email]>:
>> I see you are crippled in the same way as the LCL, because you only can
>> handle ansi filenames correctly.
>
> fpGUI was tested under Windows with Russian locale and filenames. Not
> tested by me, but by a co-developer (Vladimir). He reported that the
> file dialogs and other file related functions worked correctly. He
> actually implemented the locale file handling.
>
>

If you create a file with a Russian name on your hard disk, can you use
fpgFileExists to check for its existence?

Vincent

Re: Unicode filenames

Graeme Geldenhuys-2
2008/6/29 Vincent Snijders <[hidden email]>:
>
> If you create a file with a russian name on your hard disk, can you use
> fpgFileExists to check for its existence?

My Windows doesn't have a Russian locale, but I tried it with
French, German, etc. names and it works. I asked Vladimir to create a
zip file containing English and Russian directory and file names.
I'll unzip that and give it another try as soon as he emails the file.
I'll let you know what happens.  :)

Regards,
 - Graeme -



Re: Unicode filenames

Graeme Geldenhuys-2
In reply to this post by Vincent Snijders
2008/6/29 Vincent Snijders <[hidden email]>:
>
> If you create a file with a russian name on your hard disk, can you use
> fpgFileExists to check for its existence?

I don't know how to change the Windows language to Russian, but I do
under Linux. So I did the following. I changed my Linux system's locale
for an application to Russian. I had Russian translations of fpGUI
Toolkit, so I used those. I copied the rsCancel resourcestring value (in
Russian) to an Edit component, copied that to the clipboard, used the File
Open dialog to 'create directory' and pasted the Russian word for
Cancel in there. Now I had a Russian directory on the hard drive. I
quit the program, changed back to the English locale and loaded the
program again; it displayed the Russian directory correctly.

PS:
 Even Linux's terminal didn't display the Russian directory correctly,
showing a whole bunch of '?????????' instead. fpGUI worked fine! ;-)

See the attached screenshot: the application using an English locale
(en_ZA.UTF-8), as can be seen from the grid headers, and displaying a
Russian directory name.


Regards,
 - Graeme -


Attachment: Screenshot.gif (30K)

Re: Unicode filenames

Graeme Geldenhuys-2
In reply to this post by Vincent Snijders
2008/6/29 Vincent Snijders <[hidden email]>:
>
> If you create a file with a russian name on your hard disk, can you use
> fpgFileExists to check for its existence?

I did a test and sent a screenshot of the results. I don't know what
the attachment size limit is on this mailing list, so let me know if the
attachment didn't go through. The file was 22kb in size.


Regards,
 - Graeme -



Re: Unicode filenames

Felipe Monteiro de Carvalho
On Sun, Jun 29, 2008 at 8:06 PM, Graeme Geldenhuys
<[hidden email]> wrote:
> I did a test and sent a screenshot of the results. I don't know what's
> the limit of attachments in this mailing list.  So let me know if the
> attachment didn't go through. The file was 22kb in size.

Nothing arrived here. Can't you just say if it worked or not?

--
Felipe Monteiro de Carvalho