Discussion:
[Thunar-dev] Feature request: convert file names if source and target text encoding differs.
Péter
2017-06-21 16:51:18 UTC
Hi,

Please add to the wish list:
Convert file names if the source and target text encodings differ.
(In order to fully integrate possibly differently encoded filesystems into one.)

For example, an smb: location can have UTF-8-encoded filenames while the local filesystem has ISO-8859-2-encoded
(latin2) filenames. When copying a file, the (abstract) *characters* should be preserved, not the bytes. To preserve
the characters, Thunar should first decode (interpret) the byte sequence and then, when writing to a different
location, encode the characters into a byte sequence again.
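
The decode-then-encode step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Thunar or GLib code; the function name `transcode_filename` and the sample file name are assumptions made for the example.

```python
def transcode_filename(raw_name: bytes, source_encoding: str, target_encoding: str) -> bytes:
    """Re-encode a file name so that its characters, not its bytes, are preserved."""
    # Step 1: interpret the source bytes as abstract characters.
    characters = raw_name.decode(source_encoding)
    # Step 2: write the same characters in the target encoding.
    # Raises UnicodeEncodeError if the target charset lacks one of the characters.
    return characters.encode(target_encoding)


# Hypothetical sample name "Péter.txt" as stored on the UTF-8 side.
utf8_name = "P\u00e9ter.txt".encode("utf-8")
latin2_name = transcode_filename(utf8_name, "utf-8", "iso-8859-2")
print(utf8_name, latin2_name)
```

The two byte strings differ (the "é" takes two bytes in UTF-8 and one byte in ISO-8859-2), yet both name the same characters, which is the behaviour requested here.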

A G_FILENAME_ENCODING environment variable already exists, but it can neither distinguish between locations nor
express a different encoding per location. With "G_FILENAME_ENCODING=UTF8,iso88592": when I copy a file from a
utf8-location to a latin2-location, Thunar copies the filename byte sequence unchanged (even though it presumably
decoded the bytes to characters in order to display the name properly). When I copy a file from a latin2-location
to a utf8-location, Thunar fails with "Invalid argument".

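One plausible reading of the two symptoms above, sketched in Python: a latin2 byte sequence is usually not valid UTF-8, so it cannot be passed through unchanged to a location that insists on UTF-8, while a display routine that tries each listed encoding in turn can still show the name correctly. That the listed encodings are tried in order is an assumption of this sketch, and the helper names `is_valid_in` and `decode_for_display` are hypothetical; this is not the actual GLib or Thunar logic.

```python
def is_valid_in(raw_name: bytes, encoding: str) -> bool:
    """Return True if the bytes form a valid sequence in the given encoding."""
    try:
        raw_name.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


def decode_for_display(raw_name: bytes, encodings: list) -> str:
    """Try each encoding in order; fall back to replacement characters."""
    for encoding in encodings:
        try:
            return raw_name.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Nothing matched: show the name anyway, with undecodable bytes replaced.
    return raw_name.decode(encodings[0], errors="replace")


latin2_name = b"P\xe9ter.txt"  # hypothetical "Péter.txt" stored as ISO-8859-2
print(is_valid_in(latin2_name, "utf-8"))
print(decode_for_display(latin2_name, ["utf-8", "iso-8859-2"]))
```

Displaying the name works because decoding falls back to the second encoding, but copying the raw bytes skips decoding entirely, so nothing re-encodes the name for the target.
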
rsync does have an option to translate filenames (that is, to preserve their characters):
"--iconv=LOCAL,REMOTE" "Rsync can convert filenames between character sets using this option."

smbnetfs also has config options for converting filenames.
[iconv]
-o from_code=CHARSET
original encoding of file names (default: UTF-8)
-o to_code=CHARSET
new encoding of the file names (default: UTF-8)
I cannot, of course, say for certain which software component should be responsible for this kind of glue. Please
feel free to forward this request to the proper recipients. (It may be, for example, "gvfs-mount" or a similar
component.)

--
Alex
2017-06-21 21:49:47 UTC
Hi Péter,

thanks for your investigation!

There is already an open bug on this issue:
https://bugzilla.xfce.org/show_bug.cgi?id=5346

For now I have just added your mail to the bug. Hopefully the information
will speed up the fix. Feel free to add more, if you like!

I am sorry to say that currently not much happens on the different
thunar bugs.
Maybe things will change once Thunar has been ported to GTK 3.

Cheers,
Alex