[Thunar-dev] Thunar performance improvements

Discussion:

André Gillibert

2011-11-16 16:54:11 UTC

Tired of waiting 75 seconds in front of Thunar when listing a directory of 7500 symbolic links over a CIFS share, I tried to improve performances.

I noticed Thunar (as well as pretty all Linux file managers) perform network or disk intensive stat(2)/lstat(2)/access(2) and readlink(2) operations on every file or directory before listing them, even though most of the info it needs to get a proper icon and sort order can be got from the d_type byte in struct dirent (see readdir(3)) and from the file name.

First, I wrote a set of patches for Thunar-1.0.2, which is basically what I had available on my CentOS 5.x system. Then, I discovered that Thunar-1.2 I/O subsystem had been dramatically changed to use the GIO virtual file system rather than thunar-vfs.

Not wanting to uselessly fork the Thunar project (because IMO, forking is harmful to the OSS community), I wrote similar patches for Thunar-1.2, and, I wonder whether they can be included in the mainline Thunar.

Patches overview:
First, I had to patch the standard file system backend of GIO of GLIB 2.28, because, GIO performed stat(2)/lstat(2) operations even when requested information didn't need that.
I wrote a patch to GIO that make it lazy. stat(2)/lstat(2)/readlink(2)/access(2) operations are only performed when required. Moreover, special optimization has been done to reduce the number of access(2) calls. For example, in a directory where most files are read-executable, it will perform only access(R_OK|X_OK) and access(W_OK) calls on most files, which is 2 syscalls rather than 3.

Secondly, I patched Thunar, as follows:
1) Changed the ThunarFile implementation to only compute information that's "required", and, as more information is needed, complete this information.

2) When a directory is loaded, only basic information is retrieved: File type (directory/regular-file/symlink/special-inode), fast content type without following symlinks. This is very fast as it requires very little I/O. On the big CIFS directory: 75 seconds -> 2 or 3 seconds.
This is not very correct for symlinks, as it doesn't even know whether it points to a file/special-inode or directory. So, we go to point (3).

3) When the file is actually viewed, a background thread computes more info (and so, follow symlinks), and, after a few milliseconds, the icon is changed. If a symlink is discovered to point to a directory, it's inserted as subdirectory in the tree view.

4) When a file or set of files is selected, their real content-type is computed in order to show a correct context menu, although, not everything is computed in some cases (e.g. If there's a file + a directory in the selection, it knows that the only verb is "open").

Consequently, the behavior of double-click or context menu is not changed. Only icons may be "incorrect" a short amount of time.

5) The side tree view was extremely slow in some cases. It could freeze Thunar for several minutes. This is because Thunar wanted to know if each directory visible in the tree view had any subdirectory (following symlinks) in order to display a little cross to be able to expand the directory and view the subdirectories. This was performed in a background thread, but, on I/O bound systems, could slow down extremely all other I/O operations.

Actually, this was the "bug" that made me initially write this set of patches.

I changed that to make it behave like Nautilus: Don't enter subdirectories until the cross is clicked, in that case, if no subdirectory is found, just make the cross disappear... This is a "feature regression", but I may update the patch to make something fast and correct most of the time: Seek a subdirectory, parsing a limited number (e.g. one hundred) of sub-files at most, and stop as soon as a true subdirectory (not symlink) is found. In doubt, assume the directory may have subdirectories.

I may also provide a user preference to balance between performances and correctness.
At the highest level of correctness, it would behave as the old Thunar (although twice as fast because, a "bug" in the folder listing function would make everything listed/stat-ed twice).

6) I fixed a few performance bugs. For example, when viewing a directory, it was sorted with a O(n^2) algorithm because the dir was initially listed as empty, and files, after having been listed in a background job, were seen as dynamically added files.

7) IIRC, I fixed a bug that made Thunar crash when seeing broken mount points (e.g. FUSE file system where the user-space process crashed).

My code is neither very pretty nor very commented, but I can improve the code quality before submitting the patches. I'd prefer to know whether Thunar maintainers would accept the patches befor cleaning them up.

I hope that the philosophy of correctness of Thunar don't prevent pure performance patches like these ones, to be included.

--
Andr? Gillibert

Nick Schermer

2011-11-16 18:56:37 UTC

Permalink

He Andre,

Of course we're interested in performance improvements and I don't
think Jannis would even mind small regressions if it gives a
performance speed-up.

So? Where are the patches ;-)?

Nick

Jannis Pohlmann

2011-11-17 17:40:03 UTC

Permalink

Hi Andr?,

On Wed, 16 Nov 2011 17:54:11 +0100

Post by AndrÃ© Gillibert
Tired of waiting 75 seconds in front of Thunar when listing a
directory of 7500 symbolic links over a CIFS share, I tried to
improve performances.

Cool! We often don't have the time to perform detailed benchmarking and
performance optimizations so I'm happy to hear you picked this up as a
challenge.

Post by AndrÃ© Gillibert
First, I had to patch the standard file system backend of GIO of GLIB
2.28, because, GIO performed stat(2)/lstat(2) operations even when
requested information didn't need that. I wrote a patch to GIO that
make it lazy. stat(2)/lstat(2)/readlink(2)/access(2) operations are
only performed when required. Moreover, special optimization has been
done to reduce the number of access(2) calls. For example, in a
directory where most files are read-executable, it will perform only
access(R_OK|X_OK) and access(W_OK) calls on most files, which is 2
syscalls rather than 3.

I've seen you reported your GIO/GLib related patches on gtk-devel-list.
You'll have to talk to the GLib developers about getting them approved.

Post by AndrÃ© Gillibert
1) Changed the ThunarFile implementation to only compute information
that's "required", and, as more information is needed, complete this
information.

The "required" bit is a bit complicated, I guess, so I can only judge
about this improvement based on real code.

Post by AndrÃ© Gillibert
File type (directory/regular-file/symlink/special-inode), fast
content type without following symlinks. This is very fast as it
requires very little I/O. On the big CIFS directory: 75 seconds -> 2
or 3 seconds. This is not very correct for symlinks, as it doesn't
even know whether it points to a file/special-inode or directory. So,
we go to point (3).

Interesting and good so far.

Post by AndrÃ© Gillibert
3) When the file is actually viewed, a background thread computes
more info (and so, follow symlinks), and, after a few milliseconds,
the icon is changed. If a symlink is discovered to point to a
directory, it's inserted as subdirectory in the tree view.

Doesn't that mean things will jump up and down in the directory as you
browse it?

Post by AndrÃ© Gillibert
4) When a file or set of files is selected, their real content-type
is computed in order to show a correct context menu, although, not
everything is computed in some cases (e.g. If there's a file + a
directory in the selection, it knows that the only verb is "open").

I wouldn't want to add two many special cases where we load additional
information. A first quick pass and then lazy loading additional
information all at once sounds more simple to me.

Post by AndrÃ© Gillibert
Consequently, the behavior of double-click or context menu is not
changed. Only icons may be "incorrect" a short amount of time.
5) The side tree view was extremely slow in some cases. It could
freeze Thunar for several minutes. This is because Thunar wanted to
know if each directory visible in the tree view had any subdirectory
(following symlinks) in order to display a little cross to be able to
expand the directory and view the subdirectories. This was performed
in a background thread, but, on I/O bound systems, could slow down
extremely all other I/O operations.
Actually, this was the "bug" that made me initially write this set of patches.
I changed that to make it behave like Nautilus: Don't enter
subdirectories until the cross is clicked, in that case, if no
subdirectory is found, just make the cross disappear... This is a
"feature regression", but I may update the patch to make something
fast and correct most of the time: Seek a subdirectory, parsing a
limited number (e.g. one hundred) of sub-files at most, and stop as
soon as a true subdirectory (not symlink) is found. In doubt, assume
the directory may have subdirectories.

That sounds a little better and not too complicated either. Although I
wonder if seeking a subdirectory won't be much faster if we query less
information. Maybe that is enough optimization already?

Post by AndrÃ© Gillibert
I may also provide a user preference to balance between performances
and correctness. At the highest level of correctness, it would behave
as the old Thunar (although twice as fast because, a "bug" in the
folder listing function would make everything listed/stat-ed twice).

Please don't. No option for technical feature sets like this.

Post by AndrÃ© Gillibert
6) I fixed a few performance bugs. For example, when viewing a
directory, it was sorted with a O(n^2) algorithm because the dir was
initially listed as empty, and files, after having been listed in a
background job, were seen as dynamically added files.

That itself doesn't imply O(n?), does it? My guess would be that it
depends on how you do the online sorting.

Post by AndrÃ© Gillibert
7) IIRC, I fixed a bug that made Thunar crash when seeing broken
mount points (e.g. FUSE file system where the user-space process
crashed).

Crash fixers are always good.

Post by AndrÃ© Gillibert
My code is neither very pretty nor very commented, but I can improve
the code quality before submitting the patches. I'd prefer to know
whether Thunar maintainers would accept the patches befor cleaning
them up.

It would of course be nice if you cleaned things up before we merge
them. I'm interested in your optimizations although I can't promise to
merge all individual improvements. Can you, for each of the above
points, create a bug on bugzilla.xfce.org and attach the corresponding
patches? Or, alternatively, clone the Thunar git repository, upload
your clone and put the fixes for each point into a dedicated branch?

Then we can continue discussing them on bugzilla.xfce.org.

Post by AndrÃ© Gillibert
I hope that the philosophy of correctness of Thunar don't prevent
pure performance patches like these ones, to be included.

Absolutely not, as state earlier. Raising these problems in a good
email is already worth quite a lot. Patches: awesome.

- Jannis

André Gillibert

2011-11-18 11:42:00 UTC

Permalink

Post by Jannis Pohlmann
Hi Andr?,

Post by AndrÃ© Gillibert
1) Changed the ThunarFile implementation to only compute information
that's "required", and, as more information is needed, complete this
information.

The "required" bit is a bit complicated, I guess, so I can only judge
about this improvement based on real code.

Indeed. Now, I'm working on commenting and splitting patches.

Post by Jannis Pohlmann

Doesn't that mean things will jump up and down in the directory as you
browse it?

Good question. When a file was changed thunar_list_model_file_changed
was invoked, and the file changed its position to keep the directory
sorted. A symlink becoming a directory would have made the directory
jump to the start of the folder, if directories are sorted before
files. To avoid that side effect, plus a re-entrancy issue with
thunar_standard_view_selection_changed (although it was fixable), I
simply don't move files when they change, so the folder is not really
sorted properly.
Currently, leaving a folder and re-entering it, sort it again, so that
symlink to directories that had been seen are moved at the top of the
folder, while symlink to directories that had not been seen, are still
sorted together with files.
This only affects symlinks to directories. Regular directories and
symlinks to files are not affected.
This inconsistency disappears if the "sort folders before files"
option is unset.

Post by Jannis Pohlmann

I wouldn't want to add two many special cases where we load additional
information. A first quick pass and then lazy loading additional
information all at once sounds more simple to me.

The main code change is in thunar/thunar-file.c. A few "fast"
functions, that don't follow symlinks unless they have already been
followed previously, are added such as thunar_file_is_directory_fast.
The logic of lazy loading of information is in thunar-file.c, through
a central function:

thunar_file_complete_info(ThunarFile *file, guint flags)
where flags are a few categories.
THUNAR_INFO_BASIC (zero-cost information every ThunarFile gets on creation)
THUNAR_INFO_XSTAT (accurate target file type, content-type, and UNIX
attributes, which is pretty all info you can get with
lstat(2)/stat(2)).
THUNAR_INFO_ACCESS (Info you get with access(2))
THUNAR_INFO_SYMLINK_TARGET (target of a symlink (readlink(2)))
THUNAR_INFO_TRASH (info related to trash items)

The idea was that mass operations (viewing, selecting) need only
THUNAR_INFO_XSTAT, and other info would be rarely asked.
But, I discovered yesterday that, to be compatible with plugins such
as UCA, complete info must be computed whenever files are selected,
unless the plugin interface is changed to allow plugins to ask only
information they need.

Consequently, it makes sense to limit this info to two levels:
THUNAR_INFO_BASIC
THUNAR_INFO_ALL

There is not much code to change. :)

Post by Jannis Pohlmann

That sounds a little better and not too complicated either. Although I
wonder if seeking a subdirectory won't be much faster if we query less
information. Maybe that is enough optimization already?

If symlinks are not followed, it can have acceptable performances.
readdir(3) is much cheaper than stat(2).

For example (target machine = K6-2 550 Mhz, 448 MB RAM, samba, 7500
symlink folder, client machine = Core 2 Duo 100 Mbps ethernet
network):
time ls -f /huge_cifs_folder > /dev/null
-> 0.89 second
time ls -l /huge_cifs_folder > /dev/null
-> 14 seconds (even though everything is in server cache)

BTW, NFS is much faster than CIFS. :)

Basically, we want to know whether a directory has regular
sub-directories. Symlink sub-directories are not significant as far as
they are not viewed.

POSSIBLE optimization, but hard to make portable: Use the st_nlink
field on the few well-known file systems where its behavior is
reliable. If st_nlink > 2, then, most probably, there's a regular
sub-directory.

Even with readdir(3), the worst case can be very poor: Many symlinks
to the same huge CIFS directory viewed, or a CIFS directory with many
crossed symlinks. Each one would be parsed independently, so it would
require 0.89*many seconds. This is due to the fact that we cannot
assume that the graph of symlinks is a tree. It's an arbitrary
oriented graph.
SOLUTION proposition: It may be possible to save info on each parsed
symlink target folder, in order to avoid recomputing whether there are
subfolders.
This doesn't solve the problem of many multi-mounting (as can be
obtained with mount --bind), or indirection through a symlink unaware
system (e.g. Thunar on Linux viewing a CIFS share of a Linux SAMBA
directory shared through a Windows SMB share, and so, masking
directory symlinks as if they were regular directories, but these are
corner cases that may not be significant.

Anyway, I wouldn't like Thunar to be one of those applications that
make the system CPU, disk or network slower for extensive periods of
time, only because it's launched, even though there's no user
interaction.

Post by Jannis Pohlmann

Please don't. No option for technical feature sets like this.

Okay. It's always better to make the machine take this decision... The
computer can find its own balance. For example, the concept of not
following symlinks could be moderated by the number of symlinks or a
time limit.

Post by Jannis Pohlmann

That itself doesn't imply O(n?), does it? My guess would be that it
depends on how you do the online sorting.

Yes, it could be made O(n*log(n)) by sorting the list of files and
then, merging them to the already-sorted list with a linear sorted
list merge algorithm, but it was done through iterative insertions.

This raises a question about events creating new files in unsorted
folders (e.g. new files created through a shell script and notified
via GAMIN).
Symlinks that "become" directories are currently not properly sorted
(not moved), and so, make any sorted insertion algorithm
non-functionnal in some way or other.
Moreover, this orthogonal idea of having the folder always sorted is
not so nice in the real world. The Windows explorer behavior of
putting new files at the end of the folder make it much easier to keep
track of the last file you just created with a separate command line
tool.

--
Andr? Gillibert