1.3 or 1.4 - Very slow remote navigation on large amount of elements

ggbce
Posts: 1
Joined: Wed Nov 25, 2020 1:07 am

1.3 or 1.4 - Very slow remote navigation on large amount of elements

Post by ggbce »

Hi,

I tested Explorer++ as an alternative to Windows Explorer. Basically, it works fine, but I have an issue when I need to navigate to a remote location (a UNC path like \\myserver\share).

With stable version 1.3, there is always a 2-3 second delay each time I double-click a remote folder.

With beta version 1.4, this delay is fixed. The waiting time is similar to Windows Explorer.

... But in both versions, when the remote share has over 5000 elements and enumerated base view is enabled, it can take over 5 minutes to show the files/folders. With Windows Explorer, it takes 3-10 seconds.

I understand that development of Explorer++ is not very active, but if you see this issue and find a solution, please fix it!

Regards
maraskan_user
Posts: 4
Joined: Wed Sep 28, 2011 8:21 am

Re: 1.3 or 1.4 - Very slow remote navigation on large amount of elements

Post by maraskan_user »

Well, Explorer++ loops over the folder contents with the IShellFolder::EnumObjects method. But instead of using SHGetDataFromIDList to retrieve the WIN32_FIND_DATA, it calls FindFirstFile() for each item to be added to the listview, which gets very slow for large folders, especially over networks. SHGetDataFromIDList would be much faster, but it has some restrictions on the information actually available in the struct, like timestamp rounding and, on older versions of Windows, even file size being restricted to 32-bit values.

There are different ways of improving this, each with their own tradeoffs:

- Only using SHGetDataFromIDList() would be ~10 times faster in benchmarks with a 20000-file folder, but file times might be off by a second.

- Use SHGetDataFromIDList() to populate the listview right away, and then add the more complete WIN32_FIND_DATA with FindFirstFile() later in a separate thread. More annoying to program.

- If you REALLY care about speed, you would add separate paths for virtual folders (loop with EnumObjects) and physical folders (loop with FindFirstFile/FindNextFile). You would have to rewrite many of the functions to retrieve m_pExtraItemInfo[uItemId].pridl if it's not set yet. The network folder would then pop up in a second displaying all its files (at least it did before the Meltdown/Spectre patches; now, even with the patches disabled on Win10 v1809, I'm at 1700 ms, and with them enabled it was about twice that). But if you hit Ctrl+A and right-click the files to get a context menu, it would be very slow, since you'd have to retrieve the pridls of all the selected files at that point.

- BTW, I have no idea how Microsoft's Explorer can do its thing so fast. Part of it is smoke and mirrors: showing you incomplete information to make it look faster (opening a network folder in MS Explorer shows items right away, but they're not sorted yet, and the list is only complete a second or two later, after which it sorts). It probably retrieves additional information in background threads. Who knows. The official XP source code got leaked, so one could take a look, but it's not easy to navigate.

- Bonus fact: list view mode has an additional optimization trick compared to the other view modes. I used to wonder why detail view was so much faster than list view (100 ms vs. 1000 ms for 20000 files). It turns out that if you have the listview redraw without calling LVM_SETCOLUMNWIDTH beforehand, it's slow. If you tell it what the width of the first column should be before redrawing, it's as fast as detail view. So when populating the listview, you call LVM_GETSTRINGWIDTH for each item to find the maximum width (you lose some ~150 ms for 20000 files), then call LVM_SETCOLUMNWIDTH, then redraw the listview; the redraw takes <1 ms. Overall that's 5x faster than before, but still half as fast as detail view.
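
The column-width trick from the last point boils down to "measure while inserting, set the width once, then redraw". A minimal sketch of that order of operations, using a stand-in control so it runs anywhere: FakeListView, measure_string_width(), the 7 px character width and the 12 px padding are all hypothetical. In a real list view the measuring and width-setting steps would be the LVM_GETSTRINGWIDTH and LVM_SETCOLUMNWIDTH messages mentioned above.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for LVM_GETSTRINGWIDTH. A real control measures
// the text in its current font; here every character is assumed 7 px wide.
int measure_string_width(const std::string& text) {
    constexpr int kAssumedCharWidthPx = 7;
    return static_cast<int>(text.size()) * kAssumedCharWidthPx;
}

// Hypothetical stand-in for the control. It only records what was done,
// so the number of width updates can be checked.
struct FakeListView {
    std::vector<std::string> items;
    int column_width = 0;
    int width_updates = 0;

    // Models sending LVM_SETCOLUMNWIDTH for the first column.
    void set_column_width(int width) {
        column_width = width;
        ++width_updates;
    }
};

// Insert every item, track the widest string on the way, and set the
// column width exactly once before the (single) redraw would happen.
void populate_with_single_width_update(FakeListView& view,
                                       const std::vector<std::string>& names) {
    constexpr int kAssumedPaddingPx = 12;
    int max_width = 0;
    view.items.reserve(view.items.size() + names.size());
    for (const std::string& name : names) {
        view.items.push_back(name);
        max_width = std::max(max_width, measure_string_width(name));
    }
    view.set_column_width(max_width + kAssumedPaddingPx);
}
```

The sketch says nothing about why the real control is slow without the explicit width; it only shows the sequence described in the post.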
David Erceg
Site Admin
Posts: 933
Joined: Sat Apr 18, 2009 1:46 am

Re: 1.3 or 1.4 - Very slow remote navigation on large amount of elements

Post by David Erceg »

Coincidentally, I discussed this exact issue yesterday in a pull request.

The idea I mentioned in the pull request was to use IShellFolder2::GetDetailsEx to retrieve some of the basic item details. Which is something I've implemented locally, but I'm still testing the change.

I was unaware of SHGetDataFromIDList. It would definitely be simpler to call that, but it doesn't look like it works in the case of filesystem-backed virtual folders. For example, calling SHGetDataFromIDList on a file in the "Documents" library folder fails, even though items in that folder are ultimately regular filesystem items.

From a few quick tests, it appears that calling IShellFolder2::GetDetailsEx to retrieve the necessary properties takes about the same time as SHGetDataFromIDList.

It would also be good to remove the reliance on the WIN32_FIND_DATA struct altogether. As I talked about in the linked pull request, the enumeration used to be based off FindFirstFile/FindNextFile entirely, but was later switched to IShellFolder::EnumObjects. The fact that there's any reliance on the data contained in the WIN32_FIND_DATA struct is only because there was (and still is) other code I wrote that depended on having easy access to the struct.

I believe I actually used to use separate enumeration paths for virtual folders and filesystem folders, back when the IShellFolder::EnumObjects code was added. But it's definitely not something I want to return to. I'm also not sure it would be any faster. The libraries example above shows that you need to retrieve basic details like file sizes and dates, even for items in virtual folders. And whether you use FindFirstFile + FindNextFile or IShellFolder::EnumObjects + IShellFolder2::GetDetailsEx, you still end up retrieving the same information.
David Erceg
Site Admin
Posts: 933
Joined: Sat Apr 18, 2009 1:46 am

Re: 1.3 or 1.4 - Very slow remote navigation on large amount of elements

Post by David Erceg »

After investigating a bit further, it looks like the reason why IShellFolder2::GetDetailsEx has the same performance as SHGetDataFromIDList is that they both retrieve data cached within the PIDL. The behavior of SHGetDataFromIDList is discussed at https://www.zabkat.com/2xExplorer/shell ... lore2.html. From testing IShellFolder2::GetDetailsEx, it definitely appears to retrieve cached information.

If a file is changed after the PIDL is obtained, but before the necessary information is retrieved, the information will be stale. Also, changing the memory contents of the PIDL affects the values that are returned by IShellFolder2::GetDetailsEx.

So they should be equivalent, except IShellFolder2::GetDetailsEx can retrieve information on files within library folders.
maraskan_user
Posts: 4
Joined: Wed Sep 28, 2011 8:21 am

Re: 1.3 or 1.4 - Very slow remote navigation on large amount of elements

Post by maraskan_user »

David Erceg wrote: Sat Dec 05, 2020 10:12 am I was unaware of SHGetDataFromIDList. It would definitely be simpler to call that, but it doesn't look like it works in the case of filesystem-backed virtual folders. For example, calling SHGetDataFromIDList on a file in the "Documents" library folder fails, even though items in that folder are ultimately regular filesystem items.

From a few quick tests, it appears that calling IShellFolder2::GetDetailsEx to retrieve the necessary properties takes about the same time as SHGetDataFromIDList.
I wasn't aware of SHGetDataFromIDList's shortcomings with the Documents folder. I have to admit I never used them... I guess in this case either using IShellFolder2::GetDetailsEx or conditionally using FindFirstFile just for folders where SHGetDataFromIDList is known to give bad data seems the best way forward.
David Erceg wrote: Sat Dec 05, 2020 10:12 am I believe I actually used to use separate enumeration paths for virtual folders and filesystem folders, back when the IShellFolder::EnumObjects code was added. But it's definitely not something I want to return to. I'm also not sure it would be any faster.
Oh, first browsing is faster all right. These are some benchmark results of my (outdated) fork opening a folder with 20000 files and populating the listview:

(Local SSD / Win7 / Spectre mitigations disabled)
FindFirstFile+FindNextFile loop: 180 ms
EnumObjects loop+SHGetDataFromIDList: 367 ms

(Local SSD / Win10 v1809 / Spectre mitigations disabled)
FindFirstFile+FindNextFile loop: 355 ms
EnumObjects loop+SHGetDataFromIDList: 744 ms

(Local SSD / Win10 v1809 / Spectre mitigations enabled)
FindFirstFile+FindNextFile loop: 682 ms
EnumObjects loop+SHGetDataFromIDList: not tested

(NAS folder / Win10 v1809 / Spectre mitigations disabled)
FindFirstFile+FindNextFile loop: 1534 ms
EnumObjects loop+SHGetDataFromIDList: 2014 ms

By the way, the official versions of Exp++ take ~23 seconds to open that NAS folder, just from calling FindFirstFile for each single file.

Of course, using FindFirstFile+FindNextFile has the one big disadvantage that, if you later want the items' pridls (like for building the right-click menu), they will have to be retrieved with GetIdlFromParsingName(), which is about as slow as you can imagine. Pretty much as bad as calling FindFirstFile for each single file.
David Erceg wrote: Sat Dec 05, 2020 5:36 pm After investigating a bit further, it looks like the reason why IShellFolder2::GetDetailsEx has the same performance as SHGetDataFromIDList is that they both retrieve data cached within the PIDL. The behavior of SHGetDataFromIDList is discussed at https://www.zabkat.com/2xExplorer/shell ... lore2.html. From testing IShellFolder2::GetDetailsEx, it definitely appears to retrieve cached information.

If a file is changed after the PIDL is obtained, but before the necessary information is retrieved, the information will be stale. Also, changing the memory contents of the PIDL affects the values that are returned by IShellFolder2::GetDetailsEx.
Well, yes. If we retrieved WIN32_FIND_DATA for a file and it changed afterwards, our find data would be stale as well, no?
David Erceg
Site Admin
Posts: 933
Joined: Sat Apr 18, 2009 1:46 am

Re: 1.3 or 1.4 - Very slow remote navigation on large amount of elements

Post by David Erceg »

Oh, first browsing is faster all right.
That's good to know, at least. It's a bit frustrating that the best performance would come from having a split implementation, when a unified implementation is simpler. Personally, I think there would be much greater gains made from performing the enumeration on a background thread. It wouldn't directly improve performance, but it would prevent the application from hanging during an extended navigation. Showing partial results, as Explorer tends to do, might also help perceived performance.
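
As a rough illustration of that idea (not the Explorer++ implementation), the sketch below runs an enumeration on a worker thread and hands items to the UI side in batches, so partial results can be shown while the rest is still arriving. Everything in it is hypothetical: PendingItems, NextItemFn and enumerate_in_background() are made-up names, and the "enumerator" is just a callback that stands in for whatever actually walks the folder.

```cpp
#include <cstddef>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Thread-safe hand-off point between the worker and the UI thread.
class PendingItems {
public:
    // Called by the worker whenever a batch of items is ready.
    void push_batch(std::vector<std::string> batch) {
        std::lock_guard<std::mutex> lock(mutex_);
        for (std::string& item : batch) items_.push_back(std::move(item));
    }

    // Called by the UI thread (for example from a timer or a posted
    // message) to take everything that has arrived so far.
    std::vector<std::string> drain() {
        std::lock_guard<std::mutex> lock(mutex_);
        std::vector<std::string> out;
        out.swap(items_);
        return out;
    }

private:
    std::mutex mutex_;
    std::vector<std::string> items_;
};

// Hypothetical enumerator: fills `out` and returns true, or returns
// false when there are no more items.
using NextItemFn = std::function<bool(std::string&)>;

// Runs the enumeration off the UI thread and publishes items in batches
// of `batch_size`. The caller owns the returned thread and must join it.
std::thread enumerate_in_background(NextItemFn next, PendingItems& pending,
                                    std::size_t batch_size) {
    return std::thread([next = std::move(next), &pending, batch_size]() {
        std::vector<std::string> batch;
        std::string item;
        while (next(item)) {
            batch.push_back(std::move(item));
            item.clear();  // put the moved-from string into a known state
            if (batch.size() >= batch_size) {
                pending.push_batch(std::move(batch));
                batch.clear();  // reuse the moved-from vector as empty
            }
        }
        if (!batch.empty()) pending.push_batch(std::move(batch));
    });
}
```

The total enumeration time is unchanged; the UI thread just stays responsive and can display whatever drain() returns in the meantime. Sorting, cancellation when the user navigates away, and marshalling back to the UI thread are left out on purpose.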
Well, yes. If we retrieved WIN32_FIND_DATA for a file and it changed afterwards, our find data would be stale as well, no?
It's more of a surprise to me that the information is stored within the pidl. I've always seen it as a black box, which it essentially is from the application's perspective. And while there is mention of the behavior on the zabkat.com page, it's not something I think I've seen mentioned in the official documentation, perhaps because it's an implementation detail. Whereas the WIN32_FIND_DATA struct is publicly documented.
maraskan_user
Posts: 4
Joined: Wed Sep 28, 2011 8:21 am

Re: 1.3 or 1.4 - Very slow remote navigation on large amount of elements

Post by maraskan_user »

David Erceg wrote: Sun Dec 06, 2020 12:56 pm That's good to know, at least. It's a bit frustrating that the best performance would come from having a split implementation, when a unified implementation is simpler.
Indeed. That's why I moved my fork back to using just SHGetDataFromIDList. But if NAS browsing speed was still as important to me as it was a couple of years ago, I think I'd still go with a split approach and maybe try to retrieve the pridls in a background thread, if I could get it working.
David Erceg wrote: Sun Dec 06, 2020 12:56 pm Personally, I think there would be much greater gains made from performing the enumeration on a background thread.
One of these days, if I ever find the time, I will have to export my mods and reimplement them in a copy of the current master. My fork was from around 2011, and while I adopted selected bug fixes, you did some great work moving stuff to background threads over the last couple of years, and I'd really like to be based on the present master.
David Erceg wrote: Sun Dec 06, 2020 12:56 pm It wouldn't directly improve performance, but it would prevent the application from hanging during an extended navigation. Showing partial results, as Explorer tends to do, might also help perceived performance.
While I'm not exactly a proponent of perceived performance, in some cases it does help. Microsoft tends to design its products for the widest possible range of use cases, so their version of a file browser tries to handle extreme cases, like accessing a file server on the other side of the planet with 1 million items in a folder, in a way that things don't totally break down. It will at least show something, so the user might notice that he's not even in the right folder and move to another one. I always found it delightful to read Raymond Chen's blog and realize how many compromises they had to make creating something that mostly works for everyone, even if they ended up with something that didn't work perfectly for anyone...