Optimize removeDuplicateGradient to avoid O(n²) in the average case
Created by: nthykier
The original implementation of removeDuplicateGradient performs an O(n²) search over all gradients to find duplicates. In images with many gradients (such as MediaWiki_logo_1.svg), this becomes a significant overhead.
This patch optimizes for the average case by splitting gradients into smaller lists (called "buckets" in the code). The splitting is done by selecting some attributes to generate a key with the following properties:
- If multiple gradients have the same key, then a subset of those gradients /might/ be duplicates of each other.
- If their keys are not identical, then they cannot be duplicates.
Note that in the worst case we still hit O(n²), and it is easy to construct SVG files that deliberately trigger the quadratic runtime.
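To make the bucketing idea concrete, here is a minimal sketch (not the actual scour code): gradients are modeled as plain dicts and the key is built from a few hypothetical attributes; the real implementation works on DOM elements and also compares gradient stops when checking for true duplicates.

    from collections import defaultdict

    def bucket_key(grad):
        # Hypothetical key: attributes that must match for two gradients to
        # possibly be duplicates. Gradients with different keys can never be
        # duplicates, so they never need to be compared against each other.
        return (
            grad.get('gradientUnits'),
            grad.get('spreadMethod'),
            len(grad.get('stops', ())),
        )

    def find_duplicate_gradients(gradients):
        # Split gradients into buckets by the cheap key.
        buckets = defaultdict(list)
        for grad in gradients:
            buckets[bucket_key(grad)].append(grad)

        # Full comparison is still quadratic, but only within each bucket
        # (size k), instead of over all n gradients.
        duplicates = []
        for bucket in buckets.values():
            kept = []
            for grad in bucket:
                if any(grad == other for other in kept):
                    duplicates.append(grad)
                else:
                    kept.append(grad)
        return duplicates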
With that caveat aside, this improves the runtime performance on MediaWiki_logo_1.svg by about 25% (8m51s -> 6m40s on 5 runs).
Original:
$ time for I in $(seq 1 5) ; do
python3 -m scour.scour MediaWiki_logo_1.svg out.svg ;
done
Scour processed file "heavy.svg" in 105042 ms: 1582746/4989544 bytes new/orig -> 31.7%
Scour processed file "heavy.svg" in 103412 ms: 1582746/4989544 bytes new/orig -> 31.7%
Scour processed file "heavy.svg" in 105334 ms: 1582746/4989544 bytes new/orig -> 31.7%
Scour processed file "heavy.svg" in 107902 ms: 1582746/4989544 bytes new/orig -> 31.7%
Scour processed file "heavy.svg" in 108161 ms: 1582746/4989544 bytes new/orig -> 31.7%
8m51.855s ...
Optimized:
Scour processed file "heavy.svg" in 78162 ms: 1582746/4989544 bytes new/orig -> 31.7%
Scour processed file "heavy.svg" in 81202 ms: 1582746/4989544 bytes new/orig -> 31.7%
Scour processed file "heavy.svg" in 81554 ms: 1582746/4989544 bytes new/orig -> 31.7%
Scour processed file "heavy.svg" in 80067 ms: 1582746/4989544 bytes new/orig -> 31.7%
Scour processed file "heavy.svg" in 77267 ms: 1582746/4989544 bytes new/orig -> 31.7%
Signed-off-by: Niels Thykier <niels@thykier.net>