One of the tasks that less experienced programmers find really difficult is to sort multidimensional arrays by one of the values in the child arrays. In today's tutorial I'm going to show you 3 ways of achieving that - you decide which ones seems easier for you and provides the best performance.
Let's assume a scenario where we have a huge array like the one below and we need to sort it by a value contained in the last child array - in our case the value of the 'weight' key.
Sorting method 1 - Using usort
and a simple callback function (Recommended).
This is a really simple method - usort automatically loops through each combination of two elements of the array, compares whatever you want it to compare in the callback function, and by returning -1 or 1 respectively, it assigns the appropriate index to the given element in the resulting array. In our case, if the 'weight' in
$a
is smaller or equal than the 'weight' in $b
, the content of $a
will end up having a smaller index in the array to be ordered. Here's the code:The bubble method - or doing it the old school way
The bubble method is something people used to learn in computer science back in the 90s. It is known to be a slow method of reordering as it needs to loop through the array multiple times.
The bubble method checks each set of adjacent values and if the required condition is met it swaps their positions. This action is then performed again and again until no reordering is necessary in a full array traversal. In this example the array is traversed until all weights found at key $j are smaller than the weights found at $j+1. Have a look:
The do it yourself method
Some programmers simply prefer writing their own code for such tasks so I thought I would give you an example of custom code for achieving the same result.
The way I designed it is pretty straightforward:
First create a temporary empty array, then loop through the array that needs to be ordered and replace its existing keys with the values of what you are ordering by. Please note that you also need to make sure that the new keys are unique so you don't lose any elements. In my case - I decided to use the old keys and append them to the new ones so the resulting keys are something like: "new_key" . "some_string" . "old_unique_key". Then I sort the temporary array by key using the built in
ksort()
function and use array values to re-index the array numerically. Here's the code:
Please note that the code below will not work correctly in some cases. For example, if one 'weight' would be more than or equal to 100, then the
ksort
function would fail to give the right results. You can use ksort($temp, SORT_NATURAL)
in such a case.
Update: January 29th 2017 - The array_multisort()
way...
Another option for sorting multi-dimensional arrays is to use a built in function called
array_multisort()
. While it can also be used to sort several independent arrays at once, its real power comes into play because it provides an off the shelf method to sort a multi-dimensional array by one or more "child" dimensions.
Combining
array_multisort()
with array_map()
and its callback for obtaining an array of weights, allows for this complex sorting to happen in only one call.
Because this last method doesn't look as easy as the others... I will explain what it does:
array_map
applies a callback function on each element of the input, thus generating a numerically indexed array of weights.array_multisort
reorders the array of weights and it also applies it to the input array.
Update: February 1st 2018 - Which one is best ?
Many of you have asked me which one of these methods is best. This led me to writing a quick and dirty performance and memory usage test with different data sizes. This should give you an idea on what to use depending on your server specs and array size.
The test code can be found on my git repo as a snippet.
Scenario 1: count($array)=1000
& memory_limit=128M
As expected, the algorithm complexity doesn't really matter when dealing with small amounts of data. Both execution time and memory usage are very low. Even so, we can already observe that the
bubble-sort method (Method 2)
is the slowest. This led me to introduce a time limit of 20 seconds after which, the script will just bail out of trying to resolve the sorting using that method. Another early conclusion is that Method 3
will use more memory because of the necessity to build the temporary array which is used for sorting.
Scenario 2: count($array)=10.000
& memory_limit=128M
Stepping up the array size to 10.000 elements only confirms the initial findings. The
bubble sort
method has already failed whereas method 3
in which we assign the sorting column as the key of a temporary array seems the fastest but uses the most memory.
Scenario 3: count($array)=100.000
& memory_limit=128M
Pushing our test even further but keeping the maximum allocated memory to 128M makes the test fail which was to be expected so from this point on I decided to just give the script as much memory as it needed so I set
memory_limit
at 2GB.
Scenario 4: count($array)=100.000
& memory_limit=2G
This is where things start to get really interesting because the
usort() (method1)
method and the array_map()/array_multisort() (method4)
combination are on par when it comes to memory usage but it looks like method4
is a bit faster than the first. Again, while using a bit more memory, the temporary array path seems to be the way to go.
Scenario 5: count($array)=1.000.000
& memory_limit=2G
Method 3
(the technique in which we create a temporary array holding the values to be sorted) is the fastest but uses an average of4%
more memory.Method 4
(the use of array_multisort() in combination with array_map()) is the second fastest in fact almost twice as fast as theusort()
method, while using the same amount of memory.Method 1
(usort() + callback function) is slow when it comes to dealing with large amounts of data on this scenario.Method 2
(the bubble method) is clearly useful only when dealing with very small amounts of data as it has a worst case complexity ofО(n2)
.
Because the test doesn't care about weights being unique, nor does it care about what happens when 2 weights are equal, in some cases the resulting arrays will not be equal.
Hope you liked today's tutorial.Thanks!
0 comments:
Post a Comment