Tuesday 14 August 2018

3 Solutions For Sorting Multidimensional Arrays By Child Keys Or Values In PHP

One of the tasks that less experienced programmers find really difficult is to sort multidimensional arrays by one of the values in the child arrays. In today's tutorial I'm going to show you 3 ways of achieving that - you decide which ones seems easier for you and provides the best performance.
Let's assume a scenario where we have a huge array like the one below and we need to sort it by a value contained in the last child array - in our case the value of the 'weight' key.
  
  <?php
  $array = [
              [
                ['name'=>'John B'],
                ['age'=>30],
                ['sizes'=>
                          [
                          'weight'=>80, 
                          'height'=>120
                          ]
                ]
              ],
              [
                ['name'=>'Marie B'],
                ['age'=>31],
                ['sizes'=>
                          [
                          'weight'=>60, 
                          'height'=>110
                          ]
                ]
              ],
              [
                ['name'=>'Carl M'],
                ['age'=>12],
                ['sizes'=>
                          [
                          'weight'=>70, 
                          'height'=>100
                          ]
                ]
              ],
              [
                ['name'=>'Mike N'],
                ['age'=>19],
                ['sizes'=>
                          [
                          'weight'=>70, 
                          'height'=>150
                          ]
                ]
              ],
              [
                ['name'=>'Nancy N'],
                ['age'=>15],
                ['sizes'=>
                          [
                          'weight'=>60, 
                          'height'=>150
                          ]
                ]
              ],
              [
                ['name'=>'Cory X'],
                ['age'=>15],
                ['sizes'=>
                          [
                          'weight'=>44, 
                          'height'=>150
                          ]
                ]
              ]
  ];
  ?>
  

Sorting method 1 - Using usort and a simple callback function (Recommended).

This is a really simple method - usort automatically loops through each combination of two elements of the array, compares whatever you want it to compare in the callback function, and by returning -1 or 1 respectively, it assigns the appropriate index to the given element in the resulting array. In our case, if the 'weight' in $a is smaller or equal than the 'weight' in $b, the content of $a will end up having a smaller index in the array to be ordered. Here's the code:

  <?php
  //Method1: sorting the array using the usort function and a "callback that you define"
  function method1($a,$b) 
  {
    return ($a[2]["sizes"]["weight"] <= $b[2]["sizes"]["weight"]) ? -1 : 1;
  }
  usort($array, "method1");
  print_r($array);
  ?>

The bubble method - or doing it the old school way

The bubble method is something people used to learn in computer science back in the 90s. It is known to be a slow method of reordering as it needs to loop through the array multiple times.
The bubble method checks each set of adjacent values and if the required condition is met it swaps their positions. This action is then performed again and again until no reordering is necessary in a full array traversal. In this example the array is traversed until all weights found at key $j are smaller than the weights found at $j+1. Have a look:

<?php
//Method 2: The bubble method
$j=0;
$flag = true;
$temp=0;

while ( $flag )
{
  $flag = false;
  for( $j=0;  $j < count($array)-1; $j++)
  {
    if ( $array[$j][2]["sizes"]["weight"] > $array[$j+1][2]["sizes"]["weight"] )
    {
      $temp = $array[$j];
      //swap the two between each other
      $array[$j] = $array[$j+1];
      $array[$j+1]=$temp;
      $flag = true; //show that a swap occurred
    }
  }
}
print_r($array);
?>

The do it yourself method

Some programmers simply prefer writing their own code for such tasks so I thought I would give you an example of custom code for achieving the same result.
The way I designed it is pretty straightforward:
First create a temporary empty array, then loop through the array that needs to be ordered and replace its existing keys with the values of what you are ordering by. Please note that you also need to make sure that the new keys are unique so you don't lose any elements. In my case - I decided to use the old keys and append them to the new ones so the resulting keys are something like: "new_key" . "some_string" . "old_unique_key". Then I sort the temporary array by key using the built in ksort() function and use array values to re-index the array numerically. Here's the code:
Please note that the code below will not work correctly in some cases. For example, if one 'weight' would be more than or equal to 100, then the ksort function would fail to give the right results. You can use ksort($temp, SORT_NATURAL) in such a case.

<?php
  //Method3: DIY 
  $temp = [];
  foreach ($array as $key => $value)
    $temp[$value[2]["sizes"]["weight"] . "oldkey" . $key] = $value; //concatenate something unique to make sure two equal weights don't overwrite each other
  ksort($temp); // or ksort($temp, SORT_NATURAL); see paragraph above to understand why
  $array = array_values($temp);
  unset($temp);
  print_r($array);
?>

Update: January 29th 2017 - The array_multisort() way...

Another option for sorting multi-dimensional arrays is to use a built in function called array_multisort(). While it can also be used to sort several independent arrays at once, its real power comes into play because it provides an off the shelf method to sort a multi-dimensional array by one or more "child" dimensions.
Combining array_multisort() with array_map() and its callback for obtaining an array of weights, allows for this complex sorting to happen in only one call.

  <?php
  array_multisort(array_map(function($element) {
      return $element[2]['sizes']['weight'];
  }, $array), SORT_ASC, $array);
Because this last method doesn't look as easy as the others... I will explain what it does:
  • array_map applies a callback function on each element of the input, thus generating a numerically indexed array of weights.
  • array_multisort reorders the array of weights and it also applies it to the input array.

Update: February 1st 2018 - Which one is best ?

Many of you have asked me which one of these methods is best. This led me to writing a quick and dirty performance and memory usage test with different data sizes. This should give you an idea on what to use depending on your server specs and array size.
The test code can be found on my git repo as a snippet.

Scenario 1: count($array)=1000 & memory_limit=128M

Test 1As expected, the algorithm complexity doesn't really matter when dealing with small amounts of data. Both execution time and memory usage are very low. Even so, we can already observe that the bubble-sort method (Method 2) is the slowest. This led me to introduce a time limit of 20 seconds after which, the script will just bail out of trying to resolve the sorting using that method. Another early conclusion is that Method 3 will use more memory because of the necessity to build the temporary array which is used for sorting.

Scenario 2: count($array)=10.000 & memory_limit=128M

Test 2Stepping up the array size to 10.000 elements only confirms the initial findings. The bubble sort method has already failed whereas method 3 in which we assign the sorting column as the key of a temporary array seems the fastest but uses the most memory.

Scenario 3: count($array)=100.000 & memory_limit=128M

Test 3Pushing our test even further but keeping the maximum allocated memory to 128M makes the test fail which was to be expected so from this point on I decided to just give the script as much memory as it needed so I set memory_limit at 2GB.

Scenario 4: count($array)=100.000 & memory_limit=2G

Test 4This is where things start to get really interesting because the usort() (method1) method and the array_map()/array_multisort() (method4) combination are on par when it comes to memory usage but it looks like method4 is a bit faster than the first. Again, while using a bit more memory, the temporary array path seems to be the way to go.

Scenario 5: count($array)=1.000.000 & memory_limit=2G

Test 5Pushing this to the limit provides us certainty that previous conclusions are accurate:
  1. Method 3 (the technique in which we create a temporary array holding the values to be sorted) is the fastest but uses an average of 4% more memory.
  2. Method 4 (the use of array_multisort() in combination with array_map()) is the second fastest in fact almost twice as fast as the usort() method, while using the same amount of memory.
  3. Method 1 (usort() + callback function) is slow when it comes to dealing with large amounts of data on this scenario.
  4. Method 2 (the bubble method) is clearly useful only when dealing with very small amounts of data as it has a worst case complexity of О(n2).
Because the test doesn't care about weights being unique, nor does it care about what happens when 2 weights are equal, in some cases the resulting arrays will not be equal.
Hope you liked today's tutorial.Thanks!

0 comments:

Post a Comment