Tuesday 17 July 2018

PHP Benchmark – sha1_file() vs sha1(file_get_contents())

While working on a recent project, I needed to produce a SHA1 hash of a file to allow comparisons between new and existing files. I knew I could easily create a hash of a file using PHP’s sha1_file() function, but I was unsure of its performance and speed, bearing in mind that this process would need to be run hundreds of times a day.
I’d seen other developers on forums mention that the same result could be achieved using a combination of the sha1() and file_get_contents() functions, but there was no mention of whether it would be quicker, slower, or whether it was essentially the same thing under the hood. As a result I decided to pit the two methods against each other to see how each performed.
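Before timing anything, it’s easy to confirm that the two approaches really do produce an identical digest. A minimal sketch (the file name and contents are my own examples, not from the project):

```php
<?php
// Minimal check that the two approaches yield identical digests.
// The file name and contents here are hypothetical examples.
$file = 'example.txt';
file_put_contents($file, 'hello world');

$a = sha1_file($file);               // hash the file directly
$b = sha1(file_get_contents($file)); // read into memory, then hash

var_dump($a === $b); // bool(true)
unlink($file);
```

Both return the same 40-character hexadecimal digest, so the only question left is which one gets there faster.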
Let The Tests Begin…
My testing script looked a little like the following:

  // $file holds the path to the file being hashed
  echo microtime(true)."\n";

  for ($i = 0; $i < 1000; $i++) {
      $output = sha1_file($file);
  }

  echo microtime(true)."\n";

  for ($i = 0; $i < 1000; $i++) {
      $output = sha1(file_get_contents($file));
  }

  echo microtime(true);
I ran the script for two different files located in the same directory as the script: one that was 213 KB and a slightly larger one at 1.52 MB. The script was run 5 times for each file, and the average time differences were taken at the end.
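The averaging step above was done by hand; it could also be automated with a small harness. This is my own sketch, not part of the original script (benchmark(), the sample file, and its size are assumptions; the arrow-function syntax needs PHP 7.4+):

```php
<?php
// Sketch of a harness that times a callable over many iterations.
// benchmark() and the sample file below are hypothetical examples.
function benchmark(callable $fn, int $iterations = 1000): float {
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return microtime(true) - $start; // elapsed seconds for all iterations
}

$file = 'sample.dat';
file_put_contents($file, random_bytes(64 * 1024)); // 64 KB test file

printf("sha1_file(): %.5f s\n",
    benchmark(fn() => sha1_file($file)));
printf("sha1(file_get_contents()): %.5f s\n",
    benchmark(fn() => sha1(file_get_contents($file))));

unlink($file);
```

Running each candidate through the same function keeps the loop overhead identical for both, which makes the comparison a little fairer than two hand-written loops.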
The Results
File 1 - 213 KB
sha1_file() average time - 1.25498 seconds
sha1(file_get_contents()) average time - 1.21908 seconds
File 2 - 1.52 MB
sha1_file() average time - 8.4037 seconds
sha1(file_get_contents()) average time - 8.89108 seconds
The Conclusion
As you can see from the results above, the relative speed of the two methods differs based on the size of the file being hashed. For the smaller file the sha1(file_get_contents()) method was faster, but for the larger file sha1_file() came out ahead. Remember that these tests ran a thousand iterations each, so the per-call difference between the two is minimal; however, I feel it does become an important factor when you'll be repeating this process thousands of times, over and over again.
It's also worth noting that this wasn't a fluke. I repeated the same tests three more times with the same files and the outcome was the same.
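One thing the timings alone don't capture is memory: sha1(file_get_contents()) reads the entire file into memory before hashing, while sha1_file() streams it in chunks. For very large files that streaming behaviour can be reproduced explicitly with PHP's hash extension; a sketch (the file name and size are my own examples):

```php
<?php
// Sketch: streaming a file through the hash extension instead of
// loading it all into memory. The sample file is hypothetical.
$file = 'large.dat';
file_put_contents($file, random_bytes(64 * 1024)); // 64 KB sample

$ctx = hash_init('sha1');
hash_update_file($ctx, $file);  // reads the file incrementally
$streamed = hash_final($ctx);

// hash_file() is the one-call equivalent of the steps above:
var_dump($streamed === hash_file('sha1', $file)); // same digest
var_dump($streamed === sha1_file($file));         // same digest again

unlink($file);
```

All three routes produce the same digest, so for huge files the choice comes down to memory footprint rather than correctness.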
