Wednesday, 1 October 2014

Generators in PHP

If you’ve followed my previous posts about iterators then you’ll know that iteration is an important programming concept, but implementing the required interfaces to create an iterable object can be a hassle at best because of the amount of boilerplate code that is required. With the release of PHP 5.5, we finally have generators!
In this article we’ll take a look at generators which provide an easy way to implement simple iterators without the overhead or complexity of the Iterator interface.

How do Generators Work?

According to Wikipedia, a generator “is very similar to a function that returns an array, in that a generator has parameters, can be called, and generates a sequence of values”. A generator is basically a normal function, but instead of returning a value it yields as many values as it needs to. It looks like a function but acts like an iterator.
Generators use the yield keyword instead of return. It acts similar to return in that it returns a value to the caller of the function, but instead of removing the function from the stack, yield saves its state. This allows the function to continue from where it was when it’s called again. In fact, you cannot return a value from a generator although you can use return without a value to terminate its execution.
The PHP manual states: “When a generator function is called, it returns an object that can be iterated over.” This is an object of the internal Generator class and implements the Iterator interface in the same way a forward-only iterator object does. As you iterate over that object, PHP calls the generator each time it needs a value. The state is saved when the generator yields so that it can be resumed when the next value is required.











<?php
function nums() {
    echo "The generator has startedn";
    for ($i = 0; $i < 5; ++$i) {
        yield $i;
        echo "Yielded $in";
    }
    echo "The generator has endedn";
}
foreach (nums() as $v);
The output of the above code will be:
The generator has started
Yielded 0
Yielded 1
Yielded 2
Yielded 3
Yielded 4
The generator has ended

Our First Generator

Generators are not a new concept and already exist in languages such as C#, Python, JavaScript, and Ruby (enumerators), and are usually identified by their use of the yield keyword. The following is an example in Python:








def file_lines(filename):
    file = open(filename)
    for line in file:
        yield line
    file.close()
for line in file_lines('somefile'):
    #do some work here
Let’s rewrite the example Python generator in PHP. (Note that both snippets do not perform any sort of error checking.)












<?php
function file_lines($filename) {
    $file = fopen($filename, 'r');
    while (($line = fgets($file)) !== false) {
        yield $line;
    }
    fclose($file);
}
foreach (file_lines('somefile') as $line) {
    // do some work here
}
The generator function opens a file and then yields each line of the file as and when it is required. Each time the generator is called, it continues from where it left off. It doesn’t start from the beginning again as its state had been saved when the yield statement was executed. Once all lines have been read, the generator simply terminates and the loop ends.

Returning Keys

PHP iterators consist of key/value pairs. In our example, only a value was returned and therefore the keys were numeric (keys are numeric by default). If you wish to return an associative pair, simply change the yield statement to include the key using array syntax.








<?php
function file_lines($filename) {
    ...
        yield $key => $line;
    ...
}
foreach (file_lines('somefile') as $key => $line) {
    // do some work here
}

Injecting Values

yield does not only return values; it can receive values from the outside as well. This is done by calling the send() method of the generator object with the value you wish to pass. This value can then be used in computing or doing other stuff. The method sends the value to the generator as a result of the yield expression and resumes execution.



















<?php
function nums() {
    for ($i = 0; $i < 5; ++$i) {
        // get a value from the caller
        $cmd = (yield $i);
        if ($cmd == 'stop') {
            return; // exit the generator
        }
    }
}
$gen = nums();
foreach ($gen as $v) {
    // we are satisfied
    if ($v == 3) {
        $gen->send('stop');
    }
    echo "{$v}n";
}
The output will be:
0
1
2
3

Saving Memory with Generators

Generators are great for when you are calculating large sets and you don’t want to allocate memory for all of the results at the same time or when you don’t know if you will need all of the results, Due to the way results are processed, the memory footprint can be reduced to a very bare minimum by allocating memory for only the current result.
Imagine the file() function which returns all of the lines in a file as an array. Running a simple benchmark for file() and our demo file_lines() functions, each using the same random 100 paragraph text file generated using Lipsum, showed the file function used up to 110 times more memory than the generator.










<?php
// Test 1
$m = memory_get_peak_usage();
foreach (file_lines('lipsum.txt') as $l);
echo memory_get_peak_usage() - $m, "n"; //Outputs 7336
// Test 2
$m = memory_get_peak_usage();
foreach (file('lipsum.txt') as $l);
echo memory_get_peak_usage() - $m, "n"; // Outputs 148112

Conclusion

With the introduction of Generators, PHP has placed a powerful tool into the hands of developers. We can now write iterators rapidly while saving a lot of memory in the process. With this tutorial, I hope you have gained enough to start using them yourself in your projects. I for one have quite a few objects in mind that I am going to rewrite. If you have any ideas or comments, drop them.


What's the difference to normal functions?

Now you might wonder why we are not simply using PHP's native range function to achieve that output. And right you are. The output would be the same. The difference is how we got there.
When we use range PHP, will execute it, create the entire array of numbers in memory and return that entire array to the foreach loop which will then go over it and output the values. In other words, the foreach will operate on the array itself. The range function and the foreach only "talk" once. Think of it like getting a package in the mail. The delivery guy will hand you the package and leave. And then you unwrap the entire package, taking out whatever is in there.
When we use the generator function, PHP will step into the function and execute it until it either meets the end or a yield keyword. When it meets a yield, it will then return whatever is the value at that time to the outer loop. Then it goes back into the generator function and continues from where it yielded. Since your xrange holds a for loop, it will execute and yield until $max was reached. Think of it like the foreach and the generator playing ping pong.

Why do I need that?

Obviously, generators can be used to work around memory limits. Depending on your environment, doing a range(1, 1000000) will fatal your script whereas the same with a generator will just work fine. Or as Wikipedia puts it:
Because generators compute their yielded values only on demand, they are useful for representing sequences that would be expensive or impossible to compute at once. These include e.g. infinite sequences and live data streams.
Generators are also supposed to be pretty fast. But keep in mind that when we are talking about fast, we are usually talking in very small numbers. So before you now run off and change all your code to use generators, do a benchmark to see where it makes sense.

Since when can I use yield?

Generators have been introduced in PHP 5.5. Trying to use yield before that version will result in various parse errors, depending on the code that follows the keyword. So if you get a parse error from that code, update your PHP.

So, they are useful when:
  • you need to do things simple (or simple things);
    generator is really much simplier then implementing the Iterator interface. other hand is, ofcource, that generators are less functional. compare them.
  • you need to generate BIG amounts of data - saving memory;
    actually to save memory we can just generate needed data via functions for every loop iteration, and after iteration utilize garbage. so here main points is - clear code and probably performance. see what is better for your needs.
  • you need to generate sequence, which depends on intermediate values;
    this is extending of the previous thought. generators can make things easier in comparison with functions. check Fibonacci example, and try to make sequence without generator. Also generators can work faster is this case, at least because of storing intermediate values in local variables;
  • you need to improve performance.

0 comments:

Post a Comment