Thursday, 30 August 2018

PHP foreach loop returns an additional unwanted array (Wikipedia API)

I've been researching this all day and haven't found a solution. I'm also very new to PHP.

The purpose of my function is to take a Wikipedia article title from user input (Category1) and return that article's categories. The basic function below does this without any problems.
function get_all_categories ( ) {

        $url = $this->get_url ( 'categories' ) ;
        $url .= 'titles='.urlencode($_POST['Category1']);
        $url .= '&cllimit=500' ;
        $data = $this->get_result ( $url ) ;

        $array = json_decode($data, true);
}

Example result for Urban planning:
Array
(
[batchcomplete] =>
[query] => Array
    (
        [pages] => Array
            (
                [46212943] => Array
                    (
                        [pageid] => 46212943
                        [ns] => 0
                        [title] => Urban planning
                        [categories] => Array
                            (
                                [0] => Array
                                    (
                                        [ns] => 14
                                        [title] => Category:All Wikipedia articles written in American English
                                    )

                                [1] => Array
                                    (
                                        [ns] => 14
                                        [title] => Category:Commons category with local link same as on Wikidata
                                    )

                                [2] => Array
                                    (
                                        [ns] => 14
                                        [title] => Category:Pages using ISBN magic links
                                    )

                                [3] => Array
                                    (
                                        [ns] => 14
                                        [title] => Category:Urban planning
                                    )

                                [4] => Array
                                    (
                                        [ns] => 14
                                        [title] => Category:Use American English from April 2015
                                    )

                                [5] => Array
                                    (
                                        [ns] => 14
                                        [title] => Category:Use dmy dates from April 2015
                                    )

                                [6] => Array
                                    (
                                        [ns] => 14
                                        [title] => Category:Wikipedia articles needing clarification from June 2015
                                    )

                                [7] => Array
                                    (
                                        [ns] => 14
                                        [title] => Category:Wikipedia articles with GND identifiers
                                    )

                            )

                    )

            )

    )

)

My problem begins when I try to extract only the title values from this array. I've attempted to do this with a foreach loop over a RecursiveIteratorIterator, which is the easiest solution I found for multidimensional arrays:
$array1 = new RecursiveIteratorIterator(
    new RecursiveArrayIterator($array),
    RecursiveIteratorIterator::SELF_FIRST);

foreach ($array1 as $key => $value) {
    if (is_array($value) && $key == 'categories') {
        $result = array_map(function($element){return $element['title'];}, $value);

        print_r($result);
    }
}

What I get with this code are two arrays: one with only the titles (what I wanted), followed by an unwanted array (which sometimes includes the first character of the first title):
Array
(
[0] => Category:All Wikipedia articles written in American English
[1] => Category:Commons category with local link same as on Wikidata
[2] => Category:Pages using ISBN magic links
[3] => Category:Urban planning
[4] => Category:Use American English from April 2015
[5] => Category:Use dmy dates from April 2015
[6] => Category:Wikipedia articles needing clarification from June 2015
[7] => Category:Wikipedia articles with GND identifiers
)
Array
(
[ns] =>
[title] => C
)

This extra array is what I don't understand. I think the problem is caused by the foreach loop. I tried unsetting $variable outside of the loop but it didn't help. The extra array becomes especially troublesome if I try to pass these results to another function. How can I prevent this from happening?

For simplicity, you can traverse the array manually rather than using RecursiveIteratorIterator.
RecursiveIteratorIterator visits every node at every level, which adds needless overhead on large arrays.
Change your extraction logic to this:
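$result = array();
// 'query' sits next to 'batchcomplete' at the top level, not inside it.
foreach($array['query']['pages'] as $k => $v)
{
    // A page entry may have no 'categories' key (e.g. a missing page).
    if (!isset($v['categories'])) {
        continue;
    }
    foreach($v['categories'] as $cat)
    {
        $result[] = $cat['title'];
    }
}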
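
As for where the extra array comes from: a likely cause, on PHP versions before 8, is the loose comparison $key == 'categories' in the question's loop. With SELF_FIRST the iterator also visits each individual category entry, whose keys are the integers 0, 1, 2 and so on, and 0 == 'categories' evaluates to true there because the string is converted to the number 0. The entry at key 0 (an array with ns and title) is then passed to array_map, where indexing the integer 14 yields nothing and indexing the title string yields its first character, 'C'. If you would rather keep the iterator, a minimal sketch with a strict comparison could look like the following. The sample data is a trimmed copy of the response shown above, and the names $iterator and $titles are only illustrative.

```php
<?php
// Trimmed sample shaped like the decoded API response in the question.
$array = [
    'batchcomplete' => '',
    'query' => [
        'pages' => [
            46212943 => [
                'pageid' => 46212943,
                'ns' => 0,
                'title' => 'Urban planning',
                'categories' => [
                    ['ns' => 14, 'title' => 'Category:Pages using ISBN magic links'],
                    ['ns' => 14, 'title' => 'Category:Urban planning'],
                ],
            ],
        ],
    ],
];

$iterator = new RecursiveIteratorIterator(
    new RecursiveArrayIterator($array),
    RecursiveIteratorIterator::SELF_FIRST
);

$titles = [];
foreach ($iterator as $key => $value) {
    // Strict comparison: the integer key 0 can no longer match 'categories'.
    if ($key === 'categories' && is_array($value)) {
        // array_column() pulls the 'title' field out of each category entry.
        $titles = array_merge($titles, array_column($value, 'title'));
    }
}

print_r($titles);
```

The nested foreach shown earlier is still the simpler option when the structure of the response is known; the strict comparison is mainly useful if the 'categories' key can appear at varying depths.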
