Wednesday 11 July 2018

PDO and MSSQL

PDO and MSSQL

When you write your first web application, chances are you’re going to query a database. When you write it in PHP, chances are it’ll look like this:
$mysqli = new mysqli("example.com", "user", "password", "database");
$result = $mysqli->query("SELECT * FROM product");
$row = $result->fetch_assoc();
Before long, you have to start handling user input, which means escaping:
$mysqli = new mysqli("example.com", "user", "password", "database");
$result = $mysqli->query("SELECT * FROM product WHERE name = " . mysqli_real_escape_string($mysqli, $product_name));
$row = $result->fetch_assoc();
As your application grows, you start writing code like this a lot. You may start encapsulating it in a DAO, but they do little besides erect walls around this chimeric code. “Okay,” you say. “This is fine, because it’s only me. I’m a Responsible Engineer and I don’t have to sugar-coat things for myself.” But soon, this project is going gangbusters. You’ve got a team, and then a large one, and now there’s no rug large enough under which you can hide this mess. And woe unto you should you decide you need connection pooling or any other resource management.
One solution to this problem is an ORM. But, some people prefer having their database interactions more “managed” than “abstracted away.” Instead your code could look more like this:
$pdo = new PDO("mysql:host=example.com;dbname=database", "user", "password");
$statement = $pdo->prepare("SELECT * FROM product WHERE name = :name");
$statement->bindParam(":name", $product_name);
$statement->execute();
$row = $statement->fetch(PDO::FETCH_ASSOC);
A little more verbose, yes, but also easier to read and less error-prone. This is PDO. It’s a PHP extension that provides a vendor-agnostic interface to various relational databases. It pairs a well-structured API for performing queries with a series of different database drivers.
When Wayfair began adopting PDO, our database access was relatively managed. An in-house library managed connections over the course of a request, but building queries involved a whole lot of string concatenation. Complex queries would get unwieldy. Engineers with prior PDO experience wanted to know why we weren’t using it. However, to convince engineers new to PDO that it would make their lives easier, it had be as low friction as the existing library and produce output in the same format.
Simplifying PDO syntax was the easy part. Technically, the example given is shy on error handling. The PDO constructor can throw exceptions. Related functions return a boolean value, indicating whether they succeeded. So a “correct” PDO example would look like this:
$pdo = null;
try {
  $pdo = new PDO("mysql:host=example.com;dbname=database", "user", "password");
} catch (Exception $e) {
  // logging, etc., if you want to note when you were unable to get a connection
}

$statement = false;  // PDO::prepare() will return false if there’s an error
if ($pdo) {
  $statement = $pdo->prepare("SELECT * FROM product WHERE name = :name");
}

$row = null;
if ($statement) {
  $statement->bindParam(":name", $product_name);

  if ($statement->execute()) {
    $row = $statement->fetch(PDO::FETCH_ASSOC);
  }
}

// now, do something with $row
Awesome, I know, right? Sure, the PDO API is, on the whole, “nicer,” but no one’s going to want to deal with it if they’re forced to jump through these kinds of hoops. And who could blame you? At Wayfair, we place a lot of value on developer ergonomics. These are problems we strive to solve well when rolling out new internal tools. We landed on a slight extension to PDO that would yield this syntax:
$statement = PDO::new_statement("PT", "SELECT * FROM product WHERE name = :name"); // the first argument refers to the desired host/database
$statement->bindParam(":name", $product_name);
$statement->execute();
$row = $statement->fetch(); // PDO::FETCH_ASSOC is now the default fetch style
We pulled all the boilerplate into a factory function. It does the necessary error handling and reporting. If everything succeeds, it’ll return a standard-issue PDO statement object. If there are errors, it will return a special object which acts like a statement that’s failed, but will return empty result sets if asked. We felt comfortable that this would remove most of the friction around using PDO while preserving the underlying interface. Anyone who wants finer-grained control can still utilize the stock API.
The trickier problem was “make output the same.” While PDO looks the same with each driver, the drivers don’t necessarily behave the same. The documentation isn’t always clear about these differences. We needed to do a fair amount of testing and source code reading to suss out the effects.
While my examples have used MySQL, Wayfair is an MSSQL shop. We had been using the mssql extension. It uses a C API called DBLIB to talk to the server. Microsoft doesn’t maintain an open source version. FreeTDS is the commonly-used free implementation. One of the PDO drivers also uses DBLIB, but it returns column data differently. Instead of returning strings as strings and ints as ints, the PDO DBLIB driver returns everything as a string. We had to patch it to use the expected data types. To be able to differentiate between quoting strings as VARCHAR vs. NVARCHAR, we also added a parameter type. We also added support for the setting connection timeouts (PDO defines a PDO::ATTR_TIMEOUT constant, but it has no effect with the DBLIB driver).
Another reason we were first attracted to PDO was for prepared statements. Since MSSQL supports them, it seemed like this could be an opportunity for a performance gain. However, after digging into the driver internals, we found that the DBLIB driver only emulates them. Microsoft has an ODBC driver for Linux. We tested it in conjunction with PDO’s ODBC driver, but found the two to be incompatible. We were able to get it working with the plain odbc extension, but (amazingly) found prepared statements to be slower than regular queries. Since using prepared statements would’ve necessitated a nontrivial change in coding style, we decided against investigating the speed difference.
We’re currently working on deploying SQL Relay. Preliminary tests have proven out that it reduces network load without adding much overhead. It has a PDO driver, so we’ll be able to swap it into our stack without having to change how queries are made.

0 comments:

Post a Comment