php-fpm, nginx, bad hoodoo magic, and the dreaded 502 gateway error on magento sites

When segmentation faults collide ….

If anything disrupts nginx’s ability to talk to php-fpm, you’ll see the 502 gateway error. Sometimes, that’s because the server guys did something. Sometimes, that’s because the dev guys did something. Sometimes, it’s really, really, really hard to tell.

If your host assures you it’s your code, and they are positive they have not updated anything on the server (PHP version, nginx version, anything at all), and more importantly, if the 502 gateway error is intermittent and always worse under load, you need to consider your code. It may function, but function so poorly it kills the server.

On Magento sites, look for long-running queries. You can get a stack trace for every slow query by enabling the debug logging built into lib/Varien/Db/Adapter/Pdo/Mysql.php.
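In Magento 1 CE those switches sit near the top of the adapter class; the stock defaults look roughly like this (property names from the shipped class, output lands in var/debug/pdo_mysql.log):

protected $_debug         = false;  // master switch for query logging
protected $_logAllQueries = false;  // log every query, not just slow ones
protected $_logQueryTime  = 0.05;   // "slow" threshold, in seconds
protected $_logCallStack  = false;  // include a stack trace with each entry
protected $_debugFile     = 'var/debug/pdo_mysql.log';

Flip $_debug and $_logCallStack to true, reload the slow page, and read the log.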

One of the biggest coding mistakes in Magento customizations is inefficient queries. Running SQL queries inside a loop, for example, can really put a drain on your server. A long-running query can consume so much memory that it crashes php-fpm, nginx’s request to the backend times out, and you get a 502 gateway error, even though the server itself looks perfectly healthy as far as you can see.
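If you would rather have the server point the finger for you, php-fpm’s slowlog will dump a PHP stack trace for any request that runs past a threshold. A minimal sketch for the pool config (the log path and the 5s threshold here are just examples):

; in the php-fpm pool config, e.g. www.conf
slowlog = /var/log/php-fpm/www-slow.log
request_slowlog_timeout = 5s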

Running SQL queries is a very expensive operation, and doing it in a loop tends to make it even worse. Instead of doing that, we should use data collections to load the models in one shot and then process the items in the collection.

Instead of:

foreach ($this->getProductIds() as $productId) {
    // one SQL query (and a full model load) per iteration
    $product = Mage::getModel('catalog/product')->load($productId);
}

Do this:

$collection = Mage::getResourceModel('catalog/product_collection')
    ->addFieldToFilter('entity_id', array('in' => $this->getProductIds()));

foreach ($collection as $product) {
    // process $product -- the whole set was loaded with one query
}

Especially be on the lookout for queries run through an adapter. Using fetchAll() to fetch and then iterate over a large result set leads to very long execution times (again, with the memory thing and the time out thing and the 502 gateway thing). The better solution is to fetch the results row by row with the fetch() method.

Assuming you declared/initialized your adapter already, instead of this:

$rowSet = $adapter->fetchAll($select);
foreach ($rowSet as $row) {
    // process $row -- but the entire result set is already in memory
}

Do this:

$query = $adapter->query($select);

while ($row = $query->fetch()) {
    // process $row -- only one row held in memory at a time
}
Adding Something to Your PATH on Mac

Let’s say you just installed MySQL on your Mac. You want to be able to fire up the daemon from the CLI.

This appends the MySQL bin directory to the rest of your PATH and stores the change in .bash_profile so you never have to do this again for this daemon.
echo 'export PATH=$PATH:/usr/local/mysql/bin' >> ~/.bash_profile

This reloads your profile to activate your path change in the current terminal.
source ~/.bash_profile
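
To confirm the change took, ask the shell where it now finds the client:
which mysql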

Bonus: change the prompt on your terminal to include your current working directory.

echo "PS1='\w\$ '" >> ~/.bash_profile; source ~/.bash_profile

Vagrant: VM inaccessible

Perhaps, like me, you work from home, and perhaps, like me, you have an evil kitten. Maybe you’ve come back to your desk to get some work done and you go to fire up your VMs and see this:

Your VM has become "inaccessible." Unfortunately, this is a critical error with VirtualBox that Vagrant can not cleanly recover from. Please open VirtualBox and clear out your inaccessible virtual machines or find a way to fix them.

Don’t panic. OK. Panic. But, after that, simply unregister and reregister your VM.

This will show you a list of VMs with their UUIDs. Note that the inaccessible boxes turn up with empty names:
~/Dropbox/localhost/VMs$ VBoxManage list vms
"" {037eb4b6-10e5-4919-84aa-54ff72009892}
"" {1bba1f71-7fd1-4cd6-8375-8a37c16d643d}

Take that UUID and use it to unregister the box:
~/Dropbox/localhost/VMs$ VBoxManage unregistervm 1bba1f71-7fd1-4cd6-8375-8a37c16d643d
Then, reregister it. Note that registervm wants the full path to the machine’s .vbox file, not just the folder (the <vm name> below is a placeholder for your own machine):
~/Dropbox/localhost/VMs$ VBoxManage registervm "/Users/jennifer/VirtualBox VMs/<vm name>/<vm name>.vbox"
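
After that, vagrant status should show the box again and a plain vagrant up should boot it. If Vagrant still complains, check that the UUID stored under .vagrant/machines/default/virtualbox/id (assuming the default machine name) matches the VM you just re-registered.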

Version Control

Version control is a system that tracks changes to a set of files over time so that specific versions can be recalled later. It allows a development team to revert files to a previous (sane) state, identify who made changes to a file, compare and track changes over time, and more. In short, it provides a way to recover from disaster in the development cycle, and it lets developers collaborate safely on different parts of a system.

Centralized Version Control systems, such as CVS or SVN, have a single server which contains all the versioned files in the system, and any number of clients which check these files in and out from that central place. SVN manages files and directories, allowing developers to modify and manage the same set of data from their respective locations and tracks changes by version number so any incorrect change can be backed out. Developers can be aware of what files other developers are working on within the system, and administrators can control what parts of the project individual developers can access. A drawback of the centralized version control system is that it contains a single point of failure – should the central server go down, no one can save versioned changes or collaborate while it is down.
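The day-to-day centralized workflow is check out, update, commit, all against that one server. A quick sketch (the repository URL here is made up):

svn checkout http://svn.example.com/repo/trunk project
cd project
svn update                           # pull down everyone else's changes
svn commit -m "fix checkout totals"  # send yours to the central server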

Distributed Version Control systems, such as Git, Mercurial, or Bazaar, fully mirror the repository at each client. If the server crashes, it can be restored from any of the client repositories; essentially, every checkout is a full backup of all the data. Distributed version control leverages high-speed network connections and cheap storage to decentralize the data store: each user keeps a complete local version history, and changesets are traded directly between users’ local repositories rather than through a central master, which makes day-to-day operations incredibly fast.
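
That “every clone is a full backup” property is easy to see in practice; a sketch, with made-up paths:

# any developer's clone carries the complete history
git clone alice@devbox:~/project.git restored-project
cd restored-project
git log --oneline   # the whole history is there; no central server required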