Always Get Better

Posts Tagged ‘hosting’

Azure Table Storage vs Azure SQL

Sunday, April 24th, 2011

There is a lot of debate among newcomers to Azure whether to use Azure SQL or file-based Table Storage for data persistence. Azure itself does not make the differences very clear, so let’s take a closer look at each option and try to understand the implications of each.

Azure SQL
Azure SQL is very similar to SQL Server Express. It is meant as a replacement for SQL Server Standard and Enterprise, but the feature set is not there yet. Reporting services, in particular, are in community technology preview (CTP) phase and at the time of writing are not ready for prime time.

Programmatically, Azure SQL is very similar to SQL in existing web applications; any ODBC classes you already wrote will work directly on Azure with no changes.

Size and cost are the major limitations with Azure SQL in its current incarnation. The largest supported database size is 50GB, which runs for $499/month. Any database larger than 50GB would need to be split across multiple instances – this requires sharding at the application level and is not easy surgery to perform.

Table Storage
Table storage uses key-value pairs to retrieve data stored on Azure's disk system, similar to the way memcached and MySQL work together to serve requested data at high speed. Each storage account supports 100TB of data at incredibly cheap ($0.15/GB) rates.

Working with table storage means accessing it directly from your application in a way that is quite different from what you may be accustomed to with SQL. Going all-in ties you to the Azure platform – which is probably not a problem if you're already developing for Azure, since you will likely be trying to squeeze every ounce of performance out of every area of the platform anyway.

Table storage does not support foreign key references, joins, or most of the other SQL features we normally rely on. It is up to the programmer to compensate by building wide, de-normalized tables and assembling lists in memory. If you're already building clustered applications, this is not a new design pattern – developers typically want to cache data in this manner anyway.

Besides the larger space limits, table storage affords us automatic failover. Microsoft’s SLA guarantees we will always be able to access the data, and this is accomplished by replicating everything across at least three nodes. Compared to self-managing replication and failover with the SQL service, this is a huge advantage as it keeps the complexity out of our hands.

Difference in Evolution
If Azure SQL seems somewhat stunted compared to Table Storage, it's not an accident: it is a newcomer that was not planned during the original construction of the Azure platform. Microsoft carefully considered the high-availability trends in application development and found that the NoSQL way would most easily scale on its platform. Developer outrage prompted the company to develop Azure SQL, and its service offerings are improving rapidly.

Depending on your storage needs, the best course of action may be to store the bulk of your data cheaply in Table Storage and use SQL to index it. Searching the SQL database will be incredibly fast, and can be done in parallel with any loads against persistent tables – everybody wins.

Small Site, Big Footprint

Saturday, April 16th, 2011

I like redundancy, to a fault. Part of it goes to my need for comprehensive backup – as long as you have a backup prepared, you are less likely to lose anything. So it stands to reason that if you have two identical copies of your web site running, you are more tolerant to all kinds of failures – from your web server going down to an unexpected surge in traffic.

Protection Against Hardware Failure
If your server experiences hardware issues, nothing you can do (short of changing the faulty components) will keep your site online. If you have two web servers and one of them fails, the other can still pick up the slack.

This is an important consideration when you're using a cloud provider. In many cases you occupy only a small portion of the hardware you are hosted on, which means that if someone else messes up their VM instance it can potentially affect your application. Spreading across two VMs increases the odds that you end up on different physical hardware, perhaps even a different server rack altogether. If your provider performs maintenance on one of their servers you will not necessarily get knocked offline.

Protection Against Traffic
Even if you're on a strong, fast server, it can only support a certain number of concurrent users. A load-balanced setup reduces the average load per machine in your cluster, which improves the overall response rate and eliminates the 'queuing' problem you would experience with a large number of users all trying to hit a particular machine.
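With nginx (which I cover elsewhere on this blog), a two-server balancing setup can be sketched in a few lines – the addresses and ports here are placeholders for your own:

```nginx
# nginx.conf sketch -- round-robin requests between two app servers
upstream app_pool {
    server 10.0.0.1:8080;  # web server 1 (placeholder address)
    server 10.0.0.2:8080;  # web server 2
}

server {
    listen 80;
    location / {
        proxy_pass http://app_pool;  # nginx spreads requests across the pool
    }
}
```

By default nginx rotates through the pool round-robin; if one backend stops responding, requests flow to the survivor – which is exactly the failover behaviour described above.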

Memcached as Session Handler

Saturday, April 9th, 2011

By default PHP loads and saves sessions to disk. Disk storage has a few problems:

1. Slow IO: Reading from disk is one of the most expensive operations an application can perform, aside from reading across a network.
2. Scale: If we add a second server, neither machine will be aware of sessions on the other.

Enter Memcached
I hinted at Memcached before as a content cache that can improve application performance by preventing trips to the database. Memcached is also perfect for storing session data, and has been supported in PHP for quite some time.

Why use memcached rather than file-based sessions? Memcached stores all of its data as key-value pairs in RAM – it never touches the hard drive, which makes it F-A-S-T. In multi-server setups, PHP can grab a persistent connection to the memcached server and share all sessions between multiple nodes.

Before beginning, you’ll need to have the Memcached server running. I won’t get into the details for building and installing the program as it is different in each environment, but there are good guides on the memcached site. On Ubuntu it’s as easy as aptitude install memcached. Most package managers have a memcached installation available.

Installing memcache for PHP is hard (not!). Here’s how you do it:

pecl install memcache

Careful, it’s memcache, without the ‘d’ at the end. Why is this the case? It’s a long story – let’s save the history lesson for another day.

When prompted to install session handling, answer ‘Yes’.

Using memcache for sessions is as easy as changing the session handler settings in php.ini:
session.save_handler = memcache
session.save_path = "tcp://localhost:11211" (assuming memcached is running locally on its default port, 11211)
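For the multi-server case mentioned earlier, PHP's memcache extension accepts a comma-separated pool of servers in save_path, so every node reads the same session store – the addresses below are examples only:

```ini
; php.ini -- share sessions across two web servers (example addresses)
session.save_handler = memcache
session.save_path = "tcp://10.0.0.1:11211,tcp://10.0.0.2:11211"
```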

Now restart apache (or nginx, or whatever) and watch as your sessions are turbo-charged.

Cloud Computing Is Not Magical

Tuesday, April 5th, 2011

Back in 2009 I was tired of hearing the phrases "cloud computing" and "in the cloud". These days I'm so numb to their meaninglessness that they don't even faze me anymore. Somewhere along the way marketers took over the internet and 'social media' became a job title.

So what do I have against cloud computing? Would I rather build servers, deal with co-location, and suffer massive downtimes in order to change hardware specs? Of course not.

Let’s not lose sight of the big picture: virtualized servers are still servers. From a remote perspective the management is all the same and from a hardware perspective you still need to be responsible for your data in the event of a catastrophic failure.

While I am a huge proponent of “cloud” providers like Rackspace (heck I host all of my web sites on Cloud Server instances), let’s call a spade a spade: there is nothing magical about servers in the cloud, they are just virtualized instances running on a massively powerful hardware architecture.

Why go with virtualization over a dedicated box? Virtual servers are cheap – I don't incur the startup costs I would with a dedicated server. For a small business this is a huge deal. For larger businesses with intense data needs a dedicated solution will always provide the most security, but for everything from tiny sites to very large applications the virtualized way is the ticket. Add more servers, remove them, reconfigure: you don't get that kind of flexibility from traditional server hosting.

Long live cloud computing; but the name has to go. Did the term come from network diagrams where the Internet was represented as a cloud? I don’t think it’s a particularly clever analogy to consider your business assets living as disembodied entities “somewhere” in the networking cloud.

We’re fighting a losing battle if we believe we’re going to get the marketers to back off the internet now. But on the tech side let’s keep calling it what it is and try not to let the marketing buzz cloud our opinion of the technologies we use.

Use a PHP Accelerator to Speed Up Your Website

Saturday, November 20th, 2010

I like PHP because it makes it really easy to build a website quickly and add functionality, and it is generally lightning fast to execute without needing to wait for compilation as with ASP.NET or Java. (Yes, we always pre-compile those languages before putting our applications into production, but with PHP we don't even have to do that.)

Even though compilation is very fast, it still has a resource and time cost, especially on high-traffic servers. We can improve our response times by more than 5x by pre-caching our compiled opcode for direct execution later. There are a few PHP accelerators which accomplish this for us:

XCache
XCache is my favourite and is the one I use in my own configurations. It works by caching the compiled PHP opcode in memory so PHP can be directly executed by the web server without expensive disk reads and processing time. Many caching schemes also use XCache to store the results of PHP rendering so individual pages don't need to be re-processed.

APC (Alternative PHP Cache)
APC is a very similar product to XCache – in fact, XCache was released partly in response to the perceived lag in APC's support for newer PHP versions. APC is essentially the standard PHP accelerator – in fact, it is slated to be included by default in PHP 6. As much as I like XCache, it will be hard to compete with built-in caching.

Turck MMCache
Turck MMCache is one of the original PHP accelerators. Although it is no longer in development, it is still widely used. An impressive feature of MMCache is its encoder, which allows you to distribute compiled versions of your PHP applications without the source code. This is useful for companies that feel they need to protect their program code when hosting in client environments.

eAccelerator
eAccelerator picked up where MMCache left off, adding a number of features to increase its usability as a content cache. Over time those content-caching features have been removed as more efficient and scalable solutions like memcached have allowed caches to be shared across web servers.

Keep Optimizing
One major consideration that often goes forgotten when optimizing website speed is that not all of your visitors will be on a high-speed connection; some users will be on mobile or worse connections, even for non-mobile sites. Every ounce of speed will reflect favourably on you and improve your retention rates – and ultimately bring more visitors to your 'call to action' goals. I'll go into more detail about bigger speed improvements we can make in a later post.

Cheap File Replication: Synchronizing Web Assets with fsniper

Sunday, November 14th, 2010

A while ago I wrote about how I was using nginx to serve static files rather than letting the more memory-intensive Apache handle the load for files that don't need its processing capabilities. The basic premise is that nginx is the web-facing daemon that serves static files directly from the file system, while shipping any other request off to Apache on another port.
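As a refresher, that split looks something like this in nginx terms – the document root and the Apache port are illustrative, not prescriptive:

```nginx
server {
    listen 80;
    root /var/www/site;  # illustrative document root

    # Static assets come straight off the disk, with far-future expiry
    location ~* \.(jpg|jpeg|png|gif|css|js)$ {
        expires 30d;
    }

    # Everything else is proxied through to Apache on another port
    location / {
        proxy_pass http://127.0.0.1:8080;
    }
}
```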

What if Apache is on a different server entirely? Unless you have the luxury of a NAS device, your options are:

1. Maintain a copy of the site’s assets separate from the web site
There are two problems with this approach: maintainability and synchronization. You'll have to remember to deploy any content changes separately from the rest of the site, which is counter-intuitive and opens your process up to human error. And user-generated content stays on the Apache server, inaccessible to nginx.

2. Use a replicating network file system like GlusterFS
Network-based replication systems are advanced and provide amazing redundancy. Any changes you make to one server can be replicated to the others very quickly, so any user generated content will be available to your content servers, and you only have to deploy your web site once.

The downside is that many network file system solutions are optimized for larger (>50MB) file sizes. If you rely on your content server for small files (images, CSS, JS), read performance may decline as your traffic numbers increase. For high-availability systems where it is critical for each server to have a full, up-to-date set of files, this is probably the best solution.

3. Use an rsync-based solution
This is the method I’ve chosen to look at here. It’s important that my content server is updated as fast as possible, and I would like to know that when I perform disaster recovery or make backups of my web site the files will be reasonably up to date. If a single file takes a few seconds to appear on any of my servers, it isn’t a huge deal (I’m just running WordPress).

The Delivery Mechanism
rsync is fast and installed by default on most servers. Pair it with ssh and use password-less login keys, and you have an easy solution for script-able file replication. The only missing piece is the “trigger” – whenever the filesystem is updated, we need to run our update script in order to replicate to our content server.

incrond is one possible solution – whenever a directory is updated, incrond can run our update script. The problem is that the service does not act upon file updates recursively. fsniper is our solution.

The process flow should look like this:
1. When the content directory is updated (via site upload or user file upload), fsniper initiates our update script.
2. Update script connects to the content server via ssh, and issues an rsync command between our content directory and the server’s content directory.
3. Hourly (or whatever), initiate an rsync command from the content server to any web servers – this will keep all the nodes fairly up-to-date for backup and disaster recovery purposes.
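Step 1 can be wired up with a config along these lines – the paths and script name are hypothetical, and you should check the fsniper README for the exact option names:

```
# ~/.config/fsniper/config -- hypothetical sketch
watch {
    /var/www/content {
        # run the sync script whenever a file here is written;
        # %% expands to the path of the file that changed
        * {
            handler = /usr/local/bin/sync-content.sh %%
        }
    }
}
```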

Facebook Breaks, But Photos Not Lost

Monday, March 9th, 2009

Over the weekend Facebook suffered multiple hardware failures that caused its photo service to fail. Up to 15% of the site's billions of photos displayed as nothing more than a question mark on Sunday night.

Facebook is already in the process of cleaning up the mess and restoring the lost files, but the failure is just another example highlighting the importance of making backups of our own data rather than relying on “the cloud” for permanence.