Always Get Better

Never stop looking for ways to improve

January 30th, 2012

All web site owners should feel a burning need for speed. Studies have shown that visitors who wait more than 2 or 3 seconds for content to load are likely to leave before the page finishes loading. This is particularly bad if you’re trying to run a web site that relies on visitors to generate some kind of income: content is king, but speed keeps the king’s coffers flowing.

If your website isn’t the fastest it can be, you can take some comfort in the fact that the majority of the “top” web sites also suffer from page load times pushing up into the 10 second range (have you BEEN to Amazon lately?). But do take the time to download YSlow today and use its suggestions to start making radical improvements.

I’ve been very interested in web server performance because it is the first leg of the web page’s journey to the end user. The speed of execution at the server level can make or break the user’s experience by controlling the amount of ‘lag time’ between the web page request and visible activity in the web browser. We want our server to send page data as quickly as possible so the browser can begin rendering it and downloading supporting files.

Not long ago, I described my web stack and explained why I moved away from the “safe” Apache server solution in favour of nginx. Since nginx doesn’t have a PHP module, I had to use PHP’s FastCGI server (PHP FPM) with nginx in front of it as a reverse proxy. Additionally, I used memcached to store sessions rather than writing them to disk.

Here are the configuration steps I took to realize this stack:

1. Memcached Sessions
Using memcached for sessions gives me slightly better performance on my Rackspace VM because reading and writing to memory is hugely faster than reading and writing to a virtualized disk. I went into a lot more detail about this last April when I wrote about how to use memcached as a session handler in PHP.

2. PHP FPM
The newest Ubuntu distributions have a package php5-fpm that installs PHP5 FastCGI and an init.d script for it. Once installed, you can tweak your php.ini settings to suit, depending on your system’s configuration. (Maybe we can get into this another time.)

3. Nginx
Once PHP FPM was installed, I created a site entry that would pass PHP requests to the FastCGI server while serving other files directly. Since the majority of my static content (CSS, JavaScript, images) has already been moved to a content delivery network, nginx has very little actual work to do.


server {
    listen 80;
    server_name sitename.com www.sitename.com;
    access_log /var/log/nginx/sitename-access.log;
    error_log /var/log/nginx/sitename-error.log;

    # serve static files
    location / {
        root /www/sitename.com/html;
        index index.php index.html index.htm;

        # this serves static files that exist without
        # running the other rewrite tests
        if (-f $request_filename) {
            expires 30d;
            break;
        }

        # this sends all non-existent file or directory requests to index.php
        if (!-e $request_filename) {
            rewrite ^(.+)$ /index.php?q=$1 last;
        }
    }

    # pass PHP scripts to the PHP FPM server listening on port 9000
    location ~ \.php$ {
        fastcgi_pass 127.0.0.1:9000;
        fastcgi_index index.php;
        fastcgi_param SCRIPT_FILENAME /www/sitename.com/html$fastcgi_script_name;
        include fastcgi_params;
    }
}

The fastcgi_param SCRIPT_FILENAME setting tells PHP FPM which script to execute by prepending the site’s root path to the requested script name. All of the request parameters are passed through to PHP via the included fastcgi_params file, and once the configuration was up and running I didn’t miss Apache one little bit.
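To make the request flow concrete, here is a small Python model of the decisions the server block above makes. It is only a sketch: `route` and its return values are hypothetical names, it ignores query strings and directory requests, and it is not how nginx itself is implemented.

```python
DOC_ROOT = "/www/sitename.com/html"  # the root used in the server block


def route(uri, existing_files):
    """Toy model of how the server block handles a request path.

    existing_files: set of URI paths (e.g. "/robots.txt") that exist as
    regular files under DOC_ROOT. Query strings and directory requests
    (including "/", which nginx resolves via the index directive) are
    not modelled. Returns (handler, file path, query string).
    """
    if uri.endswith(".php"):
        # The regex location ~ \.php$ takes precedence over location /;
        # SCRIPT_FILENAME becomes the root plus $fastcgi_script_name.
        return ("php-fpm", DOC_ROOT + uri, "")
    if uri in existing_files:
        # if (-f $request_filename): served directly with a 30-day expiry.
        return ("static", DOC_ROOT + uri, "")
    # if (!-e $request_filename): rewrite ^(.+)$ /index.php?q=$1 last;
    # the new URI is matched again and lands in the .php location.
    return ("php-fpm", DOC_ROOT + "/index.php", "q=" + uri)


print(route("/robots.txt", {"/robots.txt"}))
print(route("/2012/01/some-post/", {"/robots.txt"}))
```

Only the first request avoids PHP entirely, which is why moving static files to a CDN leaves nginx with so little to do.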

Improvements
My next step will be to put a Varnish server in front of nginx. Since the majority of my site traffic comes from search engine visitors who are not logged in to the site and don’t need freshly generated content, Varnish can step in and serve a fully cached copy of my pages from memory far faster than FastCGI can render the WordPress code. I’ll experiment with this setup in the coming months and post my results.
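A caching layer like this boils down to a simple rule: answer anonymous visitors from memory and let logged-in users through to the backend. The Python sketch below models that rule; it is not Varnish configuration. `PageCache`, the TTL values, and the assumption that WordPress marks signed-in users with a cookie whose name starts with `wordpress_logged_in_` are illustrative choices.

```python
import time


class PageCache:
    """Toy model of a caching proxy in front of nginx/PHP FPM (not VCL).

    Anonymous requests are answered from memory while the cached copy is
    fresh; logged-in users always go to the backend.
    """

    def __init__(self, backend, ttl_seconds=300, clock=time.monotonic):
        self.backend = backend          # callable(url) -> rendered page
        self.ttl = ttl_seconds
        self.clock = clock
        self.store = {}                 # url -> (expires_at, page)

    def get(self, url, cookies):
        # Assumption: a cookie named "wordpress_logged_in_..." marks a
        # signed-in user whose pages must not be served from the cache.
        if any(name.startswith("wordpress_logged_in_") for name in cookies):
            return self.backend(url)
        now = self.clock()
        entry = self.store.get(url)
        if entry is not None and entry[0] > now:
            return entry[1]             # cache hit: no PHP work at all
        page = self.backend(url)        # miss or stale copy: render once
        self.store[url] = (now + self.ttl, page)
        return page
```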

January 25th, 2012

Now that WebOS is being made open source, HP has released a new version of the Enyo JavaScript framework. Whereas the first version of the framework only supported WebKit-based environments (like the HP TouchPad, Safari or Chrome), the newer version has expanded support to Firefox and IE9 as well. Developers who created apps with the old framework will have to wait a little while longer before all of the widgets and controls from Enyo 1.0 are ported over.

What does this mean for app developers? With Enyo open source and running in more browsers, applications built on the platform can run on Android and iOS. But it’s not a disruptive technology: both Android and iOS have supported HTML5 applications for quite a while, and HP will be competing against mature frameworks like jQuery Mobile.

As a WebOS enthusiast I am definitely going to put some time into continuing my explorations of Enyo, but it’s getting harder and harder to justify the investment. My Pre is getting pretty old at this point, and hardware manufacturers have yet to express interest in making new devices to take advantage of WebOS. If I end up switching to Android with my next hardware purchase, it’s going to shift my priorities away from Enyo and its brethren.

August 6th, 2011

So I got tired of using Firefox 3.6 on my Ubuntu machine and decided to upgrade to the newest version (5.0). It’s understandable that the package maintainers responsible for Ubuntu don’t put bleeding-edge releases in the distribution, since they could introduce unstable elements into the user experience. But Firefox 4 has been out for months, and the migration to 5 is well underway.

Fortunately, it couldn’t be much easier to get the newest official release using our good friend apt-get.

In a terminal window, add the Mozilla team’s stable Firefox repository by issuing the following command:


sudo add-apt-repository ppa:mozillateam/firefox-stable

Next, perform an update to get the package listing, and upgrade to install the newest browser:


sudo apt-get update
sudo apt-get upgrade

That’s it – you’re done! Your shortcuts are even updated, and any bookmarks or open tabs you might have had on the go are carried forward.

I was pleasantly surprised at how easy this process was.

August 5th, 2011

Apple-based LAMP developers be warned: the new version of OS X does not include MySQL, which was formerly part of the developer tools shipped with the operating system. In its place look for deliciously Oracle-free PostgreSQL. Of course, developers can and will continue to download MySQL and install it themselves, but the out-of-box experience moving forward will be with PostgreSQL.

Although MySQL is still an extremely popular database, Oracle’s presence in the MySQL world has put a chill over business users considering it as the backbone of their data solutions. Other databases with similar purposes exist, but none can boast a community as deep as PostgreSQL’s.

August 2nd, 2011

Having played with Windows Azure for about six months or so, I think I have a good handle on its pros and cons for the tasks I’ve been trying to do. I definitely have a lot of positive conclusions about the platform as a whole, and a few pain points I can see Microsoft working hard to eliminate.

Starting with the good:

Seamless Deployments – No Client Downtime
One of the trickiest things to set up when deploying a web cluster is the ability to deploy new builds without causing downtime, and to roll back to the last build if a critical flaw is discovered.

Azure really takes the pain out of this part of the process. Just deploy to staging, test the staging area, and hit ‘Swap VIP’ to swap the staging and production instances. This is also a great way to increase and decrease capacity, as long as the endpoints remain the same (same number of ports, etc.). (Remember, you need 2 or more instances to qualify for the SLA.)
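The staging/production swap is essentially an exchange of pointers, which is why the same button doubles as a rollback. This Python toy model uses hypothetical names (`HostedService`, `swap_vip`) and does not call the real Azure management API.

```python
class HostedService:
    """Toy model of a staging/production pair behind swappable addresses."""

    def __init__(self, production_build, staging_build=None):
        self.slots = {"production": production_build, "staging": staging_build}

    def deploy_to_staging(self, build):
        # Slow part in real life: upload, provision and start instances.
        self.slots["staging"] = build

    def swap_vip(self):
        # Fast part: only the routing changes, no instance is restarted.
        s = self.slots
        s["production"], s["staging"] = s["staging"], s["production"]

    def serve(self):
        return self.slots["production"]


svc = HostedService("build-41")
svc.deploy_to_staging("build-42")  # test it at the staging address first
svc.swap_vip()                     # build-42 is live, build-41 waits in staging
svc.swap_vip()                     # critical flaw found: swap straight back
```

Keeping the previous build warm in the staging slot is what makes the rollback nearly instant.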

Scalable Computing as it is Meant to Be
Most of the developer confusion I have encountered during these first months of development has turned out to be related to scalable programming in general and not Azure programming in particular. I haven’t been training people to use Azure so much as promoting the importance of properly separating the parts of an application so that it can use its data sources without being tied to a single server. Caching, database access, blob storage and table storage are not new concepts, but using them from the ground up in every project has been new to most people.

We can’t develop with one web server in mind and then scale out; we now have to think about scaling right from the beginning, which is resulting in a much cleaner architecture overall. Developers become more connected to the practical results of their decisions when they have to choose whether to use NoSQL, SQL or both for data access, based on factors ranging from speed to overall cost.
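One common way to build caching in from the start is the cache-aside pattern. The sketch below is a generic illustration, not code from our projects: `CachedRepository` is a hypothetical name, and plain dicts stand in for a distributed cache and a storage service.

```python
class CachedRepository:
    """Cache-aside: read through a shared cache, fall back to the store.

    Because neither the cache nor the store lives inside the web server
    process, any number of web instances can share them.
    """

    def __init__(self, cache, store):
        self.cache = cache   # stand-in for a distributed cache
        self.store = store   # stand-in for SQL or table storage

    def get(self, key):
        if key in self.cache:
            return self.cache[key]      # fast path: shared cache hit
        value = self.store.get(key)     # slow path: storage round trip
        if value is not None:
            self.cache[key] = value     # populate for every other instance
        return value

    def put(self, key, value):
        self.store[key] = value         # write to the source of truth
        self.cache.pop(key, None)       # invalidate the stale cached copy
```

Two instances constructed with the same cache and store see each other’s writes, which is exactly the property a single-server design never has to think about.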

Tight Code Cohesion
Working with Azure for me has meant a return to .NET and C# programming. Combined with the ASP.NET MVC 3 framework and latest SQL database versions, the experience has been overwhelmingly fun. It has been great to have code that just works nicely in an environment that has been clearly optimized for exactly what I’m trying to do. Not having to deal with server setup and maintenance has freed me to dive more deeply into the capabilities of the system without needing to worry about the configuration. In short, I’m loving this.

And the bad:

Long Deploy Times
At one point our build times were pushing 45 minutes. This is a huge problem when it is 11 o’clock at night on launch day and you have a team of developers waiting around in order to verify their fixes.

During development the situation can be made easier with incremental pushes; since each Azure instance is a VM, it isn’t hard to upload the changed files through RDP and see changes immediately. For development this is fine, but the changes are not persistent, so the full deploy process (uploading the .cspkg package, provisioning instances and starting up) is needed when moving production code into a live environment.

The process can definitely be sped up by reducing the overall project size; at one point we had several hundred megabytes of supporting files which didn’t need to be part of our project. Removing these basically eliminated our upload times, but the web role instantiation still clocks in at 15-20 minutes; a long time when you want to go home to your family.

Slow System Responses
When I say slow system response, I am referring to the management API which hits the AppFabric, not the response times of our applications which have been incredibly strong.

Any time I manage our Azure account, I find myself waiting around for a lot of information. During deployments especially, the status messages are unhelpful: ‘Creating deployment’, ‘Initializing instance’, ‘Instance busy’, etc. This is really frustrating given the long deployment times; more transparency into the process would be a real confidence builder.

Unclear Management Console
This is a bit nit-picky; the Silverlight-based management console is pretty cleanly laid out. Some of the configuration options are not clear, though. For example, to grant API access to your applications, you need to upload your certificate to the ‘Management Certificates’ section, which is separate from and unrelated to the Certificates folder under each hosted service. This makes sense for veterans of the system, since the two certificate groups serve different roles, but new users are easily confused by the distinction and can spend hours trying to figure out how to deploy their projects from Visual Studio.

Similarly, after uploading your certificate, you can get the subscription ID by clicking on the subscription and copying the ID from the info panel at the right-hand side of your screen. If you right-click on the subscription ID in the menu pane and choose the ‘Copy’ option, it copies the creation date & name of the subscription. Hardly a major problem, but potentially a stumbling block for developers new to the platform.

April 29th, 2011

Apart from the login screen, here’s how to find the version number on an Ubuntu computer:


> cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=10.04
DISTRIB_CODENAME=lucid
DISTRIB_DESCRIPTION="Ubuntu 10.04.2 LTS"
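If a script needs the same information, the file is simple KEY=value text. Here is a minimal Python parser, assuming the format shown above (optional double quotes around values); `parse_lsb_release` is a hypothetical helper name.

```python
def parse_lsb_release(text):
    """Parse the KEY=value lines of /etc/lsb-release into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        info[key] = value.strip().strip('"')  # DISTRIB_DESCRIPTION is quoted
    return info


# On a live system, read the file instead:
# with open("/etc/lsb-release") as f:
#     print(parse_lsb_release(f.read())["DISTRIB_RELEASE"])
sample = """DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=10.04
DISTRIB_CODENAME=lucid
DISTRIB_DESCRIPTION="Ubuntu 10.04.2 LTS"
"""
print(parse_lsb_release(sample)["DISTRIB_DESCRIPTION"])  # Ubuntu 10.04.2 LTS
```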

April 25th, 2011

As I dig deeper into the Windows Azure platform, I am becoming more and more impressed by the potential it offers. Microsoft has put a lot of money and resources into developing its infrastructure and has done an incredible job of creating an interesting and powerful architecture. If anything, their major fault is in advertising: with so many different technologies with ever-changing names, it’s hard for a newcomer to wrap their mind around the services. It is pricier than some options, so it’s hard to really experiment in much depth, although they do offer a free option for developers to dig in and try the services.

Ensuring 99.95% SLA
One potential ‘gotcha’ is the SLA: it differs between their different services but hovers in the 99.9% range. Most account properties include automatic scaling and failover, but Web Roles and Worker Roles do not; to qualify for the 99.95% SLA you need to deploy your solution to two or more instances. By doing that, you ensure that your application is load balanced across multiple fault domains and will therefore remain available even during a failure.

Of course, this doubles the cost – each compute instance is billed separately.
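To put the percentages in perspective, here is the downtime each level allows. This is plain arithmetic assuming a 30-day month; the SLA itself defines how availability is actually measured.

```python
def allowed_downtime_minutes(sla_percent, days=30):
    """Downtime (in minutes) an availability percentage permits over `days` days."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


for sla in (99.9, 99.95):
    print(f"{sla}% allows about {allowed_downtime_minutes(sla):.1f} minutes per 30-day month")
```

Going from 99.9% to 99.95% halves the permitted outage from roughly 43 minutes to roughly 22 minutes a month, and the second instance is the price of that difference.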

Fault Domain vs Upgrade Domain
Azure’s logical server space is segmented into fault domains and upgrade domains. Translated into physical terms, a fault domain refers to a single rack containing multiple computers, while an upgrade domain is one or more computers that receive software upgrades at the same time. Multiple upgrade domains may exist within any fault domain.

For our application to remain online, we need to consider where it is hosted. Although the fault and upgrade domains are largely abstracted away from developers, the SLA guarantees that 2 or more instances are split across two or more upgrade and fault domains. At a minimum, this means that if you have two Web Role instances, they will exist on different computers in different racks. Beyond that, Azure decides where instances are initialized and hosted.
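The placement guarantee can be illustrated with a toy model. Azure does not expose its placement algorithm, so the round-robin assignment in the hypothetical `place` function is purely an assumption; the point is only that instances in different domains do not go down together.

```python
def place(instance_count, fault_domains=2, upgrade_domains=2):
    """Assign instances to (fault domain, upgrade domain) pairs.

    Round-robin placement is an assumption for illustration only;
    Azure decides placement itself.
    """
    return [(i, i % fault_domains, i % upgrade_domains)
            for i in range(instance_count)]


def survivors(placement, failed_fault_domain=None, upgrading_domain=None):
    """Instances still serving while one fault domain is down or one
    upgrade domain is being patched."""
    return [i for i, fd, ud in placement
            if fd != failed_fault_domain and ud != upgrading_domain]


web_roles = place(2)
print(survivors(web_roles, failed_fault_domain=0))  # [1]
print(survivors(web_roles, upgrading_domain=1))     # [0]
print(survivors(place(1), failed_fault_domain=0))   # []: one instance has no fallback
```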

High Availability Concerns
Now let’s consider our application which has been split across two instances. We are guaranteed to have 99.95% network availability to our application, but there are more factors to consider.

When one of the instances goes offline – due to a server fault or OS upgrade – Azure automatically starts it up on a working server. As long as the remaining instance is able to handle all of your application’s needs, this process is transparent and results in no performance degradation.

What happens if your application is spread across two instances, each under 70% load? The response times might be good, but if one of the instances goes offline, even temporarily, your remaining healthy instance will be trying to handle 140% of its capacity. If you want your business to stay online, you need to plan for these periods of downtime.
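The 70%/140% arithmetic generalizes: with N instances each running at load L, losing k of them leaves N·L/(N - k) on each survivor, so the highest safe steady-state load is (N - k)/N. A quick Python check, with hypothetical helper names:

```python
def load_after_failure(instances, load_per_instance, failed=1):
    """Per-instance load once `failed` instances drop out and their traffic
    is spread evenly over the rest (1.0 == 100% of one instance)."""
    remaining = instances - failed
    if remaining <= 0:
        raise ValueError("no instances left to serve traffic")
    return instances * load_per_instance / remaining


def max_safe_load(instances, failed=1):
    """Highest steady-state load per instance that still fits after a failure."""
    return (instances - failed) / instances


print(load_after_failure(2, 0.70))  # 1.4, i.e. the 140% overload
print(max_safe_load(2))             # 0.5: keep two instances under 50%
print(max_safe_load(4))             # 0.75
```

Adding instances raises the safe ceiling, which is one argument for more, smaller instances rather than two large ones.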

This is where the abstraction hurts: we can’t control the number of fault domains in our cluster, but we can control the number of upgrade domains (essentially forcing each instance to load up on a new machine). The worst-case scenario is that half of your cluster dies when an Azure fault domain goes down; that should never happen, but anything can happen online.

Note, however, that this kind of planning is needed in general, since your traffic might suddenly jump due to customer interest. We’re not just mitigating host failure, but also reputation damage from inadequate planning.

Keep Sessions in Mind
One last thing to mention: when we split our application into multiple instances, we need to consider how we handle session data. Azure uses round-robin load balancing, meaning the end user could potentially hit a different instance each time they access a resource on your site. If you are using traditional session handlers, session data is not shared between the instances, so user data is not guaranteed to persist from one request to the next.

This is actually a huge benefit of cloud computing, and something I’ve praised other frameworks for supporting seamlessly. Our web application should be designed to avoid sharing data where possible. If session data is needed, we can use AppFabric’s session cache providers to provide fast network-based session management.
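The session problem is easy to reproduce in a toy model. In this Python sketch (hypothetical `Instance` and `simulate` names), each instance keeps sessions in its own dictionary unless a shared one is passed in; the shared dictionary stands in for a network session cache such as AppFabric’s, whose actual API is not shown here.

```python
from itertools import cycle


class Instance:
    def __init__(self, name, session_store=None):
        self.name = name
        # Each instance gets a private dict unless a shared store is passed in.
        self.sessions = session_store if session_store is not None else {}

    def handle(self, session_id, item=None):
        cart = self.sessions.setdefault(session_id, [])
        if item is not None:
            cart.append(item)
        return list(cart)


def simulate(instances, traffic):
    """Send requests to instances in round-robin order; return the responses."""
    balancer = cycle(instances)
    return [next(balancer).handle(sid, item) for sid, item in traffic]


traffic = [("user-1", "book"), ("user-1", None)]

# In-process sessions: the second request lands on the other instance,
# so the cart comes back empty.
print(simulate([Instance("a"), Instance("b")], traffic))  # [['book'], []]

# A shared store keeps the cart no matter which instance answers.
shared = {}
print(simulate([Instance("a", shared), Instance("b", shared)], traffic))  # [['book'], ['book']]
```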