Always Get Better

Never stop looking for ways to improve

November 4th, 2011

Hopefully when you do web work, you’re not developing code on the same server your users are accessing. Most organizations have at least some kind of separation between their development and production code, but it’s possible to go much further. Separating environments lets you run several tracks of continuous integration, each serving a different purpose.

These normally break down as follows:

Development
Working code copy. Changes made by developers are deployed here so integration and features can be tested. This environment is rapidly updated and contains the most recent version of the application.

Quality Assurance (QA)
Not all companies will have this. The QA environment provides a less frequently changed version of the application which testers can perform checks against. This allows reporting on a common revision so developers know whether particular issues found by testers have already been corrected in the development code.

Staging/Release Candidate
This is the release candidate, and this environment is normally a mirror of the production environment. The staging area contains the “next” version of the application and is used for final stress testing and client/manager approvals before going live.

Production
This is the currently released version of the application, accessible to the client/end users. This version preferably does not change except during scheduled releases. There may be differences in the production environment, but generally it should be the same as the staging environment.

Having separation between the different environments is not tricky, but managing your data environment can be. There are, of course, all kinds of ways to solve the problem.

April 21st, 2011

The best way to keep visitors engaged in your website is by delivering your experience in as little time as possible. The average visitor will only stick around for a few seconds, so it is important to get them interacting with your content fast. The first things to check for, of course, are any bottlenecks in the initial page generation. Once the web page is being generated quickly, we can turn our attention to the next biggest culprit: the connection to your client.

Downloading files directly from a web server is costly, even if you’re using an efficient server like nginx for static files.

A content delivery network (CDN) can help speed up the process by storing your content in data centres around the world so it gets served to your visitors from locations that are physically close to them. This results in fewer network hops, which makes the files download faster, and reduces the overall load on your web server so you can focus on doing more interesting dynamic application stuff.

At one point, CDN services were only available to companies with deep pockets and huge websites, but these days anyone can set up and use an inexpensive service with their regular hosting provider.

Check with your host to see if they offer a content delivery solution. The two providers I use for my blog, Media Temple and Rackspace, both have excellent services. If you are using a WordPress site, check out W3 Total Cache, which provides an all-in-one package for managing your files and optimizing the overall speed of your site.

April 19th, 2011
[Photo: Tortoise. Creative Commons License photo credit: Eric Kilby]

Why is my website loading so slowly?!?

There are a few common culprits behind website speed issues. When diagnosing problems, the best bet is to start at the worst performers and move up. Some suggestions, in order from slowest to fastest, are:

1. Internet Traffic
If your web page is downloading anything over the internet during each page request, stop right now. This is the most expensive operation you can perform. Example: Downloading a photo from Flickr and loading it into memory in order to determine its width and height dimensions.
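
As a rough illustration of the fix, here is a minimal sketch that remembers the result of a slow remote lookup so it only happens once per photo instead of once per page request. The function fetch_dimensions_remote is a hypothetical stand-in for the real download (it is not a Flickr API call), and the in-process cache would be replaced by something shared, like memcache, on a multi-server site.

```python
from functools import lru_cache

# Records every simulated trip over the internet so we can see how often it happens.
remote_calls = []


def fetch_dimensions_remote(photo_id):
    """Hypothetical stand-in for downloading a photo and measuring it."""
    remote_calls.append(photo_id)
    # A real implementation would download and decode the image here.
    return (640, 480)


@lru_cache(maxsize=1024)
def photo_dimensions(photo_id):
    """Return (width, height), going to the remote service once per photo."""
    return fetch_dimensions_remote(photo_id)
```

Calling photo_dimensions("example-photo") on three consecutive page requests triggers a single remote fetch; the other two are answered from memory.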

2. Network Traffic
Local network traffic is generally very fast, but still involves transmitting information outside your computer. In some cases, such as web clusters with a shared session cache, the network performance cost is worth it for the overall application.

3. Database
Databases are fast, particularly when the data you need is already stored in a memory cache – which you generally can’t control. When paired with a key-value memory store like memcache, the majority of your database calls can come straight from memory.

4. Disk I/O
Even with the incredible access times found in today’s hard drives, reading and writing from the disk is an expensive operation (and why databases lose points, except for their memory caching abilities). Sometimes reading from disk is the better choice – YMMV.

5. Script Caching
Implement a tool like xcache (PHP). This keeps your compiled code in memory as bytecode, which is much faster to execute since the scripts don’t have to be parsed and compiled again on every request.

April 18th, 2011

MySQL+Memcache have been bedfellows for a while and at this point are the de facto standard for highly available, scalable websites. Even with other SQL and NoSQL solutions starting to become popular, this pair holds on as the winner for LAMP programmers. Is the complexity of working with this technology pair worth the investment?

Read vs Write
Traditional relational databases place the burden of computation on read operations. In mainframe environments with powerful servers and relatively few users, this made sense. Database normalization prevents redundancy, and data can be joined together when needed to produce the desired results.

In a web application with 1,000,000 users, the normalized transactional model does not perform well. Generally speaking it is way faster to make two queries to a small subset of data rather than attempt expensive joins in a client-facing web site.
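
To make the "two queries instead of a join" idea concrete, here is a small sketch using Python's built-in sqlite3 module rather than MySQL. The users and posts tables and their contents are made up for illustration. It only shows that both approaches return the same data; it does not measure which one is faster, which depends on your schema, indexes and data size.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT);
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO posts VALUES (10, 1, 'First'), (11, 1, 'Second'), (12, 2, 'Other');
""")


def posts_with_join(user_id):
    """One query: the database joins users and posts for us."""
    return conn.execute(
        "SELECT u.name, p.title FROM users u "
        "JOIN posts p ON p.user_id = u.id "
        "WHERE u.id = ? ORDER BY p.id",
        (user_id,),
    ).fetchall()


def posts_with_two_queries(user_id):
    """Two small single-table queries, combined in application code."""
    (name,) = conn.execute(
        "SELECT name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    titles = conn.execute(
        "SELECT title FROM posts WHERE user_id = ? ORDER BY id", (user_id,)
    ).fetchall()
    return [(name, title) for (title,) in titles]
```

A side benefit of the second form is that each small result (the user row, the list of titles) can be cached on its own.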

Enter memcache: by storing the result of our SQL queries in memory, we improve the speed of subsequent requests by pulling the data from memory as well as avoiding a hit to the database entirely, freeing it to process urgent or real-time requests.

Anatomy of an SQL Query
When we run an SQL query, we are actually asking the server to perform a lot of work:

  1. Break down the query into object references: The DBMS needs to understand which tables, columns, and filters you are using by tokenizing your SQL (by breaking out names from the keywords like SELECT, FROM, WHERE).
  2. Identify which indexes (if any) are most relevant to the data: This is harder with more complex queries which must be broken down or which depend on outer tables for their values.
  3. Read the source tables from the hard drive: Most DBMS implementations include some kind of memory caching which partially avoids this expensive read step, but some disk I/O is a normal part of operation.
  4. Join Columns: If we specify a join, especially a LEFT or RIGHT join, the DBMS has to create a pseudo table from the joined sources in memory before it can do any additional processing.
  5. Sub-selects: Any sub-select statements need to be processed. Depending on how the statement was written, this needs to be done for every row returned in the result set.
  6. Filter and sort: Rows that do not match the ‘WHERE’ clause need to be filtered out of the result set. This is where we are going to start seeing performance improvements by narrowing our result set.
  7. Aggregation: Once the database has its final result set it can do all of the aggregation we ask of it, both calculations and grouping.

As we can imagine, this can be a time-consuming process. If it is repeated thousands of times in a short period, we will see significant performance loss.

Anatomy of a Cache Request
By contrast, when we perform a request for data from a cache like memcache, we do this:

  1. Check the index for the presence of the supplied key
  2. If a result exists, return it

In the case of memcache, this happens entirely in memory with no hit to the database whatsoever, resulting in a blindingly fast result set.
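
The two-step cache lookup above, combined with a fall-through to the database on a miss, can be sketched as follows. This is a minimal illustration under loud assumptions: a plain Python dict stands in for memcache, sqlite3 stands in for MySQL, and the articles table, the cached_query helper and the 60-second expiry are all invented for the example.

```python
import sqlite3
import time

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT);
    INSERT INTO articles VALUES (1, 'Hello'), (2, 'World');
""")

cache = {}                # stand-in for memcache: key -> (expires_at, rows)
stats = {"db_hits": 0}    # how many times we had to go to the database


def cached_query(sql, params=(), ttl=60.0):
    """Return rows for a query, using the cache when a fresh entry exists."""
    key = (sql, tuple(params))
    entry = cache.get(key)
    # Steps 1 and 2 of the cache request: check for the key, return it if present.
    if entry is not None and entry[0] > time.monotonic():
        return entry[1]
    # Cache miss (or expired entry): run the real query and remember the result.
    stats["db_hits"] += 1
    rows = db.execute(sql, params).fetchall()
    cache[key] = (time.monotonic() + ttl, rows)
    return rows
```

Running the same query twice in a row hits the database once; the second call is served from the dict. The expiry time is the trade-off knob: longer values mean fewer database hits but staler results.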

Speed Over Persistence
Memcache and MySQL work well as a pair because they provide the tools needed to have reliable, persistent transactional storage (through MySQL) along with lightning fast data retrieval (through Memcache), especially for rarely-changing results.

April 12th, 2011

One of my favourite aspects of the cloud is the ease with which we can create new VMs to test our wacky architecture theories. It’s so easy (and cheap!) to spin up a small server cluster for some serious load testing, and then destroy it again when done.

If nothing else, it provides a safety net and teaches you how to squeeze every ounce of performance out of big and small server instances. Let’s examine ways in which we can tune our Apache settings to make dynamic sites much faster.

Turn Off Modules You’re Not Using
This should be fairly obvious, but Apache ships with a number of modules which can affect performance but which most of us never need. Check your /etc/apache2/mods-enabled folder (on Debian and Ubuntu) to see what can be removed.

Never Trust Defaults
The default Apache settings are optimized for a website serving static files only. Booorrring! Never be afraid to question what you see in the configuration files; the more you understand about the inner workings of the system, the better you will be able to improve its performance.

RAM is good, Swap is Bad
Running out of physical memory (RAM) and hitting the hard drive’s swap space is bad, especially in the Virtual Machine world. When this happens your performance will nosedive; your machine may even crash. The simplest solution is to increase the amount of RAM available to your server, but if that is too costly or impossible, read on.
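
One common way to stay out of swap is to cap the number of Apache worker processes (the MaxClients directive in the prefork MPM) so that they cannot collectively outgrow your RAM. The arithmetic is simple enough to sketch; the function name and all of the numbers below are assumptions for illustration, so measure your own per-process memory before trusting the result.

```python
def max_workers(total_ram_mb, reserved_mb, per_worker_mb):
    """Estimate how many worker processes fit in RAM without swapping.

    total_ram_mb:  physical memory on the machine
    reserved_mb:   memory set aside for the OS, database, caches, etc.
    per_worker_mb: typical resident memory of one Apache worker
    """
    if per_worker_mb <= 0:
        raise ValueError("per_worker_mb must be positive")
    usable = total_ram_mb - reserved_mb
    return max(usable // per_worker_mb, 0)


# Example: a 1 GB VM, 256 MB reserved, 32 MB per worker.
print(max_workers(1024, 256, 32))  # 24
```

This is a ceiling, not a target: it ignores memory spikes from heavy requests, so leaving some headroom below the estimate is the safer choice.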

Kill the KeepAlive
Whenever a request is made to the web server, it keeps the network connection open for a small amount of time (often 15 seconds). During that time, if the visitor’s web browser needs to get another file, it goes through the same connection, which avoids wasting time re-connecting to your server. The problem is that each open connection uses up a slot in your connection pool, so if your site is under heavy load, new visitors will get queued up and may experience slowdowns trying to access your content.

If Apache is your front-end web server, set the KeepAliveTimeout to 2 seconds. This will keep the number of requests fluid even under heavy load.
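
To see why the timeout matters so much, here is a back-of-the-envelope sketch. It assumes, as a simplification, that every visitor opens one connection, spends some time actually transferring files, and then holds the slot idle for the full KeepAliveTimeout. The average number of occupied slots is then the arrival rate multiplied by the time each connection is held. The function name and the traffic figures are made up for illustration.

```python
def occupied_slots(visitors_per_second, transfer_seconds, keepalive_timeout):
    """Average number of connection slots in use at any moment.

    Each connection holds a slot while transferring, then sits idle
    until the keep-alive timeout expires.
    """
    held_seconds = transfer_seconds + keepalive_timeout
    return visitors_per_second * held_seconds


# 20 new visitors per second, half a second of real work each.
print(occupied_slots(20, 0.5, 15))  # 310.0 slots with a 15 second timeout
print(occupied_slots(20, 0.5, 2))   # 50.0 slots with a 2 second timeout
```

Under these assumptions the same traffic needs roughly six times fewer slots with the shorter timeout, which is the difference between queuing visitors and serving them.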

If your server is behind a proxy or load balancer like nginx or HAProxy where KeepAlives are not honoured, turn this setting off entirely.

Don’t Serve Static Files
Apache is a memory hog. Since each hit to the server is relatively heavy in terms of threads and memory, we are in better shape when we serve non-changing static content like images, stylesheets and JavaScript using a lightweight, event-driven server like nginx or lighttpd, or even a caching proxy like Varnish (bonus points for using a CDN to serve static files, avoiding the hit to your server at all).

Turn off HostnameLookups
This should already be done by default in your Apache configuration; if it isn’t, do it now. When HostnameLookups is on, Apache performs a reverse DNS lookup on every incoming request’s IP address to find its host name. This can dramatically increase your latency, and isn’t healthy for DNS servers either.

Disable AllowOverride
It is tempting to set AllowOverride to All in order to give your .htaccess files free rein to do as they please. The downside of this directive is that every time anything is requested, Apache needs to check the requested folder and every one of its parents, all the way up to the site root, for .htaccess files. Apache recommends setting AllowOverride to None globally, and enabling it only for directories whose settings can’t be placed in the site configuration.
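
To get a feel for the cost, the following sketch lists the .htaccess files that would need to be checked for a single request. It is a simplified model that assumes overrides are enabled starting at the site root; the helper name and the paths are invented for the example.

```python
from pathlib import PurePosixPath


def htaccess_candidates(site_root, requested_file):
    """List every .htaccess path between the site root and the requested file."""
    root = PurePosixPath(site_root)
    directory = PurePosixPath(requested_file).parent
    # relative_to raises ValueError if the file is not under the site root.
    relative = directory.relative_to(root)

    candidates = [root / ".htaccess"]
    current = root
    for part in relative.parts:
        current = current / part
        candidates.append(current / ".htaccess")
    return [str(path) for path in candidates]


for path in htaccess_candidates("/var/www/site",
                                "/var/www/site/img/icons/logo.png"):
    print(path)
# /var/www/site/.htaccess
# /var/www/site/img/.htaccess
# /var/www/site/img/icons/.htaccess
```

That is three extra file system checks for one small image, repeated on every request; the deeper your folder structure, the longer the list gets.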

August 6th, 2010

So I wiped my hard drive and installed Ubuntu. After struggling with the decision to switch from Windows for some time, I finally resolved to move.

So far the results have been very good. My system boots up and is ready to use in less than a minute, there is no lag loading and switching programs, and everything I need for my day-to-day programming is available much more readily than it was with the other operating system.

The most striking difference to me is the amount of disk space I now have available to me. With all of my software, work projects, and operating system overhead, Windows left 80GB free from my 285GB drive. With all of my projects, code libraries, files and operating system installed, Ubuntu uses just 6.7GB, leaving 97% of the drive available for my use. I am blown away by how much less clutter I have now.

I haven’t tried to do very much with Mono yet; we’ll see how it works when I try making improvements to my SiteAssistant project. I’ve been reading about Mono’s Winforms capabilities and so far am impressed by the possibilities. We’ll see how well it works with my fairly simple project; with any luck I may have found a cross-platform .NET solution with this one. Maybe the Winforms explorations will be a good topic for a future post.

Not missing Office yet, either. My Quicken financial software has been running perfectly under Wine, and all of my files appear to have made the move intact. I still own licenses to all my software, so on those rare occasions when I really need it I can install Windows with VirtualBox and fill up some of that hard drive space I’ve earned.