Always Get Better

Never stop looking for ways to improve

April 24th, 2011

There is a lot of debate among newcomers to Azure whether to use Azure SQL or file-based Table Storage for data persistence. Azure itself does not make the differences very clear, so let’s take a closer look at each option and try to understand the implications of each.

Azure SQL
Azure SQL is very similar to SQL Server Express. It is meant as a replacement for SQL Server Standard and Enterprise, but the feature set is not there yet. Reporting services, in particular, are in community technology preview (CTP) phase and at the time of writing are not ready for prime time.

Programmatically, Azure SQL is very similar to SQL in existing web applications; any ODBC classes you already wrote will work directly on Azure with no changes.

Size and cost are the major limitations with Azure SQL in its current incarnation. The largest supported data size is 50GB which runs for $499/month. Any databases larger than 50GB would need to be split across multiple instances – this requires knowledge of sharing at the application level and is not easy surgery to perform.

Table Storage
Table storage uses a key-value pair to retrieve data stored on Azure’s disk system, similar to the way memcache and MySQL work together to provide the requested data at fast speeds. Each storage container supports 100TB of data at incredibly cheap ($0.15/GB) rates.

Working with table storage involves accessing them directly from your application differently than you may be accustomed to with SQL. Going all-in ties you to the Azure platform – which is probably not a problem if you’re already developing for Azure as you will likely be trying to squeeze every ounce of performance out of all areas of the platform anyway.

Table storage does not support foreign key references, joins, or any of the other SQL-stuff we usually use. It is up to the programmer to compensate by making wide de-normalized tables and build their lists in memory. If you’re already building clustered applications, this is not a new design pattern as developers typically want to cache data in this manner.

Besides the larger space limits, table storage affords us automatic failover. Microsoft’s SLA guarantees we will always be able to access the data, and this is accomplished by replicating everything across at least three nodes. Compared to self-managing replication and failover with the SQL service, this is a huge advantage as it keeps the complexity out of our hands.

Difference in Evolution
If Azure SQL seems somewhat stunted compared to Table Storage, it’s not an accident: it is a newcomer who was not planned during the original construction of the Azure platform. Microsoft carefully considered the high-availabilty trends used for application development and found that the NoSQL way would most easily scale to their platform. Developer outrage prompted the company to develop Azure SQL, and its service offerings are improving rapidly.

Depending on your storage needs, your course of action may be to store as much data for cheap in Table Storage, and use SQL to index everything. Searching the SQL database will be incredibly fast, and can be done in parallel with any loads against persistent tables – everybody wins.

April 5th, 2011

Back in 2009 I was tired of hearing the phrases “cloud computing” and “in the cloud”. These days I’m so numb to their meaninglessness that it doesn’t even phase me anymore. Somewhere along the way marketers took over the internet and ‘social media’ became a job position.

So what do I have against cloud computing? Would I rather build servers, deal with co-location, and suffer massive downtimes in order to change hardware specs? Of course not.

Let’s not lose sight of the big picture: virtualized servers are still servers. From a remote perspective the management is all the same and from a hardware perspective you still need to be responsible for your data in the event of a catastrophic failure.

While I am a huge proponent of “cloud” providers like Rackspace (heck I host all of my web sites on Cloud Server instances), let’s call a spade a spade: there is nothing magical about servers in the cloud, they are just virtualized instances running on a massively powerful hardware architecture.

Why go with virtualization over a dedicated box? Virtual servers are cheap – I don’t need to incur the startup costs that I would from a dedicated server. For a small business this is a huge deal; for larger business with intense data needs the dedicated solution will always provide the most security but for anything from tiny, small to very large applications the virtualized way is the ticket. Add more servers, remove them, reconfigure: you don’t get that kind of flexibility from traditional server hosting.

Long live cloud computing; but the name has to go. Did the term come from network diagrams where the Internet was represented as a cloud? I don’t think it’s a particularly clever analogy to consider your business assets living as disembodied entities “somewhere” in the networking cloud.

We’re fighting a losing battle if we believe we’re going to get the marketers to back off the internet now. But on the tech side let’s keep calling it what it is and try not to let the marketing buzz cloud our opinion of the technologies we use.

November 14th, 2010

Awhile ago I wrote about how I was using nginx to serve static files rather than letting the more memory-intensive Apache handle the load for files that don’t need its processing capabilities. The basic premise is that nginx is the web-facing daemon and handles static files directly from the file system, while shipping any other request off to Apache on another port.

What if Apache is on a different server entirely? Unless you have the luxury of an NAS device, your options are:

1. Maintain a copy of the site’s assets separate from the web site
There are two problems with this approach: maintainability, and synchronization. You’ll have to remember to deploy any content changes separately to the rest of the site, which is counter-intuitive and opens up your process to human error. User-generated content stays on the Apache server and would be inaccessible to nginx.

2. Use a replicating network file system like GlusterFS
Network-based replication systems are advanced and provide amazing redundancy. Any changes you make to one server can be replicated to the others very quickly, so any user generated content will be available to your content servers, and you only have to deploy your web site once.

The downside is that many NFS solutions are optimized for larger (>50Mb) filesizes. If you rely on your content server for small files (images, css, js), the read performance may decline when your traffic numbers increase. For high availability systems where it is critical for each server to have a full set of up-to-date files, this is probably the best solution.

3. Use an rsync-based solution
This is the method I’ve chosen to look at here. It’s important that my content server is updated as fast as possible, and I would like to know that when I perform disaster recovery or make backups of my web site the files will be reasonably up to date. If a single file takes a few seconds to appear on any of my servers, it isn’t a huge deal (I’m just running WordPress).

The Delivery Mechanism
rsync is fast and installed by default on most servers. Pair it with ssh and use password-less login keys, and you have an easy solution for script-able file replication. The only missing piece is the “trigger” – whenever the filesystem is updated, we need to run our update script in order to replicate to our content server.

Icrond is one possible solution – whenever a directory is updated icrond can run our update script. The problem here is that service does not act upon file updates recursively. fsniper is our solution.

The process flow should look like this.
1. When the content directory is updated (via site upload or user file upload), fsniper initiates our update script.
2. Update script connects to the content server via ssh, and issues an rsync command between our content directory and the server’s content directory.
3. Hourly (or whatever), initiate an rsync command from the content server to any web servers – this will keep all the nodes fairly up-to-date for backup and disaster recovery purposes.

August 6th, 2010

So I wiped my hard drive and installed Ubuntu. After struggling with the decision to switch from Windows for some time, I finally resolved to move.

So far the results have been very good. My system boots up and is ready to use in less than a minute, there is no lag loading and switching programs, and everything I need for my day-to-day programming is available much more readily than it was with the other operating system.

The most striking difference to me is the amount of disk space I now have available to me. With all of my software, work projects, and operating system overhead, Windows left 80Gb free from my 285Gb drive. With all of my projects, code libraries, files and operating system installed, Ubuntu uses just 6.7Gb, leaving 97% of the drive available for my use. I am blown away by how much less clutter I have now.

I haven’t tried to do very much with Mono yet; we’ll see how it works when I try making improvements to my SiteAssistant project. I’ve been reading about Mono’s Winforms capabilities and so far am impressed by the possibilities. We’ll see how well it works with my fairly simple project; with any luck I may have found a cross-platform .NET solution with this one. Maybe the Winforms explorations will be a good topic for a future post.

Not missing Office yet, either. My Quicken financial software has been running perfectly under Wine, and all of my files appear to have made the move intact. I still own licenses to all my software, so on those rare instances if I really need it I can install Windows with VirtualBox and fill up some of that hard drive space I’ve earned.

August 5th, 2010

Here’s how to install Git on Ubuntu 10.04


sudo apt-get install git-core

(The package name is git-core, not git)

August 2nd, 2010
Ubuntu-logo-unique-image
Creative Commons License photo credit: Jeffpro57

In the last number of weeks I have been seriously considering taking the plunge and wiping my Windows laptop clean in order to switch to Ubuntu as my primary machine. Although Windows 7 has gone a long way toward smoothing over the problems Vista brought, it isn’t perfect.

Windows isn’t a bad operating system, by any means. Like OS X and Ubuntu, it has its strengths and weaknesses. However, as a developer whose primary work involves web pages, I definitely see Windows as more of a barrier to efficient workflow. There are a few pieces of software that have kept me on Windows for awhile but which just don’t hold me back anymore:

1. Microsoft Office
I definitely qualify as a power user for this software. Yes, OpenOffice can do most of what MS Office can, but I will sorely miss the features that “most people” don’t use. However, the majority of my work doesn’t touch Office – in fact, it’s relatively rare that I will need Access, or Word, or even Excel. When I do use these programs, it only tends to be in support of a client who has used them inappropriately for some data storage.

I’ve long outgrown Outlook due to the amount of mail I keep; I don’t like to delete anything because true professionals are able to refer back to projects no matter how old. I don’t have a replacement mail program yet; but so far the Gmail interface has been more than sufficient.

2. Visual Studio
This software is giving me pause. If I switch over, I will be giving up my ability to truly work in the .NET world, which is where I have largely been for the past decade. Most of my workflow recently has been with the open source, PHP-driven web world and I’m not sure that I’m excited about going back to a pure Microsoft environment. That said, I want to be sure I’m not closing any doors.

Mono has made great strides in bringing the .NET platform, specifically C#, over to Mac and Unix, but the more Windows-centric database and GUI interfaces don’t translate over very well. I can always run a Windows Virtual Machine for the rare instances I will need to work on that platform, but it seems a bit counter-productive to keep around an environment that I don’t use for the sake of a few days each year.

3. Quicken
My other strong reason for staying with Windows has been my love for Quicken – the Mac version just doesn’t compare to the Windows version – and my inability to manage my finances on paper after years of dependence. Since so much of my data is tied into this software, any switch will involve either years of data entry or a major hit to my forecasting abilities.

Fortunately, Wine has come to the rescue – last night I was able to get a full install of Quicken on my test/Ubuntu machine with absolutely no problems! The fonts looked a little weird in the reports, but otherwise all of the functionality was there and working beautfully. It even looked like a Linux app – unbelievable!

Should I Stay or Should I Go?
So the big thing I’m weighing in my head right now is whether I can stand to give up the .NET programming I have been involved with for so long and switch to a full Linux environment. I still love my Windows environment and software, but it just doesn’t seem to make sense to keep it given my current open source focus.

January 24th, 2010

When running a default installation, WAMP Server’s Apache crashes when installing X-Cart. The error happens immediately after setting up the MySQL tables and is caused by the curl extension in PHP.

To resolve, click on the WAMP Server icon in the task tray, go to Apache -> Version -> Get More. Download one of the 2.0 series Apache servers and install it.

Repeat the process for PHP; download one of the 5.0 series PHP versions and install it.

Switch WAMP Server to the new versions and run the install again – it should work with no problems.