Always Get Better

Archive for April, 2011

Finding Your Ubuntu Version

Friday, April 29th, 2011

If you are using an Ubuntu computer and want to know the version number (apart from reading it off the login screen), run:

> cat /etc/lsb-release
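For scripts, the same file can be parsed directly. The sample contents below are inlined so the snippet is self-contained (they are what an Ubuntu 10.10 box would report; on a real system you would read the file itself, or use `lsb_release -a`, which reports the same information):

```shell
# Parse the release number out of /etc/lsb-release.
# Sample file contents are inlined here for illustration.
sample='DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=10.10
DISTRIB_CODENAME=maverick
DISTRIB_DESCRIPTION="Ubuntu 10.10"'

version=$(printf '%s\n' "$sample" | grep '^DISTRIB_RELEASE=' | cut -d= -f2)
echo "$version"   # prints 10.10
```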

Pint-Sized Mobile Devices

Thursday, April 28th, 2011

Lately I’m fascinated by the Dell Streak and am thinking it would be fun to get one. But really, what is the use case? Do tablets have a place in everyday life?

The tablet computer is an interesting device because it fills that no-man’s land between the phone and the computer, and has so much potential as a conveyor of meta experience to users of the other platforms.

The problem so far is no manufacturer has provided compelling reasons why we should be buying tablets and adding them into our routines. It’s almost as if everyone is blindly following Apple’s lead regardless of whether doing so makes sense. Now we have a flood of iPad-like devices, where the original iPad is, let’s be honest, interesting but not really useful.

My prediction: There is going to be a killer app for tablets and it is going to be immediately obvious to everyone why we need to use this medium. In the meantime, developers with mobile programming skill are going to be writing their own ticket – so learn how to build apps for Windows Phone, Android, iOS, or (to a lesser extent) WebOS.

It’s not a four letter word

Wednesday, April 27th, 2011

Sometimes it seems we have an unnatural fear of the word “no”. This might be rooted in childhood memory: when I tell my son “no”, I am preventing him from doing something that he wants to do. As a responsible adult, I know the limits I place upon him are for his own protection. As a toddler, he sees “no” as a limiter, something that ends a course of inquiry.

I think we have a hard time letting go of our memories. So when a friend, a boss, or a client asks us for an unreasonable favour, we are inclined to bend over backwards to deliver. Maybe we’re afraid that if we say no, our value will somehow be diminished.

I see “no” as an invitation. Now we know where the limit is, so we can work to get to a comfortable “yes”.

I can’t help feeling that I once read an article with a similar message to this; if anyone knows what that is, do send me a link!

Protect Your SSH Server with RSA Keys

Tuesday, April 26th, 2011

If it’s possible to log into your web server over SSH with a username and password, you may not be properly secured. Even if root access is impossible, a username and password combination can be broken with brute force; once your server has been compromised it’s only a matter of time before a rootkit installation attempt is successful.

Even password-less RSA keys provide better protection than a password because they are long encrypted strings that cannot be guessed from a dictionary. Although a brute force attack against an RSA key is still possible, it requires a much more sophisticated attacker and takes many more attempts. As encryption technologies improve, the length of keys needs to increase; but even a key with no password attached is far more secure than a human-typed password.
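As a sketch of the switch to key-only logins (the hostname is a placeholder, and your distribution's sshd_config location may differ):

```shell
# Generate an RSA key pair on your local machine. Adding a passphrase is
# still recommended; a password-less key only protects against remote guessing.
ssh-keygen -t rsa -b 2048 -f ~/.ssh/id_rsa

# Copy the public key into the server's authorized_keys file.
ssh-copy-id user@example.com

# Then, on the server, disable password logins in /etc/ssh/sshd_config:
#   PasswordAuthentication no
# ...and reload sshd so the change takes effect.
```

Test key-based login from a second terminal before disabling passwords, so you don't lock yourself out.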

Ensure SLA with Multiple Web Role Instances on Windows Azure

Monday, April 25th, 2011

As I dig deeper into the Windows Azure platform, I am becoming more and more impressed by the potential it offers. Microsoft has put a lot of money and resources into developing its infrastructure and has done an incredible job at creating an interesting and powerful architecture. If anything, their major fault is in advertising – with so many technologies and ever-changing names, it’s hard for a newcomer to wrap their mind around the services. It is pricier than some options, so it’s hard to really experiment in much depth, although they do offer a free option for developers to dig in and try the services.

Ensuring 99.95% SLA
One potential ‘gotcha’ is the SLA: it differs between the various services but hovers around the 99.9% range. Most account properties include automatic scaling and failover, but Web Roles and Worker Roles do not – in order to qualify for the 99.95% SLA you need to deploy your solution to two or more instances. By doing that, you ensure that your application is load balanced across multiple fault domains and will therefore remain available even during a failure.

Of course, this doubles the cost – each compute instance is billed separately.

Fault Domain vs Upgrade Domain
Azure’s logical server space is segmented into fault domains and upgrade domains. Translated into physical terms, a fault domain refers to a single rack containing multiple computers, while an upgrade domain is one or more computers that receive software upgrades at the same time. Multiple upgrade domains may exist within any fault domain.

For our application to remain online, we need to consider where it is hosted. Although the fault and upgrade domains are largely abstracted away from developers, the SLA guarantees that two or more instances are split across two or more upgrade and fault domains. At a minimum, this means that if you have two Web Role instances, they will exist on different computers in different racks. Beyond that, Azure decides where instances are initialized and hosted.

High Availability Concerns
Now let’s consider our application which has been split across two instances. We are guaranteed to have 99.95% network availability to our application, but there are more factors to consider.

When one of the instances goes offline – due to a server fault or OS upgrade – Azure automatically starts it up on a working server. As long as the remaining instance is able to handle all of your application’s needs, this process is transparent and results in no performance degradation.

What happens if your application is spread across two instances, each under 70% load? The response times might be good, but if one of the instances goes offline even temporarily, the remaining healthy instance will be left trying to handle 140% of the overall traffic. If you want your business to stay online, you need to plan for these periods of downtime.
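A quick back-of-the-envelope check makes the point (the instance counts and load figures here are just illustrative):

```shell
# If n instances each run at the given load fraction and one fails,
# each survivor inherits n * load / (n - 1) of a single instance's capacity.
awk 'BEGIN {
    n = 2; load = 0.70
    printf "two instances at 70%%: survivor carries %.0f%%\n", 100 * n * load / (n - 1)
    n = 3; load = 0.60
    printf "three instances at 60%%: each survivor carries %.0f%%\n", 100 * n * load / (n - 1)
}'
```

Keeping each instance under 50% load – or running three or more instances – leaves enough headroom to absorb the loss of one.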

This is where the abstraction hurts – we can’t control the number of fault domains in our cluster, but we can control the number of upgrade domains (essentially forcing each instance to load up on a new machine). The worst-case scenario is that half of your cluster dies when an Azure fault domain goes down – something that should never happen, but anything can happen online.

Note, however, that this kind of planning is what’s needed in general cases when your traffic might suddenly jump due to customer interest. We’re not just mitigating against host failure, but also against reputation damage from inadequate planning.

Keep Sessions in Mind
One last thing to mention: when we split our application into multiple instances, we need to consider how we handle session data. Azure uses round-robin load balancing, meaning the end user could potentially hit a different instance each time they access a resource on your site. If you are using traditional session handlers, session data is not shared between the instances, so user data is not guaranteed to be available on each request.

This constraint is actually a huge benefit of cloud computing, and something I’ve praised other frameworks for supporting seamlessly. Our web application should be designed to avoid sharing state between instances where possible. If session data is needed, we can use AppFabric’s session cache providers for fast network-based session management.
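As a sketch of what that looks like in an ASP.NET web.config (the provider type and assembly names here are from the AppFabric Caching SDK and may differ between versions, and "MyCache" is a placeholder cache name):

```
<!-- web.config fragment: swap the default in-process session store for the
     AppFabric distributed cache so any instance can serve any request.
     Type, assembly, and cache names are illustrative. -->
<sessionState mode="Custom" customProvider="AppFabricCacheSessionProvider">
  <providers>
    <add name="AppFabricCacheSessionProvider"
         type="Microsoft.Web.DistributedCache.DistributedCacheSessionStateStoreProvider, Microsoft.Web.DistributedCache"
         cacheName="MyCache" />
  </providers>
</sessionState>
```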

Azure Table Storage vs Azure SQL

Sunday, April 24th, 2011

There is a lot of debate among newcomers to Azure whether to use Azure SQL or file-based Table Storage for data persistence. Azure itself does not make the differences very clear, so let’s take a closer look at each option and try to understand the implications of each.

Azure SQL
Azure SQL is very similar to SQL Server Express. It is meant as a replacement for SQL Server Standard and Enterprise, but the feature set is not there yet. Reporting services, in particular, are in community technology preview (CTP) phase and at the time of writing are not ready for prime time.

Programmatically, Azure SQL is very similar to SQL in existing web applications; any ODBC classes you already wrote will work directly on Azure with no changes.

Size and cost are the major limitations of Azure SQL in its current incarnation. The largest supported database size is 50GB, which runs for $499/month. Any database larger than 50GB would need to be split across multiple instances – this requires knowledge of sharding at the application level and is not easy surgery to perform.

Table Storage
Table storage uses key-value pairs to retrieve data stored on Azure’s disk system, similar to the way memcache and MySQL work together to serve requested data at fast speeds. Each storage container supports 100TB of data at incredibly cheap ($0.15/GB) rates.

Working with table storage means accessing it directly from your application, in a different way than you may be accustomed to with SQL. Going all-in ties you to the Azure platform – which is probably not a problem if you’re already developing for Azure, as you will likely be trying to squeeze every ounce of performance out of all areas of the platform anyway.

Table storage does not support foreign key references, joins, or any of the other SQL features we usually rely on. It is up to the programmer to compensate by building wide, de-normalized tables and assembling lists in memory. If you’re already building clustered applications this is not a new design pattern, as developers typically want to cache data in this manner.

Besides the larger space limits, table storage affords us automatic failover. Microsoft’s SLA guarantees we will always be able to access the data, and this is accomplished by replicating everything across at least three nodes. Compared to self-managing replication and failover with the SQL service, this is a huge advantage as it keeps the complexity out of our hands.

Difference in Evolution
If Azure SQL seems somewhat stunted compared to Table Storage, it’s not an accident: it is a newcomer that was not planned during the original construction of the Azure platform. Microsoft carefully considered the high-availability trends in application development and found that the NoSQL way would most easily scale to their platform. Developer outrage prompted the company to develop Azure SQL, and its service offerings are improving rapidly.

Depending on your storage needs, your course of action may be to store as much data for cheap in Table Storage, and use SQL to index everything. Searching the SQL database will be incredibly fast, and can be done in parallel with any loads against persistent tables – everybody wins.

Surviving Cloud Failures

Saturday, April 23rd, 2011
Creative Commons License photo credit: Don Fulano

Amazon is in the news today for the failure of its Elastic Block Storage (EBS) service, which resulted in loss of service and/or extreme latency for hundreds of sites, including some of its largest customers like Foursquare and reddit. AWS has been widely regarded as the most stable and overall leader among the cloud providers, so it was a great shock to many observers that it could suffer such a large failure.

To me, the failure itself is not surprising; what is surprising is that it hadn’t happened before now. It underscores my message that cloud computing is not magical but is in fact an abstraction over very real hardware. There are bound to be flaws and issues just as with “real” hosting options; the difference is that the end customer has less control over the hardware, hosting and networking environment.

Not every business can afford the overhead of a large dedicated solution, so what to do?

Spread the Load
The key is redundancy. Start by spreading your content across the internet rather than relying on a single server to serve all of your visitors’ needs. Content delivery networks (CDNs) will reduce the incoming load on the server and help it stay online.

How can we tell if a website is offloading the right amount of content? Perform regular speed testing and identify problem areas using tools like YSlow.

Redundancy! Eliminate Single Points of Failure
Whenever you have a single system servicing part of your application, you expose the entire application to failure.

For example, suppose you have four Apache servers and a load balancer sending equal traffic to each. If one of the Apache servers fails, the other three are able to compensate for the loss with no downtime for your visitors. But what happens if the load balancer fails? Even though all four web servers are in fine working order, your site is knocked offline.
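One common fix for the load balancer itself (a sketch; IP addresses and the interface name are placeholders) is to run a second balancer and float a virtual IP between the pair with keepalived, so the spare takes over automatically:

```
# /etc/keepalived/keepalived.conf on the primary load balancer.
# The backup machine runs the same config with "state BACKUP" and a lower priority.
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    virtual_ipaddress {
        10.0.0.100    # clients point at this floating IP, not at either machine
    }
}
```

If the primary stops sending VRRP advertisements, the backup claims 10.0.0.100 within seconds, with no administrator involved.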

Some systems are difficult to cluster: replication schemes in the various SQL servers are a huge drain on performance – newer solutions like MySQL Cluster or DrizzleDB aim to solve this problem, but at extra expense in terms of configuration and application design.

The key to successful redundancy is scripting your software in such a way that failures can be recovered from quickly and automatically. Having a hot spare in the group isn’t useful if you need to reach an administrator at 4am to activate it – by that point you’ve already lost your overseas customers for the day.

Twilio has an excellent summary of the engineering process that goes into creating a scalable cloud-ready infrastructure.

Avoid the Cloud? Never
Despite some public failures, cloud computing has not suffered any kind of lasting blow. Large organizations will always want their own private non-cloud hosting, and small sites will always be looking for an inexpensive VPS. The middle tier serviced by the cloud will continue to see cost savings that greatly outweigh any physical hosting options available at that level.

Because of the low server cost, cloud computing gives smart customers the freedom to build the necessary redundancy without breaking the bank. This pays off big time when catastrophic failures happen, but there are longer-term benefits too: improved overall response times for end users even when the hosting is working well.