Always Get Better

Never stop looking for ways to improve

November 4th, 2011

Hopefully when you do web work, you’re not developing code on the same server your users are accessing. Most organizations have at least some kind of separation for their development and production code, but it’s possible to go much further. Separating environments lets you run several stages of continuous integration side by side, each serving a different purpose.

These normally break down as follows:

Development
Working code copy. Changes made by developers are deployed here so integration and features can be tested. This environment is rapidly updated and contains the most recent version of the application.

Quality Assurance (QA)
Not all companies will have this environment. It provides a less frequently changed version of the application which testers can perform checks against. This allows reporting on a common revision so developers know whether particular issues found by testers have already been corrected in the development code.

Staging/Release Candidate
This is the release candidate, and this environment is normally a mirror of the production environment. The staging area contains the “next” version of the application and is used for final stress testing and client/manager approvals before going live.

Production
This is the currently released version of the application, accessible to the client/end users. This version preferably does not change except during scheduled releases. There may be differences in the production environment but generally it should be the same as the staging environment.

Having separation between the different environments is not tricky, but managing your data environment can be. There are, of course, all kinds of ways to solve the problem.
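
One common way to keep the environments apart is to select settings by environment name at startup. The sketch below is only an illustration: the APP_ENV variable, the host names and the database names are all assumptions, not part of any particular framework.

```python
import os

# Hypothetical per-environment settings; hosts and names are placeholders.
SETTINGS = {
    "development": {"db_host": "localhost", "db_name": "app_dev", "debug": True},
    "qa": {"db_host": "qa-db.example.internal", "db_name": "app_qa", "debug": True},
    "staging": {"db_host": "staging-db.example.internal", "db_name": "app", "debug": False},
    "production": {"db_host": "prod-db.example.internal", "db_name": "app", "debug": False},
}


def load_settings(env_name=None):
    """Return the settings for the named environment.

    Falls back to the APP_ENV environment variable (an assumed name),
    then to "development" so a fresh checkout never points at live data.
    """
    name = (env_name or os.environ.get("APP_ENV", "development")).lower()
    if name not in SETTINGS:
        raise ValueError(f"Unknown environment: {name!r}")
    return SETTINGS[name]
```

Keeping staging and production identical except for the host mirrors the idea that staging should be a copy of production.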

April 24th, 2011

There is a lot of debate among newcomers to Azure about whether to use Azure SQL or file-based Table Storage for data persistence. Azure itself does not make the differences very clear, so let’s take a closer look at each option and try to understand the implications of each.

Azure SQL
Azure SQL is very similar to SQL Server Express. It is meant as a replacement for SQL Server Standard and Enterprise, but the feature set is not there yet. Reporting services, in particular, are in community technology preview (CTP) phase and at the time of writing are not ready for prime time.

Programmatically, Azure SQL is very similar to SQL in existing web applications; any ODBC classes you already wrote will work directly on Azure with no changes.

Size and cost are the major limitations with Azure SQL in its current incarnation. The largest supported database size is 50GB, which costs $499/month. Any database larger than 50GB would need to be split across multiple instances – this requires knowledge of sharding at the application level and is not easy surgery to perform.
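
To give a feel for what splitting data across instances means in application code, here is a minimal sketch of hash-based routing. The connection strings and the choice of a customer ID as the routing key are assumptions for illustration; a real scheme depends on your data.

```python
import hashlib

# Hypothetical connection strings, one per 50GB database instance.
SHARDS = [
    "Server=shard0.example;Database=app0",
    "Server=shard1.example;Database=app1",
    "Server=shard2.example;Database=app2",
]


def shard_for(customer_id: str) -> str:
    """Pick an instance from a stable hash of the key.

    hashlib is used instead of the built-in hash() because hash() is
    randomized per process for strings and would route inconsistently.
    """
    digest = hashlib.sha256(customer_id.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]
```

Note that adding an instance changes the modulus and remaps most keys, which is part of why this is not easy surgery.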

Table Storage
Table storage uses a key-value pair to retrieve data stored on Azure’s disk system, similar to the way memcache and MySQL work together to provide the requested data at fast speeds. Each storage container supports 100TB of data at incredibly cheap ($0.15/GB) rates.

Working with table storage means accessing it directly from your application, in a different way than you may be accustomed to with SQL. Going all-in ties you to the Azure platform – which is probably not a problem if you’re already developing for Azure as you will likely be trying to squeeze every ounce of performance out of all areas of the platform anyway.

Table storage does not support foreign key references, joins, or any of the other SQL features we usually use. It is up to the programmer to compensate by making wide de-normalized tables and building their lists in memory. If you’re already building clustered applications, this is not a new design pattern as developers typically want to cache data in this manner.
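
A rough sketch of what that looks like in practice: the entities below are plain dictionaries standing in for rows fetched from table storage (no Azure SDK calls are shown), and the customer and order fields are invented for illustration. PartitionKey and RowKey are the two key properties every table storage entity carries.

```python
from collections import defaultdict

# Stand-in for entities fetched from table storage. Each entity is flat;
# customer details are repeated on every order row instead of living in
# a separate table that would need a join.
order_entities = [
    {"PartitionKey": "cust-1", "RowKey": "order-001", "CustomerName": "Ada", "Total": 40.0},
    {"PartitionKey": "cust-1", "RowKey": "order-002", "CustomerName": "Ada", "Total": 15.5},
    {"PartitionKey": "cust-2", "RowKey": "order-003", "CustomerName": "Grace", "Total": 99.0},
]


def orders_by_customer(entities):
    """Group order rows by customer in memory, replacing a SQL JOIN/GROUP BY."""
    grouped = defaultdict(list)
    for entity in entities:
        grouped[entity["CustomerName"]].append(entity["RowKey"])
    return dict(grouped)
```

The trade-off is duplicated data in every row in exchange for never needing a second lookup.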

Besides the larger space limits, table storage affords us automatic failover. Microsoft’s SLA guarantees we will always be able to access the data, and this is accomplished by replicating everything across at least three nodes. Compared to self-managing replication and failover with the SQL service, this is a huge advantage as it keeps the complexity out of our hands.

Difference in Evolution
If Azure SQL seems somewhat stunted compared to Table Storage, it’s not an accident: it is a newcomer that was not planned during the original construction of the Azure platform. Microsoft carefully considered the high-availability trends used for application development and found that the NoSQL way would most easily scale to their platform. Developer outrage prompted the company to develop Azure SQL, and its service offerings are improving rapidly.

Depending on your storage needs, your course of action may be to store as much data as possible cheaply in Table Storage, and use SQL to index everything. Searching the SQL database will be incredibly fast, and can be done in parallel with any loads against persistent tables – everybody wins.
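
The hybrid approach might be sketched like this. To keep the example self-contained, an in-memory SQLite database stands in for Azure SQL and a dictionary stands in for Table Storage; the document fields and table name are made up.

```python
import sqlite3

# Stand-in for Table Storage: full records keyed by (partition, row).
table_store = {
    ("docs", "1"): {"title": "Deploy checklist", "body": "long text ..."},
    ("docs", "2"): {"title": "Release notes", "body": "long text ..."},
}

# Stand-in for Azure SQL: a small index holding only the searchable column
# plus the keys needed to find the full record.
index = sqlite3.connect(":memory:")
index.execute("CREATE TABLE doc_index (title TEXT, partition_key TEXT, row_key TEXT)")
index.executemany(
    "INSERT INTO doc_index VALUES (?, ?, ?)",
    [(rec["title"], pk, rk) for (pk, rk), rec in table_store.items()],
)


def search(term):
    """Find keys through the SQL index, then fetch full records by key."""
    rows = index.execute(
        "SELECT partition_key, row_key FROM doc_index "
        "WHERE title LIKE ? ORDER BY row_key",
        (f"%{term}%",),
    ).fetchall()
    return [table_store[(pk, rk)] for pk, rk in rows]
```

Only the small, searchable columns count against the SQL size limit; the bulky data stays in the cheap store.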

April 20th, 2011

Maintaining database schemas across development environments (especially in teams) and in production can be a real nightmare. Fortunately there are a number of solutions which make database management easier.

Migrations
This can be done manually or automatically. As database changes are made by developers, scripts are generated which can be run against a master database to bring it in line with the developer’s version. The most basic way to accomplish this is by writing a script manually, but frameworks like Django and Rails have built-in migration tools which manage this process. Rails in particular allows developers to move back and forth between snapshots of database schemas.
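
For the manual route, a migration runner can be quite small. The sketch below is not how Rails or Django implement theirs; it uses SQLite, an invented schema_version table, and made-up users/posts tables purely to show the forward/backward idea.

```python
import sqlite3

# Each migration has a version, an "up" script, and a "down" script.
# The table and column names are invented for illustration.
MIGRATIONS = [
    (1, "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)", "DROP TABLE users"),
    (2, "CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT)", "DROP TABLE posts"),
]


def current_version(conn):
    """Read the highest applied migration number (0 if none)."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER)")
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    return row[0] or 0


def migrate(conn, target):
    """Move the schema forward or backward until it reaches `target`."""
    version = current_version(conn)
    # Apply "up" scripts in order when the target is ahead of us.
    for number, up, _ in MIGRATIONS:
        if version < number <= target:
            conn.execute(up)
            conn.execute("INSERT INTO schema_version VALUES (?)", (number,))
    # Apply "down" scripts in reverse order when the target is behind us.
    for number, _, down in reversed(MIGRATIONS):
        if target < number <= version:
            conn.execute(down)
            conn.execute("DELETE FROM schema_version WHERE version = ?", (number,))
    conn.commit()
    return current_version(conn)
```

Calling migrate(conn, 2) on an empty database creates both tables; migrate(conn, 1) afterwards drops posts again.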

Evolutions
Evolutionary systems detect database schema changes against program code definitions. As of April 2011, Play Framework supports Evolutions.
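
The detection step can be illustrated with a toy comparison between what the code expects and what the database actually contains. This is an assumption-laden sketch using SQLite, not a description of Play's internals; the EXPECTED mapping and the users table are hypothetical.

```python
import sqlite3

# Columns the application code expects (hypothetical model definition).
EXPECTED = {"users": {"id", "name", "email"}}


def schema_drift(conn, expected=EXPECTED):
    """Return columns that the code declares but the database lacks, per table."""
    missing = {}
    for table, columns in expected.items():
        # PRAGMA table_info yields one row per column; index 1 is the name.
        actual = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
        gap = columns - actual
        if gap:
            missing[table] = sorted(gap)
    return missing
```

An empty result means the database matches the code; anything else is a change that still needs a script.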

Schema Versions
Microsoft SQL Server supports schema versions, wherein the underlying data remains the same, but multiple versions of the database schema rest on top and can be accessed simultaneously. This keeps older versions of the application or supporting clients working with the existing data set.

Keep Tracking…
Managing database changes can be a challenge for organizations of any size. The correct tool depends on a wide range of factors including your project size, number of team members, and release schedule.

What kind of tools and processes do you use to manage database changes?

April 7th, 2011

I am seeing an alarming trend on my beloved Facebook. Several of my friends (ok, I haven’t really kept tabs on them for years) have become “social media experts”. You can tell who is pushing at this stuff because they start tweeting dozens of times per hour, washing out all relevant content from your home feed. They start using @reply and #hashtags and linking to other “social media experts’” blog postings about the importance of Social Media and oh-goodness-your-company-doesn’t-understand-this-like-I-do-but-I-guarantee-results-for-you!

Honestly, this kind of behaviour has become textbook newbie behaviour.

The Buzz Bin has put together a list of ways to vet would-be social media experts.

I swear if I ever see ‘social media guru’ on someone’s resume, I will not hire them.

October 18th, 2010

Here’s a thought: is it possible for a computer programmer to work part-time? It’s a serious question because programming is not like other trades – once you build a house, for example, it’s built; there is no going back and re-pouring the foundation and moving to make it a better, more efficient house. When a programmer is given a problem to solve, they can continue to improve, optimize and re-factor their solution nearly indefinitely.

In reality, software engineering is probably much more similar to writing than to engineering, despite what the academics may say. In theory the projects we build are finite, have measurable outcomes and test cases, and at the end of it they either “work properly” or they don’t. They can be planned to the most minute of details. But then you have to program them – a skilled programmer can make the pieces fit together like clockwork while a poor programmer can ruin the outcome.

Seeing a project through to fruition takes patience, time, communication and commitment. It’s nice to say the documentation is complete and the design is frozen but that ignores the human aspect of the business. As time passes, the client will think of new ways to integrate the software we write into their business – suggesting changes that might improve the quality and value of our code. If we are properly aware of our limits and open to suggestion this need not be scope creep.

So – can it be done part time?

October 10th, 2009

When creating reports that are calculation-heavy, it’s tempting to create functions like ‘calculatePercent()’ or ‘calculateMedian()’ so the correct numbers are available on demand.

Sounds good and convenient, but what happens when you have 100 different calculations to make across 50,000 data records? Each report will take 5 million record reads to generate (100 calculations × 50,000 records). That could take a long time, especially if there are multiple reports being generated.

DRY – Don’t Repeat Yourself

Fortunately, the solution is straightforward. Rather than passing through those 50,000 records 100 times (once for each percentage needed), create an array for your values and fill ALL of them in one shot. Then, just have calculatePercent() and calculateMedian() read from that array. Sounds simple, and it is, as the pseudocode below shows:


for each record:
    for each value:
        valueList[value].append( record[value] )
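
For anyone who wants to run it, here is one possible Python rendering of the same idea. The sample records are invented, and calculate_percent() and calculate_median() are simply Python-style counterparts of the function names used above, with assumed signatures.

```python
from collections import defaultdict
from statistics import median

# Hypothetical report data: each record maps a field name to a number.
records = [
    {"sales": 10, "returns": 1},
    {"sales": 30, "returns": 3},
    {"sales": 20, "returns": 2},
]

# One pass over the records collects every field's values.
value_list = defaultdict(list)
for record in records:
    for field, value in record.items():
        value_list[field].append(value)


def calculate_median(field):
    """Median of one field, read from the precomputed lists."""
    return median(value_list[field])


def calculate_percent(field, value):
    """Share of a field's total that `value` represents, as a percentage."""
    total = sum(value_list[field])
    return 100.0 * value / total if total else 0.0
```

Every calculation after the first pass works on the collected lists, so the records themselves are only read once.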

July 12th, 2009

You can still see ghosts of the traditional “waterfall” method of software development in modern agile practices. The traditional model involved long periods of planning followed by development and extended maintenance periods – ideal for long-lived systems (I’m shuddering and thinking of COBOL apps running on mainframes).

With today’s rapidly-evolving platforms and businesses’ intolerance for risk, developers are called upon to deliver solutions faster on changing hardware and software. The focus has shifted toward quick development cycles and continuous integration.

At the basic level, the process is the same: plan, build, deploy.

A full understanding of the traditional “waterfall” software development lifecycle (SDLC) can help any programmer communicate more clearly with project managers or clients who are more inclined to understand projects in these terms.

Phase 1: Planning (Logic)

The planning phase of the SDLC involves communicating with the project’s key stakeholders in order to understand the project’s requirements. What are the goals of the project, and what are the expected costs?

At this stage there is no program code involved, nor is there discussion of any particular programming language or framework. The goal is to understand what the new software will do and why; not how.

The planning phase is also the time to assess other possible solutions that could meet the client’s requirements. This is where most analysts fail – rather than let their project stop at this point, many organizations will endeavour to push their own solution. Try to ignore the dollar signs – if you can meet your client’s needs by integrating an existing solution rather than developing something new, you will make them happier because they save money and end up with a product that is completely within their best interests.

Phase 2: Design

The design phase brings us closer to writing code, but we still haven’t opened our IDE yet. At this stage our job is to create the software on paper based upon the requirements we came up with in the previous step.

Many clients feel like they are in over their head when your design starts taking form, but you can’t let them off the hook. You need to take the time to explain your design and make sure the client is fully aware and in agreement with what you are doing. Teach them how to communicate with you; learn the terms they use so you can speak their language.

Phase 3: Implementation

When we start programming, this is the part we envisioned ourselves doing. In reality, this is the part we do the least (assuming we did our job right in phases 1 and 2). We’re talking about getting down and dirty with raw code.

Phase 4: Maintenance

This is the most expensive part of the project – keeping the software running. If you’re lucky you will be gone after phase 3 – if your successor (the maintenance programmer) is lucky you will have done a thorough job of your documentation in all of the previous phases.

Maintenance deserves special thought because it occurs over time, so it gets absorbed as an ongoing cost to the business. It can be hard to justify spending a lot up-front to develop a new system when the existing one “already works, and costs less”. Always weigh the ongoing costs of developing and supporting features for an aging system versus performance gains and optimizations possible with new software.

Sometimes it makes sense to keep existing software in operation; sometimes businesses hold onto decaying systems far too long. There will always be a point where the newer system costs more to operate than the old would have cost for the same stretch of time; however, the total cost of ownership – satisfaction, new features, bug fixes – needs to be considered, not just the cost of implementing the new system.