Always Get Better

Posts Tagged ‘planning’

Case-insensitive string comparison in PHP

Sunday, September 29th, 2013

This is a common situation: I needed to compare two strings with unknown capitalization – “country” versus “Country”. Since these words should be considered equal even though the second “Country” has a capital “C”, we can’t do a straight comparison on the two – we need to do a case-insensitive comparison.

Option 1: strcasecmp
Whenever possible developers should try to use built-in functions which are compiled code and (in general) run much faster than anything you could write. PHP has strcasecmp to case-insentively compare two strings. Sounds like a perfect match!


if ( strcasecmp('country','Country') != 0 ) {
// We have a match!
}

Option 2: strtolower
Always read the documentation, but draw your own conclusions. One commentator in the PHP documentation suggested developers never use strcasecmp, and use strtolower with regular equality like this:

if ( strtolower('country') === strtolower('Country') ) {
// We have a match
}

Test the Speed
Both methods accomplish the same thing, but do we really want to skip using strcasecmp()? Which is the better option?
I wrote this short script to run each option for 10 seconds and see which is faster:

And the results:

strtolower: Done 18440869 cycles in 10 seconds
strcasecmp: Done 22187773 cycles in 10 seconds

So strcasecmp has the edge speed-wise, but not so huge that I would care to favour one over the other.

Apparantly strcasecmp does not support multi-byte (e.g. Unicode) characters, but I haven’t tested this. Presumably that would give strtolower an advantage over projects dealing with non-English input, however that is not the case at all in my particular use case so I did not explore this avenue any further. I also didn’t try against non-ascii characters, such as latin accents; including those would be an improvement on this test.

Multiple Development Environments

Friday, November 4th, 2011

Hopefully when you do web work, you’re not developing code on the same server your users are accessing. Most organizations have at least some kind of separation for their development and production code, but it’s possible to go far further. Separating environments allows you to achieve multiple threads of continuous integration for all kinds of cool.

These normally break down as follows:

Development
Working code copy. Changes made by developers are deployed here so integration and features can be tested. This environment is rapidly updated and contains the most recent version of the application.

Quality Assurance (QA)
Not all companies will have this. Environment for quality assurance; this provides a less frequently changed version of the application which testers can perform checks against. This allows reporting on a common revision so developers know whether particular issues found by testers has already been corrected in the development code.

Staging/Release Candidate
This is the release candidate, and this environment is normally a mirror of the production environment. The staging area contains the “next” version of the application and is used for final stress testing and client/manager approvals before going live.

Production
This is the currently released version of the application, accessible to the client/end users. This version preferably does not change except for during scheduled releases. There may be differences in the production environment but generally it should be the same as the staging environment.

Having separation between the different environments is not tricky, but managing your data environment can be. There are, of course, all kinds of ways to solve the problem.

Azure Table Storage vs Azure SQL

Sunday, April 24th, 2011

There is a lot of debate among newcomers to Azure whether to use Azure SQL or file-based Table Storage for data persistence. Azure itself does not make the differences very clear, so let’s take a closer look at each option and try to understand the implications of each.

Azure SQL
Azure SQL is very similar to SQL Server Express. It is meant as a replacement for SQL Server Standard and Enterprise, but the feature set is not there yet. Reporting services, in particular, are in community technology preview (CTP) phase and at the time of writing are not ready for prime time.

Programmatically, Azure SQL is very similar to SQL in existing web applications; any ODBC classes you already wrote will work directly on Azure with no changes.

Size and cost are the major limitations with Azure SQL in its current incarnation. The largest supported data size is 50GB which runs for $499/month. Any databases larger than 50GB would need to be split across multiple instances – this requires knowledge of sharing at the application level and is not easy surgery to perform.

Table Storage
Table storage uses a key-value pair to retrieve data stored on Azure’s disk system, similar to the way memcache and MySQL work together to provide the requested data at fast speeds. Each storage container supports 100TB of data at incredibly cheap ($0.15/GB) rates.

Working with table storage involves accessing them directly from your application differently than you may be accustomed to with SQL. Going all-in ties you to the Azure platform – which is probably not a problem if you’re already developing for Azure as you will likely be trying to squeeze every ounce of performance out of all areas of the platform anyway.

Table storage does not support foreign key references, joins, or any of the other SQL-stuff we usually use. It is up to the programmer to compensate by making wide de-normalized tables and build their lists in memory. If you’re already building clustered applications, this is not a new design pattern as developers typically want to cache data in this manner.

Besides the larger space limits, table storage affords us automatic failover. Microsoft’s SLA guarantees we will always be able to access the data, and this is accomplished by replicating everything across at least three nodes. Compared to self-managing replication and failover with the SQL service, this is a huge advantage as it keeps the complexity out of our hands.

Difference in Evolution
If Azure SQL seems somewhat stunted compared to Table Storage, it’s not an accident: it is a newcomer who was not planned during the original construction of the Azure platform. Microsoft carefully considered the high-availabilty trends used for application development and found that the NoSQL way would most easily scale to their platform. Developer outrage prompted the company to develop Azure SQL, and its service offerings are improving rapidly.

Depending on your storage needs, your course of action may be to store as much data for cheap in Table Storage, and use SQL to index everything. Searching the SQL database will be incredibly fast, and can be done in parallel with any loads against persistent tables – everybody wins.

Database Migrations

Wednesday, April 20th, 2011

Maintaining database schemas across development environments (especially in teams) and in production can be a real nightmare. Fortunately there are a number of solutions which make database management easier.

Migrations
This can be done manually or automatically. As database changes are made by developers, scripts are generated which can be run against a master database to bring it in line with the developer’s version. The most basic way to accomplish this is by writing a script manually, but frameworks like Django and Rails have built-in migration tools which manage this process. Rails in particular allows developers to move back and forth between snapshots of database schemas.

Evolutions
Evolutionary systems detect database schema changes against program code definitions. As of April 2011, Play Framework supports Evolutions.

Schema Versions
Microsoft SQL Server supports schema versions; wherein the underlying data remains the same, but multiple versions of the database schema rest on top and can be accessed simultaneously. This keeps older versions of the application or supporting clients working with the existing data set.

Keep Tracking…
Managing database changes can be a challenge for organizations of any size. The correct tool depends on a wide range of factors including your project size, number of team members, release schedule.

What kind of tools and processes do you use to manage database changes?

How to Hire a Social Media Expert

Thursday, April 7th, 2011

I am seeing an alarming trend on my beloved Facebook. Several of my friends (ok, I haven’t really kept tabs on them for years) have become “social media experts”. You can tell who is pushing at this stuff because they start tweeting dozens of times per hour, washing out all relevant contact from your home feed. They start using @reply and #hashtags and linking to other “social media experts” blog postings about the importance of Social Media and oh-goodness-your-company-doesn’t-understand-this-like-I-do-but-I-guarantee-results-for-you!

Honestly, this kind of behaviour has become textbook newbie behaviour.

The Buzz Bin has put together a list of ways to vet would-be social media experts.

I swear if I ever see ‘social media guru’ on someone’s resume, I will not hire them.

Part-Time Programming Work

Monday, October 18th, 2010

Here’s a thought: is it possible for a computer programmer to work part-time? It’s a serious question because programming is not like other trades – once you build a house, for example, it’s built; there is no going back and re-pouring the foundation and moving to make it a better, more efficient house. When a programmer is given a problem to solve, they can continue to improve, optimize and re-factor their solution nearly indefinitely.

In reality, software engineering is probably much more similar to writing than to engineering, despite what the academics may say. In theory the projects we build are finite, have measurable outcomes and test cases, and at the end of it the either “work properly” or they don’t. They can be planned to the most minute of details. But then you have to program them – a skilled programmer can make the pieces fit together like clockwork while a poor programmer can ruin the outcome.

Seeing a project through to fruition takes patience, time, communication and commitment. It’s nice to say the documentation is complete and the design is frozen but that ignores the human aspect of the business. As time passes, the client will think of new ways to integrate the software we write into their business – suggesting changes that might improve the quality and value of our code. If we are properly aware of our limits and open to suggestion this need not be scope creep.

So – can it be done part time?

Speeding up Report Calculations

Saturday, October 10th, 2009
Get Yourself Out of Debt
Creative Commons License photo credit: faungg

When creating reports that are calculation-heavy, it’s tempting to create functions like ‘calculatePercent()’ or ‘calculatedMedian()’ so the correct numbers are available on demand.

Sounds good and convenient, but what happens when you have 100 different calculations to make across 50,000 data records? Each report will take 5 million passes to generate. That could take a long time especially if there are multiple reports being generated.

DRY – Don’t Repeat Yourself

Fortunately, the solution is straightforward. Rather than passing through those 50,000 records 100 times (once for each percentage needed), create an array for your values and calculate ALL of them in one shot. Then, just have calculatePercent() and calculateMedian() call from that array. Sounds simple, and it is as the pseudocode below shows:


for each record:
for each value:
valueList[value].append( record[value] )