Always Get Better

Posts Tagged ‘performance’

Case-insensitive string comparison in PHP

Sunday, September 29th, 2013

This is a common situation: I needed to compare two strings with unknown capitalization – “country” versus “Country”. Since these words should be considered equal even though the second “Country” has a capital “C”, we can’t do a straight comparison on the two – we need to do a case-insensitive comparison.

Option 1: strcasecmp
Whenever possible, developers should try to use built-in functions, which are compiled code and (in general) run much faster than anything you could write yourself. PHP has strcasecmp to case-insensitively compare two strings. Sounds like a perfect match!


if ( strcasecmp('country', 'Country') == 0 ) {
    // We have a match! (strcasecmp returns 0 when the strings are equal)
}

Option 2: strtolower
Always read the documentation, but draw your own conclusions. One commenter in the PHP documentation suggested developers should never use strcasecmp, and should instead use strtolower with a regular equality check, like this:

if ( strtolower('country') === strtolower('Country') ) {
    // We have a match
}

Test the Speed
Both methods accomplish the same thing, but do we really want to skip using strcasecmp()? Which is the better option?
I wrote this short script to run each option for 10 seconds and see which is faster:
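A minimal sketch of that script (the 10-second budget and the two comparisons come from the results below; the exact loop structure is an assumption):

// Run a callable repeatedly for a fixed number of seconds and report
// how many iterations completed in that time.
function benchmark($label, $fn, $seconds = 10) {
    $cycles = 0;
    $end = microtime(true) + $seconds;
    while (microtime(true) < $end) {
        $fn();
        $cycles++;
    }
    echo "$label: Done $cycles cycles in $seconds seconds\n";
}

benchmark('strtolower', function () {
    return strtolower('country') === strtolower('Country');
});
benchmark('strcasecmp', function () {
    return strcasecmp('country', 'Country') === 0;
});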

And the results:

strtolower: Done 18440869 cycles in 10 seconds
strcasecmp: Done 22187773 cycles in 10 seconds

So strcasecmp has the edge speed-wise, but not by a margin large enough that I would favour one over the other.

Apparently strcasecmp does not support multi-byte (e.g. Unicode) characters, but I haven’t tested this. Presumably that would give strtolower an advantage for projects dealing with non-English input; however, my particular use case involves only English, so I did not explore this avenue any further. I also didn’t test against non-ASCII characters, such as Latin accents; including those would be an improvement on this test.
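For what it’s worth, the multibyte-safe route is mb_strtolower rather than either of the functions above – a minimal sketch, assuming the mbstring extension is installed:

// Case-insensitive comparison of UTF-8 strings (requires ext/mbstring).
if ( mb_strtolower('Café', 'UTF-8') === mb_strtolower('CAFÉ', 'UTF-8') ) {
    // We have a match, accents and all
}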

Observations From Mobile Development

Monday, September 10th, 2012

With just a single mobile release under my belt now, I’m hardly what you might call an expert on the subject. What I can say for certain is that the past year has been an eye-opener in terms of understanding the capabilities and limitations of mobile platforms in general.

So having gone from “reading about” to “doing” mobile development, these are some of the “aha” moments I’ve had:

Design for Mobile First
The biggest revelation: Even if you’re not setting out to develop a mobile application, your design absolutely must start from the point of view of a handheld.

Think about it for a second – how much information can you fit on a 3.5″ screen? Not much! So when you design for that form factor you have to go through a savage trimming exercise – everything from type to layout to navigation must communicate its intent in as tiny a space as possible. In other words, there’s no avoiding the need to communicate your message effectively.

When you build all of your applications this way, mobile or not, it’s hard not to come up with a user experience that has laser focus and goes straight for the mark. Once you have that bare-minimum experience, you can augment it with additional navigation and information (including advertising) for larger form factors like desktops and tablets.

Don’t Fret the Framework
Just as in web development, frameworks are powerful tools that come with sometimes painful trade-offs. Especially when you’re getting started, don’t worry about which framework you choose – just pick the one that caters most to the style of development you’re already familiar with and jump in. If that means Adobe Air for ActionScript, cool. If it means PhoneGap for JavaScript, great.

Most of the new devices coming onto the market have more than enough memory and processing horsepower to handle the massive extra overhead incurred through cross-platform development tools. If you’re starting a new project or iterating a prototype, don’t hesitate to jump on board a tool that will get you to a product faster. This is one of those areas where the short term gain is worth the longer term pain…

Native Wins
We’ve known since the 80s, when developers had to release to a boatload of PC platforms – IBM, Commodore, Amiga, Tandy, etc. – that software written directly for a particular platform outperforms and outsells similar software written for a generic platform and ported across to others. The same holds true now, even though our cross-platform tools are far more advanced and our globally-usable apps are much higher in quality than what we could produce 30 years ago.

Some of the compelling reasons why you would want to take on the expense of building, testing and maintaining your app natively:

  • The UI of your application will integrate seamlessly into the underlying operating system – iOS widgets look like they belong, and Android layouts behave consistently with other applications
  • Raw speed – you don’t need to go through someone else’s API shim to get at the underlying operating system features, and you don’t have to bridge into custom native code since all the code is already native; all CPU cycles are devoted to your application, resulting in much higher performance, particularly for graphics-intensive applications
  • Operating system features – each mobile operating system has its own paradigm and set of best practices which cross-platform tools gloss over to give you, the developer, a consistent experience. So your application misses the subtleties of the user’s hardware experience – for example, Android uses Activities as its interaction model, but the Adobe Air framework squashes that model rather than encouraging developers to program in an Activity-centric way

In other words, cross-platform tools exist in order to give developers the best experience, not to give the user the best experience. Your customer doesn’t care if your app is available on Android, Windows, iPhone, Playbook and WebOS if all they have is an iPhone.

I believe cross-platform tools are the best way to get your project off the ground and usable fast, but right from the beginning you need to be thinking about converting your application to native code in order to optimize the experience for your customers.

Market Fragmentation
I bought an Android phone and have been enjoying developing for it. But I don’t think I would enjoy developing for Android at large, because of the massive number of devices and form factors I would need to support. This is where Apple has an edge – although the learning curve for Objective-C is steeper, once I have an iOS application I know it will run on an iPod Touch, an iPhone or an iPad. Not only that, but my guess is people are more likely to spend small amounts of money on app purchases, since they’ve been trained to do so by years of iTunes.

Backward Compatibility
Moving to the web was a huge advantage in terms of software support, because if you ever found a bug in your program you could patch it and release the fix without requiring any action from your users. No one has to upgrade a web page.

This isn’t true of mobile applications (mobile web apps aside) – once someone downloads your app, they may be slow to upgrade to newer versions, or they may never upgrade at all. This means any bugs that get released with your bundle are going to remain in the wild forever. If you have any kind of server-based presence, your server code needs to handle requests from all of those old app versions – so you need to make sure you get it right, and have good filtering and upgrade mechanisms in place.
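For example, a crude server-side version gate might look like this – the header name and version scheme are made up for illustration:

// Hypothetical version gate: refuse requests from app builds that are too
// old to support, prompting the client to upgrade. The X-App-Version
// header is an assumed convention, not a standard.
$minSupported  = '1.2.0';
$clientVersion = isset($_SERVER['HTTP_X_APP_VERSION'])
    ? $_SERVER['HTTP_X_APP_VERSION']
    : '0.0.0';

if (version_compare($clientVersion, $minSupported, '<')) {
    header('Content-Type: application/json', true, 426); // 426 Upgrade Required
    echo json_encode(array('error' => 'Please update the app to continue.'));
    exit;
}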

Choosing a Platform
One thing that held me back from diving into mobile development was hesitation over where to start. This is totally my fault – instead of just programming for WebOS when I had the Palm Pre, I thought it would be better and more accessible to use a more open JavaScript toolset so I could deploy my app to other phones. But really, what would have been the point? I only had a Palm Pre to run my software on, and I definitely wasn’t going to buy new hardware to test other versions. Instead of getting locked in analysis paralysis, I should have just started programming for the Pre right away and transferred those skills to a more mainstream platform later.

So if you don’t have a smartphone, go get one – it will change your life, or at least the way you interact with your phone. Then start building apps for it. That’s all it takes to get into the game. Don’t wait another second.

log4php Performance

Wednesday, February 29th, 2012

We can take for granted that whenever we introduce a library or framework to our application, we incur an overhead cost. The cost varies depending on what we’re trying to do, but we generally accept that the lost performance is worth it for the increased maintainability, functionality or ease of use.

For many teams, logging is something that gets thrown in the mix at the last minute rather than thought through all the way. That’s a shame, because a well-implemented logging solution can make the difference between understanding what is going on in your system and having to guess by looking at the code. It needs to be lightweight enough that overall performance is not affected, but feature-rich enough that important issues are surfaced properly.

Java programmers have had log4j for a long time, and log4net is a similarly mature solution in the .NET world. I’ve been watching log4php for a while, and now that it has escaped the Apache Incubator it is impressively full-featured and fast. But how much do all its features cost?

Benchmarks
I’ll be looking into different options as I go, but let’s consider a very basic case – you append all of your events to a text file. I’ve created a configuration that ignores all ‘trace’ and ‘debug’ events so only events with a severity of ‘INFO’ or above are covered.
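A configuration along those lines, written as a log4php PHP-array config (the appender name and file path are placeholders), might look like this:

// Log only INFO and above to a flat file; TRACE and DEBUG are ignored.
Logger::configure(array(
    'rootLogger' => array(
        'level'     => 'INFO',
        'appenders' => array('file'),
    ),
    'appenders' => array(
        'file' => array(
            'class'  => 'LoggerAppenderFile',
            'params' => array('file' => 'app.log'),
        ),
    ),
));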

In 5 seconds, this is what I saw:

Test                       Iterations
BASIC (direct PHP)             45,421
INFO STATIC                    45,383
INFO DYNAMIC                   41,847
INFO STATIC (no check)         51,801
INFO DYNAMIC (no check)        47,756
TRACE STATIC                  310,255
TRACE DYNAMIC                 213,554
TRACE STATIC (no check)       271,043
TRACE DYNAMIC (no check)      196,653

Tests
What is all that? There are two ways to initialize the logger class – statically, meaning declared once and used again and again; and dynamically, meaning declared each time. With log4X, we typically perform a log level check first, for example isTraceEnabled() to determine whether to proceed with the actual logging work.
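In code, those variations look roughly like this (the logger name is a placeholder; isTraceEnabled() is the check described above):

// Static: fetch the logger once and reuse it for every message.
$logger = Logger::getLogger('bench');
$logger->info('static logger call');

// Dynamic: fetch the logger each time it is needed.
Logger::getLogger('bench')->info('dynamic logger call');

// Level check: skip building the message entirely when TRACE is disabled.
if ($logger->isTraceEnabled()) {
    $logger->trace('expensive detail: ' . str_repeat('x', 1000));
}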

Results
I was surprised by how little log4php actually lost in terms of speed versus raw PHP. The authors have clearly done a thorough job of optimizing their library, because even the slower dynamic case runs at better than 90% of the speed of direct file access.

I’ve always intuitively used loggers as static variables – initialize once and use over and over. This seems to be the right way by a huge margin.

Checking for the log level before logging was a big win for the TRACE messages, which are never written to the file due to the configuration settings. The intended use is to let programmers sprinkle their code with debug statements which don’t get processed – and therefore don’t slow down – the production code. I would be very happy with this in my project. In the INFO metrics, the check slowed things down a bit – explained by the fact that the actual logging function performs the same check internally, so we are taking a double hit. But wait, there is a good reason…

The TRACE metric is interesting – these are events which are NOT appended to the log. In that case, when the check is performed, we pass through the code more times because the call is skipped early. When the check is not performed, the code has to execute deeper into the call stack before it figures out we aren’t doing any actual logging, taking more time.

Conclusion
If you know you will be logging every single event, don’t do a check. Otherwise do the check – it will save a lot of wasted cycles.

Multiple Development Environments

Friday, November 4th, 2011

Hopefully when you do web work, you’re not developing code on the same server your users are accessing. Most organizations have at least some kind of separation between their development and production code, but it’s possible to go much further. Separating environments allows you to run multiple threads of continuous integration and opens the door to all kinds of useful workflows.

These normally break down as follows:

Development
Working code copy. Changes made by developers are deployed here so integration and features can be tested. This environment is rapidly updated and contains the most recent version of the application.

Quality Assurance (QA)
Not all companies will have this. An environment for quality assurance; it provides a less frequently changed version of the application which testers can perform checks against. This allows reporting on a common revision, so developers know whether particular issues found by testers have already been corrected in the development code.

Staging/Release Candidate
This is the release candidate, and this environment is normally a mirror of the production environment. The staging area contains the “next” version of the application and is used for final stress testing and client/manager approvals before going live.

Production
This is the currently released version of the application, accessible to the client/end users. This version preferably does not change except for during scheduled releases. There may be differences in the production environment but generally it should be the same as the staging environment.

Having separation between the different environments is not tricky, but managing your data environment can be. There are, of course, all kinds of ways to solve the problem.

Accelerate Your Site with a Content Delivery Network

Thursday, April 21st, 2011

The best way to keep visitors engaged in your website is by delivering your experience in as little time as possible. The average visitor will only stick around for a few seconds, so it is important to get them interacting with your content fast. The first thing to check for, of course, is any bottlenecks in the initial page generation. Once the web page is being generated quickly, we can turn our attention to the next biggest culprit: the connection to your client.

Downloading files directly from a web server is costly, even if you’re using an efficient server like nginx for static files.

A content delivery network (CDN) can help speed up the process by storing your content in data centres around the world so they get served to your visitors from locations that are physically close to them. This results in fewer network hops which makes the files download faster, and reduces the overall load on your web server so you can focus on doing more interesting dynamic application stuff.

At one point, CDN services were only available to companies with deep pockets and huge websites, but these days anyone can set up and use an inexpensive service with their regular hosting provider.

Check with your host to see if they offer a content delivery solution. The two providers I use for my blog, Media Temple and Rackspace, both have excellent services. If you are running a WordPress site, check out W3 Total Cache, which provides an all-in-one package for managing your files and optimizing the overall speed of your site.

Tracking Down Website Speed Problems

Tuesday, April 19th, 2011
Photo credit: Eric Kilby, “Tortoise” (Creative Commons)

Why is my website loading so slowly?!?

There are a few common culprits behind website speed issues. When diagnosing problems, the best bet is to start at the worst performers and move up. Some suggestions, in order from slowest to fastest, are:

1. Internet Traffic
If your web page is downloading anything over the internet during each page request, stop right now. This is the most expensive operation you can perform. Example: Downloading a photo from Flickr and loading it into memory in order to determine its width and height dimensions.

2. Network Traffic
Local network traffic is generally very fast, but still involves transmitting information outside your computer. In some cases, such as web clusters with a shared session cache, the network performance cost is worth it for the overall application.

3. Database
Databases are fast, particularly when the data you need is already stored in a memory cache – which you generally can’t control. When paired with a key-value memory store like memcache, the majority of your database calls can come straight from memory.

4. Disk I/O
Even with the incredible access times found in today’s hard drives, reading and writing from the disk is an expensive operation (and why databases lose points, except for their memory caching abilities). Sometimes reading from disk is the better choice – YMMV.

5. Script Caching
Implement a tool like xcache (PHP). This keeps your compiled scripts in bytecode form, which is much faster to execute since the source doesn’t have to be re-parsed and compiled on every request.

Using Memcache with MySQL

Monday, April 18th, 2011

MySQL and Memcache have been bedfellows for a while, and at this point they are the de facto standard for highly-available, scalable websites. Even with other SQL and NoSQL solutions starting to become popular, this pair holds on as the winner for LAMP programmers. Is the complexity of working with this technology pair worth the investment?

Read vs Write
Traditional relational databases place the burden of computation on read operations. In mainframe environments with powerful servers and relatively few users, this made sense. Database normalization prevents redundancy, and data can be joined together when needed to produce the desired results.

In a web application with 1,000,000 users, the normalized transactional model does not perform. Generally speaking it is way faster to make two queries to a small subset of data rather than attempt expensive joins in a client-facing web site.

Enter memcache: by storing the result of our SQL queries in memory, we improve the speed of subsequent requests by pulling the data from memory as well as avoiding a hit to the database entirely, freeing it to process urgent or real-time requests.

Anatomy of an SQL Query
When we run an SQL query, we are actually asking the server to perform a lot of work:

  1. Break down the query into object references: The DBMS needs to understand which tables, columns, and filters you are using by tokenizing your SQL (breaking names out from keywords like SELECT, FROM, WHERE).
  2. Identify which indexes (if any) are most relevant to the data: This is harder with more complex queries which must be broken down or which depend on outer tables for their values.
  3. Read the source tables from the hard drive: Most DBMS implementations include some kind of memory caching which partially avoids this expensive read step, but some disk I/O is a normal part of operation.
  4. Join Columns: If we specify a join, especially a LEFT or RIGHT join, the DBMS has to create a pseudo table from the joined sources in memory before it can do any additional processing.
  5. Sub-selects: Any sub-select statements need to be processed. Depending on how the statement was written, this needs to be done for every row returned in the result set.
  6. Filter and sort: Anything in the ‘WHERE’ clause needs to be filtered out of the result set. This is where we are going to start seeing performance improvements by narrowing our result set.
  7. Aggregation: Once the database has its final result set, it can do all of the aggregation we ask of it, both calculations and grouping.

As we can imagine, this can be a time-consuming process. If it is repeated thousands of times in a short period, we will see significant performance loss.

Anatomy of a Cache Request
By contrast, when we perform a request for data from a cache like memcache, we do this:

  1. Check the index for the presence of the supplied key
  2. If a result exists, return it

In the case of memcache, this happens entirely in memory with no hit to the database whatsoever, resulting in a blindingly fast result set.
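Putting the two anatomies together gives the classic cache-aside read. A minimal sketch, assuming the pecl Memcached extension and PDO (the table, key format and expiry are made up for illustration):

// Look in memcache first; fall back to MySQL and prime the cache on a miss.
function getUser(PDO $db, Memcached $cache, $id) {
    $key  = 'user:' . (int) $id;
    $user = $cache->get($key);
    if ($user !== false) {
        return $user; // cache hit – served entirely from memory
    }
    $stmt = $db->prepare('SELECT id, name, email FROM users WHERE id = ?');
    $stmt->execute(array($id));
    $user = $stmt->fetch(PDO::FETCH_ASSOC);
    $cache->set($key, $user, 300); // keep for 5 minutes
    return $user;
}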

Speed Over Persistence
The reason memcache and MySQL work so well as a pair is that together they provide reliable, persistent transactional storage (through MySQL) along with lightning-fast data retrieval (through Memcache), especially for rarely-changing results.