Always Get Better

Archive for the ‘SQL’ Category

MySQL Pulled From Mac OS X Lion

Friday, August 5th, 2011

Apple-based LAMP developers be warned: the new version of OS X does not include MySQL, which was formerly part of the developer tools shipped with the operating system. In its place look for deliciously Oracle-free PostgreSQL. Of course, developers can and will continue to download MySQL and install it themselves, but the out-of-box experience moving forward will be with PostgreSQL.

Although it is still an extremely popular database, Oracle’s presence in the MySQL world has put a chill over business users considering using the product as the backbone of their data solutions. Other databases with similar purposes exists but none have the deep community boasted by PostgreSQL.

Database Migrations

Wednesday, April 20th, 2011

Maintaining database schemas across development environments (especially in teams) and in production can be a real nightmare. Fortunately there are a number of solutions which make database management easier.

This can be done manually or automatically. As database changes are made by developers, scripts are generated which can be run against a master database to bring it in line with the developer’s version. The most basic way to accomplish this is by writing a script manually, but frameworks like Django and Rails have built-in migration tools which manage this process. Rails in particular allows developers to move back and forth between snapshots of database schemas.

Evolutionary systems detect database schema changes against program code definitions. As of April 2011, Play Framework supports Evolutions.

Schema Versions
Microsoft SQL Server supports schema versions; wherein the underlying data remains the same, but multiple versions of the database schema rest on top and can be accessed simultaneously. This keeps older versions of the application or supporting clients working with the existing data set.

Keep Tracking…
Managing database changes can be a challenge for organizations of any size. The correct tool depends on a wide range of factors including your project size, number of team members, release schedule.

What kind of tools and processes do you use to manage database changes?

Using Memcache with MySQL

Monday, April 18th, 2011

MySQL+Memcache have been bedfellows for awhile and at this point are the de facto standard for highly-available, scalable websites. Even with other SQL and NoSQL solutions starting to become popular, this pair holds on as the winner for LAMP programmers. Is the complexity of working with this technology pair worth the investment?

Read vs Write
Traditional relational databases place the burden of computation on read operations. In mainframe environments with powerful servers and relatively few users, this made sense. Database normalization prevents redundancy, and data can be joined together when needed to produce the desired results.

In a web application with 1,000,000 users, the normalized transactional model does not perform. Generally speaking it is way faster to make two queries to a small subset of data rather than attempt expensive joins in a client-facing web site.

Enter memcache: by storing the result of our SQL queries in memory, we improve the speed of subsequent requests by pulling the data from memory as well as avoiding a hit to the database entirely, freeing it to process urgent or real-time requests.

Anatomy of an SQL Query
When we run an SQL query, we are actually asking the server to perform a lot of work:

  1. Break down the query into object references: The DBMS needs to understand which tables, columns, and filters you are using by tokenizing your SQL (by breaking out names from the keywords like SELECT, FROM, WHERE).
  2. Identify which indexes (if any) are most relevant to the data: This is harder with more complex queries which must be broken down or which depend on outer tables for their values.
  3. Read the source tables from the hard drive: Most DBMS implementations include some kind of memory caching which partially avoids this expensive read step, but some disk IO is a normal part of operation
  4. Join Columns: If we specify a join, especially a LEFT or RIGHT join, the DBMS has to create a pseudo table from the joined sources in memory before it can do any additional processing.
  5. Sub-selects: Any sub-select statements need to be processed. Depending on how the statement was written, this needs to be done for every row returned in the result set.
  6. Filter and sort: Anything in the ‘WHERE’ clause needs to be filtered out of the result set. This is where we are going to start seeing performance improvements by narrowing our result set.
  7. Aggregation: Once the database has its final result set it can do all of the aggregation we ask of it, both calculations and grouping

As we can imagine, this can be a time-consuming process. If it is repeated thousands of times in a short period, we will see significant performance loss.

Anatomy of a Cache Request
By contrast, when we perform a request for data from a cache like memcache, we do this:

  1. Check the index for the presence of the supplied key
  2. If a result exists, return it

In the case of memcache, this happens entirely in memory with no hit to the database whatsoever, resulting in a blindingly fast result set.

Speed Over Persistance
The reason memcache and MySQL work well as a pair is because they provide the tools needed to have reliable, persistent transactional storage (through MySQL) along with lightning fast data retrieval (through Memcache) especially for rarely-changing results.

Drizzle – MySQL for the Cloud

Thursday, March 24th, 2011

Drizzle – a lightweight fork of MySQL – has reached general availability. Drizzle’s design goals are to create a highly performant and module database engine tailored for cloud computing.

Some of the interesting features this database has to offer are:

  • No Views, Triggers, or Stored Procedures – I consider this a huge advantage. Stored Procedures were once a good thing when query optimizers were all but non-existant, but modern database systems perform this task extremely efficiently. Using stored procedures adds a second level of APIs to your application for what I would consider to be a rather dubious potential security benefit.
  • Sharding – The client protocol is designed to decide which database server to target based upon a hashing key. This is similar to how memcached handles its linear horizontal scaling – leaving this calculation to the client is a huge advantage to systems hosting a high number of concurrent visitors
  • Gearman Support – This is a terrific tool for spreading workload across machines. Use Gearman to handle logging in Drizzle, keeping the actual DB server available for database work

How Does Drizzle Compare to Percona?
Percona is a high-performance build of MySQL which promises to offer better performance than “stock” MySQL. It is built from a branch of publicly-available MySQL source code and enhanced by the folks at Percona. Percona maintains the same functionality as “real” MySQL, but attempts to achieve faster speeds.

How Does Drizzle Compare to MariaDB?
MariaDB is a different database engine started by the original creator of MySQL intended to replace MySQL using non-Oracle licensing. Anyone concerned with Oracle’s development practices or plans for the future of MySQL should consider switching to MariaDB. MariaDB implements all of the existing MySQL functionality plus more and provides a drop-in replacement for MySQL.

VISTA: How to fix SQL Server Express Error – CREATE DATABASE permission denied in database ‘master’

Thursday, November 5th, 2009

If you’re using SQL Server Management Studio Express under Windows Vista and see either of these errors:

CREATE DATABASE permission denied in database 'master'


The database [Name] is not accessible. (Microsoft.SqlServer.Express.ObjectExplorer)

Here’s the fix:

  1. Close SQL Server Management Studio Express
  2. Open your start menu and locate that program.
  3. Right-click on the Management Studio and choose ‘Run as Administrator’
  4. Fixed!

I swear the simplest solutions can be the hardest to find – hopefully this saves someone (or my forgetful self!) some aggravation.

Optimize SQL Queries by Using Aliases

Monday, June 30th, 2008

When joining tables in SQL, we often use aliases to shorten table names. Consider this query joining order lines (details) with orders inside the database:

SELECT O.OrderID, FirstName, LastName, ItemName, Quantity
FROM Orders O
INNER JOIN OrderDetails OD
ON O.OrderID = OD.OrderID;

The above may work, but behind the scenes the database’s query analyzer has to associate each of the selected columns with their respective tables. Although this is a fast process, it can be skipped entirely by simply fleshing out the query like this:

SELECT O.OrderID, O.FirstName, O.LastName, OD.ItemName, OD.Quantity
FROM Orders O
INNER JOIN OrderDetails OD
ON O.OrderID = OD.OrderID;

SELECT TOP N in Oracle

Friday, April 11th, 2008

Being used to SQL Server, I get messed up when moving to Oracle. For reference, here are equivalent top N queries in both environments:

SQL Server:
SELECT TOP 10 book_name, price*sales
FROM tblBooks;

SELECT book_name, price*sales
FROM tblBooks
WHERE ROWNUM <= 10; [/source]