September 17, 2013

Designing IDAA-Enabled Tables

Recently I wrote about managing the IBM DB2 Analytics Accelerator (IDAA) with Data Studio. The marketing message around IDAA is that you don't need to be a DBA to tune queries because there are no indexes and queries are fast. While it's true that you don't have indexes and don't need SQL coding tricks to force different access paths, you can still control the physical organization and placement of the data stored on the Netezza box.

This is important because while in most cases using IDAA can massively reduce query times, in select circumstances, a query that took five hours to process might now take 12 hours. In those cases, you need to modify the physical organization.

When enabling a table to be accelerated, you can specify the distribution keys as well as the organization keys. By default the data will be loaded with random distribution. This is OK at the outset, or when you're only accessing a single table. Believe me, when you're doing an initial proof of concept (POC), you'll be very happy to see what was an 8-hour query complete in just minutes.

The next step is to examine the SQL that joins tables and organize the data for those tables by their join columns. With IDAA, the time to run these queries can potentially be slashed from minutes to seconds.
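To see why placing rows by the join column helps, here is a toy Python simulation. It is only a sketch and not IDAA or Netezza code: the four worker nodes, the CRC32 hash, and the customer/order data are all assumptions made up for illustration. It measures how often an order row lands on the same node as the customer row it joins to.

```python
import random
import zlib

NODES = 4  # hypothetical number of worker nodes


def hash_node(key):
    """Map a key value to a node deterministically (stand-in for a distribution key)."""
    return zlib.crc32(str(key).encode()) % NODES


def random_node(_key, rng):
    """Ignore the key and pick any node (stand-in for random distribution)."""
    return rng.randrange(NODES)


def colocated_fraction(place_order, place_customer, customer_ids, orders):
    """Fraction of order rows stored on the same node as their matching customer row."""
    customer_home = {cid: place_customer(cid) for cid in customer_ids}
    local = sum(1 for cid in orders if place_order(cid) == customer_home[cid])
    return local / len(orders)


rng = random.Random(42)
customer_ids = list(range(1000))
# Each order is represented only by the customer id it joins on.
orders = [rng.choice(customer_ids) for _ in range(10_000)]

hashed = colocated_fraction(hash_node, hash_node, customer_ids, orders)
randomized = colocated_fraction(
    lambda k: random_node(k, rng), lambda k: random_node(k, rng), customer_ids, orders
)

print(f"distributed on join column: {hashed:.0%} of join pairs are node-local")
print(f"random distribution:        {randomized:.0%} of join pairs are node-local")
```

When both tables are distributed on the join column, every matching pair sits on the same node, so in this model no rows would have to move between nodes to complete the join. With random placement only about one pair in four (1 / NODES) happens to be local.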

The Data Studio help documentation describing IDAA implementation options is excellent. Here's a list of some major headings. In some cases I've added detail text:

            Distribution keys -- There are two methods for distributing table rows to the accelerator worker nodes: random distribution and distribution using a distribution key.

            Random distribution -- By default, random distribution is used to distribute table rows to worker nodes for query processing. That is, all tables are evenly distributed among the existing worker nodes.

            Use of a distribution key (hash partitioning)

            Impact of the distribution key on the query performance

            Impact of distribution key on join performance

            Best practices for selecting distribution keys

            Choosing an organizing key -- Organizing keys can further speed up accelerated queries by reducing the time necessary to scan the disks belonging to a single worker node.
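The organizing key idea can be sketched in a few lines of Python. This is a simplified model, not the accelerator's actual storage logic: it assumes the node keeps a minimum and maximum value for each fixed-size block of rows (Netezza calls this kind of metadata a zone map) and skips blocks whose range cannot match the predicate. The extent size, the day-of-year column, and the use of a full sort to stand in for clustering are all assumptions for illustration.

```python
import random

EXTENT_ROWS = 100  # hypothetical number of rows per disk extent


def build_zone_map(values):
    """Record the (min, max) of each fixed-size extent of stored values."""
    extents = [values[i:i + EXTENT_ROWS] for i in range(0, len(values), EXTENT_ROWS)]
    return [(min(e), max(e)) for e in extents]


def extents_to_scan(zone_map, low, high):
    """Count extents whose value range overlaps the predicate low <= value <= high."""
    return sum(1 for lo, hi in zone_map if hi >= low and lo <= high)


rng = random.Random(7)
# Day-of-year values in the order the rows happened to be loaded.
order_days = [rng.randrange(365) for _ in range(10_000)]

unorganized = build_zone_map(order_days)        # rows stored as loaded
organized = build_zone_map(sorted(order_days))  # rows clustered by the organizing key

# Predicate: one week of data (days 100 through 106).
scan_unorganized = extents_to_scan(unorganized, 100, 106)
scan_organized = extents_to_scan(organized, 100, 106)

print(f"extents scanned without organizing key: {scan_unorganized} of {len(unorganized)}")
print(f"extents scanned with organizing key:    {scan_organized} of {len(organized)}")
```

Without clustering, nearly every extent contains some value in the requested week, so almost all of them must be read. Once the rows are grouped by the organizing column, only a handful of extents overlap the predicate, which is where the reduced scan time on a single worker node comes from.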


If you have experience tuning queries for IDAA, please share some of your favorite tips in Comments.