December 17, 2013

Big Data and the Changing Role of DBAs

**Please note: This blog will be updated on January 7, 2014.**

In our industry it seems people are being asked to do more with less, while our employers focus on the present and near term. Businesses aren't planning for the long term when it comes to making employees productive and ensuring that the data being used to make critical business decisions is of the highest quality.

In the early 1990s, businesses invested in people and products to ensure that business rules were reflected in the logical database design. But it didn't end there. The physical database also had to support this design as well as high-performing applications. No one person specialized in all of these disciplines; we relied on data administrators (DAs) as well as DBAs.

The DA understood the business and the rules governing the data used to run the business. Using tools such as the CA ERwin data modeler, the DA would create logical models that showed how the entities related to one another and enforced the business rules within the definition of the data. The DBA would then take this model and create the physical tables in the given RDBMS (DB2 for z/OS or DB2 for LUW).
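To make "business rules enforced within the definition of the data" concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for DB2, and the tables, columns and the two rules (every order belongs to a customer; quantity must be positive) are hypothetical, chosen only for illustration.

```python
import sqlite3

# Hypothetical business rules, for illustration only:
#   1. every order must belong to an existing customer
#   2. an order quantity must be positive
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

# The "physical" tables: the rules live in the table definitions themselves.
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE customer_order (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        quantity    INTEGER NOT NULL CHECK (quantity > 0)
    );
""")

conn.execute("INSERT INTO customer VALUES (1, 'Acme')")
conn.execute("INSERT INTO customer_order VALUES (100, 1, 5)")


def try_insert(sql):
    """Return None if the row is accepted, or the error text if it is rejected."""
    try:
        conn.execute(sql)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


# An order for a customer that does not exist violates rule 1.
orphan = try_insert("INSERT INTO customer_order VALUES (101, 999, 5)")
# An order with a zero quantity violates rule 2.
bad_qty = try_insert("INSERT INTO customer_order VALUES (102, 1, 0)")

print("orphan order rejected:", orphan is not None)
print("zero quantity rejected:", bad_qty is not None)
```

Because the rules are part of the schema, the database rejects bad rows no matter which application or load script tries to write them.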

However, after years of cost-cutting, the DA's role seems to have been dispersed among DBAs and business analysts. As businesses sprint toward big data and the use of low-cost Hadoop clusters, traditional DBAs are being asked to do much more. Increasingly, the DBA is being replaced with the BDA -- the big data administrator, aka the Hadoop cluster administrator.

The BDA's responsibilities combine those of traditional system administrators (install and configure software), DBAs (create and tune the database), developers (write SQL and programs) and operators (monitor system problems). You don't see the role of DA mentioned in job descriptions these days, but rest assured, the need to understand the data elements, the entities and where data is sourced and targeted is more critical than it's ever been.

Here's a job description I found for a Hadoop cluster administrator:

            What a Day in the Office Might Look Like

            You check your self-made dashboard of statistics from all sorts of data sources (database clusters) and processes to see if they are still healthy and have enough space to grow. Of course, on most days these are fine, but today you find an issue with the processing of data towards the Hadoop cluster. You discover that another process is holding a file, so you kill it and get the regular processing routine up and running again. Of course, this is not where the job ends. Together with a few Business Intelligence developers you find a way to prevent this from happening in the future and implement this in the script. Later on, you come up with a possible (super-fast) technical solution to use real data in reports. These should be immediately sent to commercial teams upon encountering certain triggers in the data ...

In the past, employing separate people for each discipline allowed them to become expert enough at their responsibilities to address problems quickly. The BDA, however, will be a generalist, and as such, the BDA will need more time, research and testing to ensure that a quality product is delivered to the business.

As requests for data from the business lines inevitably increase, BDAs will find themselves overloaded, and the quality and speed at which data can be provided to the business will suffer. In response, vendors will rush to market with tools to help BDAs make higher-quality data available more quickly so that those critical business decisions can be made.

For years, businesses have tried to marginalize DAs for short-term cost savings. However, I believe that any operation that lacks DA skills will cost the business more in the long run, because data quality will be sacrificed for lack of business knowledge.

Are you a traditional DBA who's had to transition to the BDA role? Have you seen data quality degrade due to the lack of data administrators? Please share your experiences in Comments.