The Problem
Every user wants real Ad-hoc reporting and data extracts from their database.
Which means they want limitless data access to their business information without having to wait for it. How else are they to make those all important business decisions in a timely manner?
Asking for more data is not an uncommon request to IT departments. In fact, it happens quite frequently.
Everyone knows;
Data stored in a database is difficult to get out.
Let's rephrase that;
Users can get to the data (physical data connection), with a third party visualization/reporting tool, but will have trouble understanding it.
In a word:
Context: the circumstances that form the setting for an event, statement, or idea, and in terms of which it can be fully understood and assessed.
To use data from a database intelligently you need to understand the context for which it is intended and traditionally only IT knows how to turn database tables into business context.
Not all systems are bad at reporting. Some do it very well. However, most systems do not provide good reporting or if they do, it's never all the data in the database. Somehow there's always restrictions. People running the business know this. IT? Not so much.
Many software systems evolve over time. As they evolve, the database model gets more and more complex. One of the last things that software developers think of when adding features to a system is reporting on the data they just created. This is because they don’t really need it. They're not the ones using the system for the business purpose it was designed.
The IT staff does not run the business. Everyone else does.
There are literally millions of computer systems today that request data from users either by having them enter it manually or import it from some other system. In the end, data is captured and then stored in a database.
There are thousands of ways to store data, however we are specifically referring to systems that store data in a relational database. Data that's accessible using the SQL Query Language. (Microsoft SQL Server, Oracle, Etc.) Primarily an RDBMS.
Data is not as complicated as the IT world would want you to believe. Your business may be complicated, but you know your business. So you probably understand your business data.
And yet, there's still this:
The reports that came with my system are missing a couple of fields I need to run my business. Vendor wants money to change them.
My software vendor does not have an Ad-hoc Reporting tool but says I can just buy a tool like Crystal Reports or Tableau, connect to the database and extract what I want need.
IT made me a View in my database that has 20 fields but today I need a couple of new fields that are not in the View. I can’t get anybody from IT to change it for me. I need to wait.
My BI vendor said they can extract data from any database. I can contract them to work with me to create the reports I need.
IT does not have time to create my new business report now. They will put it on the schedule for a later date. Low priority unless I can get executive approval to override their other priorities.
I was finally able to create the reports I needed with my custom reporting tool. I just got a new update from my software vendor and now all of my reports are failing.
Custom reporting vendors says I can build formulas with their tools but what does ‘Table1.fld1 * Table2.value’ mean anyway?
I created a cool report for me, but my colleague does not have permissions to some of the data. I can’t share it with them without breaking corporate regulations or the report failing with errors.
If any of this sounds like you, you’re not alone. Everybody has a different reason of why reporting from their relational database is difficult. The list above could go on forever.
About Computer Software
All computer software programs today work with the same 3 basic principles. Input, Process, Output. In the end the only thing that matters to the end-user is the output. The reporting.
IT professionals took the user's data in a format that the business owners understood and converted it into a format that can only be understood by IT. The relational database.
De-Mystifying the Relational Database
When building a computer system for a business or industry, whether a custom internal system or a commercial product sold by a vendor, the basic design process is the same.
The data that you have in your business industry (before it was computerized) is analyzed by a Software Engineer or Software Architect. With the Subject Matter Expert (SME), which is the business person (you), the architect designs a way to capture and organize your data for you.
This is usually an input form on the screen. With your help, the Architect breaks the data down into meaningful categories and business data elements.
The business data elements and categories are then separated into what are called Tables that they create in the database. Tables have fields where the data elements are stored. As they do this they create the relationships between the Tables. Hence the name Relational Database.
A Software Engineer is then tasked to write the code that is ultimately the software program that runs on your computer. In the program, the developer creates their representation of your business categories and refers to them as Objects. These Business Objects are not structured the same as your database tables but will more closely resemble your business. Once your data is captured from the input forms, it's validated and then converted from an Object into the relational model and written to the database the way the Architect intended.
Your important business data is safely stored in the database tables.
There are thousands of technical attributes surrounding software development that are outside the scope of this discussion. We will however mention Object Oriented Programming (OOP).
An RDBMS database structure and an Object Oriented structure in OOP are, let’s just say, different. This is referred to as an object-relational impedance mismatch.
Software Developers like OOP so they write OPPy code. They are forced to store and retrieve data in a relational format because of the popularity of SQL and relational databases.
RDBMS is much more conducive to SQL queries for reporting with a simple JOIN than a firm Object Relation.
For the same reasons most of this information is based, it's our belief that the OOP/RDBMS battle will continue for a long time.
The simple truth is that end-users want easy access to their data and don't care about OOP. Currently most systems store data in an RDBMS.
Unfortunately for you, the Architects have converted your data into what we will call database form. They have computerized it and used terms like Normalize, Denormalize, Reference Data, Foreign Key Constraints, Etc.
Even though a Software Engineer may have presented you good Business Objects on the screens in the application, they can be lost when converted and stored in the relational database model.
Data Architects most likely have taken some of your data elements and changed their names. Something you called Foreign Income Tax Credit Amount could be changed to FI_Value. Why? Could be anything from shorter to type, to fits on the screen better or just easier for them to remember.
Data Architects also hate to repeat data because it’s a waste of space and time if it changes. So they will reuse data for multiple purposes but only save it once.
Example:
An example of this would be a person’s name and address.
They may create a single PEOPLE table in the database (Reference Table) to store people data and then refer to the data in that table from multiple places.
You could have a customer that you do business with that have employees (who are people) and vendors you buy materials from (also people) which are all stored in the same table.
“They are distinctly different. Customer Employees and Vendor Contacts are contextually NOT the same thing.”
That doesn't matter to the Data Architects. IT will figure that out later for you with a filter when they write the SQL Query to get your data from the database to put on your reports.
The important thing to remember is that the relational structure of your business data in the database, could be similar, but will most likely be different, than the way you look at your business data. It will definitely be different than the way you entered it into the software program.
By now you may begin see a pattern.
Your original business data was transformed into some weird database form in the database.
Weird for you maybe, but not for the IT professionals. This format is perfectly normal for them. It does however, explain why you need them to get your data back out of the database in a form that you can understand.
Many IT professionals don't see a problem with this model of reporting. Since reporting from a relational database is not recognized as a problem to them, there is no reason to create a solution. When they want the data from the database they simply write a query.
This is analogous to; The fox guarding the hen house.
After all; IT is paid to write the SQL Queries needed to get the data back out of the database aren't they?
The problem isn't the database. The problem is, the business user doesn't have an intelligent and easy way to get the data back out of the database in the same form that it was understood going in. Not without IT's help, that is.
To many, the Structured Query Language (SQL) "is" the solution.
Who has heard this before:
“Just connect to the database, write an SQL Query and your good to go.”
Some power users can write SQL but most end-users are not capable, nor do they want to be. IT departments write queries. That’s why you need them to help you access your data. Besides, even if you know SQL you may not know the data model of the database.
When your database holds tens of thousands of data elements there are billions of possible SQL Query permutations to access your data.
There is no way that IT can create every view of the data ahead of time that users may need to run their business.
As you need new data elements for reporting, you’re forced to contract the IT crew to write queries for you. The business is spoon fed the data as needed. Usually having to pay for it with time or money. Users know that this is not the best solution, but have no choice.
Relational
We have used the word “Relational” a lot so far. Although we understand the basic definition of the word, what does it really mean when we say; Relational Database?
You may have seen an Entity Relationship Diagram (ERD).
If not, that’s okay. It’s like a road-map of your data in a relational database. The ERD image above, depicts the table relationships for the Microsoft SQL sample database, Adventure Works.
If the architect properly created the Foreign Key Constraints (Relationships) in the database between the tables when it was created, then there are software utilities that can be used to create an ERD.
The relationships are the lines between the boxes in the diagram. That’s the good news and very helpful in your mission of accessing your data for reporting.
Then of course there is the bad news; The Adventure Works database is a sample database created by Microsoft.
Sample=Simple.
Your database is probably much larger and much more complex. To add insult to injury, a relational database doesn't need a single Foreign Key Constraint (no known relationships in the database) and yet still be a valid relational database containing thousands of relationships between the tables! It's quite likely that the queries you currently use for reporting don't follow any of the Foreign Key Constraints, even if they did exist. (Which they probably don't.)
In other words, your diagram may have no lines connecting the tables but still be a perfectly valid relational database.
If a road-map was missing the street you live on, that doesn't mean the road is non-existent. The road is there, it just wasn't mapped.
That’s the same with relationships between tables in the database. They can exist even if there is no line (Foreign Key Constraint) on the ERD.
I am not trying to make excuses for the IT industry but there are many reasons a relational database can end up with no relationships. You may think this is bad, but in the long run, it doesn't really matter.
Having no relationships does NOT affect the software system’s ability to access the relational database, because it’s very simple for a Software Engineer to maintain the relationships in the actual software product with total disregard to the database.
Just like you know you can drive down the street you live on even if the street is not on a road map. Software Engineers and query writers can create their own relationships between tables as needed simply by adding their own JOIN statements.
Relationships between tables when writing SQL Queries can be created whenever and however the writer chooses. Whether the results of that data is correct or not is up to the designer of the query. The relational database does not determine the business relationship, the query does.
Usually the relationships follow closely to the business purpose the database was designed for. If it’s an Accounting system then the table names and relationships are probably structured to some likeness of an accounting model but not always. How your database is structured is considered the data model or database schema.
When you use a third party visualization/reporting tool and connect to your database, it's worthless unless somebody has created the business relationship query for you.
Other tools expect you to know what all of this relationship jargon means. You're also expected to know the Structured Query Language to make it all happen!
The problem with accessing your data easily for reporting, is that the IT professionals have taken your business data that you used to understand and wrapped it into a world of technical jargon only they understand. The relational database only accessible by using the Structured Query Language.
So there you have it. This is why reporting data from a relational database has historically been a difficult task.
As you may have already guessed, DSF solves this problem for the Business User..