Discovering Data : Data Selections

DSF has numerous ways to discover data. However, the Data Wizard is the fastest way to expose the most data in the least amount of time.

First Discovery

During the early stages of discovery, it’s best to start slow and progressively use the other methods to teach DSF about your data.

DSF is a progressive learner: the more methods you apply, the more DSF learns about how you use your data. There is no right or wrong way to teach DSF. You can’t break anything by discovering data, regardless of the method used.

Each method has its pros and cons. It is up to you to adjust the discovery or bend the teachings as you see fit. DSF will store a Snapshot of what is discovered every step of the way. You can always revert to a previous Snapshot or start over altogether.

CONNECTION

The Wizard

When the wizard runs, you will see two (2) options to choose from: Normal and None. Most databases will default to the Normal option, which is usually the best option to start your first discovery.

Unless you have a specific reason to select the None option, use the default. It is the best way to get the “first cut” of any database. On some databases the Normal option may not be available. In that case, you will use the None option. Contact Data Selections for help on using this selection.

Database Statistics

This can be important information to review before you discover your data. Shown above is a sample wizard screen when connecting to the AdventureWorks database from Microsoft. The Database Statistics at the bottom of your screen will give you a good indication of how long it could take to discover the database.

The sample database shown has 71 Tables, 21 Views and 91 Relationships. The more Views found in the database, the better your discovery process will be. Views teach DSF the business context in which you use the data. However, if there are thousands of Views in your database, it could take hours for the discovery process to complete. The discovery process can be cancelled at any point. Any data discovered up to the cancellation point will be retained.
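To make the three counts concrete, here is a minimal sketch of how similar statistics could be gathered by hand. It is not how DSF computes them. It assumes an SQLite database as a stand-in (the sample above is SQL Server), and the function name `database_statistics` and the demo schema are hypothetical.

```python
import sqlite3

def database_statistics(conn):
    """Count tables, views, and foreign-key relationships in an SQLite database.

    Illustration only: DSF reports its own statistics in the wizard.
    """
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'")]
    views = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'view'")]

    relationships = 0
    for table in tables:
        # pragma_foreign_key_list returns one row per column of each foreign
        # key; the distinct 'id' values identify the individual constraints.
        fk_ids = {row[0] for row in conn.execute(
            "SELECT id FROM pragma_foreign_key_list(?)", (table,))}
        relationships += len(fk_ids)

    return {"tables": len(tables), "views": len(views),
            "relationships": relationships}

# Hypothetical demo schema: two tables, one relationship, one view.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE sales_order (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(id),
    order_date TEXT
);
CREATE VIEW customer_orders AS
    SELECT c.name, o.order_date
    FROM customer c JOIN sales_order o ON o.customer_id = c.id;
""")
stats = database_statistics(conn)
print(stats)  # {'tables': 2, 'views': 1, 'relationships': 1}
```

A very high view count in output like this is the signal to pause and consider how long discovery might take before starting the wizard.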

Some databases have hundreds of duplicate Views that differ only in their filter criteria, such as a date change, and serve no distinct business purpose. Learning the same View a hundred times over is a waste of time for DSF and creates hundreds of useless Entities with different names. If your statistics show an extremely large number of Views, you may want to consider a different discovery method instead of the default Normal option (unless you know that the Views are unique). Contact Data Selections for help if you are not sure how to proceed.
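If you have the view definitions as text, a rough check for this kind of duplication can be sketched as follows. This is a heuristic illustration, not a DSF feature: it assumes that duplicates differ only after the first WHERE keyword, and the function names and sample views are hypothetical. Views that use subqueries, or clauses such as GROUP BY after the WHERE, will not be compared reliably.

```python
import re
from collections import defaultdict

def view_signature(sql):
    """Return a view's SQL with whitespace and case normalized and
    everything from the first WHERE keyword onward dropped."""
    text = re.sub(r"\s+", " ", sql.strip().lower())
    return re.split(r"\bwhere\b", text, maxsplit=1)[0].strip()

def group_similar_views(views):
    """Group view names whose definitions match once the filter is ignored.

    'views' maps a view name to its SQL text. Only groups with more than
    one member are returned.
    """
    groups = defaultdict(list)
    for name, sql in views.items():
        groups[view_signature(sql)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]

# Hypothetical view definitions for illustration.
views = {
    "orders_2021": "SELECT id, total FROM orders WHERE order_date >= '2021-01-01'",
    "orders_2022": "SELECT id, total  FROM orders WHERE order_date >= '2022-01-01'",
    "customer_names": "SELECT id, name FROM customer",
}
groups = group_similar_views(views)
print(groups)  # [['orders_2021', 'orders_2022']]
```

Large groups in the result suggest that discovering one representative View through the By View manager may teach DSF as much as discovering all of them.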

In some instances, it may be better to not use the Data Wizard at all and discover data using one of the more specific Discovery Managers (By View, Table, or SQL Query). These managers allow you to discover data in small blocks at a time to progressively teach DSF about your database.

There is no direct correlation between the number of objects discovered and how many data elements will be available for reporting. In general, the more objects discovered, the better, but that’s not always true. All databases are different.

There are hundreds of reasons why specific database objects might not be discovered. Analysis and explanation of missing elements will probably require the help of a Data Selections consultant.

Normal (Best Option)

This is the best Data Discovery option to get the most accurate and useful data from your database. However, it also takes the longest to complete. The Normal option builds the best business “context” of your data. (It should make the most business sense to the user.)

Normal will only analyze the views found in your database. Each successfully discovered view will result in an Entity in DSF that can be used for reporting. This method will not discover ALL data table elements from a database. It will only discover the most useful data elements based on the learned business context from the database views.

None (Manual)

This option is only used when creating a new Data Source. It will create a DSF structure that you can use to manually establish data elements with specific Discovery Managers (By View, Table, or SQL Query). It will not discover any data elements. Use this option if you currently access your database with SQL queries in text form that are not views in the database.

The wizard was designed to make data discovery as painless as possible. Due to the progressive learning nature of DSF, you can combine multiple discovery methods or run the wizard multiple times. If you ever need to undo a discovery, you can always revert to a previous Snapshot.
