Flexibility and scalability rethought

Guide to an agile data warehouse with Data Vault 2.0

A rigid data warehouse system acts like a bottleneck and hinders the flow of data. Data Vault 2.0 provides companies with a flexible and scalable framework that meets modern requirements.

In the world of trade, the Suez Canal is a crucial waterway that can act as a bottleneck when ships block the canal. Such delays have a negative impact on global supply chains and trade.

A similar effect occurs when a data warehouse system is built inflexibly and rigidly. Without a scalable and agile architecture, "data congestion" and inefficient processes are created that impede the flow of data. Data Vault 2.0 offers a solution to this problem by providing a flexible and scalable framework that meets the challenges of modern data landscapes.

In this blog post, we highlight the basic principles and elements of Data Vault 2.0 and illustrate them with a practical example.

What is Data Vault 2.0?

Data Vault 2.0 is a method for building data warehouse systems that is designed for scalability, agility and auditability. It was developed by Dan Linstedt and is based on an extensible architecture that aims to overcome the challenges of traditional data warehouse models such as rigid structures, difficult scaling and lack of flexibility.

Data Vault 2.0 is more than just a data model. It comprises three core areas:

  • Modelling: A flexible and historicizing data model.
  • Methodology: An agile approach to data integration.
  • Technology: The use of modern tools and technologies such as automation and parallel processing.

Incidentally, the name "2.0" indicates that this method is a further development of the original Data Vault approach. While Data Vault 1.0 focused exclusively on the data model, Data Vault 2.0 expands the approach to include modern technologies, agile methods and a holistic framework that covers the entire life cycle of a data warehouse. 

This further development makes the method more powerful and more adaptable to the requirements of modern data landscapes.

Adrian
Adrian Liebetrau

Your expert in agile data warehousing

Do you have questions about "Agile Data Warehouses"? Our expert, Adrian Liebetrau, brings years of experience in digital transformation projects to the table and would be happy to discuss this topic with you in a no-obligation consultation.

Contact us now without obligation

The principles of Data Vault 2.0

Data Vault 2.0 is based on the following central principles:

  • Scalability: The architecture supports the growth of data volumes and increasing requirements.
  • Flexibility: Changes in the source system can be easily integrated without destroying existing structures.
  • Audit compliance: All data is stored in its original state, which enables detailed traceability.
  • Automation: Many processes, such as loading data, can be automated to save time and resources.
  • Agility: It enables fast development cycles and adaptations.

The elements of Data Vault 2.0

Data Vault 2.0 consists of three main components: Hubs, Links and Satellites.

Each of these components fulfills a specific function within the data model.

Hubs
Hubs represent the central business concepts, such as customers or articles. They contain a unique business key and are used to identify data.
Example: A "customer" hub could contain customer numbers as keys.

Links
Links model the relationships between the hubs. They represent links that take place between different business units.
Example: A link could represent which customers have purchased which items.

Satellites
Satellites contain the context data, i.e. all attributes and historicized information about the hubs or links. For example, names, addresses or changes to the data are stored here.
Example: A satellite for a "Customer" hub could contain the name, address and date of birth.

BI Bytes Podcast Episode #4: From Traditional to Agile Data Warehouses

Practical example

Table structure and data entry in Data Vault 2.0

An innovative retailer that offers high-quality products for its customers has decided to introduce Data Vault 2.0 in order to optimize its data landscape. The first step was to record all relevant data on orders and customers in the data warehouse.

In the first step, the data model was visualized using the following diagram.

According to this, there should be the Customer and Order hubs. These are connected to the Customer_Order link. The customer and order information should be stored in additional satellites.


But what does this mean in concrete terms in the database?

The table structure and data entry in Data Vault 2.0 always follow the principles mentioned above. Accordingly, the structure of the hubs, links and satellites follows the same specifications.

Definition of hubs


Each hub receives an artificially generated hash key, which acts as an artificial key. In addition, there are the functional keys that originate from the source systems as well as a timestamp (LDTS) and the source system designation (RSCR).

Definition of the links
The link connects the two hubs HUB_CUSTOMER and HUB_ORDER and thus enables the mapping of an m:n relationship.

Each link also contains an artificially generated HASH key and the foreign keys to the linked hubs. A timestamp and the data source are also recorded.

If a new hub is added, for example for articles, an additional link is created that connects all three hubs. The existing link is retained in order to ensure the traceability and historization of the data.

Link #1: LNK_CUSTOMER_ORDER

Definition of satellites

The satellite contains the foreign key for the respective hub and a timestamp for when the data record first appeared in the data warehouse. Both columns result in the unique key.

The satellites in Data Vault 2.0 enable the implementation of slow change dimensions. Accordingly, historization can also be selected for each business concept.

The separation of business key (Hub) and business attribute (Sat) enables subsequent expansion of business information on the basis of existing Hub keys. New satellites are added instead of deleting or adapting existing satellites.

Satellite #1: SAT_CUSTOMER

In our example, the customer name (CUSTNAME) and the segment (CUSTSEGM) are recorded in the satellite.

Satellite #2: SAT_LNK_CUSTOMER_ORDER

The total turnover (TOTALSALES) is saved in the satellite at the link.

In addition to the tables mentioned above, there are further tables that are required to fully cover the requirements described at the beginning. The complete table structure looks as follows in the first step:

In Data Vault 2.0, there are also the so-called reference tables (REF for short). A reference table could contain all valid country or currency details, which are then used by hubs such as HUB_CUSTOMER or HUB_ORDER. This means that these values do not have to be saved multiple times in each hub or satellite.


Reference table

Conclusion

This example illustrates how Data Vault 2.0 provides a flexible and scalable basis. It also shows that the modeling approach leads to an increasing number of tables. To maximize efficiency during development, we recommend the use of DWH Builder to increase the degree of automation. Only the combination of Data Vault 2.0 with automation unfolds its full potential in the development of agile DWH systems.

How does this data model with hubs, links and satellites, which is often referred to as the "core", become the widely used star schema with dimensions and facts? We answer this question in the second part of the "Agile Data Warehouse" topic block.

And if you would like to know more about this topic, you are welcome to listen to our podcast episode on "Agile Data Warehouse".



Get in touch now!

Do you have any questions about us, our solutions and services or do you have any other concerns? Simply fill out the contact form and send us your request.

Download Whitepaper

Data protection