December 2022 – Mark Nagelberg

One thing that has become clear to me as I get older and raise a family is that human beings need a bigger purpose in life. We need big goals and big dreams that we can endlessly pursue. Without this, we inevitably stagnate and decline.

You can clearly see our need for a higher purpose throughout society. You see this in the great religious traditions that provide followers with meaning, structure, and direction. You see this in the advice of successful and influential people such as Jordan Peterson that encourage people to “aim high”. You see this in the success of industry titans like Steve Jobs or Elon Musk, who propelled their workers and themselves to great heights using a big vision and persistence in reiterating it at every step of the journey.

Clearly, there are many ways of working toward big goals and there are many goals you can work toward. I want to focus on one technique that I’ll call “The Big Project”.

Characteristics of the Big Project

The Big Project is a significant project that provides you with a platform to learn many new things. A good Big Project should have a few defining characteristics.

It is Ambitious. The project will be ideally working towards a broad, lofty, and exciting goal. As Peter Drucker says in his book “The Effective Executive“: “Aim high, aim for something that will make a difference, rather than for something that is “safe” and easy to do.” (pg. 111)
It Matters to You. The goals of The Big Project should be aligned with your personal values and beliefs, and propel you toward an outcome you care about. If you don’t care about the Big Project, you won’t stick with it. Similarly, you want The Big Project to be building skills that are actually relevant to you.
It Matters to Others: The Big Project should have some tangible real-world relevance to others, rather than just being a “toy” project for you.
It has Identifiable Milestones and Subtasks: Although the project should be large in scope with no clear “project complete” criterion, you should still be able to complete well-defined subtasks along the way to help you stay motivated and ensure the project is meeting your goals. These subtasks should be achievable, but also provide room for growth. You want a challenge, but not so much difficulty that you get overwhelmed and demotivated.
It is flexible and adaptable: A good Big Project should be flexible and adaptable, and allow for changes and adjustments as needed. This can help you stay on track and achieve your goals even if unexpected challenges or roadblocks arise.

Why Pursue a Big Project?

First and foremost, working on a Big Project is satisfying. It can provide excitement and some sense of meaning if you are thoughtful about what you choose to work on. It can boost your confidence and self-esteem, and provide a sense of accomplishment.

The Big Project takes advantage of the power of compounding, as the philosophy is to consistently build something over a long period of time. Through the magic of compounding, you may reach heights you didn’t expect and couldn’t achieve if you only worked on quick and shallow projects.

The Big Project also provides focus. With a clear overarching goal that inspires you, you will less likely be led astray by activities that don’t propel you toward that vision. You also become less likely to fall into the trap of never-ending planning, because it becomes much more clear what you should work on.

Examples

The Big Project can take many forms, depending on your goals and interests. Some more common Big Projects would include raising a family or living your life in accordance with a particular religious moral code. Both of these things are Big Projects in my life and many others.

In my professional life, one of my main Big Projects is my website, which includes Articles on efficient learning, spaced repetition, productivity, programming, and data science, as well as Download Mark’s Brain, which is a full stack web application that I built for sharing notes and flash cards, synced with my personal collections in Anki and Roam Research.

Together, this Big Project meets all the characteristics required.

First of all, It’s ambitious and large in scope: there is no limit to articles I can write and articles and books I can learn and take notes on. I also have long-term ambitions for Download Mark’s Brain to develop it into a collaborative flashcard development site with the lofty goal of converting all of human knowledge into testable Q+A format.

Furthermore, it matters to me personally for many reasons:

Writing is a skill I want to develop and it also clarifies my thinking and learning
I want to develop my skills in the myriad of programming patterns and technologies involved in developing a full-stack application
I want to grow and maintain my personal knowledge management system
I want to maintain a consistent and fast-paced learning / reading cadence
I want to eventually build a self-sustaining business

It also matters to anyone else that benefits from my writing and shared notes and flashcards, it is easy to create clear milestones and subtasks, and there is great flexibility in the direction I can take the project.

Conclusion

I challenge you to find at least one Big Project in your life. Whether it’s raising a family, living according to a particular moral code, or working towards a professional goal, a big project can be a powerful tool for your well-being, development, and growth.

Author:: [[Ralph Kimball]], [[Margy Ross]]
Reading Status:: #complete
Review Status:: #[[complete]]
Tags:: #books #[[dimensional modeling]]
Source:: link
Roam Notes URL:: link
Anki Tag:: kimball_group_reader kimball_group_reader_ch_1
Anki Deck Link:: link
Setting up for Success
- 1.1 Resist the Urge to Start Coding ([[Ralph Kimball]], DM Review, November 2007) (Location 944)
  - Before writing any code or doing any modelling or purchasing related to your data warehouse, make sure you have a good answer to the following 10 questions:
    - [[Business Requirements]]: do you understand them? (Most fundamental and far-reaching question)
    - [[Strategic Data Profiling]]: are data assets available to support business requirements?
    - [[Tactical Data Profiling]]: Is there executive buy-in to support business process changes to improve data quality?
    - [[Integration]]: Is there executive buy-in and communication to define common descriptors and measures?
    - [[Latency]]: Do you know how quickly data must be published by the data warehouse?
    - [[Compliance]]: which data is compliance-sensitive, and where must you have protected chain of custody?
    - [[Data Security]]: How will you protect confidential or proprietary data?
    - [[Archiving Data]]: How will you do long-term archiving of important data and which data must be archived?
    - [[Business User Support]]: Do you know who the business users are, their requirements and skill level?
    - [[IT Support]]: Can you rely on existing licenses in your organization, and do IT staff have skills to support your technical decisions?
- 1.2 Set Your Boundaries ([[Ralph Kimball]], DM Review, December 2007) (Location 1003) #[[Business Requirements]] #[[setting boundaries]]
  - This article is a discussion of setting clear boundaries in your data warehousing project to avoid taking on too many requirements.
Tackling DW/BI Design and Development
- This group of articles focuses on the big issues that are part of every DW/BI system design. (Location 1071)
- 1.3 Data Wrangling ([[Ralph Kimball]], DM Review, January 2008) (Location 1074) #[[data wrangling]] #[[data extraction]] #[[data staging]] #[[change data capture]]
  - [[data wrangling]] is the first stage of the data pipeline from operational sources to final BI user interfaces in a data warehouse. It includes [[change data capture]], [[data extraction]], [[data staging]], and [[data archiving]] (Location 1076)
  - [[change data capture]] is the process of figuring out exactly what data changed on the source system that you need to extract. Ideally this step would be done on source production system. (Location 1080). Two approaches:
    - Using a change_date_time field in the source: a good option, but will miss record deletion and any override of the trigger producing the change_date_time field. (Location 1089)
    - Production system daemon capturing every input command: This detects data deletion but there are still DBA overrides to worry about. (Location 1089)
    - Ideally you will also get your source production system to provide a reason data changed, which tells you how the attribute should be treated as a [[slowly changing dimension (SCD)]]. (Location 1098) #[[change data capture]]
    - If you can’t do [[change data capture]] on source production system, you’ll have to do it after extraction, which means downloading larger data sets. (Location 1103)
      - Consider using [[cyclic redundancy checksum (CRC)]] to significant improve performance of the data comparison step here.
  - [[data extraction]]: The transfer of data from the source system into the DW/BI environment. (Location 1117)
    - Two main goals in the [[data extraction]] step:
      - Remove proprietary data formats
      - Move data into [[flat files]] or [[relational tables]] (eventually everything loaded into relational tables, but flat tiles can be processed very quickly)
  - [[data staging]]: [[Ralph Kimball]] recommends staging ALL data: save the data the DW/BI system just received in original target format you chose before doing anything else to it. (Location 1126)
  - [[data archiving]]: this is important for compliance-sensitive data where you have to prove data received hasn’t been tampered with. Techniques here include using a [[hash code]] to show data hasn’t changed. (Location 1129)
- 1.4 Myth Busters ([[Ralph Kimball]], DM Review, February 2008) (Location 1135) #[[dimensional modeling]]
  - Addresses various myths related to [[dimensional modeling]]
  - Myth: A dimensional model could be missing key relationships that exist only in a true relational view. (Location 1142)
    - In fact, dimensional models contain all the data relationships that normalized models have.
  - Myth: dimensional models are not sufficiently extensible and do not accommodate changing [[business requirements]]. (Location 1149)
    - It’s the opposite: normalized models are much harder to change when data relationships change. [[slowly changing dimension (SCD)]] techniques provide the basis for models to meet changing [[business requirements]].
  - Myth: dimensional models don’t capture data at sufficient level of granularity / detail (Location 1177)
    - In fact, models should capture measurement events in [[fact tables]] at the lowest possible grain.
- 1.5 Dividing the World ([[Ralph Kimball]], DM Review, March 2008) (Location 1188)
  - Two main entities in [[dimensional modeling]] (Kimball estimates 98% of data can be immediately and obviously categorized as one of these):
    - [[dimension ([[dimensional modeling]])]]: the basic stable entities in our environment, such as customers, products, locations, marketing promotions, and calendars. In end user BI tools, dimensions are primarily used for constraints and row headers.
    - [[fact ([[dimensional modeling]])]]: Numeric measurements or observations gathered by all of our transaction processing systems and other systems. In end user BI tools, dimensions are primarily used for computations.
      - [[fact table grain]]: Description of measurement in physical, real-world terms – a description of what each row in the [[fact table]] represents. (Location 1209) There is sometimes a temptation to add facts not true to the grain to shortcut a query, but this often introduces complexity and confusion for business users. (Location 1218)
      - A [[fact ([[dimensional modeling]])]] should be additive whenever possible – it should make sense to add facts across records. A common example here is storing extended price (i.e. price * quantity) instead of just price in a fact table where the measurement is retail sale (Location 1224)
  - A distinct characteristic of [[dimensional modeling]] is not using [[normalized data]]. Normalized models are great in transaction processing systems, but they are not understandable by business users. Dimensional models, correctly designed, contain exactly the same data and reflect the same business rules, but are more understandable. [[understandability]] is a central goal of a BI system used by business users. (Location 1249)
- 1.6 Essential Steps for the Integrated [[Enterprise Data Warehouse (EDW)]] ([[Ralph Kimball]], DM Review, April 2008 and May 2008) (Location 1254) #[[data integration]]
  - This section provides an overall architecture for building an integrated [[Enterprise Data Warehouse (EDW)]] which supports [[Master Data Management (MDM)]] and and has the mission of providing a consistent business analysis platform for an organization. (Location 1258)
  - Essential act of the [[Enterprise Data Warehouse (EDW)]] is [[drilling across]]: gathering results from separate [[business process subject area]]s and combine them into a single analysis. (Location 1282)
  - A key prerequisite to developing the [[Enterprise Data Warehouse (EDW)]] is a significant commitment and support from top-level mangement on the value of data integration. (Location 1303)
  - Having an existing [[Master Data Management (MDM)]] project is a good sign of executive buy-in for data integration, and significantly simplifies data warehouse [[data integration]]. (Location 1308)
  - [[conformed dimensions]] and [[confirmed facts]] provide the basis for [[data integration]] (Location 1316)
    - [[conformed dimensions]]: two dimensions are conformed if they contain one or more common fields whose contents are drawn from the same domains. (Location 1318) Typical examples: customer, product, service, location, employee, promotion, vendor, and calendar. (Location 1367)
    - [[conformed facts]]: numeric measures that have the same business and mathematical interpretations so that they may be compared and computed against each other consistently. (Location 1320)
  - [[enterprise data warehouse (EDW) bus matrix]]: two-dimensional matrix with [[business process subject area]] on the vertical axis and [[dimension tables]] on horizontal axis. (Location 1324) An X in the matrix represents where a subject area uses a dimension. It helps you prioritize development of separate subject areas and identify possible scope of [[conformed dimensions]]. "The columns of the bus matrix are the invitation list to the conformed dimension design meeting." (Location 1333) This is an important item to send to senior management to review before conformed dimension design meetings. "If senior management is not interested in what the bus matrix implies, then to make a long story short, you have no hope of building an integrated EDW." (Location 1335)
    - Note that the different stakeholders don’t have to give up their domain specific private attributes that they need – stakeholders just need to agree on the [[conformed dimensions]]. (Location 1341)
    - Even when you get senior management full buy-in, there is a lot of operational management involved in the [[Enterprise Data Warehouse (EDW)]], including two abstract figures: the [[dimension manager]] (builds and distributes a conformed dimension to the rest of the enterprise) and the [[fact provider]] (downstream client to the dimension manager who receives and utilizes the conformed dimension, almost always while managing one or more fact tables within a subject area). (Location 1347)
- 1.7 Drill Down to Ask Why [[Ralph Kimball]], DM Review, July 2008 and August 2008 (Location 1481) #[[decision making]]
  - Important to understand how your data warehousing system drives decision-making, not just your technical architecture.
  - [[Bill Schmarzo]] architecture for decision making, aka [[analytic application process]]: (Location 1489)
    1. Publish reports.
    2. Identify exceptions.
    3. Determine causal factors. Seek to understand the “why” or root causes behind the identified exceptions. Main ways you might do this: #[[causality]] #[[determining causality]]
      - Get more detail
      - Get a comparison
      - Search other data sets
      - Search the web for information about the problem
    4. Model alternatives. Provide a backdrop to evaluate different decision alternatives.
    5. Track actions. Evaluate the effectiveness of the recommended actions and feed the decisions back to both the operational systems and DW, against which published reporting will occur, thereby closing the loop.
- 1.8 Slowly Changing Dimensions [[Ralph Kimball]], DM Review, September 2008 and October 2008 (Location 1557) #[[slowly changing dimension (SCD)]]
  - The Original Three Types of [[slowly changing dimension (SCD)]] cover all the responses required for a revised or updated description of a dimension member (Location 1569)
    - [[type 1 slowly changing dimension (SCD)]]: Overwrite
    - [[type 2 slowly changing dimension (SCD)]]: Add a New Dimension Record
    - [[type 3 slowly changing dimension (SCD)]]: Add a New Field
- 1.9 Judge Your BI Tool through Your Dimensions – [[Ralph Kimball]], DM Review, November 2008 (Location 1650) #[[BI tools]] #[[BI tool selection]] #[[dimension tables]]
  - [[dimension tables]] implement the [[UI]] of your BI system: they provide the labels, the groupings, the drill-down paths.
  - This article describes [[requirements]] a BI tool should be able to meet with dimensions:
    - Assemble a BI query or report request by first selecting [[dimension table attributes]] and then selecting [[facts (dimensional modelling)]] to be summarized.
    - [[drilling down]] by adding a row header
    - Browse a dimension to preview permissible values and set constraints
    - Restrict the results of a dimension browse with other constraints in effect
    - [[drilling across]] by accumulating measures under labels defined by conformed dimension attributes
- 1.10 [[fact tables]] – [[Ralph Kimball]], DM Review, December 2008 (Location 1707)
  - [[fact tables]] contain the fundamental measurements of the enterprise and are the target of most data warehouse queries.
  - Design rules for [[fact tables]]:
    - Stay true to the [[fact table grain]] – take care in defining the [[grain (dimensional modelling)]] – what a single record in the fact table represents. This is the first and most important design step. It ensures the [[foreign keys]] in the fact table are grounded and precise.
    - Build up from the lowest possible [[fact table grain]]. This ensures you have the most complete set of [[dimension tables]] that can describe the fact table and enables detailed [[drilling down]] for the user.
  - 3 types of [[fact tables]]: (Location 1741)
    - [[transaction grain [[fact table]]]]: Measurement taken at a single instance (e.g. each cash register beep). Transactions can happen after a millisecond or next month or never – they’re unpredictably sparse or dense.
    - [[periodic snapshot grain [[fact table]]]]: Facts cover a predefined span of time. Powerful guarantee: all reporting entities will appear in each snapshot, even if there is no activity – it’s predictably dense and applications can rely on certain key combinations being available.
    - [[accumulating snapshot grain [[fact table]]]]: Rows represent a predictable process with a well-defined beginning and end (e.g. order processing, claims processing).
- 1.11 Exploit Your [[fact tables]] – [[Ralph Kimball]], DM Review, January/February 2009 (Location 1765)
  - This article describes basic ways to exploit the 3 main fact table designs in the front room and in the back room.
  - Front Room: [[aggregate navigation]] – choosing to give the user pre-aggregated data at run time, without without the end user knowing the difference. Seamlessly provide aggregated and detailed atomic data.
  - Front Room: [[drilling across]] Multiple Fact Tables at Different Grains – you can do this as long as you choose [[conformed dimensions]] for the answer set row headers that exist for all the fact tables in your integrated query.
  - Front Room: Exporting Constraints to Different Business Processes – building connections to other [[business process subject area]] in the [[UI]] so you can explore related data in a single click or swipe.
  - Back Room: [[fact table surrogate keys (FSKs)]] – sometimes you want to do this for one of the following benefits
    - Uniquely and immediately identify single fact records.
    - FSKs assigned sequentially so a load job inserting new records will have FSKs in a contiguous range.
    - An FSK allows updates to be replaced by insert-deletes.
    - An FSK can become a foreign key in a fact table at a lower grain.

Month: December 2022

What are your “Big Projects”?

Characteristics of the Big Project

Why Pursue a Big Project?

Examples

Conclusion

Notes on The Kimball Group Reader Chapter 1: The Reader at a Glance