258 real exam-style questions from the official question bank. Choose your topics, set the count, and start quizzing.
Questions
258
Topics
8
Multi-Answer
14
🎓 Choose Mode
🎯 Select Topics
Choose one or more topics, or leave all unselected to include every topic.
⚙️ Quiz Options
Order
Number of Questions
max: 258
📋 Exam Simulation Settings
90
Questions
90
Minutes
75%
Passing Score
Questions are randomly selected from all 8 topic areas. No feedback is shown during the exam. Full explanations are available in the review at the end.
Question 1 of 90 · Score: 0 / 0 · ⏱ 0:00
📚 Training
✅ Why this is correct
❌ Why the others are wrong
Q#
Quiz Complete!
📊 Score by Topic
❌ Incorrect Answers
📖 Full Question Review
Click any question to expand the explanation for why each answer is correct or incorrect.
Terminology Reference
Plain-language definitions for every key term across all 5 exam domains. Click a domain to expand, or search for any term.
🔍
No terms match your search.
Official Acronym Flashcards
All 26 acronyms from the official CompTIA DA0-001 exam objectives. Master every one before exam day.
Total
26
Source
Official
🃏 Acronym Flashcards — Click to Flip
1 / 26
Acronym
ETL
Stands For
Extract, Transform, Load
📋 Full Acronym Reference
⚡ Quick-Match Quiz — Acronyms
Chapter Notes
Your personal notes for this chapter
📝 My Notes
CompTIA Data+ Study Buddy
Your all-in-one study companion for the DA0-001 exam. Work through all 8 chapters with topics, flashcards, quizzes, and personal notes.
📚
8
Total Chapters
🃏
40+
Flashcards
✅
0
Chapters Studied
🎯
—
Quiz Score
All Chapters
Ch 1 · Today's Data Analyst
Introduction to analytics roles, processes, and the data landscape.
Analytics Process · Descriptive · Predictive · AI/ML
Ch 2 · Understanding Data
Data types, structures, and common file formats used in analytics.
Structured Data · JSON · XML · Data Types
Ch 3 · Databases & Data Acquisition
Relational models, NoSQL, OLTP/OLAP, ETL/ELT, and SQL fundamentals.
SQL · ETL · OLAP · NoSQL
Ch 4 · Data Quality
Identifying and resolving data quality issues; manipulation techniques.
Reasonable expectations · Data profiling · Data audits
Objective 5.3
Explain master data management (MDM) concepts.
Processes
Consolidation of multiple data fields
Standardization of data field names
Data dictionary
Compliance with policies and regulations
Streamline data access
Circumstances for MDM
Mergers and acquisitions
📋 Exam Format
90 questions total
90 minutes to complete
Passing score: 675 (scale 100–900)
Performance-based assessment format
Multiple-choice, fill-in-the-blank
Multiple-response, drag-and-drop
Image-based problems
💡 Exam Tips
CompTIA includes vague questions — use logic
Some questions may have two correct answers
Item seeding: some questions are unscored
18–24 months hands-on experience recommended
Focus on scenario-based learning
Review all 5 domains proportionally
Data Mining (25%) is the heaviest domain
Chapter 1
Today's Data Analyst
Introduction to the world of analytics — the roles, tools, processes, and techniques that define modern data work.
🎯 Exam Objectives Covered
Domain 1.0 — Data Concepts & Environments (15%)
Domain 2.0 — Data Mining (25%)
Domain 3.0 — Data Analysis (23%)
Domain 4.0 — Visualization (23%)
Domain 5.0 — Data Governance, Quality & Controls (14%)
This chapter introduces all 5 domains at a high level
📖 Key Topics
Topic 1 of 7
Loading topics…
0 of 7 topics covered
🏋️ Training Scenarios
Click any scenario to reveal the answer.
▶📌 A retail company wants to understand why sales dipped last quarter. Which analytics type applies — descriptive, predictive, or prescriptive?
✅ Answer: Descriptive analytics — it describes what happened (sales drop) and examines the past. It does not predict or recommend action.
▶📌 A logistics company wants to recommend optimal shipping routes to minimise delivery time and cost. Which analytics type is this?
✅ Answer: Prescriptive analytics — it recommends the best action to take (which route to use). It combines predictions and optimisation algorithms to produce actionable recommendations.
▶📌 A bank wants to identify which loan applicants are likely to default in the next 12 months. Which analytics type applies?
✅ Answer: Predictive analytics — it uses historical data and models to forecast a future outcome (probability of default). It does not recommend action, only predicts likelihood.
▶📌 List the 5 steps of the analytics process in order.
✅ Answer: 1. Data Acquisition → 2. Cleaning & Manipulation → 3. Analysis → 4. Visualization → 5. Reporting & Communication. This sequence appears directly on the CompTIA Data+ exam.
▶📌 What is the difference between a Data Analyst and a Data Scientist?
✅ Answer: A Data Analyst focuses on describing and interpreting existing data through reporting, dashboards, and SQL queries. A Data Scientist builds predictive and statistical models using machine learning algorithms, typically with Python or R.
▶📌 A streaming music service suggests songs a user might like based on their listening history. Which analytics type drives this feature?
✅ Answer: Prescriptive analytics (recommendation engine) — it recommends specific songs to listen to. This is built on predictive models that estimate what the user will enjoy, then prescribes what to play next.
▶📌 A marketing manager receives a PDF report every Monday showing last week's campaign KPIs. What type of report is this?
✅ Answer: A recurring static report — it is generated on a fixed weekly schedule and delivered in a non-interactive format (PDF). It is descriptive in nature, summarising historical performance.
▶📌 Which analytics tool would be most appropriate for a business user who needs to build an interactive dashboard to share with executives without coding?
✅ Answer: A Business Intelligence platform like Tableau or Microsoft Power BI — they provide drag-and-drop dashboard creation, interactive filters, and publishing capabilities without requiring programming skills.
🧪 Practice Labs
🔢 Put the Analytics Process in Order
Click & Order
Use the ↑ ↓ buttons to arrange the five steps in the correct sequence.
🎯 Match Analytics Types to Scenarios
Click & Match
Click a card to select it (it highlights), then click the correct category zone to place it.
🃏 Flashcards — Click to Flip
1 / 10
Term
Descriptive Analytics
Definition
Analyzes historical data to understand what has happened in the past.
❓ Quick Quiz
📝 My Study Notes
Chapter 2
Understanding Data
Explores data types, structures, and file formats — the foundational vocabulary every data analyst must know.
🎯 Exam Objectives Covered
Obj 1.2 — Compare and contrast different data types
▶📌 A dataset has columns: CustomerID (integer), Name (text), Email (text), SignupDate (date), OrderCount (integer). Classify each by data type.
✅ Answer: CustomerID: Numeric (discrete/integer). Name: Alphanumeric/Text. Email: Alphanumeric/Text. SignupDate: Date/Time. OrderCount: Numeric (discrete/integer). All are structured data in defined columns.
▶📌 You receive IoT sensor logs as pipe-delimited text files: "2024-01-15|23.7|OK". What data structure type is this?
✅ Answer: Structured data in a flat/delimited file format. Each record has a consistent structure (timestamp | value | status) with a known delimiter. It is not truly unstructured — it has implicit structure, even without a formal schema.
▶📌 You need to exchange data between two web APIs. Which file format is most appropriate and why?
✅ Answer: JSON — it is the standard for REST API data exchange. It supports nested structures, is human-readable, lightweight, and natively understood by all modern programming languages and web frameworks.
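A quick Python sketch of that exchange, using the standard library json module (the field names and values here are invented for illustration):

```python
import json

# A nested order record as one API might send it (hypothetical fields)
payload = {
    "order_id": 1001,
    "customer": {"name": "Ada", "email": "ada@example.com"},
    "items": [{"sku": "A1", "qty": 2}],
}

encoded = json.dumps(payload)   # serialize to a JSON string for the HTTP body
decoded = json.loads(encoded)   # parse it back into native Python objects

print(decoded["customer"]["name"])   # nested structures survive the round trip
```

The same round trip works in any modern language, which is exactly why JSON is the default for API data exchange.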
▶📌 Is "Number of website visits per day" discrete or continuous? What about "Page load time in seconds"?
✅ Answer: Visits per day: Discrete — you cannot have 3.7 visits; it is a whole number count. Page load time: Continuous — it can be 1.237 seconds, 2.5 seconds, any decimal value within a range.
▶📌 A data warehouse stores sales data in a fact table linked to Product, Store, Date, and Customer dimension tables. What schema design is this?
✅ Answer: Star schema — one central fact table (sales transactions with numeric measures) surrounded by dimension tables (Product, Store, Date, Customer). This is the standard design for OLAP/data warehouse analytics.
▶📌 What is the difference between strong typing and weak typing in databases?
✅ Answer: Strong typing: the database strictly enforces data types — inserting text into a numeric column fails with an error. Weak typing: the system auto-coerces types, potentially allowing "5" (string) + 3 (integer) = 8. Relational databases are strongly typed; spreadsheets often use weak typing.
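Python itself behaves like a strongly typed column, which makes it a handy way to see the distinction (this is an illustration of the concept, not database code):

```python
# Strong typing: mixing text and integer is rejected outright
try:
    "5" + 3          # a weakly typed system would silently coerce this to 8
    coerced = True
except TypeError:
    coerced = False  # Python refuses the implicit conversion

print(coerced)            # False — no silent coercion happened
print(int("5") + 3)       # 8 — an explicit cast is required instead
```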
▶📌 A big data platform stores 10 TB of raw log files, images, video, and JSON from 50 different systems. What type of storage system fits best?
✅ Answer: A data lake — it stores raw data in native format (structured, semi-structured, and unstructured) without requiring a predefined schema. It is cost-effective for large volumes and supports schema-on-read when analysis is needed later.
▶📌 A "Customer Satisfaction Rating" field contains values: Excellent, Good, Fair, Poor. What type of data is this?
✅ Answer: Ordinal qualitative data — it is categorical (labels, not numbers) with a meaningful rank order (Excellent > Good > Fair > Poor). The gaps between categories are not necessarily equal, so arithmetic (calculating an average rating) is statistically inappropriate.
🧪 Practice Labs
📓 Jupyter Notebook Available
Download and run hands-on Python exercises for this chapter in Jupyter.
🗂️ Classify These Data Examples
Click & Match
Drag each item to the correct data structure category (Structured, Semi-structured, or Unstructured).
✏️ Fill in the Data Type Definitions
Fill in Blank
Type the correct CompTIA data type term for each definition.
🐍 Python: Exploring Data Types
Code Lab
Run Python cells to explore data types. Click ▶ Run to execute each cell.
🃏 Flashcards — Click to Flip
1 / 10
Term
Structured Data
Definition
Data organized in a defined schema with rows and columns — typically stored in relational databases or spreadsheets. Easy to query with SQL.
❓ Quick Quiz
📝 My Study Notes
Chapter 3
Databases & Data Acquisition
Covers the relational model, non-relational databases, OLTP/OLAP patterns, ETL/ELT, and core SQL operations.
🎯 Exam Objectives Covered
Obj 1.1 — Identify basic concepts of data schemas and dimensions (Relational & NoSQL; Data Mart/Warehouse/Lake; OLTP, OLAP; Star & Snowflake)
Obj 2.1 — Explain data acquisition concepts (ETL, ELT, Delta load, APIs, web scraping, surveys, sampling)
▶📌 A company runs 24/7 point-of-sale transactions across 500 stores. OLTP or OLAP?
✅ Answer: OLTP — it requires high-speed inserts and updates for individual transactions. The system must handle thousands of concurrent writes. OLAP would be used separately to analyse the accumulated transaction history.
▶📌 A data team loads daily order changes instead of reloading the entire 5-year history every night. What technique is this?
✅ Answer: Delta load (incremental load) — only records that are new or changed since the last load are processed. This is more efficient than full-refresh loads and reduces pipeline runtime and system load.
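A minimal Python sketch of the delta-load idea — the last_modified column, the watermark value, and the order records are all invented for the demo:

```python
from datetime import date

# Hypothetical order records, each stamped when it last changed
orders = [
    {"id": 1, "last_modified": date(2024, 1, 10)},
    {"id": 2, "last_modified": date(2024, 1, 15)},
    {"id": 3, "last_modified": date(2024, 1, 16)},
]

watermark = date(2024, 1, 14)   # timestamp of the previous successful load

# Delta load: pick up only rows changed since the watermark
delta = [row for row in orders if row["last_modified"] > watermark]

print([row["id"] for row in delta])   # only ids 2 and 3 are reprocessed
```

A real pipeline would persist the watermark after each run, but the filtering logic is the whole trick.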
▶📌 Write the SQL structure to retrieve all customers who placed orders in the last 30 days, grouped by customer, sorted by total spend descending.
✅ Answer: SELECT c.name, SUM(o.amount) AS total_spend FROM customers c INNER JOIN orders o ON c.id = o.customer_id WHERE o.order_date >= CURRENT_DATE - 30 GROUP BY c.name ORDER BY total_spend DESC. (Date arithmetic varies by dialect: CURRENT_DATE - 30 works in PostgreSQL, while MySQL uses DATE_SUB(CURDATE(), INTERVAL 30 DAY).)
▶📌 When would you choose ELT over ETL?
✅ Answer: Choose ELT when using a modern cloud data warehouse (Snowflake, BigQuery, Redshift) that has sufficient compute power to transform data in-place. ELT preserves raw data, enables iterative transformation with dbt, and avoids maintaining a separate transformation server.
▶📌 A social network stores user profiles with variable fields. What database type fits best?
✅ Answer: A NoSQL document store (like MongoDB) — it handles flexible, variable schemas naturally. Each user document can have different fields without requiring schema alteration or NULL columns for every optional field.
▶📌 What is the difference between a Data Warehouse and a Data Lake?
✅ Answer: Data Warehouse: stores structured, cleaned data with a defined schema; optimised for BI queries; schema-on-write. Data Lake: stores raw data in native format; cheap storage; schema-on-read. Warehouses serve BI; lakes serve data science and exploration.
▶📌 INNER JOIN vs LEFT JOIN — what is returned by each?
✅ Answer: INNER JOIN: returns only rows where both tables have matching values. LEFT JOIN: returns all rows from the left table plus matched rows from the right — unmatched right-side rows appear as NULL.
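The difference is easy to see with an in-memory SQLite database, matching this chapter's Python + SQLite code lab (the tables and values are made up for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (1, 50.0);   -- Grace has no orders
""")

inner = conn.execute(
    "SELECT c.name, o.amount FROM customers c "
    "INNER JOIN orders o ON c.id = o.customer_id").fetchall()
left = conn.execute(
    "SELECT c.name, o.amount FROM customers c "
    "LEFT JOIN orders o ON c.id = o.customer_id").fetchall()

print(inner)   # only the matched row: [('Ada', 50.0)]
print(left)    # both customers; Grace's unmatched side comes back as None/NULL
```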
▶📌 A fraud detection system needs to find accounts connected through shared addresses. What database type is optimal?
✅ Answer: A graph database (Neo4j, Amazon Neptune) — it stores relationships as first-class objects and can traverse networks of connections efficiently. Relational databases require complex self-joins for relationship traversal at scale.
🧪 Practice Labs
📓 Jupyter Notebook Available
Download and run hands-on Python exercises for this chapter in Jupyter.
🗄️ Match Database Concepts
Click & Match
Drag each item to its correct database concept category.
⚙️ Put the ETL Steps in Order
Click & Match
Drag the ETL steps into the correct order.
🐍 Python + SQLite
Code Lab
Run Python cells to explore SQLite. Click ▶ Run to execute each cell.
🃏 Flashcards — Click to Flip
1 / 10
Term
ETL
Definition
Extract, Transform, Load — a pipeline that extracts data from sources, transforms it (clean, format, enrich), then loads it into a target data warehouse.
❓ Quick Quiz
📝 My Study Notes
Chapter 4
Data Quality
Learn to identify data quality challenges and apply manipulation techniques to clean, transform, and validate data.
🎯 Exam Objectives Covered
Obj 2.2 — Identify common reasons for cleansing and profiling (duplicate, missing, invalid, outliers, spec mismatch)
▶📌 You want to scale the "Revenue" column to a 0-1 range for use in a machine learning model. Which technique?
✅ Answer: Min-max normalization: (value - min) / (max - min). This ensures no variable dominates due to scale differences.
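The formula translates directly into a few lines of Python (the revenue values are illustrative):

```python
def min_max_normalize(values):
    """Scale a list of numbers to the 0-1 range: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

revenue = [100, 150, 200, 300]
print(min_max_normalize(revenue))   # [0.0, 0.25, 0.5, 1.0]
```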
▶📌 A "Gender" field contains: M, Male, male, 1, F, Female, female, 0. What technique fixes this?
✅ Answer: Recoding (standardization) — mapping all variants to a single consistent value set (e.g., M and F).
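One way to sketch recoding in Python — note that treating 1 as M and 0 as F is an assumption about what those codes mean in this particular dataset:

```python
# Every observed variant maps to one standard code
# (the 1 -> M and 0 -> F assignments are assumptions for this example)
GENDER_MAP = {
    "M": "M", "Male": "M", "male": "M", "1": "M",
    "F": "F", "Female": "F", "female": "F", "0": "F",
}

raw = ["M", "female", "1", "Female"]
recoded = [GENDER_MAP[value] for value in raw]
print(recoded)   # ['M', 'F', 'M', 'F']
```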
▶📌 What is the difference between data profiling and data auditing?
✅ Answer: Profiling is exploratory — generating column statistics to discover quality issues. Auditing is systematic and scheduled — checking data against documented business rules and producing formal reports.
🧪 Practice Labs
📓 Jupyter Notebook Available
Download and run hands-on Python exercises for this chapter in Jupyter.
✏️ Data Quality Dimensions
Fill in Blank
Type the correct data quality term for each definition.
🧹 Match Data Issues to Fixes
Click & Match
Drag each data issue to its correct fix or solution.
🐍 Python: Data Cleaning Pipeline
Code Lab
Run Python cells to practice data cleaning. Click ▶ Run to execute each cell.
🃏 Flashcards — Click to Flip
1 / 10
Term
Imputation
Definition
The process of replacing missing data values with substitute estimates, such as the mean, median, mode, or values derived from regression models.
❓ Quick Quiz
📝 My Study Notes
Chapter 5
Data Analysis & Statistics
Master descriptive and inferential statistics — the mathematical foundation of data analysis and decision-making.
🎯 Exam Objectives Covered
Obj 3.1 — Explain the purpose of a variety of statistical methods (measures of central tendency, dispersion, distribution, hypothesis testing)
Obj 3.2 — Explain the purpose of data sampling techniques (random, stratified, cluster, systematic, convenience)
Obj 3.3 — Explain the purpose of various analysis and reporting techniques (regression, correlation, trend, cohort)
📖 Key Topics
Topic 1 of 8
Loading topics…
0 of 8 topics covered
🏋️ Training Scenarios
Click any scenario to reveal the answer.
▶📌 A dataset of house prices has mean=$450K but median=$320K. What does this tell you?
✅ Answer: The distribution is right-skewed (positively skewed) — a few very expensive houses pull the mean up significantly above the median. Use median for a typical price.
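The effect is easy to reproduce with the standard library statistics module (the prices below are invented, in $K):

```python
import statistics

# Mostly modest houses, plus a couple of mansions on the right tail
prices = [250, 280, 300, 320, 340, 360, 900, 1400]

mean = statistics.mean(prices)
median = statistics.median(prices)
print(mean, median)   # the two outliers pull the mean well above the median
```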
▶📌 You conduct an A/B test and get p-value = 0.03 with α=0.05. What do you conclude?
✅ Answer: Reject the null hypothesis — the p-value (0.03) is less than the significance level (0.05), indicating the observed difference is statistically significant and unlikely due to chance.
▶📌 What does a Z-score of -2.5 mean?
✅ Answer: The value is 2.5 standard deviations below the mean. It is an unusual observation. Values with |Z| > 3 are typically flagged as outliers.
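The calculation itself is one line; the mean and standard deviation below are assumed values for the example:

```python
def z_score(value, mean, stdev):
    """Number of standard deviations a value sits from the mean."""
    return (value - mean) / stdev

# Example: population mean 100, standard deviation 10
print(z_score(75, 100, 10))   # -2.5, i.e. 2.5 standard deviations below the mean
```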
▶📌 Correlation coefficient r = 0.92 between advertising spend and revenue. Can you conclude advertising causes revenue increases?
✅ Answer: No — correlation does not imply causation. A confounding variable (e.g., seasonality) may drive both. Only a randomised controlled experiment can establish causation.
▶📌 What is the difference between Type I and Type II errors?
✅ Answer: Type I (false positive): rejecting a true null hypothesis — detecting an effect that doesn't exist. Type II (false negative): failing to reject a false null — missing a real effect. α controls Type I error rate.
▶📌 You want to survey 1,000 customers from 5 regions ensuring proportional representation from each. Which sampling method?
✅ Answer: Stratified sampling — divide the population into strata (regions) and sample proportionally from each. Ensures all groups are represented.
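A proportional draw per stratum can be sketched with the random module — the region names, sizes, and sample size below are invented:

```python
import random

random.seed(42)   # reproducible illustration

# Hypothetical customer ids grouped by region (the strata)
strata = {
    "North": list(range(0, 500)),    # 50% of the population
    "South": list(range(500, 800)),  # 30%
    "West":  list(range(800, 1000)), # 20%
}

sample_size = 100
population = sum(len(ids) for ids in strata.values())

# Sample each stratum in proportion to its share of the population
sample = {
    region: random.sample(ids, round(sample_size * len(ids) / population))
    for region, ids in strata.items()
}

print({region: len(ids) for region, ids in sample.items()})
# {'North': 50, 'South': 30, 'West': 20} — each region keeps its share
```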
▶📌 A company tracks monthly revenue for 3 years. What analysis technique identifies a long-term upward trend vs. seasonal fluctuations?
✅ Answer: Time series decomposition — separating the series into Trend, Seasonality, Cyclical, and Residual components. Moving averages smooth out seasonal and irregular components to reveal the trend.
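The moving-average smoothing mentioned in the answer can be sketched in a few lines (the revenue figures are made up):

```python
def moving_average(series, window):
    """Smooth a series by averaging each run of `window` consecutive points."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]

# Noisy monthly values with an underlying upward trend
revenue = [10, 14, 9, 16, 13, 20, 15, 22]
smoothed = moving_average(revenue, 3)
print(smoothed)   # rises from 11.0 to 19.0 — the trend emerges from the noise
```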
▶📌 R² = 0.85 for a regression model predicting sales from advertising. What does this mean?
✅ Answer: The model explains 85% of the variance in sales. 15% of variance remains unexplained by the model. R² = 1 would be a perfect fit; R² = 0 means the model explains nothing.
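R² can be computed from scratch as 1 - SS_res / SS_tot, in the spirit of this chapter's "Statistics from Scratch" lab (the sales figures are illustrative):

```python
def r_squared(actual, predicted):
    """Share of variance in `actual` explained by `predicted`: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_tot = sum((y - mean) ** 2 for y in actual)          # total variance
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))  # residual variance
    return 1 - ss_res / ss_tot

sales = [10, 20, 30, 40]
perfect = [10, 20, 30, 40]
rough = [12, 18, 33, 37]

print(r_squared(sales, perfect))   # 1.0 — every prediction is exact
print(r_squared(sales, rough))     # below 1.0 — residual variance remains
```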
🧪 Practice Labs
📓 Jupyter Notebook Available
Download and run hands-on Python exercises for this chapter in Jupyter.
✏️ Statistics Formulas & Concepts
Fill in Blank
Type the correct statistics term for each definition or formula.
🐍 Python: Statistics from Scratch
Code Lab
Run Python cells to compute statistics from scratch. Click ▶ Run to execute each cell.
🃏 Flashcards — Click to Flip
1 / 10
Term
Null Hypothesis (H₀)
Definition
A statement assuming no effect, no difference, or no relationship between variables — the default position that must be disproven through evidence.
❓ Quick Quiz
📝 My Study Notes
Chapter 6
Data Analytics Tools
Survey the landscape of tools — from spreadsheets and programming languages to BI suites and ML platforms.
🎯 Exam Objectives Covered
Obj 4.1 — Given a scenario, translate business requirements to form a report (report types, delivery methods, design elements)
Obj 4.2 — Given a scenario, use appropriate design components for reports and dashboards (charts, KPIs, filters)
Obj 4.3 — Given a scenario, use appropriate methods for dashboard development workflow (wireframe, mock-up, prototyping)
📖 Key Topics
Topic 1 of 7
Loading topics…
0 of 7 topics covered
🏋️ Training Scenarios
Click any scenario to reveal the answer.
▶📌 A business user wants to explore sales data by region, product, and time period without asking IT each time. What tool category fits best?
✅ Answer: A self-service BI platform (Tableau, Power BI) — provides interactive filters, drill-down, and slicers that let users explore data independently without technical skills.
▶📌 You need to build a machine learning model to predict customer churn, handling feature engineering, model training, and evaluation. Which tool is most appropriate?
✅ Answer: Python with pandas and scikit-learn — provides full programmatic control for complex ML pipelines, feature engineering, cross-validation, and model evaluation.
▶📌 A pharmaceutical company needs to run clinical trial statistical analysis that regulators will accept. Which tool?
✅ Answer: SAS — it is the gold standard for regulated industries, FDA-accepted for clinical trial analysis, and has built-in audit trails for regulatory compliance.
▶📌 A data analyst needs to query a Snowflake data warehouse directly. What interface do they use?
✅ Answer: SQL — either via Snowflake's web UI, a SQL client (DBeaver), or a BI tool that connects directly to Snowflake using a JDBC/ODBC connector.
▶📌 What is the difference between Python and R for data analysis?
✅ Answer: Python: general-purpose programming language with excellent data analysis libraries (pandas, scikit-learn); better for ML, automation, web scraping, and production deployment. R: statistics-first language with extensive academic statistical packages; better for advanced statistics, academic research, and publication-quality charts.
▶📌 When should you use Excel vs. Python for data analysis?
✅ Answer: Excel: best for small-medium datasets (<1M rows), ad hoc analysis, sharing with non-technical users, and quick calculations. Python: best for large datasets, reproducible pipelines, automation, machine learning, and processing data that changes regularly.
▶📌 A company wants to build cloud analytics that automatically scales with query demand and charges only per query. Which service fits?
✅ Answer: A serverless query service — AWS Athena (queries S3 data) or Google BigQuery. These charge per TB scanned and require no infrastructure management.
🧪 Practice Labs
📓 Jupyter Notebook Available
Download and run hands-on Python exercises for this chapter in Jupyter.
🛠️ Match Tools to Their Use Case
Click & Match
Drag each analytics tool to its correct use case.
🐍 Python Analytics Pipeline
Code Lab
Run Python cells to build an analytics pipeline. Click ▶ Run to execute each cell.
🃏 Flashcards — Click to Flip
1 / 10
Term
tidyverse
Definition
A collection of R packages (including ggplot2, dplyr, tidyr) designed to facilitate data manipulation and visualization using consistent, readable syntax.
❓ Quick Quiz
📝 My Study Notes
Chapter 7
Data Visualization with Reports & Dashboards
Learn how to translate business requirements into compelling visualizations, reports, and interactive dashboards.
🎯 Exam Objectives Covered
Obj 4.3 — Dashboard workflow: wireframe, mock-up, prototype, stakeholder review, data story
📖 Key Topics
Topic 1 of 8
Loading topics…
0 of 8 topics covered
🏋️ Training Scenarios
Click any scenario to reveal the answer.
▶📌 Monthly website traffic over the past 2 years. Which chart type?
✅ Answer: Line chart — it shows a continuous trend over time with many data points. Each point connects to the next, making rate of change and trend direction immediately visible.
▶📌 Revenue breakdown by product category (6 categories). Which chart?
✅ Answer: Horizontal bar chart — comparing discrete categories. Bars are easier to compare than pie slices, especially with 6+ categories. Sort by value descending for maximum clarity.
▶📌 Show how gross revenue becomes net profit through a series of additions and subtractions (returns, COGS, operating expenses). Which chart?
✅ Answer: Waterfall chart — each bar "floats" at the level where the previous value ended, making the contribution of each factor intuitively clear.
▶📌 Compare the distribution of exam scores across three different training cohorts. Which chart?
✅ Answer: Box plot — shows min, Q1, median, Q3, max for each group simultaneously, enabling direct comparison of distributions including outliers.
▶📌 You have 20 KPIs to display for an executive. A user asks you to put them all on one dashboard. What should you advise?
✅ Answer: Recommend limiting to 5-7 most critical KPIs per dashboard view. Too many KPIs create cognitive overload. Use drill-down to secondary dashboards for additional metrics. Prioritise the metrics most directly linked to executive decisions.
▶📌 A dashboard filter lets users select time period, region, and product category simultaneously. What is this feature called?
✅ Answer: A slicer (or filter panel) — interactive controls that allow users to dynamically subset the data displayed in all connected charts on the dashboard simultaneously.
▶📌 Relationship between 50 products' advertising spend (X axis) and revenue (Y axis). Which chart?
✅ Answer: Scatter plot — each product is one point at coordinates (ad spend, revenue). A trend line can be added to show the direction and strength of correlation.
▶📌 What is the difference between a static report, an interactive report, and an ad hoc report?
✅ Answer: Static: fixed PDF/printout, no interaction. Interactive: live dashboard with filters and drill-down (Tableau, Power BI). Ad hoc: custom one-time report built by an analyst to answer a specific question not covered by existing reports.
🧪 Practice Labs
📓 Jupyter Notebook Available
Download and run hands-on Python exercises for this chapter in Jupyter.
📊 Pick the Right Chart
Click & Match
Drag each scenario to the chart type that best represents it.
✏️ Dashboard & Report Vocabulary
Fill in Blank
Type the correct visualization or reporting term for each definition.
🐍 Python: Chart Logic
Code Lab
Run Python cells to practice chart selection logic. Click ▶ Run to execute each cell.
🃏 Flashcards — Click to Flip
1 / 10
Term
Waterfall Chart
Definition
A visualization that shows the cumulative effect of sequentially introduced positive or negative values — ideal for financial analysis (e.g., profit/loss breakdown).
❓ Quick Quiz
📝 My Study Notes
Chapter 8
Data Governance
Understand the policies, roles, and frameworks that ensure data is secure, compliant, and used appropriately.
🎯 Exam Objectives Covered
Obj 5.1 — Summarise important data governance concepts (roles, data classification, policies, master data management)
Obj 5.2 — Apply data quality control concepts (validation, profiling, auditing, data lineage)
▶📌 A business executive is accountable for a customer dataset — approving who can access it and setting the retention policy. What is their governance role?
✅ Answer: Data Owner — accountable for the data's use, access approvals, and lifecycle policies. They don't manage data day-to-day but are ultimately responsible.
▶📌 A database administrator manages backups, storage systems, and security controls for a dataset. What is their role?
✅ Answer: Data Custodian — responsible for the physical infrastructure (storage, backups, security). Not accountable for data content or quality.
▶📌 Your organisation must notify customers within a specific timeframe after a personal data breach. Which regulation mandates this?
✅ Answer: GDPR (72 hours to notify the supervisory authority, then without undue delay for affected individuals) applies in the EU. HIPAA requires notification within 60 days for US healthcare breaches.
▶📌 A developer needs to test an application using realistic customer data without exposing real PII. What technique should be used?
✅ Answer: Data masking — replace real PII (names, emails, SSNs) with fictitious but realistic values. The data structure and format are preserved but real values are protected.
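A toy masking routine in Python — the fake-name list, field names, and example record are all invented, and a production system would use a dedicated masking tool rather than this sketch:

```python
import random

random.seed(7)   # deterministic for the illustration

FAKE_NAMES = ["Alex Doe", "Sam Roe", "Pat Poe"]   # hypothetical substitute values

def mask_record(record):
    """Replace PII with fictitious values while preserving structure and format."""
    fake = random.choice(FAKE_NAMES)
    return {
        "name": fake,
        "email": fake.lower().replace(" ", ".") + "@example.com",
        "ssn": "XXX-XX-" + record["ssn"][-4:],   # keep only the last four digits
    }

real = {"name": "Jane Smith", "email": "jane@corp.com", "ssn": "123-45-6789"}
masked = mask_record(real)
print(masked)   # same shape and formats as the original, but no real PII
```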
▶📌 A company stores customer records in 5 different systems, all with slightly different versions of the same customer. What governance practice creates a single authoritative record?
✅ Answer: Master Data Management (MDM) — the process of matching, merging, and creating a "golden record" that becomes the single source of truth for each customer entity across all systems.
▶📌 "Financial reporting data must be retained for 7 years then securely disposed of." What governance policy area is this?
✅ Answer: Data retention and disposal policy — defines how long each data type must be kept (driven by legal/regulatory requirements) and how it must be destroyed at the end of its lifecycle.
▶📌 A financial analyst exports a spreadsheet of customer PII to their personal email to work from home. A governance tool detects and blocks this. What technology is this?
✅ Answer: Data Loss Prevention (DLP) — monitors data flows and blocks or alerts on potential exfiltration of sensitive data to unauthorised destinations.
▶📌 What is the difference between data anonymisation and pseudonymisation under GDPR?
✅ Answer: Anonymisation: irreversibly removes all identifying information — GDPR no longer applies (not personal data). Pseudonymisation: replaces identifiers with pseudonyms (hashes) but can be reversed with the key — GDPR still applies (still personal data).
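One common pseudonymisation sketch is a salted hash; the salt (or an equivalent lookup table) is the "additional information" that GDPR says must be kept separately so re-identification stays possible but controlled. The salt and identifier below are illustrative:

```python
import hashlib

SECRET_SALT = "rotate-me"   # hypothetical key material; must be stored separately

def pseudonymise(identifier):
    """Replace an identifier with a stable pseudonym for the same input."""
    digest = hashlib.sha256((SECRET_SALT + identifier).encode()).hexdigest()
    return digest[:12]

p1 = pseudonymise("jane@corp.com")
p2 = pseudonymise("jane@corp.com")
print(p1 == p2)   # True — the same person always maps to the same pseudonym
print(p1)         # the pseudonym itself reveals nothing about the original value
```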
🧪 Practice Labs
📓 Jupyter Notebook Available
Download and run hands-on Python exercises for this chapter in Jupyter.
🔒 Governance Roles & Responsibilities
Click & Match
Drag each governance role to its correct responsibility.
✏️ Data Governance Vocabulary
Fill in Blank
Type the correct data governance term for each definition.
🐍 Python: Governance in Practice
Code Lab
Run Python cells to explore governance concepts in code. Click ▶ Run to execute each cell.
🃏 Flashcards — Click to Flip
1 / 10
Term
Data Steward
Definition
The role responsible for day-to-day data governance activities — ensuring data quality, consistent definitions, and compliance with governance policies within a business area, acting as the bridge between business users and technical teams.
❓ Quick Quiz
📝 My Study Notes
📦 Question Bank
Instructor question management & import pipeline
— Published
— Drafts
— Pending Review
— Failed Jobs
Recent Import Jobs
Loading…
⬆ Import Questions
Upload a PDF, Word document, or scanned exam sheet
📄
Drag & drop a file here, or
Accepted: PDF, DOCX, PNG, JPG, TIFF · Max 50 MB
Uploading…
🔬 Review Studio
Reviewing import job
No items loaded.
Select a question from the list to edit it.
📝 Draft Queue
Questions pending review across all import jobs
Loading…
🔍 Browse Question Bank
Search, filter, edit, and manage published questions