Dashboard

Practice Quiz

258 real exam-style questions from the official question bank. Choose your topics, set the count, and start quizzing.

Questions: 258
Topics: 8
Multi-Answer: 14

Terminology Reference

Plain-language definitions for every key term across all 5 exam domains. Click a domain to expand, or search for any term.

Official Acronym Flashcards

All 26 acronyms from the official CompTIA DA0-001 exam objectives. Master every one before exam day.

Total: 26
Source: Official

🃏 Acronym Flashcards — Click to Flip

Card 1 of 26
Acronym: ETL
Stands for: Extract, Transform, Load

📋 Full Acronym Reference

Quick-Match Quiz — Acronyms

CompTIA Data+ Study Buddy

Your all-in-one study companion for the DA0-001 exam. Work through all 8 chapters with topics, flashcards, quizzes, and personal notes.

📚 Total Chapters: 8
🃏 Flashcards: 40+

All Chapters
Ch 1: Today's Data Analyst
Introduction to analytics roles, processes, and the data landscape.
Analytics Process, Descriptive, Predictive, AI/ML
Ch 2: Understanding Data
Data types, structures, and common file formats used in analytics.
Structured Data, JSON, XML, Data Types
Ch 3: Databases & Data Acquisition
Relational models, NoSQL, OLTP/OLAP, ETL/ELT, and SQL fundamentals.
SQL, ETL, OLAP, NoSQL
Ch 4: Data Quality
Identifying and resolving data quality issues; manipulation techniques.
Imputation, Normalization, Outliers, Validation
Ch 5: Data Analysis & Statistics
Descriptive & inferential statistics, hypothesis testing, regression.
Hypothesis Testing, Regression, Mean/Median
Ch 6: Data Analytics Tools
Spreadsheets, programming languages, BI suites, and ML platforms.
Python, R, Power BI, Tableau
Ch 7: Visualization & Dashboards
Report design, dashboard development, and visualization best practices.
Charts, Dashboards, Infographics, KPIs
Ch 8: Data Governance
Governance roles, access control, data classification, and MDM.
Data Steward, PII, MDM, Compliance

Exam Domains & Weightings

The DA0-001 exam covers five domains. Understanding the weight of each domain helps you prioritize your study time.

🎯 Domain Weightings

1.0 Data Concepts & Environments: 15%
2.0 Data Mining: 25%
3.0 Data Analysis: 23%
4.0 Visualization: 23%
5.0 Data Governance, Quality & Controls: 14%
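
One way to turn the weightings above into a study plan is to split a fixed number of hours in proportion to each domain's share of the exam. The sketch below is only an illustration: the 40-hour budget and the function name `allocate_hours` are assumptions, not part of the exam objectives.

```python
# Domain weights as listed above (percent of the exam).
DOMAIN_WEIGHTS = {
    "1.0 Data Concepts & Environments": 15,
    "2.0 Data Mining": 25,
    "3.0 Data Analysis": 23,
    "4.0 Visualization": 23,
    "5.0 Data Governance, Quality & Controls": 14,
}


def allocate_hours(total_hours, weights=DOMAIN_WEIGHTS):
    """Split total_hours across domains in proportion to exam weight."""
    total_weight = sum(weights.values())
    return {name: total_hours * w / total_weight for name, w in weights.items()}


# 40 hours is a hypothetical study budget chosen for illustration.
plan = allocate_hours(40)
for name, hours in plan.items():
    print(f"{name}: {hours:.1f} h")
```

A proportional split is only a starting point; it may make sense to shift hours toward whichever domains feel weakest after a first practice quiz.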
Exam Objectives
1.0 Data Concepts and Environments — 15% of exam
Objective 1.1
Identify basic concepts of data schemas and dimensions.
  • Databases
  • Relational
  • Non-relational
  • Data mart / data warehousing / data lake
  • Online transactional processing (OLTP)
  • Online analytical processing (OLAP)
  • Schema concepts
  • Snowflake
  • Star
  • Slowly changing dimensions
  • Keep current information
  • Keep historical and current information
Objective 1.2
Compare and contrast different data types.
  • Date
  • Numeric
  • Alphanumeric
  • Currency
  • Text
  • Discrete vs. continuous
  • Categorical / dimension
  • Images
  • Audio
  • Video
Objective 1.3
Compare and contrast common data structures and file formats.
  • Structures
  • Structured — defined rows/columns; key-value pairs
  • Unstructured — undefined fields; machine data
  • Data file formats
  • Text/Flat file — Tab delimited, Comma delimited
  • JavaScript Object Notation (JSON)
  • Extensible Markup Language (XML)
  • Hypertext Markup Language (HTML)
2.0 Data Mining — 25% of exam
Objective 2.1
Explain data acquisition concepts.
  • Integration
  • Extract, transform, load (ETL)
  • Extract, load, transform (ELT)
  • Delta load
  • Application programming interfaces (APIs)
  • Data collection methods
  • Web scraping
  • Public databases
  • Application programming interface (API) / web services
  • Survey
  • Sampling
  • Observation
Objective 2.2
Identify common reasons for cleansing and profiling datasets.
  • Duplicate data
  • Redundant data
  • Missing values
  • Invalid data
  • Non-parametric data
  • Data outliers
  • Specification mismatch
  • Data type validation
Objective 2.3
Given a scenario, execute data manipulation techniques.
  • Recoding data
  • Numeric
  • Categorical
  • Derived variables
  • Data merge
  • Data blending
  • Concatenation
  • Data append
  • Imputation
  • Reduction / aggregation
  • Transpose
  • Normalize data
  • Parsing / string manipulation
Objective 2.4
Explain common techniques for data manipulation and query optimization.
  • Data manipulation
  • Filtering
  • Sorting
  • Date functions
  • Logical functions
  • Aggregate functions
  • System functions
  • Query optimization
  • Parametrization
  • Indexing
  • Temporary table in the query set
  • Subset of records
  • Execution plan
3.0 Data Analysis — 23% of exam
Objective 3.1
Given a scenario, apply the appropriate descriptive statistical methods.
  • Measures of central tendency
  • Mean
  • Median
  • Mode
  • Distribution
  • Measures of dispersion
  • Range, Max, Min
  • Variance
  • Standard deviation
  • Frequencies / percentages
  • Percent change
  • Percent difference
  • Confidence intervals
Objective 3.2
Explain the purpose of inferential statistical methods.
  • t-tests
  • Z-score
  • p-values
  • Chi-squared
  • Hypothesis testing
  • Type I error
  • Type II error
  • Simple linear regression
  • Correlation
Objective 3.3
Summarize types of analysis and key analysis techniques.
  • Process to determine type of analysis
  • Review / refine business questions
  • Determine data needs and sources
  • Scoping / gap analysis
  • Type of analysis
  • Trend analysis
  • Comparison of data over time
  • Performance analysis
  • Tracking measurements against defined goals
  • Basic projections to achieve goals
  • Link analysis — connection of data points or pathway
  • Exploratory data analysis
  • Use of descriptive statistics to determine observations
Objective 3.4
Identify common data analytics tools.

Note: The intent of this objective is NOT to test specific vendor feature sets nor the purposes of the tools.

  • Structured Query Language (SQL)
  • Python
  • Microsoft Excel
  • R
  • RapidMiner
  • IBM Cognos
  • IBM SPSS Modeler
  • IBM SPSS
  • SAS
  • Tableau
  • Power BI
  • Qlik
  • MicroStrategy
  • BusinessObjects
  • Apex, Dataroma, Domo, AWS QuickSight, Stata, Minitab
4.0 Visualization — 23% of exam
Objective 4.1
Given a scenario, translate business requirements to form a report.
  • Data content
  • Filtering
  • Views
  • Date range
  • Frequency
  • Audience for report
  • Distribution list
Objective 4.2
Given a scenario, use appropriate design components for reports and dashboards.
  • Report cover page
  • Instructions
  • Summary
  • Observations and insights
  • Design elements
  • Color schemes, Layout
  • Font size and style
  • Key chart elements: Titles, Labels, Legends
  • Corporate reporting standards / style guide
  • Branding, Color codes, Logos/trademarks, Watermark
  • Report cover page (cont.)
  • Executive summary
  • FAQs
  • Appendix
  • Documentation elements
  • Version number
  • Reference data sources
  • Reference dates
  • Report run date
  • Data refresh date
Objective 4.3
Given a scenario, use appropriate methods for dashboard development.
  • Dashboard considerations
  • Data sources and attributes, Field definitions
  • Dimensions and Measures
  • Continuous/live data feed vs. static data
  • Consumer types: C-level executives, Management, External vendors/stakeholders, General public, Technical experts
  • Development process
  • Mockup/wireframe → Layout/presentation
  • Flow/navigation → Data story planning
  • Approval granted → Develop dashboard → Deploy to production
  • Delivery considerations
  • Subscription, Scheduled delivery
  • Interactive (drill down / roll up)
  • Saved searches, Filtering
  • Static, Web interface
  • Dashboard optimization
  • Access permissions
Objective 4.4
Given a scenario, apply the appropriate type of visualization.
  • Line chart
  • Pie chart
  • Bubble chart
  • Scatter plot
  • Bar chart
  • Histogram
  • Waterfall
  • Heat map
  • Geographic map
  • Tree map
  • Stacked chart
  • Infographic
  • Word cloud
Objective 4.5
Compare and contrast types of reports.
  • Static vs. dynamic reports
  • Point-in-time
  • Real time
  • Ad-hoc / one-time report
  • Self-service / on demand
  • Recurring reports
  • Compliance reports (financial, health, safety)
  • Risk and regulatory reports
  • Operational reports (performance, KPIs)
  • Tactical / research report
5.0 Data Governance, Quality, and Controls — 14% of exam
Objective 5.1
Summarize important data governance concepts.
  • Access requirements
  • Role-based
  • User group-based
  • Data use agreements
  • Release approvals
  • Security requirements
  • Data encryption
  • Data transmission
  • De-identify data / data masking
  • Storage environment requirements
  • Shared drive vs. cloud based vs. local storage
  • Use requirements
  • Acceptable use policy
  • Data processing — Data deletion, Data retention
  • Entity relationship requirements
  • Record link restrictions, Data constraints, Cardinality
  • Data classification
  • Personally identifiable information (PII)
  • Personal health information (PHI)
  • Payment card industry (PCI)
  • Jurisdiction requirements
  • Impact of industry and governmental regulations
  • Data breach reporting — Escalate to appropriate authority
Objective 5.2
Given a scenario, apply data quality control concepts.
  • Circumstances to check for quality
  • Data acquisition / data source
  • Data transformation / intrahops: Pass through, Conversion
  • Data manipulation
  • Final product (report / dashboard)
  • Automated validation
  • Data field to data type validation
  • Number of data points
  • Data quality dimensions
  • Data consistency
  • Data accuracy
  • Data completeness
  • Data integrity
  • Data attribute limitations
  • Data quality rule and metrics
  • Conformity, Non-conformity, Rows passed, Rows failed
  • Methods to validate quality
  • Cross-validation, Sample/spot check
  • Reasonable expectations, Data profiling, Data audits
Objective 5.3
Explain master data management (MDM) concepts.
  • Processes
  • Consolidation of multiple data fields
  • Standardization of data field names
  • Data dictionary
  • Compliance with policies and regulations
  • Streamline data access
  • Circumstances for MDM
  • Mergers and acquisitions

📋 Exam Format

  • Maximum of 90 questions
  • 90 minutes to complete
  • Passing score: 675 (scale 100–900)
  • Performance-based assessment format
  • Multiple-choice, fill-in-the-blank
  • Multiple-response, drag-and-drop
  • Image-based problems

💡 Exam Tips

  • CompTIA includes vague questions — use logic
  • Some questions may seem to have two correct answers; choose the best fit for the scenario
  • Item seeding: some questions are unscored
  • 18–24 months hands-on experience recommended
  • Focus on scenario-based learning
  • Review all 5 domains proportionally
  • Data Mining (25%) is the heaviest domain
Chapter 1

Today's Data Analyst

Introduction to the world of analytics — the roles, tools, processes, and techniques that define modern data work.

🎯 Exam Objectives Covered

  • Domain 1.0 — Data Concepts & Environments (15%)
  • Domain 2.0 — Data Mining (25%)
  • Domain 3.0 — Data Analysis (23%)
  • Domain 4.0 — Visualization (23%)
  • Domain 5.0 — Data Governance, Quality & Controls (14%)
  • This chapter introduces all 5 domains at a high level

🏋️ Training Scenarios

📌 A retail company wants to understand why sales dipped last quarter. Which analytics type applies: descriptive, predictive, or prescriptive?
✅ Answer: Descriptive analytics. It describes what happened (the sales drop) by examining the past. It does not predict or recommend action.
📌 A logistics company wants to recommend optimal shipping routes to minimise delivery time and cost. Which analytics type is this?
✅ Answer: Prescriptive analytics. It recommends the best action to take (which route to use), combining predictions and optimisation algorithms to produce actionable recommendations.
📌 A bank wants to identify which loan applicants are likely to default in the next 12 months. Which analytics type applies?
✅ Answer: Predictive analytics. It uses historical data and models to forecast a future outcome (probability of default). It does not recommend action; it only predicts likelihood.
📌 List the 5 steps of the analytics process in order.
✅ Answer: 1. Data Acquisition → 2. Cleaning & Manipulation → 3. Analysis → 4. Visualization → 5. Reporting & Communication. This sequence appears directly on the CompTIA Data+ exam.
📌 What is the difference between a Data Analyst and a Data Scientist?
✅ Answer: A Data Analyst focuses on describing and interpreting existing data through reporting, dashboards, and SQL queries. A Data Scientist builds predictive and statistical models using machine learning algorithms, typically with Python or R.
📌 A streaming music service suggests songs a user might like based on their listening history. Which analytics type drives this feature?
✅ Answer: Prescriptive analytics (recommendation engine). It recommends specific songs to listen to. This is built on predictive models that estimate what the user will enjoy, then prescribes what to play next.
📌 A marketing manager receives a PDF report every Monday showing last week's campaign KPIs. What type of report is this?
✅ Answer: A recurring static report. It is generated on a fixed weekly schedule and delivered in a non-interactive format (PDF). It is descriptive in nature, summarising historical performance.
📌 Which analytics tool would be most appropriate for a business user who needs to build an interactive dashboard to share with executives without coding?
✅ Answer: A Business Intelligence platform such as Tableau or Microsoft Power BI. These provide drag-and-drop dashboard creation, interactive filters, and publishing capabilities without requiring programming skills.

🧪 Practice Labs

🔢 Put the Analytics Process in Order

Click & Order

Use the ↑ ↓ buttons to arrange the five steps in the correct sequence.

🎯 Match Analytics Types to Scenarios

Click & Match

Click a card to select it (it highlights), then click the correct category zone to place it.

🃏 Flashcards — Click to Flip

Card 1 of 10
Term: Descriptive Analytics
Definition: Analyzes historical data to understand what has happened in the past.

Chapter 2

Understanding Data

Explores data types, structures, and file formats — the foundational vocabulary every data analyst must know.

🎯 Exam Objectives Covered

  • Obj 1.2 — Compare and contrast different data types
  • Date, Numeric, Alphanumeric, Currency, Text, Discrete, Continuous, Categorical
  • Obj 1.3 — Compare and contrast common data structures and file formats
  • Structured, Semi-structured, Unstructured; CSV, JSON, XML, Parquet, XLSX

🏋️ Training Scenarios

📌 A dataset has columns: CustomerID (integer), Name (text), Email (text), SignupDate (date), OrderCount (integer). Classify each by data type.
✅ Answer: CustomerID: Numeric (discrete/integer). Name: Alphanumeric/Text. Email: Alphanumeric/Text. SignupDate: Date/Time. OrderCount: Numeric (discrete/integer). All are structured data in defined columns.
📌 You receive IoT sensor logs as pipe-delimited text files: "2024-01-15|23.7|OK". What data structure type is this?
✅ Answer: Structured data in a flat/delimited file format. Each record has a consistent structure (timestamp | value | status) with a known delimiter. It is not truly unstructured: it has implicit structure, even without a formal schema.
📌 You need to exchange data between two web APIs. Which file format is most appropriate and why?
✅ Answer: JSON. It is the de facto standard for REST API data exchange. It supports nested structures, is human-readable and lightweight, and is supported by virtually all modern programming languages and web frameworks.
📌 Is "Number of website visits per day" discrete or continuous? What about "Page load time in seconds"?
✅ Answer: Visits per day: Discrete. You cannot have 3.7 visits; it is a whole-number count. Page load time: Continuous. It can be 1.237 seconds, 2.5 seconds, or any decimal value within a range.
📌 A data warehouse stores sales data in a fact table linked to Product, Store, Date, and Customer dimension tables. What schema design is this?
✅ Answer: Star schema. One central fact table (sales transactions with numeric measures) is surrounded by dimension tables (Product, Store, Date, Customer). This is the standard design for OLAP/data warehouse analytics.
📌 What is the difference between strong typing and weak typing in databases?
✅ Answer: Strong typing: the database strictly enforces data types, so inserting text into a numeric column fails with an error. Weak typing: the system auto-coerces types, potentially allowing "5" (string) + 3 (integer) = 8. Most relational databases enforce column types strictly, although the degree of coercion varies by product; spreadsheets often use weak typing.
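
The "5" + 3 case can be tried directly in Python, which behaves like the strongly typed side of this comparison: it refuses to combine a string and an integer until the conversion is made explicit. This is a minimal sketch of the idea in a programming language rather than a database, and the helper name `add_strict` is made up for illustration.

```python
def add_strict(a, b):
    """Add two values; return None if their types cannot be combined as-is."""
    try:
        return a + b
    except TypeError:
        # Python will not silently coerce a str and an int.
        return None


strict_result = add_strict("5", 3)   # no implicit coercion takes place
explicit_result = int("5") + 3       # the conversion is stated explicitly
print(strict_result, explicit_result)
```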
📌 A big data platform stores 10 TB of raw log files, images, video, and JSON from 50 different systems. What type of storage system fits best?
✅ Answer: A data lake. It stores raw data in native format (structured, semi-structured, and unstructured) without requiring a predefined schema. It is cost-effective for large volumes and supports schema-on-read when analysis is needed later.
📌 A "Customer Satisfaction Rating" field contains values: Excellent, Good, Fair, Poor. What type of data is this?
✅ Answer: Ordinal qualitative data. It is categorical (labels, not numbers) with a meaningful rank order (Excellent > Good > Fair > Poor). The gaps between categories are not necessarily equal, so arithmetic (calculating an average rating) is statistically inappropriate.

🧪 Practice Labs

📓 Jupyter Notebook Available

Download and run hands-on Python exercises for this chapter in Jupyter.

🗂️ Classify These Data Examples

Click & Match

Drag each item to the correct data structure category (Structured, Semi-structured, or Unstructured).

✏️ Fill in the Data Type Definitions

Fill in Blank

Type the correct CompTIA data type term for each definition.

🐍 Python: Exploring Data Types

Code Lab

Run Python cells to explore data types. Click ▶ Run to execute each cell.

🃏 Flashcards — Click to Flip

Card 1 of 10
Term: Structured Data
Definition: Data organized in a defined schema with rows and columns, typically stored in relational databases or spreadsheets. Easy to query with SQL.

Chapter 3

Databases & Data Acquisition

Covers the relational model, non-relational databases, OLTP/OLAP patterns, ETL/ELT, and core SQL operations.

🎯 Exam Objectives Covered

  • Obj 1.1 — Identify basic concepts of data schemas and dimensions (Relational & NoSQL; Data Mart/Warehouse/Lake; OLTP, OLAP; Star & Snowflake)
  • Obj 2.1 — Explain data acquisition concepts (ETL, ELT, Delta load, APIs, web scraping, surveys, sampling)
  • Obj 2.4 — Data manipulation & query optimization (filtering, sorting, aggregate functions, indexing, execution plans)

🏋️ Training Scenarios

📌 A company runs 24/7 point-of-sale transactions across 500 stores. OLTP or OLAP?
✅ Answer: OLTP. The workload requires high-speed inserts and updates for individual transactions, and the system must handle thousands of concurrent writes. OLAP would be used separately to analyse the accumulated transaction history.
📌 A data team loads daily order changes instead of reloading the entire 5-year history every night. What technique is this?
✅ Answer: Delta load (incremental load). Only records that are new or changed since the last load are processed. This is more efficient than full-refresh loads and reduces pipeline runtime and system load.
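
A delta load is often driven by a "watermark": the timestamp of the most recent change picked up by the previous run. The sketch below assumes each source row carries an `updated_at` column; that column, the function name `delta_load`, and the sample rows are hypothetical and only illustrate the idea.

```python
from datetime import datetime


def delta_load(source_rows, last_loaded_at):
    """Return only the rows changed after the previous load's watermark."""
    return [row for row in source_rows if row["updated_at"] > last_loaded_at]


# Hypothetical source table with a last-modified timestamp per row.
source = [
    {"order_id": 1, "updated_at": datetime(2024, 1, 10)},
    {"order_id": 2, "updated_at": datetime(2024, 1, 15)},
    {"order_id": 3, "updated_at": datetime(2024, 1, 16)},
]

watermark = datetime(2024, 1, 14)            # where the previous load stopped
changed = delta_load(source, watermark)      # rows to process tonight
new_watermark = max(row["updated_at"] for row in changed)
print([row["order_id"] for row in changed], new_watermark)
```

Storing `new_watermark` after each successful run lets the next run pick up where this one stopped.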
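📌 Write the SQL structure to retrieve all customers who placed orders in the last 30 days, grouped by customer, sorted by total spend descending.
✅ Answer: SELECT c.id, c.name, SUM(o.amount) AS total_spend FROM customers c INNER JOIN orders o ON c.id = o.customer_id WHERE o.order_date >= CURRENT_DATE - 30 GROUP BY c.id, c.name ORDER BY total_spend DESC. Grouping by c.id as well as c.name keeps customers who share a name separate. The date arithmetic (CURRENT_DATE - 30) is dialect-specific, so adjust it for your database.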
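
To try the query end to end, it can be run against an in-memory SQLite database from Python. This is a sketch under stated assumptions: the table layout and sample rows are invented for illustration, SQLite's `date('now', '-30 days')` stands in for the dialect-specific date arithmetic, and the query groups by `c.id` as well as `c.name` so that two customers with the same name are not merged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL, order_date TEXT
);
INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
INSERT INTO orders VALUES
    (1, 1, 50.0,  date('now', '-5 days')),
    (2, 1, 25.0,  date('now', '-10 days')),
    (3, 2, 100.0, date('now', '-2 days')),
    (4, 3, 999.0, date('now', '-90 days'));  -- outside the 30-day window
""")

rows = conn.execute("""
SELECT c.name, SUM(o.amount) AS total_spend
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id
WHERE o.order_date >= date('now', '-30 days')
GROUP BY c.id, c.name
ORDER BY total_spend DESC
""").fetchall()
print(rows)
```

The customer whose only order is 90 days old drops out of the result because the WHERE clause filters that order before grouping.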
📌 When would you choose ELT over ETL?
✅ Answer: Choose ELT when using a modern cloud data warehouse (Snowflake, BigQuery, Redshift) that has sufficient compute power to transform data in-place. ELT preserves raw data, enables iterative transformation with tools such as dbt, and avoids maintaining a separate transformation server.
📌 A social network stores user profiles with variable fields. What database type fits best?
✅ Answer: A NoSQL document store (like MongoDB). It handles flexible, variable schemas naturally. Each user document can have different fields without requiring schema alteration or NULL columns for every optional field.
📌 What is the difference between a Data Warehouse and a Data Lake?
✅ Answer: Data Warehouse: stores structured, cleaned data with a defined schema; optimised for BI queries; schema-on-write. Data Lake: stores raw data in native format; cheap storage; schema-on-read. Warehouses serve BI; lakes serve data science and exploration.
📌 INNER JOIN vs LEFT JOIN: what is returned by each?
✅ Answer: INNER JOIN: returns only rows where both tables have matching values. LEFT JOIN: returns all rows from the left table plus the matched rows from the right; where a left row has no match, the right-side columns are filled with NULL.
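
The difference is easiest to see with one customer who has an order and one who does not. The following sketch uses Python's built-in `sqlite3` module with made-up tables; the names and values are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben');  -- Ben has no orders
INSERT INTO orders VALUES (10, 1, 50.0);
""")

# INNER JOIN keeps only customers that have at least one matching order.
inner_rows = conn.execute(
    "SELECT c.name, o.amount FROM customers c "
    "INNER JOIN orders o ON c.id = o.customer_id ORDER BY c.id"
).fetchall()

# LEFT JOIN keeps every customer; missing order columns come back as NULL (None).
left_rows = conn.execute(
    "SELECT c.name, o.amount FROM customers c "
    "LEFT JOIN orders o ON c.id = o.customer_id ORDER BY c.id"
).fetchall()

print(inner_rows)
print(left_rows)
```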
📌 A fraud detection system needs to find accounts connected through shared addresses. What database type is optimal?
✅ Answer: A graph database (Neo4j, Amazon Neptune). It stores relationships as first-class objects and can traverse networks of connections efficiently. Relational databases require complex self-joins for relationship traversal at scale.

🧪 Practice Labs

📓 Jupyter Notebook Available

Download and run hands-on Python exercises for this chapter in Jupyter.

🗄️ Match Database Concepts

Click & Match

Drag each item to its correct database concept category.

⚙️ Put the ETL Steps in Order

Click & Match

Drag the ETL steps into the correct order.

🐍 Python + SQLite

Code Lab

Run Python cells to explore SQLite. Click ▶ Run to execute each cell.

🃏 Flashcards — Click to Flip

Card 1 of 10
Term: ETL
Definition: Extract, Transform, Load. A pipeline that extracts data from sources, transforms it (clean, format, enrich), then loads it into a target data warehouse.

Chapter 4

Data Quality

Learn to identify data quality challenges and apply manipulation techniques to clean, transform, and validate data.

🎯 Exam Objectives Covered

  • Obj 2.2 — Identify common reasons for cleansing and profiling (duplicate, missing, invalid, outliers, spec mismatch)
  • Obj 2.3 — Execute data manipulation techniques (recoding, imputation, normalization, parsing, deduplication)
  • Obj 5.2 — Apply data quality control concepts (dimensions: accuracy, completeness, consistency, timeliness, uniqueness, validity)

🏋️ Training Scenarios

📌 A dataset has 15% missing values in "Income". You want to preserve the distribution shape. Which imputation method is best?
✅ Answer: Median imputation. It is robust to outliers and distorts a skewed distribution less than mean imputation, although filling 15% of rows with a single value still reduces the variance.
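
A small worked example shows why the median is the safer fill value when one extreme income is present. The income figures below are invented for illustration; `None` marks a missing value.

```python
from statistics import mean, median

# Hypothetical incomes; one very high earner skews the data to the right.
incomes = [32000, 41000, None, 38000, 45000, None, 250000]

observed = [x for x in incomes if x is not None]
fill = median(observed)          # middle of the observed values
mean_fill = mean(observed)       # pulled upward by the 250000 outlier

# Replace each missing entry with the median of the observed values.
imputed = [fill if x is None else x for x in incomes]
print(fill, mean_fill, imputed)
```

Here the median of the observed values is 41000 while the mean is 81200, so mean imputation would place the missing people well above most of the observed incomes.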
📌 Customer IDs appear multiple times with slightly different name spellings. What quality issue is this?
✅ Answer: Duplicate/redundant data combined with inconsistent data (data entry variation). Deduplication with fuzzy matching is needed.
📌 You need to combine FirstName and LastName into a single FullName field. Which technique?
✅ Answer: Concatenation, that is, joining two string fields into one.
📌 A "Salary" field contains the value "N/A". What type of data quality issue is this?
✅ Answer: Invalid data / data type validation failure: text in a numeric field.
📌 What are the six dimensions of data quality?
✅ Answer: Accuracy, Completeness, Consistency, Timeliness, Uniqueness, Validity.
📌 You want to scale the "Revenue" column to a 0-1 range for use in a machine learning model. Which technique?
✅ Answer: Min-max normalization: (value - min) / (max - min). This ensures no variable dominates due to scale differences.
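
The formula translates directly into a few lines of Python. This is a minimal sketch with made-up revenue figures; returning zeros for a constant column is a design choice made here to avoid dividing by zero, not something the formula itself prescribes.

```python
def min_max_normalize(values):
    """Rescale values to the 0-1 range using (value - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: the formula would divide by zero, so return zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


revenue = [200, 500, 800, 1100]   # hypothetical revenue figures
scaled = min_max_normalize(revenue)
print(scaled)
```

The smallest value maps to 0, the largest to 1, and everything else keeps its relative position in between.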
📌 A "Gender" field contains: M, Male, male, 1, F, Female, female, 0. What technique fixes this?
✅ Answer: Recoding (standardization), that is, mapping all variants to a single consistent value set (e.g., M and F).
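
Recoding is usually implemented as a lookup table from every observed variant to the standard value. The sketch below assumes that 1 encodes male and 0 encodes female; the scenario does not state this, so the mapping would need to be confirmed against the data dictionary before use. The names `GENDER_MAP` and `recode_gender` are illustrative.

```python
# Lookup from normalised raw value to the standard code.
# Assumption: "1" means male and "0" means female in the source system.
GENDER_MAP = {
    "m": "M", "male": "M", "1": "M",
    "f": "F", "female": "F", "0": "F",
}


def recode_gender(raw):
    """Map a raw value to 'M' or 'F'; return None for anything unmapped."""
    key = str(raw).strip().lower()
    return GENDER_MAP.get(key)


raw_values = ["M", "Male", "male", 1, "F", "Female", "female", 0]
recoded = [recode_gender(v) for v in raw_values]
print(recoded)
```

Returning `None` for unmapped values, rather than guessing, leaves unexpected entries visible for a later data quality check.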
📌 What is the difference between data profiling and data auditing?
✅ Answer: Profiling is exploratory: it generates column statistics to discover quality issues. Auditing is systematic and scheduled: it checks data against documented business rules and produces formal reports.

🧪 Practice Labs

📓 Jupyter Notebook Available

Download and run hands-on Python exercises for this chapter in Jupyter.

✏️ Data Quality Dimensions

Fill in Blank

Type the correct data quality term for each definition.

🧹 Match Data Issues to Fixes

Click & Match

Drag each data issue to its correct fix or solution.

🐍 Python: Data Cleaning Pipeline

Code Lab

Run Python cells to practice data cleaning. Click ▶ Run to execute each cell.

🃏 Flashcards — Click to Flip

Card 1 of 10
Term: Imputation
Definition: The process of replacing missing data values with substitute estimates, such as the mean, median, mode, or values derived from regression models.

Chapter 5

Data Analysis & Statistics

Master descriptive and inferential statistics — the mathematical foundation of data analysis and decision-making.

🎯 Exam Objectives Covered

  • Obj 3.1: Given a scenario, apply the appropriate descriptive statistical methods (measures of central tendency, dispersion, distribution, confidence intervals)
  • Obj 3.2: Explain the purpose of inferential statistical methods (t-tests, Z-score, p-values, chi-squared, hypothesis testing, simple linear regression, correlation)
  • Obj 3.3: Summarize types of analysis and key analysis techniques (trend, performance, link, and exploratory data analysis)

🏋️ Training Scenarios

📌 A dataset of house prices has mean=$450K but median=$320K. What does this tell you?
✅ Answer: The distribution is right-skewed (positively skewed): a few very expensive houses pull the mean up significantly above the median. Use the median for a typical price.
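
The effect can be reproduced with a handful of numbers. The prices below are invented so that they give the same mean and median as the scenario; a single expensive house is enough to create the gap.

```python
from statistics import mean, median

# Hypothetical house prices: six ordinary homes and one very expensive one.
prices = [250_000, 280_000, 300_000, 320_000, 340_000, 360_000, 1_300_000]

avg = mean(prices)       # dragged upward by the 1.3M outlier
mid = median(prices)     # the middle house, unaffected by the outlier
print(avg, mid)

# A mean well above the median is a quick sign of right skew.
is_right_skewed = avg > mid
```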
📌 You conduct an A/B test and get p-value = 0.03 with α=0.05. What do you conclude?
✅ Answer: Reject the null hypothesis. The p-value (0.03) is less than the significance level (0.05), indicating the observed difference is statistically significant and unlikely due to chance.
📌 What does a Z-score of -2.5 mean?
✅ Answer: The value is 2.5 standard deviations below the mean. It is an unusual observation. Values with |Z| > 3 are typically flagged as outliers.
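
A Z-score is computed as (value - mean) / standard deviation. The sketch below uses a small invented dataset whose mean is 5 and whose population standard deviation is 2, and it assumes the population formula (`statistics.pstdev`) rather than the sample formula.

```python
from statistics import mean, pstdev


def z_score(x, data):
    """Number of population standard deviations x lies from the mean of data."""
    return (x - mean(data)) / pstdev(data)


data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical values: mean 5, pstdev 2
z = z_score(0, data)              # how unusual would a value of 0 be?
print(z)

# Flag using the |Z| > 3 rule of thumb mentioned above.
is_outlier = abs(z) > 3
```

A value of 0 sits 2.5 standard deviations below the mean here: unusual, but not past the |Z| > 3 cutoff.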
📌 Correlation coefficient r = 0.92 between advertising spend and revenue. Can you conclude advertising causes revenue increases?
✅ Answer: No. Correlation does not imply causation. A confounding variable (e.g., seasonality) may drive both. A randomised controlled experiment is the most reliable way to establish causation.
📌 What is the difference between Type I and Type II errors?
✅ Answer: Type I (false positive): rejecting a true null hypothesis, that is, detecting an effect that doesn't exist. Type II (false negative): failing to reject a false null, that is, missing a real effect. α controls the Type I error rate.
📌 You want to survey 1,000 customers from 5 regions ensuring proportional representation from each. Which sampling method?
✅ Answer: Stratified sampling. Divide the population into strata (regions) and sample proportionally from each. This ensures all groups are represented.
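
Proportional stratified sampling can be sketched with the standard library alone. The region names, region sizes, and the function name `stratified_sample` are assumptions made for illustration; the per-stratum count is rounded, so with less convenient proportions the total may differ from the target by a few items.

```python
import random
from collections import Counter


def stratified_sample(population, strata_key, n, seed=0):
    """Draw about n items, sampling each stratum in proportion to its size."""
    rng = random.Random(seed)   # seeded so the draw is repeatable
    groups = {}
    for item in population:
        groups.setdefault(item[strata_key], []).append(item)
    sample = []
    for members in groups.values():
        k = round(n * len(members) / len(population))   # proportional share
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical customer base of 10,000 spread unevenly over five regions.
region_sizes = {"North": 4000, "South": 2500, "East": 2000, "West": 1000, "Central": 500}
customers = [
    {"customer_id": f"{region}-{i}", "region": region}
    for region, size in region_sizes.items()
    for i in range(size)
]

sample = stratified_sample(customers, "region", 1000)
counts = Counter(c["region"] for c in sample)
print(counts)
```

Each region contributes the same share of the sample as it holds in the population, which a simple random sample would only approximate.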
📌 A company tracks monthly revenue for 3 years. What analysis technique identifies a long-term upward trend vs. seasonal fluctuations?
✅ Answer: Time series decomposition, which separates the series into Trend, Seasonality, Cyclical, and Residual components. Moving averages smooth out seasonal and irregular components to reveal the trend.
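
The smoothing step can be illustrated with a simple moving average. This sketch covers only that one step, not a full decomposition, and the revenue numbers and 3-period window are made up for illustration.

```python
def moving_average(series, window):
    """Average each run of `window` consecutive values in the series."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


# Hypothetical monthly revenue: an upward trend with a repeating wobble.
revenue = [10, 14, 12, 13, 17, 15, 16, 20, 18]
trend = moving_average(revenue, 3)
print(trend)
```

The raw series moves up and down from month to month, while the smoothed series rises steadily, which is the trend the wobble was hiding.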
📌 R² = 0.85 for a regression model predicting sales from advertising. What does this mean?
✅ Answer: The model explains 85% of the variance in sales. 15% of variance remains unexplained by the model. R² = 1 would be a perfect fit; R² = 0 means the model explains nothing.
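
R² can be computed by hand as 1 - SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the actual values from their mean. The actual and predicted values below are invented for illustration.

```python
def r_squared(actual, predicted):
    """Share of the variance in `actual` explained by `predicted`."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # unexplained
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)           # total
    return 1 - ss_res / ss_tot


actual = [10, 20, 30, 40]       # hypothetical observed sales
predicted = [12, 18, 33, 37]    # hypothetical model output
r2 = r_squared(actual, predicted)
print(round(r2, 3))
```

With these numbers SS_res is 26 and SS_tot is 500, so the model explains about 94.8% of the variance.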

🧪 Practice Labs

📓 Jupyter Notebook Available

Download and run hands-on Python exercises for this chapter in Jupyter.

✏️ Statistics Formulas & Concepts

Fill in Blank

Type the correct statistics term for each definition or formula.

🐍 Python: Statistics from Scratch

Code Lab

Run Python cells to compute statistics from scratch. Click ▶ Run to execute each cell.

🃏 Flashcards — Click to Flip

Card 1 of 10
Term: Null Hypothesis (H₀)
Definition: A statement assuming no effect, no difference, or no relationship between variables. It is the default position, retained unless the evidence is strong enough to reject it.

Chapter 6

Data Analytics Tools

Survey the landscape of tools — from spreadsheets and programming languages to BI suites and ML platforms.

🎯 Exam Objectives Covered

  • Obj 3.4: Identify common data analytics tools (SQL, Python, Microsoft Excel, R, SAS, Tableau, Power BI, and others)

🏋️ Training Scenarios

📌 A business user wants to explore sales data by region, product, and time period without asking IT each time. What tool category fits best?
✅ AnswerA self-service BI platform (Tableau, Power BI) — provides interactive filters, drill-down, and slicers that let users explore data independently without technical skills.
📌 You need to build a machine learning model to predict customer churn, handling feature engineering, model training, and evaluation. Which tool is most appropriate?
✅ AnswerPython with pandas and scikit-learn — provides full programmatic control for complex ML pipelines, feature engineering, cross-validation, and model evaluation.
📌 A pharmaceutical company needs to run clinical trial statistical analysis that regulators will accept. Which tool?
✅ AnswerSAS — it is the gold standard for regulated industries, FDA-accepted for clinical trial analysis, and has built-in audit trails for regulatory compliance.
📌 A data analyst needs to query a Snowflake data warehouse directly. What interface do they use?
✅ AnswerSQL — either via Snowflake's web UI, a SQL client (DBeaver), or a BI tool that connects directly to Snowflake using a JDBC/ODBC connector.
📌 What is the difference between Python and R for data analysis?
✅ AnswerPython: general-purpose programming language with excellent data analysis libraries (pandas, scikit-learn); better for ML, automation, web scraping, and production deployment. R: statistics-first language with extensive academic statistical packages; better for advanced statistics, academic research, and publication-quality charts.
📌 When should you use Excel vs. Python for data analysis?
✅ AnswerExcel: best for small-medium datasets (<1M rows), ad hoc analysis, sharing with non-technical users, and quick calculations. Python: best for large datasets, reproducible pipelines, automation, machine learning, and processing data that changes regularly.
📌 A company wants to build cloud analytics that automatically scales with query demand and charges only per query. Which service fits?
✅ Answer: A serverless query service such as Amazon Athena (queries data stored in S3) or Google BigQuery with on-demand pricing. These charge by the amount of data scanned (per TB) and require no infrastructure management.
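The per-TB pricing in the last answer can be turned into a quick cost estimate. The sketch below is illustrative only: the rate `ASSUMED_PRICE_PER_TB_USD` and the function `estimate_query_cost` are hypothetical names and values, not any provider's actual price list, and real bills depend on provider, region, and billing unit.

```python
# Hypothetical on-demand rate; real prices vary by provider, region, and date.
ASSUMED_PRICE_PER_TB_USD = 5.00
BYTES_PER_TB = 10**12  # decimal terabyte; some providers bill in TiB (2**40 bytes)


def estimate_query_cost(bytes_scanned: int,
                        price_per_tb: float = ASSUMED_PRICE_PER_TB_USD) -> float:
    """Return the estimated charge in USD for a single query."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    return round(bytes_scanned / BYTES_PER_TB * price_per_tb, 4)


# Scanning a full 2 TB table versus only a 50 GB partition of it.
full_scan = estimate_query_cost(2 * 10**12)
partition_scan = estimate_query_cost(50 * 10**9)
print(f"Full scan: ${full_scan:.2f}, partition scan: ${partition_scan:.2f}")
```

Under this assumed rate, the gap between the two figures shows why limiting the data a query scans (for example through partitioning or selecting fewer columns) matters on pay-per-query services.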

🧪 Practice Labs

📓 Jupyter Notebook Available

Download and run hands-on Python exercises for this chapter in Jupyter.

🛠️ Match Tools to Their Use Case

Click & Match

Drag each analytics tool to its correct use case.

🐍 Python Analytics Pipeline

Code Lab

Run Python cells to build an analytics pipeline. Click ▶ Run to execute each cell.

🃏 Flashcards — Click to Flip

1 / 10
Term
tidyverse
Definition
A collection of R packages (including ggplot2, dplyr, tidyr) designed to facilitate data manipulation and visualization using consistent, readable syntax.

Quick Quiz

After each run you can review highlighted topics (below your score) and use Retake quiz for another pass.

📝 My Study Notes

Chapter 7

Data Visualization with Reports & Dashboards

Learn how to translate business requirements into compelling visualizations, reports, and interactive dashboards.

🎯 Exam Objectives Covered

  • Obj 4.1 — Report types, layouts, delivery (static, interactive, ad hoc, recurring, pixel-perfect)
  • Obj 4.2 — Design components: appropriate chart types, KPIs, conditional formatting, reference lines
  • Obj 4.3 — Dashboard workflow: wireframe, mock-up, prototype, stakeholder review, data story

📖 Key Topics


🏋️ Training Scenarios

Click any scenario to reveal the answer.

📌 Monthly website traffic over the past 2 years. Which chart type?
✅ Answer: Line chart. It shows a continuous trend over time with many data points. Each point connects to the next, making rate of change and trend direction immediately visible.
📌 Revenue breakdown by product category (6 categories). Which chart?
✅ Answer: Horizontal bar chart, because the task is to compare discrete categories. Bars are easier to compare than pie slices, especially with six or more categories. Sort by value in descending order for maximum clarity.
📌 Show how gross revenue becomes net profit through a series of additions and subtractions (returns, COGS, operating expenses). Which chart?
✅ Answer: Waterfall chart. Each bar "floats" at the level where the previous value ended, making the contribution of each factor intuitively clear.
📌 Compare the distribution of exam scores across three different training cohorts. Which chart?
✅ Answer: Box plot. It shows the minimum, Q1, median, Q3, and maximum for each group side by side, enabling direct comparison of the distributions, including outliers.
📌 You have 20 KPIs to display for an executive. A user asks you to put them all on one dashboard. What should you advise?
✅ Answer: Recommend limiting each dashboard view to the 5-7 most critical KPIs. Too many KPIs create cognitive overload. Use drill-down to secondary dashboards for additional metrics. Prioritise the metrics most directly linked to executive decisions.
📌 A dashboard filter lets users select time period, region, and product category simultaneously. What is this feature called?
✅ Answer: A slicer (or filter panel). These interactive controls let users dynamically subset the data displayed in all connected charts on the dashboard at once.
📌 Relationship between 50 products' advertising spend (X axis) and revenue (Y axis). Which chart?
✅ Answer: Scatter plot. Each product is one point at coordinates (ad spend, revenue). A trend line can be added to show the direction and strength of correlation.
📌 What is the difference between a static report, an interactive report, and an ad hoc report?
✅ Answer: Static: fixed PDF/printout, no interaction. Interactive: live dashboard with filters and drill-down (Tableau, Power BI). Ad hoc: custom one-time report built by an analyst to answer a specific question not covered by existing reports.
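The chart-selection scenarios above follow a simple pattern: identify the analytical goal, then pick the chart that matches it. A minimal sketch of that mapping is shown below; the goal labels and the function `suggest_chart` are hypothetical names chosen for illustration, and the mapping only covers the five chart types discussed in these scenarios.

```python
def suggest_chart(goal: str) -> str:
    """Map an analysis goal to the chart type used in the scenarios above."""
    suggestions = {
        "trend_over_time": "line chart",
        "compare_categories": "horizontal bar chart",
        "cumulative_contribution": "waterfall chart",
        "compare_distributions": "box plot",
        "relationship_two_measures": "scatter plot",
    }
    try:
        return suggestions[goal]
    except KeyError:
        raise ValueError(
            f"Unknown goal: {goal!r}; expected one of {sorted(suggestions)}"
        ) from None


for goal in ("trend_over_time", "compare_distributions"):
    print(goal, "->", suggest_chart(goal))
```

A lookup table keeps the rule of thumb explicit and easy to extend; real chart choices also depend on audience, data volume, and the number of categories.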

🧪 Practice Labs

📓 Jupyter Notebook Available

Download and run hands-on Python exercises for this chapter in Jupyter.

📊 Pick the Right Chart

Click & Match

Drag each scenario to the chart type that best represents it.

✏️ Dashboard & Report Vocabulary

Fill in Blank

Type the correct visualization or reporting term for each definition.

🐍 Python: Chart Logic

Code Lab

Run Python cells to practice chart selection logic. Click ▶ Run to execute each cell.

🃏 Flashcards — Click to Flip

1 / 10
Term
Waterfall Chart
Definition
A visualization that shows the cumulative effect of sequentially introduced positive or negative values — ideal for financial analysis (e.g., profit/loss breakdown).
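The "cumulative effect" in this definition is just a running total. The sketch below computes where each floating bar would start and end; the figures are made up for illustration and the variable names are hypothetical.

```python
from itertools import accumulate

# Hypothetical figures, in thousands of dollars.
steps = [
    ("Gross revenue", 500),
    ("Returns", -40),
    ("COGS", -210),
    ("Operating expenses", -150),
]

# Running total after each step: 500, 460, 250, 100.
running_totals = list(accumulate(amount for _, amount in steps))

# Each floating bar starts where the previous running total ended.
bars = []
previous_total = 0
for (label, amount), total in zip(steps, running_totals):
    bars.append({"label": label, "start": previous_total, "end": total})
    previous_total = total

net_profit = running_totals[-1]
for bar in bars:
    print(f"{bar['label']:<20} {bar['start']:>5} -> {bar['end']:>5}")
print(f"{'Net profit':<20} {net_profit:>5}")
```

A plotting library would draw each bar between its `start` and `end` values; the final running total is the closing bar (here, net profit).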

Quick Quiz

After each run you can review highlighted topics (below your score) and use Retake quiz for another pass.

📝 My Study Notes

Chapter 8

Data Governance

Understand the policies, roles, and frameworks that ensure data is secure, compliant, and used appropriately.

🎯 Exam Objectives Covered

  • Obj 5.1 — Summarise important data governance concepts (roles, data classification, policies, master data management)
  • Obj 5.2 — Apply data quality control concepts (validation, profiling, auditing, data lineage)
  • Obj 5.3 — Explain master data management (MDM) concepts (golden record, entities, workflows)

📖 Key Topics

Topic 1 of 7

🏋️ Training Scenarios

Click any scenario to reveal the answer.

📌 A business executive is accountable for a customer dataset — approving who can access it and setting the retention policy. What is their governance role?
✅ Answer: Data Owner. They are accountable for the data's use, access approvals, and lifecycle policies. They don't manage the data day-to-day but are ultimately responsible for it.
📌 A database administrator manages backups, storage systems, and security controls for a dataset. What is their role?
✅ Answer: Data Custodian. They are responsible for the technical infrastructure (storage, backups, security controls) but are not accountable for data content or quality.
📌 Your organisation must notify customers within a specific timeframe after a personal data breach. Which regulation mandates this?
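✅ Answer: In the EU, GDPR requires notifying the supervisory authority within 72 hours of becoming aware of the breach, and notifying affected individuals without undue delay when the breach poses a high risk to them. In the US, HIPAA requires notifying affected individuals within 60 days of discovering a healthcare data breach.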
📌 A developer needs to test an application using realistic customer data without exposing real PII. What technique should be used?
✅ Answer: Data masking. Replace real PII (names, emails, SSNs) with fictitious but realistic values. The data structure and format are preserved but real values are protected.
📌 A company stores customer records in 5 different systems, all with slightly different versions of the same customer. What governance practice creates a single authoritative record?
✅ Answer: Master Data Management (MDM). This is the process of matching and merging records to create a "golden record" that becomes the single source of truth for each customer entity across all systems.
📌 "Financial reporting data must be retained for 7 years then securely disposed of." What governance policy area is this?
✅ Answer: Data retention and disposal policy. It defines how long each data type must be kept (driven by legal/regulatory requirements) and how it must be destroyed at the end of its lifecycle.
📌 A financial analyst exports a spreadsheet of customer PII to their personal email to work from home. A governance tool detects and blocks this. What technology is this?
✅ Answer: Data Loss Prevention (DLP). It monitors data flows and blocks or alerts on potential exfiltration of sensitive data to unauthorised destinations.
📌 What is the difference between data anonymisation and pseudonymisation under GDPR?
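✅ Answer: Anonymisation irreversibly removes all identifying information, so the result is no longer personal data and GDPR no longer applies. Pseudonymisation replaces identifiers with pseudonyms (for example tokens or keyed hashes) that can be re-linked to the individual using separately held additional information such as a key or lookup table, so the result is still personal data and GDPR still applies.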
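The difference between the two techniques in the last answer can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not a compliance recipe: `SECRET_KEY`, `pseudonymise`, and `anonymise` are hypothetical names, and the sample record is made up.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would be stored separately from the data.
SECRET_KEY = b"example-key-kept-apart-from-the-dataset"


def pseudonymise(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256), truncated for readability."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def anonymise(record: dict, identifying_fields: set) -> dict:
    """Drop the identifying fields entirely so the record carries no token for them."""
    return {k: v for k, v in record.items() if k not in identifying_fields}


record = {"email": "jane@example.com", "region": "EU", "spend": 120}

pseudo = {**record, "email": pseudonymise(record["email"])}
anon = anonymise(record, {"email"})

# Whoever holds the key (or this lookup table) can re-link tokens to people,
# which is why pseudonymised data still counts as personal data.
lookup = {pseudo["email"]: record["email"]}

print(pseudo)
print(anon)
```

Note that dropping direct identifiers is only a first step: remaining fields can sometimes be combined to re-identify someone, so whether a dataset is truly anonymous has to be assessed case by case.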

🧪 Practice Labs

📓 Jupyter Notebook Available

Download and run hands-on Python exercises for this chapter in Jupyter.

🔒 Governance Roles & Responsibilities

Click & Match

Drag each governance role to its correct responsibility.

✏️ Data Governance Vocabulary

Fill in Blank

Type the correct data governance term for each definition.

🐍 Python: Governance in Practice

Code Lab

Run Python cells to explore governance concepts in code. Click ▶ Run to execute each cell.

🃏 Flashcards — Click to Flip

1 / 10
Term
Data Steward
Definition
The role responsible for leading an organization's data governance activities, ensuring data quality, security, privacy, and regulatory compliance across the enterprise.

Quick Quiz

After each run you can review highlighted topics (below your score) and use Retake quiz for another pass.

📝 My Study Notes

📦 Question Bank

Instructor question management & import pipeline

Published
Drafts
Pending Review
Failed Jobs

Recent Import Jobs


⬆ Import Questions

Upload a PDF, Word document, or scanned exam sheet

📄

Drag & drop a file here, or

Accepted: PDF, DOCX, PNG, JPG, TIFF · Max 50 MB

🔬 Review Studio

Reviewing import job

No items loaded.

Select a question from the list to edit it.

📝 Draft Queue

Questions pending review across all import jobs


🔍 Browse Question Bank

Search, filter, edit, and manage published questions
