Dashboard

Practice Quiz

258 real exam-style questions from the official question bank. Choose your topics, set the count, and start quizzing.

Questions
258
Topics
8
Multi-Answer
14

🎓 Choose Mode

🎯 Select Topics

Choose one or more topics, or leave all unselected to include every topic.

⚙️ Quiz Options

Order
Number of Questions
max: 258

Terminology Reference

Plain-language definitions for every key term across all 5 exam domains. Click a domain to expand, or search for any term.


Official Acronym Flashcards

All 26 acronyms from the official CompTIA DA0-001 exam objectives. Master every one before exam day.

Total
26
Source
Official

🃏 Acronym Flashcards — Click to Flip

1 / 26
Acronym
ETL
Stands For
Extract, Transform, Load

📋 Full Acronym Reference

Quick-Match Quiz — Acronyms

Chapter Notes

Your personal notes for this chapter

📝 My Notes

CompTIA Data+ Study Buddy

Your all-in-one study companion for the DA0-001 exam. Work through all 8 chapters with topics, flashcards, quizzes, and personal notes.

📚
8
Total Chapters
🃏
40+
Flashcards
0
Chapters Studied
🎯
Quiz Score
All Chapters
Ch 1 — Today's Data Analyst
Introduction to analytics roles, processes, and the data landscape.
Analytics Process · Descriptive · Predictive · AI/ML
Ch 2 — Understanding Data
Data types, structures, and common file formats used in analytics.
Structured Data · JSON · XML · Data Types
Ch 3 — Databases & Data Acquisition
Relational models, NoSQL, OLTP/OLAP, ETL/ELT, and SQL fundamentals.
SQL · ETL · OLAP · NoSQL
Ch 4 — Data Quality
Identifying and resolving data quality issues; manipulation techniques.
Imputation · Normalization · Outliers · Validation
Ch 5 — Data Analysis & Statistics
Descriptive & inferential statistics, hypothesis testing, regression.
Hypothesis Testing · Regression · Mean/Median
Ch 6 — Data Analytics Tools
Spreadsheets, programming languages, BI suites, and ML platforms.
Python · R · Power BI · Tableau
Ch 7 — Visualization & Dashboards
Report design, dashboard development, and visualization best practices.
Charts · Dashboards · Infographics · KPIs
Ch 8 — Data Governance
Governance roles, access control, data classification, and MDM.
Data Steward · PII · MDM · Compliance

Exam Domains & Weightings

The DA0-001 exam covers five domains. Understanding the weight of each domain helps you prioritize your study time.

🎯 Domain Weightings

1.0 Data Concepts & Environments
15%
2.0 Data Mining
25%
3.0 Data Analysis
23%
4.0 Visualization
23%
5.0 Data Governance, Quality & Controls
14%
Exam Objectives — Click Any Domain to Expand
1.0 Data Concepts and Environments — 15% of exam
Objective 1.1
Identify basic concepts of data schemas and dimensions.
  • Databases
  • Relational
  • Non-relational
  • Data mart / data warehousing / data lake
  • Online transactional processing (OLTP)
  • Online analytical processing (OLAP)
  • Schema concepts
  • Snowflake
  • Star
  • Slowly changing dimensions
  • Keep current information
  • Keep historical and current information
Objective 1.2
Compare and contrast different data types.
  • Date
  • Numeric
  • Alphanumeric
  • Currency
  • Text
  • Discrete vs. continuous
  • Categorical / dimension
  • Images
  • Audio
  • Video
Objective 1.3
Compare and contrast common data structures and file formats.
  • Structures
  • Structured — defined rows/columns; key-value pairs
  • Unstructured — undefined fields; machine data
  • Data file formats
  • Text/Flat file — Tab delimited, Comma delimited
  • JavaScript Object Notation (JSON)
  • Extensible Markup Language (XML)
  • Hypertext Markup Language (HTML)
2.0 Data Mining — 25% of exam
Objective 2.1
Explain data acquisition concepts.
  • Integration
  • Extract, transform, load (ETL)
  • Extract, load, transform (ELT)
  • Delta load
  • Application programming interfaces (APIs)
  • Data collection methods
  • Web scraping
  • Public databases
  • Application programming interface (API) / web services
  • Survey
  • Sampling
  • Observation
Objective 2.2
Identify common reasons for cleansing and profiling datasets.
  • Duplicate data
  • Redundant data
  • Missing values
  • Invalid data
  • Non-parametric data
  • Data outliers
  • Specification mismatch
  • Data type validation
Objective 2.3
Given a scenario, execute data manipulation techniques.
  • Recoding data
  • Numeric
  • Categorical
  • Derived variables
  • Data merge
  • Data blending
  • Concatenation
  • Data append
  • Imputation
  • Reduction / aggregation
  • Transpose
  • Normalize data
  • Parsing / string manipulation
Objective 2.4
Explain common techniques for data manipulation and query optimization.
  • Data manipulation
  • Filtering
  • Sorting
  • Date functions
  • Logical functions
  • Aggregate functions
  • System functions
  • Query optimization
  • Parametrization
  • Indexing
  • Temporary table in the query set
  • Subset of records
  • Execution plan
3.0 Data Analysis — 23% of exam
Objective 3.1
Given a scenario, apply the appropriate descriptive statistical methods.
  • Measures of central tendency
  • Mean
  • Median
  • Mode
  • Distribution
  • Measures of dispersion
  • Range, Max, Min
  • Variance
  • Standard deviation
  • Frequencies / percentages
  • Percent change
  • Percent difference
  • Confidence intervals
Objective 3.2
Explain the purpose of inferential statistical methods.
  • t-tests
  • Z-score
  • p-values
  • Chi-squared
  • Hypothesis testing
  • Type I error
  • Type II error
  • Simple linear regression
  • Correlation
Objective 3.3
Summarize types of analysis and key analysis techniques.
  • Process to determine type of analysis
  • Review / refine business questions
  • Determine data needs and sources
  • Scoping / gap analysis
  • Type of analysis
  • Trend analysis
  • Comparison of data over time
  • Performance analysis
  • Tracking measurements against defined goals
  • Basic projections to achieve goals
  • Link analysis — connection of data points or pathway
  • Exploratory data analysis
  • Use of descriptive statistics to determine observations
Objective 3.4
Identify common data analytics tools.

Note: The intent of this objective is NOT to test specific vendor feature sets nor the purposes of the tools.

  • Structured Query Language (SQL)
  • Python
  • Microsoft Excel
  • R
  • RapidMiner
  • IBM Cognos
  • IBM SPSS Modeler
  • IBM SPSS
  • SAS
  • Tableau
  • Power BI
  • Qlik
  • MicroStrategy
  • BusinessObjects
  • Apex, Dataroma, Domo, AWS QuickSight, Stata, Minitab
4.0 Visualization — 23% of exam
Objective 4.1
Given a scenario, translate business requirements to form a report.
  • Data content
  • Filtering
  • Views
  • Date range
  • Frequency
  • Audience for report
  • Distribution list
Objective 4.2
Given a scenario, use appropriate design components for reports and dashboards.
  • Report cover page
  • Instructions
  • Summary
  • Observations and insights
  • Design elements
  • Color schemes, Layout
  • Font size and style
  • Key chart elements: Titles, Labels, Legends
  • Corporate reporting standards / style guide
  • Branding, Color codes, Logos/trademarks, Watermark
  • Report cover page (cont.)
  • Executive summary
  • FAQs
  • Appendix
  • Documentation elements
  • Version number
  • Reference data sources
  • Reference dates
  • Report run date
  • Data refresh date
Objective 4.3
Given a scenario, use appropriate methods for dashboard development.
  • Dashboard considerations
  • Data sources and attributes, Field definitions
  • Dimensions and Measures
  • Continuous/live data feed vs. static data
  • Consumer types: C-level executives, Management, External vendors/stakeholders, General public, Technical experts
  • Development process
  • Mockup/wireframe → Layout/presentation
  • Flow/navigation → Data story planning
  • Approval granted → Develop dashboard → Deploy to production
  • Delivery considerations
  • Subscription, Scheduled delivery
  • Interactive (drill down / roll up)
  • Saved searches, Filtering
  • Static, Web interface
  • Dashboard optimization
  • Access permissions
Objective 4.4
Given a scenario, apply the appropriate type of visualization.
  • Line chart
  • Pie chart
  • Bubble chart
  • Scatter plot
  • Bar chart
  • Histogram
  • Waterfall
  • Heat map
  • Geographic map
  • Tree map
  • Stacked chart
  • Infographic
  • Word cloud
Objective 4.5
Compare and contrast types of reports.
  • Static vs. dynamic reports
  • Point-in-time
  • Real time
  • Ad-hoc / one-time report
  • Self-service / on demand
  • Recurring reports
  • Compliance reports (financial, health, safety)
  • Risk and regulatory reports
  • Operational reports (performance, KPIs)
  • Tactical / research report
5.0 Data Governance, Quality, and Controls — 14% of exam
Objective 5.1
Summarize important data governance concepts.
  • Access requirements
  • Role-based
  • User group-based
  • Data use agreements
  • Release approvals
  • Security requirements
  • Data encryption
  • Data transmission
  • De-identify data / data masking
  • Storage environment requirements
  • Shared drive vs. cloud based vs. local storage
  • Use requirements
  • Acceptable use policy
  • Data processing — Data deletion, Data retention
  • Entity relationship requirements
  • Record link restrictions, Data constraints, Cardinality
  • Data classification
  • Personally identifiable information (PII)
  • Personal health information (PHI)
  • Payment card industry (PCI)
  • Jurisdiction requirements
  • Impact of industry and governmental regulations
  • Data breach reporting — Escalate to appropriate authority
Objective 5.2
Given a scenario, apply data quality control concepts.
  • Circumstances to check for quality
  • Data acquisition / data source
  • Data transformation / intrahops: Pass through, Conversion
  • Data manipulation
  • Final product (report / dashboard)
  • Automated validation
  • Data field to data type validation
  • Number of data points
  • Data quality dimensions
  • Data consistency
  • Data accuracy
  • Data completeness
  • Data integrity
  • Data attribute limitations
  • Data quality rule and metrics
  • Conformity, Non-conformity, Rows passed, Rows failed
  • Methods to validate quality
  • Cross-validation, Sample/spot check
  • Reasonable expectations, Data profiling, Data audits
Objective 5.3
Explain master data management (MDM) concepts.
  • Processes
  • Consolidation of multiple data fields
  • Standardization of data field names
  • Data dictionary
  • Compliance with policies and regulations
  • Streamline data access
  • Circumstances for MDM
  • Mergers and acquisitions

📋 Exam Format

  • 90 questions total
  • 90 minutes to complete
  • Passing score: 675 (scale 100–900)
  • Performance-based assessment format
  • Multiple-choice, fill-in-the-blank
  • Multiple-response, drag-and-drop
  • Image-based problems

💡 Exam Tips

  • CompTIA includes vague questions — use logic
  • Some questions may have two correct answers
  • Item seeding: some questions are unscored
  • 18–24 months hands-on experience recommended
  • Focus on scenario-based learning
  • Review all 5 domains proportionally
  • Data Mining (25%) is the heaviest domain
Chapter 1

Today's Data Analyst

Introduction to the world of analytics — the roles, tools, processes, and techniques that define modern data work.

🎯 Exam Objectives Covered

  • Domain 1.0 — Data Concepts & Environments (15%)
  • Domain 2.0 — Data Mining (25%)
  • Domain 3.0 — Data Analysis (23%)
  • Domain 4.0 — Visualization (23%)
  • Domain 5.0 — Data Governance, Quality & Controls (14%)
  • This chapter introduces all 5 domains at a high level

📖 Key Topics


🏋️ Training Scenarios

Click any scenario to reveal the answer.

📌 A retail company wants to understand why sales dipped last quarter. Which analytics type applies — descriptive, predictive, or prescriptive?
✅ Answer: Descriptive analytics — it describes what happened (sales drop) and examines the past. It does not predict or recommend action.
📌 A logistics company wants to recommend optimal shipping routes to minimize delivery time and cost. Which analytics type is this?
✅ Answer: Prescriptive analytics — it recommends the best action to take (which route to use). It combines predictions and optimization algorithms to produce actionable recommendations.
📌 A bank wants to identify which loan applicants are likely to default in the next 12 months. Which analytics type applies?
✅ Answer: Predictive analytics — it uses historical data and models to forecast a future outcome (probability of default). It does not recommend action, only predicts likelihood.
📌 List the 5 steps of the analytics process in order.
✅ Answer: 1. Data Acquisition → 2. Cleaning & Manipulation → 3. Analysis → 4. Visualization → 5. Reporting & Communication. This sequence appears directly on the CompTIA Data+ exam.
📌 What is the difference between a Data Analyst and a Data Scientist?
✅ Answer: A Data Analyst focuses on describing and interpreting existing data through reporting, dashboards, and SQL queries. A Data Scientist builds predictive and statistical models using machine learning algorithms, typically with Python or R.
📌 A streaming music service suggests songs a user might like based on their listening history. Which analytics type drives this feature?
✅ Answer: Prescriptive analytics (recommendation engine) — it recommends specific songs to listen to. This is built on predictive models that estimate what the user will enjoy, then prescribes what to play next.
📌 A marketing manager receives a PDF report every Monday showing last week's campaign KPIs. What type of report is this?
✅ Answer: A recurring static report — it is generated on a fixed weekly schedule and delivered in a non-interactive format (PDF). It is descriptive in nature, summarizing historical performance.
📌 Which analytics tool would be most appropriate for a business user who needs to build an interactive dashboard to share with executives without coding?
✅ Answer: A Business Intelligence platform like Tableau or Microsoft Power BI — they provide drag-and-drop dashboard creation, interactive filters, and publishing capabilities without requiring programming skills.

🧪 Practice Labs

🔢 Put the Analytics Process in Order

Click & Order

Use the ↑ ↓ buttons to arrange the five steps in the correct sequence.

🎯 Match Analytics Types to Scenarios

Click & Match

Click a card to select it (it highlights), then click the correct category zone to place it.

🃏 Flashcards — Click to Flip

1 / 10
Term
Descriptive Analytics
Definition
Analyzes historical data to understand what has happened in the past.

Quick Quiz

📝 My Study Notes

Chapter 2

Understanding Data

Explores data types, structures, and file formats — the foundational vocabulary every data analyst must know.

🎯 Exam Objectives Covered

  • Obj 1.2 — Compare and contrast different data types
  • Date, Numeric, Alphanumeric, Currency, Text, Discrete, Continuous, Categorical
  • Obj 1.3 — Compare and contrast common data structures and file formats
  • Structured, Semi-structured, Unstructured; CSV, JSON, XML, Parquet, XLSX

📖 Key Topics


🏋️ Training Scenarios

Click any scenario to reveal the answer.

📌 A dataset has columns: CustomerID (integer), Name (text), Email (text), SignupDate (date), OrderCount (integer). Classify each by data type.
✅ Answer: CustomerID: Numeric (discrete/integer). Name: Alphanumeric/Text. Email: Alphanumeric/Text. SignupDate: Date/Time. OrderCount: Numeric (discrete/integer). All are structured data in defined columns.
📌 You receive IoT sensor logs as pipe-delimited text files: "2024-01-15|23.7|OK". What data structure type is this?
✅ Answer: Structured data in a flat/delimited file format. Each record has a consistent structure (timestamp | value | status) with a known delimiter. It is not truly unstructured — it has implicit structure, even without a formal schema.
📌 You need to exchange data between two web APIs. Which file format is most appropriate and why?
✅ Answer: JSON — it is the standard for REST API data exchange. It supports nested structures, is human-readable, lightweight, and natively understood by all modern programming languages and web frameworks.
📌 Is "Number of website visits per day" discrete or continuous? What about "Page load time in seconds"?
✅ Answer: Visits per day: Discrete — you cannot have 3.7 visits; it is a whole-number count. Page load time: Continuous — it can be 1.237 seconds, 2.5 seconds, any decimal value within a range.
📌 A data warehouse stores sales data in a fact table linked to Product, Store, Date, and Customer dimension tables. What schema design is this?
✅ Answer: Star schema — one central fact table (sales transactions with numeric measures) surrounded by dimension tables (Product, Store, Date, Customer). This is the standard design for OLAP/data warehouse analytics.
📌 What is the difference between strong typing and weak typing in databases?
✅ Answer: Strong typing: the database strictly enforces data types — inserting text into a numeric column fails with an error. Weak typing: the system auto-coerces types, potentially allowing "5" (string) + 3 (integer) = 8. Most relational databases are strongly typed; spreadsheets often use weak typing.
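The contrast can be felt directly in Python, which is itself strongly typed: adding a string to an integer raises an error instead of silently coercing, much like a strict database rejecting text in a numeric column. A minimal sketch (the `try_add` helper is illustrative, not part of any library):

```python
# Python is strongly typed: str + int raises TypeError rather than
# silently coercing, mirroring a strict database's type enforcement.
def try_add(a, b):
    """Return a + b, or the string 'TypeError' if the types clash."""
    try:
        return a + b
    except TypeError:
        return "TypeError"

strict = try_add("5", 3)   # no silent coercion
coerced = int("5") + 3     # an explicit cast is required
```

A weakly typed system would perform the `int("5")` cast for you, which is convenient but can hide data quality problems.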
📌 A big data platform stores 10 TB of raw log files, images, video, and JSON from 50 different systems. What type of storage system fits best?
✅ AnswerA data lake — it stores raw data in native format (structured, semi-structured, and unstructured) without requiring a predefined schema. It is cost-effective for large volumes and supports schema-on-read when analysis is needed later.
📌 A "Customer Satisfaction Rating" field contains values: Excellent, Good, Fair, Poor. What type of data is this?
✅ AnswerOrdinal qualitative data — it is categorical (labels, not numbers) with a meaningful rank order (Excellent > Good > Fair > Poor). The gaps between categories are not necessarily equal, so arithmetic (calculating an average rating) is statistically inappropriate.
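One safe way to handle ordinal labels in code is to encode the rank order explicitly and stick to rank-based operations such as sorting and the median. A small sketch, where the rating-to-rank mapping and sample responses are invented for illustration:

```python
# Ordinal categories carry rank order but not equal spacing, so we encode
# the order explicitly instead of averaging the labels.
RATING_ORDER = {"Poor": 0, "Fair": 1, "Good": 2, "Excellent": 3}

responses = ["Good", "Excellent", "Fair", "Good", "Poor"]

# Valid ordinal operations: sorting by rank, and the median (a rank-based
# statistic), but not the arithmetic mean of the labels.
ranked = sorted(responses, key=RATING_ORDER.get)
median_label = ranked[len(ranked) // 2]   # middle response by rank
```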

🧪 Practice Labs

📓 Jupyter Notebook Available

Download and run hands-on Python exercises for this chapter in Jupyter.

🗂️ Classify These Data Examples

Click & Match

Drag each item to the correct data structure category (Structured, Semi-structured, or Unstructured).

✏️ Fill in the Data Type Definitions

Fill in Blank

Type the correct CompTIA data type term for each definition.

🐍 Python: Exploring Data Types

Code Lab

Run Python cells to explore data types. Click ▶ Run to execute each cell.

🃏 Flashcards — Click to Flip

1 / 10
Term
Structured Data
Definition
Data organized in a defined schema with rows and columns — typically stored in relational databases or spreadsheets. Easy to query with SQL.

Quick Quiz

📝 My Study Notes

Chapter 3

Databases & Data Acquisition

Covers the relational model, non-relational databases, OLTP/OLAP patterns, ETL/ELT, and core SQL operations.

🎯 Exam Objectives Covered

  • Obj 1.1 — Identify basic concepts of data schemas and dimensions (Relational & NoSQL; Data Mart/Warehouse/Lake; OLTP, OLAP; Star & Snowflake)
  • Obj 2.1 — Explain data acquisition concepts (ETL, ELT, Delta load, APIs, web scraping, surveys, sampling)
  • Obj 2.4 — Data manipulation & query optimization (filtering, sorting, aggregate functions, indexing, execution plans)

📖 Key Topics


🏋️ Training Scenarios

Click any scenario to reveal the answer.

📌 A company runs 24/7 point-of-sale transactions across 500 stores. OLTP or OLAP?
✅ Answer: OLTP — it requires high-speed inserts and updates for individual transactions. The system must handle thousands of concurrent writes. OLAP would be used separately to analyze the accumulated transaction history.
📌 A data team loads daily order changes instead of reloading the entire 5-year history every night. What technique is this?
✅ Answer: Delta load (incremental load) — only records that are new or changed since the last load are processed. This is more efficient than full-refresh loads and reduces pipeline runtime and system load.
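The core of a delta load is a high-water mark: keep the timestamp of the last successful run, pull only rows modified after it, then advance the mark. A minimal sketch, assuming each source record carries a modification date; the table shape and field names are illustrative:

```python
# Delta (incremental) load sketch: filter by a watermark, then advance it.
from datetime import date

last_load_watermark = date(2024, 1, 10)   # saved from the previous run

source_rows = [
    {"order_id": 1, "modified": date(2024, 1, 5)},    # already loaded
    {"order_id": 2, "modified": date(2024, 1, 12)},   # changed since then
    {"order_id": 3, "modified": date(2024, 1, 15)},   # new since then
]

# Only rows modified after the watermark are processed this run
delta = [r for r in source_rows if r["modified"] > last_load_watermark]
new_watermark = max(r["modified"] for r in source_rows)
```

Real pipelines persist the watermark transactionally so a failed run does not skip records.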
📌 Write the SQL structure to retrieve all customers who placed orders in the last 30 days, grouped by customer, sorted by total spend descending.
✅ Answer: SELECT c.name, SUM(o.amount) AS total_spend FROM customers c INNER JOIN orders o ON c.id = o.customer_id WHERE o.order_date >= CURRENT_DATE - 30 GROUP BY c.name ORDER BY total_spend DESC (the date arithmetic shown is PostgreSQL-style; interval syntax varies by database).
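The same query structure can be tried against an in-memory SQLite database. SQLite expresses the 30-day window as `date('now', '-30 day')` rather than `CURRENT_DATE - 30`; the tables and sample rows below are invented for illustration:

```python
# Run the grouped/sorted query against an in-memory SQLite database.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (customer_id INTEGER, amount REAL, order_date TEXT);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES
        (1, 100.0, date('now', '-5 day')),
        (1,  50.0, date('now', '-10 day')),
        (2, 500.0, date('now', '-90 day'));  -- outside the 30-day window
""")

rows = conn.execute("""
    SELECT c.name, SUM(o.amount) AS total_spend
    FROM customers c
    INNER JOIN orders o ON c.id = o.customer_id
    WHERE o.order_date >= date('now', '-30 day')
    GROUP BY c.name
    ORDER BY total_spend DESC
""").fetchall()
```

Only Ada appears in the result: Grace's single order falls outside the window, and Ada's two recent orders are summed into one row.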
📌 When would you choose ELT over ETL?
✅ Answer: Choose ELT when using a modern cloud data warehouse (Snowflake, BigQuery, Redshift) that has sufficient compute power to transform data in place. ELT preserves raw data, enables iterative transformation with tools like dbt, and avoids maintaining a separate transformation server.
📌 A social network stores user profiles with variable fields. What database type fits best?
✅ Answer: A NoSQL document store (like MongoDB) — it handles flexible, variable schemas naturally. Each user document can have different fields without requiring schema alteration or NULL columns for every optional field.
📌 What is the difference between a Data Warehouse and a Data Lake?
✅ Answer: Data Warehouse: stores structured, cleaned data with a defined schema; optimized for BI queries; schema-on-write. Data Lake: stores raw data in native format; cheap storage; schema-on-read. Warehouses serve BI; lakes serve data science and exploration.
📌 INNER JOIN vs LEFT JOIN — what is returned by each?
✅ Answer: INNER JOIN: returns only rows where both tables have matching values. LEFT JOIN: returns all rows from the left table plus matched rows from the right — unmatched right-side rows appear as NULL.
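The difference is easy to see on two tiny tables: a customer with no orders appears only in the LEFT JOIN result, with NULL (Python `None`) for the missing order amount. The tables and rows are invented for illustration:

```python
# INNER vs LEFT JOIN on an in-memory SQLite database: customer 3 has no
# orders, so it only shows up in the LEFT JOIN output.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Edsger');
    INSERT INTO orders VALUES (1, 100.0), (2, 250.0);
""")

inner = conn.execute("""
    SELECT c.name, o.amount FROM customers c
    INNER JOIN orders o ON c.id = o.customer_id ORDER BY c.id
""").fetchall()

left = conn.execute("""
    SELECT c.name, o.amount FROM customers c
    LEFT JOIN orders o ON c.id = o.customer_id ORDER BY c.id
""").fetchall()
```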
📌 A fraud detection system needs to find accounts connected through shared addresses. What database type is optimal?
✅ Answer: A graph database (Neo4j, Amazon Neptune) — it stores relationships as first-class objects and can traverse networks of connections efficiently. Relational databases require complex self-joins for relationship traversal at scale.

🧪 Practice Labs

📓 Jupyter Notebook Available

Download and run hands-on Python exercises for this chapter in Jupyter.

🗄️ Match Database Concepts

Click & Match

Drag each item to its correct database concept category.

⚙️ Put the ETL Steps in Order

Click & Match

Drag the ETL steps into the correct order.

🐍 Python + SQLite

Code Lab

Run Python cells to explore SQLite. Click ▶ Run to execute each cell.

🃏 Flashcards — Click to Flip

1 / 10
Term
ETL
Definition
Extract, Transform, Load — a pipeline that extracts data from sources, transforms it (clean, format, enrich), then loads it into a target data warehouse.

Quick Quiz

📝 My Study Notes

Chapter 4

Data Quality

Learn to identify data quality challenges and apply manipulation techniques to clean, transform, and validate data.

🎯 Exam Objectives Covered

  • Obj 2.2 — Identify common reasons for cleansing and profiling (duplicate, missing, invalid, outliers, spec mismatch)
  • Obj 2.3 — Execute data manipulation techniques (recoding, imputation, normalization, parsing, deduplication)
  • Obj 5.2 — Apply data quality control concepts (dimensions: accuracy, completeness, consistency, timeliness, uniqueness, validity)

📖 Key Topics


🏋️ Training Scenarios

Click any scenario to reveal the answer.

📌 A dataset has 15% missing values in "Income". You want to preserve the distribution shape. Which imputation method is best?
✅ Answer: Median imputation — it is robust to outliers and preserves the distribution shape better than the mean when data is skewed.
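Median imputation takes only the standard library: compute the median of the observed values, then fill the gaps with it. The income figures below are invented to show why the median is the robust choice:

```python
# Median imputation: the 250k outlier would drag the mean up, but the
# median of the observed values stays representative.
from statistics import median

income = [42_000, 55_000, None, 61_000, None, 250_000, 48_000]

observed = [v for v in income if v is not None]
fill = median(observed)                              # robust to the outlier
imputed = [v if v is not None else fill for v in income]
```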
📌 Customer IDs appear multiple times with slightly different name spellings. What quality issue is this?
✅ Answer: Duplicate/redundant data combined with inconsistent data (data entry variation). Deduplication with fuzzy matching is needed.
📌 You need to combine FirstName and LastName into a single FullName field. Which technique?
✅ Answer: Concatenation — joining two string fields into one.
📌 A "Salary" field contains the value "N/A". What type of data quality issue is this?
✅ Answer: Invalid data / data type validation failure — text in a numeric field.
📌 What are the six dimensions of data quality?
✅ Answer: Accuracy, Completeness, Consistency, Timeliness, Uniqueness, Validity.
📌 You want to scale the "Revenue" column to a 0-1 range for use in a machine learning model. Which technique?
✅ Answer: Min-max normalization: (value - min) / (max - min). This ensures no variable dominates due to scale differences.
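The formula translates directly into a few lines of Python. The revenue values are invented; the point is that the smallest value always maps to 0.0 and the largest to 1.0:

```python
# Min-max normalization: (value - min) / (max - min) rescales to [0, 1].
def min_max(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

revenue = [100, 250, 400, 1000]
scaled = min_max(revenue)   # smallest value -> 0.0, largest -> 1.0
```

Note the formula divides by the range, so it is undefined when all values are equal; production code should guard against `hi == lo`.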
📌 A "Gender" field contains: M, Male, male, 1, F, Female, female, 0. What technique fixes this?
✅ Answer: Recoding (standardization) — mapping all variants to a single consistent value set (e.g., M and F).
📌 What is the difference between data profiling and data auditing?
✅ Answer: Profiling is exploratory — generating column statistics to discover quality issues. Auditing is systematic and scheduled — checking data against documented business rules and producing formal reports.

🧪 Practice Labs

📓 Jupyter Notebook Available

Download and run hands-on Python exercises for this chapter in Jupyter.

✏️ Data Quality Dimensions

Fill in Blank

Type the correct data quality term for each definition.

🧹 Match Data Issues to Fixes

Click & Match

Drag each data issue to its correct fix or solution.

🐍 Python: Data Cleaning Pipeline

Code Lab

Run Python cells to practice data cleaning. Click ▶ Run to execute each cell.

🃏 Flashcards — Click to Flip

1 / 10
Term
Imputation
Definition
The process of replacing missing data values with substitute estimates, such as the mean, median, mode, or values derived from regression models.

Quick Quiz

📝 My Study Notes

Chapter 5

Data Analysis & Statistics

Master descriptive and inferential statistics — the mathematical foundation of data analysis and decision-making.

🎯 Exam Objectives Covered

  • Obj 3.1 — Given a scenario, apply the appropriate descriptive statistical methods (central tendency, dispersion, frequencies, confidence intervals)
  • Obj 3.2 — Explain the purpose of inferential statistical methods (t-tests, Z-score, p-values, hypothesis testing, regression, correlation)
  • Obj 3.3 — Summarize types of analysis and key analysis techniques (trend, performance, link, exploratory analysis)

📖 Key Topics


🏋️ Training Scenarios

Click any scenario to reveal the answer.

📌 A dataset of house prices has mean=$450K but median=$320K. What does this tell you?
✅ Answer: The distribution is right-skewed (positively skewed) — a few very expensive houses pull the mean up significantly above the median. Use the median to report a typical price.
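The pattern is easy to reproduce with the standard library. The prices below (in $K) are invented, with one luxury outlier that drags the mean well above the median:

```python
# Right skew: one extreme value inflates the mean but barely moves
# the median.
from statistics import mean, median

prices = [280, 300, 320, 340, 360, 1100]   # $K; 1100 is the outlier

m, md = mean(prices), median(prices)
right_skewed = m > md    # mean > median signals positive skew
```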
📌 You conduct an A/B test and get p-value = 0.03 with α=0.05. What do you conclude?
✅ Answer: Reject the null hypothesis — the p-value (0.03) is less than the significance level (0.05), indicating the observed difference is statistically significant and unlikely due to chance.
📌 What does a Z-score of -2.5 mean?
✅ Answer: The value is 2.5 standard deviations below the mean. It is an unusual observation. Values with |Z| > 3 are typically flagged as outliers.
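Z-scores can be computed from scratch with the standard library and used to flag unusual points. The data and the |Z| > 2 threshold below are illustrative (a stricter |Z| > 3 cutoff is also common, as the answer notes):

```python
# Z-score: how many standard deviations a value sits from the mean.
# Uses the population standard deviation (pstdev).
from statistics import mean, pstdev

def z_scores(values):
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

data = [10, 12, 11, 13, 12, 11, 40]   # 40 looks suspicious
outliers = [v for v, z in zip(data, z_scores(data)) if abs(z) > 2]
```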
📌 Correlation coefficient r = 0.92 between advertising spend and revenue. Can you conclude advertising causes revenue increases?
✅ Answer: No — correlation does not imply causation. A confounding variable (e.g., seasonality) may drive both. Only a randomized controlled experiment can establish causation.
📌 What is the difference between Type I and Type II errors?
✅ Answer: Type I (false positive): rejecting a true null hypothesis — detecting an effect that doesn't exist. Type II (false negative): failing to reject a false null — missing a real effect. α controls the Type I error rate.
📌 You want to survey 1,000 customers from 5 regions ensuring proportional representation from each. Which sampling method?
✅ Answer: Stratified sampling — divide the population into strata (regions) and sample proportionally from each. This ensures all groups are represented.
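Proportional allocation is the arithmetic at the heart of stratified sampling: each stratum's share of the sample mirrors its share of the population. A sketch with invented region sizes; note that in general rounding can make the allocations drift from the target total, which real implementations reconcile:

```python
# Stratified sampling sketch: allocate the sample proportionally across
# strata, then draw randomly within each stratum.
import random

random.seed(42)   # reproducible draws for the sketch
population = {"North": 500, "South": 300, "East": 150, "West": 50}
total = sum(population.values())
sample_size = 100

allocation = {r: round(sample_size * n / total) for r, n in population.items()}
sample = {r: random.sample(range(n), allocation[r]) for r, n in population.items()}
```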
📌 A company tracks monthly revenue for 3 years. What analysis technique identifies a long-term upward trend vs. seasonal fluctuations?
✅ Answer: Time series decomposition — separating the series into Trend, Seasonality, Cyclical, and Residual components. Moving averages smooth out seasonal and irregular components to reveal the trend.
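A simple moving average is the lightest-weight piece of that toolkit: it smooths short-term fluctuation so the underlying trend is easier to see. A sketch with an invented revenue series and a 3-period window:

```python
# 3-period moving average: each output point averages the current value
# and the two before it, smoothing out month-to-month noise.
def moving_average(series, window=3):
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]

revenue = [100, 130, 90, 140, 120, 170, 150]
smoothed = moving_average(revenue)   # 2 fewer points than the input
```

A larger window smooths more aggressively; a full decomposition additionally separates out a repeating seasonal component.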
📌 R² = 0.85 for a regression model predicting sales from advertising. What does this mean?
✅ Answer: The model explains 85% of the variance in sales; the remaining 15% is unexplained. R² = 1 would be a perfect fit; R² = 0 means the model explains nothing.
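R² has a compact definition: 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below computes it from scratch and checks the two landmark cases mentioned in the answer, using invented numbers:

```python
# R^2 = 1 - SS_res / SS_tot: the fraction of variance in y explained
# by the predictions.
def r_squared(actual, predicted):
    mean_y = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in actual)
    return 1 - ss_res / ss_tot

actual = [10, 20, 30, 40]
perfect = r_squared(actual, actual)       # every point explained
baseline = r_squared(actual, [25] * 4)    # just predicting the mean
```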

🧪 Practice Labs

📓 Jupyter Notebook Available

Download and run hands-on Python exercises for this chapter in Jupyter.

✏️ Statistics Formulas & Concepts

Fill in Blank

Type the correct statistics term for each definition or formula.

🐍 Python: Statistics from Scratch

Code Lab

Run Python cells to compute statistics from scratch. Click ▶ Run to execute each cell.

🃏 Flashcards — Click to Flip

1 / 10
Term
Null Hypothesis (H₀)
Definition
A statement assuming no effect, no difference, or no relationship between variables — the default position that must be disproven through evidence.

Quick Quiz

📝 My Study Notes

Chapter 6

Data Analytics Tools

Survey the landscape of tools — from spreadsheets and programming languages to BI suites and ML platforms.

🎯 Exam Objectives Covered

  • Obj 3.4 — Identify common data analytics tools (SQL, Python, R, Excel, SAS, and BI suites such as Tableau, Power BI, and Qlik)

📖 Key Topics


🏋️ Training Scenarios

Click any scenario to reveal the answer.

📌 A business user wants to explore sales data by region, product, and time period without asking IT each time. What tool category fits best?
✅ Answer: A self-service BI platform (Tableau, Power BI) — provides interactive filters, drill-down, and slicers that let users explore data independently without technical skills.
📌 You need to build a machine learning model to predict customer churn, handling feature engineering, model training, and evaluation. Which tool is most appropriate?
✅ Answer: Python with pandas and scikit-learn — provides full programmatic control for complex ML pipelines, feature engineering, cross-validation, and model evaluation.
📌 A pharmaceutical company needs to run clinical trial statistical analysis that regulators will accept. Which tool?
✅ Answer: SAS — it is the long-established standard in regulated industries, widely accepted for clinical trial analysis, with built-in audit trails for regulatory compliance.
📌 A data analyst needs to query a Snowflake data warehouse directly. What interface do they use?
✅ Answer: SQL — either via Snowflake's web UI, a SQL client (DBeaver), or a BI tool that connects to Snowflake through a JDBC/ODBC connector.
📌 What is the difference between Python and R for data analysis?
✅ Answer: Python: general-purpose programming language with excellent data analysis libraries (pandas, scikit-learn); better for ML, automation, web scraping, and production deployment. R: statistics-first language with extensive academic statistical packages; better for advanced statistics, academic research, and publication-quality charts.
📌 When should you use Excel vs. Python for data analysis?
✅ Answer: Excel: best for small-to-medium datasets (under ~1M rows), ad hoc analysis, sharing with non-technical users, and quick calculations. Python: best for large datasets, reproducible pipelines, automation, machine learning, and processing data that changes regularly.
📌 A company wants to build cloud analytics that automatically scales with query demand and charges only per query. Which service fits?
✅ Answer: A serverless query service — AWS Athena (queries S3 data) or Google BigQuery. These charge per TB scanned and require no infrastructure management.

🧪 Practice Labs

📓 Jupyter Notebook Available

Download and run hands-on Python exercises for this chapter in Jupyter.

🛠️ Match Tools to Their Use Case

Click & Match

Drag each analytics tool to its correct use case.

🐍 Python Analytics Pipeline

Code Lab

Run Python cells to build an analytics pipeline. Click ▶ Run to execute each cell.

🃏 Flashcards — Click to Flip

1 / 10
Term
tidyverse
Definition
A collection of R packages (including ggplot2, dplyr, tidyr) designed to facilitate data manipulation and visualization using consistent, readable syntax.

Quick Quiz

📝 My Study Notes

Chapter 7

Data Visualization with Reports & Dashboards

Learn how to translate business requirements into compelling visualizations, reports, and interactive dashboards.

🎯 Exam Objectives Covered

  • Obj 4.1 — Report types, layouts, delivery (static, interactive, ad hoc, recurring, pixel-perfect)
  • Obj 4.2 — Design components: appropriate chart types, KPIs, conditional formatting, reference lines
  • Obj 4.3 — Dashboard workflow: wireframe, mock-up, prototype, stakeholder review, data story

📖 Key Topics


🏋️ Training Scenarios

Click any scenario to reveal the answer.

📌 Monthly website traffic over the past 2 years. Which chart type?
✅ Answer: Line chart — it shows a continuous trend over time with many data points. Each point connects to the next, making rate of change and trend direction immediately visible.
📌 Revenue breakdown by product category (6 categories). Which chart?
✅ Answer: Horizontal bar chart — comparing discrete categories. Bars are easier to compare than pie slices, especially with 6+ categories. Sort by value descending for maximum clarity.
📌 Show how gross revenue becomes net profit through a series of additions and subtractions (returns, COGS, operating expenses). Which chart?
✅ Answer: Waterfall chart — each bar "floats" at the level where the previous value ended, making the contribution of each factor intuitively clear.
📌 Compare the distribution of exam scores across three different training cohorts. Which chart?
✅ Answer: Box plot — shows min, Q1, median, Q3, max for each group simultaneously, enabling direct comparison of distributions including outliers.
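The five numbers a box plot encodes can be computed directly with the standard library's `statistics` module. The exam scores below are invented; note that different quartile conventions (exclusive vs. inclusive) give slightly different Q1/Q3 values.

```python
# Five-number summary sketch for one invented cohort of exam scores.
import statistics

scores = [55, 62, 70, 71, 74, 78, 80, 85, 88, 95]

# quantiles(n=4) returns the three quartile cut points (exclusive method).
q1, median, q3 = statistics.quantiles(scores, n=4)
five_number = (min(scores), q1, median, q3, max(scores))
iqr = q3 - q1  # interquartile range: the "box" height
print(five_number, "IQR:", iqr)
```

Repeating this per cohort and plotting the three summaries side by side is exactly what the box plot does visually.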
📌 You have 20 KPIs to display for an executive. A user asks you to put them all on one dashboard. What should you advise?
✅ Answer: Recommend limiting each dashboard view to the 5-7 most critical KPIs. Too many KPIs create cognitive overload. Use drill-down to secondary dashboards for additional metrics, and prioritise the metrics most directly linked to executive decisions.
📌 A dashboard filter lets users select time period, region, and product category simultaneously. What is this feature called?
✅ Answer: A slicer (or filter panel) — interactive controls that allow users to dynamically subset the data displayed in all connected charts on the dashboard simultaneously.
📌 Relationship between 50 products' advertising spend (X axis) and revenue (Y axis). Which chart?
✅ Answer: Scatter plot — each product is one point at coordinates (ad spend, revenue). A trend line can be added to show the direction and strength of correlation.
📌 What is the difference between a static report, an interactive report, and an ad hoc report?
✅ Answer: Static: fixed PDF/printout, no interaction. Interactive: live dashboard with filters and drill-down (Tableau, Power BI). Ad hoc: custom one-time report built by an analyst to answer a specific question not covered by existing reports.
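The chart-selection rules running through the scenarios above can be captured as a small lookup, in the spirit of this chapter's "Python: Chart Logic" lab. The goal phrases and the mapping are illustrative, not an official decision table.

```python
# Hedged sketch of chart-selection logic: map an analysis goal to a chart type.
def pick_chart(goal: str) -> str:
    """Return a reasonable chart type for a described analysis goal."""
    rules = {
        "trend over time": "line chart",
        "compare categories": "bar chart",
        "cumulative additions/subtractions": "waterfall chart",
        "compare distributions": "box plot",
        "relationship between two variables": "scatter plot",
    }
    return rules.get(goal, "table")  # fall back to a plain table

print(pick_chart("trend over time"))
print(pick_chart("compare distributions"))
```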

🧪 Practice Labs

📓 Jupyter Notebook Available

Download and run hands-on Python exercises for this chapter in Jupyter.

📊 Pick the Right Chart

Click & Match

Drag each scenario to the chart type that best represents it.

✏️ Dashboard & Report Vocabulary

Fill in Blank

Type the correct visualization or reporting term for each definition.

🐍 Python: Chart Logic

Code Lab

Run Python cells to practice chart selection logic. Click ▶ Run to execute each cell.

🃏 Flashcards — Click to Flip

1 / 10
Term
Waterfall Chart
Definition
A visualization that shows the cumulative effect of sequentially introduced positive or negative values — ideal for financial analysis (e.g., profit/loss breakdown).

Quick Quiz

📝 My Study Notes

Chapter 8

Data Governance

Understand the policies, roles, and frameworks that ensure data is secure, compliant, and used appropriately.

🎯 Exam Objectives Covered

  • Obj 5.1 — Summarise important data governance concepts (roles, data classification, access and usage policies)
  • Obj 5.2 — Apply data quality control concepts (validation, profiling, auditing, data lineage)
  • Obj 5.3 — Explain master data management (MDM) concepts (golden record, entities, workflows)

📖 Key Topics


🏋️ Training Scenarios

Click any scenario to reveal the answer.

📌 A business executive is accountable for a customer dataset — approving who can access it and setting the retention policy. What is their governance role?
✅ Answer: Data Owner — accountable for the data's use, access approvals, and lifecycle policies. They don't manage data day-to-day but are ultimately responsible.
📌 A database administrator manages backups, storage systems, and security controls for a dataset. What is their role?
✅ Answer: Data Custodian — responsible for the physical infrastructure (storage, backups, security). Not accountable for data content or quality.
📌 Your organisation must notify customers within a specific timeframe after a personal data breach. Which regulation mandates this?
✅ Answer: GDPR — in the EU, a breach must be reported to the supervisory authority within 72 hours, and affected individuals must be notified without undue delay. (For comparison, HIPAA requires notification within 60 days for US healthcare breaches.)
📌 A developer needs to test an application using realistic customer data without exposing real PII. What technique should be used?
✅ Answer: Data masking — replace real PII (names, emails, SSNs) with fictitious but realistic values. The data structure and format are preserved but real values are protected.
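A toy sketch of the masking idea: swap real values for fake ones of the same shape, so format-dependent application tests still pass. The record, field names, and substitution rules here are all invented; real masking is usually done with dedicated tooling.

```python
# Data-masking sketch: replace PII while preserving each field's format.
import random

def mask_record(rec: dict) -> dict:
    rng = random.Random(42)  # fixed seed for repeatable test fixtures
    masked = dict(rec)
    masked["name"] = "User" + str(rng.randint(1000, 9999))
    masked["email"] = f"user{rng.randint(1000, 9999)}@example.com"
    # Preserve the SSN format (###-##-####) but replace every digit.
    masked["ssn"] = "-".join("".join(str(rng.randint(0, 9)) for _ in range(n))
                             for n in (3, 2, 4))
    return masked

real = {"name": "Ada Lovelace", "email": "ada@corp.com", "ssn": "123-45-6789"}
print(mask_record(real))
```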
📌 A company stores customer records in 5 different systems, all with slightly different versions of the same customer. What governance practice creates a single authoritative record?
✅ Answer: Master Data Management (MDM) — the process of matching, merging, and creating a "golden record" that becomes the single source of truth for each customer entity across all systems.
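The merge step can be sketched with a simple survivorship rule: for each field, take the value from the most recently updated system that has one. The systems, field names, and "most recent non-null wins" rule are illustrative; real MDM platforms use configurable matching and survivorship logic.

```python
# MDM golden-record sketch: one customer as seen by three hypothetical systems.
records = [
    {"system": "CRM",     "updated": "2024-01-10", "name": "Jon Smith",  "phone": None},
    {"system": "Billing", "updated": "2024-03-02", "name": "John Smith", "phone": "555-0100"},
    {"system": "Support", "updated": "2023-11-20", "name": "J. Smith",   "phone": "555-0100"},
]

def golden_record(recs):
    # Newest record first, then take the first non-null value per field.
    ordered = sorted(recs, key=lambda r: r["updated"], reverse=True)
    return {field: next(r[field] for r in ordered if r[field] is not None)
            for field in ("name", "phone")}

print(golden_record(records))
```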
📌 "Financial reporting data must be retained for 7 years then securely disposed of." What governance policy area is this?
✅ Answer: Data retention and disposal policy — defines how long each data type must be kept (driven by legal/regulatory requirements) and how it must be destroyed at the end of its lifecycle.
📌 A financial analyst exports a spreadsheet of customer PII to their personal email to work from home. A governance tool detects and blocks this. What technology is this?
✅ Answer: Data Loss Prevention (DLP) — monitors data flows and blocks or alerts on potential exfiltration of sensitive data to unauthorised destinations.
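At its simplest, the detection side of DLP is pattern matching on outbound content. This toy check (patterns and category names invented for illustration) flags text that looks like it contains an SSN or a credit card number; real DLP products combine many such rules with fingerprinting and policy engines.

```python
# Toy DLP-style scan: flag PII-shaped patterns in an outbound message.
import re

PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}

def dlp_scan(text: str) -> list:
    """Return the PII categories detected in an outbound message."""
    return [name for name, pat in PII_PATTERNS.items() if pat.search(text)]

print(dlp_scan("Customer SSN is 123-45-6789, please process."))
```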
📌 What is the difference between data anonymisation and pseudonymisation under GDPR?
✅ Answer: Anonymisation: irreversibly removes all identifying information — GDPR no longer applies (not personal data). Pseudonymisation: replaces identifiers with pseudonyms (hashes) but can be reversed with the key — GDPR still applies (still personal data).
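The contrast can be shown in a few lines (record and salt are invented). Pseudonymisation keeps a stable per-person token, so whoever holds the salt/key can re-identify; anonymisation drops the identifier entirely.

```python
# Pseudonymisation vs. anonymisation sketch on one invented record.
import hashlib

record = {"email": "ada@corp.com", "purchase_total": 129.99}

# Pseudonymisation: identifier replaced by a salted hash. The controller's
# secret salt makes re-identification possible, so this is still personal data.
secret_salt = "k3y"  # hypothetical secret held by the data controller
pseudonym = hashlib.sha256((secret_salt + record["email"]).encode()).hexdigest()[:12]
pseudonymised = {"user": pseudonym, "purchase_total": record["purchase_total"]}

# Anonymisation: the identifier is dropped entirely (typically combined with
# aggregation), leaving nothing to reverse.
anonymised = {"purchase_total": record["purchase_total"]}

print(pseudonymised, anonymised)
```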

🧪 Practice Labs

📓 Jupyter Notebook Available

Download and run hands-on Python exercises for this chapter in Jupyter.

🔒 Governance Roles & Responsibilities

Click & Match

Drag each governance role to its correct responsibility.

✏️ Data Governance Vocabulary

Fill in Blank

Type the correct data governance term for each definition.

🐍 Python: Governance in Practice

Code Lab

Run Python cells to explore governance concepts in code. Click ▶ Run to execute each cell.

🃏 Flashcards — Click to Flip

1 / 10
Term
Data Steward
Definition
A subject-matter expert responsible for day-to-day data governance: maintaining metadata and business definitions, monitoring data quality, and ensuring policies are followed within their assigned data domain. (Contrast with the Data Owner, who is accountable for the data, and the Data Custodian, who manages its infrastructure.)

Quick Quiz

📝 My Study Notes
