CompTIA Data+ Study Buddy

Practice Quiz

258 real exam-style questions from the official question bank. Choose your topics, set the count, and start quizzing.

Questions

258

Topics

8

Multi-Answer

14

🎓 Choose Mode

🎯 Select Topics

Choose one or more topics, or leave all unselected to include every topic.

⚙️ Quiz Options

Order

Number of Questions

max: 258

Terminology Reference

Plain-language definitions for every key term across all 5 exam domains. Click a domain to expand, or search for any term.

🔍

Official Acronym Flashcards

All 26 acronyms from the official CompTIA DA0-001 exam objectives. Master every one before exam day.

Total

26

Source

Official

🃏 Acronym Flashcards — Click to Flip

1 / 26

Acronym

ETL

Stands For

Extract, Transform, Load

📋 Full Acronym Reference

⚡ Quick-Match Quiz — Acronyms

Chapter Notes

Your personal notes for this chapter

📝 My Notes

CompTIA Data+ Study Buddy

Your all-in-one study companion for the DA0-001 exam. Work through all 8 chapters with topics, flashcards, quizzes, and personal notes.

📚

8

Total Chapters

🃏

40+

Flashcards

✅

0

Chapters Studied

🎯

—

Quiz Score

All Chapters

Ch 1: Today's Data Analyst

Introduction to analytics roles, processes, and the data landscape.

Analytics Process, Descriptive, Predictive, AI/ML

Ch 2: Understanding Data

Data types, structures, and common file formats used in analytics.

Structured Data, JSON, XML, Data Types

Ch 3: Databases & Data Acquisition

Relational models, NoSQL, OLTP/OLAP, ETL/ELT, and SQL fundamentals.

SQL, ETL, OLAP, NoSQL

Ch 4: Data Quality

Identifying and resolving data quality issues; manipulation techniques.

Imputation, Normalization, Outliers, Validation

Ch 5: Data Analysis & Statistics

Descriptive & inferential statistics, hypothesis testing, regression.

Hypothesis Testing, Regression, Mean/Median

Ch 6: Data Analytics Tools

Spreadsheets, programming languages, BI suites, and ML platforms.

Python, R, Power BI, Tableau

Ch 7: Visualization & Dashboards

Report design, dashboard development, and visualization best practices.

Charts, Dashboards, Infographics, KPIs

Ch 8: Data Governance

Governance roles, access control, data classification, and MDM.

Data Steward, PII, MDM, Compliance

Exam Domains & Weightings

The DA0-001 exam covers five domains. Understanding the weight of each domain helps you prioritize your study time.

🎯 Domain Weightings

1.0 Data Concepts & Environments

15%

2.0 Data Mining

25%

3.0 Data Analysis

23%

4.0 Visualization

23%

5.0 Data Governance, Quality & Controls

14%

Exam Objectives — Click Any Domain to Expand

1.0 Data Concepts and Environments — 15% of exam

Objective 1.1

Identify basic concepts of data schemas and dimensions.

Databases
Relational
Non-relational
Data mart / data warehousing / data lake
Online transactional processing (OLTP)
Online analytical processing (OLAP)

Schema concepts
Snowflake
Star
Slowly changing dimensions
Keep current information
Keep historical and current information

Objective 1.2

Compare and contrast different data types.

Date
Numeric
Alphanumeric
Currency
Text

Discrete vs. continuous
Categorical / dimension
Images
Audio
Video

Objective 1.3

Compare and contrast common data structures and file formats.

Structures
Structured — defined rows/columns; key-value pairs
Unstructured — undefined fields; machine data

Data file formats
Text/Flat file — Tab delimited, Comma delimited
JavaScript Object Notation (JSON)
Extensible Markup Language (XML)
Hypertext Markup Language (HTML)

2.0 Data Mining — 25% of exam

Objective 2.1

Explain data acquisition concepts.

Integration
Extract, transform, load (ETL)
Extract, load, transform (ELT)
Delta load
Application programming interfaces (APIs)

Data collection methods
Web scraping
Public databases
Application programming interface (API) / web services
Survey
Sampling
Observation

Objective 2.2

Identify common reasons for cleansing and profiling datasets.

Duplicate data
Redundant data
Missing values
Invalid data

Non-parametric data
Data outliers
Specification mismatch
Data type validation

Objective 2.3

Given a scenario, execute data manipulation techniques.

Recoding data
Numeric
Categorical
Derived variables
Data merge
Data blending
Concatenation

Data append
Imputation
Reduction / aggregation
Transpose
Normalize data
Parsing / string manipulation

Objective 2.4

Explain common techniques for data manipulation and query optimization.

Data manipulation
Filtering
Sorting
Date functions
Logical functions
Aggregate functions
System functions

Query optimization
Parametrization
Indexing
Temporary table in the query set
Subset of records
Execution plan

3.0 Data Analysis — 23% of exam

Objective 3.1

Given a scenario, apply the appropriate descriptive statistical methods.

Measures of central tendency
Mean
Median
Mode
Distribution
Measures of dispersion
Range, Max, Min
Variance
Standard deviation

Frequencies / percentages
Percent change
Percent difference
Confidence intervals

Objective 3.2

Explain the purpose of inferential statistical methods.

t-tests
Z-score
p-values
Chi-squared

Hypothesis testing
Type I error
Type II error
Simple linear regression
Correlation

Objective 3.3

Summarize types of analysis and key analysis techniques.

Process to determine type of analysis
Review / refine business questions
Determine data needs and sources
Scoping / gap analysis
Type of analysis
Trend analysis
Comparison of data over time
Performance analysis

Tracking measurements against defined goals
Basic projections to achieve goals
Link analysis — connection of data points or pathway
Exploratory data analysis
Use of descriptive statistics to determine observations

Objective 3.4

Identify common data analytics tools.

Note: The intent of this objective is NOT to test specific vendor feature sets nor the purposes of the tools.

Structured Query Language (SQL)
Python
Microsoft Excel
R
RapidMiner
IBM Cognos
IBM SPSS Modeler

IBM SPSS
SAS
Tableau
Power BI
Qlik
MicroStrategy
BusinessObjects
Apex, Datorama, Domo, AWS QuickSight, Stata, Minitab

4.0 Visualization — 23% of exam

Objective 4.1

Given a scenario, translate business requirements to form a report.

Data content
Filtering
Views
Date range

Frequency
Audience for report
Distribution list

Objective 4.2

Given a scenario, use appropriate design components for reports and dashboards.

Report cover page
Instructions
Summary
Observations and insights
Design elements
Color schemes, Layout
Font size and style
Key chart elements: Titles, Labels, Legends
Corporate reporting standards / style guide
Branding, Color codes, Logos/trademarks, Watermark

Report cover page (cont.)
Executive summary
FAQs
Appendix
Documentation elements
Version number
Reference data sources
Reference dates
Report run date
Data refresh date

Objective 4.3

Given a scenario, use appropriate methods for dashboard development.

Dashboard considerations
Data sources and attributes, Field definitions
Dimensions and Measures
Continuous/live data feed vs. static data
Consumer types: C-level executives, Management, External vendors/stakeholders, General public, Technical experts
Development process
Mockup/wireframe → Layout/presentation
Flow/navigation → Data story planning
Approval granted → Develop dashboard → Deploy to production

Delivery considerations
Subscription, Scheduled delivery
Interactive (drill down / roll up)
Saved searches, Filtering
Static, Web interface
Dashboard optimization
Access permissions

Objective 4.4

Given a scenario, apply the appropriate type of visualization.

Line chart
Pie chart
Bubble chart
Scatter plot
Bar chart
Histogram
Waterfall

Heat map
Geographic map
Tree map
Stacked chart
Infographic
Word cloud

Objective 4.5

Compare and contrast types of reports.

Static vs. dynamic reports
Point-in-time
Real time
Ad-hoc / one-time report
Self-service / on demand

Recurring reports
Compliance reports (financial, health, safety)
Risk and regulatory reports
Operational reports (performance, KPIs)
Tactical / research report

5.0 Data Governance, Quality, and Controls — 14% of exam

Objective 5.1

Summarize important data governance concepts.

Access requirements
Role-based
User group-based
Data use agreements
Release approvals
Security requirements
Data encryption
Data transmission
De-identify data / data masking
Storage environment requirements
Shared drive vs. cloud based vs. local storage

Use requirements
Acceptable use policy
Data processing — Data deletion, Data retention
Entity relationship requirements
Record link restrictions, Data constraints, Cardinality
Data classification
Personally identifiable information (PII)
Personal health information (PHI)
Payment card industry (PCI)
Jurisdiction requirements
Impact of industry and governmental regulations
Data breach reporting — Escalate to appropriate authority

Objective 5.2

Given a scenario, apply data quality control concepts.

Circumstances to check for quality
Data acquisition / data source
Data transformation / intrahops: Pass through, Conversion
Data manipulation
Final product (report / dashboard)
Automated validation
Data field to data type validation
Number of data points

Data quality dimensions
Data consistency
Data accuracy
Data completeness
Data integrity
Data attribute limitations
Data quality rule and metrics
Conformity, Non-conformity, Rows passed, Rows failed
Methods to validate quality
Cross-validation, Sample/spot check
Reasonable expectations, Data profiling, Data audits

Objective 5.3

Explain master data management (MDM) concepts.

Processes
Consolidation of multiple data fields
Standardization of data field names
Data dictionary
Compliance with policies and regulations
Streamline data access

Circumstances for MDM
Mergers and acquisitions

📋 Exam Format

90 questions total
90 minutes to complete
Passing score: 675 (scale 100–900)
Performance-based assessment format
Multiple-choice, fill-in-the-blank
Multiple-response, drag-and-drop
Image-based problems

💡 Exam Tips

CompTIA includes vague questions — use logic
Some questions may have two correct answers
Item seeding: some questions are unscored
18–24 months hands-on experience recommended
Focus on scenario-based learning
Review all 5 domains proportionally
Data Mining (25%) is the heaviest domain

Chapter 1

Today's Data Analyst

Introduction to the world of analytics — the roles, tools, processes, and techniques that define modern data work.

🎯 Exam Objectives Covered

Domain 1.0 — Data Concepts & Environments (15%)
Domain 2.0 — Data Mining (25%)
Domain 3.0 — Data Analysis (23%)
Domain 4.0 — Visualization (23%)
Domain 5.0 — Data Governance, Quality & Controls (14%)
This chapter introduces all 5 domains at a high level

📖 Key Topics

Topic 1 of 7

0 of 7 topics covered

🏋️ Training Scenarios

Click any scenario to reveal the answer.

▶📌 A retail company wants to understand why sales dipped last quarter. Which analytics type applies — descriptive, predictive, or prescriptive?

✅ Answer: Descriptive analytics. It describes what happened (the sales drop) by examining the past. It does not predict or recommend action.

▶📌 A logistics company wants to recommend optimal shipping routes to minimise delivery time and cost. Which analytics type is this?

✅ Answer: Prescriptive analytics. It recommends the best action to take (which route to use), combining predictions and optimisation algorithms to produce actionable recommendations.

▶📌 A bank wants to identify which loan applicants are likely to default in the next 12 months. Which analytics type applies?

✅ Answer: Predictive analytics. It uses historical data and models to forecast a future outcome (probability of default). It does not recommend action; it only predicts likelihood.

▶📌 List the 5 steps of the analytics process in order.

✅ Answer: 1. Data Acquisition → 2. Cleaning & Manipulation → 3. Analysis → 4. Visualization → 5. Reporting & Communication. This sequence appears directly on the CompTIA Data+ exam.

▶📌 What is the difference between a Data Analyst and a Data Scientist?

✅ Answer: A Data Analyst focuses on describing and interpreting existing data through reporting, dashboards, and SQL queries. A Data Scientist builds predictive and statistical models using machine learning algorithms, typically with Python or R.

▶📌 A streaming music service suggests songs a user might like based on their listening history. Which analytics type drives this feature?

✅ Answer: Prescriptive analytics (recommendation engine). It recommends specific songs to listen to. It is built on predictive models that estimate what the user will enjoy, and then prescribes what to play next.

▶📌 A marketing manager receives a PDF report every Monday showing last week's campaign KPIs. What type of report is this?

✅ Answer: A recurring static report. It is generated on a fixed weekly schedule and delivered in a non-interactive format (PDF). It is descriptive in nature, summarising historical performance.

▶📌 Which analytics tool would be most appropriate for a business user who needs to build an interactive dashboard to share with executives without coding?

✅ Answer: A Business Intelligence platform like Tableau or Microsoft Power BI. These provide drag-and-drop dashboard creation, interactive filters, and publishing capabilities without requiring programming skills.

🧪 Practice Labs

🔢 Put the Analytics Process in Order

Click & Order

Use the ↑ ↓ buttons to arrange the five steps in the correct sequence.

🎯 Match Analytics Types to Scenarios

Click & Match

Click a card to select it (it highlights), then click the correct category zone to place it.

🃏 Flashcards — Click to Flip

1 / 10

Term

Descriptive Analytics

Definition

Analyzes historical data to understand what has happened in the past.

❓ Quick Quiz

After each run you can review highlighted topics (below your score) and use Retake quiz for another pass.

📝 My Study Notes

Chapter 2

Understanding Data

Explores data types, structures, and file formats — the foundational vocabulary every data analyst must know.

🎯 Exam Objectives Covered

Obj 1.2 — Compare and contrast different data types
Date, Numeric, Alphanumeric, Currency, Text, Discrete, Continuous, Categorical
Obj 1.3 — Compare and contrast common data structures and file formats
Structured, Semi-structured, Unstructured; CSV, JSON, XML, Parquet, XLSX

📖 Key Topics

Topic 1 of 8

0 of 8 topics covered

🏋️ Training Scenarios

Click any scenario to reveal the answer.

▶📌 A dataset has columns: CustomerID (integer), Name (text), Email (text), SignupDate (date), OrderCount (integer). Classify each by data type.

✅ Answer: CustomerID: Numeric (discrete/integer). Name: Alphanumeric/Text. Email: Alphanumeric/Text. SignupDate: Date/Time. OrderCount: Numeric (discrete/integer). All are structured data in defined columns.

▶📌 You receive IoT sensor logs as pipe-delimited text files: "2024-01-15|23.7|OK". What data structure type is this?

✅ Answer: Structured data in a flat/delimited file format. Each record has a consistent structure (timestamp | value | status) with a known delimiter. It is not truly unstructured: it has implicit structure, even without a formal schema.
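The implicit structure described above can be made explicit when the file is read. The following is a minimal sketch, assuming the three fields are a timestamp, a numeric reading, and a status flag; the field names and the second sample row are hypothetical.

```python
import csv
import io

# Hypothetical sample in the same shape as the record in the scenario.
raw = "2024-01-15|23.7|OK\n2024-01-16|24.1|OK\n"

def parse_sensor_log(text):
    """Parse pipe-delimited rows into dicts with typed fields."""
    reader = csv.reader(io.StringIO(text), delimiter="|")
    return [
        {"timestamp": ts, "value": float(val), "status": status}
        for ts, val, status in reader
    ]

rows = parse_sensor_log(raw)
print(rows[0])
```

Because every row splits into the same three fields, the data loads straight into a table, which is what makes it structured rather than unstructured.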

▶📌 You need to exchange data between two web APIs. Which file format is most appropriate and why?

✅ Answer: JSON. It is the standard for REST API data exchange. It supports nested structures, is human-readable and lightweight, and is supported by virtually all modern programming languages and web frameworks.
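As a small illustration of the nested structure JSON supports, the sketch below serialises a record to a JSON string and parses it back using Python's standard `json` module. The record contents are made up for the example.

```python
import json

# Hypothetical record with a nested object and a list.
profile = {
    "id": 42,
    "name": "Ada",
    "tags": ["analyst", "sql"],
    "address": {"city": "Leeds"},
}

payload = json.dumps(profile)    # text that one API would send
restored = json.loads(payload)   # what the receiving API parses back

print(payload)
```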

▶📌 Is "Number of website visits per day" discrete or continuous? What about "Page load time in seconds"?

✅ Answer: Visits per day: Discrete. You cannot have 3.7 visits; it is a whole-number count. Page load time: Continuous. It can be 1.237 seconds, 2.5 seconds, or any decimal value within a range.

▶📌 A data warehouse stores sales data in a fact table linked to Product, Store, Date, and Customer dimension tables. What schema design is this?

✅ Answer: Star schema. One central fact table (sales transactions with numeric measures) is surrounded by dimension tables (Product, Store, Date, Customer). This is the standard design for OLAP/data warehouse analytics.

▶📌 What is the difference between strong typing and weak typing in databases?

✅ Answer: Strong typing: the system strictly enforces data types, so inserting text into a numeric column fails with an error. Weak typing: the system auto-coerces types, potentially allowing "5" (string) + 3 (integer) = 8. Most relational databases enforce column types strictly, although some (SQLite, for example) coerce values; spreadsheets often use weak typing.
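The contrast can be seen in a few lines. This sketch uses Python itself as the strict system (it refuses to add a string to an integer) and SQLite as a coercing one; the helper name `strict_add` is only for illustration.

```python
import sqlite3

def strict_add(a, b):
    """Python will not add a str and an int; return None instead of raising."""
    try:
        return a + b
    except TypeError:
        return None

strict_result = strict_add("5", 3)   # None: no implicit coercion happens
explicit_result = int("5") + 3       # 8: the conversion is stated explicitly

# SQLite converts the text '5' to a number before doing arithmetic.
conn = sqlite3.connect(":memory:")
coerced_result = conn.execute("SELECT '5' + 3").fetchone()[0]
conn.close()

print(strict_result, explicit_result, coerced_result)
```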

▶📌 A big data platform stores 10 TB of raw log files, images, video, and JSON from 50 different systems. What type of storage system fits best?

✅ Answer: A data lake. It stores raw data in native format (structured, semi-structured, and unstructured) without requiring a predefined schema. It is cost-effective for large volumes and supports schema-on-read when analysis is needed later.

▶📌 A "Customer Satisfaction Rating" field contains values: Excellent, Good, Fair, Poor. What type of data is this?

✅ Answer: Ordinal qualitative data. It is categorical (labels, not numbers) with a meaningful rank order (Excellent > Good > Fair > Poor). The gaps between categories are not necessarily equal, so arithmetic (such as calculating an average rating) is statistically inappropriate.

🧪 Practice Labs

🗂️ Classify These Data Examples

Click & Match

Drag each item to the correct data structure category (Structured, Semi-structured, or Unstructured).

✏️ Fill in the Data Type Definitions

Fill in Blank

Type the correct CompTIA data type term for each definition.

🐍 Python: Exploring Data Types

Code Lab

Run Python cells to explore data types. Click ▶ Run to execute each cell.

🃏 Flashcards — Click to Flip

1 / 10

Term

Structured Data

Definition

Data organized in a defined schema with rows and columns — typically stored in relational databases or spreadsheets. Easy to query with SQL.

❓ Quick Quiz

After each run you can review highlighted topics (below your score) and use Retake quiz for another pass.

📝 My Study Notes

Chapter 3

Databases & Data Acquisition

Covers the relational model, non-relational databases, OLTP/OLAP patterns, ETL/ELT, and core SQL operations.

🎯 Exam Objectives Covered

Obj 1.1 — Identify basic concepts of data schemas and dimensions (Relational & NoSQL; Data Mart/Warehouse/Lake; OLTP, OLAP; Star & Snowflake)
Obj 2.1 — Explain data acquisition concepts (ETL, ELT, Delta load, APIs, web scraping, surveys, sampling)
Obj 2.4 — Data manipulation & query optimization (filtering, sorting, aggregate functions, indexing, execution plans)

📖 Key Topics

Topic 1 of 8

0 of 8 topics covered

🏋️ Training Scenarios

Click any scenario to reveal the answer.

▶📌 A company runs 24/7 point-of-sale transactions across 500 stores. OLTP or OLAP?

✅ Answer: OLTP. It requires high-speed inserts and updates for individual transactions. The system must handle thousands of concurrent writes. OLAP would be used separately to analyse the accumulated transaction history.

▶📌 A data team loads daily order changes instead of reloading the entire 5-year history every night. What technique is this?

✅ Answer: Delta load (incremental load). Only records that are new or changed since the last load are processed. This is more efficient than a full-refresh load and reduces pipeline runtime and system load.
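One common way to implement a delta load is to keep a "watermark" (the latest change timestamp already loaded) and pick up only rows changed after it. The sketch below assumes each source row carries an `updated_at` field; that field name and the sample rows are hypothetical.

```python
from datetime import datetime

def delta_load(source_rows, last_loaded_at):
    """Return rows changed after the watermark, plus the new watermark."""
    changed = [r for r in source_rows if r["updated_at"] > last_loaded_at]
    new_watermark = max(
        (r["updated_at"] for r in changed), default=last_loaded_at
    )
    return changed, new_watermark

orders = [
    {"id": 1, "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "updated_at": datetime(2024, 1, 5)},
    {"id": 3, "updated_at": datetime(2024, 1, 9)},
]

# Only orders changed after 3 January are picked up on this run.
changed, watermark = delta_load(orders, datetime(2024, 1, 3))
print([r["id"] for r in changed], watermark)
```

The returned watermark would be stored and passed to the next run, so each load processes only what changed since the previous one.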

▶📌 Write the SQL structure to retrieve all customers who placed orders in the last 30 days, grouped by customer, sorted by total spend descending.

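✅ Answer: SELECT c.name, SUM(o.amount) AS total_spend FROM customers c INNER JOIN orders o ON c.id = o.customer_id WHERE o.order_date >= CURRENT_DATE - 30 GROUP BY c.id, c.name ORDER BY total_spend DESC; (Grouping by c.id keeps customers who share a name separate. Date arithmetic syntax such as CURRENT_DATE - 30 varies by SQL dialect.)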
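To try this query shape end to end, the sketch below runs it against an in-memory SQLite database from Python. The table contents are invented, the date filter uses SQLite's `date()` function instead of `CURRENT_DATE - 30`, and a fixed "as of" date is passed in so the result is repeatable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (customer_id INTEGER, order_date TEXT, amount REAL);
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO orders VALUES
        (1, '2024-03-20', 50.0),
        (1, '2024-03-25', 25.0),
        (2, '2024-03-10', 100.0),
        (2, '2024-01-01', 500.0);
""")

QUERY = """
    SELECT c.name, SUM(o.amount) AS total_spend
    FROM customers c
    INNER JOIN orders o ON c.id = o.customer_id
    WHERE o.order_date >= date(?, '-30 days')
    GROUP BY c.id, c.name
    ORDER BY total_spend DESC
"""

def recent_spend(as_of):
    """Total spend per customer over the 30 days up to as_of (YYYY-MM-DD)."""
    return conn.execute(QUERY, (as_of,)).fetchall()

result = recent_spend("2024-03-31")
print(result)
```

Ben's January order falls outside the 30-day window, so only his March order counts toward his total.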

▶📌 When would you choose ELT over ETL?

✅ Answer: Choose ELT when using a modern cloud data warehouse (Snowflake, BigQuery, Redshift) that has sufficient compute power to transform data in place. ELT preserves raw data, enables iterative transformation with tools such as dbt, and avoids maintaining a separate transformation server.

▶📌 A social network stores user profiles with variable fields. What database type fits best?

✅ Answer: A NoSQL document store (like MongoDB). It handles flexible, variable schemas naturally. Each user document can have different fields without requiring schema alteration or NULL columns for every optional field.

▶📌 What is the difference between a Data Warehouse and a Data Lake?

✅ Answer: Data Warehouse: stores structured, cleaned data with a defined schema; optimised for BI queries; schema-on-write. Data Lake: stores raw data in native format; cheap storage; schema-on-read. Warehouses serve BI; lakes serve data science and exploration.

▶📌 INNER JOIN vs LEFT JOIN — what is returned by each?

✅ Answer: INNER JOIN: returns only rows where both tables have matching values. LEFT JOIN: returns all rows from the left table plus matched rows from the right; where a left-table row has no match, the right-table columns appear as NULL.
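The difference is easiest to see on a tiny dataset. In this sketch (in-memory SQLite, invented data), the customer "Cy" has no orders, so he is dropped by the INNER JOIN but kept by the LEFT JOIN with a NULL amount, which Python shows as `None`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (customer_id INTEGER, amount INTEGER);
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO orders VALUES (1, 10), (1, 20), (2, 5);
""")

def join_rows(join_type):
    """Run the same query with either an INNER or a LEFT join."""
    sql = f"""
        SELECT c.name, o.amount
        FROM customers c
        {join_type} JOIN orders o ON c.id = o.customer_id
        ORDER BY c.id, o.amount
    """
    return conn.execute(sql).fetchall()

inner_rows = join_rows("INNER")
left_rows = join_rows("LEFT")
print(inner_rows)
print(left_rows)
```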

▶📌 A fraud detection system needs to find accounts connected through shared addresses. What database type is optimal?

✅ Answer: A graph database (Neo4j, Amazon Neptune). It stores relationships as first-class objects and can traverse networks of connections efficiently. Relational databases require complex self-joins for relationship traversal at scale.

🧪 Practice Labs

🗄️ Match Database Concepts

Click & Match

Drag each item to its correct database concept category.

⚙️ Put the ETL Steps in Order

Click & Match

Drag the ETL steps into the correct order.

🐍 Python + SQLite

Code Lab

Run Python cells to explore SQLite. Click ▶ Run to execute each cell.

🃏 Flashcards — Click to Flip

1 / 10

Term

ETL

Definition

Extract, Transform, Load — a pipeline that extracts data from sources, transforms it (clean, format, enrich), then loads it into a target data warehouse.

❓ Quick Quiz

After each run you can review highlighted topics (below your score) and use Retake quiz for another pass.

📝 My Study Notes

Chapter 4

Data Quality

Learn to identify data quality challenges and apply manipulation techniques to clean, transform, and validate data.

🎯 Exam Objectives Covered

Obj 2.2 — Identify common reasons for cleansing and profiling (duplicate, missing, invalid, outliers, spec mismatch)
Obj 2.3 — Execute data manipulation techniques (recoding, imputation, normalization, parsing, deduplication)
Obj 5.2 — Apply data quality control concepts (dimensions: accuracy, completeness, consistency, timeliness, uniqueness, validity)

📖 Key Topics

Topic 1 of 7

0 of 7 topics covered

🏋️ Training Scenarios

Click any scenario to reveal the answer.

▶📌 A dataset has 15% missing values in "Income". You want to preserve the distribution shape. Which imputation method is best?

✅ Answer: Median imputation. It is robust to outliers and preserves the distribution shape better than the mean when the data is skewed.
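A minimal sketch of median imputation, assuming missing values are represented as `None`. The income figures are invented, with one very high earner to show why the median is a safer fill value than the mean on skewed data.

```python
from statistics import mean, median

def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

incomes = [30_000, 35_000, None, 40_000, 500_000]
observed = [v for v in incomes if v is not None]

imputed = impute_median(incomes)
print("median fill:", median(observed))   # close to a typical income
print("mean fill:  ", mean(observed))     # dragged upward by the outlier
```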

▶📌 Customer IDs appear multiple times with slightly different name spellings. What quality issue is this?

✅ Answer: Duplicate/redundant data combined with inconsistent data (data entry variation). Deduplication with fuzzy matching is needed.
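Fuzzy matching can be sketched with the standard library's `difflib`. The example below flags records that share a customer ID and have similar names; the 0.85 threshold and the sample records are assumptions chosen for illustration, not recommended values.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Similarity ratio between 0 and 1, ignoring case and outer spaces."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def likely_duplicates(records, threshold=0.85):
    """Return name pairs that share an ID and look like spelling variants."""
    pairs = []
    for i, (id_a, name_a) in enumerate(records):
        for id_b, name_b in records[i + 1:]:
            if id_a == id_b and similarity(name_a, name_b) >= threshold:
                pairs.append((name_a, name_b))
    return pairs

records = [(101, "Jon Smith"), (101, "John Smith"), (102, "Mary Jones")]
duplicates = likely_duplicates(records)
print(duplicates)
```

In practice a reviewer would still confirm flagged pairs before merging them, since similar names do not always mean the same person.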

▶📌 You need to combine FirstName and LastName into a single FullName field. Which technique?

✅ Answer: Concatenation, which joins two string fields into one.

▶📌 A "Salary" field contains the value "N/A". What type of data quality issue is this?

✅ Answer: Invalid data / data type validation failure: text in a numeric field.

▶📌 What are the six dimensions of data quality?

✅ Answer: Accuracy, Completeness, Consistency, Timeliness, Uniqueness, Validity.

▶📌 You want to scale the "Revenue" column to a 0-1 range for use in a machine learning model. Which technique?

✅ Answer: Min-max normalization: (value - min) / (max - min). This ensures no variable dominates due to scale differences.
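The formula translates directly into code. This is a minimal sketch with invented revenue figures; returning zeros when every value is identical is a choice made here to avoid dividing by zero.

```python
def min_max_normalize(values):
    """Scale values to the 0-1 range using (value - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: the formula would divide by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

revenue = [200, 500, 800, 1100]
scaled = min_max_normalize(revenue)
print(scaled)
```

The smallest value maps to 0 and the largest to 1, with everything else spaced proportionally in between.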

▶📌 A "Gender" field contains: M, Male, male, 1, F, Female, female, 0. What technique fixes this?

✅ Answer: Recoding (standardization), which maps all variants to a single consistent value set (e.g., M and F).
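Recoding is usually a lookup from each observed variant to its standard value. In the sketch below, treating 1 as male and 0 as female is an assumption; the real meaning of numeric codes should come from the data dictionary.

```python
# Assumed mapping: confirm what 1 and 0 mean before using it on real data.
GENDER_MAP = {
    "m": "M", "male": "M", "1": "M",
    "f": "F", "female": "F", "0": "F",
}

def recode_gender(value):
    """Map a raw variant to 'M' or 'F'; unknown values become None."""
    return GENDER_MAP.get(str(value).strip().lower())

raw = ["M", "Male", "male", 1, "F", "Female", "female", 0]
recoded = [recode_gender(v) for v in raw]
print(recoded)
```

Returning `None` for unrecognised values keeps unexpected entries visible for review instead of silently guessing.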

▶📌 What is the difference between data profiling and data auditing?

✅ Answer: Profiling is exploratory: it generates column statistics to discover quality issues. Auditing is systematic and scheduled: it checks data against documented business rules and produces formal reports.

🧪 Practice Labs

✏️ Data Quality Dimensions

Fill in Blank

Type the correct data quality term for each definition.

🧹 Match Data Issues to Fixes

Click & Match

Drag each data issue to its correct fix or solution.

🐍 Python: Data Cleaning Pipeline

Code Lab

Run Python cells to practice data cleaning. Click ▶ Run to execute each cell.

🃏 Flashcards — Click to Flip

1 / 10

Term

Imputation

Definition

The process of replacing missing data values with substitute estimates, such as the mean, median, mode, or values derived from regression models.

❓ Quick Quiz

After each run you can review highlighted topics (below your score) and use Retake quiz for another pass.

📝 My Study Notes

Chapter 5

Data Analysis & Statistics

Master descriptive and inferential statistics — the mathematical foundation of data analysis and decision-making.

🎯 Exam Objectives Covered

Obj 3.1: Given a scenario, apply the appropriate descriptive statistical methods (measures of central tendency, dispersion, distribution)
Obj 3.2: Explain the purpose of inferential statistical methods (t-tests, Z-score, p-values, hypothesis testing, regression, correlation)
Obj 3.3: Summarize types of analysis and key analysis techniques (trend, performance, link, exploratory)

📖 Key Topics

Topic 1 of 8

0 of 8 topics covered

🏋️ Training Scenarios

Click any scenario to reveal the answer.

▶📌 A dataset of house prices has mean=$450K but median=$320K. What does this tell you?

✅ Answer: The distribution is right-skewed (positively skewed). A few very expensive houses pull the mean up significantly above the median. Use the median for a typical price.
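A small invented dataset reproduces the same pattern: four ordinary houses and one very expensive one, with prices in thousands of dollars.

```python
from statistics import mean, median

# Hypothetical prices in $K; the last house is the expensive outlier.
prices = [250, 300, 320, 340, 1040]

avg = mean(prices)
mid = median(prices)
print(f"mean = ${avg}K, median = ${mid}K")
```

A single outlier is enough to pull the mean well above what most houses in the list actually cost, while the median stays with the typical values.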

▶📌 You conduct an A/B test and get p-value = 0.03 with α=0.05. What do you conclude?

✅ Answer: Reject the null hypothesis. The p-value (0.03) is less than the significance level (0.05), so the observed difference is statistically significant: a difference this large would be unlikely if only chance were at work.

▶📌 What does a Z-score of -2.5 mean?

✅ Answer: The value is 2.5 standard deviations below the mean. It is an unusual observation. Values with |Z| > 3 are typically flagged as outliers.
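The Z-score is (value - mean) / standard deviation. This sketch uses the population standard deviation and an invented dataset whose mean is 5 and standard deviation is 2, so the arithmetic is easy to follow.

```python
from statistics import mean, pstdev

def z_score(x, data):
    """Number of population standard deviations x lies from the mean."""
    return (x - mean(data)) / pstdev(data)

data = [2, 4, 4, 4, 5, 5, 7, 9]   # mean 5, population std dev 2

z = z_score(0, data)
print(z)   # negative, so the value lies below the mean
```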

▶📌 Correlation coefficient r = 0.92 between advertising spend and revenue. Can you conclude advertising causes revenue increases?

✅ Answer: No. Correlation does not imply causation. A confounding variable (e.g., seasonality) may drive both. A randomised controlled experiment is the most reliable way to establish causation.

▶📌 What is the difference between Type I and Type II errors?

✅ Answer: Type I (false positive): rejecting a true null hypothesis, i.e. detecting an effect that doesn't exist. Type II (false negative): failing to reject a false null, i.e. missing a real effect. α controls the Type I error rate.

▶📌 You want to survey 1,000 customers from 5 regions ensuring proportional representation from each. Which sampling method?

✅ Answer: Stratified sampling. Divide the population into strata (regions) and sample proportionally from each. This ensures all groups are represented.
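Proportional stratified sampling can be sketched as: group the population by stratum, then draw from each group in proportion to its size. The region names, population sizes, and fixed random seed below are all hypothetical.

```python
import random
from collections import Counter

def stratified_sample(population, key, total, seed=0):
    """Draw about `total` items, allocated proportionally across strata."""
    rng = random.Random(seed)
    strata = {}
    for item in population:
        strata.setdefault(item[key], []).append(item)
    sample = []
    for members in strata.values():
        # Rounding can make the overall size differ slightly from `total`.
        k = round(total * len(members) / len(population))
        sample.extend(rng.sample(members, k))
    return sample

sizes = {"North": 4000, "South": 3000, "East": 1500, "West": 1000, "Central": 500}
customers = [
    {"id": f"{region}-{i}", "region": region}
    for region, n in sizes.items()
    for i in range(n)
]

sample = stratified_sample(customers, "region", 1000)
counts = Counter(c["region"] for c in sample)
print(counts)
```

Each region contributes 10% of its customers, so the sample mirrors the regional mix of the full population.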

▶📌 A company tracks monthly revenue for 3 years. What analysis technique identifies a long-term upward trend vs. seasonal fluctuations?

✅ Answer: Time series decomposition, which separates the series into Trend, Seasonality, Cyclical, and Residual components. Moving averages smooth out seasonal and irregular components to reveal the trend.
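The smoothing effect of a moving average shows up even on a toy series. The invented data below is a steady upward trend plus a repeating pattern of +2, 0, -2; averaging over a window equal to the pattern length (3 here, where monthly data with yearly seasonality would use 12) cancels the pattern out.

```python
def moving_average(series, window):
    """Trailing moving average; the first window - 1 points have no value."""
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]

# Trend 10, 11, 12, 13, 14, 15 with seasonal offsets +2, 0, -2 repeating.
revenue = [12, 11, 10, 15, 14, 13]

trend = moving_average(revenue, window=3)
print(trend)
```

The raw series zigzags, but the averaged series rises by one unit per step, which is the underlying trend.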

▶📌 R² = 0.85 for a regression model predicting sales from advertising. What does this mean?

✅ Answer: The model explains 85% of the variance in sales; the remaining 15% is unexplained by the model. R² = 1 would be a perfect fit; R² = 0 means the model explains nothing.
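R² can be computed from scratch as 1 minus the ratio of residual variation to total variation. The sketch below fits a simple least-squares line and scores it; the data points are invented and chosen so the numbers work out cleanly.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for simple linear regression."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(xs, ys):
    """Share of the variance in ys explained by the fitted line."""
    slope, intercept = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

ad_spend = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]

r2 = r_squared(ad_spend, sales)
print(round(r2, 2))
```

Here the line explains 60% of the variation in the invented sales figures; a value of 0.85, as in the scenario, would indicate a tighter fit.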

🧪 Practice Labs

✏️ Statistics Formulas & Concepts

Fill in Blank

Type the correct statistics term for each definition or formula.

🐍 Python: Statistics from Scratch

Code Lab

Run Python cells to compute statistics from scratch. Click ▶ Run to execute each cell.

🃏 Flashcards — Click to Flip

1 / 10

Term

Null Hypothesis (H₀)

Definition

A statement assuming no effect, no difference, or no relationship between variables — the default position that must be disproven through evidence.

❓ Quick Quiz

After each run you can review highlighted topics (below your score) and use Retake quiz for another pass.

📝 My Study Notes

Chapter 6

Data Analytics Tools

Survey the landscape of tools — from spreadsheets and programming languages to BI suites and ML platforms.

🎯 Exam Objectives Covered

Obj 3.4: Identify common data analytics tools (SQL, Python, R, Microsoft Excel, Tableau, Power BI, SAS, and others)

📖 Key Topics

Topic 1 of 7

0 of 7 topics covered

🏋️ Training Scenarios

Click any scenario to reveal the answer.

▶📌 A business user wants to explore sales data by region, product, and time period without asking IT each time. What tool category fits best?

✅ Answer: A self-service BI platform (Tableau, Power BI). It provides interactive filters, drill-down, and slicers that let users explore data independently without technical skills.

▶📌 You need to build a machine learning model to predict customer churn, handling feature engineering, model training, and evaluation. Which tool is most appropriate?

✅ Answer: Python with pandas and scikit-learn. It provides full programmatic control for complex ML pipelines, feature engineering, cross-validation, and model evaluation.

▶📌 A pharmaceutical company needs to run clinical trial statistical analysis that regulators will accept. Which tool?

✅ Answer: SAS. It is long established in regulated industries, is widely used for clinical trial analyses submitted to regulators such as the FDA, and offers logging and audit-trail features that support regulatory compliance.

▶📌 A data analyst needs to query a Snowflake data warehouse directly. What interface do they use?

✅ Answer: SQL, either via Snowflake's web UI, a SQL client (such as DBeaver), or a BI tool that connects directly to Snowflake using a JDBC/ODBC connector.

▶📌 What is the difference between Python and R for data analysis?

✅ Answer: Python: general-purpose programming language with excellent data analysis libraries (pandas, scikit-learn); better for ML, automation, web scraping, and production deployment. R: statistics-first language with extensive academic statistical packages; better for advanced statistics, academic research, and publication-quality charts.

▶📌 When should you use Excel vs. Python for data analysis?

✅ Answer: Excel: best for small-to-medium datasets (<1M rows), ad hoc analysis, sharing with non-technical users, and quick calculations. Python: best for large datasets, reproducible pipelines, automation, machine learning, and processing data that changes regularly.

▶📌 A company wants to build cloud analytics that automatically scales with query demand and charges only per query. Which service fits?

✅ Answer: A serverless query service such as Amazon Athena (queries data in S3) or Google BigQuery with on-demand pricing. These charge by the amount of data scanned and require no infrastructure management.

🧪 Practice Labs

🛠️ Match Tools to Their Use Case

Click & Match

Drag each analytics tool to its correct use case.

🐍 Python Analytics Pipeline

Code Lab

Run Python cells to build an analytics pipeline. Click ▶ Run to execute each cell.

🃏 Flashcards — Click to Flip

1 / 10

Term

tidyverse

Definition

A collection of R packages (including ggplot2, dplyr, tidyr) designed to facilitate data manipulation and visualization using consistent, readable syntax.

❓ Quick Quiz

After each run you can review highlighted topics (below your score) and use Retake quiz for another pass.

📝 My Study Notes

Chapter 7

Data Visualization with Reports & Dashboards

Learn how to translate business requirements into compelling visualizations, reports, and interactive dashboards.

🎯 Exam Objectives Covered

Obj 4.1 — Report types, layouts, delivery (static, interactive, ad hoc, recurring, pixel-perfect)
Obj 4.2 — Design components: appropriate chart types, KPIs, conditional formatting, reference lines
Obj 4.3 — Dashboard workflow: wireframe, mock-up, prototype, stakeholder review, data story

📖 Key Topics


🏋️ Training Scenarios

Click any scenario to reveal the answer.

▶📌 Monthly website traffic over the past 2 years. Which chart type?

✅ Answer: Line chart. It shows a continuous trend over time with many data points. Each point connects to the next, making rate of change and trend direction immediately visible.

▶📌 Revenue breakdown by product category (6 categories). Which chart?

✅ Answer: Horizontal bar chart, because the task is comparing discrete categories. Bars are easier to compare than pie slices, especially with 6 or more categories. Sort by value descending for maximum clarity.

▶📌 Show how gross revenue becomes net profit through a series of additions and subtractions (returns, COGS, operating expenses). Which chart?

✅ Answer: Waterfall chart. Each bar "floats" at the level where the previous value ended, making the contribution of each factor intuitively clear.
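
The "floating" bars can be computed before any plotting happens. The sketch below is a minimal illustration with made-up figures; the function name `waterfall_bars` and the amounts are assumptions for the example.

```python
def waterfall_bars(start_label, start_value, steps, end_label="Net profit"):
    """Return (label, bottom, top) for each bar of a waterfall chart.

    The first and last bars rest on zero; each step bar starts where the
    running total stood after the previous step.
    """
    bars = [(start_label, 0, start_value)]
    running = start_value
    for label, delta in steps:
        bars.append((label, running, running + delta))
        running += delta
    bars.append((end_label, 0, running))
    return bars

# Hypothetical figures for illustration only.
bars = waterfall_bars(
    "Gross revenue", 1000,
    [("Returns", -50), ("COGS", -400), ("Operating expenses", -300)],
)
for label, bottom, top in bars:
    print(f"{label:20s} from {bottom:>5} to {top:>5}")
```

Each step bar spans from the previous running total to the new one, which is what lets a reader see how much every factor adds or removes.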

▶📌 Compare the distribution of exam scores across three different training cohorts. Which chart?

✅ Answer: Box plot. It shows the minimum, Q1, median, Q3, and maximum for each group side by side, enabling direct comparison of the distributions and highlighting outliers.
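
The five values a box plot draws can be computed with the standard `statistics` module. The cohort scores below are invented, and the `inclusive` quartile method is one of several conventions; other tools may report slightly different Q1 and Q3 values for the same data.

```python
import statistics

def five_number_summary(scores):
    """Return the min, Q1, median, Q3, and max that a box plot is built from."""
    q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return {"min": min(scores), "q1": q1, "median": median, "q3": q3, "max": max(scores)}

# Hypothetical exam scores for three training cohorts.
cohorts = {
    "Cohort A": [55, 60, 65, 70, 75],
    "Cohort B": [40, 70, 72, 74, 95],
    "Cohort C": [68, 69, 70, 71, 72],
}
for name, scores in cohorts.items():
    print(name, five_number_summary(scores))
```

Placing the summaries next to each other shows what the chart shows: cohorts A and C share a median of 70 or close to it, but C is far more tightly clustered, while B has the widest range.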

▶📌 You have 20 KPIs to display for an executive. A user asks you to put them all on one dashboard. What should you advise?

✅ Answer: Recommend limiting each dashboard view to the 5 to 7 most critical KPIs. Too many KPIs create cognitive overload. Use drill-down to secondary dashboards for additional metrics. Prioritise the metrics most directly linked to executive decisions.

▶📌 A dashboard filter lets users select time period, region, and product category simultaneously. What is this feature called?

✅ Answer: A slicer (or filter panel). These are interactive controls that let users dynamically subset the data displayed in all connected charts on the dashboard simultaneously.
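
Conceptually, a slicer applies one shared filter to the data and every connected visual is recomputed from the filtered rows. The sketch below illustrates that idea only; the rows, field names, and `apply_slicers` function are assumptions, not how any particular BI tool implements it.

```python
# Hypothetical rows standing in for the dataset behind a dashboard.
SALES = [
    {"period": "2024-Q1", "region": "EMEA", "category": "Hardware", "revenue": 120},
    {"period": "2024-Q1", "region": "APAC", "category": "Hardware", "revenue": 90},
    {"period": "2024-Q2", "region": "EMEA", "category": "Software", "revenue": 200},
    {"period": "2024-Q2", "region": "EMEA", "category": "Hardware", "revenue": 150},
]

def apply_slicers(rows, period=None, region=None, category=None):
    """Keep rows that match every active selection; None means 'show all'."""
    selections = {"period": period, "region": region, "category": category}
    active = {field: value for field, value in selections.items() if value is not None}
    return [row for row in rows if all(row[f] == v for f, v in active.items())]

filtered = apply_slicers(SALES, period="2024-Q2", region="EMEA")

# Two "connected visuals" recomputed from the same filtered rows.
revenue_total = sum(row["revenue"] for row in filtered)   # KPI card
by_category = {}
for row in filtered:                                       # bar chart data
    by_category[row["category"]] = by_category.get(row["category"], 0) + row["revenue"]

print(revenue_total, by_category)
```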

▶📌 Relationship between 50 products' advertising spend (X axis) and revenue (Y axis). Which chart?

✅ Answer: Scatter plot. Each product is one point at coordinates (ad spend, revenue). A trend line can be added to show the direction and strength of correlation.
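
The direction and strength that a trend line suggests visually can be quantified with the Pearson correlation coefficient. This is a from-scratch sketch with invented numbers for five products rather than fifty; `pearson_r` is a name chosen for the example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: +1 is a perfect upward line, -1 a perfect downward line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)

# Hypothetical ad spend and revenue for five products.
ad_spend = [10, 20, 30, 40, 50]
revenue = [25, 41, 58, 83, 95]
print(round(pearson_r(ad_spend, revenue), 3))
```

A value near +1 would correspond to points hugging an upward-sloping trend line; a value near 0 would correspond to a shapeless cloud.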

▶📌 What is the difference between a static report, an interactive report, and an ad hoc report?

✅ Answer: A static report is a fixed PDF or printout with no interaction. An interactive report is a live dashboard with filters and drill-down (Tableau, Power BI). An ad hoc report is a custom one-time report built by an analyst to answer a specific question not covered by existing reports.

🧪 Practice Labs

📊 Pick the Right Chart

Click & Match

Drag each scenario to the chart type that best represents it.

✏️ Dashboard & Report Vocabulary

Fill in Blank

Type the correct visualization or reporting term for each definition.

🐍 Python: Chart Logic

Code Lab

Run Python cells to practice chart selection logic. Click ▶ Run to execute each cell.
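
The chart choices from this chapter's training scenarios can be summarised as a small lookup. This is an illustrative sketch and not the lab's actual cells; the goal keys and the `suggest_chart` function are names invented for the example.

```python
# Hypothetical goal names mapped to the chart types from the training scenarios.
CHART_RULES = {
    "trend_over_time": "line chart",
    "compare_categories": "horizontal bar chart",
    "cumulative_contributions": "waterfall chart",
    "compare_distributions": "box plot",
    "relationship_between_two_measures": "scatter plot",
}

def suggest_chart(goal: str) -> str:
    """Return the chart type for an analytic goal, or a fallback if no rule exists."""
    return CHART_RULES.get(goal, "no rule defined")

print(suggest_chart("trend_over_time"))
print(suggest_chart("compare_distributions"))
```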

🃏 Flashcards — Click to Flip

1 / 10

Term

Waterfall Chart

Definition

A visualization that shows the cumulative effect of sequentially introduced positive or negative values — ideal for financial analysis (e.g., profit/loss breakdown).

❓ Quick Quiz

After each run you can review highlighted topics (below your score) and use Retake quiz for another pass.

📝 My Study Notes

Chapter 8

Data Governance

Understand the policies, roles, and frameworks that ensure data is secure, compliant, and used appropriately.

🎯 Exam Objectives Covered

Obj 5.1 — Summarise important data governance concepts (roles, data classification, policies, master data management)
Obj 5.2 — Apply data quality control concepts (validation, profiling, auditing, data lineage)
Obj 5.3 — Explain master data management (MDM) concepts (golden record, entities, workflows)

📖 Key Topics


🏋️ Training Scenarios

Click any scenario to reveal the answer.

▶📌 A business executive is accountable for a customer dataset — approving who can access it and setting the retention policy. What is their governance role?

✅ Answer: Data Owner. They are accountable for the data's use, access approvals, and lifecycle policies. They don't manage data day-to-day but are ultimately responsible.

▶📌 A database administrator manages backups, storage systems, and security controls for a dataset. What is their role?

✅ Answer: Data Custodian. They are responsible for the physical infrastructure (storage, backups, security) and are not accountable for data content or quality.

▶📌 Your organisation must notify customers within a specific timeframe after a personal data breach. Which regulation mandates this?

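✅ Answer: GDPR applies in the EU: notify the supervisory authority within 72 hours of becoming aware of the breach, and notify affected individuals without undue delay when the breach poses a high risk to them. HIPAA applies to US healthcare breaches: notify affected individuals without unreasonable delay and no later than 60 days after discovery.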
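
The two time limits reduce to simple date arithmetic. The sketch below is a simplification for study purposes: it treats both windows as fixed offsets from one moment of awareness and ignores the conditions and exceptions in the actual regulations. The function and key names are assumptions.

```python
from datetime import datetime, timedelta, timezone

GDPR_AUTHORITY_WINDOW = timedelta(hours=72)   # notify the supervisory authority
HIPAA_INDIVIDUAL_WINDOW = timedelta(days=60)  # outer limit for notifying individuals

def notification_deadlines(became_aware: datetime) -> dict[str, datetime]:
    """Latest notification times, counted from when the breach was discovered."""
    return {
        "gdpr_supervisory_authority": became_aware + GDPR_AUTHORITY_WINDOW,
        "hipaa_individuals_latest": became_aware + HIPAA_INDIVIDUAL_WINDOW,
    }

# Hypothetical discovery time.
aware = datetime(2024, 3, 1, 9, 0, tzinfo=timezone.utc)
for rule, deadline in notification_deadlines(aware).items():
    print(rule, deadline.isoformat())
```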

▶📌 A developer needs to test an application using realistic customer data without exposing real PII. What technique should be used?

✅ Answer: Data masking. Replace real PII (names, emails, SSNs) with fictitious but realistic values. The data structure and format are preserved but real values are protected.
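
A minimal masking sketch follows, assuming a customer record with `name`, `email`, and `ssn` fields (hypothetical field names). It keeps the record shape and the SSN format while replacing the values; a production tool would offer more realistic substitutes and stronger guarantees.

```python
import random

def mask_ssn(ssn: str, rng: random.Random) -> str:
    """Replace every digit with a random digit, keeping dashes so the format survives."""
    return "".join(str(rng.randint(0, 9)) if ch.isdigit() else ch for ch in ssn)

def mask_customer(record: dict, row_number: int, rng: random.Random) -> dict:
    """Return a copy with PII fields replaced; non-PII fields pass through unchanged."""
    return {
        **record,
        "name": f"Test User {row_number}",
        "email": f"user{row_number}@example.com",
        "ssn": mask_ssn(record["ssn"], rng),
    }

# Hypothetical source record; the values are invented.
original = {"name": "Jane Roe", "email": "jane.roe@corp.example", "ssn": "123-45-6789", "plan": "gold"}
masked = mask_customer(original, 1, random.Random(42))  # seeded so test data is repeatable
print(masked)
```

The application under test still receives a name, a well-formed email address, and an SSN-shaped string, so validation logic behaves as it would with real data.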

▶📌 A company stores customer records in 5 different systems, all with slightly different versions of the same customer. What governance practice creates a single authoritative record?

✅ Answer: Master Data Management (MDM). This is the process of matching, merging, and creating a "golden record" that becomes the single source of truth for each customer entity across all systems.
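
The merge step can be sketched with one simple survivorship rule: for each field, keep the non-empty value from the most recently updated record. This rule, the field names, and the sample records are assumptions for illustration; real MDM tools also handle the matching step and support many other rules.

```python
def build_golden_record(records):
    """Merge records already matched to one customer into a single golden record.

    Survivorship rule (an assumption for this sketch): newest non-empty value wins.
    """
    newest_first = sorted(records, key=lambda r: r["updated"], reverse=True)
    golden = {}
    for record in newest_first:
        for field, value in record.items():
            if field in ("source", "updated"):
                continue  # bookkeeping fields are not part of the golden record
            if value and field not in golden:
                golden[field] = value
    return golden

# Hypothetical versions of the same customer held in three systems.
versions = [
    {"source": "crm", "updated": "2024-05-01", "name": "Ana Diaz", "email": "ana@example.com", "phone": ""},
    {"source": "billing", "updated": "2023-11-12", "name": "Ana M. Diaz", "email": "", "phone": "555-0100"},
    {"source": "support", "updated": "2022-02-03", "name": "A. Diaz", "email": "a.diaz@example.com", "phone": "555-0199"},
]
print(build_golden_record(versions))
```

Here the name and email come from the newest CRM record, while the phone number falls back to the billing system because the CRM value is empty.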

▶📌 "Financial reporting data must be retained for 7 years then securely disposed of." What governance policy area is this?

✅ Answer: Data retention and disposal policy. It defines how long each data type must be kept (driven by legal/regulatory requirements) and how it must be destroyed at the end of its lifecycle.
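
A retention rule like the 7-year example can be expressed as a date check. The sketch below is illustrative; the `RETENTION_YEARS` table, the data type key, and the function names are assumptions.

```python
from datetime import date

# Hypothetical retention schedule: data type -> years to keep.
RETENTION_YEARS = {"financial_report": 7}

def retention_expiry(created: date, data_type: str) -> date:
    """Date on which the retention period ends for a record of this type."""
    target_year = created.year + RETENTION_YEARS[data_type]
    try:
        return created.replace(year=target_year)
    except ValueError:
        # Created on 29 February and the target year is not a leap year.
        return created.replace(year=target_year, day=28)

def disposal_due(created: date, data_type: str, today: date) -> bool:
    """True once the record has been kept for its full retention period."""
    return today >= retention_expiry(created, data_type)

print(retention_expiry(date(2017, 1, 15), "financial_report"))
print(disposal_due(date(2017, 1, 15), "financial_report", today=date(2024, 1, 14)))
```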

▶📌 A financial analyst exports a spreadsheet of customer PII to their personal email to work from home. A governance tool detects and blocks this. What technology is this?

✅ Answer: Data Loss Prevention (DLP). It monitors data flows and blocks or alerts on potential exfiltration of sensitive data to unauthorised destinations.
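
The core DLP decision (sensitive content plus an unapproved destination) can be sketched with a pattern check. This is a toy illustration: the approved domain, the single SSN pattern, and the `dlp_decision` function are assumptions, and real DLP products use far richer detection.

```python
import re

SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")   # US SSN-shaped strings only
APPROVED_DOMAINS = {"corp.example.com"}               # hypothetical internal domain

def dlp_decision(recipient: str, body: str) -> str:
    """Return 'block' when PII-like content is headed to an unapproved domain."""
    domain = recipient.rsplit("@", 1)[-1].lower()
    contains_pii = bool(SSN_PATTERN.search(body))
    if contains_pii and domain not in APPROVED_DOMAINS:
        return "block"
    return "allow"

print(dlp_decision("analyst@gmail.example", "Customer SSN: 123-45-6789"))
print(dlp_decision("analyst@corp.example.com", "Customer SSN: 123-45-6789"))
```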

▶📌 What is the difference between data anonymisation and pseudonymisation under GDPR?

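✅ Answer: Anonymisation irreversibly removes all identifying information, so the result is no longer personal data and GDPR no longer applies. Pseudonymisation replaces identifiers with pseudonyms (for example tokens or keyed hashes) that can be linked back to the individual using additional information, such as a key or lookup table, that is kept separately; the result is still personal data, so GDPR still applies.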
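
The difference can be seen side by side in a short sketch. It assumes a keyed hash (HMAC) as the pseudonym and a separately held lookup table for re-identification; the key, field names, and records are invented, and the "anonymised" output is a simplified aggregate rather than a guarantee of true anonymity.

```python
import hashlib
import hmac

SECRET_KEY = b"demo-only-key"  # hypothetical; a real key would be stored separately and securely

def pseudonymise(customer_id: str, key: bytes = SECRET_KEY) -> str:
    """Replace an identifier with a keyed hash; the same input always gives the same pseudonym."""
    return hmac.new(key, customer_id.encode("utf-8"), hashlib.sha256).hexdigest()[:16]

records = [
    {"customer_id": "C-1001", "country": "DE", "spend": 120},
    {"customer_id": "C-1002", "country": "DE", "spend": 80},
]

# Pseudonymised: identifier replaced, but a separately held table can link it back.
reidentification_table = {pseudonymise(r["customer_id"]): r["customer_id"] for r in records}
pseudonymised = [{**r, "customer_id": pseudonymise(r["customer_id"])} for r in records]

# Anonymised (simplified): identifiers dropped and rows aggregated; no link back exists.
anonymised = {"country": "DE", "customers": len(records), "total_spend": sum(r["spend"] for r in records)}

print(pseudonymised[0], reidentification_table[pseudonymised[0]["customer_id"]])
print(anonymised)
```

Whoever holds the re-identification table can recover the original customer from a pseudonymised row, which is why that data remains personal data; the aggregate carries no such link.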

🧪 Practice Labs

🔒 Governance Roles & Responsibilities

Click & Match

Drag each governance role to its correct responsibility.

✏️ Data Governance Vocabulary

Fill in Blank

Type the correct data governance term for each definition.

🐍 Python: Governance in Practice

Code Lab

Run Python cells to explore governance concepts in code. Click ▶ Run to execute each cell.

🃏 Flashcards — Click to Flip

1 / 10

Term

Data Steward

Definition

The role responsible for leading an organization's data governance activities, ensuring data quality, security, privacy, and regulatory compliance across the enterprise.

❓ Quick Quiz

After each run you can review highlighted topics (below your score) and use Retake quiz for another pass.
