Handling Missing Data in SPSS Methods

The document outlines various methods for handling missing data in SPSS, including Listwise Deletion, Pairwise Deletion, the EM Algorithm, and Regression Imputation, each with its pros and cons. It emphasizes the importance of choosing the right imputation method based on the type of variable, the mechanism of missingness, and the analytical goals. The recommended approach for complex datasets is Multiple Imputation by Chained Equations (MICE), which preserves statistical inference and supports valid group comparisons.


Missing values

In SPSS, there are several methods to handle missing data, each suitable for different analytical
goals. Here's a concise explanation of the key methods: Listwise Deletion, Pairwise Deletion,
EM (Expectation-Maximization), and Regression Imputation.

🔹 1. Listwise Deletion (Complete Case Analysis)

Description: SPSS excludes any case (row) that has a missing value in any variable used in the
analysis.

Use in SPSS:

 Default in many procedures (e.g., Analyze > Correlate > Bivariate)
 Option: “Exclude cases listwise”

Pros:

 Simple and easy to implement
 Ensures every analysis uses the same set of cases

Cons:

 Can lead to substantial data loss
 Introduces bias if data are not MCAR (Missing Completely At Random)
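SPSS applies this rule automatically when listwise exclusion is selected. As a minimal illustration of the rule itself, here is a sketch in plain Python (the function name and the data are invented for the example; `None` stands in for a missing value):

```python
def listwise_delete(cases):
    """Keep only the cases with no missing value in any variable."""
    return [row for row in cases if None not in row]

# Four cases measured on two variables; None marks a missing value.
data = [
    (170, 68),
    (165, None),  # one value missing -> whole case dropped
    (None, 72),   # one value missing -> whole case dropped
    (180, 80),
]
complete = listwise_delete(data)  # only 2 of 4 cases survive
```

Note how half the sample disappears even though only a quarter of the individual values were missing — this is the data-loss risk in miniature.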

🔹 2. Pairwise Deletion

Description: Uses all available data for each pair of variables. If a value is missing for one pair,
it is excluded only from that pairwise calculation.

Use in SPSS:

 In correlation or covariance analysis
 Option: “Exclude cases pairwise”

Pros:

 Uses more data than listwise deletion
 Better for exploratory analysis

Cons:
 Results in inconsistent sample sizes across comparisons
 May produce non-positive definite correlation matrices
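The "inconsistent N" problem is easy to see in a small sketch. The following Python function (names and data invented for the example) computes a Pearson correlation after pairwise deletion and returns the sample size actually used:

```python
from statistics import mean

def pairwise_corr(xs, ys):
    """Pearson r using only the cases where both variables are
    present (pairwise deletion); returns (n_used, r)."""
    pairs = [(x, y) for x, y in zip(xs, ys)
             if x is not None and y is not None]
    n = len(pairs)
    mx = mean(x for x, _ in pairs)
    my = mean(y for _, y in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return n, sxy / (sxx * syy) ** 0.5

a = [1, 2, 3, 4, 5]
b = [2, 4, 6, None, 10]
c = [1, None, 3, 4, 5]
n_ab, r_ab = pairwise_corr(a, b)  # uses 4 cases
n_bc, r_bc = pairwise_corr(b, c)  # uses only 3 cases
```

Because each entry of the correlation matrix can rest on a different subset of cases, the matrix as a whole may not be internally consistent — which is exactly how non-positive definite matrices arise.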

🔹 3. EM (Expectation-Maximization) Algorithm

Description: An iterative method that estimates means, covariances, and regression parameters
assuming a multivariate normal distribution.

Use in SPSS:

 Analyze > Missing Value Analysis > EM algorithm

Pros:

 Produces unbiased estimates under MAR (Missing At Random)
 Preserves correlations among variables
 Good for descriptive statistics and imputation before modeling

Cons:

 Does not generate multiple datasets, so imputation uncertainty is not fully captured
 Not suitable on its own for inferential analysis
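To make the iterate-until-convergence idea concrete, here is a deliberately simplified Python sketch for one fully observed predictor and one target with missing values. It alternates an E-step (fill each missing value with its conditional mean given the predictor) and an M-step (re-estimate the regression from the completed data). All names and data are invented for the example, and a full EM implementation would also carry the conditional variance into the M-step; that correction is omitted here for brevity:

```python
from statistics import mean

def em_impute(x, y, iters=20):
    """Toy EM sketch: x fully observed, y may contain None.
    E-step: replace each missing y with its conditional mean given x.
    M-step: re-fit the regression y = a + b*x on the completed data.
    Repeat until the imputed values stabilize."""
    start = mean(v for v in y if v is not None)
    y_hat = [yi if yi is not None else start for yi in y]
    for _ in range(iters):
        mx, my = mean(x), mean(y_hat)
        slope = sum((xi - mx) * (yi - my)
                    for xi, yi in zip(x, y_hat)) / \
                sum((xi - mx) ** 2 for xi in x)
        intercept = my - slope * mx
        # E-step for the next round: refill only the missing entries.
        y_hat = [yi if yi is not None else intercept + slope * xi
                 for xi, yi in zip(x, y)]
    return y_hat

filled = em_impute([1, 2, 3, 4], [2, 4, None, 8])
# the missing value converges to 6, the conditional mean at x = 3
```

Observed values are never altered; only the missing entries are updated each round.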

🔹 4. Regression Imputation

Description: Predicts missing values using regression equations built from other variables in the
dataset.

Use in SPSS:

 Analyze > Missing Value Analysis > Regression (estimation method); note that Transform > Replace Missing Values only offers simpler series-based fills such as series mean or linear trend at point

Pros:

 Easy to apply
 Maintains relationships between variables

Cons:

 Can underestimate variance (imputed values lie on the regression line)
 Can overfit if predictors are highly collinear
 Not recommended for final statistical inference
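The variance problem is visible in a few lines of Python. In this sketch (function name and data invented for the example), the regression is fitted on complete cases only and every missing value is replaced by its prediction:

```python
from statistics import mean

def regression_impute(x, y):
    """Fit y = a + b*x on the complete cases only, then replace
    each missing y (None) with its predicted value."""
    obs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    mx = mean(xi for xi, _ in obs)
    my = mean(yi for _, yi in obs)
    b = sum((xi - mx) * (yi - my) for xi, yi in obs) / \
        sum((xi - mx) ** 2 for xi, _ in obs)
    a = my - b * mx
    return [yi if yi is not None else a + b * xi
            for xi, yi in zip(x, y)]

filled = regression_impute([1, 2, 3, 4], [2, 4, None, 8])
# Every imputed value sits exactly on the fitted line, which is
# why this method understates the imputed variable's variance.
```

Stochastic regression imputation (adding random residual noise to each prediction) is the standard remedy, and is the building block that multiple imputation repeats.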

🧠 Summary Table

Method            | Handles Missingness | Variance Preserved | Suitable for Inference | Notes
Listwise          | No                  | ✅ Yes             | ❌ Biased if not MCAR  | Data loss risk
Pairwise          | Partial             | ⚠️ Sometimes       | ❌ Inconsistent N      | Good for correlations
EM Algorithm      | Yes (MAR)           | ✅ Yes             | ⚠️ No (not multiple)   | Good for summary stats
Regression Impute | Yes (MAR)           | ❌ Underestimated  | ❌ No                  | Risk of overfitting
MICE              | Yes (MAR)           | ✅ Yes             | ✅ Yes                 | Best for robust analysis; in SPSS via Analyze > Multiple Imputation (Missing Values add-on)
Imputation is the process of filling in missing data. The choice of imputation method depends
on:

 the type of variable (numeric, categorical),
 the mechanism of missingness (MCAR, MAR, MNAR),
 and the analytical goals (e.g., preserving variance, predictive modeling, causal inference).

✅ Single vs. Multiple Imputation

Feature                   | Single Imputation                       | Multiple Imputation
Definition                | Fill in missing values once             | Create multiple datasets with different imputations
Examples                  | Mean/Median Imputation, kNN, Regression | MICE, Bayesian Imputation
Captures Uncertainty      | ❌ No                                   | ✅ Yes
Bias Risk                 | High risk of bias                       | 🔻 Lower risk with proper modeling
Variance Underestimated   | ✅ Yes                                  | ❌ No – preserves natural variability
Analysis Complexity       | ✅ Simple                               | 🔺 Requires pooling of results across datasets
Use in Inferential Models | ❌ Often discouraged                    | ✅ Recommended for inferential statistics

📘 Best Practices for Your Graphology Dataset

You have:

 Quantitative features derived from handwriting (e.g., spacing, inclination).
 Some missingness likely due to measurement failure or partial administration.
 Inclination variables with only negative values, needing distribution-sensitive imputation.
 An aim to compare groups (e.g., epilepsy vs. control), meaning you must preserve variance and uncertainty.

🧠 Recommended Approach: Multiple Imputation by Chained Equations (MICE)


 Why: MICE (Multiple Imputation by Chained Equations) handles complex multivariate missingness and accounts for the uncertainty of imputation, which is essential for valid group comparisons and preserving statistical inference.
 Estimator: Bayesian Ridge or Random Forest are good defaults.
 Iterations: Usually 10–20 are sufficient; you can increase this if convergence is not reached.
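To show the core multiple-imputation idea in isolation, here is a toy single-predictor sketch in Python: each imputed dataset fills the missing values with the regression prediction plus random residual noise, and the results are then pooled. With only one predictor there is no real chaining, and a full implementation would pool variances with Rubin's rules; all names and data below are invented for the example:

```python
import random
from statistics import mean, pstdev

def multiply_impute(x, y, m=5, seed=0):
    """Draw m imputed datasets: each missing y is replaced by its
    regression prediction plus noise scaled to the residual spread,
    so imputations differ across datasets and carry the uncertainty
    that single imputation discards."""
    rng = random.Random(seed)
    obs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    mx = mean(xi for xi, _ in obs)
    my = mean(yi for _, yi in obs)
    b = sum((xi - mx) * (yi - my) for xi, yi in obs) / \
        sum((xi - mx) ** 2 for xi, _ in obs)
    a = my - b * mx
    resid_sd = pstdev([yi - (a + b * xi) for xi, yi in obs])
    datasets = [
        [yi if yi is not None
         else a + b * xi + rng.gauss(0, resid_sd)
         for xi, yi in zip(x, y)]
        for _ in range(m)
    ]
    # Pooling step (a full version would also pool variances):
    pooled_mean = mean(mean(d) for d in datasets)
    return datasets, pooled_mean

datasets, pooled = multiply_impute(
    [1, 2, 3, 4, 5], [2.1, 3.9, None, 8.2, 9.8])
```

Because the imputed value differs across the m datasets, between-dataset variability can be measured and fed into standard errors — the property that makes group comparisons valid.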
