VARIABLE, VALIDITY,
RELIABILITY
Dr. Yan Liu
Department of Biomedical, Industrial and Human Factors Engineering
Wright State University
Variables
! Definition
" An event, situation, behavior, or individual characteristic that varies
! cognitive task performance, word length, intelligence, etc.
! Types of Variables by Measurement Scales
" Nominal variables
! Values are unordered labels
! color, name
" Ordinal variables
! Possible levels are ordered in sequence
! ranking of preference, satisfaction
" Interval variables
! Equal intervals between values; no true zero
! time between flight departure and arrival, temperature (in the unit of C or F)
" Ratio variables
! Values are real numbers with a true zero point
! size, weight
Operational Definitions of Variables
! Operational Definition of Variable
" Definitionof the variable in terms of the operations or techniques the researcher
uses to measure or manipulate it
! Why Operational Definitions
" Some variables must be operationally defined so they can be studied empirically
! “cognitive task performance” can be defined as the number of errors made in
detecting a target object on a screen or the time spent in working on the task
" The task of operationally defining a variable forces scientists to discuss abstract
concepts in concrete terms
" Operational definitions help us communicate our ideas to others
! Which Operational Definition to Use
" A variety of methods to operationally define a variable may be available, each of
which has advantages and disadvantage
" Decision on which operational definition to use in a study should be based on
the goal of the study and other considerations (e.g. ethnics and cost)
3
Relationship Between Variables
! Types of Relationship in Interval and Ratio Variables
" Positive linear relationship
" Negative linear relationship
" Curvilinear relationship
" No relationship
! Relationships and Reduction of Uncertainty
" Detecting relationships between variables means reducing our uncertainty about
the nature of the variables
" Error variance (random variability)
" Research is aimed at reducing error variance by identifying systematic
relationships between variables
[Figure: scatterplots illustrating a perfect positive linear relationship, a perfect
negative linear relationship, a curvilinear relationship (quadratic with an
intermediate minimum), and no relationship]
Suppose you have surveyed 200 people about whether or not they like shopping, and
100 people said Yes and the remaining 100 said No. What can you conclude from this
information?
When you meet a person, you can only make a random guess whether the person
likes shopping or not, and the chance your answer is correct (either way) is 50%
Suppose you also have asked people to indicate their gender, and found the
relationship between gender and attitude toward shopping as follows.
                        Male    Female
Like shopping?   Yes     30       70
                 No      70       30
Number of
participants            100      100

70% of males do not like shopping; 70% of females like shopping. Therefore, when
you predict a person's attitude toward shopping based on the person's gender, you
will be correct 70% of the time.
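The reduction in uncertainty can be checked numerically. A minimal Python sketch
(the counts come from the survey table above; the dictionary layout is my own):

```python
# Survey counts from the example: attitude toward shopping by gender.
counts = {
    "male":   {"yes": 30, "no": 70},
    "female": {"yes": 70, "no": 30},
}

total = sum(sum(g.values()) for g in counts.values())  # 200 respondents

# Without gender information, the best strategy is to guess the overall
# majority answer (a 100/100 tie here), so a guess is correct 50% of the time.
overall_yes = sum(g["yes"] for g in counts.values())
baseline = max(overall_yes, total - overall_yes) / total

# Knowing gender, always predict the majority answer within each gender.
correct = sum(max(g.values()) for g in counts.values())
accuracy = correct / total

print(baseline)  # 0.5
print(accuracy)  # 0.7
```

Knowing the related variable (gender) raises prediction accuracy from 50% to 70%,
which is exactly what "reducing uncertainty" means here.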
Evaluating Research: Three Validities
! Validity
" “Truth” and accurate representation of information
! Three Types of Validity to Evaluate Research
" Construct validity
" Internal validity
" External validity
Construct Validity
! Construct Validity
" Whether the measure that is employed actually measures the construct it is
intended to measure
! Applicants for some jobs are required to take a Clerical Ability Test, and this test is
supposed to predict an individual’s clerical ability. The construct validity of such a
test is determined by whether it actually measures the clerical ability.
" The adequacy of the operational definition of variables
! The operational definition of variable indeed reflects the true theoretical meaning of
the variables
! Variables that are abstract constructs usually can be measured and manipulated in a
variety of ways and may not have a single perfect operational definition
Indicators of Construct Validity
! Face Validity
" The content of the measure appears to accurately assess the intended variable
! To measure one’s mental workload in performing a task, an item such as “how hard
did you have to work to accomplish your level of performance?” appears more
closely related to the construct than an item such as “Do you know how to drive?”
" Not sufficient to conclude that a measure is in fact valid; appearance may not be
a good indicator of validity
! Most personality measures that appear in popular magazines typically have several
questions that look reasonable, but there is no empirical evidence to support the
conclusions drawn from the measures
Indicators of Construct Validity (Cont’d)
! Criterion-Oriented Validity
" Relationship between the measure and some criterion
" Types
! Predictive validity
! Concurrent validity
! Convergent validity
! Discriminant validity
! Predictive Validity
" The extent to which the measure predicts behavior on a criterion measured at a
time in the future
! The Law School Admissions Test (LSAT) is developed to predict success in law
school. The predictive validity of the LSAT is demonstrated when research shows
that people who score high on the test do better in law school than those who score
low on the test, i.e., there is a positive relationship between the test score and grades
in law school.
" Important in studying measures designed to improve ability to make prediction
Indicators of Construct Validity (Cont’d)
! Concurrent Validity
" The relationship between the measure and a criterion behavior at the same time
(concurrently)
! Whether two or more groups of people differ on the measure in expected ways
" Suppose you have a measure of shyness. Your theory of shyness may lead you
to expect that salespeople whose job requires making “cold calls” to potential
customers would score lower on the shyness measure than those in positions in
which potential customers must make the effort to contact the company
themselves
! How people with different scores on the measure behave differently
" You can ask people who score high versus low on the shyness scale to describe
themselves to a stranger while measuring their level of anxiety. You would
expect that people who score higher on the shyness scale would exhibit a higher
level of anxiety
! Whether the measure correlates well with another measure that has previously been
validated
" If your shyness scale gives similar results to another shyness measure which has
been validated in past investigation, then your shyness scale has concurrent
validity
Indicators of Construct Validity (Cont’d)
! Convergent Validity
" Theextent to which the measure in question is related to (convergent to) other
measures of the same construct or similar constructs
! One measure of shyness should correlate highly with another measure of shyness or
a measure of a similar construct such as social anxiety
! Discriminant Validity
" The degree to which the measure is not related (divergent from) measures of
other unrelated constructs
! A low correlation between a measure of shyness and measures of conceptually
unrelated interpersonal values, such as valuing forcefulness with others, would be
evidence of discriminant validity
! Concluding Remark
" A single study is only one indicator of the validity of a measure; our confidence
in the validity of a measure is built up over time as numerous studies investigate
the theory of the particular construct being measured
Internal Validity
! Internal Validity
" The ability to draw conclusions about causal relationships from our data
! A study has high internal validity when strong inferences can be made that one
variable caused changes in the other variable
" Strongcausal inferences can be made more easily when the experimental
method is used
! Evaluation of Internal Validity
" Equivalence of groups on participant characteristics
! Random assignment of participants to groups is the best way to achieve equivalence
! Matching of participants on important characteristics (which may affect the
relationship between independent and dependent variables)
" Control of extraneous variables
Internal Validity (Cont’d)
! Threats to Internal Validity
" Inequality of groups
! Bias in assignment to groups
! Participant dropout or attrition during the study
" Effects of extraneous variable
! Extraneous environment events occur during the study
! Carry-over effects in repeated measures
! Instrument or observer inconsistency
! Ambiguous temporal precedence
" Does the independent variable occur before the dependent variable?
External Validity
! External Validity
" The extent to which the results can be generalized to other populations and
settings
! Whether the results can be replicated with other operational definitions of the
variables, with different participants, or in other settings
! Issues
" Artificiality of laboratory experiments is an issue of external validity
" Field experiments represent one way that researchers try to increase the external
validity of their experiments
" The goal of high internal validity may sometimes conflict with the goal of
external validity
" Sample selected should be representative of the entire population
Measurement Error
! Measurement error
" Any deviation from the “true value”
! Systematic Error
" Caused by factors that systematically affect measurement of the variable across
samples
" Tends to be consistently either positive or negative
" Referred to as “bias”
" Can be controlled using strategies such as frequent calibration and
randomization
! Random Error
" Caused by factors that randomly affect measurement of the variable across
samples
" It does not have any consistent effects across the entire sample population
" Referred to “noise”
" Difficult to control
" Leads to unreliability of measures 16
Additive Error Model
! Additive Error Model
" Appropriate to most human factors studies
! Attempts are made to determine the values of the variables of interest yet one is not
able to do so because of various errors in the measurement
" Easy to understand and familiar to most human factor practitioners
" Every measurement is a sum of two components: the true score of the measure
and random error
X* = X + ε                              (Eq. 1)

   X*: the observed score
   X : the true score
   ε : random error, with mean 0 and variance σε²

Var(X*) = Var(X) + σε²                  (Eq. 2)
Reliability of Measures
! Reliability
" The consistency of measures obtained by individuals when reexamined with the
same criterion measure on different occasions or with different sets of equivalent
tasks (Salvendy & Carayon, 1997)
! A reliable measure of intelligence should yield the same result each time you
administer the intelligence test to the same person. The test would be unreliable if it
measured the same person as average one week, low the next, and bright the next
" Mathematically, the reliability of a measure is defined as the proportion of the
variability in the measure attributable to the true score
reliability = Var(X) / Var(X*) = Var(X) / (Var(X) + σε²)        (Eq. 3)
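The relationship between error variance and reliability can be illustrated with a
small simulation. A sketch in Python (the true-score and error distributions below
are hypothetical choices, not from the slides):

```python
import random

# Simulate the additive error model X* = X + e and estimate reliability
# as Var(X) / Var(X*)  (Eq. 3).
random.seed(0)
n = 10_000
true_scores = [random.gauss(100, 15) for _ in range(n)]   # X, Var(X) = 225
observed = [x + random.gauss(0, 5) for x in true_scores]  # X* = X + e, Var(e) = 25

def var(xs):
    """Sample variance."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

reliability = var(true_scores) / var(observed)
print(round(reliability, 2))  # close to 225 / (225 + 25) = 0.90
```

Shrinking the error standard deviation toward 0 pushes the estimated reliability
toward 1, matching the definition above.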
[Figure: two distributions of observed scores for the same true score; the narrower
distribution (concentrated between 97 and 103) has less random error and thus
higher reliability]
Reliability of Measures (Cont’d)
! The Importance of Reliability
" Researchers cannot use unreliable measures to systematically study variables or
the relationships among variables
" Trying to study human behavior using unreliable measures is a waste of time
because the results will be unstable and unable to be replicated
! How to Improve Reliability
" Use careful measurement procedures
! Carefully training observers to collect data or record behavior
! Paying close attention to the way questions are phrased
Assessing Reliability
! Use Correlation Coefficient
" Correlation coefficient indicates the strength and direction of a linear
relationship between two random variables
" Pearson product-moment correlation coefficient
! If we have a series of n paired measurements of X and Y, (xi, yi), i = 1, 2, ..., n, then
the correlation of X and Y can be estimated with the sample Pearson product-
moment correlation coefficient, rX,Y. It is the best estimate of the correlation when
X and Y are both normally distributed.
rX,Y = Σi (xi − x̄)(yi − ȳ) / √[ Σi (xi − x̄)² · Σi (yi − ȳ)² ]        (Eq. 4)
To assess the reliability of a measure, we need to obtain at least two scores on the
measure from multiple individuals
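As a sketch, Eq. 4 can be computed directly; the two score lists below are
hypothetical first and second administrations of the same test, not data from the
slides:

```python
from math import sqrt

# Two scores per person, e.g., a test and a retest (hypothetical data).
x = [12, 15, 11, 18, 14, 16]  # first administration
y = [13, 16, 10, 17, 15, 17]  # second administration

# Pearson product-moment correlation coefficient (Eq. 4).
n = len(x)
mx, my = sum(x) / n, sum(y) / n
num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
den = sqrt(sum((xi - mx) ** 2 for xi in x) * sum((yi - my) ** 2 for yi in y))
r = num / den
print(round(r, 3))  # 0.926
```

A correlation this high between the two administrations would indicate good
test-retest reliability.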
Types of Reliability Estimates
! Test-Retest Reliability
! Alternative-Form Reliability
! Inter-Rater Reliability
! Internal Consistency Reliability
Test-Retest Reliability
! Procedure
" Measure the same individuals at two points in time and then calculate the
correlation coefficient between the first and second test scores
! To test the reliability of an intelligence test, the test is given to a group of people on
one day and again a week later
! Advantage
" The procedure is simple and straightforward
! Disadvantage
" Theproblem of carry-over effects due to memory and/or practice which will
result in inflated estimates of reliability
Alternative-Form Reliability
! Procedure
" Administer two parallel forms of a test to the same group of individuals
" Two forms of a measuring instrument is considered parallel if an object's true
score is the same for both forms and if both forms produce equal means and
equal variances
! Advantage
" Helps to alleviate carry-over effects
! Disadvantage
" Difficult
to come up with two parallel forms, especially with personality
measures
Inter-Rater Reliability
! Assessing the Reliability of a Rating System
" The extent to which two or more individuals (coders or raters) agree in their
observations
! Two usability experts are asked to give a rating on the usability of a website
according to a sliding rating scale (1 being the worst, 5 being the best). If one expert
gives “1” to the usability of the website, whereas the other gives “5”, then the
interrater reliability of the rating scale would be quite low
" Depends on the ability of the raters to be consistent
! Training and education can help enhance inter-rater reliability
! Cohen’s Kappa (K)

K = (PO − PC) / (1 − PC)                (Eq. 5)

   PO: observed proportion of agreement
   PC: proportion of agreement predicted by chance

PC = (Σi pmi) / N²                      (Eq. 6)

   pmi: product of the ith row and column marginals
   N : total number of objects rated
Raters A and B are asked to evaluate the usability of 100 websites using a three-
point Likert scale (1 being the worst, 3 being the best)

                          Ratings of B
                       1      2      3    row margin
Ratings of A     1    20      5     10        35
                 2    10     30     10        50
                 3     2      3     10        15
column margin         32     38     30
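Cohen's kappa for this table follows directly from Eqs. 5 and 6. A sketch in Python
(the nested-list layout of the table is my own):

```python
# Agreement table: rows are rater A's ratings, columns are rater B's.
table = [
    [20, 5, 10],   # rater A gave 1
    [10, 30, 10],  # rater A gave 2
    [2, 3, 10],    # rater A gave 3
]

n = sum(sum(row) for row in table)               # 100 websites
row_margins = [sum(row) for row in table]        # [35, 50, 15]
col_margins = [sum(col) for col in zip(*table)]  # [32, 38, 30]

# Observed agreement: the diagonal, where both raters gave the same rating.
p_o = sum(table[i][i] for i in range(3)) / n
# Chance agreement (Eq. 6): products of matching row and column marginals.
p_c = sum(r * c for r, c in zip(row_margins, col_margins)) / n**2

kappa = (p_o - p_c) / (1 - p_c)  # Eq. 5
print(round(kappa, 3))  # 0.387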
Internal Consistency of Multi-Item Tests
! Multi-Item Tests
" Many psychological measures are made up of a number of different questions
(items)
! An intelligence test may have 100 items, a satisfaction questionnaire may have 10
items
! Internal Consistency Reliability
" The extent to which the items of a measuring instrument correlate with one
another
! All items measure the same variable, so they should yield consistent results
" Split-half
reliability
" Cronbach’s alpha
Split-Half Reliability
! Procedure
" The questions in the measuring instrument are divided in half, creating two
pseudo-parallel half-tests a and b
" Each of the two half-tests is scored on a number of individuals
" Calculate the correlation between the total scores of a and b, rab
" The half-test reliability is adjusted to estimate the overall reliability of the whole
instrument, rAB , using the Spearman-Brown formula
rAB = 2·rab / (1 + rab)                 (Eq. 7)

e.g., rab = 0.6, then rAB = 2·0.6/(1+0.6) = 0.75
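The Spearman-Brown adjustment is easy to express as a one-line function; a sketch:

```python
def spearman_brown(r_half: float) -> float:
    """Estimate whole-test reliability from the half-test correlation (Eq. 7)."""
    return 2 * r_half / (1 + r_half)

print(spearman_brown(0.6))  # 0.75, as in the example above
```

The adjustment corrects for the fact that each half-test is shorter (and therefore
less reliable) than the full instrument.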
Cronbach’s Alpha (α)
! Most popular measure of internal consistency reliability
! Interpreted as the mean of all possible split-half coefficients
α = [k / (k − 1)] · (1 − Σi si² / sT²)   (Eq. 8)

   k  : the number of items
   si²: sample variance for item i
   sT²: sample variance of the total test scores

A Cronbach's α of 0.7 is a rule-of-thumb acceptable level of agreement
Calculate Cronbach’s alpha in JMP:
Choose Analyze → Multivariate Methods → Multivariate and specify your
continuous columns. From the Multivariate pull-down menu select Item
Reliability → Cronbach's Alpha
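Eq. 8 can also be computed directly without JMP. A Python sketch with hypothetical
item scores for a 3-item scale answered by 5 people:

```python
# Each inner list holds one person's scores on the 3 items (hypothetical data).
scores = [
    [4, 5, 4],
    [3, 3, 4],
    [5, 5, 5],
    [2, 3, 2],
    [4, 4, 5],
]

def var(xs):
    """Sample variance."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

k = len(scores[0])                           # number of items
items = list(zip(*scores))                   # one tuple of scores per item
totals = [sum(person) for person in scores]  # total test score per person

# Cronbach's alpha (Eq. 8).
alpha = (k / (k - 1)) * (1 - sum(var(it) for it in items) / var(totals))
print(round(alpha, 3))  # 0.918
```

Here α is well above the 0.7 rule of thumb, so these items would be considered
internally consistent.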
Sources of Psychological Tests
! It is usually wise to use existing measures of psychological
characteristics rather than develop your own
! Existing measures have reliability and validity data to help you
decide which measure to use
! You can compare your findings with prior research that uses the
measures
! You should always report the reliability of any psychological
measure used in your study even if it is an existing one!