Testing & Evaluation
What to Discuss
• Definition of a test
• Types of tests
• Characteristics of a good test (Validity,
Reliability & Practicality)
Test, Assessment & Evaluation
(Diagram: three nested circles showing TEST within ASSESSMENT, within EVALUATION.)
Definition of a test
A test is a systematic procedure for measuring an
individual’s behavior (Brown, 1991). This
definition implies that a test has to be developed
following specific guidelines. A test is a formal and
systematic way of gathering information about
learners’ behavior, usually through a paper-and-pencil
procedure (Airasian, 1989).
Definition of a test
“A method of measuring a person’s ability,
knowledge or performance in a given domain.”
(Brown, 2004)
“Language tests are simply instruments or
procedures for gathering particular kinds of
information, typically information having to do
with students’ language abilities.”
(Norris, 2003)
Assessment
Assessment is the process of identifying, gathering and
interpreting information about students’ learning. The
central purpose of assessment is to provide information on
student achievement and progress and to set the direction for
ongoing teaching and learning.
It seeks to improve the quality of student learning, not to
provide evidence for evaluating or grading students. It
provides faculty with feedback about their effectiveness as
teachers, and it gives students a measure of their progress
as learners. It is created, administered, and analysed by
teachers in the classroom.
Evaluation
Evaluation is decision making about student performance and
about appropriate teaching strategies (Woolfolk,
2005, p. 504).
Differences between testing and assessment
Testing
• Tests are prepared administrative procedures that occur at identifiable times in a curriculum.
• When tested, learners know that their performance is being measured and evaluated.
• When tested, learners muster all their faculties to offer peak performance.
• Tests are a subset of assessment: they are only one among many procedures and tasks that teachers can use to assess students.
• Tests are usually time-constrained (usually spanning a class period or at most several hours) and draw on a limited sample of behaviour.

Assessment
• Assessment is an ongoing process that encompasses a much wider domain.
• A good teacher never ceases to assess students, whether those assessments are incidental or intended.
• Whenever a student responds to a question, offers a comment, or tries out a new word or structure, the teacher subconsciously makes an assessment of the student’s performance.
• Assessment includes testing: it is more extended and includes many more components.
Types of tests
Numerous types of tests are used in school. There
are different ways of categorizing tests, namely:
ease of quantification of response, mode of
preparation, mode of administration, test
constructor, mode of interpreting results, and
nature of response (Manarang & Manarang, 1993;
Louisell & Descamps, 1992).
Mode of response-based tests
Oral test: a test in which the test taker gives his or
her answers orally.
Written test: a test in which answers to the questions
are written by the test taker.
Performance test: a test in which the test taker
creates an answer or a product that demonstrates his or
her knowledge or skill, as in cooking and baking.
Ease of quantification of response-based tests
Objective test: a paper-and-pencil test in which
students’ answers can be compared and quantified to
yield a numerical score.
Subjective test: a paper-and-pencil test which is not
easily quantified, as students are given the freedom to
write their answers to a question, as in an essay test.
Mode of administration-based tests
Individual test: a test administered to one student at
a time.
Group test: a test administered to a group of students
simultaneously.
Constructor-based tests
Standardized test: a test prepared by an expert or
specialist and administered to representative
populations of similar individuals to obtain normative
data. TOEFL is an example of a standardized test.
Un-standardized test: a test prepared by teachers for
use in the classroom, with no established norms for
scoring and interpretation of results. It is not
accompanied by normative data.
Mode of interpreting results-based tests
Norm-referenced test: a test that evaluates a student’s
performance by comparing it to the performance of a group
of students on the same test.
Criterion-referenced test: a test that compares all the
testees to a predetermined criterion. In such a test,
everybody whose achievement comes up to the pre-set criterion
will receive a pass mark, while those under it will fail. The
criteria are often set in terms of tasks that students have to
be able to perform (e.g. to interact with an interlocutor with
ease; to ask for information and understand instructions). The
sketch below contrasts the two modes of interpretation.
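Here is a minimal Python sketch of the contrast; the names, scores, and cut score are invented for illustration:

    # Hypothetical scores and a predetermined criterion (cut score).
    scores = {"Samir": 15, "Imane": 12, "Rania": 14, "Adil": 9}
    cut_score = 12

    # Criterion-referenced interpretation: each testee is judged
    # against the pre-set criterion alone.
    for name, s in scores.items():
        print(name, "passes" if s >= cut_score else "fails")

    # Norm-referenced interpretation: the same scores are judged
    # against the performance of the group.
    mean = sum(scores.values()) / len(scores)
    for name, s in scores.items():
        print(name, "above the group mean" if s > mean else "at or below the group mean")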
Nature of answer-based test
Personality test: a test designed to assess some
aspects of an individual’s personality.
Intelligence test: a test that measures the mental
ability of an individual.
Aptitude test: a test designed to predict the
likelihood of an individual’s success in a learning
area or field of endeavor.
Achievement test: a test given to students to determine
what they have learned from formal instruction in
school.
Summative test: a test given at the end of instruction
to determine students’ learning and assign grades.
Diagnostic test: a test administered to students to
identify their specific strengths and weaknesses in
past and present learning.
Formative test: a test given to improve teaching and
learning while it is going on.
Nature of the question-based tests
Direct test: candidates are required to perform the very
skill the test intends to measure.
Indirect test: a test that measures the skills that
underlie performance in a particular task.
Discrete-point test: every item focuses on one clear-cut
segment of the target language without involving the
others. Typical format: a written multiple-choice test.
Integrative test: candidates need to use a number of
language elements at the same time in completing the
test tasks, for example essay writing, dictation, or a
cloze test.
Factors to consider in testing
Validity
A test is said to be valid if it measures
accurately what it is intended to measure.
Content validity
A test is said to have content validity if its content
constitutes a representative sample of the language
skills, structures, etc., with which it is meant to be
concerned.
In order to judge whether or not a test has content
validity, we need a specification of the skills or
structures, etc., that it is meant to cover. Such a
specification should be made at a very early stage
in test construction.
It isn’t to be expected that everything in the
specification will always appear in a single test,
but the specification provides the test constructor
with the basis for making a principled selection of
elements for inclusion in the test. A comparison between
the test specification and the test content is the basis
for judgments as to content validity.
Criterion-related validity
Also referred to as instrumental validity,
criterion-related validity requires that the criteria
be clearly defined by the teacher in advance. It has to
take into account other teachers’ criteria to be
standardized, and it also needs to demonstrate the
accuracy of a measure or procedure compared to another
measure or procedure which has already been
demonstrated to be valid.
There are essentially two kinds of criterion-related
validity: concurrent validity and predictive validity.
Concurrent validity
Concurrent validity is a statistical method using correlation.
Examinees who are known to be either masters or non-
masters on the content measured by the test are identified
before the test is administered. Once the tests have been
scored, the relationship between the examinees’ status as
either masters or non-masters and their performance (i.e.,
pass or fail) is estimated based on the test. This type of
validity provides evidence that the test is classifying
examinees correctly. The stronger the correlation is, the
greater the concurrent validity of the test is.
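The estimation itself is just a correlation. A minimal sketch, assuming Python 3.10+ and invented 1/0 labels (the Pearson correlation of two dichotomous variables is the phi coefficient):

    from statistics import correlation  # available in Python 3.10+

    # Hypothetical data: 1 = master / pass, 0 = non-master / fail.
    status = [1, 1, 1, 0, 0, 1, 0, 0, 1, 0]  # known examinee status
    result = [1, 1, 0, 0, 0, 1, 0, 1, 1, 0]  # pass/fail decision on the test

    # Pearson correlation of two 0/1 variables = the phi coefficient.
    phi = correlation(status, result)
    print(f"concurrent validity estimate (phi) = {phi:.2f}")

The closer the coefficient is to 1, the more correctly the test classifies examinees.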
Predictive validity
Predictive validity concerns the degree to which a
test can predict candidates’ future performances
as masters or non-masters. An example would be
how a proficiency test could predict a student’s
ability to cope with a graduate course at an
American university.
Construct validity
A test, part of a test, or a testing technique is said
to have construct validity if it can be demonstrated
that it measures just the ability which it is supposed
to measure. The word ‘construct’ refers to an
underlying ability (or trait) which is hypothesised
in a theory of language ability.
Face validity
A test is said to have face validity if it looks as if it measures
what it is supposed to measure. For example, a test which
claimed to measure writing ability but which didn’t require
the examinees to write might be thought to lack face validity.
Face validity is hardly scientific, yet it is important. It is not
investigated through formal procedures and is not
determined by subject experts. Instead, anyone who looks
over the test, including testees and other stakeholders, may
develop an informal opinion as to whether or not the test is
measuring what it claims to measure.
Reliability
Reliability is the extent to which an experiment, test, or
any measuring procedure shows the same result on
repeated trials. It implies the extent to which a test is
repeatable and yields consistent scores.
A test is said to be reliable if we get almost the same
results repeatedly.
A test is said to be unreliable if one’s scores fluctuate
from one administration to another; that is, one’s scores
on various administrations will be inconsistent.
Student | Obtained score | Score that would have been obtained the following day
Samir | 15 | 15.5
Imane | 12 | 11
Rania | 14 | 14
Adil | 9 | 10
Hicham | 15 | 14
Marwa | 10 | 10
Sami | 8 | 9
Rajaa | 13 | 13.5
Ayman | 16 | 16
Hajar | 14.5 | 15.5
Manar | 8 | 8
Ahmed | 10 | 11
Reda | 11 | 10.5
Salah | 12 | 12
(Scores barely change from one administration to the next: the test is reliable.)
Student | Obtained score | Score that would have been obtained the following day
Samir | 15 | 11
Imane | 12 | 7
Rania | 14 | 17
Adil | 9 | 15
Hicham | 15 | 6.5
Marwa | 10 | 17
Sami | 8 | 14
Rajaa | 13 | 9
Ayman | 16 | 16
Hajar | 14.5 | 10
Manar | 8 | 13
Ahmed | 10 | 18
Reda | 11 | 6
Salah | 12 | 16
(Scores fluctuate unpredictably from one administration to the next: the test is unreliable.)
Student | Obtained score | Score that would have been obtained the following day
Samir | 15 | 16.5
Imane | 12 | 12.5
Rania | 14 | 16
Adil | 9 | 12
Hicham | 15 | 16
Marwa | 10 | 13
Sami | 8 | 10
Rajaa | 13 | 14
Ayman | 16 | 16.5
Hajar | 14.5 | 15
Manar | 8 | 10
Ahmed | 10 | 12
Reda | 11 | 11
Salah | 12 | 13
(Scores rise steadily from one administration to the next: a predictable, systematic variation.)
The notion of the consistency of one’s scores with respect to
one’s average score over repeated administrations is the
central concern of the concept of reliability.
Some change in one’s score is inevitable. Some of the
changes might represent a steady increase in one’s score.
The increase would most likely be due to some sort of
learning. This kind of change, which would be predictable,
is called systematic variation.
Systematic variation contributes to the reliability of a
test, while unsystematic variation, which is called error
variation, contributes to its unreliability.
True Score
Let’s assume that someone takes a test. Since all
measurement devices are subject to error, the score one
gets on a test cannot be a true manifestation of one’s ability
in that particular trait. In other words, the score contains
one’s true ability along with some error. If this error part
could be eliminated, the resulting score would represent an
errorless measure of that ability. By definition, this
errorless score is called a “true score”.
Observed score
The true score is almost always different from the score
one gets, which is called the “observed score”. Since the
observed score includes the measurement error, i.e., the
error score, it can be greater than, equal to, or smaller than
the true score. If there is absolutely no error of
measurement, the observed score will equal the true score.
However, when there is a measurement error, which is
often the case, it can lead to an overestimation or an
underestimation of the true score.
Therefore, if the observed score is represented by X, the
true score by T, and the error score by E, then X = T + E,
and the relationship between the observed and the true
score is one of the following:
X = T (when E = 0), or
X > T (when E > 0), or
X < T (when E < 0).
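A small simulation makes the three cases concrete. This is only a sketch: the true score is invented, and the error term is assumed to be normally distributed:

    import random

    random.seed(1)
    T = 14.0  # the (unobservable) true score, invented for illustration

    for administration in range(5):
        E = random.gauss(0, 1.5)  # assumed measurement error with mean 0
        X = T + E                 # observed score: X = T + E
        case = "X > T" if E > 0 else ("X < T" if E < 0 else "X = T")
        print(f"X = {X:5.2f}  ({case})")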
Methods of Estimating
Reliability
Test-Retest Method
In this method, reliability is obtained by administering
a given test to a particular group twice and calculating
the correlation between the two sets of scores obtained
from the two administrations.
Since there has to be a reasonable amount of time
between the two administrations, this kind of reliability is
referred to as reliability, or consistency, over time.
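Using the first two score tables above as the two administrations, a minimal sketch of this calculation (assuming Python 3.10+ for statistics.correlation) would be:

    from statistics import correlation  # available in Python 3.10+

    first_administration = [15, 12, 14, 9, 15, 10, 8, 13, 16, 14.5, 8, 10, 11, 12]
    # Retest scores from the first (consistent) table:
    retest_consistent = [15.5, 11, 14, 10, 14, 10, 9, 13.5, 16, 15.5, 8, 11, 10.5, 12]
    # Retest scores from the second (fluctuating) table:
    retest_fluctuating = [11, 7, 17, 15, 6.5, 17, 14, 9, 16, 10, 13, 18, 6, 16]

    print(f"reliable test:   r = {correlation(first_administration, retest_consistent):.2f}")
    print(f"unreliable test: r = {correlation(first_administration, retest_fluctuating):.2f}")

The reliable test yields a coefficient close to 1; the unreliable one yields a low (here even negative) coefficient.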
Parallel-form Method
In the parallel-form method, two similar, or parallel, forms of the
same test are administered to a group of examinees just once.
The two forms of the test should be the same: all the
elements upon which the test items are constructed should be the
same in both forms. For example, if one form measures a particular
element of grammar, the other form should also contain the same
number of items on the same elements of grammar.
The subtests should also be the same, i.e., if one form of the test
has three subsections of grammar, vocabulary, and reading
comprehension, the other form should also have the same
subsections in the same proportions.
Split-Half Method
In the split-half method, the items comprising a test are
homogeneous. That is, all the items in the test attempt to
measure elements of a particular trait, e.g., tenses,
prepositions, other grammatical points, vocabulary,
reading and listening comprehension, which are all
subparts of the trait called language ability.
In this method, a single test with homogeneous items is
administered to a group of examinees, and the test is then
split, or divided, into two equal halves. The correlation
between the two halves is an estimate of the reliability of
the test scores, as sketched below.
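A minimal sketch with invented 0/1 item scores: each examinee’s items are split into odd and even halves, the half totals are correlated, and (a standard extra step not mentioned above) the half-test correlation is stepped up to full length with the Spearman-Brown formula:

    from statistics import correlation  # available in Python 3.10+

    # Hypothetical item scores: rows = examinees, columns = items (1 = correct).
    items = [
        [1, 1, 0, 1, 1, 0, 1, 1],
        [0, 1, 0, 0, 1, 0, 1, 0],
        [1, 1, 1, 1, 1, 1, 1, 1],
        [0, 0, 0, 1, 0, 0, 0, 1],
        [1, 0, 1, 1, 1, 0, 1, 1],
        [0, 1, 0, 0, 0, 1, 0, 0],
    ]

    # Split each examinee's test into two equal halves (odd vs. even items).
    odd_half = [sum(row[0::2]) for row in items]
    even_half = [sum(row[1::2]) for row in items]

    r_half = correlation(odd_half, even_half)  # reliability of a half-length test
    r_full = 2 * r_half / (1 + r_half)         # Spearman-Brown correction
    print(f"half-test r = {r_half:.2f}, full-test reliability = {r_full:.2f}")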
Which method should we use?
It depends on the function of the test.
The test-retest method is appropriate when the consistency of
scores over a particular time interval (the stability of test
scores over time) is important.
The parallel-forms method is desirable when the
consistency of scores over different forms is of importance.
When the go-togetherness of the items of a test is of
significance, i.e., when all test items should measure the same
construct (internal consistency), the split-half method will be
the most appropriate.
Factors Influencing Reliability
The Effect of Testees: Since human beings are dynamic creatures, the
attributes related to human beings are also dynamic. The implication is that
the performance of human beings will, by its very nature, fluctuate from
time to time and from place to place. Factors such as students
misunderstanding or misreading test directions, noise level, distractions,
and sickness can cause test scores to vary.
The Effect of Test Factors:
1) Test length. Generally, the longer a test is, the more reliable it is;
however, this holds only up to a point.
2) Item difficulty. When there is little variability among test scores,
reliability will be low. Thus, reliability will be low if a test is so easy that
every student gets most or all of the items correct, or so difficult that every
student gets most or all of the items wrong.
The Effect of Administration Factors: Poor or unclear
directions given during administration or inaccurate scoring can affect
reliability.
Practicality
It refers to the economy of time, effort and money in testing.
In other words, a test should be…
Easy to design
Easy to administer
Easy to invigilate
Easy to score
Easy to interpret (the results)
THANK YOU!