CHAPTER 1:
I. KEY CONCEPTS
1. Test: an instrument/method/technique used to measure a language ability/performance/knowledge
in a given domain (in terms of language: grammar, speaking, …/ specific skills/knowledge, or overall
language proficiency)
- A well-constructed method to accurately measure a person’s ability, knowledge, or performance in a
given domain
- Method: explicit and structured
- Performance: test-takers’ ability or competence
2. Measurement: a process of quantifying a student’s observed performance
- Quantitative: using scores – assigning numbers (rankings + letter grades)
- Qualitative: using written descriptions + oral feedback
3. Assessment: an on-going process of collecting information about a given performance, using systematic
or grounded procedures
- Intended or incidental; with(out) results
- Tests: one technique/task/procedure a teacher uses to assess his/her students
4. Evaluation: using results of assessment instruments for decision making (judgement included)
- Convey meaning of the results
- Interpretation of the data/information
- With(out) measurement
- Ex: scoring below 50% on the test → failing the course
II. DICHOTOMIES IN ASSESSMENT
- Informal & formal
- Formative & summative
- Norm-referenced & criterion-referenced
- Direct & indirect
- Discrete-point & integrative
1. Informal and formal assessment
- Informal assessment: embedded in class tasks (without recording results and without making fixed
judgments about a student’s competence)
Ex: incidental, unplanned comments/responses, impromptu feedback, marginal comments, advice,
suggestions
- Formal assessment: systematic, planned techniques to give judgment of student achievement
+ exercises / procedures about a specific set of skills or knowledge
Ex: tests, journals, portfolios
2. Formative & summative assessment
- Formative:
+ evaluate students in the process of forming their competence/skills to help them continue that growth
+ occurring alongside or within instruction
+ the delivery (by the teacher) and the internalization (by the student) of the feedback on their performance
+ for the continuation (formation) of learning
+ informal assessment ≈ formative assessment
Ex: comments, suggestions, error identification
- Summative:
+ measuring or summarizing what a student has acquired
+ occurring at the end of a unit of instruction
+ focusing on how well a student has accomplished objectives
+ ex: final/proficiency exams
3. Norm-referenced & criterion-referenced tests
- Norm-referenced
+ compares a student’s performance with other students’
+ measures general language proficiency
+ scores are normally distributed around the mean
+ only the top students can get an A
+ test structure: long subtests with varied content
+ students may not know exactly what content will be on the test
+ large-scale standardized tests
- Criterion-referenced tests
+ compares a student’s performance to a criterion
+ scores not usually normally distributed
+ test structure: shorter subtests with similar item content
+ students should know what content to expect
+ measures specific language objectives
+ anyone scoring 90% or higher gets an A (see the grading sketch below)
+ classroom tests
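To make the two grading philosophies concrete, here is a minimal Python sketch (the scores and cut-offs are hypothetical): the criterion-referenced rule gives an A to anyone at or above a fixed criterion, while the norm-referenced rule gives an A only to the top slice of the score distribution.

```python
# Minimal sketch (hypothetical scores and cut-offs) contrasting the two grading rules.
import statistics

scores = [55, 62, 70, 74, 78, 81, 85, 88, 91, 96]

# Criterion-referenced: the grade depends only on a fixed criterion.
criterion_as = [s for s in scores if s >= 90]      # anyone scoring 90%+ gets an A

# Norm-referenced: the grade depends on position relative to peers,
# e.g. only the top 10% of students get an A.
cutoff = sorted(scores)[int(len(scores) * 0.9)]    # score at the 90th percentile
norm_as = [s for s in scores if s >= cutoff]

print(f"mean = {statistics.mean(scores)}")         # scores cluster around the mean
print(f"criterion-referenced As: {criterion_as}")  # [91, 96]
print(f"norm-referenced As: {norm_as}")            # [96]
```

Note how the same score set yields different As under the two rules: the criterion-referenced grade is fixed in advance, while the norm-referenced grade shifts with the cohort.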
4. Direct and indirect testing
- Direct: asking the test takers to actually perform the target tasks
- Indirect: asking the test takers to perform tasks related to the target tasks
5. Discrete point and integrative tests
- Discrete point:
+ testing component parts of language separately (skills and units of language)
+ decontextualized
- Integrative
+ tests language competence as a unified set of interacting abilities
+ integrates the four skills with phonological, lexical, and grammatical knowledge
Ex: dictation, cloze tests
III. TEST PURPOSES
- Language aptitude test
- Proficiency tests
- Placement tests
- Diagnostic tests
- Achievement tests
CHAPTER 2
I. PRACTICALITY
- Practicality = available resources / required resources (see the sketch at the end of this section)
+ P ≥ 1 → test development and use is practical
+ P < 1 → test development and use is not practical
- The relationship between the required resources for designing, developing, and using the test and the
available resources for doing these activities
- Resources: human, material, time
- A test is practical (in cost, ease of administration, amount of time, ease of scoring) when it
+ is relatively easy to administer
+ is not excessively expensive
+ has a specific and time-efficient scoring/evaluation procedure
+ stays within the appropriate time constraints
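The ratio above can be checked resource by resource. Below is a minimal Python sketch (the function name and the rater-hour figures are hypothetical) applying P = available / required to a single resource such as scoring time.

```python
# Minimal sketch (hypothetical figures): the practicality ratio
# P = available resources / required resources; P >= 1 means practical.

def practicality(available: float, required: float) -> float:
    """Return P = available / required for one resource (time, money, people)."""
    if required <= 0:
        raise ValueError("required resources must be positive")
    return available / required

# Example: 40 rater-hours available vs. 50 rater-hours required for scoring
p = practicality(available=40, required=50)
print(f"P = {p:.2f} -> {'practical' if p >= 1 else 'not practical'}")
# P = 0.80 -> not practical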
II. RELIABILITY
1. Scores on test tasks with characteristics A ≈ scores on test tasks with characteristics A′
2. Consistency and dependability
- Would a student get a similar score from the same test on a different day, on different items, from a
different teacher?
- Test/assessment results are affected by measurement error
- A reliable test
+ has consistent conditions across two/more administrations
+ has clear directions for scoring/evaluation
+ has uniform rubrics for scoring/evaluation
+ lends itself to consistent application of rubrics by the scorer
+ contains items/tasks unambiguous to test takers
3. Factors affecting test reliability:
+ test administration
+ test
+ students
+ scoring
a. Student-related reliability
+ physical factors
+ psychological factors
+ test wiseness (strategies for efficient test taking)
b. Rater reliability
- Human error
- Subjectivity
- Bias
+ inter-rater reliability
+ intra-rater reliability
Halo effect: the phenomenon in which an assessment is made based on the assessor’s positive
impression of a single characteristic (out of the whole)
Primacy effect: the phenomenon in which the assessor is influenced by their previous
impression/assessment of a personality/performance
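One common way to quantify inter-rater reliability is to correlate the scores two raters give the same set of performances; a coefficient near 1.0 signals strong agreement. A minimal Python sketch with hypothetical essay scores (intra-rater reliability can be checked the same way, by correlating one rater’s two scoring passes over the same papers); statistics.correlation requires Python 3.10+.

```python
# Minimal sketch (hypothetical scores): inter-rater reliability as the
# correlation between two raters' scores on the same eight essays.
from statistics import correlation  # Python 3.10+

rater_1 = [7, 5, 8, 6, 9, 4, 7, 8]   # Rater 1's scores
rater_2 = [6, 5, 9, 6, 8, 5, 7, 7]   # Rater 2's scores for the same essays

r = correlation(rater_1, rater_2)
print(f"inter-rater correlation r = {r:.2f}")  # closer to 1.0 = more consistent
```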
c. Test administration reliability
- Test site condition (light, noise, temperature)
- Test site facilities (chair, desk, CD player)
- Photocopying variations
d. Test reliability
- The nature of the test: duration & poorly written test items
Applying the principles
- Physical context: clear photocopied test sheets, audible sound amplification, visible video input;
light, extraneous noise, and temperature conditions equal for all students; objective scoring leaves
little debate about the correctness of an answer
- Intra-rater reliability:
+ use consistent sets of criteria for a correct response
+ pay uniform attention to the criteria during the evaluation time
+ read through the test at least twice to check for your consistency
+ mid-stream modifications → go back and apply the revised standards to all tests
+ avoid fatigue by reading the tests in several sittings when the scoring takes a matter of several
hours
III. VALIDITY: the extent to which the inferences made from assessment results are appropriate,
meaningful, and useful in terms of the purpose of the assessment
1. A valid test
- Measures what it is intended to measure (direct testing)
- Involves performance samples of the test criterion (objective)
- Offers useful and meaningful information about the test taker’s abilities
- Does not contain any contaminating/irrelevant variables
- Relies as much as possible on empirical evidence (performance)
- Is supported by a theoretical rationale or argument
An invalid test
Ex1: a writing test that considers length of text and legibility of handwriting
Ex2: a reading test that considers reading out loud or knowledge of a given topic
Ex3: a grammar test scored on writing only: lexical variety and complexity, idea development, organization
→ does not measure what is intended to be measured
→ does not test directly
→ does not focus on what constitutes the ability/knowledge
2. The extent to which a test measures what it is intended to measure
- Content-related evidence (content validity)
- Criterion-related evidence (criterion-related validity)
- Construct-related evidence (construct validity)
- Consequential validity
- Face validity
a. Content validity
- Is the test fully representative of what it aims to measure?
+ what percentage of the pre-set content does the test cover?
+ what should a general English test cover? Covering only reading, grammar, and vocabulary would not
be representative
- Use direct testing
b. Criterion-related validity
- The extent to which the linguistic criteria (e.g. learning outcomes) and implied predetermined levels of
performance are reached (e.g. 50% as the minimal passing score)
- Criterion-referenced tests
- Concurrent validity: test results are supported by other concurrent performance
Ex: language proficiency C1 results supported by IELTS results
- Predictive validity: test results are used to gauge future performance
Ex: Gram1 – B1 & Gram2 – B2 results determine a student’s readiness to move to AGC1
c. Construct validity
- Construct: A specific definition of an ability (not directly measurable but observable)
+ “the specific definition of an ability that provides a basis for a given assessment / assessment tasks /
interpreting scores derived from tasks”
Ex: language proficiency, communicative competence, academic writing
- Construct validity: the extent to which we can interpret a test score as an indicator of the constructs
we want to measure
+ the extent to which the content of the tests/assessment reflects current theoretical understanding of the
skills being assessed
+ associated with large-scale standardized tests
Counter-example: a speaking test assessed only on the use of grammar and vocab
d. Consequential validity ~ impact
- All the consequences of a test
+ accuracy in measuring the intended criterion
+ effects on test taker’s preparation for the test
+ social consequences of a test’s score interpretation and use
e. Face validity: the extent to which students view the assessment as fair, relevant, and useful for
improving learning
- When students perceive the test to be valid
+ a well-constructed, expected format with familiar test tasks
+ a test clearly doable within the allotted time limit
+ crystal-clear instructions
+ clear, uncomplicated test tasks
+ tasks related to course work
+ a difficulty level that poses a reasonable challenge
3. Applying the principle
a. Content validity
- Unit/course objectives are clearly identified and represented in the form of test specifications
- Test specifications include tasks that are featured/covered in classroom tasks & represent almost/all of
the unit objectives
- Test tasks involve actual performance of the target tasks
- Test specification: the structure of a test that logically follows from the lesson/unit/course being tested
(see the sketch after this list)
+ a number of sections (corresponding to the objectives)
+ a variety of item types
+ an appropriate relative weighting for each section
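A test specification can be drafted as plain data before any items are written. The sketch below (all objectives, item types, and weights are hypothetical) shows one section per objective, an item type for each, and relative weightings that must sum to 100%.

```python
# Minimal sketch (hypothetical spec): a test specification as plain data,
# one section per unit objective, with item types and relative weightings.
test_spec = {
    "listening for gist":    {"item_type": "multiple choice", "items": 10, "weight": 0.30},
    "reading comprehension": {"item_type": "matching",        "items": 10, "weight": 0.30},
    "paragraph writing":     {"item_type": "constructed response", "items": 1, "weight": 0.40},
}

# The relative weightings of the sections should sum to 100%.
assert abs(sum(s["weight"] for s in test_spec.values()) - 1.0) < 1e-9

for objective, section in test_spec.items():
    print(f"{objective}: {section['items']} x {section['item_type']} ({section['weight']:.0%})")
```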
b. Consequential validity
- Offering students review and preparation for the test
- Suggesting test taking strategies that will be beneficial
- Structuring the test so that the best students can be modestly challenged and the weaker students
won’t be overwhelmed
- The test lending itself to giving beneficial washback
- Students encouraged to take the test as a learning experience
c. Face validity
- Clear directions
- Logically organized test structure
- Appropriate set difficulty level
- No surprise test
- Appropriate timing
IV. AUTHENTICITY
- Characteristics of the TLU task ↔ (authenticity) ↔ characteristics of the test task
+ the degree of resemblance between the test task and the target language use (TLU) task
- The degree of correspondence of the characteristics of a given language test task with the features of
the TLU task
+ the language is as natural as possible
+ items are contextualized, not isolated
+ topics are meaningful for the learner
+ some thematic organization of items is provided (story line)
+ test tasks represent real-world tasks
V. WASHBACK: the effect of testing on teaching and learning (instruction and how students prepare
for the test) – an aspect of consequential validity/impact
- Testing has both beneficial (promoting) and harmful (inhibiting) effects on teaching and learning,
individuals, and the educational system
- Creating classroom tests as learning devices achieves beneficial washback (reviewing incorrect and
correct responses)
- Commenting on test performance generously and specifically enhances washback
1. A test provides beneficial washback by
- Having a positive influence on what and how teachers teach
- Having a positive influence on what and how learners learn
- Offering a chance for learners to adequately prepare
- Providing conditions for learners’ peak performance
- Giving learners feedback to enhance their language development
- Being more formative in nature than summative
2. Apply the principle
- Tests offer beneficial washback to the learner
+ …..
3. Applying principles to practice
CHAPTER 3:
I. LEARNING OUTCOMES: broad statements about intended student learning
- After the lesson/course/program has been completed
- In terms of the desired end product
+ what students should know & be able to demonstrate & the depth of learning expected
1. Develop outcomes based on
- Levels of thinking
- Bloom’s Revised Taxonomy = a taxonomy of objectives for the cognitive domain in six levels
+ remember: recall facts and basic concepts
+ understand: explain ideas or concepts
+ apply: use information in new situations
+ analyse: draw connections among ideas
+ evaluate: justify a stand / decision
+ create: produce new/original work
II. ESSENTIAL CONCEPTS
1. Constructive alignment:
- a design for teaching in which what students should learn and how they should express their learning
are clearly stated before the teaching takes place
- an outcome-based approach in which learning outcomes are clearly defined before the teaching takes
place
desired learning outcomes ↔ teaching and learning activities ↔ assessment tasks (each aligned back to
the desired learning outcomes)
- classroom activities: what will I do and what will my students do?
- Learning outcomes: what do I want my students to learn?
+ action verbs (performance)
+ learning statement (condition)
+ broader criteria
- Assessment: what will my students do to show that they have learnt?
2. LOs
a. Number of LOs
- 10–15 broadly stated program LOs
- 5–8 broadly stated course LOs
- 2–4 lesson LOs
b. Describe outcomes, not activities
c. Use only one action verb per LO
d. Include criteria for the outcome
e. Arrange LOs in order of complexity
f. Number LOs for reference
g. Use SMART criteria (specific, measurable, achievable, relevant, time-based)
3. Assessment plan: an overall guide to how we will assess students’ achievement of learning goals and
outcomes relevant to instruction
4. Designing classroom language test
a. Determine the test purpose (its role in the course)
+ evaluate overall proficiency
+ place Ss in the course
+ what impact the test has before/after it is given
b. Determine the abilities to be assessed
+ based on the material used
+ the abilities must be assessable (observable performance in the linguistic domain)
+ LOs: what Ss should know and be able to do
+ construct: abilities to be assessed
c. Draw up test specifications: a guiding plan, the blueprint of a test; provides an official statement about
what the test tests and how it tests it, to ensure validity
+ ……….
d. Devise test items
- Elicitation format: the way in which test takers are required to interact with the test material
(elicitation mode) = way of interaction (format of input)
+ written or oral
Ex: reading 3 prompts, choosing one, and writing a paragraph
- Response format: the way in which test takers are required to respond to the test material = ways of
responding (format of response)
+ selected/constructed response
+ receptive/productive response
+ oral/written response
Ex: reading 50 MC items and choosing the right answer to fill in the gaps
WHICH FORMAT SHOULD BE USED?
e. Administer the test
f. Scoring, grading, giving feedback
CHAPTER 4
I. MULTIPLE CHOICE ITEMS
- Seemingly the simplest kind of item to construct
- Extremely difficult to design correctly
- Two principles supporting MC formats: practicality and reliability (an easy and consistent process of
scoring and grading)
- Issues:
+ difficult to write good items
+ only test recognition knowledge
+ cheating possible
+ guessing affects test score
+ limit what to test
+ minimal beneficial washback
1. MC items:
+ receptive, selected-response items (as opposed to supply items): choosing from a set of responses
rather than creating a response
(also T/F and matching)
+ consisting of a stem & 3-5 options/alternatives
+ stem: presents a stimulus, the text of the question
+ one key: the correct answer
+ distractors: the incorrect answers
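The stem/key/distractor anatomy maps naturally onto a small data structure. A minimal Python sketch (the sample item is made up):

```python
# Minimal sketch (hypothetical item): the anatomy of an MC item --
# one stem, one key, and the distractors.
from dataclasses import dataclass

@dataclass
class MCItem:
    stem: str               # the stimulus / text of the question
    key: str                # the one correct answer
    distractors: list[str]  # the incorrect options

    def options(self) -> list[str]:
        return [self.key] + self.distractors  # shuffle before presenting

item = MCItem(
    stem="She ____ to school every day.",
    key="goes",
    distractors=["go", "going", "gone"],
)
assert 3 <= len(item.options()) <= 5  # a stem plus 3-5 options
```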
a. Writing stems
- Narrowly focused on one problem, fact, concept, principle, or process
- Stands on its own (upon reading the stem, a student with content mastery should know the
correct answer without reading the options)
- Written as a positive expression (negative expressions such as EXCEPT or NOT are confusing)
+ use them only when significant learning outcomes require them
- FOUR PRINCIPLES
+ design each item to measure a single objective/outcome
+ make the stem and options as direct and simple as possible
+ make sure the intended key is the only correct option
+ use item indices to accept, discard, or revise items (optional)
- Additional tips:
+ avoid negatives in stems when possible; avoid idioms and absolutes
+ make all the options similar in grammar, style, complexity, and length
+ write the key first, then make the distractors in a parallel style
+ when the stem is an incomplete statement, make sure the options follow in a grammatically correct manner
+ avoid using “all of the above” and “both”
+ use “none of the above” with caution
+ vary the position of the key; avoid creating a pattern
+ keep the specific content of items independent of one another (test takers using info in one question to
answer another → threatens validity)
CHAPTER 5: VOCABULARY
1. Words: identified by tokens and types
- Tokens: total number of words in a text
- Types: number of different words (repeated words not counted)
- A vocabulary test should not test two derivative forms of the same word (that is grammatical knowledge)
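A minimal Python sketch of the token/type distinction (the sample sentence is made up): splitting on whitespace gives the tokens, and collapsing repeats gives the types.

```python
# Minimal sketch: counting tokens (all running words) and types (distinct words).
text = "the cat sat on the mat and the dog sat too"

tokens = text.lower().split()  # every running word counts as a token
types = set(tokens)            # repeated words collapse into one type

print(f"tokens = {len(tokens)}, types = {len(types)}")
# tokens = 11, types = 8  ("the" x3 and "sat" x2 count once each as types)
```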
2. Words: content and function words
- Content words: nouns, verbs, adjectives, adverbs
→ the focus of vocabulary tests
- Function words: prepositions, conjunctions, articles
+ show associations amongst content words
+ belong to the grammar of the language
3. Three general types of vocabulary items
a. Limited response (for beginners: pointing at something, a yes/no verbal answer)
b. Multiple-choice completion (choosing a word to complete a sentence)
c. Multiple-choice paraphrase (choosing the word closest in meaning to a given word in a sentence)