Kirkpatrick 4-Level Deep: How to Apply Training Evaluation in Indonesia (New World Model, Backward Design, Required Drivers)
Operational Kirkpatrick 4-level guide: from Donald Kirkpatrick's 1959 model to the New World Kirkpatrick Model (Jim & Wendy Kirkpatrick, 2016), the backward-design principle (start at L4), workplace required drivers, instruments per level, a 30/60/90-day measurement schedule, common mistakes, and adaptation for Indonesia.
Neksus Research Team
Corporate training curation research, Neksus
Short answer: The Kirkpatrick Model measures training at four levels: Reaction (L1), Learning (L2), Behavior (L3), and Business Results (L4). Donald L. Kirkpatrick introduced it through four articles in ASTD's Training Director's Journal in 1959. Jim Kirkpatrick and Wendy Kayser Kirkpatrick expanded it into the New World Kirkpatrick Model (2016), which adds the backward-design principle (design from L4 to L1) and required drivers (workplace reinforcers). To make it work in Indonesia: tie L2 to SKKNI, treat the direct manager as the primary required driver (hierarchical culture functions as an asset), measure L3 after 3–6 months, triangulate instruments, and connect to Phillips ROI Level 5 for budget justification.
Most Kirkpatrick articles stop at the four definitions and one smile sheet. That is enough to learn the terms, yet too thin to run the method. This guide closes that gap: the 1959 model to the 2016 New World Model (what changed and why it matters), the backward-design principle with worked examples, required drivers with a concrete list, instruments per level with a 30/60/90-day schedule, common mistakes, academic critiques and how to answer them, and Indonesian-context adaptation.
Intended readers: HR / HC / L&D / SDM and unit leaders who design, buy, or approve training evaluation in private companies, BUMN/BUMD, government agencies, institutions, associations, and non-profits.
Quick navigation
- Short history: from 1959 to the 2016 New World Model
- Four levels (Reaction, Learning, Behavior, Results)
- Backward-design principle (start from Level 4)
- Required drivers: workplace reinforcers
- Instruments per level + measurement schedule
- Level 3 deep: measuring behavior transfer
- Level 4 deep: leading vs lagging indicators
- Academic critiques & how to answer them
- Indonesian-context adaptation (SKKNI, hierarchy, RKAP)
- Relation to TNA, ADDIE, Phillips ROI, 70-20-10
- Worked program example L1–L4 (illustrative)
- Ten common mistakes & how to avoid them
- Kirkpatrick implementation checklist
- FAQ
- Next step
Short history: from 1959 to the 2016 New World Model
Donald L. Kirkpatrick wrote his 1954 PhD dissertation at the University of Wisconsin–Madison on the evaluation of supervisor training. Five years later (1959), he published four sequential articles in the Training Director's Journal of the American Society for Training and Development (ASTD) laying out the four levels, one article per level. The term "Kirkpatrick Model" came later; in his early publications Donald called them "four steps".
1994: Donald published Evaluating Training Programs: The Four Levels, which became the field standard for two decades.
2016: His son Jim Kirkpatrick and daughter-in-law Wendy Kayser Kirkpatrick published Kirkpatrick's Four Levels of Training Evaluation, introducing the New World Kirkpatrick Model. The four levels stay; the additions:
- Backward-design principle: design from Level 4 (target business result) to Level 1, then measure forward from Level 1 to Level 4.
- Required drivers: processes, systems, and management support that keep new behavior alive after class.
- Level 3 emphasis as central, because without behavior change L4 does not occur.
- Leading vs lagging indicators at Level 4: short-term aggregate behavior proxies (leading) and long-term business outcomes (lagging).
Rule of thumb: Original Kirkpatrick is a map of the territory; the New World Model is the route across it. The first names what to measure; the second explains how to make it actually work.
Four levels (Reaction, Learning, Behavior, Results)
| Level | What is measured | Core question |
|---|---|---|
| L1 Reaction | Participant perception of relevance, engagement, intent to apply | "Did participants find this useful and intend to apply it?" |
| L2 Learning | Knowledge, skill, attitude, confidence, commitment gain | "What do participants now know/do that they didn't?" |
| L3 Behavior | Application on the job | "Are participants actually doing what they learned in class?" |
| L4 Results | Targeted business indicators | "Did the business results that motivated this training move?" |
Each level builds on the previous but is not always causal: participants can be satisfied (L1) without learning (L2), or learn without applying (L3). This naive causal assumption is what academics critique (Holton 1996, Bates 2004) and what the New World Model answers via required drivers bridging L2 to L3.
New World additions per level
- L1: from "satisfaction" to engagement + relevance + commitment. New questions: "How confident are you that you will apply this?" and "What will get in the way?"
- L2: from "knowledge" to knowledge + skill + attitude + confidence + commitment. Attitude and commitment matter as much as knowledge for transfer to L3.
- L3: explicitly adds required drivers as a core part of L3 architecture.
- L4: separates leading indicators (fast signals: behavior frequency, stakeholder satisfaction) from lagging indicators (final outcomes: revenue, turnover, NPS).
Backward-design principle (start from Level 4)
The strongest New World Kirkpatrick concept. Design the program from the back: define L4 first, then L3, L2, L1.
Four backward-design steps
- L4: Target business result. Specific indicators that must move. Example: first-line-manager turnover down 20% in 12 months; average support-ticket close time down from 8h to 5h; client NPS up from 32 to 50.
- L3: Required behaviors. A list of 3–7 specific, observable behaviors that, done consistently, produce L4. Example for manager turnover: "Manager runs weekly one-on-ones of ≥ 30 minutes with each team member, with a development focus that goes well beyond status updates." Concrete observable behaviors take precedence over abstract competencies.
- L2: Required knowledge/skills/attitudes. What must participants know, be able to do, and believe to make L3 possible? Example: GROW coaching technique, active listening, constructive feedback delivery, the attitude that team development is the manager's primary job (not a distraction).
- L1: Learning experience. What kind of experience grows L2 and builds the commitment to apply? Modules, methods (simulation, role-play, case study), duration, facilitator, schedule.
Consequences of backward design
- Every module ties to a business indicator. No "interesting but unclear contribution" modules.
- Vendors design programs end-to-end. First question to the vendor: "Which business indicators will move, and which behaviors will drive them?" The answer should describe a system, not just a class catalog.
- Budget is justifiable. Conversation with the CFO: "To move KPI X by Y, we need behavior Z in the field; this program produces Z; cost = …; expected benefit = …"
- Measurement is built in. L1–L4 are designed into the program from day one.
Rule of thumb: Without backward design, classes are built from L1 ("what will be engaging?") and L4 impact is incidental. With backward design, classes are built from L4 and L1 is a consequence.
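The traceability rule behind backward design ("every module ties to a business indicator") can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not part of the Kirkpatrick materials: the `Module` class, the `untraceable_modules` helper, and the sample module names are hypothetical, and the two mappings would come from your own L4-to-L1 design document.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    """One L1 learning module and the L2 skills it is meant to build."""
    name: str
    builds_skills: list = field(default_factory=list)


def untraceable_modules(modules, skill_to_behavior, behavior_to_indicator):
    """Return names of modules with no path: module -> L2 skill -> L3 behavior -> L4 indicator."""
    orphans = []
    for module in modules:
        linked = any(
            skill_to_behavior.get(skill) in behavior_to_indicator
            for skill in module.builds_skills
        )
        if not linked:
            orphans.append(module.name)
    return orphans


# Hypothetical design, written from L4 back to L1.
behavior_to_indicator = {"weekly one-on-ones": "first-line-manager turnover"}
skill_to_behavior = {
    "GROW coaching": "weekly one-on-ones",
    "active listening": "weekly one-on-ones",
}
modules = [
    Module("Coaching with GROW", ["GROW coaching"]),
    Module("Listening lab", ["active listening"]),
    Module("Company history quiz", []),  # interesting, but tied to no target behavior
]

orphans = untraceable_modules(modules, skill_to_behavior, behavior_to_indicator)
print(orphans)
```

Any module the check returns is a candidate either for removal or for an explicit link to a target behavior.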
Required drivers: workplace reinforcers
The New World Kirkpatrick names what many L&D teams sense but rarely systematize: workplace environment determines whether learning becomes behavior. Without required drivers, participants return to work with good intentions and old habits win in weeks.
Four categories of required drivers
| Category | Concrete examples for a first-line-leadership program |
|---|---|
| Reinforce | GROW coaching job aid on laptops; weekly nudge email from L&D; micro-learning video library |
| Encourage | Direct manager weekly check-ins on application; monthly peer-coaching circle; monthly success story in the newsletter |
| Reward | Manager KPIs include 360 team scores; annual bonus weighs team engagement; public recognition for high-retention managers |
| Monitor | One-on-one frequency dashboard; monthly pulse survey to team members; quarterly L&D review |
Practical rules for required drivers
- Shared ownership. Vendor designs; HR/L&D facilitates; participant's direct manager + unit sponsor own the required drivers.
- Built into the program from intake. A "manager briefing" module before the participant class matters as much as the participant class itself.
- Consequence without required drivers: a program "succeeds" at L2 (participants learn) but fails at L3 (they do not apply). Classic.
Rule of thumb: Kirkpatrick Partners research is consistent: workplace environment contributes more to behavior transfer than class quality itself. A budget that is 70% class and 0% required drivers is half a budget.
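A required-driver plan can be checked for gaps before launch. The sketch below assumes a plan stored as a simple list of dictionaries; the field names (`category`, `name`, `owner`) and the `driver_gaps` helper are illustrative choices, not a Kirkpatrick Partners format.

```python
REQUIRED_CATEGORIES = ("reinforce", "encourage", "reward", "monitor")


def driver_gaps(drivers):
    """Return (categories with no driver, names of drivers with no owner)."""
    covered = {d["category"] for d in drivers}
    missing = [c for c in REQUIRED_CATEGORIES if c not in covered]
    unowned = [d["name"] for d in drivers if not d.get("owner")]
    return missing, unowned


# Hypothetical plan for a first-line-leadership program.
plan = [
    {"category": "reinforce", "name": "GROW coaching job aid", "owner": "L&D"},
    {"category": "encourage", "name": "weekly manager check-in", "owner": "direct manager"},
    {"category": "monitor", "name": "one-on-one frequency dashboard", "owner": None},
]

missing, unowned = driver_gaps(plan)
print("Categories without a driver:", missing)
print("Drivers without an owner:", unowned)
```

In this sample plan the reward category is empty and the dashboard has no owner, which are the two gaps the practical rules above warn about.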
Instruments per level + measurement schedule
| Level | Common instruments | When measured |
|---|---|---|
| L1 Reaction | Short smile sheet (relevance, clarity, intent-to-apply, barriers), session NPS, engagement observation | Immediately after session (last 10–15 minutes or within 24 hours) |
| L2 Learning | Pre/post knowledge test, skill demonstration, rubric-scored role-play, structured simulation, competency assessment (vs SKKNI where relevant) | Pre: before session. Post: immediately after or within 1–2 weeks |
| L3 Behavior | Manager/peer observation with checklist, work-product sampling, application interviews/FGDs, 180/360 surveys, system data (CRM, process KPIs) | 3–6 months after training (30/60/90 staged observation) |
| L4 Results | Target business KPIs (revenue, turnover, NPS, time-to-X, error rate), leading & lagging indicators | 6–12 months after training |
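Session NPS in the L1 row is usually computed with the standard Net Promoter formula: the share of promoters (scores 9–10 on a 0–10 scale) minus the share of detractors (scores 0–6), expressed in points. A minimal sketch assuming that convention; the function name and sample scores are illustrative.

```python
def session_nps(scores):
    """Net Promoter Score in points: % promoters (9-10) minus % detractors (0-6)."""
    if not scores:
        raise ValueError("at least one score is required")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)


# Hypothetical answers to "How likely are you to recommend this session?" (0-10).
scores = [10, 9, 9, 8, 8, 7, 7, 6, 5, 10]
print(session_nps(scores))  # 4 promoters, 2 detractors, 10 responses -> 20.0
```

A session NPS is still an L1 signal: it says how participants felt, not whether they learned or applied anything.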
Instrument rules
- L1: do not go generic. Replace "Are you satisfied?" with "What is one specific thing you will apply next week? What will get in the way?"
- L2 requires a baseline. A pre-test before the session turns the post-test from "a high but meaningless score" into "a measured gain".
- L3: measure late enough. Measuring two weeks post-class yields unreliable data because behavior has not settled (Kirkpatrick Partners). Use the 3–6-month window.
- L4 requires triangulation. A single aggregate KPI is influenced by many factors; combine leading (behavior proxy) and lagging (business outcome) plus an isolation method (control group or trend line from Phillips ROI).
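The baseline rule for L2 reduces to simple arithmetic: gain = post-test score minus pre-test score, computed only for participants who took both tests. A minimal sketch with hypothetical participant IDs and scores; `learning_gain` is an illustrative helper, not a standard instrument.

```python
from statistics import mean


def learning_gain(pre, post):
    """Summarize pre/post scores for participants who completed both tests."""
    paired = sorted(set(pre) & set(post))
    if not paired:
        raise ValueError("no participant has both a pre and a post score")
    return {
        "n": len(paired),
        "mean_pre": mean([pre[p] for p in paired]),
        "mean_post": mean([post[p] for p in paired]),
        "mean_gain": mean([post[p] - pre[p] for p in paired]),
    }


# Hypothetical scores out of 100. P04 skipped the pre-test, so is excluded.
pre = {"P01": 50, "P02": 60, "P03": 40}
post = {"P01": 80, "P02": 75, "P03": 70, "P04": 90}

result = learning_gain(pre, post)
print(result)
```

Without the pre-test, the only number available would be the average post-test score, which cannot show how much of it the class produced.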
Level 3 deep: measuring behavior transfer
L3 is the hardest level and the most-skipped, yet the most decisive. Three components measure it correctly:
(a) Define observable behaviors
Concrete observable behaviors win over abstract competencies. Weak: "Manager is capable of coaching." Strong: "Manager runs weekly one-on-ones of ≥ 30 minutes with each team member, using the GROW framework, and logs follow-ups in HRIS."
(b) Triangulate instruments
| Instrument | What it captures | Bias or caveat |
|---|---|---|
| Participant self-survey | Participant's perception of own application | Positive bias (overestimate) |
| Manager observation | Behavior visible to the manager | Selection bias (only when manager is around) |
| 360 from subordinates/peers | Experience of the people who receive the behavior | More accurate for interpersonal behavior |
| Work-product sampling | Tangible evidence (proposals, recorded conversations, tickets) | Objective but availability-limited |
| System data (CRM/HRIS) | Frequency/quality of activity | Objective, real-time, expensive to collect ad-hoc |
Triangulate at least 2 instruments: one qualitative (interview/observation) and one quantitative (system data or 360).
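The triangulation rule can be expressed as a small check on an L3 measurement plan. The sketch below classifies only the four instruments the rule itself names as qualitative or quantitative; the instrument labels and the `is_triangulated` helper are illustrative, and any other instrument is treated as unclassified (an assumption).

```python
# Classification taken from the rule above; anything else counts as unclassified.
INSTRUMENT_KIND = {
    "application interview": "qualitative",
    "manager observation": "qualitative",
    "system data": "quantitative",
    "360 survey": "quantitative",
}


def is_triangulated(instruments):
    """True if the plan has >= 2 distinct instruments, one qualitative and one quantitative."""
    distinct = set(instruments)
    kinds = {INSTRUMENT_KIND.get(name) for name in distinct}
    return len(distinct) >= 2 and {"qualitative", "quantitative"} <= kinds


print(is_triangulated(["360 survey", "manager observation"]))  # True
print(is_triangulated(["360 survey", "system data"]))          # False: no qualitative source
print(is_triangulated(["participant self-survey"]))            # False: single instrument
```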
(c) 30/60/90-day cadence
- 30 days: check intent & early barriers (mid-course correction).
- 60 days: observe early behavior; are required drivers working?
- 90 days: primary L3 measurement; behavior has settled enough.
- 6 months: confirmation measurement before L4.
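The cadence above can be turned into calendar dates at program sign-off, so that L3 checkpoints are booked before the class runs. A minimal sketch using Python's standard library; it approximates "6 months" as 180 days (an assumption), and the program end date is hypothetical.

```python
from datetime import date, timedelta

# Day offsets follow the cadence above; "6 months" is approximated as 180 days.
L3_CHECKPOINTS = {
    "30-day check: intent and early barriers": 30,
    "60-day check: early behavior and required drivers": 60,
    "90-day check: primary L3 measurement": 90,
    "6-month check: confirmation before L4": 180,
}


def l3_schedule(training_end):
    """Return the calendar date of each L3 checkpoint after the last training day."""
    return {
        label: training_end + timedelta(days=days)
        for label, days in L3_CHECKPOINTS.items()
    }


# Hypothetical program whose last class day is 16 January 2026.
schedule = l3_schedule(date(2026, 1, 16))
for label, due in schedule.items():
    print(f"{due.isoformat()}  {label}")
```

Dates that land on weekends or public holidays would still need manual adjustment; the sketch does not handle working calendars.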
Required drivers determine L3
Strong L3 comes from strong required drivers. When L3 is weak, the first question is not "was the class good?" but "did the direct manager support application? Does the system capture new behavior? Do KPIs reward target behavior?"
Level 4 deep: leading vs lagging indicators
| Type | Examples | Characteristics |
|---|---|---|
| Leading (early signals) | One-on-one frequency, diagnostic conversations per week, quarterly stakeholder satisfaction | Move fast, aggregate behavior proxies |
| Lagging (final outcomes) | Annual turnover, annual client NPS, revenue per FTE, market share | Move slowly, the target business outcomes |
L4 rules
- Pick existing KPIs. Do not invent KPIs only for training: collection cost is high and continuity low. Use indicators already monitored in RKAP / business dashboards.
- Isolate training effect. Aggregate KPIs are influenced by many factors (market, product, incentives). Use Phillips ROI methods (control group if possible, trend-line analysis, or participant estimation) to separate training's contribution.
- Leading first, lagging follows. Leading indicators (3–6 months) signal whether lagging indicators (6–12 months) will move, which leaves time for correction before final results.
See the Phillips ROI Level 5 guide for isolation methods and L4 monetization.
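Trend-line analysis, one of the isolation methods named above, can be sketched as follows: fit a straight line to the KPI before training, project it forward, and compare the projection with what actually happened. This is a simplified illustration, not the full Phillips procedure; it assumes the pre-training trend would have continued unchanged and that nothing else influenced the KPI. Function names and monthly figures are hypothetical.

```python
def fit_trend(values):
    """Ordinary least-squares line through (0, v0), (1, v1), ...; returns (slope, intercept)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two pre-training data points")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def trend_line_effect(pre, post):
    """Actual minus projected KPI for each post-training period."""
    slope, intercept = fit_trend(pre)
    projected = [intercept + slope * (len(pre) + i) for i in range(len(post))]
    return [actual - expected for actual, expected in zip(post, projected)]


# Hypothetical average ticket close time in hours, one value per month.
pre = [8.0, 7.9, 7.8, 7.7]   # already improving by about 0.1 h per month
post = [7.0, 6.6, 6.2]       # observed after the program

effects = trend_line_effect(pre, post)
print([round(e, 2) for e in effects])  # [-0.6, -0.9, -1.2]
```

Negative values here mean close time fell faster than the prior trend predicted. Only that difference, not the whole drop from 8 hours, is a candidate for attribution to training.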
Academic critiques & how to answer them
| Critique | Source | Practical answer |
|---|---|---|
| Model is not a theory; it does not explain causality | Holton (1996) "From Training to Performance Improvement" | Use it as a framework; pair with TNA (cause of need) + required drivers (cause of transfer) |
| Inter-level causal assumption not always valid | Bates (2004) | Triangulate instruments; do not assume high L1 → high L4 |
| Environmental factors ignored | Tannenbaum, Cannon-Bowers, Mathieu | New World required drivers answer this explicitly |
| Does not explain cost | Several academics | Phillips ROI Level 5 adds the cost dimension |
| Bottom-up bias (starting from L1) | Wendy & Jim Kirkpatrick | Backward design (start from L4) answers this |
Healthy synthesis: Kirkpatrick is a measurement map. Pair with TNA for cause, required drivers for transfer, Phillips ROI for cost justification, and 70-20-10 for learning realism. Together they form a whole engine.
Indonesian-context adaptation (SKKNI, hierarchy, RKAP)
(1) Tie L2 to SKKNI
For technical and compliance roles, measure L2 against relevant SKKNI competency units alongside internal scores. If competency certification is needed, connect with a BNSP-licensed LSP assessment path. This makes L2 auditable and links training to formal career paths. See the TNA guide for mapping competencies to SKKNI.
(2) Leverage hierarchical culture as an asset
In many Indonesian organizations, the direct manager has strong influence on subordinate behavior. Treat this as an asset for Kirkpatrick:
- Manager briefing before the participant class: managers are told what participants will learn and asked to serve as required drivers.
- Manager assigns application tasks after class (job aid + check-in schedule).
- 360 scores from subordinates feed manager performance reviews so the "give coaching" required driver is structurally rewarded.
(3) Tie L4 to RKAP / performance contracts
For BUMN and government agencies, business indicators often tie to RKAP (BUMN) or Echelon I/II performance contracts (government). L4 KPIs used in training should be a subset of KPIs already monitored, to avoid adding reporting burden via newly invented indicators.
(4) Instrument language
- L1 forms in Indonesian with specific, operational questions, not "satisfied/very satisfied".
- L2/L3 rubrics in the everyday vocabulary used in the organization (e.g. "customer visit" instead of "client engagement").
- Cultural calibration: avoid questions that make participants feel they are "rating their boss" if the cultural context makes that uncomfortable; use peers as a proxy.
Relation to TNA, ADDIE, Phillips ROI, 70-20-10
| Framework | Role |
|---|---|
| TNA (McGehee & Thayer; Allison Rossett) | Sets the need and baseline; without TNA, Kirkpatrick has no benchmark |
| ADDIE (Analysis, Design, Development, Implementation, Evaluation) | Kirkpatrick = the heart of the Evaluation phase; backward design = Analysis & Design |
| Phillips ROI Level 5 | Monetizes L4 into BCR & ROI%; adds the cost dimension |
| 70-20-10 (Lombardo & Eichinger) | Learning reality: 70% experience, 20% interaction, 10% formal β required drivers at L3 = 70 and 20 |
Synthesis: TNA → ADDIE (Design backward from L4) → Implementation (class + required drivers) → Evaluation (L1–L4 per schedule) → ROI Level 5 for budget justification. Kirkpatrick is evaluation; it only stands on a healthy foundation.
Worked program example L1–L4 (illustrative)
Illustrative scenario.
Program: First-line manager leadership training, 24 participants, in-house Jakarta, 5 days + 6 months reinforcement.
Target business result (L4): First-line-manager turnover down from 18% to 12% in 12 months; team engagement score up from 65 to 75.
Target behaviors (L3): Managers run weekly one-on-ones of ≥ 30 minutes with each team member; use the GROW framework; log in HRIS; deliver ≥ 1 specific constructive feedback per week.
Target learning (L2): Understand GROW; able to listen actively (rubric score ≥ 4/5); able to deliver structured constructive feedback (rubric score ≥ 4/5); attitude that team development = primary job.
Learning experience (L1): 5 days × 7 hours = 35 hours in-person with 60% simulation/role-play; 6 months of monthly micro-learning + monthly peer-coaching + weekly manager check-ins.
Required drivers: Manager KPIs include 360 team scores; one-on-one frequency dashboard in HRIS; quarterly recognition for high-retention managers; micro-learning video library; weekly nudge email.
Measurement:
| Level | Instrument | When |
|---|---|---|
| L1 | Smile sheet (relevance, intent, barriers), session NPS | End of each day + end of program |
| L2 | Pre/post knowledge test; coaching role-play with rubric | Day 1 (pre) + Day 5 (post) |
| L3 | 360 survey to 5 team members; HRIS one-on-one frequency data; sample observation of 5 managers | 90 days + 6 months |
| L4 | Annual turnover; team engagement from annual corporate survey | 12 months |
Final report: links L1 → L2 → L3 → L4, with notes on required drivers that worked/failed, and recommendations for the next cycle.
This program spends ~30% of budget on required drivers + measurement (outside facilitator fees and logistics), a healthy ratio for a behavior-change program at scale.
Ten common mistakes & how to avoid them
| # | Mistake | How to avoid |
|---|---|---|
| 1 | Only measuring L1, calling it "evaluation" | Agree L1–L4 from vendor proposal |
| 2 | No pre-training baseline | TNA mandatory + pre-class assessment |
| 3 | Measuring L3 too early (2 weeks) | Schedule 3–6 months; staged 30/60/90 observation |
| 4 | Generic L1 smile sheet | Ask relevance + intent-to-apply + barriers; treat satisfaction as one input among many |
| 5 | No required drivers | Build into the program: manager briefing, systems, KPIs, recognition |
| 6 | Single instrument without triangulation | Combine survey + observation + system data |
| 7 | L4 = aggregate KPIs only without isolation | Use Phillips ROI methods (control group / trend line / participant estimation) |
| 8 | No backward design | Start from L4, step back to L1 |
| 9 | Vendor not involved in L3–L4 | Agree in contract: vendor helps instrument, baseline, and report |
| 10 | No report linking L1–L4 to business | Require a final report format with causal narrative |
Kirkpatrick implementation checklist
Before execution, every box ticked:
- L4 defined: specific, measurable business indicators; baseline recorded.
- L3 defined: 3–7 observable behaviors with a scoring rubric.
- L2 defined: knowledge/skill/attitude; tied to SKKNI where relevant; pre-test ready.
- L1 designed from L2–L4 (backward design); "what is engaging" plays a supporting role.
- Required drivers identified: reinforce, encourage, reward, monitor.
- Required-driver owners assigned (direct manager, HR, system).
- Measurement schedule written: L1 immediately, L2 pre/post, L3 30/60/90+6mo, L4 12mo.
- Instruments ready: smile sheet, assessment, observation rubric, 360 survey, KPI dashboard.
- L3 triangulation agreed (at least 2 instruments).
- L4 isolation method selected (control group / trend line / participant estimation).
- Vendor engaged: instrument, baseline, final report.
- Final-report format agreed: L1–L4 narrative → business → follow-up.
- Measurement budget (~10–20% of total) explicitly separated as a line item.
FAQ
What is the Kirkpatrick 4-Level Model?
The Kirkpatrick Model is one of the world's most widely used training-evaluation frameworks: Level 1 Reaction (participant satisfaction & relevance), Level 2 Learning (knowledge/skill/attitude gain), Level 3 Behavior (application on the job), Level 4 Results (targeted business indicators). Donald L. Kirkpatrick introduced it through four sequential articles in the Training Director's Journal of ASTD (1959). His son Jim Kirkpatrick and Jim's wife Wendy Kayser Kirkpatrick expanded it in 'Kirkpatrick's Four Levels of Training Evaluation' (2016) into the 'New World Kirkpatrick Model', adding the concept of required drivers (workplace reinforcers) and the backward-design principle (design programs from L4 to L1).
How does the original Kirkpatrick model differ from the New World Kirkpatrick Model?
The four levels remain. What is new in the New World Model (2016): (1) The backward-design principle: design from Level 4 (targeted business result) working back to Level 1, while still measuring forward from Level 1 to Level 4. (2) The required-drivers concept at Level 3: workplace processes, systems, and management support that keep new behavior alive after class. (3) Explicit emphasis on Level 3 as central, because without behavior change Level 4 does not occur. (4) Explicit distinction between leading indicators (short-term) and lagging indicators (long-term) at Level 4. The New World Model makes Kirkpatrick relevant for organizational behavior transformation, beyond a post-class audit.
Why do most trainings stop at Level 1?
Four causes. (1) Cheap and fast: smile sheets are handed out at session end and scored immediately. (2) Comfortable to report: a 4.5/5 score easily justifies budget. (3) L2–L4 require pre-training baselines and cross-functional coordination that are often not prepared. (4) No sponsor pull: if the CFO or board does not demand behavior/results evidence, L&D is not forced to climb. The consequence: training budget gets cut first under efficiency drives because its impact cannot be calculated. The fix: require vendors to design L1–L4 evaluation from the proposal stage with a baseline from the TNA.
How does the backward-design principle (starting from Level 4) work?
A New World Kirkpatrick concept: design backward from business results. (1) Start at L4: ask 'which business indicators must move?' (e.g. first-line-manager turnover down 20%, NPS up 5 points, time-to-quote down 3 days). (2) Step to L3: 'what specific behaviors must participants do for L4 to happen?' (e.g. managers run weekly check-ins, salespeople run diagnostic conversations on every proposal). (3) Step to L2: 'what knowledge/skills/attitudes are needed for L3 behavior to be possible?' (e.g. GROW coaching, SPIN selling). (4) Step to L1: 'what learning experience grows L2 and builds commitment?'. The result: every module has a straight line to a business indicator. Without backward design, classes are built from L1 (what will be engaging) and L4 impact is incidental.
What are required drivers and why are they critical at Level 3?
Required drivers are the workplace processes, systems, management support, and reward structures that make new behavior happen and persist on the job. Examples: direct manager runs weekly check-ins on application; CRM is updated to capture diagnostic conversations; KPI incentives adjust to reward target behavior; monthly peer-coaching; job aids at the desk. Without required drivers, participants return to work with good intentions and old habits win in weeks, so training becomes an event without impact. Kirkpatrick Partners research is consistent: workplace environment determines whether learning becomes behavior, more strongly than class quality itself.
When is each level measured (measurement schedule)?
Level 1: immediately after the session (the last 10–15 minutes of class or within 24 hours). Level 2: a pre-test before the session and a post-test immediately after it for knowledge, or within 1–2 weeks for demonstrated skill. Level 3: 3 to 6 months after training; earlier measurement (e.g. 2 weeks) yields unreliable data because behavior has not yet settled. Common pattern: 30/60/90-day observation or a 3–6-month behavior survey with self + manager + subordinate references (180/360). Level 4: 6 to 12 months; enough time for business indicators to move and for required drivers to consolidate behavior. Too fast = false evidence; too slow = no follow-through.
What instruments are used per level?
Level 1: short smile sheet (relevance, clarity, intent-to-apply beyond plain 'satisfaction'), session NPS. Level 2: pre/post knowledge test, skill demonstration, rubric-scored role-play, structured simulation, competency assessment against SKKNI where relevant. Level 3: behavior observation by manager/peer with a checklist, work-product sampling (e.g. written proposals, recorded conversations), application interviews/FGDs, 180/360 surveys, system data (CRM, ticket, process KPIs). Level 4: targeted business KPIs (NPS, turnover, time-to-X, error rate, revenue per FTE), leading indicators (aggregate behavior proxies), and lagging indicators (final business outcomes). Instrument triangulation reduces self-report bias.
What is the academic critique of the Kirkpatrick Model and how do you answer it?
Main critiques (Holton 1996; Bates 2004): (a) the model is not a theory; it is a taxonomy framework and does not explain causality; (b) the inter-level causal assumption (L1 → L2 → L3 → L4) is not always valid: participants can be satisfied (high L1) without learning (low L2), or learn without applying (low L3); (c) environmental factors (which the New World Model answers via required drivers) are often ignored; (d) it does not explain cost, which Jack Phillips answered by adding Level 5 ROI. The fix: use Kirkpatrick as a measurement framework paired with TNA (cause of need), required drivers (cause of behavior transfer), Phillips ROI Level 5 (cost justification), and data triangulation. Kirkpatrick is a map, not an engine.
How do you adapt Kirkpatrick to the Indonesian corporate-training context?
Four adjustments. (1) Map Level 2 to SKKNI: role competencies tie to relevant SKKNI units; pre/post assessment is measured against the external standard alongside internal scores. If certification is needed, connect with a BNSP-licensed LSP. (2) Local-context required drivers: the direct manager carries strong influence in Indonesian organizational culture, so a manager briefing before class + post-class check-ins yield large leverage. (3) Level 4 for BUMN/agencies: business indicators often tie to RKAP or performance contracts; ensure KPIs come from existing monitored systems to avoid invented KPIs that exist only for training. (4) Instrument language: L1 in Indonesian with specific actionable questions that go beyond 'satisfied/very satisfied'; L2/L3 rubrics in everyday operational terms. See the TNA guide for tying measurement baselines to SKKNI standards.
How does Kirkpatrick relate to Phillips ROI Level 5?
Phillips ROI Methodology (Jack J. Phillips, 1973) adds Level 5 above Kirkpatrick L1–L4: monetize L4 benefits and compare against program cost. Formula: BCR = Total Benefits ÷ Total Costs; ROI% = ((Benefits − Costs) ÷ Costs) × 100. L5 answers the critique 'Kirkpatrick does not explain cost'. But L5 is only valid if L4 is valid; L4 only if L3 is; L3 only if required drivers exist; L1–L2 only if learning design is solid. ROI is not a replacement for Kirkpatrick; it deepens it. See the Phillips ROI Level 5 guide for isolation methods (control group / trend line / participant estimation), monetization, and fully-loaded costs.
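The two formulas translate directly into code. A minimal sketch with made-up rupiah figures; the function name is illustrative, and in practice the benefit figure should already be isolated from other influences and the cost figure fully loaded.

```python
def bcr_and_roi(total_benefits, total_costs):
    """Return (benefit-cost ratio, ROI in percent) using the formulas above."""
    if total_costs <= 0:
        raise ValueError("total_costs must be positive")
    bcr = total_benefits / total_costs
    roi_pct = (total_benefits - total_costs) / total_costs * 100
    return bcr, roi_pct


# Hypothetical program: IDR 750 million monetized benefit, IDR 500 million cost.
bcr, roi_pct = bcr_and_roi(750_000_000, 500_000_000)
print(f"BCR = {bcr:.2f}, ROI = {roi_pct:.0f}%")  # BCR = 1.50, ROI = 50%
```

A BCR of 1.5 and an ROI of 50% describe the same result: every rupiah spent returned 1.5 rupiah of benefit, or 0.5 rupiah net.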
What are the most common mistakes when applying Kirkpatrick?
Ten most-frequent mistakes: (1) Measuring only L1 and calling it 'evaluation'; (2) No pre-training baseline for L2/L3/L4; (3) Measuring L3 too early (2 weeks) so the data is unreliable; (4) Generic L1 smile sheet ('satisfied/very satisfied') without relevance or intent-to-apply questions; (5) No required drivers, so participants return to a workplace that blocks application; (6) Relying on a single instrument (e.g. only a self-survey) without triangulation; (7) L4 = only aggregate KPIs influenced by many factors, with no isolation method; (8) No backward design, so classes are built from L1 and L4 is incidental; (9) Vendor not involved in L3–L4 measurement so the data depends on internal teams who lack time; (10) No final report linking L1–L4 to business outcomes and follow-up. Each mistake lowers L&D credibility before finance and the board.
Next step
You now have a complete Kirkpatrick framework: the four levels, the backward-design principle, required drivers, instruments per level, a 30/60/90-day schedule, Indonesia adaptation, and academic critiques with answers. The sensible next step is to run a TNA that sets the L2–L4 baseline before designing any program.
Neksus designs every program with backward-design Kirkpatrick: starting from the target business indicator (L4), stepping down to behaviors (L3) + required drivers, to learning (L2) tied to SKKNI, then to the learning experience (L1). L1–L4 measurement is included in the proposal with instruments, schedule, and a final-report format that links to the business objective. Discuss your team's need and request an initial TNA via the Neksus contact page, with no obligation.
Also see the companion guides:
- Training Needs Analysis (TNA) - cause of need & baseline
- Phillips ROI Level 5 - monetization & budget justification
- How to Choose a Corporate Training Vendor - criteria that require Kirkpatrick L1–L4
- Corporate Training RFP: Template & Criteria - Kirkpatrick technical questions to include
- Vendor Scoring Rubric - "methodology & measurement" weight
- Building a Training Budget (RAB) - budget required drivers & measurement
- Leadership for First-Line Managers - example program with L1–L4
Last updated: 18 May 2026. The frameworks cited (Donald L. Kirkpatrick, 1959, four-article series in ASTD's Training Director's Journal; Donald Kirkpatrick, 1994, Evaluating Training Programs: The Four Levels; Jim D. Kirkpatrick & Wendy Kayser Kirkpatrick, 2016, Kirkpatrick's Four Levels of Training Evaluation (New World Kirkpatrick Model); Holton 1996; Bates 2004; Phillips 1973 ROI Methodology; ADDIE; SKKNI; 70-20-10 Lombardo & Eichinger) are attributed to their original sources. Worked program is illustrative; numbers exist to show method, with no relation to client data. Neksus does not publish client names or success statistics.