Kirkpatrick 4-Level Deep: How to Apply Training Evaluation in Indonesia (New World Model, Backward Design, Required Drivers)

Operational Kirkpatrick 4-level guide: from Donald Kirkpatrick's 1959 model to the New World Kirkpatrick Model (Jim & Wendy Kirkpatrick, 2016), the backward-design principle (start at L4), workplace required drivers, instruments per level, a 30/60/90-day measurement schedule, common mistakes, and adaptation for Indonesia.

Neksus Research Team

Corporate training curation research, Neksus

May 17, 2026

Short answer: The Kirkpatrick Model measures training at four levels: Reaction (L1), Learning (L2), Behavior (L3), and Business Results (L4). Donald L. Kirkpatrick introduced it in four articles in ASTD's Training Director's Journal in 1959; Jim Kirkpatrick and Wendy Kayser Kirkpatrick expanded it into the New World Kirkpatrick Model (2016), which adds the backward-design principle (design from L4 to L1) and required drivers (workplace reinforcers). To make it work in Indonesia: tie L2 to SKKNI, treat the direct manager as the primary required driver (hierarchical culture functions as an asset), measure L3 after 3–6 months, triangulate instruments, and connect to Phillips ROI Level 5 for budget justification.

Most Kirkpatrick articles stop at the four definitions and one smile sheet. That is enough to learn the terms, yet too thin to run the method. This guide closes that gap: the 1959 model to the 2016 New World Model (what changed and why it matters), the backward-design principle with worked examples, required drivers with a concrete list, instruments per level with a 30/60/90-day schedule, common mistakes, academic critiques and how to answer them, and Indonesian-context adaptation.

Intended readers: HR / HC / L&D / SDM and unit leaders who design, buy, or approve training evaluation in private companies, BUMN/BUMD, government agencies, institutions, associations, and non-profits.

Quick navigation

  1. Short history: from 1959 to the 2016 New World Model
  2. Four levels (Reaction, Learning, Behavior, Results)
  3. Backward-design principle (start from Level 4)
  4. Required drivers: workplace reinforcers
  5. Instruments per level + measurement schedule
  6. Level 3 deep: measuring behavior transfer
  7. Level 4 deep: leading vs lagging indicators
  8. Academic critiques & how to answer them
  9. Indonesian-context adaptation (SKKNI, hierarchy, RKAP)
  10. Relation to TNA, ADDIE, Phillips ROI, 70-20-10
  11. Worked program example L1–L4 (illustrative)
  12. Ten common mistakes & how to avoid them
  13. Kirkpatrick implementation checklist
  14. FAQ
  15. Next step

Short history: from 1959 to the 2016 New World Model

Donald L. Kirkpatrick wrote his 1954 PhD dissertation at the University of Wisconsin–Madison on the evaluation of supervisor training. Five years later (1959), he published four sequential articles in the Training Director's Journal of the American Society for Training and Development (ASTD) laying out the four levels, one article per level. The term "Kirkpatrick Model" came later; in his early publications Donald called them "four steps".

1994: Donald published Evaluating Training Programs: The Four Levels, which became the field standard for two decades.

2016: His son Jim Kirkpatrick and daughter-in-law Wendy Kayser Kirkpatrick published Kirkpatrick's Four Levels of Training Evaluation, introducing the New World Kirkpatrick Model. The four levels stay; the additions:

  • Backward-design principle: design from Level 4 (target business result) to Level 1, then measure forward from Level 1 to Level 4.
  • Required drivers: processes, systems, and management support that keep new behavior alive after class.
  • Level 3 emphasis as central, because without behavior change L4 does not occur.
  • Leading vs lagging indicators at Level 4: short-term aggregate behavior proxies (leading) and long-term business outcomes (lagging).

Rule of thumb: Original Kirkpatrick is a map of the territory; the New World Model is a road map. The first names what to measure; the second explains how to make it actually work.

Four levels (Reaction, Learning, Behavior, Results)

| Level | What is measured | Core question |
|---|---|---|
| L1 Reaction | Participant perception of relevance, engagement, intent to apply | "Did participants find this useful and intend to apply it?" |
| L2 Learning | Knowledge, skill, attitude, confidence, commitment gain | "What do participants now know/do that they didn't?" |
| L3 Behavior | Application on the job | "Are participants actually doing what they learned in class?" |
| L4 Results | Targeted business indicators | "Did the business results that motivated this training move?" |

Each level builds on the previous but is not always causal: participants can be satisfied (L1) without learning (L2), or learn without applying (L3). This naive causal assumption is what academics critique (Holton 1996, Bates 2004) and what the New World Model answers via required drivers bridging L2 to L3.

New World additions per level

  • L1: from "satisfaction" to engagement + relevance + commitment. New questions: "How confident are you that you will apply this?" and "What will get in the way?"
  • L2: from "knowledge" to knowledge + skill + attitude + confidence + commitment. Attitude and commitment matter as much as knowledge for transfer to L3.
  • L3: explicitly adds required drivers as a core part of L3 architecture.
  • L4: separates leading indicators (fast signals: behavior frequency, stakeholder satisfaction) from lagging indicators (final outcomes: revenue, turnover, NPS).

Backward-design principle (start from Level 4)

The strongest New World Kirkpatrick concept. Design the program from the back: define L4 first, then L3, L2, L1.

Four backward-design steps

  1. L4: Target business result. Specific indicators that must move. Example: first-line-manager turnover down 20% in 12 months; average support-ticket close time down from 8h to 5h; client NPS up from 32 to 50.
  2. L3: Required behaviors. A list of 3–7 specific, observable behaviors that, done consistently, produce L4. Example for manager turnover: "Manager runs weekly one-on-ones with each team member ≥ 30 minutes with a development focus that goes well beyond status updates." Concrete observable behaviors take precedence over abstract competencies.
  3. L2: Required knowledge/skills/attitudes. What must participants know, be able to do, and believe to make L3 possible? Example: GROW coaching technique, active listening, constructive feedback delivery, the attitude that team development is the manager's primary job (not a distraction).
  4. L1: Learning experience. What kind of experience grows L2 and builds the commitment to apply? Modules, methods (simulation, role-play, case study), duration, facilitator, schedule.

Consequences of backward design

  • Every module ties to a business indicator. No "interesting but unclear contribution" modules.
  • Vendors design programs end-to-end. First question to the vendor: "Which business indicators will move and what behaviors will drive them?" The answer should describe a system (behaviors, drivers, measurement), going well beyond a class catalog.
  • Budget is justifiable. Conversation with the CFO: "To move KPI X by Y, we need behavior Z in the field; this program produces Z; cost = …; expected benefit = …"
  • Measurement is built in. L1–L4 are designed into the program from day one.

Rule of thumb: Without backward design, classes are built from L1 ("what will be engaging?") and L4 impact is incidental. With backward design, classes are built from L4 and L1 is a consequence.
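To make the "every module ties to a business indicator" consequence concrete, here is a minimal Python sketch of a backward-design chain. It is an illustration only: the class names, field names, and the sample modules (including the "Company history lecture") are hypothetical and are not part of the Kirkpatrick model or of any tool. The check simply flags modules that build no skill required by a target behavior, and required skills that no module builds.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    """An L1 learning-experience unit and the L2 skills it is meant to build."""
    name: str
    builds_skills: list[str]


@dataclass
class ProgramDesign:
    """Hypothetical container for one backward-designed program (L4 -> L3 -> L2 -> L1)."""
    l4_result: str                          # target business result
    l3_behaviors: dict[str, list[str]]      # required behavior -> L2 skills it depends on
    l1_modules: list[Module] = field(default_factory=list)

    def required_skills(self) -> set[str]:
        return {s for skills in self.l3_behaviors.values() for s in skills}

    def orphan_modules(self) -> list[str]:
        """Modules that build no skill needed by any L3 behavior."""
        needed = self.required_skills()
        return [m.name for m in self.l1_modules if not needed.intersection(m.builds_skills)]

    def uncovered_skills(self) -> list[str]:
        """Required L2 skills that no module builds."""
        built = {s for m in self.l1_modules for s in m.builds_skills}
        return sorted(self.required_skills() - built)


design = ProgramDesign(
    l4_result="First-line-manager turnover down 20% in 12 months",
    l3_behaviors={
        "Weekly one-on-ones of at least 30 minutes": ["GROW coaching", "active listening"],
        "One specific constructive feedback per week": ["constructive feedback"],
    },
    l1_modules=[
        Module("Coaching lab", ["GROW coaching"]),
        Module("Feedback role-play", ["constructive feedback"]),
        Module("Company history lecture", ["company history"]),  # hypothetical filler module
    ],
)

print(design.orphan_modules())    # modules with no line to L3/L4
print(design.uncovered_skills())  # skills the design still has to teach
```

In this toy design the lecture would be cut or justified, and "active listening" would need a module before the program is approved.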

Required drivers: workplace reinforcers

The New World Kirkpatrick names what many L&D teams sense but rarely systematize: workplace environment determines whether learning becomes behavior. Without required drivers, participants return to work with good intentions and old habits win in weeks.

Four categories of required drivers

| Category | Concrete examples for a first-line-leadership program |
|---|---|
| Reinforce | GROW coaching job aid on laptops; weekly nudge email from L&D; micro-learning video library |
| Encourage | Direct manager weekly check-ins on application; monthly peer-coaching circle; monthly success story in the newsletter |
| Reward | Manager KPIs include 360 team scores; annual bonus weighs team engagement; public recognition for high-retention managers |
| Monitor | One-on-one frequency dashboard; monthly pulse survey to team members; quarterly L&D review |

Practical rules for required drivers

  • Shared ownership. Vendor designs; HR/L&D facilitates; participant's direct manager + unit sponsor own the required drivers.
  • Built into the program from intake. A "manager briefing" module before the participant class matters as much as the participant class itself.
  • Consequence without required drivers: a program "succeeds" at L2 (participants learn) but fails at L3 (they do not apply). Classic.

Rule of thumb: Kirkpatrick Partners research is consistent: workplace environment contributes more to behavior transfer than class quality itself. A budget that is 70% class and 0% required drivers is half a budget.

Instruments per level + measurement schedule

| Level | Common instruments | When measured |
|---|---|---|
| L1 Reaction | Short smile sheet (relevance, clarity, intent-to-apply, barriers), session NPS, engagement observation | Immediately after session (last 10–15 minutes or within 24 hours) |
| L2 Learning | Pre/post knowledge test, skill demonstration, rubric-scored role-play, structured simulation, competency assessment (vs SKKNI where relevant) | Pre: before session. Post: immediately after or within 1–2 weeks |
| L3 Behavior | Manager/peer observation with checklist, work-product sampling, application interviews/FGDs, 180/360 surveys, system data (CRM, process KPIs) | 3–6 months after training (30/60/90 staged observation) |
| L4 Results | Target business KPIs (revenue, turnover, NPS, time-to-X, error rate), leading & lagging indicators | 6–12 months after training |

Instrument rules

  • L1, do not go generic. Replace "Are you satisfied?" with "What is one specific thing you will apply next week? What will get in the way?"
  • L2 requires a baseline. A pre-test before the session turns the post-test from "a high meaningless score" into "a measured gain".
  • L3 measured late enough. Measuring two weeks post-class yields unreliable data because behavior has not settled (Kirkpatrick Partners). Use the 3–6-month window.
  • L4 requires triangulation. A single aggregate KPI is influenced by many factors; combine leading (behavior proxy) and lagging (business outcome) plus an isolation method (control group or trend line from Phillips ROI).
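As a small illustration of the baseline rule for L2, the sketch below computes per-participant gain from paired pre/post scores. The scores are made up and the function name is hypothetical; it assumes both tests use the same scale and that the lists are in the same participant order.

```python
from statistics import mean


def l2_gains(pre_scores, post_scores):
    """Per-participant gain (post minus pre); lists must be paired by participant."""
    if len(pre_scores) != len(post_scores):
        raise ValueError("pre and post scores must cover the same participants")
    return [post - pre for pre, post in zip(pre_scores, post_scores)]


# Illustrative scores on a 0-100 test (not real data)
pre = [50, 60, 70]
post = [70, 80, 85]

gains = l2_gains(pre, post)
print(gains)        # gain per participant
print(mean(gains))  # average gain across the cohort
```

Without the `pre` list, the post-test average of about 78 would say nothing about what the class added.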

Level 3 deep: measuring behavior transfer

L3 is the hardest level and the most-skipped, yet the most decisive. Three components measure it correctly:

(a) Define observable behaviors

Concrete observable behaviors win over abstract competencies. Weak: "Manager is capable of coaching." Strong: "Manager runs weekly one-on-ones ≥ 30 minutes with each team member, using the GROW framework, and logs follow-ups in HRIS."

(b) Triangulate instruments

| Instrument | What it captures | Bias |
|---|---|---|
| Participant self-survey | Participant's perception of own application | Positive bias (overestimate) |
| Manager observation | Behavior visible to the manager | Selection bias (only when manager is around) |
| 360 from subordinates/peers | Experience of the people who receive the behavior | More accurate for interpersonal behavior |
| Work-product sampling | Tangible evidence (proposals, recorded conversations, tickets) | Objective but availability-limited |
| System data (CRM/HRIS) | Frequency/quality of activity | Objective, real-time, expensive to collect ad-hoc |

Triangulate at least 2 instruments: one qualitative (interview/observation) and one quantitative (system data or 360).

(c) 30/60/90-day cadence

  • 30 days: check intent & early barriers (mid-course correction).
  • 60 days: observe early behavior; are required drivers working?
  • 90 days: primary L3 measurement; behavior has settled enough.
  • 6 months: confirmation measurement before L4.
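The cadence above can be turned into calendar dates with a few lines of Python. This is a sketch under stated assumptions: the function and dictionary names are hypothetical, and "6 months" is approximated as 180 days for simplicity.

```python
from datetime import date, timedelta

# Offsets in days after the last training day; 6 months is approximated as 180 days.
L3_CHECKPOINTS = {
    "30-day check (intent and early barriers)": 30,
    "60-day check (early behavior, required drivers)": 60,
    "90-day primary L3 measurement": 90,
    "6-month confirmation before L4": 180,
}


def l3_schedule(training_end: date) -> dict[str, date]:
    """Map each L3 checkpoint label to its calendar date."""
    return {label: training_end + timedelta(days=days)
            for label, days in L3_CHECKPOINTS.items()}


schedule = l3_schedule(date(2026, 1, 1))
for label, due in schedule.items():
    print(f"{due.isoformat()}  {label}")
```

Putting these dates in the contract and in managers' calendars at kickoff makes it less likely that the L3 measurements are quietly skipped.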

Required drivers determine L3

Strong L3 comes from strong required drivers. When L3 is weak, the first question is not "was the class good?" but "did the direct manager support application? Does the system capture new behavior? Do KPIs reward target behavior?"

Level 4 deep: leading vs lagging indicators

| Type | Examples | Characteristics |
|---|---|---|
| Leading (early signals) | One-on-one frequency, diagnostic conversations per week, quarterly stakeholder satisfaction | Move fast, aggregate behavior proxies |
| Lagging (final outcomes) | Annual turnover, annual client NPS, revenue per FTE, market share | Move slowly, the target business outcomes |

L4 rules

  • Pick existing KPIs. Do not invent KPIs only for training: collection cost is high and continuity low. Use indicators already monitored in RKAP / business dashboards.
  • Isolate training effect. Aggregate KPIs are influenced by many factors (market, product, incentives). Use Phillips ROI methods (control group if possible, trend-line analysis, or participant estimation) to separate training's contribution.
  • Leading first, lagging follows. Leading indicators (3–6 months) signal whether lagging (6–12 months) will move, leaving time for correction before final results.
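One way to picture trend-line analysis: fit a straight line to the KPI's pre-training history, project it forward, and compare the projection with what actually happened. The sketch below is a simplified illustration, not the full Phillips procedure; the function names and monthly values are hypothetical, and a plain least-squares line is an assumption that only holds if nothing else changed the trend.

```python
def fit_trend(values):
    """Least-squares line through (0, v0), (1, v1), ...; returns (slope, intercept)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two pre-training data points")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def gap_vs_trend(pre_values, actual, months_after_last_pre):
    """Actual KPI minus the value the pre-training trend would have predicted."""
    slope, intercept = fit_trend(pre_values)
    x = len(pre_values) - 1 + months_after_last_pre
    projected = intercept + slope * x
    return actual - projected


# Six months of a hypothetical KPI before training, then one reading 3 months after
pre_training = [100, 102, 104, 106, 108, 110]
gap = gap_vs_trend(pre_training, actual=125, months_after_last_pre=3)
print(gap)  # portion of the change not explained by the prior trend
```

The gap is only a candidate training effect: other factors (market, product, incentives) can still be behind it, and for KPIs where lower is better, such as turnover, a negative gap is the improvement.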

See the Phillips ROI Level 5 guide for isolation methods and L4 monetization.

Academic critiques & how to answer them

| Critique | Source | Practical answer |
|---|---|---|
| Model is not a theory; does not explain causality | Holton (1996), "The Flawed Four-Level Evaluation Model" | Use it as a framework; pair with TNA (cause of need) + required drivers (cause of transfer) |
| Inter-level causal assumption not always valid | Bates (2004) | Triangulate instruments; do not assume high L1 → high L4 |
| Environmental factors ignored | Tannenbaum, Cannon-Bowers, Mathieu | New World required drivers answer this explicitly |
| Does not explain cost | Several academics | Phillips ROI Level 5 adds the cost dimension |
| Bottom-up bias (starting from L1) | Wendy & Jim Kirkpatrick | Backward design (starting from L4) answers this |

Healthy synthesis: Kirkpatrick is a measurement map. Pair with TNA for cause, required drivers for transfer, Phillips ROI for cost justification, and 70-20-10 for learning realism. Together they form a whole engine.

Indonesian-context adaptation (SKKNI, hierarchy, RKAP)

(1) Tie L2 to SKKNI

For technical and compliance roles, measure L2 against relevant SKKNI competency units alongside internal scores. If competency certification is needed, connect with a BNSP-licensed LSP assessment path. This makes L2 auditable and links training to formal career paths. See the TNA guide for mapping competencies to SKKNI.

(2) Leverage hierarchical culture as an asset

In many Indonesian organizations, the direct manager has strong influence on subordinate behavior. Treat this as an asset for Kirkpatrick:

  • Manager briefing before the participant class: managers are told what participants will learn and asked to serve as required drivers.
  • Manager assigns application tasks after class (job aid + check-in schedule).
  • 360 scores from subordinates feed manager performance reviews so the "give coaching" required driver is structurally rewarded.

(3) Tie L4 to RKAP / performance contracts

For BUMN and government agencies, business indicators often tie to RKAP (BUMN) or Echelon I/II performance contracts (government). L4 KPIs used in training should be a subset of KPIs already monitored, to avoid adding reporting burden via newly-invented indicators.

(4) Instrument language

  • L1 forms in Indonesian with specific, operational questions, not "satisfied/very satisfied".
  • L2/L3 rubrics in the everyday vocabulary used in the organization (e.g. "customer visit" instead of "client engagement").
  • Cultural calibration: avoid questions that make participants feel they are "rating their boss" if the cultural context makes that uncomfortable; use peers as a proxy.

Relation to TNA, ADDIE, Phillips ROI, 70-20-10

| Framework | Role |
|---|---|
| TNA (McGehee & Thayer; Allison Rossett) | Sets the need and baseline; without TNA, Kirkpatrick has no benchmark |
| ADDIE (Analysis, Design, Development, Implementation, Evaluation) | Kirkpatrick = the heart of the Evaluation phase; backward design = Analysis & Design |
| Phillips ROI Level 5 | Monetizes L4 into BCR & ROI%; adds the cost dimension |
| 70-20-10 (Lombardo & Eichinger) | Learning reality: 70% experience, 20% interaction, 10% formal; required drivers at L3 = 70 and 20 |

Synthesis: TNA → ADDIE (Design backward from L4) → Implementation (class + required drivers) → Evaluation (L1–L4 per schedule) → ROI Level 5 for budget justification. Kirkpatrick is evaluation; it only stands on a healthy foundation.

Worked program example L1–L4 (illustrative)

Illustrative scenario.

Program: First-line manager leadership training, 24 participants, in-house Jakarta, 5 days + 6 months reinforcement.

Target business result (L4): First-line-manager turnover down from 18% to 12% in 12 months; team engagement score up from 65 to 75.

Target behaviors (L3): Managers run weekly one-on-ones ≥ 30 minutes with each team member; use the GROW framework; log in HRIS; deliver ≥ 1 specific constructive feedback per week.

Target learning (L2): Understand GROW; able to listen actively (rubric score ≥ 4/5); able to deliver structured constructive feedback (rubric score ≥ 4/5); attitude that team development = primary job.

Learning experience (L1): 5 days × 7 hours = 35 hours in-person with 60% simulation/role-play; 6 months monthly micro-learning + monthly peer-coaching + weekly manager check-ins.

Required drivers: Manager KPIs include 360 team scores; one-on-one frequency dashboard in HRIS; quarterly recognition for high-retention managers; micro-learning video library; weekly nudge email.

Measurement:

| Level | Instrument | When |
|---|---|---|
| L1 | Smile sheet (relevance, intent, barriers), session NPS | End of each day + end of program |
| L2 | Pre/post knowledge test; coaching role-play with rubric | Day 1 (pre) + Day 5 (post) |
| L3 | 360 survey to 5 team members; HRIS one-on-one frequency data; sample observation of 5 managers | 90 days + 6 months |
| L4 | Annual turnover; team engagement from annual corporate survey | 12 months |

Final report: links L1 → L2 → L3 → L4, with notes on required drivers that worked/failed, and recommendations for the next cycle.

This program spends ~30% of budget on required drivers + measurement (outside facilitator fees and logistics), a healthy ratio for a behavior-change program at scale.

Ten common mistakes & how to avoid them

| # | Mistake | How to avoid |
|---|---|---|
| 1 | Only measuring L1, calling it "evaluation" | Agree L1–L4 from vendor proposal |
| 2 | No pre-training baseline | TNA mandatory + pre-class assessment |
| 3 | Measuring L3 too early (2 weeks) | Schedule 3–6 months; staged 30/60/90 observation |
| 4 | Generic L1 smile sheet | Ask relevance + intent-to-apply + barriers; treat satisfaction as one input among many |
| 5 | No required drivers | Build into the program: manager briefing, systems, KPIs, recognition |
| 6 | Single instrument without triangulation | Combine survey + observation + system data |
| 7 | L4 = aggregate KPIs only without isolation | Use Phillips ROI methods (control group / trend line / participant estimation) |
| 8 | No backward design | Start from L4, step back to L1 |
| 9 | Vendor not involved in L3–L4 | Agree in contract: vendor helps instrument, baseline, and report |
| 10 | No report linking L1–L4 to business | Require a final report format with causal narrative |

Kirkpatrick implementation checklist

Before execution, every box ticked:

  • L4 defined: specific, measurable business indicators; baseline recorded.
  • L3 defined: 3–7 observable behaviors with a scoring rubric.
  • L2 defined: knowledge/skill/attitude; tied to SKKNI where relevant; pre-test ready.
  • L1 designed from L2–L4 (backward design); "what is engaging" plays a supporting role.
  • Required drivers identified: reinforce, encourage, reward, monitor.
  • Required-driver owners assigned (direct manager, HR, system).
  • Measurement schedule written: L1 immediately, L2 pre/post, L3 30/60/90+6mo, L4 12mo.
  • Instruments ready: smile sheet, assessment, observation rubric, 360 survey, KPI dashboard.
  • L3 triangulation agreed (at least 2 instruments).
  • L4 isolation method selected (control group / trend line / participant estimation).
  • Vendor engaged: instrument, baseline, final report.
  • Final-report format agreed: L1–L4 narrative → business → follow-up.
  • Measurement budget (~10–20% of total) explicitly separated as a line item.

FAQ

What is the Kirkpatrick 4-Level Model?

The Kirkpatrick Model is one of the world's most widely used training-evaluation frameworks: Level 1 Reaction (participant satisfaction & relevance), Level 2 Learning (knowledge/skill/attitude gain), Level 3 Behavior (application on the job), Level 4 Results (targeted business indicators). Donald L. Kirkpatrick introduced it through four sequential articles in the Training Director's Journal of ASTD (1959), and his son Jim Kirkpatrick with wife Wendy Kayser Kirkpatrick expanded it in 'Kirkpatrick's Four Levels of Training Evaluation' (2016) into the 'New World Kirkpatrick Model', adding the concept of required drivers (workplace reinforcers) and the backward-design principle (design programs from L4 to L1).

How does the original Kirkpatrick model differ from the New World Kirkpatrick Model?

The four levels remain. What is new in the New World Model (2016): (1) The backward-design principle: design from Level 4 (targeted business result) working back to Level 1, while still measuring forward from Level 1 to Level 4. (2) The required-drivers concept at Level 3: workplace processes, systems, and management support that keep new behavior alive after class. (3) Explicit emphasis on Level 3 as central, because without behavior change Level 4 does not occur. (4) Explicit distinction between leading indicators (short-term) and lagging indicators (long-term) at Level 4. The New World Model makes Kirkpatrick relevant for organizational behavior transformation, beyond a post-class audit.

Why do most trainings stop at Level 1?

Four causes. (1) Cheap & fast: smile sheets are handed out at session end and scored immediately. (2) Comfortable to report: a 4.5/5 score easily justifies budget. (3) L2–L4 require pre-training baselines and cross-functional coordination that are often not prepared. (4) No sponsor pull: if CFO/board do not demand behavior/results evidence, L&D is not forced to climb. The consequence: training budget gets cut first under efficiency drives because its impact cannot be calculated. The fix: require vendors to design L1–L4 evaluation from the proposal stage with a baseline from the TNA.

How does the backward-design principle (starting from Level 4) work?

A New World Kirkpatrick concept: design backward from business results. (1) Start at L4: ask 'which business indicators must move?' (e.g. first-line-manager turnover down 20%, NPS up 5 points, time-to-quote down 3 days). (2) Step to L3: 'what specific behaviors must participants do for L4 to happen?' (e.g. managers run weekly check-ins, salespeople run diagnostic conversations on every proposal). (3) Step to L2: 'what knowledge/skills/attitudes are needed for L3 behavior to be possible?' (e.g. GROW coaching, SPIN selling). (4) Step to L1: 'what learning experience grows L2 and builds commitment?'. The result: every module has a straight line to a business indicator. Without backward design, classes are built from L1 (what will be engaging) and L4 impact is incidental.

What are required drivers and why are they critical at Level 3?

Required drivers are the workplace processes, systems, management support, and reward structures that make new behavior happen and persist on the job. Examples: direct manager runs weekly check-ins on application; CRM is updated to capture diagnostic conversations; KPI incentives adjust to reward target behavior; monthly peer-coaching; job aids at the desk. Without required drivers, participants return to work with good intentions and old habits win in weeks, so training becomes an event without impact. Kirkpatrick Partners research is consistent: workplace environment determines whether learning becomes behavior, more strongly than class quality itself.

When is each level measured (measurement schedule)?

Level 1: immediately after the session (the last 10–15 minutes of class or within 24 hours). Level 2: immediately after the session (pre/post assessment) for knowledge, or within 1–2 weeks for demonstrated skill. Level 3: 3 to 6 months after training; earlier measurement (e.g. 2 weeks) yields unreliable data because behavior has not yet settled. Common pattern: 30/60/90-day observation or a 3–6-month behavior survey with self + manager + subordinate references (180/360). Level 4: 6 to 12 months; enough time for business indicators to move and for required drivers to consolidate behavior. Too fast = false evidence; too slow = no follow-through.

What instruments are used per level?

Level 1: short smile sheet (relevance, clarity, intent-to-apply beyond plain 'satisfaction'), session NPS. Level 2: pre/post knowledge test, skill demonstration, rubric-scored role-play, structured simulation, competency assessment against SKKNI where relevant. Level 3: behavior observation by manager/peer with a checklist, work-product sampling (e.g. written proposals, recorded conversations), application interviews/FGDs, 180/360 surveys, system data (CRM, ticket, process KPIs). Level 4: targeted business KPIs (NPS, turnover, time-to-X, error rate, revenue per FTE), leading indicators (aggregate behavior proxies), and lagging indicators (final business outcomes). Instrument triangulation reduces self-report bias.

What is the academic critique of the Kirkpatrick Model and how do you answer it?

Main critiques (Holton 1996; Bates 2004): (a) the model is not a theory: it is a taxonomy framework and does not explain causality; (b) the inter-level causal assumption (L1 → L2 → L3 → L4) is not always valid: participants can be satisfied (high L1) without learning (low L2), or learn without applying (low L3); (c) environmental factors (which the New World Model answers via required drivers) are often ignored; (d) it does not explain cost: Jack Phillips added Level 5 ROI as the answer. The fix: use Kirkpatrick as a measurement framework paired with TNA (cause of need), required drivers (cause of behavior transfer), Phillips ROI Level 5 (cost justification), and data triangulation. Kirkpatrick is a map, not an engine.

How do you adapt Kirkpatrick to the Indonesian corporate-training context?

Four adjustments. (1) Map Level 2 to SKKNI: role competencies tie to relevant SKKNI units; pre/post assessment is measured against the external standard alongside internal scores. If certification is needed, connect with a BNSP-licensed LSP. (2) Local-context required drivers: the direct manager carries strong influence in Indonesian organizational culture, so a manager briefing before class + post-class check-ins yield large leverage. (3) Level 4 for BUMN/agencies: business indicators often tie to RKAP or performance contracts; ensure KPIs come from existing monitored systems to avoid invented KPIs that exist only for training. (4) Instrument language: L1 in Indonesian with specific actionable questions that go beyond 'satisfied/very satisfied'; L2/L3 rubrics in everyday operational terms. See the TNA guide for tying measurement baselines to SKKNI standards.

How does Kirkpatrick relate to Phillips ROI Level 5?

Phillips ROI Methodology (Jack J. Phillips, 1973) adds Level 5 above Kirkpatrick L1–L4: monetize L4 benefits and compare against program cost. Formula: BCR = Total Benefits ÷ Total Costs; ROI% = ((Benefits − Costs) ÷ Costs) × 100. L5 answers the critique 'Kirkpatrick does not explain cost'. But L5 is only valid if L4 is valid; L4 only if L3 is; L3 only if required drivers exist; L1–L2 only if learning design is solid. ROI is not a replacement for Kirkpatrick; it deepens it. See the Phillips ROI Level 5 guide for isolation methods (control group / trend line / participant estimation), monetization, and fully-loaded costs.
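The two formulas above are easy to check with a short Python sketch. The figures are invented for illustration and the function names are hypothetical; the benefits are assumed to be already monetized and isolated to the training effect, and the costs are assumed to be fully loaded.

```python
def bcr(total_benefits: float, total_costs: float) -> float:
    """Benefit-cost ratio: total benefits divided by total costs."""
    if total_costs <= 0:
        raise ValueError("total_costs must be positive")
    return total_benefits / total_costs


def roi_percent(total_benefits: float, total_costs: float) -> float:
    """ROI% = ((benefits - costs) / costs) * 100."""
    if total_costs <= 0:
        raise ValueError("total_costs must be positive")
    return (total_benefits - total_costs) / total_costs * 100


# Illustrative figures in millions of rupiah (not client data)
benefits = 750.0  # monetized L4 benefit attributed to the program
costs = 500.0     # fully-loaded program cost

print(bcr(benefits, costs))          # 1.5
print(roi_percent(benefits, costs))  # 50.0
```

A BCR of 1.5 and an ROI of 50% describe the same result from two angles: each rupiah spent returns 1.5 in benefits, which is 0.5 net.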

What are the most common mistakes when applying Kirkpatrick?

Ten most-frequent mistakes: (1) Measuring only L1 and calling it 'evaluation'; (2) No pre-training baseline for L2/L3/L4; (3) Measuring L3 too early (2 weeks) so the data is unreliable; (4) Generic L1 smile sheet ('satisfied/very satisfied') without relevance or intent-to-apply questions; (5) No required drivers, so participants return to a workplace that blocks application; (6) Relying on a single instrument (e.g. only a self-survey) without triangulation; (7) L4 = only aggregate KPIs influenced by many factors, with no isolation method; (8) No backward design, so classes are built from L1 and L4 is incidental; (9) Vendor not involved in L3–L4 measurement so the data depends on internal teams who lack time; (10) No final report linking L1–L4 to business outcomes and follow-up. Each mistake lowers L&D credibility before finance and the board.

Next step

You now have a complete Kirkpatrick framework: the four levels, the backward-design principle, required drivers, instruments per level, a 30/60/90-day schedule, Indonesia adaptation, and academic critiques with answers. The sensible next step is to run a TNA that sets the L2–L4 baseline before designing any program.

Neksus designs every program with backward-design Kirkpatrick: starting from the target business indicator (L4), stepping down to behaviors (L3) + required drivers, to learning (L2) tied to SKKNI, then to the learning experience (L1). L1–L4 measurement is included in the proposal with instruments, schedule, and a final-report format that links to the business objective. Discuss your team's need and request an initial TNA via the Neksus contact page, with no obligation.


Last updated: 18 May 2026. The frameworks cited (Donald L. Kirkpatrick, 1959, four-article series in ASTD's Training Director's Journal; Donald Kirkpatrick, 1994, Evaluating Training Programs: The Four Levels; Jim D. Kirkpatrick & Wendy Kayser Kirkpatrick, 2016, Kirkpatrick's Four Levels of Training Evaluation (New World Kirkpatrick Model); Holton 1996; Bates 2004; Phillips 1973 ROI Methodology; ADDIE; SKKNI; 70-20-10 Lombardo & Eichinger) are attributed to their original sources. Worked program is illustrative; numbers exist to show method, with no relation to client data. Neksus does not publish client names or success statistics.
