This paper proposes a framework adoptable by any Latin American country, organized in four integrated layers: a low-stakes census measurement layer covering grades 6 through 11 in the public system; a welfare layer with universal school meals and a school-attendance-conditional cash transfer; an escalated intervention layer with causal verification through randomized pilots; and an opportunity layer culminating in a single high-stakes exam in grade 11, destined for broad scholarships awarded on progress, context and need. The design rests on a single operational rule —money only touches inputs that are cheap to verify, never the thermometer itself— and explicitly excludes school-performance-based funding and individual incentives tied to test scores, for technical reasons developed throughout. Fiscal feasibility closes at 0.2–0.3% of GDP for countries with existing food and conditional-cash-transfer programs (most of the region), and at 0.10–0.15% of GDP for the pure measurement-intervention-scholarship system. The Argentine case is costed at USD 2.9 billion per year (0.43% of GDP), less than current AUH expenditure.
Keywords: educational assessment, conditional cash transfers, comparative education policy, Latin America, education finance, AUH, school feeding, school intervention, student scholarships.
Latin American educational systems face three persistent failures worth naming precisely, because each admits a different instrument, and combining them into a single device —as previous reforms have attempted— has produced counterproductive effects.
First, an information failure: ministries do not know, with clean and comparable data, which schools are teaching well and which are not. Available data is either self-reported by the schools themselves or comes from high-stakes standardized tests whose stakes distort the measurement. Without reliable diagnosis no improvement policy is possible.
Second, a welfare failure: a significant share of students attend class hungry, and another share simply does not attend because their family needs the income from their work. No pedagogy solves these two problems; however, both have proven social policy instruments (school meals and conditional cash transfers) with decades of rigorous evaluation.
Third, an intervention failure: even when the system knows a school is failing, the mechanisms to help it are slow, opaque, poorly coordinated with subnational tiers of government, and lacking causal evidence on which interventions actually work.
Latin American reforms of the past three decades have offered familiar answers: performance-based funding, teacher evaluations with salary consequences, rewards and punishments tied to scores. The comparative evidence on these mechanisms is, at best, ambiguous, and in several documented cases frankly negative: increased statistical fraud, curricular narrowing toward tested subjects, push-out of low-performing students, and teacher demoralization. This framework is built explicitly to avoid that set of pathologies.
The entire architecture rests on a single constraint, derived from the literature on incentives in performance measurement (Campbell, 1979; Muller, 2018) and verified across comparative experience:
The same instrument cannot measure cleanly and distribute rewards at the same time. When a metric becomes the target of money or consequences, it ceases to be a good metric: effort is redirected toward inflating the number, not toward producing what the number was meant to measure.
From this follows the operational rule that orders where money may exist within the system:
Money only touches things that are cheap to verify and are not the thermometer. Verifiable inputs are paid (attendance, retention, meals). Results are rewarded only once, at the end of the trajectory, in a single instrument shielded to bear that weight.
This rule excludes by construction the following mechanisms: school funding tied to scores; teacher bonuses based on their students' results; individual prizes for year-on-year improvement; ordinal public school rankings; grade promotion automated by external exam. Each of these mechanisms was considered and discarded for reasons developed in the corresponding sections.
The framework defines functions, not specific institutions. Each country embodies them in its own organs through a local implementation annex (in Argentina, provincial jurisdictions act as Responsible Authority; in Chile, the local education services; in Colombia, certified education secretariats; and so on).
Without these functions the model does not operate, in any country. The presence of these five conditions constitutes the minimum floor of adoptability:
Scope: the system exclusively covers public education. The stated objective is that public schools improve until they become more attractive than private ones; the engine of that improvement is self-referential —each school against its own trajectory and its peers— and does not require measuring the private sector.
Once a year, grades 6 through 10 of the public system take a standardized national exam on a common curriculum, in every compulsory subject. The exam is deliberately low-stakes: it does not affect promotion, distribute prizes, or alter budgets. The student operates as a sensor of the system, not as a contestant —and precisely for that reason, the data can come out clean.
Coverage combines two modalities. Small-enrollment schools are fully censused (trivially administered, eliminating sampling noise). Large schools are sampled with stratification by prior performance level (low / medium / high), classified by the previous year's clean exam and never by school grades. This stratification has a central property: inflating a weak student's grades to reclassify them into the high stratum places them in a group where they will perform poorly on the clean exam and drag that stratum down; fraud statistically reveals itself.
Instrument generation is delegated to an external provider under a scheme with four operational defenses: an item bank with siloed authors (who do not know each other), the bank held in technical custody (escrow), periodic rotation of the provider to prevent capture, and physical and procedural separation between item authors and those who access the final bank. Security at grades 6–10 is deliberately light: with no stakes, the incentive to steal an exam is low and does not justify a fortress.
The most important technical objection to measurement without consequences is that a fraction of students will not put in effort, biasing the diagnostic. The framework confronts this with three combined mechanisms: short exams to minimize fatigue; age-appropriate motivational framing without threats of consequences; and post-hoc statistical filters that detect and discard random response patterns. International evidence (NAEP, PISA) shows that with these mechanisms the effort bias is manageable and does not compromise inferences at the school and system level.
If evidence from the early years indicates the problem is larger than expected, a defined upgrade exists: have the exam result count as one grade among many within the subject's evaluation (bounded weight; the school retains full promotion authority through all its instances). This upgrade carries a cost —moving from sample to full census in large schools— and is activated only if data quality requires it.
The nominal registry allows tracking each student's trajectory between grades 6 and 11, including school changes: the record follows the student, not the other way around. Dropout and anomalous enrollment purges —the hardest fraud lever to eliminate in any system with any kind of stakes— trigger automatic red flags. No school can improve its numbers by pushing weak students out, because those students do not disappear from the system: they are tracked to their new school or, if they appear in none, flagged as dropout requiring explanation.
Monthly exams and school grades remain the school's: a formative tool for the teacher, never input to money or rankings. The gap between school self-reporting and clean annual exam measurement functions as a free detector of grade inflation at the class level —without auditing any monthly exam, the system obtains an integrity signal.
Anonymous student surveys on their teachers (quarterly and end-of-year) are formative input with no reward or punishment: well-designed, they predict teacher quality better than value-added models alone, but they lose that property when stakes are added (complacency bias, reward for the easy teacher, punishment for the demanding one).
Teacher evaluation is formative, multi-source and low-stakes —classroom observation, peers, student survey, student progress as one input among several— aimed at supporting and developing the teacher, never at ranking or firing them. This is a technically grounded decision: high-stakes teacher evaluation schemes based on value added are noisy to the point of unfairness (same teacher, different quintiles in consecutive years by chance) and shift all gaming pressure to the individual teacher, which explains much of the educational fraud scandals of the last two decades. Additionally, competition among teachers destroys the collaboration that transfer of good practices requires.
Promotion remains a school decision, integrating the full year, with its multiple instances (make-ups, examination boards) and the judgment of the teachers who know the student. The national exam never automatically decides an individual student's promotion. This separation is technically necessary: a diagnostic exam is designed with items spanning the full range of difficulty to map the complete distribution; a promotion exam requires psychometric precision around the cut-line, individually defensible standards, and legal appealability. They are distinct artifacts, and a single instrument cannot optimize for both.
The obvious operational question facing a system with two national exams (low-stakes at grades 6–10, high-stakes at grade 11) is how they are physically administered. The answer must differ for each, and that asymmetry is the design decision that makes the model fiscally viable while remaining sound. Applying the same logistical apparatus to both —as would be the case, for instance, if one replicated Colombia's ICFES protocol at every grade— would multiply operational cost without adding useful protection, given that at grades 6–10 no serious incentive exists to falsify a consequence-free exam.
Grades 6–10 exam (low-stakes). A single simultaneous national date per school year. Taken at the student's own school, proctored by teachers rotated between classrooms of the same institution (a teacher from one grade proctors another grade's exam, never their own). Common form across the country for each grade, preserving national comparability without the psychometric complexity of equating multiple forms. Security is deliberately light: with no stakes for student or school, the incentive to cheat or tamper is marginal, and the few remaining cases are captured ex post by the system's structural defenses (self-correcting stratification, gap between school self-reporting and clean exam, the auditing function of the grade-11 exam over the trajectory). Minors are not moved to outside venues, external proctors are not required, and lost class time is limited to one day per grade per year.
Grade 11 exam (high-stakes). Applies the full protocol described in section 6: venue away from the student's school (a model analogous to ICFES in Colombia, ENEM in Brazil, or PAES in Chile), randomly drawn citizen jurors —not teachers, since teachers are part of the measured system and citizens are not—, statistically equated isomorphic forms by shift, operational secrecy of the item bank, control infrastructure. A single national date with a bounded make-up for justified cases, applying an alternative equated form. Here the full fortress is justified because the exam distributes scholarships: the gaming incentive is real, and the cost of protection is bounded to a single annual event with a single cohort (~5–10% of the system's enrollment).
The deeper reason for this asymmetry is the same operational rule of the system: protection is applied where the incentive to cheat exists, not as a uniform property of the measurement apparatus. In stakes-free grades, moving millions of minors to outside venues, twice a year, with external proctors, would be a cosmetic imitation of the serious exam's apparatus: it would produce the appearance of rigor without its substance, multiply logistical budget by an order of magnitude, and add a perception of high stakes that would contaminate precisely the clean data being sought.
Every public school in the system has a cafeteria. All students eat lunch using the student card, with no cash in the cafeteria lines. This operational detail is central: the literature on school-meal stigma is unanimous that when the payment method publicly signals who is subsidized, the students who most need the service stop using it. A single interface (universal card), with the difference lying in who loads the credits (the State for beneficiaries; families who choose to contribute do so from home, never in the line), eliminates the problem at no additional cost.
Credits renew monthly and do not accumulate, which prevents hoarding, secondary credit markets, and the decoupling of the benefit from its immediate purpose. The regional evidence on school feeding (attendance, cognition, equity) is among the most solid in educational policy.
A monthly transfer (reference: USD 50 per student in the Argentine case) paid to the student's registered adult, conditional on meeting an attendance threshold (80–85%) with justified absences excluded. The threshold design, as opposed to a binary "perfect attendance" design, eliminates the incentive to send a sick child to class: the child stays home with a medical note, and the family does not lose the benefit.
This is the instrument with the strongest evidence in Latin American social policy: conditional cash transfers (Progresa/Oportunidades in Mexico, Bolsa Família in Brazil, Asignación Universal por Hijo in Argentina, Familias en Acción in Colombia) have consistently shown, across decades of evaluation, positive effects on enrollment and attendance and reductions in child labor. The framework does not propose reinventing the instrument: it proposes integrating it where it already exists and creating it where it does not.
Targeting operates by school, not by family, using the socioeconomic strata that the system builds from territorial census data. This choice responds to a regional constraint: individual means-testing collides with informality (unregistered income produces cruel false negatives, and informality incentivizes concealment). Targeting by school moves classification to an aggregate level, unmanipulable by the institution, with no individual means-test and no intra-school stigma. Leakages (well-off students in poor schools who collect, poor students in well-off schools who do not collect through this channel) are tolerable and are covered by the individual track of the existing national program.
Physical token of the nominal registry that unifies the system: identity, attendance mark, cafeteria access, and benefits channel. Attendance verification operates through three independent signals: the mark at the entrance reader (primary signal, no teacher workload, working offline with deferred sync); the paper attendance list that the teacher is already legally required to take (audited through random sampling against card marks; discrepancies trigger automatic flags); and the cafeteria mark (third signal). The 80% monthly threshold tolerates daily operational noise without requiring surgical precision.
At the end of secondary school, the system's only high-stakes exam: census-wide, individual, in a "competition against oneself" format against an absolute standard, not against other schools. Administered outside the educational institution, in a controlled environment, with full security: statistically equated isomorphic forms by shift, item-bank secrecy, dedicated logistical infrastructure.
Process custody operates through an adapted electoral model: groups of 4 to 5 randomly drawn citizen jurors per venue, from the same province but with residence in different districts (to reduce collusion and local conflicts of interest), plus one State employee per group. Jurors receive a flat per diem (USD 50 per day plus on-site meals) and guard the process: sealed opening of the exam, absence of teacher or institutional interference, complete collection of materials. Grading is done by a machine over multiple-choice sheets; humans add fraud and error surface to the grading task and are removed from it. Anonymized results are published so any citizen may recount.
The grade-11 score distributes scholarships, not university admission. This is a deliberate decision on two grounds. First, gatekeeper exams systematically exclude late bloomers and intelligences that do not fit a standardized test: the literature on the predictive validity of single exams is modest, and false negatives carry irreversible social costs. Second, conditioning university admission on a secondary-school exam interferes with the university autonomy prevailing in much of the region and multiplies the political negotiations required to adopt the framework.
The scholarship is calculated on progress (how much the student advanced along their grade 6–11 trajectory), aggregate context (school stratum and region —never the student's individual scores at lower grades, so as not to contaminate them with stakes), and socioeconomic need. The scholarship is deliberately broad: many students have a real chance of earning it, which sustains the motivation of the entire cohort and not just the top performers. This breadth is also the answer to the effort problem described in 4.2: the grade-11 exam gains seriousness when a majority of students perceive that the outcome can meaningfully change their lives.
Use of the scholarship is unrestricted, and payment is vested in monthly or quarterly installments over 2 to 3 years (a scheme analogous to equity vesting in the private sector). This simultaneously serves three functions: it avoids the operational problem of a large amount received at age 18; it spreads fiscal cost across multiple budget cycles; and it turns what graduates do with the scholarship into a sensor of the system —if a majority do not use it to continue studies, that indicates weaknesses in the secondary-to-tertiary articulation that the proposal would have no other way of knowing about. The condition lies in earning the scholarship, not in spending it: it is the payment of a contract (six years of schooling with cleanly measured progress), not a use-conditioned subsidy.
Intervention is the system's reason for existing: measure → diagnose → intervene. A thermometer without treatment only confirms the patient is sick. This layer is what distinguishes the proposed framework from a mere evaluation system.
A school enters the strengthening circuit when three conditions are met simultaneously: performance significantly below its within-stratum peers (not below the general average, since by arithmetic half of all schools are always below average); measurement in progress, not in absolute level (which predominantly measures the socioeconomic level of the environment, not educational quality); and persistence (2 to 3 consecutive years), to exclude the variance of a single bad year.
Comparison operates within strata defined by socioeconomic context, location, and size, with classification that is public, stable, and unmanipulable by the school (built from external territorial census data). The same trigger applies surgically to a specific subject or grade within an otherwise well-functioning school, allowing bounded and inexpensive interventions.
Once triggered, the type of problem is diagnosed —pedagogical, infrastructural, geographic, or social— because each type demands a different treatment. An infrastructure problem is not addressed by sending a pedagogical consultant, and a pedagogical problem is not addressed by providing equipment.
Intervention operates in four sequential levels:
Level 0 — Early alert. First year under threshold: private notice to the school and its Responsible Authority; guided self-assessment; voluntary access to the support menu. No labeling or public record.
Level 1 — Chosen support. Once the formal trigger is confirmed (2–3 persistent years), the school chooses a strengthening provider from an accredited menu (universities, school networks, certified programs) with earmarked funds from the Fund and a plan with measurable progress targets. Duration: two years.
Level 2 — Conditioned plan. If improvement has not occurred after Level 1, the Responsible Authority co-signs a binding plan; management and leadership are reviewed; resources continue under joint RA–IB administration; the RA is exposed in named public aggregates (pressure rises up the chain toward the responsible government, never down to the school). Duration: two years.
Level 3 — Direct intervention. Severe and persistent failure (typically 4–5 cumulative years, or acute cases). The Intervention Body assumes temporary co-management —support principal and reinforced teaching staff— with exit criteria defined ex ante (sustained return to the peer band over two years). Closure or consolidation: never, under any circumstance. The firebreak is absolute, and especially protects the rural school, whose community role exceeds its score.
The decision to escalate between levels is made by a technical committee (MA + IB + an RA representative) through a reasoned and publicly recorded resolution, auditable against political double standards. The MA certifies the data; the trigger is objective, not contestable. Each case is reviewed annually.
Three mechanisms prevent the well-known death spiral (the diagnosed school loses students and teachers before help arrives): no public labels (the school's banded report does not display its intervention status; the program is called "strengthening", not "intervention"); participation in pilots is framed as early access to proven practices, not as punishment; and a retention bonus for teachers at Level 2–3 schools —payment for a cheap-to-verify input (staying), legitimate under the system's operational rule, which halts the flight of good teachers at precisely the moment the school most needs them.
Good practices are not photocopied from the ranking: schools that appear at the top often do so because of who their students are, not because of a replicable method. The correct process operates thus: identify positive outliers in progress within their stratum (schools that advance more than expected with difficult populations —those genuinely have something worth studying); isolate the practice through qualitative study; pilot it via random assignment among volunteer schools. The invitation is voluntary; the lottery, among those that accept, decides who receives it in the first batch and who in the next. The comparison of voluntary-with vs voluntary-waiting is causally clean, preserves school voluntariness, and turns the transfer of practices into causal evidence before scaling to national cost. With proven effect in pilot, the practice scales; with no effect, it is discarded with evidence and the provider loses accreditation if they repeat the failure.
The guiding principle of publication is that stakes rise up the governance chain. Reputational pressure falls on responsible jurisdictions, which have power and resources to fix things, and never falls down on individual schools and students.
Each family accesses its own school's report —authenticated against the registry through the MA portal, or via a printed report at the mandatory annual assembly where there is no connectivity. The report contains bands relative to peer schools ("above / in line with / below, in progress"), the school's own evolution, and performance by broad area. No public document contains raw scores: bands are relative to different strata and are not commensurable with each other, so the national ranking cannot be reconstructed even by combining every report in the country. Protection operates by artifact design, not by access control. Statistical secrecy (a pre-existing legal figure in the region) backs the boundary: the MA is legally prohibited from releasing identified granular data.
Aggregates by province and stratum, with names, are public. Jurisdictions publicly compare against each other.
Three mechanisms prevent political capture of the data: the independent MA owns the data and publishes on a fixed automatic schedule (no minister can delay or veto a publication); anonymized microdata are accessible to credentialed researchers (an external immune system that detects what the official system would prefer not to see); and every intervention decision is reasoned and recorded, auditable against political double standards.
The realistic goal is not to eliminate fraud —it is to make it expensive, detectable, and unprofitable, and above all to remove the measurement instrument from the hands of those with incentive to falsify it. The defenses operate by design:
Penalties are graduated by role and scale (organized sale of exams is a serious crime; the individual teacher pressured by the very system should not be treated as a traitor to the nation). What deters is certainty of detection, not severity; the anti-fraud budget prioritizes detection over years of incarceration, and protects and rewards the whistleblower, who is empirically the best detector of organizational fraud.
The adoption timeline explicitly separates calibration from the regime with consequences:
Years 1 and 2 — Measurement without consequences. The full system runs idle: instrument calibration, statistical form equating, data pipeline cleanup, juror logistics, installation of cards and readers. No scholarships, no formal interventions with consequences, no data published with stakes. The premise is direct: a poorly calibrated high-stakes system is worse than no system at all, because it produces false data that misleads decision-makers. Money is connected to the instrument only once the instrument has demonstrated it is not lying. The pilot-phase cost for Argentina (including card capex) is estimated at USD 247 million (0.036% of GDP), modest in relation to the informational benefit.
Year 3 onward — Regime. First cohort with a grade-11 scholarship, active intervention triggers, randomized pilots in motion. Only if data quality from the calibration years justifies it.
Optional cultural layer: voluntary and celebratory academic olympiads, organized nationally or regionally, make learning visible and desirable without contaminating measurement or rewarding with system funds. They operate as cultural motor, not as evaluation instrument.
The annual cost of the framework for an adopting country (in USD) follows the parametric formula:
Where M is total public enrollment and M6–11 is enrollment in the target grades; c is the food-ration cost (between USD 0.25 for a snack and USD 3.50 for a full service, with USD 1.30 to 1.60 as the regional reference for lunch); d is the number of school days (~180); b is the reference monthly bonus and f is the targeted share (40% for low-stratum targeting); E is the annual number of graduates, p the share receiving scholarships, and B the average total scholarship with vesting; S is the number of public schools, and k = USD 6,500 averages the per-school intervention cost across the system; F = USD 45 million represents the fixed institutional cost of the MA, and r = USD 11 the variable cost per target student; t = USD 2.5 per student is the annual technology opex. The one-time capex of the tech infrastructure follows S·2,500 + 2·M.
The model's assumption about food-ration cost is anchored in existing regional programs (public sources 2024–2026):
| Program | Annual spending | Beneficiaries | USD/student/year | Service type |
|---|---|---|---|---|
| Brazil PNAE (federal) | USD 976 M | 40 M (100% basic education) | ~24 | Snack |
| Colombia PAE | ~USD 1,250 M | 5.6 M (80% coverage) | ~225 | Complement / lunch |
| Proposal (model) | — | — | 270 | Universal lunch |
| Chile JUNAEB PAE | USD 1,332 M | ~2 M | ~666 | Breakfast + lunch |
The model's assumption (USD 270 per student/year, equivalent to USD 1.50 per lunch over 180 days) falls between Colombian coverage and the Chilean standard: it is a verifiable regional market price, not an optimistic estimate.
Argentina's 2025 GDP was USD 680.7 billion; public-system enrollment reaches 8.12 million students (72% of the total). Consolidated educational spending sits around 3–4% of GDP; the legal target (Law 26,075, currently suspended in its application) is 6%. AUH pays approximately USD 100 monthly per minor to some 4.3 million children, already conditioned on school attendance —representing about 0.77% of GDP.
| Component | USD / year |
|---|---|
| Full measurement (Measurement Authority + grade 6–10 census + grade-11 exam with 50,000 jurors) | 99 M (0.014% of GDP) |
| Universal cafeteria (8.12 M students × USD 1.50 × 180 days) | 2,190 M |
| Attendance transfer — integrated into AUH (pre-existing program; the framework adds automatic daily verification) | ~0 |
| Grade-11 scholarships (40% of graduates × USD 2,000 average, with vesting) | 307 M |
| Intervention (escalation ladder + teacher retention bonus + infrastructure fund) | ~580 M |
| Technology (annual opex) | 20 M |
| TOTAL — Recommended scenario | ~2,900 M (0.43% of GDP) |
The recommended scenario costs less than the existing AUH, equals 13.5% of current consolidated educational spending, and fits within approximately one fifth of the gap toward the 6%-of-GDP target that Argentine law already mandates. It requires no new taxes: it requires traversing part of the legal path already laid out.
For a country that already has a school-attendance-conditioned cash transfer and a school feeding program (the situation of most of the region: Argentina, Brazil, Chile, Colombia, Mexico), adopting the full framework costs between 0.2% and 0.3% of GDP. For a country that adopts the framework starting from scratch on these two components, the range is 0.7%–0.8% of GDP. The pure measurement-intervention-scholarship system —the model's "intelligence", without the meals or the transfer— costs 0.10%–0.15% of GDP in any country. What is expensive in the framework is feeding children and compensating families against child labor: two policies that the region is already largely paying for.
Honest scope: the proposed framework is the information, welfare and intervention layer of the educational system. It is not a pedagogical reform. Curriculum, initial teacher training, teacher salaries, early childhood, and class size —the educational engine in the strict sense— are deliberately out of scope and require their own policies, with their own justification and their own costing. Measuring better does not, by itself, teach better. This framework exists so that pedagogical policies, when they appear, can be tested with evidence and scaled if they work, rather than adopted by conviction and discarded without measurement.
The present version is an architecture document. The following fine-design items are deliberately left open:
This framework was built iteratively over the months preceding the date of publication, through a process of systematic adversarial critique: each component was proposed, attacked in search of its structural failure, demolished when that failure appeared, and replaced by the design that survived the critique. Early versions contained incentive tournaments, performance-based school funding, individual prizes for test scores, blockchain smart contracts, and other mechanisms that the comparative literature has shown to be problematic. Each was preserved in the working log with its reason for discard, until reaching the current design.
The ideas that survived iterated critique compose this document: the thermometer/reward separation, citizen custody of the high-stakes exam, stratified sampling with self-correcting property, the conditional cash transfer as instrument against child labor, randomized pilots at the school level, and the operational principle that money only touches verifiable inputs. The full iteration log is preserved as a separate working document, available upon request.
The author explicitly invites technical critique of this document by specialists in educational policy, economics of education, psychometrics, social policy, and Latin American public administration. Contact information and updated versions are available at migueleonardortiz.com.ar.
The quantitative data used in this document come from the following public sources, consulted in June 2026: