Foundation-model governance pathways: from preference models to operative rules
Agustin V. Startari.
AI Power and Discourse, vol. 1, no. 1, 2025, pp. 1-10.
Stable URL:
https://www.aacademica.org/agustin.v.startari/223
Abstract
Current research on foundation model alignment concentrates on preference optimization and reward model design, yet it does not explain how these mechanisms become enforceable linguistic structures in model outputs. This paper introduces a formal bridge between training choices and governance-level effects by defining the operative rule as a compiled constraint that determines which clause types a model may produce. The framework maps policy inputs such as statutes, institutional directives, and redline restrictions into a preference graph over clause types, then compiles those directives into executable constraints that control decoding. It proposes measurable clause-level metrics including coverage, leakage, authority-bearing density, and constraint satisfaction, together with an auditable chain of custody that links governance inputs to observable textual outcomes. Cross-domain simulations in healthcare, securities disclosure, and administrative reporting demonstrate how governance parameters can be enforced without access to proprietary weights. The result is a verifiable clause calculus that operationalizes accountability and replaces abstract alignment narratives with testable governance artifacts connecting preference models to the operative law embedded in generated text.
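As an illustration only, the pipeline described in the abstract (directives compiled into a constraint over clause types, then measured at clause level) might be sketched as follows. Everything here is an assumption made for the example and is not taken from the paper: the clause-type names, the "require" / "permit" / "forbid" vocabulary, the `OperativeRule` and `compile_rule` names, and the definitions of coverage and leakage. The sketch checks already-labeled output after generation; it does not build a preference graph or intervene in decoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperativeRule:
    """Hypothetical compiled constraint over clause types."""
    allowed: frozenset   # clause types the output may contain
    required: frozenset  # clause types the output must contain


def compile_rule(directives):
    """Compile (clause_type, verdict) pairs into an OperativeRule.

    Assumed verdict vocabulary: "require", "permit", "forbid".
    A "forbid" overrides any "require" or "permit" on the same type.
    """
    allowed, required, forbidden = set(), set(), set()
    for clause_type, verdict in directives:
        if verdict == "forbid":
            forbidden.add(clause_type)
        elif verdict == "require":
            required.add(clause_type)
            allowed.add(clause_type)
        elif verdict == "permit":
            allowed.add(clause_type)
        else:
            raise ValueError(f"unknown verdict: {verdict!r}")
    return OperativeRule(
        allowed=frozenset(allowed - forbidden),
        required=frozenset(required - forbidden),
    )


def coverage(rule, clause_types):
    """Assumed definition: share of required types that appear at least once."""
    if not rule.required:
        return 1.0
    return len(rule.required & set(clause_types)) / len(rule.required)


def leakage(rule, clause_types):
    """Assumed definition: share of emitted clauses whose type is not allowed."""
    if not clause_types:
        return 0.0
    leaked = sum(1 for t in clause_types if t not in rule.allowed)
    return leaked / len(clause_types)


# Hypothetical securities-disclosure directives and a labeled model output.
directives = [
    ("risk_disclosure", "require"),
    ("factual_statement", "permit"),
    ("forward_looking_guarantee", "forbid"),
]
rule = compile_rule(directives)
output = [
    "factual_statement",
    "risk_disclosure",
    "forward_looking_guarantee",
    "factual_statement",
]
print(coverage(rule, output))  # 1.0: the one required type is present
print(leakage(rule, output))   # 0.25: one of four clauses is forbidden
```

Because the check operates only on clause labels attached to generated text, it needs no access to model weights, which is consistent with the abstract's claim about enforcement without proprietary weights. How clauses are segmented and labeled is left open here.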
DOI
Primary archive: https://doi.org/10.5281/zenodo.17533075
Secondary archive: https://doi.org/10.6084/m9.figshare.30589940
SSRN: Pending assignment (ETA: Q4 2025)
Full text
External URL:
This work is licensed under a Creative Commons license.
To view a copy of this license, visit https://creativecommons.org/licenses/by-nc-nd/4.0/deed.es.
ARK:
Download
PDF
https://zenodo.org/records/17533075