How AI reads Slate without ever seeing a student's name
A technical walk-through of the no-PII architecture that lets CeliaConnect analyze Slate data without student names, emails, or addresses leaving Slate.
Every enrollment-management AI product I have reviewed has the same structural weakness: to do its analysis, it must see student PII. Names. Emails. Phone numbers. Sometimes transcripts and essays. The vendor will describe their security posture in great detail — encryption at rest, encryption in transit, SOC 2 underway, a signed DPA — and all of it is true, and none of it addresses the real concern, which is that the AI needs to read your students’ personal information in order to work.
CeliaConnect is built differently. Our AI is architecturally prohibited from receiving student PII. Not in the sense of “we promise we don’t log it” or “we scrub it before storage.” In the sense that the PII is never requested from Slate in the first place. If a regulator, an auditor, or a concerned parent asks what data the AI analyzed to produce a score, the honest answer is: anonymous student IDs and behavioral metadata. Nothing else.
This article is the technical walk-through. If you are responsible for security review at an institution considering CeliaConnect, this is the diagram and the controls you need to review.
The boundary
There is a clear line between what exists inside your Slate instance and what exists in the CeliaConnect edge. The Slate side holds every identifying attribute: name, email, address, phone, date of birth, parent contact information, financial aid details, essay text, transcript data. The CeliaConnect side holds none of that.
What CeliaConnect does hold, per institution:
- An anonymous Slate student ID (a UUID or integer from Slate’s internal table)
- Behavioral signals: login timestamps, email-open events, form-interaction events, portal activity
- Milestone states: application stage, checklist completeness, document submission status, days in stage
- Demographic categories at a bucket level: first-gen flag, in-state versus out-of-state, intended program
- Cohort and program codes (not names — codes)
- Institution-provided baselines: median days per stage, historical yield rates by cohort
- A per-institution data dictionary mapping Slate field names and codes to semantic meaning
That is the entire input to Celia’s reasoning. Everything else — literally everything that could be used to identify a student — stays in your Slate instance.
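As a rough illustration of the list above, the per-student record could be modeled as follows. This is a minimal sketch: the class name, every field name, and the `IDENTIFYING_FIELDS` list are hypothetical and chosen for illustration, not taken from the CeliaConnect schema.

```python
from dataclasses import dataclass, fields
from typing import Optional, Tuple


@dataclass(frozen=True)
class StudentRecord:
    """Hypothetical shape of what is held for one student (illustrative names)."""
    slate_id: str                       # anonymous Slate student ID, stored as text
    login_timestamps: Tuple[str, ...]   # behavioral signal, ISO 8601 strings
    email_open_count: int               # behavioral signal
    application_stage: str              # milestone state, as a Slate code
    checklist_complete: bool            # milestone state
    days_in_stage: int                  # milestone state
    first_gen: Optional[bool]           # bucket-level demographic flag
    in_state: Optional[bool]            # bucket-level demographic flag
    program_code: str                   # a code, not a program name
    cohort_code: str                    # a code, not a cohort name


# Assumed list of identifying column names that should never appear in the record.
IDENTIFYING_FIELDS = frozenset(
    {"first_name", "last_name", "email", "phone", "address", "dob"}
)


def identifying_fields_in(record_type) -> set:
    """Return any identifying field names declared on the given dataclass."""
    return {f.name for f in fields(record_type)} & IDENTIFYING_FIELDS
```

A check like `identifying_fields_in(StudentRecord)` could run in a unit test so that adding an identifying column to the record type fails the build rather than reaching production.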
The five hops
From deposit-intent signal to score writeback, every piece of data crosses five boundaries. Each boundary has a preventive control (we do not pass PII) and a detective control (we fail closed if something slips through).
Hop 1: Slate to cc-slate-integration
Slate runs a Query that we define together during onboarding. The query returns one row per student, with columns we agree on in advance. It never selects first_name, last_name, email, phone, address, dob, or any field categorized as PII. The specific field list is pinned in your Slate Query service definition, so adding a PII field would require someone on your side to edit the Slate service itself and rerun the handshake.
When cc-slate-integration (the edge Worker that pulls from Slate) receives the response, it runs a PII-shape detector against every value. Email-shaped strings, phone-number patterns, and obvious name patterns cause the fetch to fail closed and alert. We would rather lose a sync than silently ingest identifying data.
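The fail-closed check described above can be sketched in a few lines. This is an illustration under assumptions, not the production detector: the function names, the exception type, and both regular expressions are hypothetical, and name detection is left out because it needs more than a regex.

```python
import re


class PIIDetected(Exception):
    """Raised to abort a sync when any value looks like PII."""


# Illustrative patterns only; a real detector needs a broader, tested set.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(
    r"(?<!\d)(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}(?!\d)"
)


def find_pii_shapes(value) -> list:
    """Return the labels of every PII shape found in a single value."""
    text = str(value)
    hits = []
    if EMAIL_RE.search(text):
        hits.append("email")
    if PHONE_RE.search(text):
        hits.append("phone")
    return hits


def validate_rows(rows: list) -> list:
    """Return the rows unchanged, or raise on the first PII-shaped value."""
    for row in rows:
        for column, value in row.items():
            hits = find_pii_shapes(value)
            if hits:
                # Fail closed: nothing from this fetch is ingested.
                raise PIIDetected(f"column {column!r} matched {hits}")
    return rows
```

One trade-off of this sketch is that the phone pattern also matches any bare ten-digit number, such as a long integer ID or an epoch timestamp. A detector that fails closed turns those false positives into lost syncs, so the patterns would need tuning against each institution's actual field formats.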
Hop 2: cc-slate-integration to per-tenant D1
Every CeliaConnect customer gets their own dedicated D1 database. Your institution’s data never shares a database with anyone else’s. Tenant isolation is architectural, not a row-level filter; cross-tenant queries are impossible because the D1 binding is scoped per request.
The data lands in a students table whose primary key is the Slate student ID. The raw Slate fields are stored as a JSON blob in a column called raw_fields. We do this because every institution configures Slate differently — the field names, the prompt codes, the cohort structure are all unique to you. A single canonical schema would either flatten your custom structure or fail to accommodate it, so we carry the shape you give us.
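A table of this shape can be sketched with Python's built-in sqlite3 module as a local stand-in, since D1 is built on SQLite. Only the `students` table name, the Slate ID primary key, and the `raw_fields` column come from the description above; the `synced_at` column and the helper function names are assumptions added for illustration.

```python
import json
import sqlite3
from typing import Optional


def open_tenant_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open one tenant's database and make sure the students table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS students (
            slate_id   TEXT PRIMARY KEY,  -- anonymous Slate student ID
            raw_fields TEXT NOT NULL,     -- institution-specific fields as JSON
            synced_at  TEXT NOT NULL      -- hypothetical bookkeeping column
        )
        """
    )
    return conn


def upsert_student(conn: sqlite3.Connection, slate_id: str,
                   raw_fields: dict, synced_at: str) -> None:
    """Insert a student, or overwrite the stored fields on a later sync."""
    conn.execute(
        """
        INSERT INTO students (slate_id, raw_fields, synced_at)
        VALUES (?, ?, ?)
        ON CONFLICT(slate_id) DO UPDATE SET
            raw_fields = excluded.raw_fields,
            synced_at  = excluded.synced_at
        """,
        (slate_id, json.dumps(raw_fields, sort_keys=True), synced_at),
    )
    conn.commit()


def load_raw_fields(conn: sqlite3.Connection, slate_id: str) -> Optional[dict]:
    """Return the stored fields for one student, or None if unknown."""
    row = conn.execute(
        "SELECT raw_fields FROM students WHERE slate_id = ?", (slate_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None
```

Storing the fields as JSON text keeps the table definition identical across tenants while the keys inside `raw_fields` vary per institution, which is the trade-off the paragraph above describes.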
Hop 3: cc-celia to Anthropic
When it is time to score a student, cc-celia assembles a prompt. The prompt contains:
- Your data dictionary (which tells Claude what each field code means at your institution)
- The student’s anonymous Slate ID
- The non-PII fields from raw_fields
- Your institutional baselines
- The relevant system prompt describing the analysis being requested
Before the prompt is sent to Anthropic, it passes through a PII scrubber that matches against known PII patterns (emails, phones, names, SSNs, DOBs). A match fails the call closed and raises an incident. We would rather miss an analysis than leak.
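The assembly and pre-send check might look roughly like the sketch below. Everything here is hypothetical: the section labels, the function and exception names, the two example patterns, and in particular the choice to drop any field that the data dictionary does not describe, which is an assumption of this sketch rather than documented behavior.

```python
import json
import re


class PromptBlocked(Exception):
    """Raised when an assembled prompt matches a PII pattern."""


# Two example patterns; the real scrubber is described as covering more.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn": re.compile(r"(?<!\d)\d{3}-\d{2}-\d{4}(?!\d)"),
}


def assemble_prompt(system_prompt: str, data_dictionary: dict, slate_id: str,
                    raw_fields: dict, baselines: dict) -> str:
    """Build the prompt text, then refuse to return it if it looks like PII."""
    # Assumption: only fields the dictionary describes are sent to the model.
    known = {k: v for k, v in raw_fields.items() if k in data_dictionary}
    prompt = "\n\n".join([
        system_prompt,
        "DATA DICTIONARY:\n" + json.dumps(data_dictionary, sort_keys=True),
        "STUDENT ID: " + str(slate_id),
        "FIELDS:\n" + json.dumps(known, sort_keys=True),
        "BASELINES:\n" + json.dumps(baselines, sort_keys=True),
    ])
    for label, pattern in PII_PATTERNS.items():
        if pattern.search(prompt):
            # Fail closed: the caller never receives a prompt to send.
            raise PromptBlocked(label)
    return prompt
```

Running the check on the fully assembled string, rather than on each field separately, means the data dictionary and baselines are scanned too, not only the student's fields.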
Anthropic’s API receives the scrubbed prompt. Under the terms of our enterprise agreement, Anthropic does not retain prompt data for training, does not log prompts beyond operational necessity, and does not share data with third parties. But even absent those terms, the prompt does not contain identifying information about any student; it contains an internal Slate ID, behavioral patterns, and institution-specific semantic context. An attacker who breached Anthropic’s prompt logs tomorrow would learn that a student with internal ID 4827193 has not logged in for fourteen days. They would not learn that student’s name, email, or any way to contact them.
Hop 4: Anthropic back to cc-celia
Claude’s response also passes through a return-path PII detector. If Claude produces output containing a PII pattern, such as a hallucinated name or a fabricated email, the response is discarded and regenerated. This has happened roughly once per ten thousand calls in our internal testing. It is one more layer of defense in depth.
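A discard-and-regenerate loop can be sketched as below. The model call is passed in as a plain callable so the example stays self-contained. The retry cap is an assumption of this sketch, since the text above does not say what happens after repeated failures, and the single email pattern stands in for the full detector.

```python
import re
from typing import Callable

# Stand-in for the return-path detector: one illustrative pattern.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class ResponseRejected(Exception):
    """Raised when every attempt produced PII-shaped output."""


def generate_clean(call_model: Callable[[], str], max_attempts: int = 3) -> str:
    """Call the model until a response passes the check, up to a cap."""
    for _ in range(max_attempts):
        response = call_model()
        if not EMAIL_RE.search(response):
            return response
        # Otherwise the response is dropped and never stored or returned.
    raise ResponseRejected(f"no clean response in {max_attempts} attempts")
```

For example, if a stubbed `call_model` returns a response containing an email address first and a clean one second, `generate_clean` returns the second and the first is never seen by the caller.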
Hop 5: cc-celia to Slate (writeback)
CeliaConnect writes scores back to Slate using a Source Format POST. The writeback targets a set of institution-defined fields — usually prefixed SS_CELIA_ or similar — and the write is authenticated by a Slate service user credential that we hold in envelope-encrypted KV storage.
Every writeback is audit-logged. The audit record captures the actor (always cc-celia for automated writes, an identified human for manual overrides), the timestamp, the target field, the before-value, and the after-value. The audit log is hashed into a chain every hour, and a hash anchor is published weekly to R2. If you ever want to prove that a particular writeback happened at a particular time and was not retroactively altered, the chain gives you that proof.
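To show why a hash chain makes retroactive edits detectable, here is a minimal sketch. It chains individual records for simplicity, whereas the description above chains hourly batches. The record layout, the SHA-256 choice, the all-zero starting value, and the function names are all assumptions for illustration.

```python
import hashlib
import json

# Assumed starting value for the first link in the chain.
GENESIS = "0" * 64


def entry_hash(prev_hash: str, record: dict) -> str:
    """Hash a record together with the hash of the entry before it."""
    payload = prev_hash + json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(records: list) -> list:
    """Return one hash per record; each depends on every earlier record."""
    hashes = []
    prev = GENESIS
    for record in records:
        prev = entry_hash(prev, record)
        hashes.append(prev)
    return hashes


def verify_chain(records: list, hashes: list) -> bool:
    """Recompute the chain and compare it with the stored hashes."""
    return len(records) == len(hashes) and build_chain(records) == hashes
```

Because each hash covers the previous one, changing any earlier record changes every later hash, including the last. Publishing that last hash somewhere outside the log, as the weekly anchor does, is what lets a third party detect the change.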
Credential handling
Slate service-user credentials do not live in D1. They live in Cloudflare KV, wrapped with envelope encryption: a per-tenant Data Encryption Key wrapped by a Cloudflare-held Key Encryption Key. In plain language: even if someone exfiltrated the entire KV store, they would have ciphertext for every tenant’s Slate credentials, with no way to unwrap any of it without also compromising a Cloudflare-held secret that never touches our code.
Rotation is available on-demand from the admin panel.
Why this matters for FERPA
FERPA covers the privacy of education records. An education record is generally understood to include any record maintained by or on behalf of the institution that contains personally identifiable information about a student.
CeliaConnect, by architecture, does not process personally identifiable information. The student cannot be identified from what Celia receives; there is no name, no email, no address, no DOB, no SSN, no photograph, no biometric, and no essay text. The only identifier in our possession is an internal Slate student ID, which is not meaningful outside your Slate instance.
This dramatically simplifies FERPA posture. Most institutions that have reviewed CeliaConnect have concluded that the product does not trigger the vendor-education-record flow that would otherwise require a school-official designation, directory-information analysis, and a DPA covering education records. What we do require is the narrow DPA covering the behavioral metadata and cohort data we process — which is materially simpler, and which most counsel can sign off on without a board review.
Why this matters for breach blast radius
If CeliaConnect were compromised tomorrow, the worst-case data exposure is: anonymous Slate IDs and behavioral metadata for students at breached institutions. An attacker could not contact those students, could not identify them, could not publish their names, could not do anything that materially harms them.
This is not true of AI vendors that process student names, emails, and essays. A breach at one of those vendors exposes the actual students to actual harm. Compare the two risk profiles when your CISO runs diligence.
What we cannot do (and why that is okay)
The obvious question: what analysis is CeliaConnect giving up by not seeing PII?
The honest answer is: almost nothing that matters.
The signals that predict melt, readiness, engagement, and yield are behavioral. They are about what the student does, not who the student is. A student who has not logged into the portal in fourteen days is at risk regardless of their name. A student whose application stage has not advanced in three weeks is signaling friction regardless of their email address. The predictive power of these signals is in the pattern, not the identity.
What we cannot do is generate personalized content that addresses a student by name. We cannot write “Dear Jessica, we noticed you have not logged in.” That kind of content has to be composed and sent by Slate or by your counselors using Slate’s data. CeliaConnect’s role is to tell them which students to compose it for, and why.
That turns out to be the valuable role.
What is in the security whitepaper
The full security whitepaper, available on request, covers:
- The detailed PII scrubber implementation and its test corpus
- Key management architecture including rotation procedures
- Audit chain construction and independent verification procedure
- Incident response plan including customer notification timelines
- Sub-processor list and data-flow diagram
- DPA template
- Third-party review summary (when SOC 2 completes)
If you are inside an institution evaluating CeliaConnect and you need this level of detail for IT or legal review, please request the whitepaper on our security page.