ADR-001: Multi-Tenant Architecture
Status: Accepted (control-plane); partially superseded for the data-plane by ADR-023 Date: 2026-04-28 Updated: 2026-05-16 (ADR-023 introduced data-plane separation)
Context
TalkIDE is a multi-tenant SaaS platform. Each user owns their own projects and generated apps. Data isolation is a core security requirement at two distinct layers:
- Control-plane data (TalkIDE platform DB): users, tenants, projects, conversations, billing, activity logs. Lives in cluster A (talkide-prod-pg, DO Managed PG 18).
- Data-plane data (user-app runtime DB): each generated user app has its own per-app schema for application data. Lives in cluster B (talkide-dataplane-pg).
These two layers have physically separate connection budgets and pooler topologies.
Decision
- Control-plane multi-tenancy: a tenant_id column on every tenant-scoped entity in the TalkIDE platform DB. Every JPA query filters by the tenantId extracted from the JWT token. Single shared schema, with row-level isolation enforced at the application layer.
- Data-plane multi-tenancy: schema-per-app in a shared data-plane DB on cluster B. Each generated app gets a dedicated PG role, a schema, and a per-role ALTER ROLE SET search_path. REVOKE on cross-schema access enforces isolation at the DB layer. See ADR-023 for the full mechanism (self-hosted PgBouncer with auth_query, SCRAM-SHA-256 verifier in Kotlin, dataplane_auth credential table, etc.).
The control-plane tenant_id approach is unchanged; ADR-023 only adds the data-plane
layer (which did not exist when this ADR was first written).
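The data-plane provisioning step can be sketched as a small statement builder. This is only an illustration under stated assumptions: the helper name provisioningSql and the app_<id> naming scheme are hypothetical, credentials are left out entirely, and the exact REVOKE statements used in production are defined in ADR-023, not here.

```typescript
// Hypothetical helper: builds the provisioning statements for one generated app.
// The "app_<id>" naming scheme is an assumption; ADR-023 defines the real mechanism.
function provisioningSql(appId: number): string[] {
  if (!Number.isInteger(appId) || appId <= 0) {
    throw new Error(`invalid app id: ${appId}`);
  }
  const name = `app_${appId}`; // used as both the role name and the schema name
  return [
    // Dedicated login role for the app (password / SCRAM setup omitted).
    `CREATE ROLE "${name}" LOGIN`,
    // Dedicated schema owned by that role.
    `CREATE SCHEMA "${name}" AUTHORIZATION "${name}"`,
    // Role-level default, applied whenever this role opens a session.
    `ALTER ROLE "${name}" SET search_path = "${name}"`,
    // Make sure no other role inherits access to the schema through PUBLIC.
    `REVOKE ALL ON SCHEMA "${name}" FROM PUBLIC`,
  ];
}
```

Because the search_path is stored on the role rather than set per session, it does not depend on any SET statement issued by the client, which is what lets it coexist with a transaction-mode pooler.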
Consequences
- Control-plane: simple row-level isolation, suitable for the platform metadata scale (thousands of tenants × millions of rows). Cross-tenant access prevented by the backend on every request.
- Data-plane: per-schema isolation gives DB-level enforcement (roles with revoked access cannot query across schemas). The per-app search_path survives transaction-mode pooling because it is a role-level default. Backup granularity is pg_dump -n <schema> per app.
- Connection budget separation: control-plane and data-plane never compete for slots, because each cluster has its own pool budget. This solves the legacy ADR-016 problem of per-app HikariCP pools exhausting the platform’s shared 25-slot DO Basic cap.
ADR-002: Claude Agent Runtime for Vibecoding
Status: Accepted — superseded for production by ADR-024 (LIVE since 2026-05-21) Date: 2026-04-28 Updated: 2026-05-22 (ADR-024 cut-over)
Context
Vibecoding requires orchestrating multiple AI coding agents (backend developer, frontend developer, devops, tester, etc.) to build a functional web application from a natural language description. This requires a powerful agentic framework with tool access (file system, shell, etc.).
Decision (current)
TalkIDE uses two MaraExecutor implementations selectable per environment:
- ClaudeCliExecutor: local-dev path. Spawns the claude CLI as a child process per conversation (Max plan auth, no ANTHROPIC_API_KEY needed). Used by developers running TalkIDE locally.
- NetworkWorkerExecutor: production / cloud path (default since 2026-05-21). Talks over HTTP+SSE to a dedicated talkide-worker pod in the {tenant}-{env} K8s namespace. The worker runs the Anthropic Agent SDK in-process and calls Anthropic via a control-plane gateway-proxy (the worker never holds the raw ANTHROPIC_API_KEY). See worker-runtime.md and ADR-024.
The legacy in-process AgentSidecarExecutor (a Node child process started via ProcessBuilder with an NDJSON stdin/stdout pipe, living inside the BE pod) was deleted, not migrated, by the ADR-024 cut-over commit 70fd510.
Consequences
- Production runs cloud-side: user laptops are no longer required to run agents. Worker pods scale per tenant-env and survive BE redeploys (3-week session resume via NFS CLAUDE_CONFIG_DIR).
- Local-dev keeps CLI: Max plan billing covers the developer without per-token cost.
- Secret isolation: ANTHROPIC_API_KEY lives only in the control-plane BE pod. Compromising a tenant-env worker pod cannot leak the key.
ADR-003: Project Versioning Model
Status: Accepted Date: 2026-04-28
Context
Users should be able to iterate on their app through conversation and apply changes selectively. They need a concept of “versions” so they can see what changed and roll back if needed.
Decision
Every time a user completes a vibecoding conversation and asks the PM to “apply” the changes,
a new ProjectVersion record is created with an incrementing versionNumber. The version
captures a snapshot description of what changed. Applying a version triggers a rebuild and
restart of the user’s project. Only one version can be APPLIED at a time; applying a new version
automatically supersedes the previous one.
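The "one APPLIED version at a time" rule can be sketched as a pair of pure functions. The field names follow the text above, but the SUPERSEDED status name and both function names are assumptions made for illustration, and the rebuild/restart side effect is deliberately left out.

```typescript
// Only APPLIED is named in the ADR; SUPERSEDED is an assumed name for "no longer applied".
type VersionStatus = "APPLIED" | "SUPERSEDED";

interface ProjectVersion {
  versionNumber: number;
  description: string; // snapshot description of what changed
  status: VersionStatus;
}

// Creates the next version and applies it, superseding whatever was applied before.
function applyNewVersion(history: ProjectVersion[], description: string): ProjectVersion[] {
  const next = history.reduce((max, v) => Math.max(max, v.versionNumber), 0) + 1;
  const superseded = history.map((v): ProjectVersion => ({ ...v, status: "SUPERSEDED" }));
  return [...superseded, { versionNumber: next, description, status: "APPLIED" }];
}

// Rollback: re-apply an existing version; every other version ends up SUPERSEDED.
function reapplyVersion(history: ProjectVersion[], versionNumber: number): ProjectVersion[] {
  if (!history.some((v) => v.versionNumber === versionNumber)) {
    throw new Error(`unknown version ${versionNumber}`);
  }
  return history.map((v): ProjectVersion => ({
    ...v,
    status: v.versionNumber === versionNumber ? "APPLIED" : "SUPERSEDED",
  }));
}
```

Both functions return a new array instead of mutating the history, which keeps the invariant (exactly one APPLIED entry once any version exists) easy to check in a test.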
Consequences
Simple linear versioning model. Users get a clear history of their app’s evolution. Rollback is supported by re-applying a previous version. No git-level branching in MVP.
ADR-004: JWT Authentication with Email/Password
Status: Accepted Date: 2026-04-28
Context
TalkIDE needs stateless authentication for a multi-tenant SaaS application.
Decision
- JWT access tokens with short expiration (15 min) + refresh tokens (14 days)
- Tokens stored in localStorage on the frontend
- Axios interceptor handles automatic token refresh on 401
- Spring Security filter validates JWT on each request
- Email/password authentication only for MVP (no OAuth)
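The refresh-on-401 behaviour can be sketched independently of any HTTP library. The following is not the Axios interceptor API; it is a transport-agnostic illustration in which createAuthorizedSender, HttpResponse, and the refresh callback are all hypothetical names, and token storage is reduced to a local variable.

```typescript
interface HttpResponse {
  status: number;
  body: string;
}

// A request is modelled as a function that receives the current access token.
type Send = (accessToken: string) => Promise<HttpResponse>;

function createAuthorizedSender(initialToken: string, refresh: () => Promise<string>) {
  let accessToken = initialToken;
  let inFlight: Promise<string> | null = null;

  // Concurrent 401s share one refresh call instead of each starting their own.
  function refreshOnce(): Promise<string> {
    if (inFlight === null) {
      inFlight = refresh().finally(() => {
        inFlight = null;
      });
    }
    return inFlight;
  }

  return async function send(request: Send): Promise<HttpResponse> {
    const first = await request(accessToken);
    if (first.status !== 401) return first;
    accessToken = await refreshOnce();
    // Retried exactly once; a second 401 is returned to the caller as-is.
    return request(accessToken);
  };
}
```

Retrying only once avoids a refresh loop when the refresh token itself has expired; in that case the caller sees the 401 and can redirect to login.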
Consequences
Stateless backend, scalable. Token refresh is transparent to the user. No SSO or social login in MVP.
ADR-005: Feature-Based Package Structure
Status: Accepted Date: 2026-04-28
Context
Need a clear way to organize code that scales with the number of features.
Decision
Both backend and frontend use feature-based package structure:
- Backend: features/<feature>/api/, features/<feature>/domain/, features/<feature>/data/
- Frontend: screens/<feature>/ with components, model, i18n subdirectories
Consequences
Each feature is self-contained. Cross-feature dependencies go through common/.
ADR-006: UseCase Bean Pattern
Status: Accepted Date: 2026-04-28
Context
Need a pattern for business logic that is testable and follows single responsibility.
Decision
Each business operation is a Spring @Service bean named <Action><Entity>UseCase
(e.g., CreateProjectUseCase). UseCase beans contain business logic, call repositories,
and return domain models. Controllers map HTTP requests to UseCase calls and convert results
to DTOs.
Consequences
Clear separation of concerns. Easy to test. One class per operation.
ADR-007: Design System with CSS Custom Properties
Status: Accepted Date: 2026-04-29
Context
TalkIDE needs a consistent visual language across all screens. The design handoff from Claude Design defined a warm charcoal palette with oklch accent colors, specific typography, and reusable component patterns.
Decision
Design tokens are implemented as CSS custom properties (:root level) and registered as
Tailwind CSS 4 theme values via @theme. This allows using tokens both as var(--bg-1) in
custom styles and as Tailwind classes (bg-bg-1). Shared UI components (TLogo, TAvatar, TPill,
TTopBar, etc.) are Vue SFCs using these tokens. Icons use Lucide Vue Next (stroke-based,
24x24 viewBox).
Consequences
- Single source of truth for design tokens in main.css
- Consistent look across all screens including auth flows
- Components are composable and reusable
- oklch colors provide perceptually uniform accent palette
ADR-008: Server-Sent Events (SSE) Convention
Status: Accepted Date: 2026-05-01
Context
TalkIDE uses SSE for real-time push notifications in two distinct contexts:
- Per-project workspace activity feed (UC-05001) — scoped to a single project
- Per-tenant Studio recent activity feed (UC-06001) — aggregated across all projects of a tenant
A unified convention for event names, payload shape, auth, and reconnect strategy prevents divergence between streams.
Decision
Standard SSE Event Names
All TalkIDE SSE streams use the following event names:
| Event name | Purpose | Payload |
|---|---|---|
| connected | Server confirms the stream is established | {} |
| activity | A new activity event is available | Serialised DTO: ActivityDto (per-project) or StudioActivityDto (per-tenant) |
| heartbeat | Keep-alive ping emitted every 30 seconds | {} |
Authentication
Bearer token must be sent in the Authorization header.
The native browser EventSource API does not support custom headers, so TalkIDE uses a
fetch-based SSE parser on the frontend for all SSE streams.
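A fetch-based client has to split the text/event-stream body into events itself. The sketch below shows only that parsing step, under the assumption that the caller decodes the response body to text and feeds it in chunk by chunk; parseSse and SseEvent are hypothetical names, not the actual TalkIDE parser.

```typescript
interface SseEvent {
  event: string; // e.g. "connected", "activity", "heartbeat"
  data: string; // raw payload, typically JSON text
}

// Parses all complete events in the buffer and returns the unparsed tail,
// which the caller prepends to the next chunk read from the response body.
function parseSse(buffer: string): { events: SseEvent[]; rest: string } {
  const events: SseEvent[] = [];
  const blocks = buffer.replace(/\r\n/g, "\n").split("\n\n");
  const rest = blocks.pop() ?? ""; // last block may be incomplete
  for (const block of blocks) {
    let event = "message"; // default event name in the SSE format
    const data: string[] = [];
    for (const line of block.split("\n")) {
      if (line === "" || line.startsWith(":")) continue; // comment line
      const colon = line.indexOf(":");
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? "" : line.slice(colon + 1);
      if (value.startsWith(" ")) value = value.slice(1); // one optional space after the colon
      if (field === "event") event = value;
      else if (field === "data") data.push(value);
    }
    if (data.length > 0) events.push({ event, data: data.join("\n") });
  }
  return { events, rest };
}
```

In use, the caller would open the stream with fetch and an Authorization: Bearer header, append each decoded chunk to rest, and call parseSse again, dispatching on the event name.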
Reconnect Strategy
Clients implement exponential backoff reconnect:
- Base delay: 1 second
- Maximum delay: 30 seconds
- Maximum attempts: ~10
- On reconnect, the client re-fetches the REST snapshot to fill any gap missed during disconnection.
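The delay schedule implied by these parameters can be written as a single function. The doubling factor, the 0-based attempt counter, and the absence of jitter are assumptions; the convention itself only fixes the base delay, the cap, and the approximate attempt limit.

```typescript
const BASE_DELAY_MS = 1000;
const MAX_DELAY_MS = 30000;
const MAX_ATTEMPTS = 10; // the convention says "~10"; the exact value is an assumption

// Returns the wait before reconnect attempt `attempt` (0-based),
// or null once the client should give up.
function reconnectDelayMs(attempt: number): number | null {
  if (attempt >= MAX_ATTEMPTS) return null;
  return Math.min(BASE_DELAY_MS * 2 ** attempt, MAX_DELAY_MS);
}
```

With these values the delays run 1 s, 2 s, 4 s, 8 s, 16 s, and then stay at the 30 s cap until the attempt limit is reached.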
Endpoints Using This Convention
| Stream | Endpoint | UC |
|---|---|---|
| Workspace activity (per-project) | GET /api/v1/projects/{projectId}/activities/stream | UC-05001 |
| Studio recent activity (per-tenant) | GET /api/v1/studio/recent-activity/stream | UC-06001 |
Consequences
- Consistent event names across all current and future SSE streams
- FE SSE parser is shared infrastructure (no stream-specific client code needed)
- Adding a new SSE stream requires only a new endpoint and DTO; the convention stays stable
ADR-009: Server-side Activity Feed Deduplication
Status: Accepted Date: 2026-05-01
Context
The activity feed can contain long consecutive runs of identical tool calls — for example, an agent
reading 20 files in a row produces 20 TOOL_USE rows with tool_name=Read. Displaying all of them
unfiltered would flood the UI with noise and make it hard to follow what is actually happening.
An early prototype placed the dedup logic in the frontend (Pinia computed property). This created a split-brain situation: the business rule “what counts as a duplicate” lived in the UI layer, the backend stored raw events without any concept of categories, and the per-tenant Studio feed and the per-project workspace feed each needed their own separate dedup implementation.
Decision
Deduplication is performed entirely on the backend:
- ToolCategory enum: a new business-level categorisation of tool names with values READING | EDITING | EXECUTING | DELEGATING | BROWSING | OTHER. Stored as tool_category (VARCHAR column) on the activities table. Set at insert time by RecordActivityUseCase.recordToolUse via a deterministic mapping from tool_name. Non-tool events (TASK_STARTED, TASK_COMPLETED, AGENT_MESSAGE) store null.
- Consecutive coalesce: the repository query uses the window function LAG() over (id DESC) to detect runs of consecutive rows sharing the same (tool_category, agent_role, parent_activity_id) key and collapses each run into a single representative row (the first one in the run). This is the standard SQL “gaps and islands” pattern.
- First-page only: dedup is applied only when no afterId cursor is provided (i.e. the initial/first page of results). “Load more” requests (with afterId) receive raw, undeduped events so that the full historical record remains accessible.
- DTO exposure: ActivityResponse (workspace) and StudioActivityDto (Studio) both expose the toolCategory field so the frontend can render a human-readable category label (e.g. “Reading files”) instead of a raw tool name.
- FE = pure renderer: the frontend applies no dedup logic of its own. It renders whatever the backend returns.
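The coalescing rule can be illustrated in memory, with the previous row in the loop playing the role of LAG(). This is a sketch, not the repository query: the function names are hypothetical, rows with a null category are assumed never to collapse (mirroring SQL NULL comparison), the page limit is assumed to apply after collapsing, and "load more" is assumed to mean rows with an id below afterId.

```typescript
type ToolCategory = "READING" | "EDITING" | "EXECUTING" | "DELEGATING" | "BROWSING" | "OTHER";

interface Activity {
  id: number;
  toolCategory: ToolCategory | null; // null for non-tool events
  agentRole: string;
  parentActivityId: number | null;
}

// True when `row` continues the run that `previous` belongs to.
function sameRun(previous: Activity, row: Activity): boolean {
  return (
    row.toolCategory !== null && // assumption: non-tool events are never collapsed
    row.toolCategory === previous.toolCategory &&
    row.agentRole === previous.agentRole &&
    row.parentActivityId === previous.parentActivityId
  );
}

// Input must be ordered by id DESC; keeps the first row of every run.
function coalesceRuns(rowsNewestFirst: Activity[]): Activity[] {
  const result: Activity[] = [];
  let previous: Activity | null = null; // the LAG() value for the current row
  for (const row of rowsNewestFirst) {
    if (previous === null || !sameRun(previous, row)) result.push(row);
    previous = row;
  }
  return result;
}

// First page (no cursor) is deduplicated; "load more" pages are returned raw.
function fetchPage(rowsNewestFirst: Activity[], limit: number, afterId?: number): Activity[] {
  if (afterId !== undefined) {
    return rowsNewestFirst.filter((row) => row.id < afterId).slice(0, limit);
  }
  return coalesceRuns(rowsNewestFirst).slice(0, limit);
}
```

For five rows READING, READING, EDITING, READING, READING from the same agent and parent, the first page keeps three rows: the newest READING, the EDITING row, and the first row of the older READING run.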
Consequences
- Single source of truth: the dedup rule (consecutive same-category / same-agent / same-parent → one row) is defined and enforced exactly once, in the backend repository query.
- Consistency: both the per-project Workspace feed (UC-05001) and the per-tenant Studio feed (UC-06001) use the same dedup logic automatically.
- Default page limit 10 (Studio) guarantees approximately 10 visible rows on the initial snapshot; exact count depends on how varied the incoming event stream is.
- FE simplification: the frontend no longer needs computed dedup properties in the store; it is a pure rendering layer.
- DB requirement: the LAG() window function requires a SQL engine that supports it. PostgreSQL 9.0+ supports it natively. H2 (used in tests) supports window functions in its later 1.4.x releases (1.4.198 and later) and in 2.x, so the version managed by the Spring Boot BOM is sufficient; no test-specific workaround is needed.
Alternatives Considered
| Alternative | Reason rejected |
|---|---|
| FE-side dedup (Pinia computed) | Business rule in the UI layer; duplicated logic across workspace and Studio feeds; harder to test |
| BE dedup on raw tool_name (no category) | Read and Glob are both “reading”, so grouping by raw name would miss tools that serve the same function; also more fragile to tool name changes |
| Additional DB columns for a pre-shortened list | Over-engineering; adds write-time complexity without significant read-time benefit over the window function approach |
ADR-010: Auto-save with Optimistic Update for Preference Toggles
Status: Accepted Date: 2026-05-01
Context
The Sound Preferences section (UC-01007) and the Language picker in Account (UC-01006) contain low-stakes settings that users expect to take effect immediately — an explicit “Save” button adds friction without benefit. At the same time, the app must remain consistent if the API call fails.
Decision
- Auto-save on change: every toggle flip or dropdown selection fires PUT /me/sound-preferences (or PUT /me/locale) immediately, without a save button.
- Debounce for slider: the master volume range slider debounces API calls by 300 ms to avoid flooding the backend while the user drags.
- Optimistic update: FE applies the change to local state before the API response arrives. If the call returns a non-2xx status, FE rolls back the local state to the previous value and displays an error toast ("Could not save. Please try again.").
- Granular endpoints (/me/locale and /me/sound-preferences instead of a single /me/preferences): locale and sound settings are changed from different UI sections with different save triggers. A unified endpoint would force full-object partial-update validation and complicate rollback (which field failed?). Separate endpoints keep validation and rollback scoped to exactly what changed.
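The optimistic-update-with-rollback flow can be sketched as one generic helper. The name saveOptimistically, the plain { value } state holder (standing in for a Vue ref), and the treatment of network errors as failures are assumptions made for illustration; debouncing is not shown.

```typescript
async function saveOptimistically<T>(
  state: { value: T }, // stand-in for a reactive ref
  next: T,
  put: (value: T) => Promise<{ status: number }>,
  showToast: (message: string) => void,
): Promise<boolean> {
  const previous = state.value; // snapshot captured before the optimistic update
  state.value = next; // the UI reflects the change before the response arrives
  let ok = false;
  try {
    const response = await put(next);
    ok = response.status >= 200 && response.status < 300;
  } catch {
    ok = false; // assumption: a network failure is handled like a non-2xx response
  }
  if (!ok) {
    state.value = previous; // roll back to the snapshot
    showToast("Could not save. Please try again.");
  }
  return ok;
}
```

Because each endpoint accepts the full preference object for its section, the put callback can simply be retried with the same value after a failure.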
Consequences
- Snappy UX — changes feel instant; no form submission ceremony.
- Each auto-save endpoint is idempotent (PUT with full preference object); safe to retry.
- FE must maintain a “previous value” snapshot for rollback; implemented as a ref captured before the optimistic update.
- No conflict resolution needed in MVP (single session per user assumed).
ADR-011: Per-Project Plugin & Structured Config
Status: Accepted Date: 2026-05-03
Context
The original architecture shared a single global Mara team plugin (talkide-be/plugin/) across all generated projects. Mara received per-project configuration (ports, URLs) as text in CLAUDE.md, along the lines of “BE port: 8097, do not fall back to 9090”.
In practice this textual instruction failed: on the command kill-port.sh 9090, Mara's devops agent took down the TalkIDE platform itself (the BE running on 9090), inside which Mara was working at that moment. The textual instruction did not carry enough authority against a shell command (the SIGKILL incident, pekarna-u-jelena vs. TalkIDE).
The second problem: the global plugin could not be customised per project (for example, a user wants a different communication tone from Mara for their project). Any change affected all projects.
Decision
Move to a per-project plugin and structured config:
- <project>/.talkide/plugin/: an rsync copy of the global plugin, refreshed on project creation (eager) and before every Claude CLI spawn (lazy guard, idempotent). The plugin is passed to the Claude CLI via --plugin-dir <project>/.talkide/plugin.
- <project>/.talkide/project.yml: structured YAML with the project identity, tech stack, URLs, and ports. Plugin scripts read ports exclusively from this file via yq, never from arguments or from the text of CLAUDE.md.
- Ports derived deterministically from project.id: BE = 8090 + id, FE = 5200 + id. The DB sequence for id starts at 1, so BE port ≥ 8091 and FE port ≥ 5201; a collision with the platform ports 9090/5200 is structurally impossible.
- .talkide/team/: placeholder directory for future per-project user overrides of agents (MVP: an empty .gitkeep, no logic).
- CLAUDE.md: concrete port values are dropped. In their place is a static pointer: “See .talkide/project.yml for ports, URLs, and configuration.”
- .project-config.yml deprecated: merged into .talkide/project.yml (privacy boundary preserved: BE-only sidecar, gitignored, MAY contain sensitive data).
Detailed spec: per-project-architecture.md.
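The port derivation can be sketched as a pure function of the project id. The function name derivePorts and the explicit collision guard are assumptions that go beyond the decision text; the guard is included because, taken literally, the formula 8090 + id would yield 9090 for id = 1000.

```typescript
// Platform BE and FE ports named in this ADR.
const PLATFORM_PORTS = [9090, 5200];

function derivePorts(projectId: number): { be: number; fe: number } {
  // The DB sequence starts at 1, so anything below that is not a valid project id.
  if (!Number.isInteger(projectId) || projectId < 1) {
    throw new Error(`invalid project id: ${projectId}`);
  }
  const ports = { be: 8090 + projectId, fe: 5200 + projectId };
  // Extra guard (assumption, not part of the ADR): reject ids that would land on a platform port.
  if (PLATFORM_PORTS.includes(ports.be) || PLATFORM_PORTS.includes(ports.fe)) {
    throw new Error(`project ${projectId} maps onto a platform port`);
  }
  return ports;
}
```

For project id 7 this gives BE port 8097 and FE port 5207. In the real setup the values would be written to .talkide/project.yml once and read from there by the plugin scripts.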
Consequences
- Structural safety: a collision with the platform ports is ruled out at the level of the data model, not merely by a textual instruction to the agent.
- Per-project isolation: every project has its own copy of the plugin, ready for future per-project agent overrides (.talkide/team/).
- Bug fixes in the plugin propagate automatically: the lazy-guard rsync overwrites each per-project copy at that project's next spawn.
- Larger disk footprint: every project carries a copy of the plugin (on the order of hundreds of kB). Acceptable for the expected number of projects per user (a few to a few dozen).
- Dependency on yq in plugin scripts: a mandatory system tool with no fallback. A script exits with an error if it is missing.
- Migration of existing projects: none. Existing output-projects/* are wiped (nobody uses them in production).
Continued in adr/
ADRs 012+ live as separate files in documentation/adr/. For orientation, the key newer ADRs and their relationship to the inline ADRs above:
| ADR | Topic | Relationship |
|---|---|---|
| ADR-013 | Git versioning per project | Specialises ADR-003 (project versioning) to a concrete Git layout |
| ADR-014 | fabric8 K8s client in BE | Foundation for everything below (ADR-015, 017, 019, 024) |
| ADR-015 | Namespace-per-tenant-env | Substrate for worker extraction and hosting (see hosting-architecture.md) |
| ADR-019 | Kaniko Job pattern | Extension to gradle/test Jobs is planned in ADR-024 |
| ADR-021 | Dynamic ingress per project | Implements <slug>.talkide.app routing |
| ADR-022 | Publish workflow (DRAFT → LIVE) | Defines the project lifecycle stack: dev preview vs. published prod; for the FE flow see flows/version-flow.md |
| ADR-023 | Schema-per-app data-plane | Changes ADR-001 semantics for the user-app DB; see the updated ADR-001 above and hosting-architecture.md |
| ADR-024 | talkide-worker extraction | Changes ADR-002 semantics for production; see the updated ADR-002 above, worker-runtime.md, worker-production.md |
| ADR-025 | Mailgun transactional email | Provider abstraction + audit log; for the spec see transactional-email.md |
| ADR-026 | Environment first-class | DEFAULT + USER_CREATED environments; every project belongs to an environment; tenant-env = the unit of K8s namespace + ResourceQuota + billing (UC-10010–UC-10015) |
Add future ADRs to adr/ for better granularity. If a newer ADR changes one of the inline ADR-001..011, update the inline ADR with a “Superseded by” link and a brief update; do not tear down the older content, and keep bidirectional traceability.