Architecture Decisions -- TalkIDE Internal Docs

ADR-001: Multi-Tenant Architecture

Status: Accepted (control-plane); partially superseded for the data-plane by ADR-023
Date: 2026-04-28
Updated: 2026-05-16 (ADR-023 introduced data-plane separation)

Context

TalkIDE is a multi-tenant SaaS platform. Each user owns their own projects and generated apps. Data isolation is a core security requirement at two distinct layers:

Control-plane data (TalkIDE platform DB) — users, tenants, projects, conversations, billing, activity logs. Lives in cluster A (talkide-prod-pg, DO Managed PG 18).
Data-plane data (user-app runtime DB) — each generated user app has its own per-app schema for application data. Lives in cluster B (talkide-dataplane-pg).

These two layers have physically separate connection budgets and pooler topologies.

Decision

Control-plane multi-tenancy — tenant_id column on every tenant-scoped entity in the TalkIDE platform DB. Every JPA query filters by tenantId extracted from the JWT token. Single shared schema, row-level isolation enforced at the application layer.
Data-plane multi-tenancy — schema-per-app in a shared data-plane DB on cluster B. Each generated app gets a dedicated PG role + schema + per-role ALTER ROLE SET search_path. REVOKE on cross-schema access enforces isolation at the DB layer. See ADR-023 for the full mechanism (self-host PgBouncer with auth_query, SCRAM-SHA-256 verifier in Kotlin, dataplane_auth credential table, etc.).

The control-plane tenant_id approach is unchanged; ADR-023 only adds the data-plane layer (which did not exist when this ADR was first written).
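
The control-plane rule above (every query filtered by the tenantId taken from the JWT) can be illustrated with a minimal in-memory sketch. It is not the backend's JPA code: `JwtClaims`, `Project`, and `TenantScopedProjects` are hypothetical names, and the point is only that no read path exists without a tenant filter.

```typescript
// Hypothetical shapes for illustration; the real entities are JPA classes on the backend.
interface JwtClaims { sub: string; tenantId: string }
interface Project { id: number; tenantId: string; name: string }

// Every read goes through a tenant-scoped accessor; there is no unscoped query path.
class TenantScopedProjects {
  private readonly rows: readonly Project[];

  constructor(rows: readonly Project[]) {
    this.rows = rows;
  }

  // Equivalent in spirit to "WHERE tenant_id = :tenantId".
  findAll(claims: JwtClaims): Project[] {
    return this.rows.filter((p) => p.tenantId === claims.tenantId);
  }

  // A row owned by another tenant is treated as not found.
  findById(claims: JwtClaims, id: number): Project | undefined {
    return this.rows.find((p) => p.id === id && p.tenantId === claims.tenantId);
  }
}
```

Returning "not found" for a foreign row, rather than "forbidden", is a choice made for this sketch; the ADR does not specify the response.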

Consequences

Control-plane: simple row-level isolation, suitable for the platform metadata scale (thousands of tenants × millions of rows). Cross-tenant access prevented by the backend on every request.
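Data-plane: per-schema isolation gives DB-level enforcement (a role whose cross-schema privileges are revoked cannot query another app's schema). The per-app search_path survives transaction-mode pooling because it is a role default set with ALTER ROLE ... SET search_path, not a per-session SET. Backup granularity is pg_dump -n <schema> per app.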
Connection budget separation: control-plane and data-plane never compete for slots — each cluster has its own pool budget. Solves the legacy ADR-016 problem of per-app HikariCP pools exhausting the platform’s shared 25-slot DO Basic cap.
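
As a rough illustration of the data-plane mechanism (dedicated role, dedicated schema, role-level search_path, REVOKE), the sketch below generates the kind of PostgreSQL statements involved. The function names and the exact statement list are assumptions; credential setup (SCRAM verifier, dataplane_auth) is deliberately left out, and ADR-023 remains the authoritative description.

```typescript
// PostgreSQL identifier quoting: wrap in double quotes and double any embedded quote.
function quoteIdent(name: string): string {
  return '"' + name.replace(/"/g, '""') + '"';
}

// Hypothetical helper: returns the statements that would provision one generated app.
function provisioningSql(appSchema: string, appRole: string): string[] {
  const schema = quoteIdent(appSchema);
  const role = quoteIdent(appRole);
  return [
    `CREATE ROLE ${role} LOGIN`,
    `CREATE SCHEMA ${schema} AUTHORIZATION ${role}`,
    // Role default: applied at login, so it does not depend on per-session SET commands.
    `ALTER ROLE ${role} SET search_path = ${schema}`,
    // Defensive: make sure no other role reaches the schema through PUBLIC.
    `REVOKE ALL ON SCHEMA ${schema} FROM PUBLIC`,
  ];
}
```

For example, `provisioningSql("app_42", "app_42_role")` yields four statements that would be run by a privileged provisioning connection.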

ADR-002: Claude Agent Runtime for Vibecoding

Status: Accepted; superseded for production by ADR-024 (LIVE since 2026-05-21)
Date: 2026-04-28
Updated: 2026-05-22 (ADR-024 cut-over)

Context

Vibecoding requires orchestrating multiple AI coding agents (backend developer, frontend developer, devops, tester, etc.) to build a functional web application from a natural language description. This requires a powerful agentic framework with tool access (file system, shell, etc.).

Decision (current)

TalkIDE uses two MaraExecutor implementations selectable per environment:

ClaudeCliExecutor — local-dev path. Spawns the claude CLI as a child process per conversation (Max plan auth, no ANTHROPIC_API_KEY needed). Used by developers running TalkIDE locally.
NetworkWorkerExecutor — production / cloud path (default since 2026-05-21). Talks over HTTP+SSE to a dedicated talkide-worker pod in the {tenant}-{env} K8s namespace. Worker runs the Anthropic Agent SDK in-process and calls Anthropic via a control-plane gateway-proxy (the worker never holds the raw ANTHROPIC_API_KEY). See worker-runtime.md and ADR-024.

The legacy in-process AgentSidecarExecutor (Node child process via ProcessBuilder + NDJSON stdin/stdout pipe, lived inside the BE pod) was deleted by the ADR-024 cut-over commit 70fd510 — not migrated.

Consequences

Production runs cloud-side: user laptops are no longer required to run agents. Worker pods scale per tenant-env and survive BE redeploys (3-week session resume via NFS CLAUDE_CONFIG_DIR).
Local-dev keeps CLI: Max plan billing covers the developer without per-token cost.
Secret isolation: ANTHROPIC_API_KEY lives only in the control-plane BE pod. Compromising a tenant-env worker pod cannot leak the key.

ADR-003: Project Versioning Model

Status: Accepted
Date: 2026-04-28

Context

Users should be able to iterate on their app through conversation and apply changes selectively. They need a concept of “versions” so they can see what changed and roll back if needed.

Decision

Every time a user completes a vibecoding conversation and asks the PM to “apply” the changes, a new ProjectVersion record is created with an incrementing versionNumber. The version captures a snapshot description of what changed. Applying a version triggers a rebuild and restart of the user’s project. Only one version can be APPLIED at a time; applying a new version automatically supersedes the previous one.
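
The linear model can be sketched as two pure functions over a version history. Only the APPLIED status and the versionNumber field come from the text above; the PENDING and SUPERSEDED status names, the function names, and the in-memory representation are assumptions made for illustration.

```typescript
type VersionStatus = "PENDING" | "APPLIED" | "SUPERSEDED";

interface ProjectVersion {
  versionNumber: number;
  description: string;
  status: VersionStatus;
}

// Appends a new version with the next incrementing versionNumber.
function createVersion(history: readonly ProjectVersion[], description: string): ProjectVersion[] {
  const next = history.reduce((max, v) => Math.max(max, v.versionNumber), 0) + 1;
  return [...history, { versionNumber: next, description, status: "PENDING" }];
}

// Marks one version APPLIED and supersedes whichever version was applied before.
function applyVersion(history: readonly ProjectVersion[], versionNumber: number): ProjectVersion[] {
  if (!history.some((v) => v.versionNumber === versionNumber)) {
    throw new Error(`Unknown version ${versionNumber}`);
  }
  return history.map((v): ProjectVersion => {
    if (v.versionNumber === versionNumber) return { ...v, status: "APPLIED" };
    if (v.status === "APPLIED") return { ...v, status: "SUPERSEDED" };
    return v;
  });
}
```

Rollback in this sketch is simply `applyVersion` with an older versionNumber, which keeps the invariant that at most one version is APPLIED.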

Consequences

Simple linear versioning model. Users get a clear history of their app’s evolution. Rollback is supported by re-applying a previous version. No git-level branching in MVP.

ADR-004: JWT Authentication with Email/Password

Status: Accepted
Date: 2026-04-28

Context

TalkIDE needs stateless authentication for a multi-tenant SaaS application.

Decision

JWT access tokens with short expiration (15 min) + refresh tokens (14 days)
Tokens stored in localStorage on the frontend
Axios interceptor handles automatic token refresh on 401
Spring Security filter validates JWT on each request
Email/password authentication only for MVP (no OAuth)
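
The refresh-on-401 behaviour can be sketched independently of the HTTP client. In TalkIDE this logic sits in an Axios response interceptor; the version below keeps the transport abstract so it stays self-contained. `TokenStore`, `sendWithRefresh`, and the callback signatures are hypothetical.

```typescript
interface HttpResult<T> { status: number; body?: T }

// Mutable holder for the current token pair (localStorage in the real frontend).
interface TokenStore { accessToken: string; refreshToken: string }

// Sends a request; on 401 it refreshes the access token and retries exactly once.
async function sendWithRefresh<T>(
  tokens: TokenStore,
  send: (accessToken: string) => Promise<HttpResult<T>>,
  refresh: (refreshToken: string) => Promise<string>,
): Promise<HttpResult<T>> {
  const first = await send(tokens.accessToken);
  if (first.status !== 401) return first;

  // Access token rejected: obtain a new one and retry the original request.
  tokens.accessToken = await refresh(tokens.refreshToken);
  return send(tokens.accessToken);
}
```

Retrying only once avoids a refresh loop when the second attempt is also rejected; the caller then sees the 401 and can redirect to login.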

Consequences

Stateless backend, scalable. Token refresh is transparent to the user. No SSO or social login in MVP.

ADR-005: Feature-Based Package Structure

Status: Accepted
Date: 2026-04-28

Context

Need a clear way to organize code that scales with the number of features.

Decision

Both backend and frontend use feature-based package structure:

Backend: features/<feature>/api/, features/<feature>/domain/, features/<feature>/data/
Frontend: screens/<feature>/ with components, model, i18n subdirectories

Consequences

Each feature is self-contained. Cross-feature dependencies go through common/.

ADR-006: UseCase Bean Pattern

Status: Accepted
Date: 2026-04-28

Context

Need a pattern for business logic that is testable and follows single responsibility.

Decision

Each business operation is a Spring @Service bean named <Action><Entity>UseCase (e.g., CreateProjectUseCase). UseCase beans contain business logic, call repositories, and return domain models. Controllers map HTTP requests to UseCase calls and convert results to DTOs.

Consequences

Clear separation of concerns. Easy to test. One class per operation.

ADR-007: Design System with CSS Custom Properties

Status: Accepted
Date: 2026-04-29

Context

TalkIDE needs a consistent visual language across all screens. The design handoff from Claude Design defined a warm charcoal palette with oklch accent colors, specific typography, and reusable component patterns.

Decision

Design tokens are implemented as CSS custom properties (:root level) and registered as Tailwind CSS 4 theme values via @theme. This allows using tokens both as var(--bg-1) in custom styles and as Tailwind classes (bg-bg-1). Shared UI components (TLogo, TAvatar, TPill, TTopBar, etc.) are Vue SFCs using these tokens. Icons use Lucide Vue Next (stroke-based, 24x24 viewBox).

Consequences

Single source of truth for design tokens in main.css
Consistent look across all screens including auth flows
Components are composable and reusable
oklch colors provide perceptually uniform accent palette

ADR-008: Server-Sent Events (SSE) Convention

Status: Accepted
Date: 2026-05-01

Context

TalkIDE uses SSE for real-time push notifications in two distinct contexts:

Per-project workspace activity feed (UC-05001) — scoped to a single project
Per-tenant Studio recent activity feed (UC-06001) — aggregated across all projects of a tenant

A unified convention for event names, payload shape, auth, and reconnect strategy prevents divergence between streams.

Decision

Standard SSE Event Names

All TalkIDE SSE streams use the following event names:

Event name	Purpose	Payload
`connected`	Server confirms the stream is established	`{}`
`activity`	A new activity event is available	Serialised DTO — `ActivityDto` (per-project) or `StudioActivityDto` (per-tenant)
`heartbeat`	Keep-alive ping emitted every 30 seconds	`{}`

Authentication

Bearer token must be sent in the Authorization header. The native browser EventSource API does not support custom headers, so TalkIDE uses a fetch-based SSE parser on the frontend for all SSE streams.
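
A fetch-based client has to parse the SSE wire format itself. The sketch below shows only that parsing step, as a pure function over decoded text; it is not the TalkIDE parser, and the names `SseEvent` and `parseSse` are hypothetical. The caller would append each decoded chunk from the response body (requested with the Authorization header) to a buffer and pass the buffer in.

```typescript
interface SseEvent { event: string; data: string }

// Splits a text buffer into complete SSE events plus the unparsed remainder.
function parseSse(buffer: string): { events: SseEvent[]; rest: string } {
  const events: SseEvent[] = [];
  // The SSE format allows CRLF, LF, or CR line endings; normalise to LF.
  const blocks = buffer.replace(/\r\n|\r/g, "\n").split("\n\n");
  // The last element is an incomplete event (or ""); keep it for the next chunk.
  const rest = blocks.pop() ?? "";

  for (const block of blocks) {
    let event = "message"; // default event name per the SSE format
    const data: string[] = [];
    for (const line of block.split("\n")) {
      if (line.startsWith(":")) continue; // comment line
      const idx = line.indexOf(":");
      const field = idx === -1 ? line : line.slice(0, idx);
      let value = idx === -1 ? "" : line.slice(idx + 1);
      if (value.startsWith(" ")) value = value.slice(1); // one optional leading space
      if (field === "event") event = value;
      else if (field === "data") data.push(value);
    }
    if (data.length > 0) events.push({ event, data: data.join("\n") });
  }
  return { events, rest };
}
```

This sketch ignores the `id` and `retry` fields and does not handle a CR/LF pair split across two chunks, which a production parser would need to cover.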

Reconnect Strategy

Clients implement exponential backoff reconnect:

Base delay: 1 second
Maximum delay: 30 seconds
Maximum attempts: ~10
On reconnect, the client re-fetches the REST snapshot to fill any gap missed during disconnection.
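
The delay schedule implied by these parameters can be computed as follows. The doubling factor and the absence of jitter are assumptions (the text only says "exponential"), and the function names are hypothetical.

```typescript
// Delay before reconnect attempt number `attempt` (0-based), capped at maxMs.
function reconnectDelayMs(attempt: number, baseMs = 1000, maxMs = 30000): number {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// Full schedule for a bounded number of attempts.
function reconnectSchedule(maxAttempts = 10): number[] {
  return Array.from({ length: maxAttempts }, (_, attempt) => reconnectDelayMs(attempt));
}
```

With these defaults the delays are 1 s, 2 s, 4 s, 8 s, 16 s, and then 30 s for every remaining attempt, because the sixth delay (32 s) already exceeds the cap.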

Endpoints Using This Convention

Stream	Endpoint	UC
Workspace activity (per-project)	`GET /api/v1/projects/{projectId}/activities/stream`	UC-05001
Studio recent activity (per-tenant)	`GET /api/v1/studio/recent-activity/stream`	UC-06001

Consequences

Consistent event names across all current and future SSE streams
FE SSE parser is shared infrastructure (no stream-specific client code needed)
Adding a new SSE stream requires only a new endpoint and DTO; the convention stays stable

ADR-009: Server-side Activity Feed Deduplication

Status: Accepted
Date: 2026-05-01

Context

The activity feed can contain long consecutive runs of identical tool calls — for example, an agent reading 20 files in a row produces 20 TOOL_USE rows with tool_name=Read. Displaying all of them unfiltered would flood the UI with noise and make it hard to follow what is actually happening.

An early prototype placed the dedup logic in the frontend (a Pinia computed property). This created a split-brain situation: the business rule “what counts as a duplicate” lived in the UI layer, the backend stored raw events without any concept of categories, and the per-tenant Studio feed and the per-project workspace feed each needed their own dedup implementation.

Decision

Deduplication is performed entirely on the backend:

ToolCategory enum — a new business-level categorisation of tool names with values READING | EDITING | EXECUTING | DELEGATING | BROWSING | OTHER. Stored as tool_category (VARCHAR column) on the activities table. Set at insert time by RecordActivityUseCase.recordToolUse via a deterministic mapping from tool_name. Non-tool events (TASK_STARTED, TASK_COMPLETED, AGENT_MESSAGE) store null.
Consecutive coalesce: the repository query uses the LAG() window function, ordered by id DESC, to compare each row with the row before it. Consecutive rows sharing the same (tool_category, agent_role, parent_activity_id) key are collapsed into a single representative row (the first one in the run). This is the standard SQL “gaps and islands” pattern.
First-page only — dedup is applied only when no afterId cursor is provided (i.e. the initial/first page of results). “Load more” requests (with afterId) receive raw, undeduped events so that the full historical record remains accessible.
DTO exposure — ActivityResponse (workspace) and StudioActivityDto (Studio) both expose the toolCategory field so the frontend can render a human-readable category label (e.g. “Reading files”) instead of a raw tool name.
FE = pure renderer — the frontend applies no own dedup logic. It renders whatever the backend returns.
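
The coalescing rule can be illustrated in memory. The sketch below is an equivalent of the LAG()-based comparison, not the repository query itself: rows are assumed to arrive already sorted by id DESC, and each row is compared with its predecessor. Treating rows with a null tool_category as never collapsible is an assumption (it mirrors how SQL equality behaves with NULL); the type and function names are hypothetical.

```typescript
type ToolCategory = "READING" | "EDITING" | "EXECUTING" | "DELEGATING" | "BROWSING" | "OTHER";

interface Activity {
  id: number;
  toolCategory: ToolCategory | null; // null for non-tool events
  agentRole: string;
  parentActivityId: number | null;
}

// True when `row` continues the run started by `previous`.
function sameKey(previous: Activity, row: Activity): boolean {
  return (
    previous.toolCategory !== null &&
    previous.toolCategory === row.toolCategory &&
    previous.agentRole === row.agentRole &&
    previous.parentActivityId === row.parentActivityId
  );
}

// Keeps the first row of every consecutive run; input must be sorted by id DESC.
function coalesceConsecutive(rowsNewestFirst: readonly Activity[]): Activity[] {
  const kept: Activity[] = [];
  let previous: Activity | undefined;
  for (const row of rowsNewestFirst) {
    if (previous === undefined || !sameKey(previous, row)) kept.push(row);
    previous = row;
  }
  return kept;
}
```

Because the input is newest-first, the representative of each run is its most recent event. A run is broken by any intervening row with a different key, so two READING runs separated by an EDITING row stay separate.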

Consequences

Single source of truth: the dedup rule (consecutive same-category / same-agent / same-parent → one row) is defined and enforced exactly once, in the backend repository query.
Consistency: both the per-project Workspace feed (UC-05001) and the per-tenant Studio feed (UC-06001) use the same dedup logic automatically.
Default page limit 10 (Studio) yields approximately 10 visible rows on the initial snapshot; the exact count depends on how varied the incoming event stream is.
FE simplification: the frontend no longer needs computed dedup properties in the store; it is a pure rendering layer.
DB requirement: the LAG() window function requires a SQL engine that supports window functions. PostgreSQL has supported them natively since 8.4. H2 (used in tests) has supported them since version 1.4.198, so no test-specific workaround is needed as long as the H2 version managed by the Spring Boot BOM is at least that recent.

Alternatives Considered

Alternative	Reason rejected
FE-side dedup (Pinia computed)	Business rule in the UI layer; duplicated logic across workspace and Studio feeds; harder to test
BE dedup on raw `tool_name` (no category)	`Read` and `Glob` are both “reading” — grouping by raw name would miss same-function tools; also more fragile to tool name changes
Additional DB columns for a pre-shortened list	Over-engineering; adds write-time complexity without significant read-time benefit over the window function approach

ADR-010: Auto-save with Optimistic Update for Preference Toggles

Status: Accepted
Date: 2026-05-01

Context

The Sound Preferences section (UC-01007) and the Language picker in Account (UC-01006) contain low-stakes settings that users expect to take effect immediately — an explicit “Save” button adds friction without benefit. At the same time, the app must remain consistent if the API call fails.

Decision

Auto-save on change: every toggle flip or dropdown selection fires PUT /me/sound-preferences (or PUT /me/locale) immediately, without a save button.
Debounce for slider: the master volume range slider debounces API calls by 300 ms to avoid flooding the backend while the user drags.
Optimistic update: FE applies the change to local state before the API response arrives. If the call returns a non-2xx status, FE rolls back the local state to the previous value and displays an error toast ("Could not save. Please try again.").
Granular endpoints (/me/locale and /me/sound-preferences instead of a single /me/preferences): locale and sound settings are changed from different UI sections with different save triggers. A unified endpoint would force full-object partial-update validation and complicate rollback (which field failed?). Separate endpoints keep validation and rollback scoped to exactly what changed.
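
The optimistic-update rule can be sketched as one generic helper. The `{ value: T }` holder stands in for a Vue ref, `put` stands in for the PUT call and resolves to the HTTP status code, and all names are hypothetical. Treating a thrown network error like a non-2xx response is an assumption beyond what the ADR states.

```typescript
// Applies `next` immediately, then rolls back if the save does not succeed.
async function saveOptimistically<T>(
  state: { value: T },
  next: T,
  put: (body: T) => Promise<number>,
  onError: (message: string) => void,
): Promise<boolean> {
  const previous = state.value; // snapshot captured before the optimistic update
  state.value = next;

  let ok = false;
  try {
    const status = await put(next);
    ok = status >= 200 && status < 300;
  } catch {
    ok = false; // network failure: handled like a non-2xx response (assumption)
  }

  if (!ok) {
    state.value = previous;
    onError("Could not save. Please try again.");
  }
  return ok;
}
```

Because each endpoint receives the full preference object, retrying the same call after a failure is safe. The slider's 300 ms debounce would wrap the call site and is not shown here.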

Consequences

Snappy UX — changes feel instant; no form submission ceremony.
Each auto-save endpoint is idempotent (PUT with full preference object); safe to retry.
FE must maintain “previous value” snapshot for rollback; implemented as a ref captured before the optimistic update.
No conflict resolution needed in MVP (single session per user assumed).

ADR-011: Per-Project Plugin & Structured Config

Status: Accepted
Date: 2026-05-03

Context

The original architecture shared a single global Mara team plugin (talkide-be/plugin/) across all generated projects. Mara received per-project configuration (ports, URLs) as text in CLAUDE.md, along the lines of “BE port: 8097, do not fall back to 9090”.

In practice this textual instruction failed: on the command kill-port.sh 9090, Mara's devops agent took down the TalkIDE platform itself (the BE running on 9090), the very platform Mara was working in at that moment. A textual instruction did not carry enough authority against a shell command (the SIGKILL incident, pekarna-u-jelena vs. TalkIDE).

The second problem: the global plugin could not be adjusted per project (for example, when a user wants a different communication tone from Mara for their project). Any change affected all projects.

Decision

Move to a per-project plugin and structured config:

<project>/.talkide/plugin/: an rsync copy of the global plugin, refreshed on project creation (eager) and before every Claude CLI spawn (lazy guard, idempotent). The plugin is passed to the Claude CLI via --plugin-dir <project>/.talkide/plugin.
<project>/.talkide/project.yml: structured YAML with the project identity, tech stack, URLs, and ports. Plugin scripts read ports exclusively from this file via yq, never from arguments or from the text of CLAUDE.md.
Ports are derived deterministically from project.id: BE = 8090 + id, FE = 5200 + id. The DB sequence for id starts at 1, so BE port ≥ 8091 and FE port ≥ 5201: the platform FE port 5200 can never be assigned, and the platform BE port 9090 is not reached for any id below 1000.
.talkide/team/: placeholder directory for future per-project user overrides of agents (MVP: empty .gitkeep, no logic).
CLAUDE.md: concrete port values are dropped. In their place is a static pointer: “See .talkide/project.yml for ports, URLs, and configuration.”
.project-config.yml is deprecated: merged into .talkide/project.yml (privacy boundary preserved: BE-only sidecar, gitignored, MAY contain sensitive data).

Detailed spec: per-project-architecture.md.
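
The port rule (BE = 8090 + id, FE = 5200 + id) is small enough to sketch directly. The function name, the `PLATFORM_PORTS` constant, and the explicit collision guard are assumptions added for illustration; the guard is not part of the ADR and exists because 8090 + 1000 equals the platform BE port 9090.

```typescript
// Ports used by the TalkIDE platform itself (BE 9090, FE 5200).
const PLATFORM_PORTS: readonly number[] = [9090, 5200];

// Derives the per-project ports from the numeric project id.
function derivePorts(projectId: number): { be: number; fe: number } {
  if (!Number.isInteger(projectId) || projectId < 1) {
    throw new RangeError(`project id must be a positive integer, got ${projectId}`);
  }
  const ports = { be: 8090 + projectId, fe: 5200 + projectId };

  // Defensive guard (an addition in this sketch): refuse any port the platform uses.
  if (PLATFORM_PORTS.includes(ports.be) || PLATFORM_PORTS.includes(ports.fe)) {
    throw new RangeError(`derived ports for project ${projectId} collide with a platform port`);
  }
  return ports;
}
```

For example, `derivePorts(7)` gives BE 8097 and FE 5207, matching the “BE port: 8097” value quoted in the Context section.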

Consequences

Structural safety: a collision with the platform ports is ruled out at the data-model level, not merely by a textual instruction to the agent.
Per-project isolation: every project has its own copy of the plugin, ready for future per-project agent overrides (.talkide/team/).
Bug fixes in the plugin propagate automatically: the lazy-guard rsync overwrites the per-project copy on each project's next spawn.
Larger disk footprint: every project carries a copy of the plugin (on the order of hundreds of kB). Acceptable for the expected number of projects per user (a few to a few dozen).
Dependency on yq in plugin scripts: a mandatory system tool with no fallback. The script exits with an error if it is missing.
Migration of existing projects: none. Existing output-projects/* are wiped (nobody uses them in production).

Continued in `adr/`

ADRs 012 and later live as separate files in documentation/adr/. For orientation, the key newer ADRs and their relation to the inline ADRs above:

ADR	Topic	Relation
ADR-013	Git versioning per project	Specializes ADR-003 (project versioning) to a concrete Git layout
ADR-014	fabric8 K8s client in BE	Foundation for everything below (ADR-015, 017, 019, 024)
ADR-015	Namespace-per-tenant-env	Substrate for worker extraction and hosting (see hosting-architecture.md)
ADR-019	Kaniko Job pattern	Extension to gradle/test Jobs planned in ADR-024
ADR-021	Dynamic ingress per project	Implements `<slug>.talkide.app` routing
ADR-022	Publish workflow (DRAFT → LIVE)	Defines the project lifecycle stack: dev preview vs. published prod; for the FE flow see flows/version-flow.md
ADR-023	Schema-per-app data-plane	Changes ADR-001 semantics for the user-app DB; see the updated ADR-001 above and hosting-architecture.md
ADR-024	`talkide-worker` extraction	Changes ADR-002 semantics for production; see the updated ADR-002 above, worker-runtime.md, worker-production.md
ADR-025	Mailgun transactional email	Provider abstraction + audit log; for the spec see transactional-email.md
ADR-026	Environment first-class	DEFAULT + USER_CREATED environments; every project belongs to an environment; tenant-env = unit of K8s namespace + ResourceQuota + billing (UC-10010–UC-10015)

Add future ADRs to adr/ for better granularity. If a newer ADR changes any of the inline ADR-001..011, update the inline ADR with a “Superseded by” link and a brief update; do not tear down the older content, and keep bidirectional traceability.
