Why I Built Julie
December 22, 2024 — January 2025 · ~2 weeks
I thought Cluely was a solid idea early on, but over time it felt like it drifted toward pricing, friction, and controversy instead of pure usefulness. That got me wondering: could I build the same core idea, but stripped down, open source, and focused purely on giving people an unfair productivity advantage?
OpenAI now ships a desktop ChatGPT app, which is great, but it doesn't really solve the "don't break my flow" problem. You still end up switching tabs, dragging things around, or copy-pasting context, and each context switch can cost you 20+ minutes of deep focus. It's not bad, but it's not invisible either.
So I built Julie over a couple of weeks — mostly for fun and to see if I actually could. The result is something that genuinely feels like a superpower. Julie lives on top of your workspace, sees what you see, listens when you want it to, and responds without forcing you to context-switch. No cheating angle, no gimmicks. Just a pure productivity multiplier that keeps you in the same mental lane while you work.
Unlike traditional AI assistants, Julie operates as a ghost in your machine — appearing only when needed and disappearing while executing tasks. The interface is intentionally minimal. I'm trying hard not to turn it into a full chat app with tabs, settings, and feeds.
The Philosophy
The goal has always been the same: don't break my flow. Every time you alt-tab, open a new window, or copy-paste context, you're paying a tax on your focus. Julie eliminates that tax entirely. It helps for 20 seconds and disappears — not a second life you have to manage. Most assistants expect you to paste context and explain yourself. Julie flips that. Your screen is the context.
The result is an unfair advantage. While others are managing multiple windows, explaining their context to ChatGPT, and waiting for responses, you're already done. The AI just works — instantly, silently, and without breaking your train of thought. That's the multiplier effect I was chasing.
This isn't meant to replace anything or start drama. I mostly wanted to prove that this kind of assistant can exist without paywalls, subscriptions, or hype. If it's useful to others, that's a win. It's fully open source and costs $0.
Permissions are opt-in (screen recording plus accessibility/automation), and Julie is meant to be used with you watching, not running silently. Content protection keeps Julie invisible in screen recordings — she's truly a ghost.
Core Capabilities
- General AI Assistant: hears what you hear, sees what you see, and gives you real-time answers to any question
- Writing Agent: draft and rewrite in your voice, then iterate while staying in the overlay (no new workspace)
- Coding Agent: implement and refactor with multi-step edits, while your editor stays the source of truth
- Computer-Use Agent: take the next step (click, type, navigate) instead of just telling you what to do
System Control
- Terminal Execution: run arbitrary shell commands with full stdout/stderr capture, isolated in Node.js child_process instances with configurable timeouts
- Browser Automation: full Puppeteer control (navigate, click, type, scroll, read pages, execute scripts). Connects to existing Chrome sessions (preserving logins) or spawns fresh instances
- Keyboard Automation: direct keystroke injection via AppleScript into any focused application, with multiline support and special character handling (see the sketch after this list)
- Computer Vision & Mouse: full-screen capture via Electron's desktopCapturer, plus click, double-click, right-click, and scroll at any coordinates
- Application Management: list running processes, switch focus, launch apps by name
- Voice Input: real-time transcription via Groq's Whisper Large V3 Turbo with automatic silence detection
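As a taste of how thin these wrappers are, here's a minimal sketch of the keyboard path, assuming a one-line script dispatched through osascript. The helper name is mine, not Julie's actual code; the real implementation writes multiline scripts to a temp file, covered under the hood below.

```ts
// Sketch: inject keystrokes into the frontmost app via AppleScript (macOS).
// typeKeystrokes() is a name I made up for illustration; requires the
// Accessibility permission mentioned above.
import { execFile } from "node:child_process";

function typeKeystrokes(text: string): Promise<void> {
  return new Promise((resolve, reject) => {
    // Escape backslashes and quotes so the text survives AppleScript string quoting.
    const escaped = text.replace(/\\/g, "\\\\").replace(/"/g, '\\"');
    const script = `tell application "System Events" to keystroke "${escaped}"`;
    execFile("osascript", ["-e", script], (err) => (err ? reject(err) : resolve()));
  });
}

typeKeystrokes("hello from Julie").catch(console.error);
```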
Demo Workflows
I built several demo workflows to showcase what Julie can do. Each is a scripted example you can run on your own machine.
- The Builder — creates a project folder on Desktop, writes a Three.js spinning cube HTML file, and opens it in browser
- The Ghost Writer — navigates to OnlineGDB, selects all code, deletes it, and types a Fibonacci implementation while Julie hides during execution
- The Researcher — goes to X.com, finds latest posts from a profile, reads page content via DOM extraction, and returns a parsed summary
- Twitter Engagement — opens timeline, clicks like buttons, opens reply modal, types and submits comments
- The Journal — opens macOS Notes, types a reminder, opens Calendar, creates a new event, then reappears with confirmation
Under the Hood
Julie runs on a multi-process Electron architecture with strict separation between the renderer (React 18) and main process (Node.js). The main process handles all system-level operations — spawning child processes for shell execution, managing Puppeteer browser instances, and dispatching AppleScript commands through osascript. The renderer stays lightweight, focused purely on UI state and message rendering.
IPC channels are typed and explicitly defined: call-agent for inference requests, run-command for terminal execution, trigger-browser-action for Puppeteer dispatch, trigger-keyboard-action for keystroke injection, and trigger-computer-action for mouse/screen control. Each handler is isolated — a failed browser action doesn't crash terminal execution.
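A rough sketch of what that handler split looks like, using the channel names above (payload types are illustrative placeholders, not Julie's actual definitions):

```ts
// Main process: one isolated handler per channel.
import { ipcMain } from "electron";

type RunCommandRequest = { command: string; timeoutMs?: number };
type RunCommandResult = { stdout: string; stderr: string; exitCode: number | null };

ipcMain.handle("run-command", async (_event, req: RunCommandRequest): Promise<RunCommandResult> => {
  // ...spawn the child process here; a throw stays inside this handler,
  // so a failed command never takes down the other channels.
  return { stdout: "", stderr: "", exitCode: 0 };
});

ipcMain.handle("trigger-browser-action", async (_event, action: unknown) => {
  // ...dispatch to the Puppeteer layer
});
```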
Memory management uses a sliding-window context compression algorithm. Token count is estimated at ~4 characters per token, with an 8K context budget reserving 2K for response generation. When the window fills, older messages are dropped FIFO while preserving the system prompt. This prevents unbounded memory growth during long sessions.
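Here's the windowing logic as a minimal sketch, using the numbers above (the message shape is simplified):

```ts
// Sliding-window compression: keep the system prompt, walk newest-to-oldest,
// and let the oldest messages fall off first (FIFO drop).
type Message = { role: "system" | "user" | "assistant"; content: string };

const CONTEXT_BUDGET = 8192;   // total token budget
const RESPONSE_RESERVE = 2048; // held back for the model's reply
const CHARS_PER_TOKEN = 4;     // rough estimate, as described above

const estimateTokens = (m: Message) => Math.ceil(m.content.length / CHARS_PER_TOKEN);

function compressWindow(history: Message[]): Message[] {
  const budget = CONTEXT_BUDGET - RESPONSE_RESERVE;
  const system = history.filter((m) => m.role === "system");
  const rest = history.filter((m) => m.role !== "system");

  let used = system.reduce((sum, m) => sum + estimateTokens(m), 0);
  const kept: Message[] = [];
  for (let i = rest.length - 1; i >= 0; i--) {
    const cost = estimateTokens(rest[i]);
    if (used + cost > budget) break; // everything older gets dropped
    kept.unshift(rest[i]);
    used += cost;
  }
  return [...system, ...kept];
}
```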
Browser automation uses connection pooling — Puppeteer connects to an existing Chrome DevTools Protocol session on port 9222 when available (preserving cookies/auth state), or spawns a fresh instance with a persistent user data directory. Connections are held open across actions to avoid the ~2s cold-start penalty per navigation.
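In sketch form, the pooling is a connect-then-fallback (the profile path is a placeholder):

```ts
// Attach to a live Chrome (CDP on :9222) when possible, else launch a fresh
// instance with a persistent profile. The instance is cached across actions
// to skip the ~2s cold start.
import puppeteer, { Browser } from "puppeteer";

let browser: Browser | null = null;

async function getBrowser(): Promise<Browser> {
  if (browser && browser.isConnected()) return browser;
  try {
    // Existing session: cookies and logins carry over.
    browser = await puppeteer.connect({ browserURL: "http://127.0.0.1:9222" });
  } catch {
    // No debuggable Chrome running, so spawn one.
    browser = await puppeteer.launch({ headless: false, userDataDir: "./julie-profile" });
  }
  return browser;
}
```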
Shell commands execute in isolated child processes via Node's child_process.exec() with configurable timeouts. Stdout and stderr are captured and streamed back to the conversation context. Path resolution uses os.homedir() for cross-user portability. AppleScript injection runs through temporary file execution to handle multiline scripts and special character escaping.
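A minimal sketch of the command runner; the 30s default timeout and 10MB output buffer are my numbers, not Julie's:

```ts
// Isolated child process, hard timeout, stdout/stderr returned to the model.
import { exec } from "node:child_process";
import os from "node:os";

function runCommand(command: string, timeoutMs = 30_000): Promise<{ stdout: string; stderr: string }> {
  return new Promise((resolve, reject) => {
    exec(
      command,
      { timeout: timeoutMs, cwd: os.homedir(), maxBuffer: 10 * 1024 * 1024 },
      (error, stdout, stderr) => {
        if (error && error.killed) {
          return reject(new Error(`command timed out after ${timeoutMs}ms`));
        }
        // Non-zero exits still resolve: stderr is useful context for the model.
        resolve({ stdout, stderr });
      }
    );
  });
}
```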
Screen capture leverages Electron's desktopCapturer for full-resolution frame grabs. Mouse events are dispatched via CGEventPost for precise coordinate targeting. The overlay window uses setContentProtection(true) to exclude itself from screen recordings and captures — Julie stays invisible in your own recordings.
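The Electron side of this is only a few calls; here's a sketch with the window options trimmed to the relevant ones (runs after app.whenReady()):

```ts
// The overlay excludes itself from captures, then grabs a full-resolution
// frame of the primary display.
import { BrowserWindow, desktopCapturer, screen } from "electron";
import type { NativeImage } from "electron";

const overlay = new BrowserWindow({ transparent: true, frame: false, alwaysOnTop: true });
overlay.setContentProtection(true); // Julie disappears from recordings and screenshots

async function captureScreen(): Promise<NativeImage> {
  const { width, height } = screen.getPrimaryDisplay().size;
  const sources = await desktopCapturer.getSources({
    types: ["screen"],
    thumbnailSize: { width, height }, // request full resolution, not the tiny default
  });
  return sources[0].thumbnail;
}
```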
Voice input implements debounced silence detection — audio buffers are analyzed for amplitude thresholds before triggering transcription, reducing unnecessary inference calls. Messages stream incrementally to the UI using async iterators, keeping the interface responsive during long-running operations.
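The silence detection reduces to an RMS check plus a debounce timer. A sketch below; the threshold and window values are illustrative rather than Julie's actual tuning:

```ts
// Only flush buffered audio to transcription once the RMS amplitude stays
// below a threshold for a set window.
const SILENCE_THRESHOLD = 0.01; // normalized RMS
const SILENCE_WINDOW_MS = 800;

let silentSince: number | null = null;

function onAudioChunk(samples: Float32Array, flush: () => void): void {
  const rms = Math.sqrt(samples.reduce((sum, x) => sum + x * x, 0) / samples.length);
  if (rms >= SILENCE_THRESHOLD) {
    silentSince = null; // speech detected, reset the debounce
    return;
  }
  silentSince ??= Date.now();
  if (Date.now() - silentSince >= SILENCE_WINDOW_MS) {
    silentSince = null;
    flush(); // hand the buffered audio to Whisper
  }
}
```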
Local Intelligence
Julie can now run fully offline using local models, tuned specifically for Apple's M4 silicon. For text reasoning I settled on Qwen3 8B, and for vision, Qwen3-VL 4B.
The choice of these specific parameter sizes is intentional. On an M4 Mac, they sit in the sweet spot — small enough to reside entirely in high-bandwidth memory without choking the system, but large enough to maintain coherent reasoning. This allows Julie to run comfortably in the background, leaving the main system resources free for your actual work.
To make this viable for real-time interaction, I leaned heavily into inference acceleration. The engine uses Metal Performance Shaders (MPS) to offload matrix operations directly to the GPU, paired with aggressive 4-bit quantization. This combination keeps token generation snappy and thermal impact minimal, ensuring that "local" doesn't mean "slow" or "hot".
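I won't pin down the exact runtime here, but if you imagine a llama.cpp-style server exposing an OpenAI-compatible endpoint on localhost, the app-side call is as simple as this (port and model id are assumptions):

```ts
// Hypothetical: query a local OpenAI-compatible endpoint (e.g. llama.cpp's
// llama-server) serving a 4-bit-quantized Qwen3 8B.
async function localChat(prompt: string): Promise<string> {
  const res = await fetch("http://127.0.0.1:8080/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "qwen3-8b-q4", // assumed model id
      messages: [{ role: "user", content: prompt }],
    }),
  });
  const data = await res.json();
  return data.choices[0].message.content;
}
```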
What's Next
I'm still iterating on v1.0 and curious about two things: What safety and UX patterns actually feel acceptable for daily computer-use agents? And what's the one workflow you'd want this to do end-to-end without context switching?
- Headless Browser Mode — move browser automation to fully async, headless execution. This means Puppeteer runs in the background without stealing focus, enabling true parallel task execution while you keep working. The goal is zero visual interruption during browser workflows.
- Deterministic Execution — make workflows more reproducible by reducing reliance on timing heuristics and flaky selectors. I want to implement proper wait conditions, retry logic with exponential backoff, and DOM-based action verification so the same workflow produces consistent results every time (a sketch of the retry pattern follows this list).
- Context Compression — implement smarter context windowing using summarization and semantic chunking. Instead of FIFO message dropping, compress older context into summaries that preserve key facts. This extends effective memory while staying within token budgets.
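For the deterministic-execution item, the retry shape I have in mind looks roughly like this (a sketch, not shipped code):

```ts
// Retry a flaky browser action with exponential backoff and a DOM-based
// verification step after each attempt.
async function withRetry<T>(
  action: () => Promise<T>,
  verify: (result: T) => Promise<boolean>,
  maxAttempts = 4,
  baseDelayMs = 250
): Promise<T> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      const result = await action();
      if (await verify(result)) return result; // verified against the DOM
    } catch {
      // swallow and back off
    }
    await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt)); // 250ms, 500ms, 1s, 2s
  }
  throw new Error(`action failed after ${maxAttempts} attempts`);
}
```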
This is still a minimal version. Let me know your thoughts — I'd love to hear what works and what doesn't. The repo and installers are up on GitHub if you want to try it or poke holes in it.
Built with Electron 28+, React 18, TypeScript, Vite, Groq API (Llama 3.3), Whisper Large V3 Turbo, Puppeteer, and CSS Variables.
Julie: The Unfair Advantage For Productivity