
🎬 Using Wyltek Studio

A local-first creative suite for image, audio, and video generation — everything running on your own GPU, no API keys for the core workflows. This guide covers install, configuration, and the small number of concepts that separate "it's generating garbage" from "that looks like a finished asset."

Contents

1. What Wyltek Studio is (and isn't)
2. Quick-start install
3. Feature map
4. Image generation — choosing a model
5. Prompt craft & the Prompt Optimizer
6. LoRAs — styling your output
7. Projects — assembling outputs into timelines
8. Claude Code integration

1. What Wyltek Studio is (and isn't)

Wyltek Studio is a local generative-AI creative suite built around ComfyUI and a small set of purpose-built studio pages. A single Python server coordinates:

Image generation — 15+ models (SDXL, Flux dev/schnell, SD 1.5, and full-res photoreal/artistic checkpoints) with LoRA support and compare mode
Prompt enhancement — local Ollama models transform crude prompts into detailed SDXL-ready input
Studio tools — frame extraction, background removal, SAM click-to-segment, TTS, voice cloning, music generation, and video via AnimateDiff
Projects — shared timeline/asset system so you can assemble generations across tools
Auto-scoring — CLIP aesthetic scores on every image so you can sort by quality

It isn't a cloud SaaS, and it isn't a thin wrapper around DALL-E or Midjourney. Every generative step runs on your hardware. The trade-off is that you need a capable GPU (8 GB+ VRAM minimum for SDXL) and a few gigabytes of disk per checkpoint.

2. Quick-start install

git clone https://github.com/toastmanAu/wyltek-studio ~/open-palette
cd ~/open-palette
./install.sh

The install script walks you through:

Python venv + dependencies
ComfyUI install (if not already present)
Ollama detection + default model pull (optional — only if you want the Prompt Optimizer)
Baseline model catalog — SDXL base, Flux Schnell, LoRAs, VAEs

Then launch:

cd ~/open-palette
./run.sh    # or: python server.py

Studio opens at http://localhost:7860. Config lives at config.yaml in the project root.
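
If the browser tab doesn't open on its own, a quick reachability check tells you whether the server actually started. The sketch below is not part of Wyltek Studio: `studio_is_up` is a hypothetical helper that assumes only the default address given above and that the root page answers plain HTTP.

```python
import http.client
import urllib.request


def studio_is_up(base_url: str = "http://localhost:7860", timeout: float = 2.0) -> bool:
    """Return True if something answers HTTP at base_url, False otherwise."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers refused connections, timeouts, and HTTP error statuses;
        # ValueError covers malformed URLs.
        return False
```

Call `studio_is_up()` after `./run.sh`; a `False` result usually means the server is still loading or exited during startup, so check the terminal output.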

Hardware sanity check. SDXL at 1024×1024 fits comfortably on 8 GB cards (RTX 3060 Ti, RTX 2070, Arc A770). Flux dev and FLUX.2-klein want 16 GB+. If you're on a CPU-only box, stick to SD 1.5 models and expect each image to take 1–3 minutes.
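
The tiers above reduce to a small lookup. This is a minimal sketch, not Studio code: `recommend_model_family` is a hypothetical name, and treating GPUs under 8 GB the same as CPU-only machines is an assumption drawn from the 8 GB SDXL minimum rather than something the guide states outright.

```python
from typing import Optional


def recommend_model_family(vram_gb: Optional[float]) -> str:
    """Map available VRAM in GB (None means CPU-only) to a model tier."""
    if vram_gb is None or vram_gb < 8:
        # Assumption: sub-8 GB GPUs are treated like CPU-only boxes.
        return "SD 1.5"
    if vram_gb < 16:
        return "SDXL"
    return "Flux dev / FLUX.2-klein"


for vram in (None, 8, 12, 24):
    print(vram, "->", recommend_model_family(vram))
```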

3. Feature map

Image Generator

Main page. Prompt → image, with LoRA, IP-Adapter, reference images, upscaler.

Compare mode renders the same prompt through two models side-by-side.

Prompt Optimizer

"OP my prompt" button next to every prompt field. Uses a local Ollama LLM to enrich your prompt and suggest a negative prompt.

Per-prompt model override via dropdown.

Image Tools

rembg background removal (5 models), SAM click-to-segment, manual mask tools (brush / rect / ellipse / lasso).

Frame Cutter

Extract PNG frames from any video file. Frame-by-frame scrubber. Send directly to Image Tools.

TTS Studio

Piper (fast CPU), Kokoro (expressive), XTTS v2 (voice cloning), Bark (emotions).

Music Studio

MusicGen (text-to-music), beat builder, sample packs.

Video Studio

AnimateDiff Lightning (fast text-to-video), Stable Video Diffusion, timeline compositor.

Sprite Tools

SDXL + pixel-art LoRA batch sprite generation. Auto-tile for game assets.

Projects

Shared asset + timeline system. Any output from any tool lands here with metadata for later assembly.

4. Image generation — choosing a model

Wyltek Studio's model catalog (Settings → Models) lists what's available and what's installed. Pick based on intent:

Intent	Model recommendation	Why
Photoreal portraits / architecture	`RealVisXL V5.0`	Heavily finetuned photo SDXL. Excellent skin tones, interiors, landscapes.
Versatile "all-rounder"	`juggernautXL v9`	Photoreal-leaning but comfortable with illustration. Good with LoRAs.
Illustration / fantasy	`DreamShaper XL Turbo`	Only needs 6–10 steps (Turbo distilled). Great for concept art iteration.
Neutral "canvas" for LoRA testing	`SDXL base 1.0 (Q4)`	Doesn't fight LoRAs the way heavily-finetuned bases do. Best for evaluating new LoRAs.
State-of-the-art quality	`Flux dev` or `FLUX.2-klein`	Best prompt following and coherence. Slow + heavy (16 GB VRAM class).
Fast iteration	`Flux Schnell` or SDXL + `Lightning LoRA`	4-step distilled. Draft-quality previews in seconds.

Compare mode is the fastest way to settle arguments about which model suits a prompt. Type your prompt, pick two models, and hit Compare. Wyltek Studio renders both with the same seed and lets you score them side by side.
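
Sharing the seed is what makes the comparison fair: the two renders then differ only in the checkpoint. Conceptually it looks like the sketch below. The job dictionaries and the `build_compare_jobs` name are hypothetical and only illustrate the idea; they are not the Studio's actual request format.

```python
import random
from typing import Dict, List, Optional


def build_compare_jobs(
    prompt: str, model_a: str, model_b: str, seed: Optional[int] = None
) -> List[Dict[str, object]]:
    """Return two job dicts that differ only in the 'model' field."""
    if seed is None:
        # Draw one seed and reuse it for both jobs.
        seed = random.randrange(2**32)
    shared = {"prompt": prompt, "seed": seed}
    return [{**shared, "model": model_a}, {**shared, "model": model_b}]


jobs = build_compare_jobs("a lighthouse at dusk", "RealVisXL V5.0", "juggernautXL v9")
for job in jobs:
    print(job)
```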

5. Prompt craft & the Prompt Optimizer

The single biggest lever on image quality isn't the model — it's the prompt. "A cat" gives SDXL nothing; "A detailed, whimsical cat character, vibrant colors, professional illustration, trending on ArtStation" gives it a direction. The Prompt Optimizer (OP my prompt) does that transformation via a local Ollama LLM.

The dropdown next to the OP button lets you swap Ollama models per call. Your selection persists in localStorage across sessions; the server default (configured in Settings → Prompt Optimizer) is the fallback.
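
To make the mechanics concrete, here is a rough sketch of a prompt-enrichment call against a local Ollama instance. It uses Ollama's `/api/generate` endpoint on its default port. The instruction text, the function names, and the `selected or server_default` fallback are assumptions for illustration; the Studio's real system prompt and server code are not shown in this guide.

```python
import json
import urllib.request
from typing import Optional

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

# Hypothetical instruction text, not the Studio's actual system prompt.
INSTRUCTION = (
    "Rewrite the following image prompt with concrete subject, style, lighting "
    "and composition details. Reply with the rewritten prompt only.\n\nPrompt: "
)


def pick_model(selected: Optional[str], server_default: str) -> str:
    """Use the dropdown selection when present, otherwise the server default."""
    return selected or server_default


def build_enrich_request(prompt: str, model: str) -> urllib.request.Request:
    """Build (but do not send) a non-streaming generate request."""
    body = {"model": model, "prompt": INSTRUCTION + prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def enrich(prompt: str, model: str, timeout: float = 60.0) -> str:
    """Send the request to a running Ollama instance and return the generated text."""
    with urllib.request.urlopen(build_enrich_request(prompt, model), timeout=timeout) as resp:
        return json.loads(resp.read())["response"].strip()
```

With Ollama running, `enrich("a cat", pick_model(None, "llama3.2"))` would return the enriched prompt. The model name is just an example; substitute any model you have pulled.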

Full deep-dive: see LoRAs, Prompt Optimizer & Style Control for benchmarks of every Ollama model we tested, which to use as a daily driver, and when to reach for the MoE.

6. LoRAs — styling your output

Style LoRAs transform the aesthetic of your output — voxel, pixel art, watercolour, crayon, etc. Wyltek Studio ships with six style LoRAs preconfigured. Three things matter when using them:

Pick one that matches your intent: voxel for game-ready 3D, crayon for hand-drawn illustration, pixel-art for retro-game sprites, and so on.
Trigger words: each LoRA has activator tokens, such as "voxel art" for voxel-xl and "crayon drawing" for crayon-style-xl. Without them, the LoRA barely fires. Wyltek Studio auto-injects them for you when your prompt is bare.
Strength: the usable range is roughly 0.8–1.2 for model strength and 0.5–0.9 for clip strength. Model strength below about 0.7 gives only a faint tint; above 1.2, images burn.
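
The trigger-word and strength rules above can be sketched in a few lines. This is an illustration under stated assumptions, not the Studio's implementation: the function names are hypothetical, "bare" is read as "the trigger phrase is missing from the prompt", and the default strengths of 1.0 and 0.7 are simply midpoints of the ranges quoted above.

```python
from typing import Dict

# Trigger phrases taken from the examples in this guide.
TRIGGERS: Dict[str, str] = {
    "voxel-xl": "voxel art",
    "crayon-style-xl": "crayon drawing",
}

MODEL_RANGE = (0.8, 1.2)  # usable model-strength range quoted above
CLIP_RANGE = (0.5, 0.9)   # usable clip-strength range quoted above


def inject_trigger(prompt: str, trigger: str) -> str:
    """Prepend the trigger phrase unless the prompt already contains it."""
    if trigger.lower() in prompt.lower():
        return prompt
    return f"{trigger}, {prompt}" if prompt.strip() else trigger


def clamp(value: float, low: float, high: float) -> float:
    return max(low, min(high, value))


def lora_settings(name: str, prompt: str,
                  model_strength: float = 1.0, clip_strength: float = 0.7) -> dict:
    """Return the prompt and strengths to use, kept inside the usable ranges."""
    trigger = TRIGGERS.get(name, "")
    return {
        "lora": name,
        "prompt": inject_trigger(prompt, trigger) if trigger else prompt,
        "model_strength": clamp(model_strength, *MODEL_RANGE),
        "clip_strength": clamp(clip_strength, *CLIP_RANGE),
    }


print(lora_settings("voxel-xl", "a knight holding a lantern", model_strength=1.5))
```

Clamping rather than rejecting out-of-range values is a design choice for the sketch; when you're deliberately sweeping strengths to find a sweet spot you would skip the clamp.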

We benchmarked per-LoRA sweet spots and documented three production-ready presets (voxel game character, crayon illustration, watercolour) in LoRAs, Prompt Optimizer & Style Control. That guide also includes the scripts you can run to empirically derive your own presets from the LoRAs you install.

7. Projects — assembling outputs into timelines

Every image, audio clip, and video frame you generate lands in Wyltek Studio's Projects system. Each project has a shared assets folder and a timeline where you arrange generations in sequence for export. The flow:

Create or pick a project from the Projects page.
Generate content in any studio tool — outputs auto-save into the project.
Drag assets onto the timeline, adjust durations, compose audio / video / image layers.
Render the composed timeline as a video file.
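
One way to picture the timeline is as a list of clips, each with a layer, a start time, and a duration. The data model below is a hypothetical sketch for intuition only; the Studio's actual project format may look quite different.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Clip:
    asset: str       # path or ID of a generated asset
    layer: str       # "video", "image", or "audio"
    start: float     # seconds from the start of the timeline
    duration: float  # seconds


@dataclass
class Timeline:
    clips: List[Clip] = field(default_factory=list)

    def add(self, clip: Clip) -> None:
        self.clips.append(clip)

    def total_duration(self) -> float:
        """Length of the render: the latest clip end time, or 0 if empty."""
        return max((c.start + c.duration for c in self.clips), default=0.0)

    def active_at(self, t: float) -> List[Clip]:
        """Clips that are playing at time t, in insertion order."""
        return [c for c in self.clips if c.start <= t < c.start + c.duration]


timeline = Timeline()
timeline.add(Clip("intro.png", "image", start=0.0, duration=3.0))
timeline.add(Clip("theme.wav", "audio", start=0.0, duration=10.0))
timeline.add(Clip("shot1.mp4", "video", start=3.0, duration=4.0))
print(timeline.total_duration())
print([c.asset for c in timeline.active_at(3.5)])
```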

This is what separates Wyltek Studio from a loose collection of AI tools — you can iterate on a music track, generate matching frames, run TTS for narration, and assemble them all in one place without leaving the app.

8. Claude Code integration

Wyltek Studio ships several skills for Claude Code that treat the studio as a callable service. Relevant ones:

/cut-subject — background-removal skill that wraps Wyltek Studio's rembg + SAM endpoints. Run /cut-subject ~/photos/x.jpg and Claude picks the right model, calls Wyltek Studio, and returns the transparent PNG.
/ingest-doc — extracts text / frames / audio from any document and pushes it through Wyltek Studio's processing pipeline.
/quote-meme — quote-card meme generator using Wyltek Studio's image tools.

See the Skills guide for the full list.

Further reading

LoRAs, Prompt Optimizer & Style Control — the deep dive into image-gen quality
Local LLM Inference Setup — picking Ollama models for your hardware
Wyltek Studio source on GitHub