A practical guide to getting sharp, intentional output from Wyltek Studio's image generator: prompt enhancement via local LLMs, style LoRAs, strength settings, and the trigger words that make them click.
The most common reason an image comes out disappointing isn't the model; it's the prompt. Four words of crude human input ("a pokemon style cat") give Stable Diffusion nothing to anchor on. What you actually want is something like:
A highly detailed, adorable Pokémon-style cat character design,
vibrant colors, professional character sheet illustration,
dynamic pose, volumetric lighting, clean vector art style,
trending on ArtStation.
The Prompt Optimizer (reachable via the "OP my prompt" button next to every prompt field) does that transformation for you: a local Ollama LLM enhances your shorthand into a prompt that actually tells SDXL what you meant.
The enhancer also suggests a negative prompt (descriptors to avoid), which usually improves output more than people expect. It is auto-populated into the Negative prompt field when OP fires.
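As a rough illustration of what an enhancer call could look like, the sketch below builds a request body for Ollama's `/api/generate` endpoint and parses a JSON reply into a positive and a negative prompt. The system prompt text, the `prompt` / `negative_prompt` key names, and both helper names are assumptions made for this example; the prompt and schema that Wyltek Studio's server actually uses are not reproduced in this guide.

```python
import json

# Hypothetical system prompt; the server's real default is not shown here.
SYSTEM_PROMPT = (
    "Rewrite the user's shorthand as a detailed SDXL prompt. "
    'Reply with JSON: {"prompt": "...", "negative_prompt": "..."}'
)


def build_enhance_request(model: str, shorthand: str) -> dict:
    """Body for POST http://localhost:11434/api/generate (Ollama's default port)."""
    return {
        "model": model,
        "system": SYSTEM_PROMPT,
        "prompt": shorthand,
        "format": "json",  # ask Ollama to constrain the reply to valid JSON
        "stream": False,   # one complete reply instead of a token stream
    }


def parse_enhancement(reply: dict, shorthand: str) -> tuple[str, str]:
    """Return (prompt, negative_prompt); fall back to the shorthand on bad output."""
    try:
        data = json.loads(reply.get("response") or "")
    except json.JSONDecodeError:
        return shorthand, ""
    if not isinstance(data, dict):
        return shorthand, ""
    prompt = str(data.get("prompt") or shorthand).strip()
    negative = str(data.get("negative_prompt") or "").strip()
    return prompt, negative
```

Falling back to the original shorthand when the model returns invalid JSON keeps generation usable even when the enhancer misbehaves.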
Wyltek Studio ships with a per-prompt model dropdown next to the OP button, plus a global default you set in Settings. You can switch any time: different models produce genuinely different prompt styles, and different subjects benefit from different styles.
We benchmarked every model we had on disk (3B–35B parameters) on CPU-only inference, warm cache, using the server's default system prompt. Here's what held up:
| Model | Size | Typical latency (CPU) | Verdict |
|---|---|---|---|
| gemma3n:e4b | ~7B Q4 | ~10 s | Speed champ. Terse 20-word enhancements, well-chosen style tokens. |
| gemma4:latest | ~8B Q4 | ~10 s | Recommended default. Rich descriptive vocabulary ("volumetric lighting", "trending on ArtStation"), stable across creative subjects. |
| qwen2.5:14b | 14B Q4 | ~40 s | Consistent, wordy (40+ word outputs). Good fallback if gemma4 misses the subject. |
| qwen3.5:35b-a3b | 35B MoE | 30–220 s | Highest-quality prompts when it behaves, but latency explodes on creative subjects (reasoning mode engages unpredictably). Keep for tricky prompts, not as a daily driver. |
| qwen3:8b, qwen3:14b | 8B / 14B | 30–120 s | Reasonable, but variance is high. qwen3:14b consistently runs 2× slower than qwen2.5:14b at the same size. |
| qwen3.5:9b, carnice-9b | 9B | 200 s+ or timeout | Skip. Both exhibit runaway thinking mode or don't produce valid JSON output. |
Our server passes num_gpu: -1 to Ollama (auto-offload all GPU layers that fit). Historically this was set to 0 to avoid VRAM contention with the image generator. On hardware without a GPU-capable Ollama daemon, num_gpu: -1 is still ~3× faster than 0 because it unblocks Ollama's normal multi-threaded CPU path. The 0 value forced a strict fallback path that left performance on the floor.
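A minimal sketch of how that setting can be attached to a request, assuming a client that builds the JSON body itself. Ollama reads runtime parameters from an `options` object in the request; the helper name `with_num_gpu` is hypothetical, and the `-1` default simply mirrors the value described above rather than anything this example verifies.

```python
def with_num_gpu(payload: dict, num_gpu: int = -1) -> dict:
    """Return a copy of an Ollama request body with options.num_gpu set."""
    options = dict(payload.get("options", {}))  # copy so the caller's dict is untouched
    options["num_gpu"] = num_gpu
    return {**payload, "options": options}


body = with_num_gpu(
    {"model": "gemma4:latest", "prompt": "a pokemon style cat", "stream": False}
)
```

Passing `num_gpu=0` through the same helper is an easy way to reproduce the slower configuration for comparison on your own machine.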
- gemma4:latest. Fast enough to use on every gen.
- qwen3.5:35b-a3b for one call, accept the 30–200 s wait.
- gemma3n:e4b: smallest model, tightest output.

A LoRA (Low-Rank Adaptation) is a small fine-tune, typically 40–500 MB, that nudges a base diffusion model toward a specific style without retraining the whole thing. Wyltek Studio ships with six style LoRAs plus two speed LoRAs:
| LoRA | Type | What it does |
|---|---|---|
| pixel-art-xl | Style | Pixel / 8-bit / retro-game art |
| voxel-xl | Style | Minecraft-style blocky 3D geometry |
| crayon-style-xl | Style | Crayon / wax drawing aesthetic |
| watercolor-xl | Style | Soft watercolour painting |
| sticker-style-xl | Style | Die-cut vinyl sticker look |
| anime-detailer-xl | Detailer | Sharpens anime features. Needs an anime subject + "anime" in the prompt to engage. |
| sdxl_lightning_4step / 8step | Speed | Distillation LoRAs. Let you run SDXL at 4–8 steps instead of 25–30. Pair with any style LoRA if you have the code to stack them. |
Picking a LoRA tells the model what style you want. But two hidden controls make or break the result: trigger words and strength.
Every style LoRA was trained with specific activator tokens in its training captions. If those tokens don't appear in your prompt, the LoRA barely fires. You'll get a tint toward the style rather than the style itself.
| LoRA | Trigger word(s) |
|---|---|
| pixel-art-xl | pixel art, pixel-art, 8-bit |
| voxel-xl | voxel art, voxel |
| crayon-style-xl | crayon drawing, crayon, wax crayon |
| watercolor-xl | watercolor painting, watercolor, watercolour |
| sticker-style-xl | die-cut sticker, sticker |
| anime-detailer-xl | anime, anime style |
original_prompt + trigger_injected so you always know what was added. This is the "hybrid" strategy: auto-help for bare prompts, hands-off for deliberate ones.
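The hybrid behaviour can be sketched in a few lines: leave the prompt alone if any trigger for the chosen LoRA is already present, otherwise append the first one. The trigger lists come from the table above; the function name, the substring matching rule, the extra `prompt` key, and the exact meaning of `trigger_injected` (here: the appended word, or `None`) are assumptions for illustration, not the server's confirmed behaviour.

```python
TRIGGERS = {
    "pixel-art-xl": ["pixel art", "pixel-art", "8-bit"],
    "voxel-xl": ["voxel art", "voxel"],
    "crayon-style-xl": ["crayon drawing", "crayon", "wax crayon"],
    "watercolor-xl": ["watercolor painting", "watercolor", "watercolour"],
    "sticker-style-xl": ["die-cut sticker", "sticker"],
    "anime-detailer-xl": ["anime", "anime style"],
}


def inject_trigger(prompt: str, lora: str) -> dict:
    """Append the LoRA's primary trigger only when no trigger is already present."""
    triggers = TRIGGERS.get(lora, [])
    lowered = prompt.lower()
    # Simple case-insensitive substring check; a real matcher might use word boundaries.
    already_present = any(trigger in lowered for trigger in triggers)
    if not triggers or already_present:
        return {"original_prompt": prompt, "prompt": prompt, "trigger_injected": None}
    trigger = triggers[0]
    return {
        "original_prompt": prompt,
        "prompt": f"{prompt}, {trigger}",
        "trigger_injected": trigger,
    }
```

For example, `inject_trigger("a pokemon style cat", "voxel-xl")` yields the prompt `a pokemon style cat, voxel art`, while a prompt that already mentions "voxel" passes through unchanged.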
In a 48-image strength × trigger sweep, the same LoRA at the same strength consistently produced a dramatically more recognisable style when its trigger word was present. The clearest example is voxel-xl.
No amount of strength alone produces the trigger effect. Strength and triggers are multiplicative: you need both.
Each LoRA exposes two strength knobs: model strength and CLIP strength.
Conventional wisdom: model strength should be slightly higher than CLIP strength. Style LoRAs want model ≈ 0.8–1.2 with clip ≈ 0.5–0.9. Below 0.7 on model, the style is a faint tint. Above 1.2, most LoRAs start to oversaturate or fry details.
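To make the strength knob concrete, here is a toy sketch of the usual LoRA update: a low-rank product is scaled by the strength and added to a frozen weight matrix. In common SDXL tooling the model strength scales this delta for the diffusion model's weights and the CLIP strength does the same for the text encoder. The example uses plain Python lists on a 2×2 matrix; real loaders work on tensors and typically also fold in a per-LoRA alpha/rank factor, which is omitted here. The function names are illustrative only.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_lora(weight, up, down, strength):
    """Return weight + strength * (up @ down) without modifying the inputs."""
    delta = matmul(up, down)
    return [
        [w + strength * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


weight = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weights
up = [[1.0], [2.0]]                # 2x1 factor
down = [[3.0, 4.0]]                # 1x2 factor, so up @ down has rank 1

print(apply_lora(weight, up, down, 0.5))  # [[2.5, 2.0], [3.0, 5.0]]
```

At strength 0 the base weights come back unchanged, and the delta grows linearly from there, which is consistent with low values reading as a faint tint and very high values overpowering the base model.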
These are our own results running each LoRA on a pokemon-style-cat subject, juggernautXL_v9 base, gemma4-enhanced prompt, with the trigger word appended. "Sweet spot" means where the LoRA first produces its full aesthetic without visible burn:
| LoRA | Sweet spot (model/clip) | Notes |
|---|---|---|
| voxel-xl | 1.2 / 0.9 +trigger | Needs max strength + trigger to produce actual blocky 3D art. Perfect for game-ready character assets. |
| pixel-art-xl | 1.0 / 0.7 +trigger | Needs 1.0+ to override base model's photoreal tendency. At 0.8, you get hints of pixel style; at 1.0+, full 8-bit. |
| crayon-style-xl | 1.0 / 0.7 +trigger | Robust: works well at all strengths. At 1.0+ with trigger it looks like elite hand-drawn crayon illustration. Usable at 0.6+ for subtle styling. |
| watercolor-xl | 0.8–1.0 / 0.5–0.7 +trigger | Charming at lower strengths with trigger. Burns to muddy at 1.2. |
| sticker-style-xl | 1.0 / 0.7 +trigger | Needs trigger + standard strength to produce the die-cut vinyl look. |
| anime-detailer-xl | Only useful on anime subjects | A detailer, not a style transformer. On a pokemon cat it's largely a no-op regardless of strength. Use with anime-native prompts. |
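If you script your generations, the sweet spots in the table can be kept as a small lookup. This is only a sketch: the dictionary and function names are hypothetical, the watercolor entry picks the low end of its measured range, and anime-detailer-xl is deliberately left out because the table gives it no sweet spot.

```python
# (strength_model, strength_clip) pairs taken from the sweet-spot table.
SWEET_SPOTS = {
    "voxel-xl": (1.2, 0.9),
    "pixel-art-xl": (1.0, 0.7),
    "crayon-style-xl": (1.0, 0.7),
    "watercolor-xl": (0.8, 0.5),  # low end of the 0.8-1.0 / 0.5-0.7 range
    "sticker-style-xl": (1.0, 0.7),
}


def lora_settings(lora: str):
    """Return the tabulated strengths for a LoRA, or None if none was measured."""
    if lora not in SWEET_SPOTS:
        return None
    strength_model, strength_clip = SWEET_SPOTS[lora]
    return {
        "lora": lora,
        "strength_model": strength_model,
        "strength_clip": strength_clip,
    }
```

These values were measured on one subject and one base model, so treat them as starting points rather than fixed settings.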
Three combinations that came out of our testing as ready-to-use. Each is a specific {base, LoRA, strength_model, strength_clip, trigger} tuple that reliably produces its named aesthetic:
- Base juggernautXL_v9.safetensors, LoRA voxel-xl, add "voxel art" to the prompt
- Base juggernautXL_v9.safetensors, LoRA crayon-style-xl, add "crayon drawing" to the prompt
- Base juggernautXL_v9.safetensors, LoRA watercolor-xl, add "watercolor painting" to the prompt

Every claim in this guide was produced by two Python scripts that ship with Wyltek Studio. Running them on your own setup lets you verify the conclusions against your actual LoRAs, GPU, and Ollama models:
Measures cold-load and warm inference latency for each Ollama model you have installed. Useful for picking a daily-driver OP model.
cd ~/open-palette
python scripts/op_latency_bench.py gemma4:latest qwen3.5:35b-a3b \
--trials 2 --num-gpu -1
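For readers who want to time a model without the shipped script, the core idea of a cold-versus-warm measurement can be sketched as below. This is not the code of `op_latency_bench.py`; `measure_latency` and its parameters are hypothetical, and whether the first call is truly "cold" depends on whether Ollama already has the model loaded.

```python
import time
from statistics import mean


def measure_latency(call, trials=2, clock=time.perf_counter):
    """Run call() once as the 'cold' sample, then `trials` more times as 'warm'."""
    if trials < 1:
        raise ValueError("trials must be at least 1")
    durations = []
    for _ in range(trials + 1):
        start = clock()
        call()
        durations.append(clock() - start)
    return {"cold_s": durations[0], "warm_s": mean(durations[1:])}
```

Pass any zero-argument callable that performs one enhancement request; the injectable `clock` exists only to make the helper easy to test.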
Enhances the same prompt with every OP model you have, then renders it against each style LoRA. Outputs a gallery you can scroll through side by side. ~30 minutes for a full matrix on a consumer GPU.
python scripts/lora_matrix_test.py "a pokemon style cat"
python scripts/matrix_gallery.py # renders HTML gallery of latest run
Holds the prompt and OP model fixed, varies LoRA strength (4 tiers) and trigger presence (A/B). Answers "what strength should I run this LoRA at?" empirically. ~10 minutes for 48 images.
python scripts/lora_strength_sweep.py "a pokemon style cat" --with-trigger-ab
python scripts/matrix_gallery.py
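One way a sweep like this arrives at 48 images is six LoRAs × four strength tiers × trigger on/off. The sketch below enumerates such a grid; the tier values and the names `STRENGTH_TIERS` and `sweep_jobs` are assumptions for illustration and are not taken from `lora_strength_sweep.py`.

```python
from itertools import product

STYLE_LORAS = [
    "pixel-art-xl", "voxel-xl", "crayon-style-xl",
    "watercolor-xl", "sticker-style-xl", "anime-detailer-xl",
]
STRENGTH_TIERS = [0.6, 0.8, 1.0, 1.2]  # assumed values; the script's tiers may differ


def sweep_jobs(loras=STYLE_LORAS, tiers=STRENGTH_TIERS):
    """Enumerate every (LoRA, model strength, trigger on/off) combination."""
    return [
        {"lora": lora, "strength_model": strength, "with_trigger": with_trigger}
        for lora, strength, with_trigger in product(loras, tiers, (False, True))
    ]


jobs = sweep_jobs()
print(len(jobs))  # 48
```

Each job differs from its A/B partner only in `with_trigger`, which is what makes the trigger's effect easy to compare across a row.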
Each script writes a browseable HTML gallery with hover tooltips showing the exact prompt, LoRA, and strength for every image. Compare rows/columns to find your own sweet spots.