dskjal

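Running Z Image Turbo in ComfyUI

Category: deeplearning

Note that Z Image Turbo is a distilled model (DMD), not a base model.

Among lightweight models, its baseline performance is far ahead of the rest. If you plan to fine-tune a model from the post-SDXL generation, the non-Turbo Z Image base model is probably the only real choice.

It also runs on forge-neo, which is Automatic1111-compatible.

Qwen Image Edit and FLUX.2 can both edit and generate images, but Z Image does not do both.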


Features

Because it is an official model, it naturally does not recognize NSFW content, characters, or artists.

Drawbacks

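Required specs

The model, text encoder, and VAE add up to 21 GB, so 32 GB of RAM is enough.

In recent versions of ComfyUI, the amount of VRAM is no longer critical. The algorithm now loads the model into RAM and transfers only the parts it needs to VRAM for processing. Image generation is compute-bound, so generation speed barely drops even when inference runs while the model is being transferred from RAM.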
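The point about streaming weights from RAM can be illustrated with a toy timing model. This is not ComfyUI's actual scheduler: the block count and per-block times below are hypothetical numbers, and the model simply assumes that the transfer of the next block overlaps with compute on the current one.

```python
def resident_time(n_blocks: int, compute_s: float) -> float:
    """Time for one pass when every block already sits in VRAM."""
    return n_blocks * compute_s


def pipelined_time(n_blocks: int, compute_s: float, transfer_s: float) -> float:
    """Time for one pass when block i+1 is transferred while block i computes.

    The first block must arrive before compute starts. After that, each
    overlapped stage costs whichever of compute or transfer is slower, and
    the last block only needs its compute time.
    """
    if n_blocks <= 0:
        return 0.0
    overlapped = (n_blocks - 1) * max(compute_s, transfer_s)
    return transfer_s + overlapped + compute_s


if __name__ == "__main__":
    # Hypothetical numbers: 30 blocks, 100 ms compute, 40 ms transfer per block.
    base = resident_time(30, 0.100)
    streamed = pipelined_time(30, 0.100, 0.040)
    print(f"resident: {base:.2f} s, streamed: {streamed:.2f} s")
```

With these made-up numbers the streamed pass is only about 1.3% slower, because transfer is hidden behind compute. If transfer per block were slower than compute, the pass would become transfer-bound and the slowdown would be large.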

The ComfyUI version used here is 0.3.75.

Models

Location                Model
models/unet             z_image_turbo_bf16.safetensors
models/text_encoders    qwen_3_4b.safetensors
models/vae              ae.safetensors
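A small check like the following can confirm that the three files are where ComfyUI expects them and report their combined size. It is only a sketch: the folder and file names come from the table above, while the `ComfyUI` root directory passed at the bottom is a hypothetical install path.

```python
from pathlib import Path

# Folder -> file name, as listed in the placement table.
EXPECTED_FILES = {
    "models/unet": "z_image_turbo_bf16.safetensors",
    "models/text_encoders": "qwen_3_4b.safetensors",
    "models/vae": "ae.safetensors",
}


def check_models(comfy_root: str) -> tuple[list[str], float]:
    """Return (missing relative paths, total size in GB of the files found)."""
    root = Path(comfy_root)
    missing: list[str] = []
    total_bytes = 0
    for folder, name in EXPECTED_FILES.items():
        path = root / folder / name
        if path.is_file():
            total_bytes += path.stat().st_size
        else:
            missing.append(f"{folder}/{name}")
    return missing, total_bytes / 1e9


if __name__ == "__main__":
    # "ComfyUI" is an assumed install directory; adjust to your setup.
    missing, total_gb = check_models("ComfyUI")
    print("missing:", missing)
    print(f"total size: {total_gb:.1f} GB")
```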

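For the workflow, drag the image from ComfyUI_examples into ComfyUI.

If you are using an RTX 4000 series GPU or later, fp8_scaled is recommended because it makes inference more than 50% faster. On machines with little RAM, GGUF and DFloat11 become options. DFloat11 compresses bf16 losslessly and cuts the model's memory usage by 30%. With it, all of the Z Image models fit in 12 GB of VRAM.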
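For a rough sense of scale, the sketch below estimates weight storage per format. The parameter count is a placeholder for illustration, not a figure from this article. Two bytes per bf16 weight and one byte per fp8 weight follow from the formats themselves, and the 30% DFloat11 saving is the figure quoted above. Scale tensors, file metadata, and the various GGUF quantization levels are ignored.

```python
# Approximate bytes per weight for each storage format.
BYTES_PER_PARAM = {
    "bf16": 2.0,
    "fp8": 1.0,
    "dfloat11": 2.0 * (1 - 0.30),  # bf16 reduced by the quoted 30%
}


def weight_size_gb(n_params: float, fmt: str) -> float:
    """Estimated weight storage in GB for n_params weights in the given format."""
    return n_params * BYTES_PER_PARAM[fmt] / 1e9


if __name__ == "__main__":
    n_params = 6e9  # hypothetical parameter count, for illustration only
    for fmt in BYTES_PER_PARAM:
        print(f"{fmt}: {weight_size_gb(n_params, fmt):.1f} GB")
```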

Quantization    URL
GGUF            jayn7/Z-Image-Turbo-GGUF
fp8_scaled      Kijai/Z-Image_comfy_fp8_scaled
fp8_scaled      Z-image Turbo [TensorCoreFP8]
DFloat11        z_image_turbo_bf16-DF11.safetensors
SVDQuant        Disty0/Z-Image-Turbo-SDNQ-uint4-svd-r32

LoRA

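ai-toolkit supports it. ostris/zimage_turbo_training_adapter is required.

However, it is unclear whether fine-tuning works well on the distilled Z Image Turbo, so it is better to wait until the base model is released.

ai-toolkit implements block swap, so training is possible even with little VRAM (although you should still have 12 GB).

Training in the cloud

In the cloud, a LoRA can be created for under 1,000 yen per run, so unless you are mass-producing LoRAs there is no need to upgrade your GPU. Cloud instances also have more VRAM, which allows training on higher-resolution images and improves quality.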

Execution time

Environment

Common settings

Execution time

Resolution       Inference speed (s/it)    Peak RAM usage (GB)
1,024 x 1,024    5.2                       30
1,536 x 1,024    7.2                       30
1,536 x 1,536    10.7                      30
2,048 x 1,536    16.1                      30
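To see how the time per step scales with resolution, the measurements can be normalized by pixel count. The sketch below reads each table row as resolution, then s/it, then a peak RAM of 30 GB (for example 5.2 s/it at 1,024 x 1,024), which is an assumption about how the flattened columns split.

```python
# (width, height, seconds per iteration), as read from the table above.
MEASUREMENTS = [
    (1024, 1024, 5.2),
    (1536, 1024, 7.2),
    (1536, 1536, 10.7),
    (2048, 1536, 16.1),
]


def seconds_per_megapixel(width: int, height: int, s_per_it: float) -> float:
    """Seconds per iteration divided by the image size in megapixels."""
    return s_per_it / (width * height / 1e6)


if __name__ == "__main__":
    for w, h, s in MEASUREMENTS:
        rate = seconds_per_megapixel(w, h, s)
        print(f"{w} x {h}: {rate:.2f} s/it per megapixel")
```

Under that reading, the cost stays between roughly 4.5 and 5.1 s/it per megapixel across all four resolutions, so the time per step grows approximately linearly with the number of pixels.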

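Fast generation

4-step generation leaves fine details noticeably soft, but it is useful for checking prompts. With the measures above applied, an RTX 5090 generates a 2k image in a few seconds.

Prompt guide

Do not use negations

Put anything you do not want drawn in the negative prompt. Note, however, that this doubles inference time, and that distilled models such as Turbo do not work well with negative prompts.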
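The doubling is what one would expect if the negative prompt is applied through classifier-free guidance, where each step evaluates the model once with the positive prompt and once with the negative prompt. The sketch below counts model evaluations using a stand-in `CountingModel` (hypothetical, not a ComfyUI API). It assumes the sampler skips the negative pass when the guidance scale is 1.

```python
class CountingModel:
    """Stand-in for a denoiser that records how often it is evaluated."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, latent: float, prompt: str) -> float:
        self.calls += 1
        # Arbitrary arithmetic so that different prompts give different outputs.
        return 0.9 * latent + 0.01 * len(prompt)


def guided_step(model, latent, positive, negative, cfg_scale):
    """One sampling step with classifier-free guidance."""
    cond = model(latent, positive)
    if cfg_scale == 1.0:
        return cond  # negative prompt has no effect, so one pass is enough
    uncond = model(latent, negative)
    return uncond + cfg_scale * (cond - uncond)


def count_calls(steps: int, cfg_scale: float) -> int:
    """Run a toy sampling loop and return the number of model evaluations."""
    model = CountingModel()
    latent = 1.0
    for _ in range(steps):
        latent = guided_step(model, latent, "a girl", "blurry", cfg_scale)
    return model.calls


if __name__ == "__main__":
    print(count_calls(8, 1.0))  # one evaluation per step
    print(count_calls(8, 3.0))  # two evaluations per step
```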

How to use negations

Avoid vague adjectives

Bad: beautiful girl

Good: A girl with flowing silver hair that catches the moonlight

Items worth describing

Item          Example
Position      standing at the top of stairs, sitting under the tree
Gaze          looking down at viewer, gazing upward at the sky
Emotion       with a confident smile, mysterious expression
Atmosphere    in a dreamlike atmosphere, surrounded by magical sparkles
Texture       silk-like hair, crystalline dress
Motion        hair swaying in the breeze, petals falling around her

Structure the prompt

Structuring the prompt with markdown lowers the chance that parts of it are ignored.

Bad example
She says "セリフ1" in a speech bubble in the left top.
She says "セリフ2" in a speech bubble in the left middle.
She says "セリフ3" in a speech bubble in the left bottom.
Good example
# speech bubble
There are three speech bubbles.
- speech bubble 1 (left top): "セリフ1"
- speech bubble 2 (left middle): "セリフ2"
- speech bubble 3 (left bottom): "セリフ3"
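
When prompts are assembled programmatically, the structured form is easy to generate. The helper below is a minimal sketch, and `build_section` is a hypothetical name rather than part of any tool mentioned here. It reproduces the structured speech bubble prompt from a heading, an intro sentence, and a mapping of labels to contents.

```python
def build_section(title: str, intro: str, items: dict[str, str]) -> str:
    """Build one markdown section: a '# title' line, an intro line, then bullets."""
    lines = [f"# {title}", intro]
    lines += [f"- {label}: {text}" for label, text in items.items()]
    return "\n".join(lines)


if __name__ == "__main__":
    prompt = build_section(
        "speech bubble",
        "There are three speech bubbles.",
        {
            "speech bubble 1 (left top)": '"セリフ1"',
            "speech bubble 2 (left middle)": '"セリフ2"',
            "speech bubble 3 (left bottom)": '"セリフ3"',
        },
    )
    print(prompt)
```

Several sections (person, background, style) can be joined with newlines to form a full prompt.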

Other prompt examples

# person
There are three people.
- person 1 (left): a girl, short red hair, blue eyes, sitting on a stool
- person 2 (middle): a boy, short brown hair, grey eyes, standing, v sign
- person 3 (right): a girl, long silver hair, red eyes, sitting on a sofa, holding a card with the word "test" written on it.
# background
In the luxury interior.
a king size bed, a disheveled sheets, a disheveled satin dress, a lamp on a bedside table, a pillow and a large window with a view from a high-rise building in the background.
# background
## left
There is a bookshelf, many piled books and a potted plant.
## middle
There is a window, curtains and a bed.
## right
There is a book and a cup on a desk.
# style
Highly detailed illustration style blending anime aesthetics with painterly realism.

Examples

See the links below for examples.


There are three girls in a room.

The girl on the left has short red hair and blue eyes. She is sitting on a stool holding a card with the word "左" written on it.

The girl in the middle has long silver hair and red eyes. She is standing and holding a card with the word "中" written on it in both hands.

The girl on the right has medium brown hair and green eyes. She is sitting on a stool and holding a card with the word "右" written on it.

There are potted plants and a kitchen in the background.

Anime style.

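Every instruction in the prompt is satisfied, the image quality is good, the hands are not melted, and the background holds together. Among lightweight models, its baseline performance is far ahead of the rest.

A mix of markdown and tag lists, as shown below, also works.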

There are three girls in a room.

# person
There are three girls.
- left: short red hair, blue eyes, sitting on a stool, holding a card with the word "左" written on it.
- middle: long silver hair, red eyes, standing, holding a card with the word "中" written on it in both hands.
- right: medium brown hair, green eyes, sitting on a stool, holding a card with the word "右" written on it.

# background
potted plants, a kitchen.

# style
Anime style.
There is a piece of paper with "あいうえおアイウエオ
かきくけこカキクケコ
さしすせそサシスセソ
たちつてとタチツテト
なにぬねのナニヌネノ
はひふへほハヒフヘホ
まみむめもマミムメモ
やゆよヤユヨ
らりるれろラリルレロ
わをんワヲン" written on it.

It can draw kanji that also exist in Chinese, but it cannot draw hiragana or katakana.

An elf girl's upper body. She has silver hair and green eyes. She is holding a white board with handwritten "It works!". She wears a red coat, with one eye closed. The background features a snowy night with bokeh. 

Anime style.
Z Image Turbo

Qwen Image Edit 2509

Anime style.

There are two girls in a room. One is standing, the other is sitting on a stool. The soft lighting creates a humorous atmosphere.

On the left, Kotonoha Aoi, with long blue hair, is standing. She has her arms behind her back, wearing a white dress and knee boots. She is standing.

On the right, Kotonoha Akane, with long red hair, is sitting on a stool with her legs spread and leaning forward. She has her arms stretched out in front of her and her hands on the stool. She is wearing a black dress and knee boots.

In the background are a window, curtains, a table, and a potted plant.

Qwen Image produces the better-looking image, but Z Image Turbo follows the prompt more faithfully (legs spread and 'a' potted plant are both reflected).

Structured version

# person
There are two girls in a room.
- left girl: kotonoha aoi, log blue hair, standing, arms behind back, white dress, knee boots, standing
- right girl: kotonoha akane, long red hair, sitting on a stool, spread legs, leaning forward, arms stretched out in front of her, hands on the stool, black dress, knee boots

# background
There is a window, curtains, a table and a potted plant.

# style
Anime style. The soft lighting creates a humorous atmosphere.
Z Image Turbo

Qwen Image Q3_K_M

The illustration of a chibi girl sitting in a chair eating a piece of pizza. She is saying "This is fine!" in a speech bubble. There is a table and a window in the indoor room.
anime style young woman, slender body, long white hair with bangs and an elaborate frilly headpiece, fair skin, large red eyes, wearing a black and white frilly gothic lolita dress with intricate ruffled details, posing with hands together near her face, adorned with a silver cross earring and a silver ring, against a plain light background, mood is solemn and elegant, soft lighting highlighting the delicate textures of the frills and hair, close-up shot focusing on the character's detailed features and accessories
The game screen capture style.

The Genshin Impact character, Hu Tao(胡桃), is depicted in a lively pose on the right side of the image. She wears a black outfit with red and white accents on the sleeves and skirt lining. Her long, dark brown hair flows in the wind and she wears a distinctive flower-adorned hat. Her right hand is near her face, and in her left she holds a glowing red polearm. A distinctive white ghost floats at her feet.

The background is a rich autumn landscape, with shrubs and trees colored in orange and red, creating depth. The ground is covered with mossy rocks and lush vegetation, and is decorated with a small lantern and plants with pink flowers. A warm light shines throughout, highlighting the character.
This illustration combines pen and ink drawing with watercolor techniques, featuring highly saturated colors.

A girl sits on a chair, smiling as she holds chopsticks.

A vibrant, an abundant tuna and salmon sushi and sashimi platter neatly arranged within a simple, light brown cardboard takeaway box. The subtle texture of the box's matte surface provides a minimalist backdrop, allowing the vibrant colors of the sushi to truly pop. The left side features a meticulously arranged, full tuna and salmon sashimi rose with delicate, orange-pink layers, accompanied by two simple, thick slices of salmon. To the right, a delightful array of individual salmon-wrapped rice balls are generously topped with swirls of creamy white cream cheese and garnished with bright green, finely chopped scallions or chives. Below these, two organized rows of inside-out salmon and cream cheese sushi rolls (uramaki) showcase their fluffy white rice exteriors, heavily speckled with contrasting black sesame seeds, revealing the rich salmon filling in their cleanly cut cross-sections. The scene is illuminated by soft, diffused natural light, creating gentle highlights on the creamy fillings and the glossy salmon, enhancing the appetizing appeal without harsh shadows. The edges of the box are slightly visible, grounding the scene and emphasizing its 'to-go' nature.

The box is placed on a table. There is a soy source in a saucer on the left side of the box.

# Core Feature
- Character Type: A cute female figure in a chibi (Q-style) aesthetic.
- Facial Features: Large red eyes, a small nose and mouth, long, wavy golden hair, and bangs.
- Headwear: A large bow on her head, made of white, gray, and dark blue striped fabric. A blue cross and two red lines are affixed to her forehead.
- Pose: Left hand waving, right hand making a "V" sign.

# Clothing Details
- Overall: Wears a white, multi-layered ruffled dress.
- Neckline and Cuffs: Delicate white lace trim on the neckline and cuffs.
- Chest Accessory: A blue and white striped bow tied at the chest, with a red round gemstone in the center.
- Pendant: Two dark blue ribbons hang from the bow, each ending in a gold cross pendant.
- Other: Gold buttons adorn the dress.

# Background Elements
There is a charactere in the blurry background.
- Environment: Another character is faintly visible in the background. This character has golden hair and black demon/bat-wing-like horns, and wears black clothing.

# Style and Texture
- Style: Japanese anime style, emphasizing cuteness and refinement.
- Materials: The texture of the clothing fabric, the delicacy of the lace, and the luster of the metallic pendants all need to be depicted.
Atmospheric anime illustration of a beautiful young woman with long flowing black hair standing on an old school staircase, looking back over her shoulder directly at the viewer with an enigmatic expression. Character wears a classic school uniform consisting of a white blouse with navy blue tie, dark pleated skirt, black knee-high socks and leather shoes. Dramatic cinematic lighting from above creates strong contrast between illuminated figure and shadowy staircase surroundings, casting sharp shadows across the tile walls. Environment features worn ceramic tiles in checkerboard pattern on walls and steps, wrought iron handrail with peeling paint, and a vintage directional sign mounted on the wall showing Japanese characters for "出口".  Thick-coated 3D texture rendering with visible brushstrokes creates tangible surface quality on tiles, fabric and metal. Cool color palette dominated by deep blues and grays with strategic highlights in warm tones on skin and hair. Highly detailed illustration style blending anime aesthetics with painterly realism, emphasizing the visual impact of light and shadow interplay. Mysterious, contemplative mood with slight gothic undertones, professional digital painting with exceptional material rendering.
Atmospheric anime illustration.

There is a round mini table on the center of the image. There is a empty coffee cup on the table. A beautiful young girl with long flowing black hair standing on the right side of the table.The girl wears a classic school uniform consisting of a white blouse with navy blue tie, dark pleated skirt, black knee-high socks.

The soft lighting creates a humorous atmosphere.

In the living room, there are a window, curtains, bookshelfs, piled books, a table, and a potted plant.

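A prompt such as "there is a person on the right side of the image" is ignored, and the person is placed in the center. Positional instructions only take effect when written in the form "there is X in the center of the image, and a person to its right".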

Z Image Turbo

A young woman is standing on a subway platform.

She is short black hair, wearing a light gray cap, a light colored cropped top and short black pants.

She is holding an iPhone in her left hand.

# background

There are signs, Braille blocks and blurred figures, with a sense of urban transit in the subway station. There is a sign written "出口" on it.

# style

japanese manga, line art, pen art, black and white, halftone

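Hands look off when the character holds an object. The model knows what Braille blocks are but does not draw them well.

This is not a fair comparison, but the image below was generated with the same prompt after feeding a free stock photo of an arrow sign pointing to the Toranomon 3-chome ticket gates (Toranomon Hills Station) into the text encoder of Qwen Image Edit 2509 Lightning LoRA 4step.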

Qwen Image Edit 2509 Lightning LoRA 4step (with reference image)

