Installing SageAttention for ComfyUI on an RTX 3050

1. Installing ComfyUI and activating the venv
2. Collecting environment information
3. Installing the Visual C++ Redistributable for Visual Studio 2015-2022
4. Installing triton-windows
5. Installing SageAttention
6. Enabling SageAttention in ComfyUI
7. TorchCompile and the path length limit
Patch Sage Attention KJ node

1. Installing ComfyUI and activating the venv

Installing ComfyUI itself is not covered in this article. Activate the venv from the ComfyUI folder:

./venv/Scripts/activate
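To confirm that the activation took effect, you can ask Python itself. This is a minimal sketch, not part of the original setup; the function name in_virtualenv is made up for illustration.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the venv directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("venv active:", in_virtualenv())
print("interpreter:", sys.executable)
```

If the first line prints False, the pip commands that follow would install into the system Python rather than into ComfyUI's venv.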

The official ComfyUI instructions recommend cu130, but it did not work on the RTX 3050, so the cu128 build is installed here instead.

pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu128

2. Collecting environment information

python -V

Python 3.12.9

pip show torch

Version: 2.9.1+cu128

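The CUDA driver is backward compatible, so a driver that supports a relatively new CUDA version rarely causes problems with wheels built for an older one. When PyTorch is installed from a wheel, there is no need to install the CUDA Toolkit separately.

To check which CUDA version the driver supports: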

nvidia-smi
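The check described above can be sketched in code: compare the CUDA tag of the installed wheel with the CUDA version that nvidia-smi prints in its header. This is a sketch under assumptions. The helper names are hypothetical, the sample header line is made up, and "the wheel's CUDA version must not be above the driver's" is only a conservative rule of thumb.

```python
import re
from typing import Optional, Tuple


def cuda_from_wheel_tag(torch_version: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from a version string such as '2.9.1+cu128'."""
    # The last digit of the tag is the minor version: cu128 -> (12, 8).
    match = re.search(r"\+cu(\d+)(\d)$", torch_version)
    if match is None:
        return None  # CPU-only build or an unexpected tag
    return int(match.group(1)), int(match.group(2))


def cuda_from_nvidia_smi(smi_text: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from the 'CUDA Version: X.Y' field of nvidia-smi."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def wheel_supported_by_driver(torch_version: str, smi_text: str) -> bool:
    """Conservative check: the wheel's CUDA version is not above the driver's."""
    wheel = cuda_from_wheel_tag(torch_version)
    driver = cuda_from_nvidia_smi(smi_text)
    if wheel is None or driver is None:
        return False
    return wheel <= driver


# Made-up header line for illustration; paste your own nvidia-smi output here.
sample_header = "NVIDIA-SMI 000.00   Driver Version: 000.00   CUDA Version: 12.9"
print(wheel_supported_by_driver("2.9.1+cu128", sample_header))
```

With the sample header, a cu128 wheel passes the check and a cu130 wheel would not.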

3. Installing the Visual C++ Redistributable for Visual Studio 2015-2022

libtriton.pyd is compiled with MSVC, so the MSVC runtime is required to load it. Install it with the following installer.

https://aka.ms/vs/17/release/vc_redist.x64.exe

4. Installing triton-windows

Matching PyTorch and triton versions:

PyTorch	triton
2.4	3.1
2.5	3.1
2.6	3.2
2.7	3.3
2.8	3.4
2.9	3.5

Installation

torch is version 2.9.1, so install triton 3.5.

pip install -U "triton-windows<3.6"

-U is short for --upgrade; it installs the newest version that satisfies the given constraint. Run pip help install to see all options.

This installed triton_windows-3.5.1.post23-cp312-cp312-win_amd64.whl.
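The table lookup and the resulting version constraint can be sketched as a small helper. The mapping is copied from the table above; the function name triton_requirement and the rule "triton 3.N becomes triton-windows<3.(N+1)" are assumptions that simply generalize the one command used in this article.

```python
# PyTorch (major.minor) -> triton (major, minor), copied from the table above.
TRITON_FOR_TORCH = {
    "2.4": (3, 1),
    "2.5": (3, 1),
    "2.6": (3, 2),
    "2.7": (3, 3),
    "2.8": (3, 4),
    "2.9": (3, 5),
}


def triton_requirement(torch_version: str) -> str:
    """Build a pip requirement such as 'triton-windows<3.6' for a torch version."""
    release = torch_version.split("+")[0]            # drop a local tag like +cu128
    major_minor = ".".join(release.split(".")[:2])   # '2.9.1' -> '2.9'
    if major_minor not in TRITON_FOR_TORCH:
        raise ValueError(f"no known triton version for torch {major_minor}")
    major, minor = TRITON_FOR_TORCH[major_minor]
    # An upper bound one minor version higher keeps pip inside the matching series.
    return f"triton-windows<{major}.{minor + 1}"


print(triton_requirement("2.9.1+cu128"))  # triton-windows<3.6
```

The printed string can be passed to pip install -U in quotes, as in the command above.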

Verifying the installation

Create the following test_triton.py and run it with this command.

python ./test_triton.py

# test_triton.py 
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, output_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    block_start = pid * BLOCK_SIZE
    offsets = block_start + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    output = x + y
    tl.store(output_ptr + offsets, output, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor):
    output = torch.empty_like(x)
    n_elements = output.numel()
    grid = lambda meta: (triton.cdiv(n_elements, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, output, n_elements, BLOCK_SIZE=1024)
    return output

a = torch.rand(3, device="cuda")
b = a + a
b_compiled = add(a, a)
print(b_compiled - b)
print("If you see tensor([0., 0., 0.], device='cuda:0'), then it works")

The installation works if the following output appears.

tensor([0., 0., 0.], device='cuda:0')
If you see tensor([0., 0., 0.], device='cuda:0'), then it works

At this point, nodes such as TorchCompileModel and WanVideo Torch Compile Settings become usable.

5. Installing SageAttention

Pick the wheel that matches your environment from https://github.com/woct0rdho/SageAttention/releases.

2.2.0.post4 was installed here. post3 has a problem where Qwen Image (Edit) generates black images; with post4 this happens less often.

pip install https://github.com/woct0rdho/SageAttention/releases/download/v2.2.0-windows.post4/sageattention-2.2.0+cu128torch2.9.0andhigher.post4-cp39-abi3-win_amd64.whl
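Choosing the matching wheel comes down to reading three things off the file name: the CUDA tag, the minimum torch version, and the Python ABI tag. The sketch below checks them against an environment. It is an illustration only: the parsing rule is inferred from the single file name used above, and the function name wheel_matches is hypothetical.

```python
import re
from typing import Tuple

WHEEL = "sageattention-2.2.0+cu128torch2.9.0andhigher.post4-cp39-abi3-win_amd64.whl"

# Groups: CUDA tag, torch major/minor/patch, optional 'andhigher', Python major/minor.
_PATTERN = re.compile(
    r"\+(cu\d+)torch(\d+)\.(\d+)\.(\d+)(andhigher)?.*-cp(\d)(\d+)-abi3-"
)


def wheel_matches(filename: str, torch_version: str,
                  python_version: Tuple[int, int]) -> bool:
    """Return True if the wheel name fits the given torch and Python versions."""
    found = _PATTERN.search(filename)
    if found is None:
        return False
    release, _, cuda_tag = torch_version.partition("+")
    installed_torch = tuple(int(part) for part in release.split(".")[:3])
    wheel_torch = tuple(int(found.group(i)) for i in (2, 3, 4))
    if found.group(5):  # 'andhigher': any torch at or above the stated version
        torch_ok = installed_torch >= wheel_torch
    else:
        torch_ok = installed_torch == wheel_torch
    # cp39-abi3 is read here as "CPython 3.9 or newer".
    python_ok = python_version >= (int(found.group(6)), int(found.group(7)))
    return cuda_tag == found.group(1) and torch_ok and python_ok


print(wheel_matches(WHEEL, "2.9.1+cu128", (3, 12)))  # True
```

For the environment in this article (Python 3.12, torch 2.9.1+cu128) the check passes; a cu130 build of torch or torch 2.8 would fail it.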

Verifying that it works

Save tests/test_sageattn.py locally and run it.

python ./test_sageattn.py

Output like the following indicates success.

q (4, 32, 64, 128) cuda:0 torch.float16
sage vs math: mean_rtol=0.0373 max_rtol=2 mean_atol=0.0244 max_atol=0.0244
The above (except max_rtol) should be < 0.05 (on RTX 20xx/30xx) or < 0.1 (on RTX 40xx/50xx)
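The pass criterion stated in the last output line can be checked mechanically. The following sketch parses the "sage vs math" line and applies the threshold to every metric except max_rtol. The function names are hypothetical, and the thresholds are taken only from the message printed by the test script.

```python
import re
from typing import Dict


def parse_metrics(line: str) -> Dict[str, float]:
    """Turn 'name=value' pairs in the output line into a dict of floats."""
    return {name: float(value)
            for name, value in re.findall(r"(\w+)=([0-9.eE+-]+)", line)}


def within_tolerance(line: str, limit: float) -> bool:
    """Check every metric except max_rtol against the limit."""
    metrics = parse_metrics(line)
    checked = {k: v for k, v in metrics.items() if k != "max_rtol"}
    return bool(checked) and all(v < limit for v in checked.values())


line = "sage vs math: mean_rtol=0.0373 max_rtol=2 mean_atol=0.0244 max_atol=0.0244"
# 0.05 is the limit the script states for RTX 20xx/30xx; use 0.1 for RTX 40xx/50xx.
print(within_tolerance(line, 0.05))  # True
```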

6. Enabling SageAttention in ComfyUI

Start ComfyUI with the following option.

python ./main.py --use-sage-attention

On success, the following message is printed and ComfyUI starts.

Using sage attention

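7. TorchCompile and the path length limit

torch.compile can create cache files whose paths exceed the 260-character Windows path length limit. When that happens, it fails with FileNotFoundError.

There are two workarounds: enable long paths as described in Microsoft's "Enable long paths in Windows 10, version 1607, and later", or enable them with the Group Policy Editor.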
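To see whether a cache directory is close to the limit, its files can be scanned for overlong paths. This is a minimal sketch: the function names are made up, the directory to scan is left to the caller because the cache location depends on the setup, and the threshold treats the classic 260-character limit as including the terminating NUL.

```python
import os
from typing import List

WINDOWS_MAX_PATH = 260  # classic limit, counting the terminating NUL character


def is_too_long(path: str, limit: int = WINDOWS_MAX_PATH) -> bool:
    """Return True if the absolute form of the path reaches the limit."""
    return len(os.path.abspath(path)) >= limit


def find_long_paths(root: str, limit: int = WINDOWS_MAX_PATH) -> List[str]:
    """List files under root whose full path reaches the limit."""
    long_paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if is_too_long(full, limit):
                long_paths.append(full)
    return long_paths
```

Files that could not be created at all will not show up in the scan, so calling find_long_paths with a lower limit such as 240 is a way to spot paths that are merely close to the limit.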

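External links

triton-windowsとSageAttentionの導入(ComfyUIポータブル版) (in Japanese: installing triton-windows and SageAttention on the portable version of ComfyUI)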

Detailed Step-by-Step Full ComfyUI with Sage Attention install instructions for Windows 11 and 4k and 5k Nvidia cards.

SageAttention3 quality comparison: someone posted today about sage attention 3, I tested it and here is my results

Patch Sage Attention KJ node

The Patch Sage Attention KJ node from kijai/ComfyUI-KJNodes makes it possible to enable or disable SageAttention per model.