Fisher Information in Conditional Diffusion

The paper presents a novel Fisher information-based conditional diffusion (FICD) model aimed at improving the efficiency and quality of conditional image generation. By leveraging Fisher information as a weight to enhance the informativeness of conditions during the generation process, FICD achieves up to 2x speed improvements while maintaining high-quality outputs. Experimental results indicate that FICD outperforms existing training-free conditional diffusion methods in both speed and generation quality.

Improving Training-Free Conditional Diffusion Model via Fisher Information

Kaiyu Song, Hanjiang Lai

Sun Yat-Sen University
arXiv:2404.18252v2 [[Link]] 12 Nov 2024

{songky7, laihanj3}@[Link]

Abstract
Training-free conditional diffusion models have received great attention in conditional image generation
tasks. However, they require a computationally expensive conditional score estimator to
steer the intermediate results of each step in the reverse process toward the condition, which causes
slow conditional generation. In this paper, we propose a novel Fisher information-based conditional
diffusion (FICD) model to generate high-quality samples according to the condition. In particular,
we further explore the conditional term from the perspective of Fisher information, where we show
Fisher information can act as a weight to measure the informativeness of the condition in each gen-
eration step. According to this new perspective, we can control and gain more information along
the conditional direction in the generation space. Thus, we propose the upper bound of the Fisher
information to reformulate the conditional term, which increases the information gain and decreases
the time cost. Experimental results also demonstrate that the proposed FICD offers up to 2x
speed-ups over most baselines under the same number of sampling steps. Meanwhile, FICD improves the
generation quality in various tasks compared to the baselines at a low computation cost.

1 Introduction

Recently, unconditional diffusion models (DMs) [1, 2, 3] have shown great success in image generation tasks.
However, people often want to generate images with specific properties. Conditional diffusion models [4], which
incorporate conditions to generate the desired properties, have come to play a crucial role in various generation
tasks, e.g., text-to-image generation [5, 6, 7], style-driven generation [8], and image edit-driven generation [9, 4, 10].
Many researchers [11, 9] have proposed to reuse or fine-tune mature unconditional diffusion models to improve
conditional generation tasks.
From unconditional to conditional diffusion models [12], the main problem is how to incorporate conditional infor-
mation into the diffusion model to guide the sample generation. A new conditional score estimator [13, 5] should
be introduced into the diffusion models. However, it is not trivial to define the conditional score estimator. This is
because the diffusion model is a multi-step generation. In each step, the intermediate-generated results are sampled
from different noisy distributions. This hinders us from directly measuring the distance between the condition and
intermediate results to generate desired images. Training-based methods [5, 10] have first been proposed to solve this
problem. For all time steps, they retrain the time-dependent conditional score estimator [4] to measure the distance
between the condition and each intermediate result. However, retraining the conditional score estimator requires a
large amount of computation.
Another line of research is training-free conditional diffusion models [12, 14, 9, 15, 16], which reuse the unconditional
score function for the conditional score estimator and do not require retraining. To make the conditional score estimator
both time-independent and training-free, one solution is posterior sampling [12]. Specifically, given the
t-th step intermediate result xt and the condition c, it uses two steps to remove the time dependence of the conditional
term: 1) Posterior mean. It first approximates the posterior mean x̂0 from xt by using the unconditional
score function; and 2) Measurement. It then uses the energy function [9] or other measures [14] to learn the relationship
between the time-independent x̂0 and the condition c. Hence, we can achieve a training-free method by reusing the
unconditional score function in the first step and using the same measure for different time steps since the x̂0 is time-
independent. Both are training-free. In this way, the conditional term will guide the intermediate results to be close to
the condition. However, the iterative generation process still causes slow conditional generation.
Previous methods alleviated the time cost by introducing additional hypotheses. For example, Cardoso et al. [16]
redefined the conditional term based on the Bayesian framework. RED [14] fine-tuned the start point of the reverse
process based on variational inference. Further, MPGD [15] introduced a linear manifold hypothesis to
decouple the dependency between the conditional term and the diffusion model.
In this paper, we offer a novel view based on the Fisher information [17] for the conditional score estimator, and a
novel Fisher information-based conditional diffusion (FICD) is proposed to reduce the time cost while maintaining
high-quality generation.
Concretely, according to the two steps, i.e., the posterior mean and measurement, in posterior sampling for training-
free methods, we also divide the conditional score estimator into 1) the posterior part (more details in Sec. 4) and
2) the measurement part (more details in Eq. 5). Then, we find that the posterior part can be redefined as the Fisher
information, where the Fisher information could be exactly reflected in the information gain. Based on this, the
posterior part could be further regarded as the weight function for the measurement part, which measures how much
information could be gained for the condition in each time step. Following this new view, we also find that the weight
function may sometimes hinder the reverse process from being close to the condition. Therefore, this motivates us to
use the upper bound of Fisher information to increase the overall information gain. In this way, with more information
gained about the condition, the reverse process can generate images closer to the condition. Meanwhile, the
upper bound also lets us skip the calculation of the diffusion model’s gradients. Therefore, our
FICD could accelerate the conditional generation while maintaining high quality. To sum up, the main contributions
of this paper are:

• We propose a novel FICD that decreases the computation cost while increasing the generation quality. It
draws on information theory and uses the upper bound of the Fisher information to approximate the
posterior part.
• The proposed method provides a new perspective to understand and improve the existing training-free con-
ditional diffusion methods, where the key is to accumulate enough information gain for the condition in the
reverse process.
• The experimental results show that FICD accelerates the generation speed while maintaining high quality
compared with SOTA methods.

2 Related Work
Training-based conditional diffusion models. These methods aim to fine-tune the parameters of the score estimator
for different downstream tasks. For example, DreamBooth [11] directly fine-tuned the UNet of the diffusion model for
the condition. ControlNet [18] introduced an additional UNet and fine-tuned it while freezing the original one. Stable
Diffusion [5] introduced transformer layers to re-train. Training-based methods can generate high-quality images
according to the condition, but their cost is high since the model must be fine-tuned again for each downstream
task.
Training-free conditional diffusion methods. Some training-free methods focus on utilizing the structure and
potential of the UNet. For example, Tumanyan et al. [8] leveraged the attention maps from attention layers. Jeong et
al. [19] found the “h-space” within the UNet of the diffusion model. Wu et al. [20] directly changed the inputs of the
UNet. Despite their success, these methods need a special design for each downstream task, which limits their
generalization.
The other methods [12, 21, 9, 22] focus on sampling from the posterior distribution. For example, DPS [12] first
used the distance norm as the metric for inverse problems. FreeDom [9] further used the energy function
as the metric, which extended the approach from inverse problems to a wider range of downstream tasks such as face ID
generation [23]. Park et al. [24] leveraged the bias vector discovered in the latent space to guide the diffusion models.
For inverse problems, RED [14] and Cardoso et al. [16] fine-tuned the start point and rebuilt the reverse process based
on the Bayesian framework to improve the generation quality and decrease the time cost. DSG [25] alleviated the
estimation bias of the condition in the reverse process to improve FreeDom, mainly for inverse problems.
Recently, MPGD [15] has been proposed to cancel the posterior part based on the manifold assumption. Compared
to MPGD, FICD offers an additional view based on information theory to accelerate the sampling process
while maintaining high-quality generation. MPGD introduced the manifold hypothesis based on a linear assumption to
decouple the dependency, while our method has no additional assumption. Meanwhile, it could be found that dropping

the score function’s gradient will inevitably lose useful information, which is different from the MPGD and may lead
to an unstable generation for some tasks.
Information theory with the diffusion models. In light of the SDE [2] that introduced the score function to
explain the behavior of DMs, information theory [17] shows the potential to make further improvements to DMs. Both
information-theoretic diffusion [17] and InfoDiffusion [26] leveraged mutual information to interpret the
correlation between the observed and hidden variables to improve the generation quality. Interpretable diffusion [27] further
explained the most important part of the input for DMs, similar to CAM [28].

3 Preliminary
Training-free conditional generation. Training-free methods aim to solve conditional generation tasks without
retraining the unconditional diffusion model. Suppose that we have an unconditional diffusion model whose
unconditional score function at the t-th timestep is ∇xt log p(xt ) [1]. To keep consistency with SDE [2], we also denote
sθ (xt , t) ≈ ∇xt log p(xt ), where sθ (xt , t) is the score function based on the neural network with θ parameters. Now
given the condition c, the conditional score function can be formulated as ∇xt log p(xt |c). The problem becomes
how to define this conditional score estimator.
By Bayes' rule [12], we have:
\[
\nabla_{x_t} \log p(x_t \mid c) = \underbrace{\nabla_{x_t} \log p(x_t)}_{\text{Unconditional term}} + \underbrace{\nabla_{x_t} \log p(c \mid x_t)}_{\text{Conditional term}}. \tag{1}
\]
The unconditional term is the unconditional score function sθ (xt , t). To estimate the conditional term, a differentiable
metric ε(xt , c) (also called the energy function [9]) is proposed to measure the distance between xt and c:
\[
p(c \mid x_t) = \frac{\exp\bigl(-\lambda\, \varepsilon(x_t, c)\bigr)}{Z}, \tag{2}
\]
where λ is a temperature coefficient and Z > 0 is a normalizing constant.
Following the posterior sampling [9], we calculate the posterior mean x̂0|t of the xt :
\[
\hat{x}_{0|t} \approx \frac{1}{\sqrt{\hat{\alpha}_t}} \bigl( x_t + (1 - \hat{\alpha}_t)\, s_\theta(x_t, t) \bigr), \tag{3}
\]
where α̂t is a known parameter from the noise schedule [1] for the t-th timestep. We then use x̂0|t to
measure the distance to the condition c in the data domain instead of the noise domain [12]. Based on Eq. 2
and [12], we have:
log p(c|xt ) ≈ log p(c|x̂0|t ) ∝ ε(x̂0|t , c). (4)
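As a sanity check on Eq. 3, the sketch below evaluates the posterior mean in a toy case where the answer is known in closed form. It assumes (purely for illustration) that the data follow x0 ∼ N(0, σ²I), so the score of xt is analytic; the function name `posterior_mean` and the values of `sigma2` and `alpha_hat_t` are hypothetical and not taken from the paper.

```python
import numpy as np

def posterior_mean(x_t, score, alpha_hat_t):
    """Eq. 3: estimate x0 from x_t using the unconditional score at x_t."""
    return (x_t + (1.0 - alpha_hat_t) * score) / np.sqrt(alpha_hat_t)

# Toy setting (assumption): x0 ~ N(0, sigma2 * I) and
# x_t = sqrt(alpha_hat_t) * x0 + sqrt(1 - alpha_hat_t) * noise.
sigma2, alpha_hat_t = 4.0, 0.3
rng = np.random.default_rng(0)
x_t = rng.normal(size=5)

var_t = alpha_hat_t * sigma2 + (1.0 - alpha_hat_t)  # marginal variance of x_t
score = -x_t / var_t                                 # grad of log p(x_t) for a Gaussian

x0_hat = posterior_mean(x_t, score, alpha_hat_t)

# Closed-form E[x0 | x_t] for jointly Gaussian (x0, x_t), used as a reference.
exact = np.sqrt(alpha_hat_t) * sigma2 / var_t * x_t
```

In this Gaussian case the two quantities agree exactly; with a learned score network sθ the equality only holds approximately.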
Motivation. Based on the Eq. 1-Eq. 4, we take the derivative of the log p(c|xt ) with respect to xt and have
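\[
\nabla_{x_t} \log p(c \mid \hat{x}_{0|t}) = \frac{\partial \hat{x}_{0|t}}{\partial x_t}\, \frac{\partial \varepsilon(\hat{x}_{0|t}, c)}{\partial \hat{x}_{0|t}}. \tag{5}
\]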
According to Eq. 5, the derivative of the conditional term with respect to xt in training-free methods can be further
divided into two parts. The first part is the derivative of the posterior mean x̂0|t with respect to xt , ∂x̂0|t /∂xt , which
we call the posterior part. This part does not contain the condition c. The second part is the derivative of the measurement
function with respect to the posterior mean, ∂ε(x̂0|t , c)/∂x̂0|t . We refer to it as the measurement part, which contains the
condition c.
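The two-part split of Eq. 5 can be checked numerically on a small example. The sketch below is only an illustration under stated assumptions: a Gaussian toy score (so the posterior part is a known scalar), a squared-distance metric standing in for ε, and hypothetical names (`x0_hat`, `energy`, `posterior_part`, `measurement_part`). It compares the chain-rule product with a finite-difference gradient.

```python
import numpy as np

alpha_hat_t, sigma2 = 0.3, 4.0                        # assumed toy values
var_t = alpha_hat_t * sigma2 + (1 - alpha_hat_t)      # variance of x_t for Gaussian data
c = np.array([1.0, -2.0])                             # toy condition

def x0_hat(x_t):
    """Eq. 3 with the closed-form Gaussian score -x_t / var_t."""
    return (x_t + (1 - alpha_hat_t) * (-x_t / var_t)) / np.sqrt(alpha_hat_t)

def energy(x0):
    """Assumed metric: half the squared distance to the condition c."""
    return 0.5 * np.sum((x0 - c) ** 2)

x_t = np.array([0.5, 1.5])

# Posterior part: d x0_hat / d x_t (a scalar multiple of the identity here).
posterior_part = (1 - (1 - alpha_hat_t) / var_t) / np.sqrt(alpha_hat_t)
# Measurement part: d energy / d x0_hat, the only factor that involves c.
measurement_part = x0_hat(x_t) - c
chain_rule_grad = posterior_part * measurement_part   # right-hand side of Eq. 5

# Independent check: central finite differences on energy(x0_hat(x_t)).
h = 1e-6
fd_grad = np.array([
    (energy(x0_hat(x_t + h * e)) - energy(x0_hat(x_t - h * e))) / (2 * h)
    for e in np.eye(2)
])
```

With a neural score network, the posterior part is a full Jacobian that requires backpropagating through the network, which is the expensive step the paper targets.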
In this paper, we show that the Fisher information can explain the posterior part as the information gain. Following
this, the posterior part acts as a weight function for the measurement part, controlling the accumulated
information about how close xt is to the condition c. In this case, we further show an interesting finding: we could
increase the information gain of the posterior part to increase the generation quality. To achieve this, we propose using
the Fisher information’s upper bound as the new posterior part, where the upper bound could further reduce the time
cost.

4 Methodology
In our work, we introduce the Fisher information to cancel the posterior part ∂x̂0|t /∂xt . Specifically, we leverage the upper
bound of the Fisher information to redefine the posterior part. We also provide a view from
information theory to explain how the Fisher information improves training-free conditional generation.

Algorithm 1 The overall algorithm for FICD
Input: c, T , sθ , the noise schedule parameters α̂t and βt , the differentiable metric function ε, and the hyperparameter ρt .
Output: x0 ▷ The generated image based on c
1: xT ∼ N (0, 1)
2: for t in [T − 1, ..., 1] do
3:   ϵ ∼ N (0, 1)
4:   xt−1 = (1 + 0.5βt )xt + βt ∇xt log p(xt ) + √βt ϵ
5:   x̂0|t = (1/√α̂t )(xt + (1 − α̂t )sθ (xt , t)) ▷ The MMSE estimation
6:   gt = (2/√α̂t ) ∇x̂0|t ε(x̂0|t , c)
7:   xt−1 = xt−1 − ρt gt
8: end for
Return: x0
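The loop in Algorithm 1 can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: `ficd_sample`, `score_fn`, and `energy_grad_fn` are hypothetical names, ρ is held constant, the score is that of standard Gaussian data, the metric is a squared distance to c, and the linear β schedule is an assumed choice.

```python
import numpy as np

def ficd_sample(score_fn, energy_grad_fn, c, betas, alpha_hats, rho, shape, seed=0):
    """Sketch of Algorithm 1.

    score_fn(x, t) stands in for s_theta; energy_grad_fn(x0_hat, c) returns
    the gradient of the metric with respect to x0_hat (the measurement part).
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=shape)                          # line 1: initial noise
    for t in range(len(betas) - 1, 0, -1):              # line 2: t = T-1, ..., 1
        eps = rng.normal(size=shape)                    # line 3
        s = score_fn(x, t)
        # line 4: unconditional update
        x_prev = (1 + 0.5 * betas[t]) * x + betas[t] * s + np.sqrt(betas[t]) * eps
        # line 5: posterior mean (Eq. 3)
        x0_hat = (x + (1 - alpha_hats[t]) * s) / np.sqrt(alpha_hats[t])
        # line 6: measurement gradient scaled by 2 / sqrt(alpha_hat_t);
        # no derivative of score_fn is needed
        g = 2.0 / np.sqrt(alpha_hats[t]) * energy_grad_fn(x0_hat, c)
        x = x_prev - rho * g                            # line 7
    return x

# Toy usage (all choices below are assumptions for illustration).
betas = np.linspace(1e-4, 0.02, 100)
alpha_hats = np.cumprod(1.0 - betas)
c = np.array([2.0, -1.0])
x0 = ficd_sample(
    score_fn=lambda x, t: -x,                       # score of N(0, I) data
    energy_grad_fn=lambda x0_hat, c: x0_hat - c,    # grad of 0.5 * ||x0_hat - c||^2
    c=c, betas=betas, alpha_hats=alpha_hats, rho=0.05, shape=c.shape,
)
```

In this toy setting the guidance is strong enough that the returned sample ends up near c; with a real model, `score_fn` would be the pre-trained network and `energy_grad_fn` would come from automatic differentiation of the task metric.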

Fisher information-based conditional diffusion. We expand the posterior part as follows:

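\[
\frac{\partial \hat{x}_{0|t}}{\partial x_t} = \frac{1}{\sqrt{\hat{\alpha}_t}} \Bigl( 1 + (1 - \hat{\alpha}_t)\, \frac{\partial s_\theta(x_t, t)}{\partial x_t} \Bigr). \tag{6}
\]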
The intractable part is ∂sθ (xt , t)/∂xt . To handle it, we introduce the Fisher information to help us investigate ∂sθ (xt , t)/∂xt .
We have the following definition:
Definition 1. The gradient of the unconditional score function with respect to xt is precisely the definition of the
Fisher information [29]:
\[
I(x_t) = \frac{\partial s_\theta(x_t, t)}{\partial x_t}, \tag{7}
\]
where I(xt ) is the Fisher information related to the variable xt , which measures the information that xt carries
at the t-th timestep.
The detailed proofs are provided in the supplementary.

[Figure 1 plots: panels (a) T = 200, (b) T = 150, (c) T = 100, (d) T = 50; x-axis: Time Steps; y-axis: Norm Value]

Figure 1: An empirical study for gradient norm based on the style guidance tasks with stable diffusion. We show the
value of the ||∇xt log p(c|x̂0|t )||2 among different timesteps to show the information gain. Concretely (a), (b), (c),
and (d) report the values under 200, 100, 50, and 30 sampling steps respectively.

Cramér-Rao bound estimation. Interestingly, we can leverage the upper bound of I(xt ), given by the Cramér-Rao bound,
to cancel the posterior part:
Theorem 1. Given the sequence {xT , xT −1 , ..., xt , ..., x1 }, where t ∈ [T, 0) and xT is the initial state of the reverse
process, I(xt ) is bounded by the Cramér-Rao bound:
\[
I(x_t) < \frac{1}{1 - \hat{\alpha}_t}. \tag{8}
\]

We replace I(xt ) by the Cramér-Rao bound directly. Eq. 6 can then be estimated as:
\[
\frac{\partial \hat{x}_{0|t}}{\partial x_t} \approx \frac{2}{\sqrt{\hat{\alpha}_t}}. \tag{9}
\]
Eq. 9 shows that the upper bound on I(xt ) cancels the posterior part, leaving a factor that depends only on the noise schedule parameters.
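For readers who want the intermediate step, the substitution that takes Eq. 6 to Eq. 9 can be written out as follows. This is only a restatement of the equations above; it treats the strict bound of Eq. 8 as if it were attained, which is the approximation the text describes.

```latex
\begin{aligned}
\frac{\partial \hat{x}_{0|t}}{\partial x_t}
  &= \frac{1}{\sqrt{\hat{\alpha}_t}} \Bigl( 1 + (1 - \hat{\alpha}_t)\, I(x_t) \Bigr)
  && \text{(Eq. 6 with Definition 1)} \\
  &\approx \frac{1}{\sqrt{\hat{\alpha}_t}} \Bigl( 1 + (1 - \hat{\alpha}_t) \cdot \frac{1}{1 - \hat{\alpha}_t} \Bigr)
  && \text{(replace } I(x_t) \text{ by the bound in Eq. 8)} \\
  &= \frac{2}{\sqrt{\hat{\alpha}_t}} .
\end{aligned}
```

The factor (1 − α̂t) cancels exactly, which is why no derivative of sθ remains in the result.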

[Figure 2: image grid with rows Condition, FICD, MPGD, and FreeDom]

Figure 2: Qualitative examples of human face image generation with a single condition. The included conditions are (a) text,
(b) face parsing maps, and (c) sketches. We compare the results with those of three baselines. MPGD fails
since these tasks violate its linear hypothesis, while FICD performs well.

Sampling process for FICD. In the end, we could derive a new sampling process for FICD. By Eq. 5 and Eq. 9, we
have:
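\[
\nabla_{x_t} \log p(c \mid \hat{x}_{0|t}) \approx \frac{2}{\sqrt{\hat{\alpha}_t}}\, \frac{\partial \varepsilon(\hat{x}_{0|t}, c)}{\partial \hat{x}_{0|t}} = \frac{2}{\sqrt{\hat{\alpha}_t}}\, \nabla_{\hat{x}_{0|t}} \log p(c \mid \hat{x}_{0|t}). \tag{10}
\]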
Then by Eq. 1 and Eq. 10, the sampling process of FICD is:
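\[
\begin{aligned}
x_{t-1} &= \hat{m}_{t-1} - \frac{2\rho_t}{\sqrt{\hat{\alpha}_t}}\, \nabla_{\hat{x}_{0|t}}\, \varepsilon(\hat{x}_{0|t}, c), \\
\hat{m}_{t-1} &= (1 + 0.5\beta_t)\, x_t + \beta_t \nabla_{x_t} \log p(x_t) + \sqrt{\beta_t}\, \epsilon,
\end{aligned} \tag{11}
\]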
where ρt is a hyperparameter, regarded as the learning rate, that controls the strength of guidance. The detailed algorithm
is shown in Algorithm 1.

4.1 An explanation via information-based perspective

We offer an additional explanation of why our method can perform better than the existing training-free
methods. Our explanation is based on the information perspective. Definition 1 illustrates that the posterior part can
be regarded as information gain. This helps us formulate an information-based view.
Concretely, the measurement part offers the direction toward the condition. The overall product
(∂x̂0|t /∂xt )(∂ε(x̂0|t , c)/∂x̂0|t ) can then be regarded as the information accumulated along the direction of the condition.
Its norm ||∇xt log p(c|x̂0|t )||2 reflects the amount of information gained at time step t. A larger norm
||∇xt log p(c|x̂0|t )||2 means more information.
First, for the overall conditional term ∇xt log p(c|x̂0|t ), recent work [9] has shown that the early and late
phases can be skipped and that the critical phase for the conditional term is the middle phase. To explore this further, we
conduct an empirical study on the style-generation task, shown in Fig. 3, by plotting ||∇xt log p(c|x̂0|t )||2 in
different phases. As shown in Fig. 3, two observations can be made: 1) There is less information in the early and
late phases since ||∇xt log p(c|x̂0|t )||2 is smaller. A small ||∇xt log p(c|x̂0|t )||2 gives the conditional
term a small weight for optimizing Eq. 2. 2) The key for conditional generation is the middle phase, since ||∇xt log p(c|x̂0|t )||2

[Figure 3 plots: panels (a) T = 200, (b) T = 100, (c) T = 50, (d) T = 30; curves: ∇xt log p(c|x̂0|t ) and FICD; x-axis: Timesteps; y-axis: Gradient Norm]

Figure 3: The comparison between FICD and ||∇xt log p(c|x̂0|t )||2 . (a) and (c) show the gradient norms
of ∇xt log p(c|x̂0|t ) and FICD under T = 200 and T = 50, respectively. (b) and (d) are the sub-views of (a) and
(c), respectively.

stays at a suitable level. Different numbers of sampling steps T show a similar trend. From this information-based perspective, in the early
and late phases, the existing methods [12] do not steer samples well toward the condition, so many time steps
are wasted.
Intuitively, more information should be accumulated in both the early and late phases, especially in the early phase,
where the distance between xt and c is very large. However, the above empirical results show that the
existing methods do not provide enough information about the condition. Furthermore, we show that the upper bound
used by the proposed method helps increase the information about the condition, especially in the early phase. Note
that after reformulating the posterior part by the upper bound, the weight becomes 2/√α̂t , and since α̂t tends to be small
in the early phase, 2/√α̂t tends to be large. In this case, more information can be gained in the early phase, which generates a significant
weight from the posterior part for the measurement part. The early phase can then contribute more to optimizing Eq. 2. In
this way, FICD can move the sample toward the condition already in the early phase, which reduces the time cost. We also report
the empirical study in Fig. 3, which shows that the upper bound increases the weight of the measurement part in
the early phase and thus supports our explanation.
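To make the size of the weight 2/√α̂t concrete, the sketch below evaluates it over a noise schedule. The schedule is an assumption: a linear β schedule of the kind used in DDPM [1], with α̂t taken as the cumulative product of (1 − βt); the endpoints and the number of steps are illustrative and are not the paper's settings.

```python
import numpy as np

T = 200
betas = np.linspace(1e-4, 0.02, T)       # assumed linear beta schedule
alpha_hats = np.cumprod(1.0 - betas)     # alpha_hat_t, strictly decreasing in t
weights = 2.0 / np.sqrt(alpha_hats)      # posterior-part weight from Eq. 9

early = weights[-1]   # t close to T: start of the reverse process
late = weights[0]     # t close to 0: end of the reverse process
```

Under this schedule the weight grows monotonically with t, so it is largest at the start of the reverse process (the early phase) and approaches 2 at the end.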

5 Experiment

5.1 Experiment Settings

To demonstrate the potential of FICD, we focus on tasks with nonlinear differentiable metrics, built
on various open-source diffusion models. The tasks in this paper mainly comprise face-related tasks, style-guided
generation with Stable Diffusion, and ControlNet-related generation with multiple guidance.
For the face-related tasks, we introduce text, segmentation maps, and sketches as the condition to guide the generation
process following the FreeDom [9]. For the style-guided generation, we introduce a style image as the condition to
guide the generation process. For the ControlNet-related generations, we introduce complex multi-guidance. The
detailed settings, including the measurement for these tasks and the hyperparameters we have used, are listed in the
supplementary.
We run all the experiments on a single RTX4090 GPU. The baselines are three SOTA
methods: FreeDom, MPGD, and LGD [30]. We use the time-travel strategy [9]. To make a fair comparison, all the
pre-trained diffusion models we use are the same as those used by FreeDom.

5.2 Qualitative Results

Face-related tasks. We first show the results of FICD on face-related tasks, with text, face parsing
maps, and sketches as conditions, as shown in Fig. 2. FICD generates high-quality images that match the
condition better than those of MPGD. Meanwhile, FICD generates images similar to those of FreeDom. This
supports our theoretical analysis, in which FICD increases the information gain to generate high-quality images.
We also report the qualitative results in the supplementary.
Style-guided generation with Stable Diffusion. To further show the potential of FICD, we turn
to style transfer based on the diffusion model. This task is more complicated than the face-related tasks since there
are two guides: 1) the text prompt and 2) the style image. We report qualitative examples in Fig. 4 and
quantitative results in Table 1. FICD achieves SOTA performance. With T = 100, FICD
achieves a Style score of 225.83 while maintaining a CLIP score of 29.59, which is a SOTA trade-off compared to the other
[Figure 4 image grid; prompt: “a cat wearing glasses”; columns: Style, FreeDom (T = 100), FreeDom (T = 50), MPGD (T = 100), MPGD (T = 50), FICD (T = 50), FICD (T = 30)]

Figure 4: Qualitative examples of style-guided generation with Stable Diffusion experiment based on FICD compared
with the three baselines.

baselines. Meanwhile, when the number of sampling steps decreases from T = 50 to T = 30, FICD (Style 281.67) still obtains
a better Style score than FreeDom (377.96) and MPGD (408.28) at T = 50. These results show that improving the
information gain can enhance efficiency, and they also suggest that the conflict may lead to insufficient accumulated information.
ControlNet-related generations with multiple guidance. To further show the effectiveness of FICD, following
FreeDom, we apply it to multiple-guidance tasks based on ControlNet, which include two tasks: 1)
face guidance and 2) style. Concretely, for the face guidance task, we use text prompts and pose maps
as the input of ControlNet to generate images with similar poses that satisfy the text prompts. We then
add face ID images as an independent condition to guide ControlNet in generating similar faces
and poses that satisfy both the text prompt and the face ID. We report qualitative examples
in Fig. 5. Since MPGD generates obvious mismatches in the qualitative examples, we
compare only with FreeDom in Table 2. FICD improves the pose distance metric from
54.21 to 40.01 compared to FreeDom, and the time cost drops from 193s to 42s. This illustrates the
efficiency of FICD and further supports the use of the Fisher information.
For the style task, we change the condition of ControlNet to a sketch and use the style image as independent
guidance to show the feasibility of FICD. We report qualitative examples in Fig. 6. FreeDom
works well compared to MPGD. Thus, we make a further comparison between FreeDom and FICD
in Table 3. FICD improves the CLIP metric from 69.71 to 70.27 and the Style metric
from 282.25 to 244.50, while reducing the time cost from 299s to 32s. These
results show the efficiency of FICD. They also suggest that the conflict can indeed cause insufficient accumulated
information, and that improving the information gain can enhance the generation quality.

Table 1: Quantitative results of the style-guided generation experiment with Stable Diffusion. We compare FICD with
three baselines, FreeDom, MPGD, and LGD; FICD improves the Style score [9], the CLIP score, and the time cost
overall, since our method can generate high-quality images with few sampling steps. NA means that we could not
measure a precise time cost: LGD-MC requires more than 24GB of GPU VRAM, so we had to use shared memory.
Method               Style↓    CLIP↑    Time(s)
LGD-MC (T = 100)     247.39    29.96    NA
FreeDom (T = 100)    226.26    28.07    63
MPGD (T = 100)       351.16    25.08    27
FICD (T = 100)       225.83    29.59    27
MPGD (T = 50)        408.28    23.23    17
FreeDom (T = 50)     377.96    30.01    33
FICD (T = 50)        245.51    30.32    17
FICD (T = 30)        281.67    29.07    11
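The speed-ups implied by the Time(s) column of Table 1 can be read off with a short calculation. The sketch below only divides the reported times at equal T; the dictionary layout and the helper name `speedup` are illustrative.

```python
# Wall-clock times in seconds, copied from Table 1 (LGD-MC has no reported time).
times = {
    ("FreeDom", 100): 63, ("MPGD", 100): 27, ("FICD", 100): 27,
    ("FreeDom", 50): 33, ("MPGD", 50): 17, ("FICD", 50): 17,
}

def speedup(baseline, steps):
    """Ratio of a baseline's time to FICD's time at the same number of steps."""
    return times[(baseline, steps)] / times[("FICD", steps)]

speedup_100 = speedup("FreeDom", 100)   # 63 / 27
speedup_50 = speedup("FreeDom", 50)     # 33 / 17
```

Against FreeDom this gives roughly 2.3x at T = 100 and 1.9x at T = 50, while the time matches MPGD at both settings.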

Table 2: Quantitative results of face-related tasks based on ControlNet experiments, where Pose Distance is the norm
between pose maps and generated images.
Method     Pose Distance↓    Time(s)
FreeDom    54.21             193
FICD       40.01             42

Ablation study for ρt . We study the effect of ρt from small to large (starting from ρt = 1) in the
supplementary. We find that FICD is scalable, and the user can set different values according to the requirements.

6 Conclusion

We proposed Fisher information-based conditional diffusion (FICD) for training-free conditional generation.
FICD first finds that the conditional term under training-free conditional generation can be split into the
measurement and posterior parts. Then, the posterior part can be modeled using the Fisher information. In this novel view, the
function of the conditional term could be explained as accumulating enough information following the measurement
part. The posterior part plays a role in controlling how much information can be accumulated. In this case, the novel
insight is to use the upper bound of the Fisher information to approximate the posterior part. This leads to an increase
in the overall information gain in the generation process. An interesting finding emerges here: increasing the
information gain enables FICD to achieve conditional generation with fewer steps, thus reducing the time cost.
This also improves the generation quality since more information can be accumulated during the overall generation
process. To support our theory, we offer an information theory-based explanation of why FICD works well.
Finally, the experimental results show that FICD
successfully increases the generation quality while reducing the time cost.
Limitation. There are some limitations. We have shown that dropping the gradient causes some
information to be lost. No theory identifies whether such information always has a negative influence, so this
remains a potential risk. For example, based on further exploration, we find that dropping the gradient
aggravates conflicts among multiple conditions. This is obvious in the ControlNet-related tasks. This is
Table 3: Quantitative results of style image guidance with ControlNet experiments, where CLIP is the cosine similarity
between the CLIP embeddings of the generated images and the sketch, and Style is the distance norm between style images and generated
images.
Method     CLIP↑    Style↓    Time(s)
FreeDom    69.71    282.25    299
FICD       70.27    244.50    32

[Figure 5 image grid; condition prompt: “Young Man Realistic Photo”; columns: FreeDom, FICD, FreeDom, FICD]

Figure 5: Qualitative examples of face-related tasks based on ControlNet experiments. We compared FICD with
FreeDom and MPGD.

another reason why our upper bound works better than only increasing the weight of the condition (detailed in the
supplementary). In future work, we will further study how to improve FICD.

References

[1] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. CoRR, abs/2006.11239,
2020.
[2] Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-
based generative modeling through stochastic differential equations. In International Conference on Learning
Representations, 2021.
[3] Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models, 2022.
[4] Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. Adding conditional control to text-to-image diffusion models,
2023.
[5] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image
synthesis with latent diffusion models, 2021.

[Figure 6 image grid; condition prompt: “Young Man Realistic Photo”; columns: FreeDom, MPGD, FICD, FreeDom, MPGD, FICD]

Figure 6: Qualitative examples of the style task based on ControlNet experiments. We compared FICD with the
FreeDom. The text prompt is “bike” for a bike sketch.

[6] Alex Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever,
and Mark Chen. GLIDE: Towards photorealistic image generation and editing with text-guided diffusion models,
2022.
[7] Bram Wallace, Akash Gokul, Stefano Ermon, and Nikhil Naik. End-to-end diffusion latent optimization improves
classifier guidance, 2023.
[8] Narek Tumanyan, Michal Geyer, Shai Bagon, and Tali Dekel. Plug-and-play diffusion features for text-driven
image-to-image translation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recog-
nition (CVPR), pages 1921–1930, June 2023.
[9] Jiwen Yu, Yinhuai Wang, Chen Zhao, Bernard Ghanem, and Jian Zhang. FreeDoM: Training-free energy-
guided conditional diffusion model. In Proceedings of the IEEE/CVF International Conference on Computer Vision
(ICCV), 2023.
[10] Chenlin Meng, Yutong He, Yang Song, Jiaming Song, Jiajun Wu, Jun-Yan Zhu, and Stefano Ermon. SDEdit:
Guided image synthesis and editing with stochastic differential equations, 2022.
[11] Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Yael Pritch, Michael Rubinstein, and Kfir Aberman. DreamBooth:
Fine tuning text-to-image diffusion models for subject-driven generation. In IEEE/CVF Conference on Computer
Vision and Pattern Recognition, CVPR 2023, Vancouver, BC, Canada, June 17-24, 2023, pages 22500–22510,
2023.
[12] Hyungjin Chung, Jeongsol Kim, Michael Thompson Mccann, Marc Louis Klasky, and Jong Chul Ye. Diffusion
posterior sampling for general noisy inverse problems. In The Eleventh International Conference on Learning
Representations, 2023.
[13] Prafulla Dhariwal and Alex Nichol. Diffusion models beat GANs on image synthesis. CoRR, abs/2105.05233,
2021.
[14] Morteza Mardani, Jiaming Song, Jan Kautz, and Arash Vahdat. A variational perspective on solving inverse
problems with diffusion models, 2023.
[15] Yutong He, Naoki Murata, Chieh-Hsin Lai, Yuhta Takida, Toshimitsu Uesaka, Dongjun Kim, Wei-Hsiang Liao,
Yuki Mitsufuji, J. Zico Kolter, Ruslan Salakhutdinov, and Stefano Ermon. Manifold preserving guided diffusion,
2023.

[16] Gabriel Cardoso, Yazid Janati El Idrissi, Sylvain Le Corff, and Eric Moulines. Monte Carlo guided diffusion for Bayesian linear inverse problems, 2023.
[17] Xianghao Kong, Rob Brekelmans, and Greg Ver Steeg. Information-theoretic diffusion. In International Conference on Learning Representations, 2023.
[18] Ming Li, Taojiannan Yang, Huafeng Kuang, Jie Wu, Zhaoning Wang, Xuefeng Xiao, and Chen Chen. ControlNet++: Improving conditional controls with efficient consistency feedback, 2024.
[19] Jaeseok Jeong, Mingi Kwon, and Youngjung Uh. Training-free content injection using h-space in diffusion models, 2024.
[20] Chen Henry Wu and Fernando De la Torre. A latent space of stochastic diffusion models for zero-shot image editing and guidance. In ICCV, 2023.
[21] Jiaming Song, Arash Vahdat, Morteza Mardani, and Jan Kautz. Pseudoinverse-guided diffusion models for inverse problems. In The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023, 2023.
[22] Youcan Xu, Zhen Wang, Jun Xiao, Wei Liu, and Long Chen. FreeTuner: Any subject in any style with training-free diffusion, 2024.
[23] Zalan Fabian, Berk Tinaz, and Mahdi Soltanolkotabi. Adapt and diffuse: Sample-adaptive reconstruction via latent diffusion models, 2023.
[24] Yong-Hyun Park, Mingi Kwon, Jaewoong Choi, Junghyo Jo, and Youngjung Uh. Understanding the latent space of diffusion models through the lens of Riemannian geometry. In Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023, 2023.
[25] Lingxiao Yang, Shutong Ding, Yifan Cai, Jingyi Yu, Jingya Wang, and Ye Shi. Guidance with spherical Gaussian constraint for conditional diffusion, 2024.
[26] Yingheng Wang, Yair Schiff, Aaron Gokaslan, Weishen Pan, Fei Wang, Christopher De Sa, and Volodymyr Kuleshov. InfoDiffusion: Representation learning using information maximizing diffusion models. In Proceedings of the 40th International Conference on Machine Learning, pages 36336–36354, 2023.
[27] Xianghao Kong, Ollie Liu, Han Li, Dani Yogatama, and Greg Ver Steeg. Interpretable diffusion via information decomposition, 2023.
[28] Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. International Journal of Computer Vision, 128(2):336–359, October 2019.
[29] Andrew R. Barron. Entropy and the central limit theorem. The Annals of Probability, 14(1):336–342, 1986.
[30] Jiaming Song, Qinsheng Zhang, Hongxu Yin, Morteza Mardani, Ming-Yu Liu, Jan Kautz, Yongxin Chen, and Arash Vahdat. Loss-guided diffusion models for plug-and-play controllable generation. In International Conference on Machine Learning, ICML 2023, 23-29 July 2023, Honolulu, Hawaii, USA, pages 32483–32498, 2023.
