Deep Learning-based Optical Image Super-Resolution via Generative Diffusion Models

Conditional Latent Diffusion for Layer-wise In-situ LPBF Monitoring

Additive Manufacturing
Diffusion Model
Super-Resolution
Author

Donghyun Ko

Published

February 11, 2026

Paper Overview

Title: Deep Learning-based Optical Image Super-Resolution via Generative Diffusion Models for Layer-wise in-situ LPBF Monitoring
Authors: Francis Ogoke et al. (Carnegie Mellon University & Sandia National Laboratories)
Preprint: arXiv:2409.13171

Laser Powder Bed Fusion (LPBF) presents a fundamental trade-off in in-situ monitoring:
high-resolution (HR) optical imaging enables accurate defect detection and surface-roughness estimation, yet it is computationally expensive and difficult to scale for real-time deployment. Conversely, low-cost webcam imaging is scalable but lacks the spatial fidelity required to resolve fine powder-bed texture and geometric irregularities. This work proposes a conditional latent diffusion framework that probabilistically reconstructs high-resolution optical images from low-resolution webcam images, effectively learning the conditional distribution \(p(x_{HR} \mid x_{LR})\). Rather than producing a single deterministic upscaled image, the model captures the nonlinear, multimodal, high-frequency structure of powder-bed texture, preserving both realism and uncertainty.


Core Contributions and Methodology

1. Conditional Denoising Diffusion Probabilistic Model (DDPM)

The framework implements a conditional DDPM, where:

  • HR images are progressively corrupted via a forward diffusion process.
  • A U-Net denoising network learns to predict injected noise.
  • LR images condition the reverse denoising process.
  • The model learns the full conditional distribution \(p(x_{HR} | x_{LR})\).

This addresses the inherently ill-posed nature of super-resolution, where deterministic models often produce over-smoothed reconstructions.
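The forward/reverse structure above can be sketched numerically. Below is a minimal NumPy illustration of the closed-form forward process \(x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon\); the noise schedule and patch shape are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def forward_diffusion(x0, t, betas, rng):
    """Corrupt a clean sample x0 to timestep t in closed form:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    alpha_bar = np.cumprod(1.0 - betas)
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)   # linear schedule (illustrative)
x_hr = rng.standard_normal((64, 64))    # a normalized HR patch
x_t, eps = forward_diffusion(x_hr, t=500, betas=betas, rng=rng)
# Training: the U-Net eps_theta(x_t, t, x_LR) is fit to predict eps,
# with the LR image injected as conditioning; sampling reverses the chain.
```

The key point for conditioning is that the forward process touches only the HR sample; the LR image enters solely as an extra input to the denoiser.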


2. Latent Diffusion for Real-Time Feasibility

Pixel-space diffusion is computationally intensive.
To reduce inference cost, the authors adopt a latent diffusion model (LDM):

  • Two autoencoders encode HR and LR images into compact latent spaces.
  • Diffusion is performed in a reduced 16×16 latent space.
  • Only the HR latent is diffused; LR latent acts as condition.
  • The decoded output reconstructs the HR image.

Result:
Inference time is reduced from 0.13 s/sample (pixel diffusion)
to 0.01 s/sample (latent diffusion) — approximately 13× faster, enabling practical in-situ deployment.
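A shape-level sketch of this latent setup, using block-averaging as a toy stand-in for the learned autoencoders (the paper trains real encoder/decoder networks; the 4× factor below is chosen only so a 64×64 patch maps to the 16×16 latent mentioned above):

```python
import numpy as np

def encode(img, factor=4):
    """Toy stand-in for a learned encoder: downsample by block-averaging."""
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

rng = np.random.default_rng(1)
x_hr = rng.standard_normal((64, 64))  # HR optical patch
x_lr = rng.standard_normal((64, 64))  # webcam patch (upsampled to match)

z_hr = encode(x_hr)  # 16x16 latent -> this is what gets diffused
z_lr = encode(x_lr)  # 16x16 latent -> conditioning signal only
# Reverse process: denoiser(z_t, t, z_lr) -> z_0_hat; decoder(z_0_hat) -> HR image
```

Diffusing over 256 latent dimensions instead of 4,096 pixels is where the reported ~13× inference speedup comes from.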


Training Strategy

To handle large build-plate images:

  • Patch size: 64×64
  • 115 patches per layer
  • 80:20 train-test split
  • Autoencoder training: 100 epochs
  • Diffusion training: 300 epochs

A second experiment uses a part-based split, ensuring no part-level data leakage between training and testing.
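The patching and split can be sketched as follows; the plate size and stride here are illustrative only (the paper reports 115 patches per layer, so its tiling necessarily differs from this non-overlapping toy version):

```python
import numpy as np

def extract_patches(layer, size=64, stride=64):
    """Tile a build-plate layer image into size x size patches."""
    h, w = layer.shape
    return [layer[i:i + size, j:j + size]
            for i in range(0, h - size + 1, stride)
            for j in range(0, w - size + 1, stride)]

layer = np.zeros((320, 320))          # stand-in for one build-plate layer
patches = extract_patches(layer)      # 5 x 5 = 25 patches here
n_train = int(0.8 * len(patches))     # 80:20 split
train, test = patches[:n_train], patches[n_train:]
```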


Evaluation Metrics

Image Reconstruction Metrics

  • MAE (Mean Absolute Error)
  • PSNR (Peak Signal-to-Noise Ratio)
  • SSIM (Structural Similarity Index)

Texture-Level Metric

  • Normalized Covariance Distance (nCVD)
    Based on phase-harmonic wavelet covariance operators
    → Captures preservation of high-frequency powder-bed texture.

Latent diffusion significantly reduces MAE and covariance distance while increasing PSNR and SSIM.
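MAE and PSNR are straightforward to compute from image pairs; a minimal NumPy version is below (SSIM needs a windowed implementation such as `skimage.metrics.structural_similarity` and is omitted here, and nCVD relies on the paper's wavelet-covariance machinery):

```python
import numpy as np

def mae(a, b):
    """Mean absolute error between two images."""
    return np.mean(np.abs(a - b))

def psnr(a, b, data_range=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, data_range]."""
    mse = np.mean((a - b) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)

ref = np.zeros((8, 8))
deg = np.full((8, 8), 0.1)  # uniform 0.1 error -> MAE 0.1, PSNR 20 dB
```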


3D Morphology Reconstruction

To evaluate geometric fidelity beyond 2D metrics:

  1. Each layer is segmented using the Segment Anything Model (SAM).
  2. Masks are stacked to reconstruct 3D morphology.
  3. Three geometry metrics are computed:
     • IoU (Intersection-over-Union)
     • Hausdorff distance
     • Voxel mismatch

Latent diffusion improves IoU and reduces geometric error compared to low-resolution and bicubic interpolation baselines.
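Once the per-layer masks are stacked, two of the three metrics reduce to simple voxel operations; a sketch with toy binary masks standing in for SAM segmentations:

```python
import numpy as np

def voxel_iou(a, b):
    """Intersection-over-union of two binary voxel volumes."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union

def voxel_mismatch(a, b):
    """Count of voxels present in exactly one of the two volumes."""
    return int(np.logical_xor(a, b).sum())

# Stack per-layer masks (here: 3 layers of 4x4 toy masks) into volumes
layers_pred = [np.ones((4, 4), bool) for _ in range(3)]
layers_true = [np.ones((4, 4), bool) for _ in range(3)]
layers_pred[0][0, 0] = False          # one voxel of reconstruction error
vol_pred = np.stack(layers_pred)
vol_true = np.stack(layers_true)
```

Hausdorff distance additionally needs surface point extraction (e.g. `scipy.spatial.distance` over boundary voxels) and is left out of this sketch.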


Surface Roughness Estimation

Surface roughness metrics are computed from reconstructed contours:

  • Ra (arithmetic mean roughness)
  • Rq (root-mean-square roughness)
  • Rz (peak-to-valley height)

Example (Dataset A):

  • HR: Ra = 30.7 µm
  • Latent Diffusion: Ra = 32.8 µm
  • Bicubic Upsampling: Ra = 57.8 µm

Latent diffusion closely approximates true surface roughness.
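These profile metrics follow standard definitions; a sketch over a toy height profile (in µm), using the mean line as the reference. Rz here is total peak-to-valley, matching the list above, though some standards instead average several peak/valley pairs:

```python
import numpy as np

def roughness(profile_um):
    """Ra, Rq, Rz from a 1D height profile after removing the mean line."""
    z = profile_um - profile_um.mean()
    ra = np.mean(np.abs(z))        # arithmetic mean roughness
    rq = np.sqrt(np.mean(z ** 2))  # root-mean-square roughness
    rz = z.max() - z.min()         # peak-to-valley height
    return ra, rq, rz

profile = np.array([0.0, 10.0, -10.0, 20.0, -20.0])  # toy contour heights
ra, rq, rz = roughness(profile)
```

In the paper these profiles come from the reconstructed part contours, so roughness error directly reflects how faithfully the super-resolved edges track the HR ground truth.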


Zero-Shot Generalization

The study evaluates transferability across part geometries:

  • Synthetic low-resolution images created via Gaussian degradation.
  • Entire part builds held out during training.
  • Model tested on unseen geometries.

Result:

  • Stable PSNR/SSIM performance
  • Covariance distance improves as training diversity increases
  • Robust performance under increasing blur kernel size

This demonstrates strong inter-layer and inter-part generalization.
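The synthetic degradation step can be sketched as a Gaussian blur followed by subsampling; the kernel size, sigma, and stride below are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def gaussian_kernel(size, sigma):
    """Normalized 2D Gaussian kernel."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def degrade(img, size=5, sigma=1.5, stride=4):
    """Synthetic LR image: Gaussian blur, then subsampling."""
    pad = size // 2
    padded = np.pad(img, pad, mode="reflect")
    k = gaussian_kernel(size, sigma)
    h, w = img.shape
    blurred = np.empty_like(img)
    for i in range(h):
        for j in range(w):
            blurred[i, j] = np.sum(padded[i:i + size, j:j + size] * k)
    return blurred[::stride, ::stride]

lr = degrade(np.ones((64, 64)))  # 64x64 HR patch -> 16x16 synthetic LR image
```

Sweeping `size`/`sigma` is how the robustness-to-blur-kernel result above would be probed.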


Key Findings

  • Latent diffusion reconstructs fine powder-bed texture.
  • Significant reduction in MAE and covariance distance.
  • Improved PSNR and SSIM.
  • ~13× faster inference than pixel diffusion.
  • Improved 3D geometric accuracy (IoU ↑, Hausdorff distance ↓, voxel mismatch ↓).
  • Accurate surface roughness recovery.
  • Strong zero-shot generalization.

Reviewer’s Takeaway

This work demonstrates how conditional latent diffusion models enable scalable, high-fidelity, real-time optical monitoring in LPBF. By integrating probabilistic generative modeling, latent compression, wavelet-based texture evaluation, and 3D geometric reconstruction, the framework advances beyond conventional super-resolution toward physics-aware additive manufacturing monitoring. The combination of distribution modeling and geometric validation makes this study a strong reference for researchers working on:

  • In-situ LPBF monitoring
  • Generative models in manufacturing
  • Diffusion-based super-resolution
  • Closed-loop defect detection systems

Slide deck: Download PDF