Image Reconstruction from Human Brain Activity

Image Reconstruction from Human Brain Activity explores how generative AI can be used to reconstruct visual experiences directly from neural signals. Positioned at the intersection of computational neuroscience and computer vision, this project leverages latent diffusion models to translate functional MRI (fMRI) activity into high-resolution visual reconstructions.

Our framework integrates Versatile Diffusion with latent representations derived from brain activity using VDVAE, enabling a principled bridge between neural representations and visual perception. Additionally, CLIP-based multimodal alignment enhances semantic consistency between reconstructed images and their intended visual content.

Technologies: Versatile Diffusion, VDVAE, CLIP, Python, PyTorch, Neuroimaging Pipelines, fMRI

Image Creation Pipeline. The pipeline takes fMRI input, generates a base image with an autoencoder, extracts visual and textual CLIP embeddings, and then refines the base image with the diffusion model conditioned on those embeddings.
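To make the data flow in the figure concrete, here is a minimal sketch of how the four stages could be wired together. It is not the project's actual code: the class name, the field names, and the idea that both CLIP embeddings are predicted from the fMRI signal are assumptions made for illustration, and the toy stand-in functions only mimic shapes, not real models.

```python
from dataclasses import dataclass
from typing import Callable

import numpy as np

Array = np.ndarray


@dataclass
class ReconstructionPipeline:
    """Hypothetical wiring of the stages in the figure; every callable is a placeholder."""

    predict_vae_latent: Callable[[Array], Array]    # fMRI -> autoencoder latent
    decode_base_image: Callable[[Array], Array]     # latent -> coarse base image
    predict_clip_vision: Callable[[Array], Array]   # fMRI -> CLIP visual embedding (assumed)
    predict_clip_text: Callable[[Array], Array]     # fMRI -> CLIP textual embedding (assumed)
    refine: Callable[[Array, Array, Array], Array]  # (base, vision, text) -> final image

    def reconstruct(self, fmri: Array) -> Array:
        # Stage 1: base image from the autoencoder latent.
        base = self.decode_base_image(self.predict_vae_latent(fmri))
        # Stage 2: the two conditioning embeddings.
        vision = self.predict_clip_vision(fmri)
        text = self.predict_clip_text(fmri)
        # Stage 3: conditional refinement (the diffusion model in the real system).
        return self.refine(base, vision, text)


# Toy stand-ins so the sketch runs end to end without any model weights.
rng = np.random.default_rng(0)
toy = ReconstructionPipeline(
    predict_vae_latent=lambda x: x[:8],
    decode_base_image=lambda z: np.tile(z, (8, 1)),
    predict_clip_vision=lambda x: x[:4],
    predict_clip_text=lambda x: x[4:8],
    refine=lambda img, v, t: img + v.mean() + t.mean(),
)
image = toy.reconstruct(rng.standard_normal(100))
print(image.shape)  # (8, 8)
```

Keeping each stage behind a plain callable means a regression model, a VDVAE decoder, or a diffusion sampler can be swapped in independently.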

Methodology

Our approach builds on recent advances in diffusion-based generative modeling, which have shown strong results in synthesizing high-resolution, multimodal content.

Key components include:

Latent Representation Extraction (VDVAE): fMRI signals are mapped into latent representations aligned with the generative model’s latent space.
Latent Diffusion with Versatile Diffusion: These latent codes are injected into the diffusion process, guiding image generation from noise to structured visual content.
Multimodal Alignment with CLIP: Vision–language embeddings are used to reinforce semantic coherence between reconstructed images and their underlying perceptual meaning.
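The text does not say how fMRI signals are mapped into the latent space. One common choice for this kind of mapping is regularized linear regression, so the sketch below assumes ridge regression purely as an illustration. The function names, the synthetic data, and the `alpha` value are all hypothetical.

```python
import numpy as np


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression on centered data: W = (X^T X + alpha I)^-1 X^T Y."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean
    gram = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    W = np.linalg.solve(gram, Xc.T @ Yc)
    b = y_mean - x_mean @ W  # intercept recovered from the column means
    return W, b


def predict(X, W, b):
    """Map fMRI patterns (trials x voxels) to predicted latents (trials x latent_dim)."""
    return X @ W + b


# Synthetic stand-in data: 200 trials, 50 "voxels", 16-dimensional latents.
rng = np.random.default_rng(0)
n_trials, n_voxels, latent_dim = 200, 50, 16
true_W = rng.standard_normal((n_voxels, latent_dim))
fmri = rng.standard_normal((n_trials, n_voxels))
latents = fmri @ true_W + 0.1 * rng.standard_normal((n_trials, latent_dim))

# Fit on the first 150 trials, evaluate on the held-out 50.
W, b = fit_ridge(fmri[:150], latents[:150], alpha=1.0)
pred = predict(fmri[150:], W, b)
corr = np.corrcoef(pred.ravel(), latents[150:].ravel())[0, 1]
print(pred.shape, round(float(corr), 3))
```

Real fMRI data usually has far more voxels than trials, in which case a dual-form or library solver would typically be preferable to inverting a voxels-by-voxels matrix.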

Together, these components enable a robust pipeline for translating neural activity into visually meaningful reconstructions.
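One way to put a number on the semantic coherence mentioned above is to compare embeddings of reconstructions with embeddings of the original stimuli. The sketch below is an assumption about how such a check could look, not a description of the project's evaluation. Random vectors stand in for CLIP embeddings, the 512-dimensional size is illustrative, and the function names are hypothetical.

```python
import numpy as np


def cosine_similarity_matrix(A, B):
    """Pairwise cosine similarity between the rows of A and the rows of B."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return A @ B.T


def identification_accuracy(recon_emb, target_emb):
    """Fraction of reconstructions whose most similar target is their own stimulus."""
    sims = cosine_similarity_matrix(recon_emb, target_emb)
    return float(np.mean(sims.argmax(axis=1) == np.arange(len(sims))))


# Stand-in embeddings: each reconstruction is a noisy copy of its target.
rng = np.random.default_rng(0)
target = rng.standard_normal((10, 512))
recon = target + 0.5 * rng.standard_normal((10, 512))

acc = identification_accuracy(recon, target)
print(acc)
```

In a real evaluation the two arrays would come from passing the reconstructed and ground-truth images through the same CLIP image encoder.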

Main Contributions

A diffusion-based framework for reconstructing visual stimuli directly from fMRI data
Integration of latent neural representations into the diffusion generation process
Demonstration of the strengths and limitations of diffusion models for neural decoding
Empirical insights into how scene complexity affects reconstruction fidelity