Image Reconstruction from Human Brain Activity
High-fidelity image reconstruction from human brain activity using latent diffusion models
This project explores how generative models can reconstruct visual experiences directly from neural signals. Positioned at the intersection of computational neuroscience and computer vision, it uses latent diffusion models to translate functional MRI (fMRI) activity into high-resolution image reconstructions.
Our framework integrates Versatile Diffusion with VDVAE latent representations derived from brain activity, linking neural representations to visual perception. In addition, CLIP-based multimodal alignment improves semantic consistency between the reconstructed images and the visual content they are meant to depict.
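To make the idea of semantic consistency concrete, the sketch below scores a reconstruction against candidate stimuli with cosine similarity, the standard way to compare CLIP embeddings. The project does not state how it measures alignment, so this is an assumption; the function name `cosine_similarity` and the three-dimensional toy vectors are hypothetical stand-ins for real CLIP embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings; real CLIP embeddings have hundreds of dimensions.
reconstruction_emb = [0.2, 0.9, 0.1]
matching_stimulus_emb = [0.25, 0.85, 0.05]
unrelated_stimulus_emb = [0.9, -0.1, 0.4]

match_score = cosine_similarity(reconstruction_emb, matching_stimulus_emb)
mismatch_score = cosine_similarity(reconstruction_emb, unrelated_stimulus_emb)

# A semantically consistent reconstruction should sit closer to its own stimulus.
print(match_score > mismatch_score)  # True
```

A higher score for the matching stimulus than for unrelated ones is one simple indication that a reconstruction preserves the intended content.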
Technologies: Stable Diffusion, VDVAE, CLIP, Python, PyTorch, Neuroimaging Pipelines, fMRI
Methodology
Our approach builds on recent advances in diffusion-based generative modeling, which have proven effective at synthesizing high-resolution, multimodal content.
Key components include:
- Latent Representation Extraction (VDVAE): fMRI signals are mapped into latent representations aligned with the generative model’s latent space.
- Latent Diffusion with Versatile Diffusion: These latent codes are injected into the diffusion process, guiding image generation from noise to structured visual content.
- Multimodal Alignment with CLIP: Vision–language embeddings are used to reinforce semantic coherence between reconstructed images and their underlying perceptual meaning.
Together, these components enable a robust pipeline for translating neural activity into visually meaningful reconstructions.
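The first step, mapping fMRI signals into a latent space, can be sketched as a regression problem. The text does not specify the mapping used here; regularized linear (ridge) regression is a common choice for voxel-to-latent decoding and is assumed purely for illustration. The helper names `fit_ridge` and `predict_latents`, the array shapes, and the synthetic data are all hypothetical.

```python
import numpy as np


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ Y)


def predict_latents(X, W):
    """Map voxel patterns (n_samples, n_voxels) to latent vectors."""
    return X @ W


# Synthetic stand-in data (assumed shapes, not real fMRI or VDVAE latents).
rng = np.random.default_rng(0)
n_train, n_test, n_voxels, latent_dim = 200, 20, 50, 8
true_W = rng.normal(size=(n_voxels, latent_dim))
X_train = rng.normal(size=(n_train, n_voxels))
X_test = rng.normal(size=(n_test, n_voxels))
Y_train = X_train @ true_W + 0.01 * rng.normal(size=(n_train, latent_dim))

# Fit on training trials, then predict latents for held-out brain activity.
W = fit_ridge(X_train, Y_train, alpha=1.0)
Z_pred = predict_latents(X_test, W)
print(Z_pred.shape)  # (20, 8)
```

In a full pipeline the predicted latents would then be passed to the generative model's decoder and diffusion stages; those steps depend on the specific model checkpoints and are not shown.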
Main Contributions
- A diffusion-based framework for reconstructing visual stimuli directly from fMRI data
- Integration of latent neural representations into the diffusion generation process
- Demonstration of the strengths and limitations of diffusion models for neural decoding
- Empirical insights into how scene complexity affects reconstruction fidelity