A chair can still look like a chair even when its surface is reduced to a sparse cloud of points. Humans are remarkably good at recognizing objects from this kind of minimal 3D information. A new study by SFI Program Postdoctoral Fellow Shuhao Fu and co-authors asks whether deep learning models represent 3D shapes in ways similar to human vision, or arrive at object recognition by different means.
The study compares human observers with two leading models for point-cloud recognition: the convolution-based “Dynamic Graph Convolutional Neural Network,” or DGCNN, and the transformer-based “Point Transformer.” Across three experiments, the researchers made recognition harder by reducing point density, distorting local geometric structure, and scrambling object parts. Human performance remained strong when point clouds became sparse or when local geometry was altered, but dropped sharply when part configuration was disrupted. That pattern suggests that human 3D vision depends strongly on global shape and the spatial arrangement of parts.
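To make the three manipulations concrete, here is a minimal Python sketch of what each one could look like on a toy point cloud. It is an illustration only, not the authors' stimulus-generation procedure: the function names, the two-part "seat" and "back" object, the noise level, and the shift range are all hypothetical choices made for this example.

```python
import random

def subsample(points, keep, rng):
    """Reduce point density: keep a random subset of `keep` points."""
    return rng.sample(points, keep)

def jitter(points, sigma, rng):
    """Distort local geometry: add small Gaussian noise to every coordinate."""
    return [tuple(c + rng.gauss(0.0, sigma) for c in p) for p in points]

def scramble_parts(points, labels, max_shift, rng):
    """Disrupt part configuration: shift each labeled part by its own random offset.

    Points within a part keep their relative positions; only the arrangement
    of parts relative to one another changes.
    """
    offsets = {
        part: tuple(rng.uniform(-max_shift, max_shift) for _ in range(3))
        for part in sorted(set(labels))
    }
    return [
        tuple(c + o for c, o in zip(p, offsets[part]))
        for p, part in zip(points, labels)
    ]

# Hypothetical toy object: a flat "seat" and an upright "back", 200 points each.
rng = random.Random(0)
seat = [(rng.random(), rng.random(), 0.0) for _ in range(200)]
back = [(rng.random(), 0.0, rng.random()) for _ in range(200)]
cloud = seat + back
labels = ["seat"] * 200 + ["back"] * 200

sparse = subsample(cloud, 50, rng)                    # fewer points, same shape
noisy = jitter(cloud, 0.02, rng)                      # same layout, rougher surface
scrambled = scramble_parts(cloud, labels, 1.0, rng)   # intact parts, wrong arrangement
```

In this sketch, the first two manipulations leave the overall arrangement of seat and back intact, while the third preserves each part but breaks the arrangement, which is the kind of disruption that hurt human recognition most in the study.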
Among the models, the Point Transformer more closely matched human performance. To understand why, the authors conducted ablation studies, systematically removing components of the model to see which ones mattered most for this human-like behavior. They found that the key was hierarchical downsampling, which lets the model build increasingly abstract shape representations across layers. Removing that module made the Point Transformer's behavior less human-like, while adding it to DGCNN made that model's behavior more human-like.
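The idea of hierarchical downsampling can be sketched in a few lines of Python. The example below uses farthest point sampling, a common way to thin a point cloud while keeping its overall extent, to build a pyramid of progressively coarser point sets. It is a simplified illustration under stated assumptions, not the module evaluated in the paper: the helper names, the downsampling ratio of 4, and the random toy cloud are hypothetical, and a real network would also aggregate learned features at each level rather than only selecting points.

```python
import math
import random

def farthest_point_sampling(points, k, start=0):
    """Pick k points greedily, each as far as possible from those already chosen."""
    chosen = [start]
    # Distance from every point to its nearest chosen point so far.
    nearest = [math.dist(p, points[start]) for p in points]
    while len(chosen) < k:
        idx = max(range(len(points)), key=nearest.__getitem__)
        chosen.append(idx)
        nearest = [min(d, math.dist(p, points[idx])) for d, p in zip(nearest, points)]
    return [points[i] for i in chosen]

def build_hierarchy(points, levels, ratio=4):
    """Return successively coarser point sets, each `ratio` times smaller than the last."""
    pyramid = [points]
    for _ in range(levels):
        current = pyramid[-1]
        pyramid.append(farthest_point_sampling(current, max(1, len(current) // ratio)))
    return pyramid

# Hypothetical toy input: 256 random points in the unit cube.
rng = random.Random(0)
cloud = [(rng.random(), rng.random(), rng.random()) for _ in range(256)]
pyramid = build_hierarchy(cloud, levels=3)
sizes = [len(level) for level in pyramid]  # 256, 64, 16, 4
```

Each coarser level covers the same object with fewer, more widely spaced points, so anything computed at that level has to summarize larger regions of the shape. That is the intuition behind the global integration described above.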
“Hierarchical abstraction was the critical factor identified in the ablation tests,” Fu says. “It encourages the model to integrate information across the entire shape, leading to more robust and human-like behavior.” The finding suggests a promising direction for future AI models that need to recognize 3D objects more robustly.
Read the full paper, “Hierarchical abstraction drives human-like 3-D shape processing in deep learning models,” in PLOS Computational Biology (March 13, 2026). DOI: 10.1371/journal.pcbi.1014047