EquiDexFlow

Contact-Grounded SE(3)-Equivariant
Dexterous Grasp Generative Flows

Clinton Enwerem, John S. Baras, & Calin Belta
Figure panels: input object $\mathcal{P}$ rotated by $R_z(120^\circ)$, non-equivariant baseline, EquiDexFlow.

Under a 120° object rotation, a non-equivariant baseline keeps the original grasp, while EquiDexFlow co-rotates the grasp and preserves contacts.

0% Friction Violations (architectural guarantee)
<0.04° Wrist Rotation Error (SE(3) equivariance)
0.46 Nm Wrench Residual (lowest among all variants)
8,100 Training Grasps (81 objects, Allegro Hand)

Abstract

Most learned dexterous grasp generators relegate contact forces to a downstream verification step, so a kinematically plausible pose can still violate the conditions for a stable physical grasp. We address this with EquiDexFlow, an SE(3)-equivariant flow-matching model that jointly predicts wrist pose, joint angles, fingertip contacts, surface normals, and contact forces from an object point cloud. Our architecture projects contacts onto the object surface and forces into the Coulomb friction cone by construction, so placement and friction compliance hold without loss penalties.

We prove end-to-end SE(3) equivariance and verify it empirically over 200 rotations, with wrist residuals below 0.04° and exactly zero joint deviation. Trained on 8,100 force-closure grasps across 81 objects for the 16-DoF Allegro Hand, EquiDexFlow achieves zero friction violations, the best composite score, and the lowest wrench residual among all ablation variants. We retarget decoded fingertip contacts to a 16-DoF LEAP Hand via per-finger inverse kinematics, and our hardware-feasible refinement places every joint at least 5% inside its actuator envelope while preserving wrench balance. On the physical robot, retargeted grasps complete open-loop pick-and-hold trials on all six test objects, with every asymmetric object succeeding at both the canonical pose and a 120° co-rotation.

Architecture

EquiDexFlow takes an object point cloud and a kinematic model of a $D$-DoF, $M$-fingered robotic hand and produces, in a single forward pass: a wrist SE(3) pose, $D$ joint angles from a conditional normalizing flow, a set of $M$ contact points projected onto the object surface, and per-contact forces projected into the friction cone, all jointly consistent with the learned distribution. The released Allegro checkpoints use $D{=}16$ and $M{=}4$. Both are set per-hand in the model config.

VN-DGCNN Encoder

SO(3)-equivariant point-cloud encoder producing features $z_O \in \mathbb{R}^{341 \times 3}$.

SE(3) Flow

Wrist pose via flow matching on the Lie group, integrated with Munthe-Kaas RK4.
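
As an illustration of the integrator named above, here is a minimal sketch of a fourth-order Runge-Kutta-Munthe-Kaas (RKMK4) step, restricted to the rotation part SO(3). It is not the released implementation. The velocity field `omega_fn`, the convention $\dot R = \hat\omega R$, the step count, and the truncation of the inverse differential of the exponential are all assumptions made for this example. In the model, the learned flow-matching field would take the place of the toy field.

```python
import math

def cross(a, b):
    """Cross product; equals the so(3) Lie bracket in vector form."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def exp_so3(w):
    """Rodrigues' formula: rotation vector w -> 3x3 rotation matrix."""
    theta = math.sqrt(sum(x * x for x in w))
    K = [[0.0, -w[2], w[1]], [w[2], 0.0, -w[0]], [-w[1], w[0], 0.0]]
    if theta < 1e-12:
        a, b = 1.0, 0.5
    else:
        a, b = math.sin(theta) / theta, (1.0 - math.cos(theta)) / theta ** 2
    K2 = matmul(K, K)
    return [[float(i == j) + a * K[i][j] + b * K2[i][j] for j in range(3)]
            for i in range(3)]

def dexpinv(u, v):
    """Inverse differential of exp, truncated after the second-order bracket term."""
    uv = cross(u, v)
    uuv = cross(u, uv)
    return [v[i] - 0.5 * uv[i] + uuv[i] / 12.0 for i in range(3)]

def rkmk4_step(omega_fn, t, R, h):
    """One RKMK4 step for dR/dt = hat(omega_fn(t, R)) @ R (assumed convention)."""
    k1 = omega_fn(t, R)
    u2 = [0.5 * h * x for x in k1]
    k2 = dexpinv(u2, omega_fn(t + 0.5 * h, matmul(exp_so3(u2), R)))
    u3 = [0.5 * h * x for x in k2]
    k3 = dexpinv(u3, omega_fn(t + 0.5 * h, matmul(exp_so3(u3), R)))
    u4 = [h * x for x in k3]
    k4 = dexpinv(u4, omega_fn(t + h, matmul(exp_so3(u4), R)))
    theta = [h / 6.0 * (a + 2 * b + 2 * c + d)
             for a, b, c, d in zip(k1, k2, k3, k4)]
    # The update is applied through the exponential, so R stays on SO(3).
    return matmul(exp_so3(theta), R)

def integrate(omega_fn, R0, n_steps=8):
    """Integrate the flow from t = 0 to t = 1 with a fixed step size."""
    R, h = R0, 1.0 / n_steps
    for i in range(n_steps):
        R = rkmk4_step(omega_fn, i * h, R, h)
    return R

# Toy field (hypothetical): constant spin about z that reaches 120 degrees at t = 1.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
R_final = integrate(lambda t, R: [0.0, 0.0, 2.0 * math.pi / 3.0], identity)
```

Because each update is an exponential applied to the current rotation, the iterate never leaves the group, so no re-orthonormalization step is needed.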

Contact Decoder

Per-finger contacts projected onto the object surface by construction.

Force Decoder

Contact forces projected into the Coulomb friction cone architecturally.

Joint Flow

Conditional Real-NVP normalizing flow for multimodal finger configurations.
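
For readers unfamiliar with Real-NVP, the sketch below shows a single conditional affine coupling layer and its exact inverse. The `conditioner` function is a hypothetical closed-form stand-in for the learned scale-and-shift network. The conditioning vector `cond` and the half/half split are likewise assumptions for illustration, not details taken from the released model.

```python
import math

def conditioner(x_a, cond, n_out):
    """Hypothetical stand-in for the learned network producing log-scales and shifts."""
    h = sum(x_a) + sum(cond)
    s = [math.tanh(0.3 * h + 0.1 * i) for i in range(n_out)]  # bounded log-scales
    t = [0.5 * h - 0.2 * i for i in range(n_out)]             # shifts
    return s, t

def coupling_forward(z, cond):
    """Map latent z to a sample x; returns (x, log|det J|)."""
    half = len(z) // 2
    z_a, z_b = z[:half], z[half:]
    s, t = conditioner(z_a, cond, len(z_b))
    x_b = [zb * math.exp(si) + ti for zb, si, ti in zip(z_b, s, t)]
    return z_a + x_b, sum(s)

def coupling_inverse(x, cond):
    """Exact inverse of coupling_forward; returns (z, log|det J^-1|)."""
    half = len(x) // 2
    x_a, x_b = x[:half], x[half:]
    s, t = conditioner(x_a, cond, len(x_b))
    z_b = [(xb - ti) * math.exp(-si) for xb, si, ti in zip(x_b, s, t)]
    return x_a + z_b, -sum(s)

# Round trip on a 16-dimensional vector, matching the 16 joint angles of the hand.
cond = [0.1, -0.2, 0.3]               # hypothetical conditioning features
z = [0.1 * i for i in range(16)]
q, logdet = coupling_forward(z, cond)
z_rec, logdet_inv = coupling_inverse(q, cond)
```

The first half passes through unchanged, so the inverse can recompute the same scales and shifts. This gives exact sampling and likelihood evaluation, and stacking such layers with alternating splits lets the flow represent multimodal joint distributions.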

Key Ideas

Force-Aware Generation

EquiDexFlow jointly predicts hand configuration, fingertip contacts, and contact forces from a point cloud in one forward pass, without post-hoc force recovery. Cone projection enforces Coulomb friction and differentiable surface projection keeps contacts on the object. Both hold by construction for every sample.
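
To make the cone constraint concrete, here is a minimal sketch of a Euclidean projection onto the Coulomb friction cone $\{f : \lVert f_t \rVert \le \mu f_n,\ f_n \ge 0\}$. The page does not say which projection the force decoder uses, so the Euclidean projection, the function name, and the sign convention for the normal `n` are assumptions.

```python
import math

def project_to_friction_cone(f, n, mu):
    """Euclidean projection of force f onto the friction cone around unit normal n.

    Assumes mu >= 0 and that n points in the direction the contact may push.
    """
    f_n = sum(fi * ni for fi, ni in zip(f, n))       # normal component
    f_t = [fi - f_n * ni for fi, ni in zip(f, n)]    # tangential part
    t_norm = math.sqrt(sum(x * x for x in f_t))
    if f_n >= 0.0 and t_norm <= mu * f_n:
        return list(f)                               # already inside the cone
    if mu * t_norm <= -f_n:
        return [0.0, 0.0, 0.0]                       # in the polar cone: project to apex
    # Otherwise project onto the cone boundary.
    new_n = (f_n + mu * t_norm) / (1.0 + mu * mu)
    scale_t = mu * new_n / t_norm
    return [new_n * ni + scale_t * ti for ni, ti in zip(n, f_t)]

def violates_friction(f, n, mu, tol=1e-9):
    """True if f lies outside the friction cone by more than tol."""
    f_n = sum(fi * ni for fi, ni in zip(f, n))
    f_t = [fi - f_n * ni for fi, ni in zip(f, n)]
    return math.sqrt(sum(x * x for x in f_t)) > mu * f_n + tol

normal = [0.0, 0.0, 1.0]
raw_force = [1.0, 0.0, 1.0]                          # tangential part too large for mu = 0.5
projected = project_to_friction_cone(raw_force, normal, 0.5)
```

Every output of the projection satisfies the cone inequality, which is how a friction-violation rate can be zero by construction instead of depending on a loss term.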

End-to-End Equivariance

The full pipeline (encoder, SE(3) flow, contact/normal/force decoders, and wrist refinement) is SE(3)-equivariant by construction. Rotating the input object rotates the generated grasp identically, with wrist rotation residual below 0.04° and zero joint deviation across 200 SO(3) rotations.

Cross-Embodiment Transfer

EquiDexFlow's contact-grounded representation is embodiment-agnostic: the encoder, flow backbone, and decoder heads do not depend on the hand model. To demonstrate this, we retarget Allegro grasps to a physical LEAP Hand via inverse kinematics and confirm that the grasps transfer to hardware.

SE(3) Equivariance

Grasps co-rotate with the object under arbitrary rotations. Wrist residual <0.04°, joint deviation exactly zero.
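
A residual of this kind can be measured by rotating the input, running the model, and comparing against the rotated original output. The sketch below does this for a toy model and a constant baseline. `toy_wrist` is a hypothetical equivariant stand-in, not the network. The geodesic-angle metric is a common choice that we assume here, since the page does not give the exact formula.

```python
import math

def rot_z(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def apply(R, p):
    return [sum(R[i][k] * p[k] for k in range(3)) for i in range(3)]

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def unit(a):
    n = math.sqrt(dot(a, a))
    return [x / n for x in a]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def toy_wrist(cloud):
    """Hypothetical equivariant model: frame from the first three points, position at centroid."""
    e1 = unit(sub(cloud[1], cloud[0]))
    v = sub(cloud[2], cloud[0])
    e2 = unit(sub(v, [dot(v, e1) * x for x in e1]))   # Gram-Schmidt
    e3 = cross(e1, e2)
    centroid = [sum(p[i] for p in cloud) / len(cloud) for i in range(3)]
    return [e1, e2, e3], centroid

def constant_baseline(cloud):
    """Non-equivariant baseline: ignores the input and returns a fixed pose."""
    return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]], [0.0, 0.0, 0.1]

def rotation_error_deg(axes_a, axes_b):
    """Geodesic angle between two frames given as lists of axis vectors."""
    trace = sum(dot(a, b) for a, b in zip(axes_a, axes_b))
    return math.degrees(math.acos(max(-1.0, min(1.0, (trace - 1.0) / 2.0))))

def equivariance_residual(model, cloud, R):
    """Compare model(R * cloud) against R * model(cloud)."""
    axes, x = model(cloud)
    axes_rot, x_rot = model([apply(R, p) for p in cloud])
    d_rot = rotation_error_deg([apply(R, a) for a in axes], axes_rot)
    d_pos = math.sqrt(sum(d * d for d in sub(apply(R, x), x_rot)))
    return d_rot, d_pos

cloud = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.02, 0.08, 0.0], [0.03, 0.01, 0.12]]
dR_model, dx_model = equivariance_residual(toy_wrist, cloud, rot_z(120.0))
dR_base, dx_base = equivariance_residual(constant_baseline, cloud, rot_z(120.0))
```

The constant baseline keeps its original pose when the object rotates, so its rotation residual equals the full 120°. The equivariant toy model's residual is limited only by floating-point error.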

Figure panels: SNS cup at 0°, 120° ($R_z(120^\circ)$), and 240° ($R_z(240^\circ)$).

Extended gallery: six objects across three input rotations (0°, 120°, 240° about the vertical axis). The wrist pose and finger configuration co-rotate with the object.

Six-object SE(3) equivariance gallery

Objects, left to right then top to bottom in triples: YCB pudding box, EGAD! G6, box primitive, YCB mustard bottle, EGAD! E4, and EGAD! G5. Within each object, the three frames are 0°, 120°, and 240°.

Generated Grasps

Allegro Hand grasps produced by EquiDexFlow across EGAD!, YCB, and primitive objects. Each grasp is decoded from a point cloud and seated on the surface by wrist refinement.

Allegro grasp gallery 2x8

Pre-contact IK (left) and contact IK (right) on the YCB tomato soup can

Wrist Refinement

The flow-decoded wrist typically places fingertips a few centimeters off the object surface. Test-time optimization jointly refines the full 6-DoF wrist pose and the 16-DoF joint vector so the kinematic fingertips reach the predicted contacts, with trust-region regularization and mesh penetration penalties. The refinement is frame-invariant and commutes with any SE(3) action, preserving the equivariance guarantee.

Results

Grasp quality on 811 test grasps, 10 samples each. All variants achieve 0% friction violations (architectural guarantee).

| Method | Contact Err (m) ↓ | Force Err (N) ↓ | Fric. Viol. (%) ↓ | Top-1 Score ↑ | Top-3 Score ↑ | Wrench Res (Nm) ↓ |
|---|---|---|---|---|---|---|
| PoseOnly | 0.040 | 1.57 | 0.0 | −2.52 | −3.25 | 1.29 |
| ContactOnly | 0.041 | 1.84 | 0.0 | −2.57 | −3.46 | 1.36 |
| GeomOnly | 0.041 | 1.84 | 0.0 | −3.29 | −4.02 | 1.58 |
| EquiDexFlow (ours) | 0.042 | 1.99 | 0.0 | −0.96 | −1.18 | 0.46 |
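
The page reports the wrench residual in Nm but does not define it. One plausible reading, shown below purely as an assumption, is the imbalance of the net contact wrench about the object's center of mass. The sketch returns the force and torque imbalances separately. The function names, the reference point `com`, and the example contacts are all hypothetical.

```python
import math

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def norm(a):
    return math.sqrt(sum(x * x for x in a))

def net_wrench(contacts, forces, com, external_force=(0.0, 0.0, 0.0)):
    """Sum contact forces and their torques about com.

    external_force (for example gravity) is assumed to act at com, so it adds no torque.
    """
    force = list(external_force)
    torque = [0.0, 0.0, 0.0]
    for c, f in zip(contacts, forces):
        r = [ci - oi for ci, oi in zip(c, com)]      # lever arm from com to contact
        tau = cross(r, f)
        force = [a + b for a, b in zip(force, f)]
        torque = [a + b for a, b in zip(torque, tau)]
    return force, torque

com = [0.0, 0.0, 0.0]
forces = [[-2.0, 0.0, 0.0], [2.0, 0.0, 0.0]]         # two fingers squeezing along x, in N

# Antipodal contacts: forces and torques cancel exactly.
balanced = net_wrench([[0.05, 0.0, 0.0], [-0.05, 0.0, 0.0]], forces, com)

# Second contact shifted 1 cm along y: forces still cancel, but a torque remains.
offset = net_wrench([[0.05, 0.0, 0.0], [-0.05, 0.01, 0.0]], forces, com)
torque_residual_nm = norm(offset[1])                  # 2 N * 0.01 m lever arm
```

Under this reading, a lower residual means the predicted contact forces come closer to holding the object in static equilibrium.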

Equivariance by Rotation Angle

| Angle Bin | $\Delta R_w$ (°) | $\Delta x_w$ (mm) | $\max \Delta q_h$ (°) |
|---|---|---|---|
| 30–60° | 0.037 | 0.002 | 0.00 |
| 60–90° | 0.037 | 0.002 | 0.00 |
| 90–120° | 0.036 | 0.001 | 0.00 |
| 120–150° | 0.036 | 0.002 | 0.00 |
| 150–180° | 0.036 | 0.002 | 0.00 |

Inference-Time Ablations

| Configuration | Top-1 ↑ | Top-3 ↑ | FVR (%) ↓ |
|---|---|---|---|
| Stochastic | −0.95 | −1.22 | 0.0 |
| Deterministic ($z{=}0$) | −0.97 | −1.24 | 0.0 |
| No cone projection | −5.92 | −5.95 | 100 |

Disabling cone projection degrades the Top-1 composite score by over 6× (from −0.95 to −5.92) and drives friction violations to 100%, confirming that the 0% rate is a structural guarantee.

Cross-Embodiment Transfer to the LEAP Hand

Allegro grasps retargeted to the LEAP Hand via inverse kinematics, validated in simulation, and executed on physical hardware.

Retargeted Grasps

LEAP retargeting grasp gallery 2x6

Front view. Retargeted from Allegro to LEAP via contact-point IK. Mean fingertip residual 14 mm. Objects, left to right and top to bottom: box primitive, SNS cup, sugar box, tennis ball, tomato soup can, apple, EGAD! G6, cube, softball, EGAD! E4, EGAD! A4, and EGAD! F6.

Simulated Shake-Test Robustness

Gravity-off robustness check (GenDexGrasp/GAGrasp protocol in Drake): an inertial load is applied to the object in each of the six directions (±x, ±y, ±z), and a grasp passes if the object drifts less than 2 cm in every direction. Both test objects (mustard bottle and potted meat can) pass at the canonical pose and at its 120° co-rotation, with maximum drift between 0.9 mm and 9.2 mm.

Mustard Bottle, 0° · 3.2 mm max drift

Mustard Bottle, 120° · 3.4 mm max drift

Potted Meat Can, 0° · 0.9 mm max drift

Potted Meat Can, 120° · 9.2 mm max drift
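
The pass criterion can be written as a one-line check. The sketch below applies the 2 cm threshold to the four maximum-drift values reported above. The function and variable names are hypothetical, and per-direction drifts are not reported on this page, so only the maxima are used.

```python
DRIFT_LIMIT_MM = 20.0  # 2 cm threshold from the protocol description

# Maximum drift over the six load directions, as reported above (mm).
max_drift_mm = {
    ("mustard bottle", 0): 3.2,
    ("mustard bottle", 120): 3.4,
    ("potted meat can", 0): 0.9,
    ("potted meat can", 120): 9.2,
}

def passes_shake_test(drifts_mm, limit_mm=DRIFT_LIMIT_MM):
    """A grasp passes if the drift stays under the limit in every load direction."""
    return max(drifts_mm) < limit_mm

results = {key: passes_shake_test([drift]) for key, drift in max_drift_mm.items()}
worst_case_margin_mm = DRIFT_LIMIT_MM - max(max_drift_mm.values())
```

Even the largest reported drift, for the potted meat can at 120°, is less than half the threshold.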

Hardware Execution

Equivariant grasps executed on a physical LEAP Hand mounted on a 6-DoF ZArm. All six hardware test objects complete open-loop pick-and-hold trials, with every asymmetric object succeeding at both 0° and 120°.


Video clips: box primitive, potted meat can, mustard bottle, and cube, each at 0° and 120°. Symmetric objects (rotation-invariant): cylinder and tennis ball.

Each pair is the same generated grasp co-transformed by equivariance from 0° to 120°, not re-planned.

Retargeting Reachability (IK)

| Object | IK Tip (mm) | 0° | 120° |
|---|---|---|---|
| Cube | 14.3 | Yes | Yes |
| Box | 14.4 | Yes | Yes |
| Meat | 14.5 | Yes | Yes |
| Mustard | 14.3 | Yes | Yes |
| Reachable | | 4/4 | 4/4 |

BibTeX

@misc{equidexflow2026,
  author        = {Clinton Enwerem and John S. Baras and Calin Belta},
  title         = {{EquiDexFlow}: Contact-Grounded {SE}(3)-Equivariant Dexterous Grasp Generative Flows},
  year          = {2026},
  archivePrefix = {arXiv},
  primaryClass  = {cs.RO},
}