Autonomous surgical needle path planning using Q-Learning and DQN in a PyBullet physics simulation with a Franka Panda robotic arm.
University of Maryland, College Park | Yaxita Amin & K. Manasanjani
This project presents a reinforcement learning approach for autonomous path planning in robotic brain surgery simulation. An AI agent learns to navigate a surgical needle through complex 3D brain vasculature, avoiding blood vessels while reaching tumor targets, using Q-Learning and Deep Q-Network (DQN) algorithms.
Key highlights:
- 98–100% training success rate with tabular Q-Learning over 3,000 episodes
- 80–100% generalization to unseen tumor targets
- 0% vessel collision rate vs. 40% for straight-line approaches
- ≥4 mm safety margin guaranteed from all blood vessels
- Full integration with a Franka Panda robotic arm via Inverse Kinematics
For more detailed information, see this document: (https://drive.google.com/file/d/1P-ZHq9-bp2RFAncW7hEp7lvw01aQN47K/view?usp=drive_link)
WITHOUT ROBOT ARM
WITH ROBOT ARM AND NEEDLE
- Python 3.10+
- Ubuntu (recommended) or macOS
pybullet
trimesh
scipy
numpy
torch
matplotlib

Run the Q-Learning planner:

```bash
python final13.py
```

Run the DQN version:

```bash
python brain_surgery_dqn.py
```

| Episode | Q-Learning | DQN | Epsilon |
|---|---|---|---|
| 500 | 84.8% | 95.2% | ~0.22 |
| 1000 | 98.2% | 98.0% | ~0.05 |
| 2000 | 99.6% | 97.2% | ~0.01 |
| 3000 | 99.6% | 98.8% | 0.01 |
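The Epsilon column follows the exponential decay schedule listed later in the hyperparameter table (start 1.0, per-episode decay 0.997, floor 0.01); a quick sanity check:

```python
def epsilon_at(episode, start=1.0, decay=0.997, floor=0.01):
    """Exploration rate after `episode` episodes of exponential decay."""
    return max(floor, start * decay ** episode)

# Approximately reproduces the Epsilon column: ~0.22, ~0.05, 0.01, 0.01
print([round(epsilon_at(ep), 2) for ep in (500, 1000, 2000, 3000)])
```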
| Metric | Q-Learning | DQN |
|---|---|---|
| Training Time | ~2 min | ~15 min |
| Training Success | 94.6% | 89.2% |
| Test Success | 80% | 75% |
| Path Quality | 1.1× | 1.15× |
| Memory Usage | 4,003 states | 128KB model |
| Inference Time | <1ms | ~5ms |
| Metric | Dijkstra/A* | RRT | Potential Fields | Q-Learning |
|---|---|---|---|---|
| Computation | 30–35 s | 5–10 s | 2–5 s | ~2 min train |
| Path Quality | 1.0–1.3× | 1.4–1.6× | 1.2–1.4× | ~1.1× |
| Success Rate | 60–65% | 70% | 65–70% | 98–100% |
| Vessel Safety | Occasional | Generally | >4 mm* | Always ≥4 mm |
| Reproducibility | High | Low | Medium | High |
| Learning | None | None | None | Yes |
3D voxel grid position (2 mm resolution) relative to the tumor location.
6 discrete moves: {+X, −X, +Y, −Y, +Z, −Z} (2 mm per step)
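A sketch of how the state and action spaces above can be encoded (the helper names are illustrative, not the repository's exact code):

```python
import numpy as np

VOXEL = 0.002  # 2 mm grid resolution (meters)

# The six axis-aligned moves, each one voxel (2 mm) long.
ACTIONS = np.array([[ 1, 0, 0], [-1, 0, 0],
                    [ 0, 1, 0], [ 0, -1, 0],
                    [ 0, 0, 1], [ 0, 0, -1]]) * VOXEL

def discretize(needle_pos, tumor_pos):
    """Map a continuous tip position to a voxel index relative to the tumor."""
    return tuple(np.round((np.asarray(needle_pos) - tumor_pos) / VOXEL).astype(int))
```

For example, a tip 4 mm from the tumor along +X and 2 mm along −Z maps to the state `(2, 0, -1)`.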
R(s, a, s') = R_goal + R_collision + R_timeout + R_shaping
- R_goal = +100: reaching the tumor
- R_collision = -100: vessel proximity < 4 mm
- R_timeout = -50: exceeding 100 steps
- R_shaping = Δd: distance reduction to the target
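The four terms combine into a single step reward; the sketch below mirrors the thresholds above (hypothetical helper, distances in meters):

```python
def reward(reached_tumor, min_vessel_dist, steps, dist_before, dist_after):
    """Composite reward R = R_goal + R_collision + R_timeout + R_shaping."""
    r = 0.0
    if reached_tumor:
        r += 100.0                    # R_goal: reached the tumor
    if min_vessel_dist < 0.004:
        r -= 100.0                    # R_collision: within 4 mm of a vessel
    if steps > 100:
        r -= 50.0                     # R_timeout: exceeded 100 steps
    r += dist_before - dist_after     # R_shaping: distance reduction toward target
    return r
```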
| Parameter | Value |
|---|---|
| Learning rate α | 0.15 |
| Discount factor γ | 0.95 |
| Epsilon (start → end) | 1.0 → 0.01 |
| Epsilon decay | 0.997 |
| Max steps/episode | 100 |
| Training episodes | 3,000 |
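With these hyperparameters, the tabular agent follows the standard Q-learning update rule; a minimal sketch (state keys and action indices are illustrative):

```python
from collections import defaultdict
import random

ALPHA, GAMMA = 0.15, 0.95   # learning rate and discount factor from the table
N_ACTIONS = 6               # the six discrete moves

Q = defaultdict(lambda: [0.0] * N_ACTIONS)

def choose_action(state, epsilon):
    """Epsilon-greedy selection over the six discrete moves."""
    if random.random() < epsilon:
        return random.randrange(N_ACTIONS)
    return max(range(N_ACTIONS), key=lambda a: Q[state][a])

def update(state, action, r, next_state):
    """Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    td_target = r + GAMMA * max(Q[next_state])
    Q[state][action] += ALPHA * (td_target - Q[state][action])

s = (0, 0, 0)
update(s, 0, 100.0, (1, 0, 0))   # Q[s][0] moves by ALPHA toward the target: 0.15 * 100 = 15.0
```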
- Robot: Franka Emika Panda (7 DOF, simulated in PyBullet)
- IK Solver: PyBullet damped least-squares with joint limit handling
- IK success rate: 98%
- End-effector accuracy: <0.5mm positioning error
- Average execution time: 8.5 seconds per path
```
brain_surgery_docker/
├── data/                  # STL models & brain vasculature data (~7.5 MB)
├── final13.py             # Q-Learning main script (path planning + simulation)
├── brain_surgery_dqn.py   # Deep Q-Network implementation
├── Dockerfile             # Docker container configuration
├── requirements.txt       # Python dependencies
└── README.md
```
A Dockerfile is included for containerized, reproducible execution.
```bash
docker build -t brain-surgery-rl .
docker run --rm brain-surgery-rl
```

Note: PyBullet GUI visualization requires passing a display through to the container. On Linux, use:

```bash
docker run --rm -e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix brain-surgery-rl
```
- Static environment (no tissue deformation modeling)
- Discrete action space limits path smoothness
- Single-needle, straight-segment paths only
- Simulated environment differs from real surgical conditions
- Extend to continuous state/action spaces using actor-critic methods
- Incorporate curved needle steering for challenging targets
- Multi-objective optimization (path length, clearance, energy)
- Dynamic replanning with real-time MRI/CT intraoperative feedback
- Sim-to-real transfer for physical Franka Panda deployment
- Training on diverse anatomical models for patient-agnostic planning
We thank Dr. Jerry Wu for guidance throughout this project, and teaching assistants Siddhant and Aswin for valuable feedback during development.
This project is intended for educational and pre-operative planning research purposes only. Not for clinical use.


