SARAS endoscopic vision challenge for surgeon action detection


This challenge is part of the Medical Imaging with Deep Learning (MIDL 2020) conference, which will be held from 6 to 8 July 2020 in Montreal, Canada.

Challenge description

Minimally Invasive Surgery (MIS) is a very sensitive medical procedure whose success depends on the competence of the human surgeons and on how effectively they coordinate. The SARAS (Smart Autonomous Robotic Assistant Surgeon) EU consortium is working towards replacing the assistant surgeon in MIS with two assistive robotic arms. To accomplish that, an artificial intelligence based system is required which can not only understand the complete surgical scene but also detect the actions being performed by the main surgeon. This information can later be used to infer the response required from the autonomous assistant surgeon. Correctly detecting and localizing the surgeon's actions is critical for designing the motion trajectories of the robotic arms.

For this challenge, four sessions of complete prostatectomy procedures, performed by expert surgeons on real patients with prostate cancer, were recorded. AI experts and medical professionals then annotated these complete surgical procedures with the actions performed. Multiple action instances might be present at any point during the procedure (e.g., the right arm and the left arm of the da Vinci robot operated by the main surgeon might perform different coordinated actions). Hence, each frame is labeled with multiple actions, and these actions can have overlapping bounding boxes.

The bounding boxes in the training data are selected to cover both the ‘tool performing the action’ and the ‘organ under the operation’. A set of 21 actions was selected for the challenge after consultation with expert medical professionals. From a technical point of view, then, a suitable online surgeon action detection system must be able to: (1) locate and classify multiple action instances in real time; (2) connect the bounding boxes associated with those detections.
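As an illustration of the labeling scheme described above, the following Python sketch shows one way a frame with several action instances and overlapping bounding boxes could be represented. The class names, the integer action ids, the (x_min, y_min, x_max, y_max) pixel box format and the example coordinates are all assumptions made for this sketch; the challenge's actual annotation file format is not described here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

NUM_ACTIONS = 21  # number of action classes stated in the challenge description

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class ActionInstance:
    """One labeled action: a class id plus the box covering tool and organ."""
    action_id: int  # hypothetical integer id in [0, NUM_ACTIONS)
    box: Box

    def __post_init__(self) -> None:
        if not 0 <= self.action_id < NUM_ACTIONS:
            raise ValueError(f"action_id {self.action_id} outside 0..{NUM_ACTIONS - 1}")
        x0, y0, x1, y1 = self.box
        if x1 <= x0 or y1 <= y0:
            raise ValueError("box must have positive width and height")


@dataclass
class FrameAnnotation:
    """All action instances labeled in a single video frame."""
    frame_id: int
    instances: List[ActionInstance] = field(default_factory=list)


def boxes_overlap(a: Box, b: Box) -> bool:
    """True if two axis-aligned boxes share a region of positive area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


# Made-up example: two different actions in the same frame whose boxes overlap.
frame = FrameAnnotation(
    frame_id=0,
    instances=[
        ActionInstance(action_id=3, box=(100.0, 80.0, 300.0, 260.0)),
        ActionInstance(action_id=7, box=(250.0, 120.0, 420.0, 310.0)),
    ],
)
```

Keeping a list of instances per frame, rather than a single label per frame, lets a detector be trained and evaluated on frames where both arms of the robot perform different actions at once.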

To the best of our knowledge, this challenge presents the first benchmark dataset for action detection in the surgical domain, and paves the way for the introduction, for the first time, of partial/full autonomy in surgical robotics. Within computer vision, other datasets for action detection exist, but are of limited size.


The objective of this challenge is to provide a unique benchmark dataset for the development and testing of action detection algorithms in the field of medical computer vision. The challenge will help in evaluating different types of computer vision systems for this specific task. It will also lay the foundation for more robust algorithms to be used in future surgical systems for tasks such as autonomous assistant surgeons, surgeon feedback systems and surgical anomaly detection.


The task for this challenge is to detect the actions performed by the main surgeon or the assistant surgeon in the current frame. There are 21 action classes in the challenge dataset.

Metrics: The task will use mean Average Precision (mAP) as the evaluation metric, which is a standard metric for detection tasks. As this is the first task of its kind and correct detection of actions in the surgical environment is difficult, we will use an Intersection over Union (IoU) threshold of 0.25 for the evaluation of submissions.
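To make the metric concrete, the sketch below computes IoU between two boxes, a per-class average precision at a given IoU threshold, and the mean over classes. It is only an illustration under stated assumptions: boxes are taken to be (x_min, y_min, x_max, y_max), AP is computed as the non-interpolated area under the precision-recall curve, and the function and variable names are hypothetical. The official evaluation script may differ in details such as interpolation or tie handling.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # assumed (x_min, y_min, x_max, y_max)
Detection = Tuple[int, float, Box]       # (frame_id, confidence score, box)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(dets: List[Detection],
                      gts: Dict[int, List[Box]],
                      thr: float = 0.25) -> float:
    """Non-interpolated AP for one action class at IoU threshold `thr`."""
    n_gt = sum(len(boxes) for boxes in gts.values())
    if n_gt == 0:
        return 0.0
    matched = {f: [False] * len(boxes) for f, boxes in gts.items()}
    tp, fp, ap = 0, 0, 0.0
    # Visit detections from most to least confident.
    for frame_id, _score, box in sorted(dets, key=lambda d: -d[1]):
        best_iou, best_j = 0.0, -1
        for j, gt_box in enumerate(gts.get(frame_id, [])):
            overlap = iou(box, gt_box)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_j >= 0 and best_iou >= thr and not matched[frame_id][best_j]:
            matched[frame_id][best_j] = True  # each ground truth matches once
            tp += 1
            ap += (tp / (tp + fp)) / n_gt     # precision at this recall step
        else:
            fp += 1                           # poor overlap or duplicate match
    return ap


def mean_average_precision(dets_by_class: Dict[int, List[Detection]],
                           gts_by_class: Dict[int, Dict[int, List[Box]]],
                           thr: float = 0.25) -> float:
    """Mean of the per-class APs over all classes that have ground truth."""
    aps = [average_precision(dets_by_class.get(c, []), gts, thr)
           for c, gts in gts_by_class.items()]
    return sum(aps) / len(aps) if aps else 0.0


# Made-up example for one class: two ground-truth boxes, three detections.
example_gts = {0: [(0.0, 0.0, 10.0, 10.0)], 1: [(0.0, 0.0, 10.0, 10.0)]}
example_dets = [
    (0, 0.9, (0.0, 0.0, 10.0, 10.0)),    # exact match, IoU = 1
    (0, 0.8, (50.0, 50.0, 60.0, 60.0)),  # no overlap, false positive
    (1, 0.7, (5.0, 0.0, 15.0, 10.0)),    # IoU = 1/3
]
ap_025 = average_precision(example_dets, example_gts, thr=0.25)
ap_050 = average_precision(example_dets, example_gts, thr=0.5)
```

In this made-up example the third detection has an IoU of 1/3 with its ground-truth box, so it counts as correct at the 0.25 threshold but would be a false positive at the more common 0.5 threshold, which is the kind of loosely localized detection the lower threshold is meant to tolerate.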

Important dates

Challenge opens for registration: 1 March 2020

Training/Validation data release: 1 April 2020

Test data release: 10 June 2020

Result submission deadline: 25 June 2020

Final result announcement: 30 June 2020