SARAS endoscopic vision challenge for surgeon action detection


This challenge was part of the Medical Imaging with Deep Learning (MIDL 2020) conference. The event has concluded, but the dataset and baseline models can still be downloaded from the Download section.


[06-04-2021]    The next event in this challenge series is being organized as part of the MICCAI 2021 conference. The 2021 challenge is named SARAS-MESAD and is open for registration.

[17-08-2020]    The SARAS-ESAD challenge has been featured in the Best of MIDL section of Computer Vision News Magazine. Follow the link to read more about it: Link

[15-07-2020]    A recording of the virtual challenge event is now available online on YouTube: Watch

[06-06-2020]    Please register in advance at the following link to attend the virtual challenge event: Registration 

After registering, you will receive a confirmation email containing information about joining the meeting. The event will contain talks from field experts and selected participants.

[30-06-2020]    Selected participants have been invited to give a talk at the challenge event at MIDL 2020. Please check the email address you registered with. The event will be held on 9 July 2020 from 9:00 to 13:00 Montreal time.

[15-06-2020]    The baseline paper, with details of the baseline model and the ESAD dataset, is now accessible at:

[11-06-2020]    The evaluation phase has started, and the leaderboard will be active during this period. Participants may make one submission per day. The submission file must follow the described format; because evaluation is automatic, files in any other format will not be evaluated.

[10-06-2020]    Update: The test dataset has been released and can be downloaded from the Download section of the website.

[29-04-2020]    Update: The challenge will only accept submissions for the test data. Participants are provided with the target label files for the validation set and should evaluate the performance of their models on it themselves. Hence, please do not submit results on the validation data.

[14-04-2020]    The baseline model is available on GitHub for reproducing the results.

[01-04-2020]    The training and validation datasets have been released.

Challenge description

Minimally Invasive Surgery (MIS) is a very sensitive medical procedure whose success depends on the competence of the human surgeons and the effectiveness of their coordination. The SARAS (Smart Autonomous Robotic Assistant Surgeon) EU consortium is working towards replacing the assistant surgeon in MIS with two assistive robotic arms. To accomplish that, an artificial-intelligence-based system is required that can not only understand the complete surgical scene but also detect the actions being performed by the main surgeon. This information can later be used to infer the response required from the autonomous assistant surgeon. Correct detection and localization of the surgeon's actions is critical for designing the motion trajectories of the robotic arms. For this challenge, four sessions of complete prostatectomy procedures, performed by expert surgeons on real patients with prostate cancer, were recorded. Expert AI and medical professionals then annotated these complete surgical procedures with the actions. Multiple action instances might be present at any point during the procedure (e.g., the right arm and the left arm of the da Vinci robot operated by the main surgeon might perform different coordinated actions). Hence, each frame is labeled with multiple actions, and these actions can have overlapping bounding boxes.

The bounding boxes in the training data are selected to cover both the ‘tool performing the action’ and the ‘organ under the operation’. A set of 21 actions was selected for the challenge after consultation with expert medical professionals. From a technical point of view, then, a suitable online surgeon action detection system must be able to: (1) locate and classify multiple action instances in real time; (2) connect the bounding boxes associated with the detections.

To the best of our knowledge, this challenge presents the first benchmark dataset for action detection in the surgical domain, and paves the way for the introduction, for the first time, of partial/full autonomy in surgical robotics. Within computer vision, other datasets for action detection exist, but are of limited size.


The objective of this challenge is to provide a unique benchmark dataset for the development and testing of action detection algorithms in the field of medical computer vision. The challenge will help in evaluating different types of computer vision systems on this specific task. It will also lay the foundation for more robust algorithms to be used in future surgical systems for tasks such as autonomous assistant surgery, surgeon feedback, and surgical anomaly detection.


The task for this challenge is to detect the actions performed by the main surgeon or the assistant surgeon in the current frame. There are 21 action classes in the challenge dataset.

Evaluation Metrics: The task will use mean Average Precision (mAP) as the evaluation metric, which is a standard metric for detection tasks. As this is the first task of its kind, and correct detection of actions in the surgical environment is difficult, we will use a slightly relaxed metric for the evaluation. The evaluation will be performed at three different Intersection over Union (IoU) thresholds: 0.1, 0.3 and 0.5. The final score will be the mean of all the Average Precision values.
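The scoring described above can be illustrated with a minimal Python sketch. It is not the official evaluation code: the data layout (detections as (frame_id, score, box) tuples, boxes as (x1, y1, x2, y2)), the function names, the placeholder class name "action_a", and the use of non-interpolated Average Precision are all assumptions made for illustration. Only the three IoU thresholds and the averaging of all Average Precision values come from the description above.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]   # assumed layout: (x1, y1, x2, y2)
Detection = Tuple[int, float, Box]        # assumed layout: (frame_id, score, box)

IOU_THRESHOLDS = (0.1, 0.3, 0.5)          # thresholds stated by the challenge


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(dets: List[Detection],
                      gts: Dict[int, List[Box]],
                      thr: float) -> float:
    """Non-interpolated AP for one class at one IoU threshold (an assumption;
    the official evaluation may interpolate the precision-recall curve)."""
    n_gt = sum(len(boxes) for boxes in gts.values())
    if n_gt == 0:
        return 0.0
    used = {frame: [False] * len(boxes) for frame, boxes in gts.items()}
    tp, precision_sum = 0, 0.0
    # Visit detections from most to least confident.
    ranked = sorted(dets, key=lambda d: -d[1])
    for rank, (frame, _score, box) in enumerate(ranked, start=1):
        overlaps = [iou(box, g) for g in gts.get(frame, [])]
        best = max(range(len(overlaps)), key=overlaps.__getitem__, default=None)
        # True positive only if the best-overlapping ground truth is unclaimed.
        if best is not None and overlaps[best] >= thr and not used[frame][best]:
            used[frame][best] = True
            tp += 1
            precision_sum += tp / rank
    return precision_sum / n_gt


def challenge_score(dets_by_class: Dict[str, List[Detection]],
                    gts_by_class: Dict[str, Dict[int, List[Box]]]) -> float:
    """Mean of AP over every (class, IoU threshold) pair."""
    aps = [average_precision(dets_by_class.get(cls, []), gts, thr)
           for cls, gts in gts_by_class.items()
           for thr in IOU_THRESHOLDS]
    return sum(aps) / len(aps)


# Toy example with one hypothetical class and two annotated frames.
toy_gts = {"action_a": {0: [(0, 0, 10, 10)], 1: [(0, 0, 10, 10)]}}
toy_dets = {"action_a": [(0, 0.9, (0, 0, 10, 10)),   # IoU 1.0 with its ground truth
                         (1, 0.8, (0, 0, 10, 2))]}   # IoU 0.2 with its ground truth
toy_score = challenge_score(toy_dets, toy_gts)
print(round(toy_score, 3))
```

In the toy example the second detection overlaps its ground-truth box with IoU 0.2, so it counts as correct only at the 0.1 threshold. The per-threshold AP values are 1.0, 0.5 and 0.5, which shows how the relaxed thresholds reward roughly localized detections.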

Important dates

Challenge opens for registration: 1 March 2020

Training/Validation data release: 1 April 2020

Test data release: 10 June 2020

Start of evaluation phase for test data: 11 June 2020

End of evaluation phase for test data: 25 June 2020

Final result announcement: 30 June 2020

Virtual challenge event: 9 July 2020

Online challenge event

There will be a virtual event for the SARAS-ESAD challenge on 9 July 2020. Please register at the following link to attend: Link.

The schedule of the event is given below:

Time slot (London time) | Name of presenter | Affiliation | Title of talk
2:00 - 2:20 | Dr Vivek Singh | Oxford Brookes University | Opening remark and information on the challenge
2:20 - 2:40 | Prof. Riccardo Muradore | University of Verona | Research in SARAS and its scope
2:40 - 3:10 | Prof. Juan Pablo Wachs | Purdue University | Keynote: challenges of computer vision in medical robotics
3:10 - 3:35 | Xi Xiang | Perceptual Computing Research Center, Harbin Institute of Technology University |
3:35 - 4:00 | Shang Zhao | George Washington University | Enhancing Surgeon Action Detection with Split Attention
4:00 - 4:15 | | |
4:15 - 4:40 | Liangzhi Li | Osaka University | Data and Backbone Engineering for Action Detection on Surgical Videos
4:40 - 5:05 | Ruijia Wu | Beijing University of Posts and Telecommunications, Beijing | The Application of RetinaNet with ResNeXt in SARAS-ESAD Event
5:05 - 5:30 | Prof. Fabio Cuzzolin | Oxford Brookes University | Closing remarks


If you use this dataset in your research or have participated in the associated challenges, please cite the following publications:

Bawa, Vivek Singh, et al. "The SARAS Endoscopic Surgeon Action Detection (ESAD) dataset: Challenges and methods." arXiv preprint arXiv:2104.03178 (2021).

Bawa, Vivek Singh, et al. "ESAD: Endoscopic Surgeon Action Detection Dataset." arXiv preprint arXiv:2006.07164 (2020).