Workshop on Evaluation and Benchmarking of Human-Centered AI Systems

Date: September 20th, 2019
Location: Milton Keynes, UK
Time: 13:30 - 17:00

Motivation

There is a push in Europe toward the development of AI systems that put the human and their values at the center of their design, development and operation. Such human-centred AI systems should be able to operate in the physcal world of humans, to collaborate with humans, and to explain their behaviour to humans. In order to allow their seamless integration with humans, it is paramount that these systems are properly evaluated.

Evaluating the performance and properties of AI systems is an important and open problem. From a scientific perspective, this is the problem of finding performance metrics for cognitive systems. From an industrial perspective, this means to be able to quantify the added value of using an AI solution. When dealing with human-centered AI systems, the evaluation problem is further complicated by the fact that such systems often integrate different AI paradigms and methods, and they interact with the humans and with the physical world.

Objectives

This workshop is devoted to the discussion of methods and tools to evaluate human-centered AI systems. The topics of the workshop include, but are not limited to:

Keywords: HRI Testing, Performance Evaluation, Benchmarking, Empirical Evaluation, AI Measures, Performance Metrics.

Program

Schedule

13:30 - Introduction, including Gerhard Kraetzschmar commemoration, Daniele Nardi and Alessandro Saffiotti

Session 1: Fundamental questions in evaluating human-centered intelligent systems

13:50 - Anne Bajart (Deputy Head of Unit, EC, Luxembourg) -- Title: Perspectives on Benchmarking for AI and Robotics
14:10 - Matteo Matteucci (Politecnico di Milano, Italy) -- Title: Intelligent robots are at the door ... but can they open it? (abstract)

Performance evaluation of autonomous robotic systems is still in its infancy. Indeed, evaluating complex systems in uncertain and dynamic contexts poses several challenges, first of all, the quest for methodological instruments to understand the marginal contribution of the different components interacting to the successful completion of a task. In recent years, several EU projects and EU funded robot competitions, such as RAWSEEDS, Eurobench, RoCKin, euRathlon, the European Robotics League, and SciRoc, have been proposed as a mean for evaluating physical intelligent artifacts in real-world scenarios. In this talk I will discuss the evolution of such projects/competitions in the light of a more objective and informative evaluation of autonomous robots.
14:30 - Luciano Serafini (Fondazione Bruno Kessler, Italy) -- Title: Foundational Aspects in Human-Centered AI (abstract)

The main objective of this short presentation is to propose an integrated picture of the model that emerges by integrating the di erent aspects of AI considered in AI4AI (namely: explainable, verifiable, collaborative, integrative, and physical AI). This is by no means a definition of Human Centered artificial intelligence, it is only a schema that should help people to see HC AI as a holistic research field in which different aspects are interconnected and interdependent.
14:50 - Paul Lukowicz (DFK - German Research Center for Artificial Intelligence, Germany) -- Title: Hard Evaluation criteria for "soft" goals: How can we quantify "human centricity" of AI systems?

15:15 - Coffee break

Session 2: Benchmarking human-robot systems

15:45 - Sigmund Akselsen (Telenor, Norway) -- Title: On the need for explanations, visualisations and measurements in data-driven air quality monitoring and forecasting
16:00 - Rachid Alami (Laboratory for Analysis and Architecture of Systems - LAAS, France) -- Title: On the pertinence of robot decisions in HRI contexts (abstract)

There is a need for principled methods for the assessment of the models, designs and algorithms elaborated in Human-Robot Interaction studies. In this work, we aim to adapt some principles from other research fields that might be used in HRI user studies. More precisely we discuss some frequent issues concerning recruited users, evaluation methods and replication of the studies, and how some methodological practices could circumvent them. We will also discuss more generally HRI studies, claiming that they need methods and assessment techniques specific to their particularities. We will finally discuss more particularly the evaluation of the robot decisional abilities in HRI context and focus more particularly on one key question: the evaluation of the pertinence of robot decisions and actions.
16:20 - Mary Ellen Foster (Univ. of Glasgow, UK) -- Title: Face-to-face conversation with socially intelligent robots (abstract)

When humans talk to each other face-to-face, they use their voices, faces, and bodies together in a rich, multimodal, continuous, interactive process. For a robot to participate fully in this sort of natural, face-to-face conversation in the real world, it must also be able not only to understand the social signals of its human partners, but also to produce appropriate signals in response. I will present recent research in this area, and will also discuss the emerging ethical implications of real-world deployment of socially intelligent robots.

Short bio: Dr Mary Ellen Foster is a Senior Lecturer in the School of Computing Science at the University of Glasgow. Her primary research interests are human-robot interaction, social robotics, and embodied conversational agents. She is the coordinator of the MuMMER project, a European Horizon 2020 project in the area of socially aware human-robot interaction. She obtained her PhD from the University of Edinburgh in 2007, and has previously worked at the Technical University of Munich and Heriot-Watt University. Her homepage is http://maryellenfoster.uk/
16:40 - Alan Winfield (Univ. of the West of England, UK) -- Title: What could possibly go wrong? Why we need Robot Accident Investigation (abstract)

In this talk I will outline Developing Responsible Robots for the Digital Economy: a new 5 year project in which we will be staging mock human-robot accidents in order to deeply explore the problem of robot accident investigation and develop both technical (i.e. data logging and explainer systems) and process solutions (i.e. frameworks for how to responsibly conduct such investigations). We will explore 3 scenarios: assisted living (care) robots; robot toys and Autonomous Vehicles - with human volunteers role playing as the subject of, witnesses to, and investigators of the accident. We believe this will be the first research project in the world to fully and systematically study this important aspect of real world robotics.

Short bio: Alan Winfield is Professor of Robot Ethics at the Univ. of the West of England, Bristol, and visiting Professor at the Univ. of York. Winfield co-founded the Bristol Robotics Laboratory and his research is focussed on the science, engineering and ethics of intelligent robots. Winfield is an advocate for robot ethics; he sits on the executive of the IEEE Standards Association Global Initiative on Ethics of Autonomous and Intelligent Systems, and chairs Working Group P7001, drafting a new IEEE standard on Transparency of Autonomous Systems. He has published over 240 works including Robotics: A Very Short Introduction (Oxford Univ. Press, 2012).

17:00 - End of workshop and social gathering

Program Committee

Organizers

The workshop is co-organized by SciRoc.eu (Smart CIty RObotic Challenge) and by AI4EU.eu (Building the European AI on-demand Platform), both funded by the EC under the H2020 programme.

Venue

The workshop will take place at The Connected Places Catapult, 170 Midsummer Boulevard, Milton Keynes, UK, MK9 1BP. The location is about 10-15 minutes walk from the MK shopping centre, where the SciRoc competition takes place.

Call for Contributions

Interested participants must submit an extended abstract reporting work relevant to the workshop's themes. Reports of work in progress and reports of recently published work are acceptable. Abstracts should be between 500 and 1500 words in length, in free format, and must include the name, affiliation and contact information of all authors. Abstracts must be submitted as PDF files via easychair at the following URL:

Contributions will be selected on the basis of relevance, quality and expected impact. Accepted contributions will be published on the Symposium website.