Workshop on

Deep Learning and Computer Vision for Drone Imaging and Cinematography - 2019

30th-31st of May, 2019

SLTC - Padukka, Sri Lanka


The course provides an overview of the various computer vision and deep learning problems encountered in drone imaging and cinematography, which is one of the main application areas of drone technologies. The same machine learning and computer vision problems also occur in other drone applications, e.g., land/marine surveillance, search and rescue, and building and machine inspection.


Computer Vision & Machine Learning

Provides a solid background in the necessary topics of computer vision (image acquisition, camera geometry, stereo and multiview imaging, motion estimation) and machine learning (introduction to neural networks, perceptron, backpropagation, deep neural networks, convolutional NNs). It also provides an introduction to multiple drone imaging and the relevant architectures.

Multiple Drone Cinematography

Provides in-depth views of the various topics encountered in multiple drone cinematography, ranging from the definitions of drone audiovisual shooting types (drone cinematography) to drone mission planning and control, drone localization and mapping, target detection and tracking, HCI issues, imaging for safety (crowd detection, emergency landing, map updating), drone mission simulations, privacy protection, ethics and regulatory issues.


  • 30th of May, 2019

  • Introduction to Computer Vision

    A detailed introduction to computer vision will be given, mainly focusing on 3D data types as well as color theory. The basics of color theory will be presented, followed by several color coordinate systems, and finally image and video content analysis and sampling will be thoroughly described.

  • Image acquisition, camera geometry

    After a brief introduction to image acquisition and light reflection, the building blocks of modern cameras will be surveyed, along with geometric camera modeling. Several camera models, like the pinhole and the weak-perspective camera model, will subsequently be presented, with the most commonly used camera calibration techniques closing the lecture.
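    As a rough illustration of the pinhole and weak-perspective camera models named above, the sketch below projects a 3D point given in camera coordinates onto the image plane. The function names, the single focal length `f` in pixels, the principal point `(cx, cy)` and the reference depth `z_ref` are assumptions made for this example, and lens distortion is ignored.

```python
def project_pinhole(point_cam, f, cx, cy):
    """Pinhole model: u = f * X / Z + cx, v = f * Y / Z + cy."""
    X, Y, Z = point_cam
    if Z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    return f * X / Z + cx, f * Y / Z + cy


def project_weak_perspective(point_cam, f, cx, cy, z_ref):
    """Weak perspective: every point is assumed to lie at the same depth z_ref."""
    X, Y, _ = point_cam
    if z_ref <= 0:
        raise ValueError("reference depth must be positive")
    return f * X / z_ref + cx, f * Y / z_ref + cy


if __name__ == "__main__":
    p = (1.0, 2.0, 4.0)  # hypothetical point, in metres, camera coordinates
    print(project_pinhole(p, f=800.0, cx=320.0, cy=240.0))
    print(project_weak_perspective(p, f=800.0, cx=320.0, cy=240.0, z_ref=5.0))
```

    The two models agree when the point depth equals `z_ref`; the weak-perspective approximation degrades as the depth variation within the scene grows relative to the distance to the camera.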

  • Stereo and Multiview imaging

    The workings of stereoscopic and multiview imaging will be explored in depth, focusing mainly on stereoscopic vision, geometry and camera technologies. Subsequently, the main methods of 3D scene reconstruction from stereoscopic video will be described, along with the basics of multiview imaging.

  • Motion estimation

    Motion estimation principles will be analyzed. Starting from 2D and 3D motion models, displacement estimation as well as quality metrics for motion estimation will subsequently be detailed. One of the basic motion estimation techniques, namely block matching, will also be presented, along with three alternative, faster methods. Phase correlation will be described next, followed by optical flow equation methods and a brief introduction to object detection and tracking.
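    To make the block matching technique mentioned above concrete, here is a minimal sketch of exhaustive (full-search) block matching with the sum of absolute differences (SAD) as the matching cost. Frames are plain lists of lists of gray levels; the function names, the choice of SAD and the convention that the returned vector points from the current block to its best match in the previous frame are assumptions of this example, not necessarily the formulation used in the lecture.

```python
def sad(prev, curr, by, bx, dy, dx, n):
    """Sum of absolute differences between an n x n block of curr and a shifted block of prev."""
    total = 0
    for i in range(n):
        for j in range(n):
            total += abs(curr[by + i][bx + j] - prev[by + dy + i][bx + dx + j])
    return total


def block_match(prev, curr, by, bx, n, search):
    """Full search over displacements in [-search, search] in both directions.

    Returns ((dy, dx), cost) for the candidate block of prev with the lowest SAD.
    """
    h, w = len(prev), len(prev[0])
    best, best_cost = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = by + dy, bx + dx
            # Skip candidates that fall partly outside the previous frame.
            if y < 0 or x < 0 or y + n > h or x + n > w:
                continue
            cost = sad(prev, curr, by, bx, dy, dx, n)
            if best_cost is None or cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best, best_cost


if __name__ == "__main__":
    # Synthetic test frames: curr is prev moved 1 pixel down and 2 pixels right.
    prev = [[y * 8 + x for x in range(8)] for y in range(8)]
    curr = [[prev[y - 1][x - 2] if y >= 1 and x >= 2 else 0 for x in range(8)]
            for y in range(8)]
    print(block_match(prev, curr, by=3, bx=3, n=2, search=2))
```

    Full search evaluates (2 * search + 1)^2 candidates per block, which is the cost that faster search strategies aim to reduce.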

  • Introduction to neural networks. Perceptron, backpropagation.

    This lecture will cover the basic concepts of Neural Networks: Biological neural models, Perceptron, Multi-layer perceptron, Classification, Regression, Design of neural networks, Training neural networks, Deployment of neural networks, Activation functions, Loss types, Error Backpropagation, Regularization, Evaluation, Generalization.
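    As a small illustration of the perceptron covered in this lecture, the sketch below trains a single perceptron with the classic perceptron learning rule on the logical AND function, which is linearly separable. The function names, the step activation, the learning rate and the toy data set are assumptions chosen for the example.

```python
def predict(weights, bias, x):
    """Step activation: output 1 if the weighted sum is positive, else 0."""
    s = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if s > 0 else 0


def train_perceptron(samples, labels, lr=1.0, epochs=20):
    """Perceptron rule: w <- w + lr * (target - output) * x, applied per sample."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for x, target in zip(samples, labels):
            err = target - predict(weights, bias, x)
            if err != 0:
                weights = [w + lr * err * xi for w, xi in zip(weights, x)]
                bias += lr * err
                errors += 1
        if errors == 0:  # every sample classified correctly, stop early
            break
    return weights, bias


if __name__ == "__main__":
    X = [(0, 0), (0, 1), (1, 0), (1, 1)]
    y = [0, 0, 0, 1]  # logical AND
    w, b = train_perceptron(X, y)
    print([predict(w, b, x) for x in X])
```

    A single perceptron can only separate classes with a hyperplane, which is one motivation for the multi-layer perceptron and error backpropagation topics listed above.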

  • Deep neural networks. Convolutional NNs.

    From multilayer perceptrons to deep architectures. Fully connected layers. Convolutional layers. Tensors and mathematical formulations. Pooling. Training convolutional NNs. Initialization. Data augmentation. Batch Normalization, Dropout. Deployment on embedded systems, Lightweight deep learning.
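    The convolutional and pooling layers listed above can be sketched in a few lines of plain Python. This is an illustrative, single-channel version with no padding, stride 1 for the convolution and non-overlapping windows for the pooling; the function names are hypothetical. As in most deep learning frameworks, the "convolution" is implemented as cross-correlation, i.e. the kernel is not flipped.

```python
def conv2d_valid(image, kernel):
    """Single-channel 2D cross-correlation with no padding and stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[y + i][x + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling over size x size windows."""
    return [
        [
            max(fmap[y + i][x + j] for i in range(size) for j in range(size))
            for x in range(0, len(fmap[0]) - size + 1, size)
        ]
        for y in range(0, len(fmap) - size + 1, size)
    ]


if __name__ == "__main__":
    image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    kernel = [[1, 0], [0, -1]]  # hypothetical diagonal-difference filter
    print(conv2d_valid(image, kernel))
    print(max_pool2d(image))
```

    A real convolutional layer adds multiple input and output channels, a bias term and a nonlinearity, and its kernel values are learned during training rather than fixed by hand.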

  • Introduction to multiple drone imaging

    This lecture will provide the general context for this new and emerging topic, presenting the aims of drone vision for cinematography and media production, the challenges (especially from an image/video analysis and computer vision point of view), the important issues to be tackled, the limitations imposed by drone hardware, regulations and safety considerations etc.

  • Multidrone system architecture and communications

    In this lecture, the overall system architecture for using multiple drones in cinematography will be presented. The hardware and software issues both for the on-drone system and for the ground station will be detailed. Communication issues will be reviewed. Finally, ROS implementation and integration issues will be reviewed.

  • Parallel GPU and multicore CPU architectures. GPU programming (1 hour)

    In this lecture, various GPU and multicore CPU architectures are reviewed, notably those used in GPU cards and in embedded boards such as the NVIDIA TX1, TX2 and Xavier. The principles of parallelizing various algorithms on GPU and multicore CPU architectures are reviewed. Then the essentials of GPU programming are presented. Finally, special attention is paid to a) fast and parallel linear algebra operations (e.g., using cuBLAS) and b) convolution and FFT algorithms, as all of them are of particular importance in deep machine learning (CNNs) and in real-time computer vision.

  • 31st of May, 2019

  • Drone cinematography

    The main building blocks of drone cinematography will be surveyed, focusing especially on UAV shot types (framing and camera motion types). Additionally, the state of the art in autonomous capture of cinematic UAV footage will be described, with an emphasis on relevant algorithms, commercial products and tools of the trade.

  • Drone mission planning and control

    In this lecture, the audiovisual shooting mission is first formally defined. The introduced audiovisual shooting definitions are encoded in mission planning commands, i.e., a navigation and shooting action vocabulary, and their corresponding parameters. The drone mission commands, as well as the hardware/software architecture required for manual/autonomous mission execution, are described. The software infrastructure includes the planning modules, which assign, monitor and schedule different behaviours/tasks to the drone swarm team according to director and environmental requirements, and the control modules, which execute the planned mission by translating high-level commands into desired drone+camera configurations, producing commands for the autopilot, camera and gimbal of the drone swarm.

  • Drone HCI issues

    Drone systems used in cinematography should typically have interfaces to a) a director who is responsible for the artistic aspects and b) a flight supervisor who is responsible for the safety aspects. This lecture is focussed on the design and development of the related GUIs. The director’s dashboard places particular emphasis on the overall requirements that a complex multi-drone system has to comply with in a professional media production environment.

  • Mapping and localization

    This lecture covers the essentials of how the 2D and/or 3D maps that robots and drones need are obtained, using measurements from appropriate sensors that allow them to perceive their environment. The semantic mapping section covers how to add semantic annotations, such as POIs, roads and landing sites, to the maps. The localization section covers how to find the 3D location of a drone or target from sensor data, specifically using Simultaneous Localization and Mapping (SLAM). Finally, the section on fusion in drone localization describes how localization and mapping accuracy is improved in Multidrone by exploiting the synergies between different sensors.

  • Deep learning for target detection

    Target detection using deep neural networks, Detection as search and classification task, Detection as classification and regression task, Modern architectures for target detection, RCNN, Faster RCNN, YOLO, SSD, lightweight architectures, Data augmentation, Deployment, Evaluation and benchmarking.

  • 2D Target tracking and 3D target localization

    Target tracking is a crucial component of many computer vision systems. Many approaches to face/object detection and tracking in videos have been proposed. In this lecture, video tracking methods using correlation filters or convolutional neural networks are presented, focusing on video trackers that are capable of achieving real-time performance for long-term tracking on a UAV platform.

  • Imaging for drone safety

    Safety issues as well as adherence to safety-related regulations are of utmost importance for drones. Image, video and 3D data analysis techniques can be used as an aid for the automatic or semi-automatic handling of such issues. Examples include crowd detection in images or videos captured by drones (so as to avoid flying over or near crowds), potential emergency landing site detection using 3D terrain data or images, and adding or updating the safety-related annotations of a map (e.g. no-fly zones). This lecture will provide an overview of the related issues as well as the solutions offered by the current state of the art.

  • Drone mission simulations

    Drone simulations are mainly used in Multidrone for three tasks. The first task is to characterize the optimal drone parameters for specific scenarios and shot types in terms of viewing experience. The second task is to generate large-scale UAV training and test data. For these two cases, Unreal Engine 4 was chosen because of its high-level graphics capabilities. Finally, simulations are very important for testing the control and behavior of the overall system (drones, supervision station). In this case, Gazebo was chosen as the most appropriate environment.

  • Privacy protection, ethics, safety and regulatory issues

    In drone cinematography, privacy and data protection issues arise in many cases. For example, the protection of personal data must be ensured in the acquired video and/or images. All types of data stored in drones and ground infrastructure or transmitted over the air must be protected. Also, all data to be distributed must be anonymized so that the acquired data cannot be linked to real people; numerous face detection obfuscation, face de-identification, human body de-identification and car plate de-identification methods must therefore be examined. Furthermore, flight safety is an important issue, which is addressed with reference to European and national regulations.

Resource Persons

Prof. Ioannis Pitas (IEEE fellow, IEEE Distinguished Lecturer, EURASIP fellow) received the Diploma and PhD degrees in Electrical Engineering, both from the Aristotle University of Thessaloniki, Greece. Since 1994, he has been a Professor at the Department of Informatics of the same University. He has served as a Visiting Professor at several Universities. His current interests are in the areas of image/video processing, machine learning, computer vision, intelligent digital media, human centered interfaces, affective computing, 3D imaging and biomedical imaging. He has published over 860 papers, contributed to 44 books in his areas of interest and edited or (co-)authored another 11 books. He has also been a member of the program committee of many scientific conferences and workshops. In the past he served as Associate Editor or co-Editor of 9 international journals and General or Technical Chair of 4 international conferences. He has participated in 69 R&D projects, primarily funded by the European Union, and is/was principal investigator/researcher in 41 such projects. His work has received 28000+ citations and he has an h-index of 81+ (Google Scholar).

Prof. Pitas leads the large European H2020 R&D project MULTIDRONE. He is chair of the Autonomous Systems initiative