Emma is a project to create a platform for developing applications for Spark and Docker Swarm clusters.

727 commits | Last update: October 17, 2019

What Emma can do for you

  • It is designed for users deploying Spark and Docker Swarm clusters on cloud infrastructure.
  • It helps users prepare cloud virtual machines.
  • Machines are provisioned with Ansible, an automation tool for IT infrastructure.
  • It gives users command-line access to install the required libraries and systems, configure them, start and stop services, add new modules for Jupyter notebooks, and even update the firewall.

Emma is an open-source project to create a platform for developing applications for Spark and Docker Swarm clusters. The platform runs on an infrastructure composed of virtual machines that must be reachable over SSH. These can be either cloud virtual machines or Vagrant machines. Vagrant allows the platform to be simulated on a single local machine, i.e., in a local development environment.
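Because Ansible only needs SSH access to the machines, the target infrastructure is typically described in an inventory file. The following is a minimal sketch assuming Ansible's YAML inventory format; the group names, host names, addresses, SSH user, and key path are hypothetical and are not taken from the Emma repository. For Vagrant machines, the addresses would be whatever private-network IPs the Vagrant setup assigns.

```yaml
# hosts.yml: hypothetical Ansible inventory for an Emma-style platform.
# All group names, host names, addresses and credentials are illustrative.
all:
  vars:
    ansible_user: ubuntu                          # assumed SSH login user on every VM
    ansible_ssh_private_key_file: ~/.ssh/id_rsa   # assumed private key for SSH access
  children:
    storage:                                      # hypothetical group for storage services
      hosts:
        node-1:
          ansible_host: 10.0.0.11
        node-2:
          ansible_host: 10.0.0.12
    processing:                                   # hypothetical group for Spark / Docker Swarm
      hosts:
        node-1:
          ansible_host: 10.0.0.11
        node-2:
          ansible_host: 10.0.0.12
```

Running `ansible -i hosts.yml all -m ping` is a quick way to confirm that every listed machine is reachable over SSH before any provisioning starts.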

Once the machines are prepared, the servers are provisioned with Ansible, an automation tool for IT infrastructure. Ansible playbooks create a storage layer, a processing layer, and JupyterHub services. The storage layer offers two flavors of storage: file-based, provided by GlusterFS and the Hadoop Distributed File System (HDFS), and object-based, provided by Minio. The processing layer consists of an Apache Spark cluster and a Docker Swarm cluster, both of which share the storage instances.
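One common way to organize such a layered deployment is a top-level playbook that imports one playbook per layer. The sketch below assumes that structure; the file names `site.yml`, `storage.yml`, `processing.yml`, and `jupyterhub.yml` are hypothetical and do not necessarily match the playbooks in the Emma repository.

```yaml
# site.yml: hypothetical top-level playbook that provisions the layers in order.
# The imported file names are assumptions, not Emma's actual playbook names.
- import_playbook: storage.yml      # storage layer, e.g. GlusterFS, HDFS and Minio
- import_playbook: processing.yml   # processing layer, e.g. Spark and Docker Swarm
- import_playbook: jupyterhub.yml   # JupyterHub services on top of both layers
```

With an inventory in place, `ansible-playbook -i hosts.yml site.yml` would apply the layers in the listed order, so the storage instances exist before the processing layer that shares them is configured.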

With Ansible, we can deploy a platform with the same features at different locations, such as a local cluster, a national infrastructure, or a commercial cloud provider. This gives us tool provenance, which makes it easy for scientists to repeat each other's experiments.

  • Big data
Programming Language
  • YAML
License
  • Apache-2.0
Source code

Contributors
  • Niels Drost
    Netherlands eScience Center
  • Stefan Verhoeven
    Netherlands eScience Center
  • Jisk Attema
    Netherlands eScience Center
  • Romulo Gonçalves
    Netherlands eScience Center
Contact person
Niels Drost
Netherlands eScience Center