High spatial resolution phenological modelling at continental scales
Understanding phenological variability
Emma is an open-source project to create a platform for developing applications for Spark and Docker Swarm clusters. The platform runs on an infrastructure composed of virtual machines that must be reachable over SSH. The machines are either cloud virtual machines or Vagrant machines. Vagrant allows the platform to be simulated on a local machine, i.e. in a local development environment.
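Since every machine must be reachable over SSH before provisioning can start, a quick pre-flight check can save time. The following is a minimal sketch, not part of Emma itself: the function names are hypothetical, port 22 is assumed to be the SSH port, and the check only confirms that a TCP connection can be opened, not that SSH authentication succeeds.

```python
import socket
from typing import Iterable, List


def is_ssh_reachable(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    This only tests that something is listening on the port; it does not
    verify keys, users, or that the service is really an SSH daemon.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution errors.
        return False


def unreachable_hosts(hosts: Iterable[str], port: int = 22,
                      timeout: float = 3.0) -> List[str]:
    """Return the subset of hosts that did not accept a TCP connection."""
    return [h for h in hosts if not is_ssh_reachable(h, port, timeout)]
```

Calling `unreachable_hosts` with the list of cloud or Vagrant machine addresses returns the machines that still need attention; an empty list means all of them accepted a connection on the chosen port.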
Once the machines are prepared, the servers are provisioned using Ansible, an automation tool for IT infrastructure. Ansible playbooks are used to create a storage layer, a processing layer, and JupyterHub services. The storage layer offers two flavors of storage: file-based, using GlusterFS and the Hadoop Distributed File System (HDFS), and object-based, using Minio. The processing layer has an Apache Spark cluster and a Docker Swarm, both sharing the storage instances.
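To make the layering concrete, the sketch below renders an INI-style Ansible inventory that groups machines by the service they host. It is an illustration only: the group names and host names are assumptions chosen to mirror the layers described above, and they are not taken from Emma's actual playbooks.

```python
from typing import Dict, List


def render_inventory(groups: Dict[str, List[str]]) -> str:
    """Render a mapping of group name -> hosts as an INI-style inventory."""
    lines: List[str] = []
    for group, hosts in groups.items():
        lines.append(f"[{group}]")   # group header, e.g. [spark]
        lines.extend(hosts)          # one host per line
        lines.append("")             # blank line between groups
    # Collapse trailing blank lines into a single final newline.
    return "\n".join(lines).rstrip("\n") + "\n"


if __name__ == "__main__":
    # Hypothetical layout: three machines sharing storage and processing roles.
    layout = {
        "glusterfs": ["node1", "node2", "node3"],
        "hdfs": ["node1", "node2", "node3"],
        "minio": ["node1"],
        "spark": ["node1", "node2", "node3"],
        "dockerswarm": ["node1", "node2", "node3"],
        "jupyterhub": ["node1"],
    }
    print(render_inventory(layout))
```

Writing the result to a file gives an inventory that could be passed to `ansible-playbook` with the `-i` option. Listing the same hosts under both storage and processing groups reflects the description above, where the Spark cluster and the Docker Swarm share the storage instances.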
With Ansible we can deploy a platform with the same features at different locations, such as a local cluster, a national infrastructure, or even a commercial cloud provider. This gives us tool provenance, which makes it easy for scientists to repeat each other's experiments.