Task #8047

Set up an Apache Spark Cluster with resources that can be shared among partners

Added by Dimitris Katris over 2 years ago. Updated over 1 year ago.

Status:     Rejected           Start date:  Apr 10, 2017
Priority:   Normal             Due date:
Assignee:   Kostas Kakaletris  % Done:




We should try to deploy an Apache Spark Cluster that can be used by the rest of the partners.

We should consider:
* authentication
* authorization
* available resources
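For the Spark level itself, the three points above map onto a handful of built-in properties. A minimal sketch of a shared spark-defaults.conf, assuming a pre-shared secret is acceptable for authentication; the user names and resource limits are placeholders, not agreed values:

```
# Authentication: all Spark components must present the shared secret
spark.authenticate            true
spark.authenticate.secret     <shared-secret>

# Authorization: per-application ACLs for viewing and modifying jobs
spark.acls.enable             true
spark.ui.view.acls            partner1,partner2
spark.modify.acls             partner1

# Available resources: cap what a single partner application can take
spark.cores.max               8
spark.executor.memory         4g
```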


#1 Updated by Kostas Kakaletris over 2 years ago

Right now our clusters do not use any authentication or authorization. They are blocked by a firewall and accessed only from specific machines for testing.

We will investigate whether it is possible to add authentication and authorization at the several levels that make up a cluster (Mesos, HDFS, Spark, ZooKeeper, etc.).
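Each of those levels has its own switch, so a shared cluster means touching several configuration files. A rough sketch of where the relevant settings live, assuming Kerberos for HDFS and SASL elsewhere; file paths and the credentials file are placeholders:

```
# core-site.xml (HDFS): switch from "simple" to Kerberos authentication
<property><name>hadoop.security.authentication</name><value>kerberos</value></property>
<property><name>hadoop.security.authorization</name><value>true</value></property>

# zoo.cfg (ZooKeeper): enable the SASL authentication provider
authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider

# mesos-master flags: require frameworks and agents to authenticate
--authenticate_frameworks --authenticate_agents --credentials=/etc/mesos/credentials
```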

Additionally, we will check whether adding another layer (Marathon, YARN, or similar) could help manage job submission better.
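The difference between the two submission paths is visible already in the spark-submit command line. A sketch comparing them, where the host, port, class, and jar names are placeholders rather than the project's actual endpoints:

```shell
# Current setup: the driver registers directly with the Mesos master.
MESOS_CMD="spark-submit --master mesos://mesos-master:5050 --class org.example.GeoJob geojob.jar"

# Candidate extra layer: YARN queues, schedules, and monitors the job,
# which adds per-user queues and ACLs but changes how jobs are tracked.
YARN_CMD="spark-submit --master yarn --deploy-mode cluster --class org.example.GeoJob geojob.jar"

echo "$MESOS_CMD"
echo "$YARN_CMD"
```

The trade-off is exactly the one raised later in this ticket: the extra layer improves multi-tenant control, but any client that submits or monitors jobs directly against the Mesos master has to be adapted.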

#2 Updated by Kostas Kakaletris over 2 years ago

  • Status changed from New to In Progress

#3 Updated by Kostas Kakaletris over 1 year ago

  • % Done changed from 0 to 60
  • Status changed from In Progress to Rejected
  • Infrastructure Development added

The test cluster was set up, but there were security issues: the same cluster user runs Spark jobs on all nodes, and those jobs can be reached from several endpoints. Adding an additional management layer (Marathon, YARN, etc.) was not compatible with the existing implementation of the geoanalytics service (it would require changes to the way the service submits and monitors jobs), which in turn would consume many more resources for development and testing on this environment.
For these reasons, a shared test cluster for every service cannot be provided.
