Set up an Apache Spark Cluster with resources that can be shared among partners
|Status:||Rejected||Start date:||Apr 10, 2017|
|Assignee:||Kostas Kakaletris||% Done:|
We should try to deploy an Apache Spark Cluster that can be used by the rest of the partners.
We should consider:
* available resources
#1 Updated by Kostas Kakaletris almost 2 years ago
Right now our clusters do not use any authentication or authorization. They are blocked by a firewall and accessed only from specific machines for testing.
We will investigate whether it is possible to add authentication and authorization at the several layers that make up a cluster (Mesos, HDFS, Spark, ZooKeeper, etc.).
Additionally, we will check whether adding another layer (Marathon, YARN, or other) could help manage job submission better.
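As a rough illustration of the Spark part of that investigation, the sketch below renders a few of Spark's documented security properties (shared-secret authentication and ACLs) as spark-defaults.conf text. The helper name render_spark_defaults, the secret, and the user names are hypothetical; which properties apply depends on the Spark version and cluster manager, and this does not cover Mesos, HDFS, or ZooKeeper.

```python
# Minimal sketch, not the cluster's actual configuration.
# render_spark_defaults is a hypothetical helper; the property names are
# standard Spark security settings, the values are placeholders.

def render_spark_defaults(secret, admins, viewers):
    """Return spark-defaults.conf text enabling authentication and ACLs."""
    props = {
        # Require a shared secret between Spark processes.
        "spark.authenticate": "true",
        "spark.authenticate.secret": secret,
        # Encrypt RPC traffic (available in newer Spark releases).
        "spark.network.crypto.enabled": "true",
        # Turn on access control lists for the UI and job modification.
        "spark.acls.enable": "true",
        "spark.admin.acls": ",".join(admins),
        "spark.ui.view.acls": ",".join(viewers),
    }
    # spark-defaults.conf uses one "key value" pair per line.
    return "\n".join(f"{key} {value}" for key, value in props.items())


if __name__ == "__main__":
    # Placeholder secret and user names, for illustration only.
    print(render_spark_defaults("change-me", ["sparkadmin"], ["partner1", "partner2"]))
```

A shared secret of this kind only proves that a process belongs to the cluster; it does not by itself separate one partner's jobs from another's, which would still need per-user accounts or a separate management layer.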
#3 Updated by Kostas Kakaletris 11 months ago
- % Done changed from 0 to 60
- Status changed from In Progress to Rejected
- Infrastructure Development added
The test cluster was set up, but there were security issues: the same cluster user is used for Spark on all nodes, executing the jobs and having access to them from several endpoints. Adding an additional management layer (Marathon, YARN, etc.) was not compatible with the existing implementation of the geonalytics service. It would require changes to the way the service submits and monitors jobs, which in turn would require many more resources for development and testing in this environment.
For these reasons, it will not be possible to provide this test cluster for every service.