Task #1946

Project WP #654: WP6 - Supporting Blue Economy: VREs Development [Months: 1-30]

Project Task #656: T6.2 Strategic Investment analysis and Scientific Planning/Alerting VRE [Months: 1-30]

Project Activity #1633: Blue Economy VRE#2 Software Implementation, Integration and Deployment (Stage 2)

Task #6101: Geospatial Analytics & Alerting Platform Implementation (Stage 2)

Task #7991: Distributed execution of investment opportunity evaluation algorithms

Analyze Apache Spark utilization options

Added by Gerasimos Farantatos over 3 years ago. Updated about 1 year ago.

Status: Closed
Start date: Feb 15, 2016
Priority: Normal
Due date: May 31, 2017
Assignee: Dimitris Katris
% Done: 100%
Category: -
Sprint: WP06
Infrastructure: Development
Milestones:
Duration: 338

Description

  1. Without data staging
  2. With geospatial data staging

History

#1 Updated by Leonardo Candela over 3 years ago

It would be great if this activity also led to an integration of Apache Spark into the Statistical Manager domain. Has there been any discussion about this?

#2 Updated by George Kakaletris about 3 years ago

  • % Done changed from 0 to 10
  • Status changed from New to In Progress

#3 Updated by George Kakaletris over 2 years ago

  • % Done changed from 10 to 60
  • Assignee changed from George Kakaletris to Dimitris Katris

#4 Updated by Dimitris Katris about 2 years ago

  • Parent task changed from #1623 to #6101

#5 Updated by Dimitris Katris about 2 years ago

  • Due date set to May 31, 2017

#6 Updated by Dimitris Katris about 2 years ago

  • Parent task changed from #6101 to #7991

#7 Updated by Leonardo Candela about 2 years ago

One year ago I commented on this activity, see #1946#note-1

This ticket does not contain any actual description. Apache Spark is a technology and computing platform worth considering. It would be great if the activity performed in the context of WP6 were broadened a bit to involve Blue Commons members too.

Is there any documentation we can have a look at? Is the Spark cluster exploitable by others?

#8 Updated by Dimitris Katris about 2 years ago

Hi Leonardo, I think that this ticket refers to the task of investigating whether and how Apache Spark can be exploited in the context of geospatial analytics functions. Furthermore, we want to evaluate how efficient the adoption of this technology is for our purposes; our initial findings show that it can improve algorithm execution significantly in some cases. We have started working on the implementation of investment opportunity evaluation algorithms in a distributed fashion (#7991, #7545), but we have not yet completed our work or produced results that we can make available.

Regarding the cluster, the one that we currently work on is used by multiple UOA projects and can only be used internally in the institute for development purposes. There is also another one with more machines, likewise shared between projects; we cannot make it publicly available, and we plan to use it initially in the pre-production and production environments. Nevertheless, we plan to set up a new cluster or reconfigure one of the existing ones so that the rest of the BlueBRIDGE partners can exploit our resources and maybe have a dedicated Spark infrastructure for the project's needs.

I have just opened a ticket for that (#8047)

#9 Updated by Dimitris Katris about 2 years ago

@leonardo.candela@isti.cnr.it (I forgot to mention you to get the notification)

#10 Updated by Dimitris Katris over 1 year ago

  • Assignee changed from Dimitris Katris to Dimitris Katris
  • Infrastructure Development added

#11 Updated by Dimitris Katris about 1 year ago

  • % Done changed from 60 to 100
  • Status changed from In Progress to Closed
