Incident #10210

RStudio doesn't start

Added by TAHA IMZILEN about 2 years ago. Updated about 2 years ago.

Status: Closed
Start date: Nov 14, 2017
Priority: High
Due date:
Assignee: Julien Barde
% Done: 100%
Category: Default
Infrastructure: Production
Milestones:
Duration:

Description

Hi,
I can't open RStudio from RPrototypingLab. I get the following error:

Error navigating to ~:
Error occurred during transmission

Thank you

code_Taha.R (2.44 KB) Julien Barde, Nov 15, 2017 01:51 PM

fad_drift_article_pnas.pdf (1.21 MB) Andrea Dell'Amico, Nov 15, 2017 02:33 PM

fad_drift_article_pnas_appendix.pdf (1.14 MB) Andrea Dell'Amico, Nov 15, 2017 02:34 PM

before_refresh.png (293 KB) Julien Barde, Nov 15, 2017 04:15 PM

after_refresh.png (394 KB) Julien Barde, Nov 15, 2017 04:15 PM

Capture d_écran de 2017-11-17 04-58-58.png (329 KB) Julien Barde, Nov 17, 2017 05:00 AM

Subtasks

Support #10340: resetting RSudio (Closed, TAHA IMZILEN)

History

#1 Updated by Roberto Cirillo about 2 years ago

  • Assignee changed from Andrea Dell'Amico to TAHA IMZILEN
  • Status changed from New to In Progress

I've just checked the service and it seems to work fine. @taha.imzilen@ird.fr, could you please retry?

#2 Updated by TAHA IMZILEN about 2 years ago

Yes, it works fine now! Thank you.

#3 Updated by Pasquale Pagano about 2 years ago

  • Status changed from In Progress to Closed

#4 Updated by TAHA IMZILEN about 2 years ago

Hi @roberto.cirillo@isti.cnr.it,
I'm sorry, but I have the same problem again. I have been trying for a few hours and I still cannot open RStudio!

#5 Updated by Pasquale Pagano about 2 years ago

  • Priority changed from High to Immediate
  • Assignee changed from TAHA IMZILEN to _InfraScience Systems Engineer
  • Status changed from Closed to In Progress
  • Category set to Default
  • Infrastructure Production added

#6 Updated by Roberto Cirillo about 2 years ago

  • Assignee changed from _InfraScience Systems Engineer to TAHA IMZILEN
  • Status changed from In Progress to Feedback

I've reset your RStudio state and it now seems to work fine. I've also saved your previous state folder in another location for the moment, so if you need to recover anything from it we can try to restore the old state. Could you please check?

#7 Updated by TAHA IMZILEN about 2 years ago

Sorry, I still have the same issue.
Just to be sure, did you reset the RStudio state for the account taha.imzilen@ird.fr (I also have a Gmail account)?

#8 Updated by Andrea Dell'Amico about 2 years ago

  • Assignee changed from TAHA IMZILEN to Roberto Cirillo

#9 Updated by Roberto Cirillo about 2 years ago

I think the problem could be the following: your process consumes a lot of memory and the OOM killer kills your current session.
After that, if you try to open another session, your state is broken and I have to reset it.

These are the logs that I see in /var/log/syslog:

Nov  7 16:11:32 rstudio1-proto kernel: [6989448.839157] Out of memory: Kill process 21276 (rsession) score 469 or sacrifice child
Nov  7 16:11:32 rstudio1-proto kernel: [6989448.839171] Killed process 21276 (rsession) total-vm:9522124kB, anon-rss:8469512kB, file-rss:0kB
Nov  7 16:11:32 rstudio1-proto kernel: [6989448.840996] java invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
Nov  7 16:11:32 rstudio1-proto kernel: [6989448.841000] java cpuset=/ mems_allowed=0
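
Log lines like the ones above can also be scanned automatically. The following is only a minimal Python sketch, not the procedure actually used on rstudio1-proto: the function name `find_oom_kills` and the regular expression are assumptions derived solely from the format of the excerpt above, and other kernel versions may print different fields.

```python
import re

# Assumption: pattern written from the "Killed process" line in the excerpt above.
KILLED_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB"
)


def find_oom_kills(lines):
    """Return one dict per 'Killed process' line, with sizes converted to GiB."""
    kills = []
    for line in lines:
        match = KILLED_RE.search(line)
        if match:
            kills.append({
                "pid": int(match.group("pid")),
                "name": match.group("name"),
                # The kernel reports sizes in kB; 1024 ** 2 kB make one GiB.
                "total_vm_gib": int(match.group("total_vm")) / 1024 ** 2,
                "anon_rss_gib": int(match.group("anon_rss")) / 1024 ** 2,
            })
    return kills


# Sample line copied from the syslog excerpt in this note.
sample = [
    "Nov  7 16:11:32 rstudio1-proto kernel: [6989448.839171] Killed process 21276 "
    "(rsession) total-vm:9522124kB, anon-rss:8469512kB, file-rss:0kB",
]

for kill in find_oom_kills(sample):
    print(kill["name"], kill["pid"], round(kill["anon_rss_gib"], 1), "GiB resident")
```

Read this way, the killed rsession process was holding roughly 8 GiB of resident memory when the OOM killer stopped it.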

#10 Updated by Roberto Cirillo about 2 years ago

  • Assignee changed from Roberto Cirillo to TAHA IMZILEN

I've restored your RStudio state. Please keep in mind that if you use a lot of memory your process will be killed by the system. Could you check again whether it works now?

#11 Updated by Julien Barde about 2 years ago

I would like to add some comments that seem related to this issue. Taha and some colleagues are using RStudio to process data in order to publish a paper (with knitr) created entirely on the infra. Our ultimate goal was to edit the paper collaboratively using Sharelatex.

However, the processes required by the data analysis are quite heavy and RStudio was often crashing. My colleagues were complaining about this and I supposed it was due to an overload of the server capacity.
Since the processing can't really be done on a normal PC, we wanted to compile the paper on the infra to demonstrate the value of such a compilation environment.
This is all the more problematic because my colleagues were probably sometimes compiling the same code at the same time.

Is there any solution to overcome such issues? I think this would be a good example to demonstrate that in such cases, an online compilation environment with additional machine resources is required to do the job.

#12 Updated by Pasquale Pagano about 2 years ago

Hi @julien.barde@ird.fr, we are afraid to hear that there are people complaining, since we are delivering better solutions than the ones they are currently exploiting. @taha.imzilen@ird.fr is using RPrototypingLab, if I understood correctly. This is an environment for prototyping and not for large computations. In this environment, RStudio is equipped with 12 GB of RAM. We have other RStudio instances equipped with 32 GB of RAM that could better fit Taha's needs. However, those instances are associated with the VREs that requested them. Unfortunately, RStudio is not a technology designed to be clustered, so a user has a home directory on just one server. We can allocate the RStudio instance with 32 GB of RAM to another VRE, e.g. RStudioLab, but Taha will have to set up his own environment in that VRE.
If this is acceptable, I can ask the tech team to also move his home directory to another RStudio instance (if this is feasible).
So, to be concrete:
- do not use RPrototypingLab for every computation you need; use it to prototype your activity and then request the proper resources to run real scientific cases.
- tell us whether RStudioLab is suitable for Taha's activity in this case.
- report your needs in advance and not only when issues happen, since we cannot evaluate the resources you need without any information.

Finally, consider that we cannot allocate services, including RStudio, with more than 32 GB of RAM. If your process exceeds this capacity, we need a dedicated negotiation before starting the activity.
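
The sizing rule in this note can be written down as a small check. This is a hedged Python sketch only: the RAM figures (12 GB for RPrototypingLab, 32 GB for the larger instances, 32 GB as the ceiling) come from the note above, while the function `choose_vre`, the dictionary layout, and the idea of mapping RStudioLab to a 32 GB instance (proposed above, not yet done at this point) are illustrative assumptions.

```python
# RAM figures quoted in the note above; the mapping itself is illustrative.
INSTANCE_RAM_GB = {"RPrototypingLab": 12, "RStudioLab": 32}

# Largest service the note says can be allocated without a dedicated negotiation.
MAX_SERVICE_RAM_GB = 32


def choose_vre(peak_ram_gb):
    """Return the smallest listed VRE whose RAM covers the job.

    Returns None when the job exceeds the 32 GB ceiling, i.e. when the
    resources would have to be negotiated beforehand.
    """
    if peak_ram_gb > MAX_SERVICE_RAM_GB:
        return None
    candidates = [
        (ram, name) for name, ram in INSTANCE_RAM_GB.items() if ram >= peak_ram_gb
    ]
    return min(candidates)[1] if candidates else None


for peak in (8, 13, 40):
    print(peak, "GB ->", choose_vre(peak))
```

The sketch only compares one job against the nominal RAM of an instance; it does not account for several users running heavy jobs on the same server at the same time.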

#13 Updated by Julien Barde about 2 years ago

Thanks a lot for your explanations

Nothing to be afraid of... the French are known to complain all the time.
More seriously, I think we are reaching a key issue here for such a project. I completely understand that we are not using the infrastructure properly but, at the same time, we need to find a way to make users really understand what they can get. I am not really surprised, since it took me a while to understand that:

  • the infrastructure runs multiple RStudio server instances. I tried to explain it to my colleagues multiple times but I think they didn't really understand (or tried to)
  • each RStudio instance is shared by multiple VREs
  • RStudio works in a silo, not on the grid. So one could say it's a playground to compile some code before deploying the same code on the grid
  • Sharelatex is still using another R compilation system and we don't know how many machine resources it has.

I can understand that my colleagues are lost since they basically just want to open RStudio and compile some code. I know it's not fair but at the same time I recognize that it is difficult for newcomers to get this information:

  • as far as I know, there is no overview / schema to explain how RStudio is used in the infra and it's difficult then to choose the right RStudio
  • it might be useful to display a summary of the machine resources available for such components of the VREs (like an "about" sub-item below "RStudio" saying a few words about the version, the RAM and the cores...)
  • since RStudio is the main reason for some users to register in the infrastructure, those users should choose the VRE according to the machine resources of the underlying RStudio instance.
  • when multiple users like Taha and my colleagues use the same RStudio instance to compile the same (heavy) code, it still can be an issue.

I think a summary of the RStudio server configurations in the infra might be enough to guide users in choosing the VRE in which their code should be compiled.
I don't think we need you to change anything at this stage, since the code Taha is trying to compile downloads the data and code from elsewhere, so he doesn't need any replication of his workspace.

#14 Updated by Pasquale Pagano about 2 years ago

  • Status changed from Feedback to Closed

Thanks @julien.barde@ird.fr for your explanation. I am going to close this incident since @taha.imzilen@ird.fr can move to RStudioLab to perform his job and we don't need to port his workspace.

As for your comments, I agree that we need to find a way to better guide users to the right VRE. This guidance is partially provided by the VRE managers, but we need to put VRE managers and users in a position to get, in some simple way, the information they need to choose which VRE to use.
@massimiliano.assante@isti.cnr.it, please read Julien's proposals and then we will discuss on how and where to report this information to the end users.

#15 Updated by Julien Barde about 2 years ago

  • Status changed from Closed to In Progress
  • File code_Taha.R added

I am reopening the ticket since something doesn't work properly.
Wherever I try to compile the code mentioned by Taha (http://rstudio.d4science.org/, https://rstudio2.d4science.org/, https://rstudio1.d4science.org/, http://ip-90-147-166-181.ct1.garrservices.it/), the compilation fails.
I have some serious doubts about the compilation in the infrastructure, since when I try on my PC it works and the compilation is fast... my PC has 32 GB of RAM, but that can't explain such a difference in behavior. It also works locally on other PCs.
I guess something is not working properly with the management of memory and user sessions.

In the attached file you will find the code which we all fail to compile with RStudio online in the infra. You just need to copy and paste it into RStudio and execute it, and it should work...

#16 Updated by Julien Barde about 2 years ago

it might be related to #10340

#17 Updated by Andrea Dell'Amico about 2 years ago

We are in "works for me" territory here. I just compiled that code on one of the IOTC_SS3 RStudio instances. It is also a strictly sequential job; it used just one processor from the beginning to the end of the computation. It also used 13 GB of RAM at most, so it should compile on every RStudio instance that we have.

I attach the PDF files that I obtained as a result (they are the same ones that you compiled on Sharelatex, aren't they?). Feel free to remove them if they cannot be published here.

#18 Updated by Julien Barde about 2 years ago

Thanks Andrea for your feedback.
This is what we don't understand: @taha.imzilen@ird.fr wasn't able to compile them with his usual account (see https://support.d4science.org/issues/10340#change-58576), but he was able to compile them with another account that is not used often. Same for me and other colleagues: we haven't been able to compile this code for months... it crashes almost all the time wherever we try.
It looks like it's not possible to compile it once you have been using RStudio for a while...
How can we clean the RStudio environment to be sure it has been properly reset?

#19 Updated by Andrea Dell'Amico about 2 years ago

There are several dot directories where the old session data live. We can create a script to execute on request. Maybe from inside RStudio itself, @gianpaolo.coro@isti.cnr.it?

What error do you get, btw? Maybe it can give us some additional information.
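
As an illustration of what such a reset script could look like, here is a minimal Python sketch. It is not the script used on the infrastructure: the function `set_aside_state`, the backup naming scheme, and the list of directories are assumptions (`.rstudio` is where older RStudio Server releases keep per-user session state; the real list of dot directories would have to be confirmed by the systems engineers). It renames the state directories instead of deleting them, in line with the earlier note about saving the previous state folder in another location.

```python
import shutil
import tempfile
import time
from pathlib import Path

# Assumption: names of the per-user state directories to set aside.
STATE_DIRS = (".rstudio",)


def set_aside_state(home, state_dirs=STATE_DIRS, stamp=None):
    """Rename each existing state directory to '<name>.bak-<stamp>'.

    Nothing is deleted, so the old state can still be recovered.
    Returns the list of backup paths that were created.
    """
    home = Path(home)
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    moved = []
    for name in state_dirs:
        source = home / name
        if source.is_dir():
            target = home / f"{name}.bak-{stamp}"
            shutil.move(str(source), str(target))
            moved.append(target)
    return moved


# Demonstration in a throwaway directory, so no real home directory is touched.
with tempfile.TemporaryDirectory() as fake_home:
    (Path(fake_home) / ".rstudio").mkdir()
    for backup in set_aside_state(fake_home):
        print("state moved to", backup.name)
```

The user's session would have to be stopped before running something like this; otherwise the running session could recreate or keep writing to the state directory.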

#20 Updated by Andrea Dell'Amico about 2 years ago

  • Assignee changed from TAHA IMZILEN to _InfraScience Systems Engineer

#21 Updated by Julien Barde about 2 years ago

Most of the time the RStudio logs are stuck and it doesn't say anything. Sometimes there is an error when reconnecting / refreshing the page.
I will give it a try to show you.

Yes, a script to clean everything might be very helpful.

#22 Updated by Roberto Cirillo about 2 years ago

Have you checked whether the usual Taha account and the other Taha account are redirected to the same VM?

#23 Updated by Julien Barde about 2 years ago

In the attached files you will find a typical error: I try to compile the code, but after 30 minutes RStudio is still stuck at the beginning of the process and the logs stay at the same stage forever... When I refresh the page I get the error in the screenshot: "...abnormally terminated...unexpected crash..."

#24 Updated by Pasquale Pagano about 2 years ago

@julien.barde@ird.fr, @taha.imzilen@ird.fr I think that we identified an issue with RStudio and we are going to solve it. It is not related to this specific issue but it should solve it as well.

In short, each VRE has an associated cluster (I should call it a set rather than a cluster, btw) of RStudio instances. RStudioLab and RPrototypingLab are examples of those VREs. Unfortunately, not all RStudio instances are equivalent. They differ in the number of cores and in the type of hardware (CPU type, memory type, location, ...). This is not an issue if the job does not stress the hardware.
The first time a user accesses RStudio through a VRE, the user is associated with that RStudio instance. This is mainly because RStudio is not designed to work in a cluster. We have identified a solution to enhance it in that direction, but we have not yet had time to allocate resources for it. This explains why Taha is able to compile his code with one account and not with the other one: the two Taha accounts are simply assigned to different RStudio instances.

The solution we have identified is to allocate identical RStudio instances to all VREs, starting with RStudioLab. To complete this activity we have to move the homes of all users from the currently allocated RStudio instance to the new one allocated to the VRE. On Thursday we will try to complete this activity for RStudioLab.

I am sorry for the inconvenience but it is quite complex to keep operating the infrastructure VREs while trying to evolve it and to fix the issues identified with the experience.
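
The "first access pins the user to one instance" behaviour described in this note can be illustrated with a short Python sketch. Everything in it is hypothetical: the class `StickyAssigner`, the instance and account names, and in particular the rule used to pick an instance for a new user (fewest pinned users) are assumptions for illustration, since the note does not say how the infrastructure actually chooses.

```python
class StickyAssigner:
    """Pin each user to one instance on first access and reuse it afterwards."""

    def __init__(self, instances):
        self.instances = list(instances)
        self.assigned = {}  # user -> instance

    def instance_for(self, user):
        if user not in self.assigned:
            # Illustrative choice only: pick the instance with the fewest pinned users.
            counts = {instance: 0 for instance in self.instances}
            for instance in self.assigned.values():
                counts[instance] += 1
            self.assigned[user] = min(self.instances, key=lambda i: counts[i])
        return self.assigned[user]


# Hypothetical instance and account names.
assigner = StickyAssigner(["rstudio-a", "rstudio-b"])
print(assigner.instance_for("account-1"))  # first access pins account-1
print(assigner.instance_for("account-2"))  # a second account may land elsewhere
print(assigner.instance_for("account-1"))  # later accesses reuse the pinned instance
```

With non-equivalent instances behind the same VRE, two accounts of the same person can therefore end up on machines with different resources, which is the situation this note describes; making the instances identical removes that difference without changing the pinning.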

#25 Updated by Julien Barde about 2 years ago

Thanks @pasquale.pagano@isti.cnr.it, we totally understand that this is not trivial.
As we said, my colleagues were failing most of the time to compile this knitr document without really understanding the reasons.
It is great if we can finally compile it online, since we would like to share the material of the publication (code and data) but also recommend the VRE as a compilation environment, either with Sharelatex or with RStudio. Sharelatex might be even better for compiling it, since people could access the same project collaboratively.

This is also a method that we would like to use to promote good practices for open science in the framework of OpenAire-connect. To achieve this, we first need to be sure that people who try to compile it won't get an error. This is a good test, since the size of the data and the processes require machine resources.

#26 Updated by Pasquale Pagano about 2 years ago

  • Assignee changed from _InfraScience Systems Engineer to Julien Barde
  • Status changed from In Progress to Feedback

@julien.barde@ird.fr and @taha.imzilen@ird.fr, please check the RStudio accessible through the RStudioLab VRE.

The RStudioLab VRE has been reconfigured with a new set of equivalent RStudio instances, each with 16 cores and 32 GB of RAM.

Soon we will also reconfigure the RPrototypingLab and ICCAT_BFT-E VREs.

#27 Updated by Julien Barde about 2 years ago

I just ran a test and I still get the same error (see attached screenshot).

#28 Updated by Julien Barde about 2 years ago

@pasquale.pagano@isti.cnr.it, sorry, I just realized that you changed the URL of the RStudio Server instance related to RStudioLab (it is now http://ip-90-147-167-220.ct1.garrservices.it/, whereas the test in the note above, #10210#note-27, was done with http://rstudio.d4science.org/).
So I tried to compile the code and it's working now.
Good news then. I will try again.
We will let you know if some colleagues can't compile it.
Thanks

#29 Updated by Pasquale Pagano about 2 years ago

  • Status changed from Feedback to Closed

I am going to close this incident. It should be ok for every other user now.

#30 Updated by Julien Barde about 2 years ago

yes thanks.
Christophe Lett (no completion for his email in redmine by the way) made another test and it worked for him as well. It should be good for the others as well.
We will let you know if it happens again.
