===== IM User Guide =====
  * go to http://90.147.77.138/im-web/index.php and create an account, choosing an <IMusername> and an <IMpassword>
  * install the IM client (https://github.com/grycap/im-client) on your machine
  * configure your im_client.cfg file as follows:
<code bash>
[im_client]
xmlrpc_url=http://90.147.77.138:8899
auth_file=auth.dat
</code>
  * configure your auth.dat file as follows (the first line holds your IM credentials, the second your OpenStack credentials):
<code bash>
type = InfrastructureManager; username = <IMusername>; password = <IMpassword>
id = ost; type = OpenStack; host = cloud-areapd.pd.infn.it:5000; username = <yourCAPusername>; password = <yourCAPpassword>; tenant = GRID-CERTIFICATION; service_region = regionOne; auth_url = https://cloud-areapd.pd.infn.it:5000;
</code>
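Since auth.dat contains both your IM and your OpenStack credentials in clear text, it is prudent to restrict its permissions (a standard shell precaution, not an IM requirement):
<code bash>
chmod 600 auth.dat
</code>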
  * run the following command to create a CentOS7 Torque/Maui cluster whose master node gets a floating IP, using the [[http://pastebin.com/cp7G8rdN | os-c7-torque.radl file]]:
<code bash>
./im_client.py create os-c7-torque.radl
</code>
  * check the status of the infrastructure you just launched at http://90.147.77.138/im-web/list.php, or from the command line as sketched below
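If you prefer the command line, the status can also be queried with the IM client itself; a minimal sketch, assuming the standard getstate subcommand of grycap/im-client and the example infrastructure ID used throughout this guide:
<code bash>
./im_client.py getstate c2f70fa0-8ba9-11e6-9dbf-0242ac110002
</code>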
  * when the infrastructure is configured, log in to the master (the job's output files errori and risultato are Italian for "errors" and "result"):
<code bash>
$ ./im_client.py list
Connected with: http://90.147.77.138:8899
Infrastructure IDs: 
  c2f70fa0-8ba9-11e6-9dbf-0242ac110002

$ ./im_client.py -a auth-os.dat sshvm c2f70fa0-8ba9-11e6-9dbf-0242ac110002 0
Connected with: http://90.147.77.138:8899
Warning: Permanently added '90.147.77.131' (RSA) to the list of known hosts.
Last login: Thu Oct  6 09:53:03 2016 from 90.147.77.138
[centos@torqueserver ~]$ 
[centos@torqueserver ~]$ sudo su -
[root@torqueserver ~]# pbsnodes
vnode-0-0
     state = down
     np = 2
     ntype = cluster
     mom_service_port = 15002
     mom_manager_port = 15003

vnode-1-0
     state = free
     np = 2
     ntype = cluster
     status = rectime=1475749622,varattr=,jobs=,state=free,netload=? 0,gres=,loadave=0.00,ncpus=0,physmem=4193904kb,availmem=3709676kb,totmem=4193904kb,idletime=2323,nusers=0,nsessions=0,uname=Linux vnode-1.localdomain 3.10.0-123.9.3.el7.x86_64 #1 SMP Thu Nov 6 15:06:03 UTC 2014 x86_64,opsys=linux
     mom_service_port = 15002
     mom_manager_port = 15003

vnode-2-0
     state = free
     np = 2
     ntype = cluster
     status = rectime=1475749628,varattr=,jobs=,state=free,netload=? 0,gres=,loadave=0.00,ncpus=0,physmem=4193904kb,availmem=3714272kb,totmem=4193904kb,idletime=2319,nusers=0,nsessions=0,uname=Linux vnode-2.localdomain 3.10.0-123.9.3.el7.x86_64 #1 SMP Thu Nov 6 15:06:03 UTC 2014 x86_64,opsys=linux
     mom_service_port = 15002
     mom_manager_port = 15003
[root@torqueserver ~]# su - dteam001
[dteam001@torqueserver ~]$ ls
test.job
[dteam001@torqueserver ~]$ qsub -l nodes=2 test.job
0.torqueserver.localdomain
[dteam001@torqueserver ~]$ ls
errori  risultato  test.job
[dteam001@torqueserver ~]$ cat risultato 
Thu Oct  6 10:25:55 UTC 2016
There are a number of interesting things you will want to know
This job was identified as 0.torqueserver.localdomain and is named ExampleJob
it was initially placed in the batch queue
and was executed on the batch queue
It was submitted from the machine: torqueserver.localdomain
It was executed on the machine: vnode-2.localdomain
Thu Oct  6 10:26:00 UTC 2016
</code>
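The test.job script itself is not shown in this guide; the following is a hypothetical reconstruction, inferred from the job name, queue, and output files seen in the transcript above, and built on standard PBS/Torque directives and environment variables:
<code bash>
#!/bin/bash
#PBS -N ExampleJob
#PBS -q batch
#PBS -o risultato
#PBS -e errori

date
echo "There are a number of interesting things you will want to know"
echo "This job was identified as $PBS_JOBID and is named $PBS_JOBNAME"
echo "it was initially placed in the $PBS_O_QUEUE queue"
echo "and was executed on the $PBS_QUEUE queue"
echo "It was submitted from the machine: $PBS_O_HOST"
echo "It was executed on the machine: $(hostname -f)"
sleep 5
date
</code>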
  * the master will take one of the 3 available floating IPs (90.147.77.131 to 90.147.77.133), which correspond to cld-cream-01 to cld-cream-03(.pd.infn.it) in the DNS
  * X.509 host certificates are available for the cld-cream-01 to cld-cream-03 hosts
  * an example .radl file for creating an HTCondor cluster on CentOS7 is available here: [[http://pastebin.com/qiT9Vszf | os-c7-htcondor.radl]]
  * sometimes the cluster does not configure properly; in that case destroy it, e.g. with ./im_client.py destroy c2f70fa0-8ba9-11e6-9dbf-0242ac110002, and try again (see the sketch after this list)
  * after destroying a cluster, do not forget to re-allocate the floating IP in the GRID-CERTIFICATION project: log in as root on cld-ctrl-01 and execute the script ./GRID-CERTIFICATION-floIPs.sh
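Before destroying and retrying, the contextualization log often shows why the configuration failed. A minimal sketch, assuming the standard getcontmsg and destroy subcommands of grycap/im-client and the example infrastructure ID used throughout this guide:
<code bash>
./im_client.py getcontmsg c2f70fa0-8ba9-11e6-9dbf-0242ac110002
./im_client.py destroy c2f70fa0-8ba9-11e6-9dbf-0242ac110002
./im_client.py create os-c7-torque.radl
</code>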

===== Troubleshooting =====
  * to log in to the VM hosting the IM server, execute the following from cld-ctrl-01:
<code bash>
[root@cld-ctrl-01 ~]# cd MARCO
[root@cld-ctrl-01 MARCO]# ip netns exec qdhcp-b6e3d4de-2959-4c89-8558-5845f2b316fd ssh -i for_im_web_key centos@10.63.34.5

# or

[root@cld-ctrl-01 MARCO]# ssh -i for_im_web_key centos@90.147.77.138
[centos@im-server-and-web ~]$
</code>

  * to restart the service:
<code bash>
[centos@im-server-and-web ~]$ sudo su -
[root@im-server-and-web ~]# docker stop im
[root@im-server-and-web ~]# docker stop im-web
[root@im-server-and-web ~]# docker start im
[root@im-server-and-web ~]# docker start im-web
</code>
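If a restart does not help, the container status and logs usually point at the problem; these are standard Docker commands applied to the container names used above:
<code bash>
[root@im-server-and-web ~]# docker ps -a
[root@im-server-and-web ~]# docker logs --tail 50 im
[root@im-server-and-web ~]# docker logs --tail 50 im-web
</code>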

  * to log in to the container running the service:
<code bash>
[root@im-server-and-web ~]# docker exec -ti im /bin/bash
root@e78904a1fcc6:/# 
</code>
  * in case you need to reset everything:
<code bash>
[root@im-server-and-web ~]# docker stop im; docker rm im
[root@im-server-and-web ~]# docker stop im-web; docker rm im-web
[root@im-server-and-web ~]# docker run -d -p 8899:8899 --name im grycap/im
[root@im-server-and-web ~]# docker run -d -p 80:80 --name im-web --link im:im grycap/im-web
</code>
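After recreating the containers, a quick sanity check that both are up and that the web interface answers again (standard docker and curl invocations, assuming curl is installed on the VM):
<code bash>
[root@im-server-and-web ~]# docker ps --filter name=im
[root@im-server-and-web ~]# curl -sI http://localhost/im-web/index.php
</code>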