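Advantages: the RPM packages are installed directly with yum, which is the standard method and keeps configuration simple.
1. Prepare the environment: three cloud servers running CentOS 7.0 or later. One acts as the master (management) node and the other two as compute nodes: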
m1: 192.168.3.30
s1: 192.168.3.31
s2: 192.168.3.32
Configure /etc/hosts on every node so that the machines can resolve one another by hostname, for example:
192.168.3.30 m1
192.168.3.31 s1
192.168.3.32 s2
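The hosts entries above can also be added by script, which helps when the same step has to be repeated on all three machines. The following is a minimal sketch, not part of the original procedure: `add_host_entry` and `add_cluster_hosts` are hypothetical helper names, and the addresses are simply the ones listed above.

```shell
# add_host_entry FILE IP NAME
# Append "IP NAME" to FILE unless some non-comment line already maps NAME.
add_host_entry() {
    hosts_file=$1
    ip=$2
    name=$3
    if [ -f "$hosts_file" ] && awk -v n="$name" '
        /^[[:space:]]*#/ { next }                       # skip comment lines
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit (found ? 0 : 1) }' "$hosts_file"; then
        return 0                                        # already present, nothing to do
    fi
    printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
}

# add_cluster_hosts FILE
# Add the three example nodes from this walkthrough to FILE.
add_cluster_hosts() {
    add_host_entry "$1" 192.168.3.30 m1
    add_host_entry "$1" 192.168.3.31 s1
    add_host_entry "$1" 192.168.3.32 s2
}
```

Running `add_cluster_hosts /etc/hosts` as root on each node should leave the file with one entry per machine. Running it a second time changes nothing, because names that are already mapped are skipped.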
2. Log in to nodes m1, s1, and s2 and install epel-release:
[root@m1 yum.repos.d]# yum install epel-release
...
Installed:
epel-release.noarch 0:7-9
Complete!
3. Install the official OpenHPC repository on every node:
[root@m1 yum.repos.d]# yum install https://github.com/openhpc/ohpc/releases/download/v1.3.GA/ohpc-release-1.3-1.el7.x86_64.rpm
...
Installed:
ohpc-release.x86_64 0:1.3-1.el7
Complete!
4. On the master node (m1), install the PBSPro server package:
[root@m1 ~]# yum install -y pbspro-server-ohpc
5. On the slave nodes (s1, s2), install the PBSPro execution package:
[root@s1 ~]# yum install -y pbspro-execution-ohpc
6. Configure the slave nodes (s1, s2):
Edit /etc/pbs.conf:
PBS_SERVER=m1
Edit /var/spool/pbs/mom_priv/config:
/var/spool/pbs/mom_priv/config
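The listing for the MoM config above shows only the file path, so the intended setting is not recoverable from the text. In PBS Pro this file normally carries a `$clienthost` directive naming the server host. Assuming that is what was meant here (an assumption, not something the original confirms), both edits could be scripted roughly as follows. `set_line` and `configure_pbs_client` are hypothetical helper names.

```shell
# set_line FILE PREFIX LINE
# Replace the lines in FILE that start with PREFIX by a single LINE,
# or append LINE if no line starts with PREFIX.
set_line() {
    file=$1
    prefix=$2
    line=$3
    [ -f "$file" ] || : > "$file"
    tmp=$(mktemp) || return 1
    # index() compares literally, so the "$" in "$clienthost" needs no escaping.
    awk -v p="$prefix" -v l="$line" '
        index($0, p) == 1 { if (!done) print l; done = 1; next }
        { print }
        END { if (!done) print l }' "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
}

# configure_pbs_client SERVER [PBS_CONF] [MOM_CONF]
# Point an execution node at SERVER. The $clienthost line is an assumed setting.
configure_pbs_client() {
    server=$1
    pbs_conf=${2:-/etc/pbs.conf}
    mom_conf=${3:-/var/spool/pbs/mom_priv/config}
    set_line "$pbs_conf" 'PBS_SERVER=' "PBS_SERVER=$server"
    set_line "$mom_conf" '$clienthost' "\$clienthost $server"
}
```

On s1 and s2 this would be run as root as `configure_pbs_client m1`. Passing explicit file paths as the second and third arguments lets you try it against copies of the files first.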
7. Start the PBSPro cluster:
Run the following on the master (m1) and on each slave node (s1, s2):
[root@m1 ~]# systemctl enable pbs
Created symlink from /etc/systemd/system/multi-user.target.wants/pbs.service to /usr/lib/systemd/system/pbs.service.
[root@m1 ~]# systemctl start pbs
[root@s1 ~]# systemctl enable pbs
Created symlink from /etc/systemd/system/multi-user.target.wants/pbs.service to /usr/lib/systemd/system/pbs.service.
[root@s1 ~]# systemctl start pbs
Add the slave nodes to the cluster:
[root@m1 ~]# . /etc/profile.d/pbs.sh
[root@m1 ~]# qmgr -c 'create node s1'
[root@m1 ~]# qmgr -c 'create node s2'
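With more than a couple of compute nodes, the `create node` commands above are easier to run in a loop. Here is a small sketch, assuming `pbsnodes NODE` exits non-zero for a node the server does not know yet. `register_nodes` is a hypothetical helper name.

```shell
# register_nodes NODE...
# Create each node on the PBS server unless the server already knows it.
register_nodes() {
    for node in "$@"; do
        if pbsnodes "$node" >/dev/null 2>&1; then
            echo "node $node already registered"
        else
            qmgr -c "create node $node"
        fi
    done
}
```

On m1, after sourcing /etc/profile.d/pbs.sh, the call would be `register_nodes s1 s2`.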
Check the node status:
[root@m1 ~]# pbsnodes -a
s1
     Mom = s1
     Port = 15002
     pbs_version = 14.1.0
     ntype = PBS
     state = free
     pcpus = 1
     resources_available.arch = linux
     resources_available.host = s1
     resources_available.mem = 918488kb
     resources_available.ncpus = 1
     resources_available.vnode = s1
     resources_assigned.accelerator_memory = 0kb
     resources_assigned.mem = 0kb
     resources_assigned.naccelerators = 0
     resources_assigned.ncpus = 0
     resources_assigned.netwins = 0
     resources_assigned.vmem = 0kb
     resv_enable = True
     sharing = default_shared

s2
     Mom = s2
     Port = 15002
     pbs_version = 14.1.0
     ntype = PBS
     state = free
     pcpus = 1
     resources_available.arch = linux
     resources_available.host = s2
     resources_available.mem = 918488kb
     resources_available.ncpus = 1
     resources_available.vnode = s2
     resources_assigned.accelerator_memory = 0kb
     resources_assigned.mem = 0kb
     resources_assigned.naccelerators = 0
     resources_assigned.ncpus = 0
     resources_assigned.netwins = 0
     resources_assigned.vmem = 0kb
     resv_enable = True
     sharing = default_shared
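The full `pbsnodes -a` listing is long, and usually only the `state` line of each node matters. The sketch below reduces the output to one line per node. It assumes the layout shown above: a bare node name followed by `key = value` attribute lines. `node_states` and `problem_nodes` are hypothetical helper names.

```shell
# node_states: read `pbsnodes -a` output on stdin, print "NODE STATE" per node.
node_states() {
    awk '
        NF == 1 && $0 !~ /=/ { node = $1; next }        # a bare word starts a node record
        $1 == "state" && $2 == "=" { print node, $3 }   # e.g. "s1 free"
    '
}

# problem_nodes: read the same input, print only nodes that look unhealthy.
problem_nodes() {
    node_states | awk '$2 ~ /down|offline|state-unknown/ { print $1 }'
}
```

Typical use would be `pbsnodes -a | node_states` for a summary, or `pbsnodes -a | problem_nodes` to list only the nodes that need attention.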
8. Submit a test job and check its status:
[xxin@m1 ~]$ echo 'sleep 111' | qsub
2.m1
[xxin@m1 ~]$ qstat
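Piping a one-liner into qsub is fine for a smoke test. For anything longer, a job script with `#PBS` directives is the more common form. The following sketch writes such a script. The job name `sleep_test`, the resource request and the walltime are illustrative assumptions, and `write_sleep_job` is a hypothetical helper name.

```shell
# write_sleep_job FILE [SECONDS]
# Write a small PBS job script to FILE that sleeps for SECONDS (default 111).
write_sleep_job() {
    job_file=$1
    seconds=${2:-111}
    cat > "$job_file" <<EOF
#!/bin/sh
#PBS -N sleep_test
#PBS -l select=1:ncpus=1
#PBS -l walltime=00:10:00
#PBS -j oe
echo "job \$PBS_JOBID running on \$(hostname)"
sleep $seconds
EOF
}
```

As a regular user, `write_sleep_job sleep.pbs 111` followed by `qsub sleep.pbs` should queue the job, and `qstat` then shows it while it runs. The `-j oe` directive merges stderr into the job's stdout file.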
Notes:
If a node's state is state-unknown,down, a firewall may be blocking traffic between the nodes. In that case, stop and disable the firewall:
[root@m1 ~]# systemctl stop firewalld.service
[root@m1 ~]# systemctl disable firewalld.service
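Turning firewalld off entirely is the quickest fix, but where the firewall has to stay on, opening only the PBS ports is an alternative. The sketch below prints the commands rather than running them so they can be reviewed first. The port list (15001-15004 for the server, MoM and scheduler, 17001 for pbs_comm) is the usual PBS Pro default and should be treated as an assumption to check against your own /etc/pbs.conf and /etc/services. `pbs_firewall_commands` is a hypothetical helper name.

```shell
# pbs_firewall_commands: print firewall-cmd invocations that open the
# assumed default PBS Pro TCP ports, followed by a reload.
pbs_firewall_commands() {
    # Assumed defaults: 15001 server, 15002 MoM, 15003 MoM resource monitor,
    # 15004 scheduler, 17001 pbs_comm.
    for port in 15001-15004 17001; do
        printf 'firewall-cmd --permanent --add-port=%s/tcp\n' "$port"
    done
    printf 'firewall-cmd --reload\n'
}
```

After reviewing the output, `pbs_firewall_commands | sh` as root on each node would apply the rules.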
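Jobs cannot be submitted as root; use a regular user account. On a cluster, consider setting up centralized user management such as LDAP, and provide a shared /home that is mounted on every node (the master and the compute nodes alike).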