OKD on vSphere
This repo is a collection of Terraform and Ansible scripts to automate the OKD/OCP 4.x installation on a vSphere environment.
The main goal is to ask only for a minimal set of required information and let Terraform, Ansible and the OKD/OCP installer do the hard work.
The included scripts are meant to be run from a workstation/execution host (called the execution host in the following). Tested versions are:
- Terraform v1.3.5 with provider hashicorp/vsphere v2.2.0
- Ansible [core 2.13.6]
What is included
- vSphere env (Terraform):
  - bastion host creation, from an existing template
  - definition of a set of roles needed by the OKD/OCP installer to perform a smooth deploy
  - given a vSphere user (used as a service account), defined roles are automatically assigned
  - definition of Ansible inventory and templates to be used in the next step
  - returning of the bastion IP
- Bastion host configuration:
  - ansible sudoer user creation to be used when performing the next playbook execution (optional)
  - install-config.yaml generation
  - OKD/OCP installer download and installation
  - OKD/OCP client download and installation
  - enabling oc bash completion
  - download and trusting of vCenter certificates
  - installation dir configuration as a git repo (and committing the generated install-config.yaml version)
What is NOT included
These scripts don’t include:
- vSphere service account creation, it needs to be created BEFORE Terraform execution
- A proper VM template on the vSphere env running a Linux OS (a RH-like one, if possible)
- Avoiding plain passwords in Terraform state files and Ansible vars; use encrypted dirs and/or vaults on your own
- Additional configuration and anything not mentioned in the previous section
- Any further installation/setup/deploy regarding the installed OKD/OCP cluster.
- Something that can guess your desired configuration and/or env details, so carefully fill out config and var files as indicated below before running Terraform/Ansible
In general, the requirements are the same as for a standard OKD/OCP installation.
All design, requirements validation and preparatory activities are still needed in advance, just as for a “manual” installation; please follow the official OKD/OCP documentation.
Information to be gathered
vSphere env
- vCenter URL/hostname (FQDN or valid alternate subject name)
- Computing cluster
- Datacenter
- Network (to be used as machine network)
- Administrator level account (to perform Terraform runs)
- Service account user (used by OKD/OCP installer/MachineConfig integration)
- Service account password
- Default Datastore
- VM template name
- VM template guest OS type
- Folder name
- Bastion VM details:
  - name
  - vCPUs
  - assigned RAM (in MB)
  - disk size (in GB)
OKD/OCP target cluster
- Base domain
- Cluster name
- Installation dir location (inside the bastion host)
- Compute (worker) nodes sizing:
  - Assigned cores
  - Cores per socket
  - Assigned RAM memory (in MB)
  - Disk size (in GB)
  - Cardinality (number of replicas)
- Controlplane (master) nodes sizing:
  - Assigned cores
  - Cores per socket
  - Assigned RAM memory (in MB)
  - Disk size (in GB)
  - Cardinality (number of replicas): 3 is a magic number, don’t change it
- Cluster network CIDR (if different from default)
- Service network CIDR (if different from default)
- Machine network CIDR
- VIPs:
  - api (api.<cluster_name>.<base_domain>)
  - ingress (*.apps.<cluster_name>.<base_domain>)
- Pull secret (needed for RH Insights, etc.; otherwise leave the default fake pull secret)
- Management public key (allows ssh into the nodes while bootstrapping - recommended)
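Both VIPs need matching DNS records before the installer runs. As a minimal sketch (the function name okd_dns_names and the sample values are illustrative assumptions, not part of this repo), the names to check can be derived from the cluster name and base domain:

```shell
# Hypothetical helper: print the DNS names that must resolve to the VIPs.
# "test" stands in for any application name covered by the *.apps wildcard.
okd_dns_names() {
  cluster_name="$1"
  base_domain="$2"
  printf 'api.%s.%s\n' "$cluster_name" "$base_domain"
  printf 'test.apps.%s.%s\n' "$cluster_name" "$base_domain"
}

# Example with assumed values: check each name against your DNS.
# for name in $(okd_dns_names okd example.org); do getent hosts "$name"; done
```

The first name should resolve to the api VIP and the second to the ingress VIP.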
Management pubkeys
It is strongly recommended to use one or more management public keys; they can be useful when troubleshooting the installation.
Management public keys can be inserted in:
- pubkeys list in vars/pubkeys.yml (allowed pubkeys grant access to the bastion host when running Ansible playbooks)
- ssh_key var in vars/bastion.yaml (allowed pubkeys grant access to bootstrapping OKD/OCP nodes)
Usage walkthrough
Preface: to run, the Terraform scripts need the password of an administrative user account. To avoid storing it inside configuration files, it can be passed through an environment variable populated from another source, such as a password manager.
For example, the env var can be defined, used and destroyed in a short bash one-liner while using pass or a similar password manager:
export TF_VAR_vsphere_password=$(pass vcenter/Administrator@vsphere.local); terraform <plan|apply|destroy>; unset TF_VAR_vsphere_password
In the following, whenever terraform <plan|apply|destroy> commands are mentioned, a similar approach is assumed.
NOTE: when passwords are passed to Terraform, the related credentials are stored as PLAIN TEXT in .tfstate files after usage, so be careful when letting others access your Terraform project dir.
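As a variation on the one-liner above, the password can be confined to a subshell so that nothing needs to be unset afterwards. This is only a sketch: the function name tf_with_password is an assumption, and the pass entry is the one from the example above.

```shell
# Hypothetical wrapper: run terraform with the vSphere password set only
# inside a subshell, so the variable never reaches the calling shell.
tf_with_password() (
  # Abort if the password manager lookup fails.
  TF_VAR_vsphere_password="$(pass vcenter/Administrator@vsphere.local)" || exit 1
  export TF_VAR_vsphere_password
  terraform "$@"
)

# Usage: tf_with_password plan
```

Because the function body runs in a subshell, the exported variable disappears as soon as terraform returns, even if the command is interrupted. The password still ends up in the state files, as noted above.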
As mentioned before, a dedicated user needs to be created in advance on vSphere as a service account.
The installation phase can be summarized in the following steps:
1. Clone this repo and cd into the local copy:
git clone git@baltig.infn.it:rorru/okd-on-vsphere.git
cd okd-on-vsphere
2. Copy all configuration example files and edit them properly to reflect your existing/desired env, using the information collected at the previous step:
cp terraform.tfvars.example terraform.tfvars
cp vars/bastion.yaml.example vars/bastion.yaml
cp vars/pubkeys.yml.example vars/pubkeys.yml
vim terraform.tfvars
vim vars/bastion.yaml
vim vars/pubkeys.yml
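When repeating this step on an existing checkout, a plain cp would overwrite files you have already edited. A minimal sketch of a safer copy follows; the function name copy_examples is an assumption, while the file names are the ones listed above.

```shell
# Hypothetical helper: create each config file from its example,
# skipping files that already exist so local edits are not overwritten.
copy_examples() {
  for example in terraform.tfvars.example vars/bastion.yaml.example vars/pubkeys.yml.example; do
    target="${example%.example}"   # strip the .example suffix
    if [ -e "$target" ]; then
      echo "keeping existing $target"
    else
      cp "$example" "$target" && echo "created $target"
    fi
  done
}

# Usage (from the repo root): copy_examples
```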
3. Initialize the Terraform project:
terraform init
4. Define the Terraform plan:
terraform plan
and ALWAYS REVIEW YOUR PLAN OUTPUT.
5. Apply your changes:
terraform apply
and ALWAYS REVIEW YOUR APPLY OUTPUT BEFORE CONFIRMING. Let Terraform create all needed resources; script execution stops after waiting for bastion host connectivity. At this point, all infrastructure resources are defined and Ansible will take care of configuring the bastion host.
6. Grant passwordless access to the bastion host from the execution host. To do so, run the dedicated playbook that creates and configures an ansible sudoer user on the bastion host. The playbook runs as root by default, so if your VM template is configured for password access, use the -k option to be prompted for the root password:
ansible-playbook -k enable_ansible_access.yaml
If your template allows access using a sudoer user (often using a pubkey), use the form:
ansible-playbook -b -u <sudoer_user> enable_ansible_access.yaml
Add the -K option to the latter to be prompted for the sudo password. If no pubkey is authorized on the bastion host, add -k as well to be prompted for the SSH password.
7. Another playbook needs to be run to configure the bastion host:
ansible-playbook bastion_setup.yaml
8. At this point, to launch the OKD/OCP installer, we need to access the bastion directly. If you don’t remember the IP assigned to your bastion, run terraform output to show the information again. Then type:
ssh -l ansible <bastion_ip>
Escalate to superuser privileges (if needed):
[ansible@bastion ~]$ sudo -i
[root@bastion ~]#
and cd to the install dir location specified in vars/bastion.yaml. For example, supposing a var file content like:
platform:
  version: 4.10.0-0.okd-2022-07-09-073606
...
install_location:
  home_path: "/root"
  install_dir: "OKD-{{ platform.version }}"
...
you need to cd into the /root/OKD-4.10.0-0.okd-2022-07-09-073606 dir.
9. Review the install-config.yaml file content and launch the OKD/OCP installer (use tmux or screen to avoid terminal disconnection):
openshift-install create cluster --dir <installation_dir>
The complete installation takes approximately 40 minutes using default sizing parameters for master and worker nodes. Before execution stops, the installer shows the administrative credentials to access your new cluster. If you lose or forget them, open <installation_dir>/auth/kubeadmin-password for the kubeadmin user password (usable on the web console), or, if you prefer to use the oc client, define a KUBECONFIG env var:
export KUBECONFIG=<installation_dir>/auth/kubeconfig
Alternatively, copy <installation_dir>/auth/kubeconfig to ~/.kube/config.
To obtain the OKD/OCP web console URL, type:
oc whoami --show-console
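The credential lookup described above can be wrapped in a small shell function. This is a sketch under stated assumptions: the name use_cluster is made up for illustration, and the auth/ paths are the ones mentioned above.

```shell
# Hypothetical helper: point oc at a finished installation and print the
# kubeadmin password stored by the installer.
use_cluster() {
  installation_dir="$1"
  if [ ! -f "$installation_dir/auth/kubeconfig" ]; then
    echo "no kubeconfig under $installation_dir/auth" >&2
    return 1
  fi
  export KUBECONFIG="$installation_dir/auth/kubeconfig"
  cat "$installation_dir/auth/kubeadmin-password"
}

# Usage: use_cluster <installation_dir> && oc get nodes
```

Call the function directly in your interactive shell, not inside a command substitution or a pipeline, otherwise the exported KUBECONFIG may not persist in your session.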
Destroying the cluster
To destroy the cluster, follow these steps:
1. Access your bastion. If you don’t remember the IP assigned to your bastion, run terraform output to show the information again. Type:
ssh -l ansible <bastion_ip>
Escalate to super user privileges (if needed):
[ansible@bastion ~]$ sudo -i
[root@bastion ~]#
and cd in your install directory.
2. Run:
openshift-install destroy cluster --dir <installation_dir>
All OKD/OCP node VMs will be deleted. Now the bastion itself needs to be destroyed.
3. Exit the bastion shell, and from the Terraform project directory on the execution host run:
terraform destroy
and ALWAYS REVIEW YOUR DESTROY OUTPUT BEFORE CONFIRMING. The bastion host will be destroyed and all permissions assigned to the given service account user will be revoked.
Main caveat
Some permissions automatically assigned to the given service account user are related to the root of vSphere computing resources, at “vCenter” level.
When updating an existing plan using terraform <plan|apply|destroy>, Terraform makes an in-place change, often resulting in the removal of other “vCenter” level permissions. Because of this, again, ALWAYS REVIEW YOUR TERRAFORM OUTPUT BEFORE CONFIRMING. If some “vCenter” level permissions are about to be destroyed:
1. Stop the current Terraform execution (CTRL-c)
2. Remove the “vCenter” level permissions and the related role from the Terraform state:
terraform state rm vsphere_entity_permissions.vcenter-permissions vsphere_role.okd-sa-vcenter-role
3. Then manually remove the permission assigned to the service account user on vSphere
4. Re-run the terraform <plan|apply|destroy> command just interrupted
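Spotting the affected resources in a long plan is easier with a filter. The sketch below scans plain-text plan output for the two resource addresses used in the terraform state rm command above. The function name vcenter_level_changes is an assumption, and the match relies on the wording Terraform uses in its human-readable plan headers, which may differ between Terraform versions.

```shell
# Hypothetical guard: read plain-text plan output on stdin and print any
# header line announcing a change to the "vCenter" level permissions or role.
# Exit status is 0 when such a line is found, 1 otherwise (grep semantics).
vcenter_level_changes() {
  grep -E '# (vsphere_entity_permissions\.vcenter-permissions|vsphere_role\.okd-sa-vcenter-role) (will be (destroyed|updated in-place)|must be replaced)'
}

# Usage:
# terraform plan -no-color | vcenter_level_changes && echo "STOP: vCenter level permissions affected"
```

A match is only a hint to read that part of the plan closely; it does not replace reviewing the full output.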