6 Commits

64 changed files with 550 additions and 1 deletions


@ -0,0 +1,12 @@
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: prometheus-storage
  namespace: observability
spec:
  storageClassName: slow-nfs-01
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi


@ -550,4 +550,141 @@ I have decided dump my old Jenkins architecture and rely on Skaffold, it's great
I will work on integrating it with Jenkins.
# EXTRA EXTRA
## Secondary NFS provisioner
I will add a **secondary NFS provisioner** as a new storage class.
This storage class will target a **"slow"/HDD** directory/drive.
It's mainly intended for storing a bunch of logs, files, videos, or whatever.
Looking at you, Prometheus 👀👀.

- NFS server: `nfs.filter.home`
- Target directory: **/resources/slow_nfs_provisioner** (this path is made up; I don't want to share the real one.)
## NFS Server
### Create the directory
- [x] Done
### Update the NFS server config so the directory can be exported
- [x] Done
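A minimal sketch of both NFS server steps. It assumes a Linux kernel NFS server that reads `/etc/exports`. The helper name `add_nfs_export`, the client range `192.168.1.0/24`, and the export options are my assumptions, not necessarily what the real server uses.

```shell
# Hypothetical helper for the NFS server side: create the export directory and
# append an /etc/exports entry for it, skipping the append if the exact entry exists.
# Usage: add_nfs_export DIR CLIENTS OPTIONS [EXPORTS_FILE]
add_nfs_export() {
  local dir="$1" clients="$2" opts="$3" exports="${4:-/etc/exports}"
  local entry="${dir} ${clients}(${opts})"

  mkdir -p "$dir"
  # -x: whole-line match, -F: fixed string, so re-running never duplicates the line
  if ! grep -qxF "$entry" "$exports" 2>/dev/null; then
    printf '%s\n' "$entry" >> "$exports"
  fi
}
```

On the server this would be called as `add_nfs_export /resources/slow_nfs_provisioner 192.168.1.0/24 rw,sync,no_subtree_check,no_root_squash`. Follow it with `exportfs -ra` to re-read the exports.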
## Deploy new NFS provisioner
```shell
NFS_SERVER=nfs.filter.home
NFS_EXPORT_PATH=/resources/slow_nfs_provisioner
```
```shell
helm -n nfs-provisioner install slow-nfs-01 nfs-subdir-external-provisioner/nfs-subdir-external-provisioner \
--set nfs.server=${NFS_SERVER} \
--set nfs.path=${NFS_EXPORT_PATH} \
--set storageClass.defaultClass=true \
--set replicaCount=2 \
--set storageClass.name=slow-nfs-01 \
--set storageClass.provisionerName=slow-nfs-01
```
```text
NAME: slow-nfs-provisioner-01
LAST DEPLOYED: Fri Jan 12 23:32:25 2024
NAMESPACE: nfs-provisioner
STATUS: deployed
REVISION: 1
TEST SUITE: None
```
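The install above passes `storageClass.defaultClass=true`, which marks `slow-nfs-01` as a default StorageClass. If `nfs-01` was already the default, the cluster now has two defaults. On recent Kubernetes versions, PVCs without a `storageClassName` then get the most recently created default, which might not be the class you want.

Below is a small sketch that lists which classes `kubectl get storageclass` flags as `(default)`. The helper name is hypothetical, and it only parses the table kubectl prints.

```shell
# Hypothetical helper: read `kubectl get storageclass` output on stdin and print
# the name of every class that kubectl marks with "(default)".
default_storage_classes() {
  # Skip the NAME header; a default class shows up as "<name> (default) ..."
  awk 'NR > 1 && $2 == "(default)" { print $1 }'
}
```

Usage: `kubectl get storageclass | default_storage_classes`. If it prints more than one name, you can clear the flag on the unwanted one with `kubectl patch storageclass <name> -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'`.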
## Migrate some volumes to new dir
### Prometheus
(because he's the one filling my SSD.)
Copy the files (preserving permissions) from **/resources/slow_nfs_provisioner/prometheus_generated_vol** to **/resources/slow_nfs_provisioner/prometheus_tmp**.
This is mainly to already "have them" on the destination drive; the folder name can be anything.
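One way to do this copy is `cp -a` (GNU coreutils archive mode), which keeps permissions, timestamps, and symlinks. The helper name below is hypothetical. Ownership is only preserved when it runs as root, which is usually the case on the NFS server.

```shell
# Hypothetical helper: copy the *contents* of SRC into DST, preserving mode,
# ownership (when run as root), timestamps and symlinks via `cp -a`.
# Usage: copy_preserving SRC DST
copy_preserving() {
  local src="$1" dst="$2"
  mkdir -p "$dst"
  # "SRC/." copies what is inside SRC rather than SRC itself
  cp -a "$src/." "$dst/"
}
```

Usage with the paths above: `copy_preserving /resources/slow_nfs_provisioner/prometheus_generated_vol /resources/slow_nfs_provisioner/prometheus_tmp`.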
### Create/Provision new PV
Since a PV's `path` value is immutable after creation, doing this properly would require creating a new volume, moving the contents into it, updating the configs to match the new volume, recreating the workloads, and then deleting the old one.
Since this is my homelab and I'm not bothered by a few minutes of lost logs, I will instead delete the old volume, delete the deployment using it, create a new volume, and then rename the `prometheus_tmp` folder I created in the previous step so it replaces the new volume's folder (the new volume is empty anyway).
Then redeploy Prometheus.
### Delete PVC
```shell
kubectl delete pvc -n observability prometheus-storage --force
```
This can take a while: the claim is still mounted by the Prometheus deployment, and Kubernetes won't finish deleting a PVC that a pod is still using (it also holds around 40GB of logs).
```shell
kubectl get pvc -n observability prometheus-storage
```
```text
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
prometheus-storage Terminating pvc-698cf837-14a3-43ee-990a-5a34e1a396de 1Gi RWX nfs-01 94d
```
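The claim stays in `Terminating` because of the `kubernetes.io/pvc-protection` finalizer. Kubernetes keeps a PVC around while any pod still references it, and `--force` does not bypass finalizers.

Here is a sketch to find which pods still reference the claim. The helper name is hypothetical, and it relies on `kubectl`'s JSONPath output.

```shell
# Hypothetical helper: print the pods in NAMESPACE whose spec mounts the PVC CLAIM.
# Usage: pods_using_pvc NAMESPACE CLAIM
pods_using_pvc() {
  local ns="$1" claim="$2"
  # One line per pod: "<pod> <claim> <claim> ..." (volumes that aren't PVCs print nothing)
  kubectl get pods -n "$ns" \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{range .spec.volumes[*]}{.persistentVolumeClaim.claimName}{" "}{end}{"\n"}{end}' |
    awk -v c="$claim" '{ for (i = 2; i <= NF; i++) if ($i == c) { print $1; break } }'
}
```

Running `pods_using_pvc observability prometheus-storage` should print the Prometheus pod until the deployment is deleted. Once it prints nothing, the PVC can finish terminating.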
### Delete Deployment
```shell
kubectl delete deployment -n observability prometheus
```
```text
deployment.apps "prometheus" deleted
```
### Delete PV
```shell
kubectl delete pv pvc-698cf837-14a3-43ee-990a-5a34e1a396de
```
```text
persistentvolume "pvc-698cf837-14a3-43ee-990a-5a34e1a396de" deleted
```
### Create new volume
```shell
kubectl create -f PrometheusVolume.yaml
```
```text
persistentvolumeclaim/prometheus-storage created
```
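Next, you need to know which folder `prometheus_tmp` has to replace. By default, nfs-subdir-external-provisioner names each volume's directory `${namespace}-${pvcName}-${pvName}` inside the export. This assumes no custom `pathPattern` is set on the storage class.

Below is a hypothetical helper that builds that name from the new PVC.

```shell
# Hypothetical helper: print the directory name nfs-subdir-external-provisioner
# is expected to use for a PVC, assuming the default ${namespace}-${pvcName}-${pvName}
# layout (no pathPattern parameter on the StorageClass).
# Usage: pvc_nfs_dir NAMESPACE PVC
pvc_nfs_dir() {
  local ns="$1" pvc="$2" pv
  # .spec.volumeName is filled in once the claim is bound to a PV
  pv="$(kubectl get pvc -n "$ns" "$pvc" -o jsonpath='{.spec.volumeName}')" || return 1
  [ -n "$pv" ] || return 1
  printf '%s-%s-%s\n' "$ns" "$pvc" "$pv"
}
```

First get `DIR=$(pvc_nfs_dir observability prometheus-storage)` from a machine with cluster access. The swap on the NFS server would then be roughly `rmdir "/resources/slow_nfs_provisioner/$DIR" && mv /resources/slow_nfs_provisioner/prometheus_tmp "/resources/slow_nfs_provisioner/$DIR"`. The `rmdir` only succeeds while the new folder is still empty, which acts as a small safety check.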
Later I cleaned up some of the existing data, since 41GB was kind of too much for my usage (I noticed the `prometheus-server` container was taking forever to parse all of it).
I will also change the configuration later to reduce the retention period and the amount of data stored.
### Redeployed Prometheus
It's been a while since I did the deployment.
```bash
kubectl get deployment -n observability prometheus
```
```text
NAME READY UP-TO-DATE AVAILABLE AGE
prometheus 1/1 1 1 3h24m
```
# Interesting
https://kubernetes.io/docs/concepts/storage/persistent-volumes/#cross-namespace-data-sources


@ -0,0 +1,392 @@
# Description
Very much what the title says.
0. Search.
1. Create Proxmox VM and install OS on it.
2. Install cluster thingies to the VM.
3. Backup Cluster/Master Node
4. Stop Old Master Node
5. Restore Cluster on New Master Node
6. Update New Master Node IP to Use the Old Master Node IP
7. Rejoin All Nodes to the "New Cluster"
# Notes
## Possible issues?
- The master node name might cause some discrepancies; I will need to test it.
- When the cluster is restored on the new master node, grant the new node access as a client on the NFS server.
## Virtual Master Hardware
- 2 CPU Cores
- 8 GB of RAM
# Procedure
- [x] VM Created
- [x] OS (Debian) installed
- [x] Edit the cluster setup Ansible installer script so it can stop right after installing the required packages.
- [x] Install the guest agent on all the VMs (I kinda forgot about that)
- [x] Backup VM
- [x] Follow the guide below
- [ ] Perform another backup to the control plane VM
# Links
I'm going to be following this:
https://serverfault.com/questions/1031093/migration-of-kubernetes-master-node-from-1-server-to-another-server
[//]: # ()
[//]: # (# Backup ETCD Kubernetes)
[//]: # ()
[//]: # (https://kubernetes.io/docs/tasks/administer-cluster/configure-upgrade-etcd/)
[//]: # ()
[//]: # ()
# Verify your etcd data directory
SSH into the masterk node.
```shell
kubectl get pods -n kube-system etcd-pi4.filter.home -oyaml | less
```
```yaml
...
volumeMounts:
- mountPath: /var/lib/etcd
  name: etcd-data
- mountPath: /etc/kubernetes/pki/etcd
  name: etcd-certs
...
volumes:
- hostPath:
    path: /etc/kubernetes/pki/etcd
    type: DirectoryOrCreate
  name: etcd-certs
- hostPath:
    path: /var/lib/etcd
    type: DirectoryOrCreate
  name: etcd-data
```
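The same information can be pulled out directly with a JSONPath filter instead of paging through the whole manifest. Here is a sketch, assuming the volume is named `etcd-data` as in the output above (the helper name is hypothetical).

```shell
# Hypothetical helper: print the hostPath backing the "etcd-data" volume of the
# static etcd pod running on NODE (kubeadm's mirror pod is named etcd-<node name>).
# Usage: etcd_data_dir NODE
etcd_data_dir() {
  local node="$1"
  kubectl get pod -n kube-system "etcd-${node}" \
    -o jsonpath='{.spec.volumes[?(@.name=="etcd-data")].hostPath.path}'
}
```

For this cluster, `etcd_data_dir pi4.filter.home` should print `/var/lib/etcd`.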
# Copy from old_master to new_master
> Why **bakup** instead of ba**ck**up? Because I want to use the K as Kubernetes.
## On new_master
```shell
mkdir bakup
```
## on OLD_master
```shell
sudo scp -r /etc/kubernetes/pki master2@192.168.1.173:~/bakup/
```
```console
healthcheck-client.key 100% 1679 577.0KB/s 00:00
server.crt 100% 1216 1.1MB/s 00:00
server.key 100% 1679 1.1MB/s 00:00
peer.crt 100% 1216 440.5KB/s 00:00
ca.crt 100% 1094 461.5KB/s 00:00
healthcheck-client.crt 100% 1159 417.8KB/s 00:00
ca.key 100% 1679 630.8KB/s 00:00
peer.key 100% 1679 576.4KB/s 00:00
front-proxy-client.crt 100% 1119 859.7KB/s 00:00
front-proxy-ca.key 100% 1679 672.4KB/s 00:00
ca.crt 100% 1107 386.8KB/s 00:00
sa.pub 100% 451 180.7KB/s 00:00
front-proxy-client.key 100% 1679 1.4MB/s 00:00
apiserver-etcd-client.key 100% 1675 1.3MB/s 00:00
apiserver.crt 100% 1294 819.1KB/s 00:00
ca.key 100% 1679 1.3MB/s 00:00
sa.key 100% 1679 1.5MB/s 00:00
apiserver-kubelet-client.crt 100% 1164 908.2KB/s 00:00
apiserver-kubelet-client.key 100% 1679 1.2MB/s 00:00
apiserver-etcd-client.crt 100% 1155 927.9KB/s 00:00
apiserver.key 100% 1675 1.4MB/s 00:00
front-proxy-ca.crt 100% 1123 939.7KB/s 00:00
```
## Remove the node-specific "OLD" certs from the backup
### on new_master
```shell
rm -v ~/bakup/pki/{apiserver.*,etcd/peer.*}
```
```console
removed '~/bakup/pki/apiserver.crt'
removed '~/bakup/pki/apiserver.key'
removed '~/bakup/pki/etcd/peer.crt'
removed '~/bakup/pki/etcd/peer.key'
```
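As far as I understand it, kubeadm issues `apiserver.crt` and `etcd/peer.crt` with the node's own hostname/IP as Subject Alternative Names. When they are missing, `kubeadm init` generates new ones signed by the existing CA.

Here is a hypothetical helper to check which names a certificate covers, using `openssl x509`.

```shell
# Hypothetical helper: print the Subject Alternative Names of a PEM certificate,
# e.g. to see which hostname/IP an old apiserver.crt was issued for.
# Usage: cert_sans CERT_FILE
cert_sans() {
  # openssl prints the SAN values on the line right after the extension header
  openssl x509 -in "$1" -noout -text |
    grep -A1 'X509v3 Subject Alternative Name' |
    tail -n 1 |
    sed 's/^[[:space:]]*//'
}
```

For example, run `cert_sans /etc/kubernetes/pki/apiserver.crt` on OLD_master before copying, and again on new_master after `kubeadm init`, to compare.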
## Copy the backed-up PKI into the Kubernetes directory (new_master)
```shell
cp -rv ~/bakup/pki /etc/kubernetes/
```
```console
'~/bakup/pki' -> '/etc/kubernetes/pki'
'~/bakup/pki/etcd' -> '/etc/kubernetes/pki/etcd'
'~/bakup/pki/etcd/healthcheck-client.key' -> '/etc/kubernetes/pki/etcd/healthcheck-client.key'
'~/bakup/pki/etcd/server.crt' -> '/etc/kubernetes/pki/etcd/server.crt'
'~/bakup/pki/etcd/server.key' -> '/etc/kubernetes/pki/etcd/server.key'
'~/bakup/pki/etcd/ca.crt' -> '/etc/kubernetes/pki/etcd/ca.crt'
'~/bakup/pki/etcd/healthcheck-client.crt' -> '/etc/kubernetes/pki/etcd/healthcheck-client.crt'
'~/bakup/pki/etcd/ca.key' -> '/etc/kubernetes/pki/etcd/ca.key'
'~/bakup/pki/front-proxy-client.crt' -> '/etc/kubernetes/pki/front-proxy-client.crt'
'~/bakup/pki/front-proxy-ca.key' -> '/etc/kubernetes/pki/front-proxy-ca.key'
'~/bakup/pki/ca.crt' -> '/etc/kubernetes/pki/ca.crt'
'~/bakup/pki/sa.pub' -> '/etc/kubernetes/pki/sa.pub'
'~/bakup/pki/front-proxy-client.key' -> '/etc/kubernetes/pki/front-proxy-client.key'
'~/bakup/pki/apiserver-etcd-client.key' -> '/etc/kubernetes/pki/apiserver-etcd-client.key'
'~/bakup/pki/ca.key' -> '/etc/kubernetes/pki/ca.key'
'~/bakup/pki/sa.key' -> '/etc/kubernetes/pki/sa.key'
'~/bakup/pki/apiserver-kubelet-client.crt' -> '/etc/kubernetes/pki/apiserver-kubelet-client.crt'
'~/bakup/pki/apiserver-kubelet-client.key' -> '/etc/kubernetes/pki/apiserver-kubelet-client.key'
'~/bakup/pki/apiserver-etcd-client.crt' -> '/etc/kubernetes/pki/apiserver-etcd-client.crt'
'~/bakup/pki/front-proxy-ca.crt' -> '/etc/kubernetes/pki/front-proxy-ca.crt'
```
## ETCD snapshot on OLD_master
### from Kubectl
Check the etcd API version.
```shell
kubectl exec -it etcd-pi4.filter.home -n kube-system -- etcdctl version
```
```console
etcdctl version: 3.5.10
API version: 3.5
```
### Create snapshot through etcd pod
```shell
kubectl exec -it etcd-pi4.filter.home -n kube-system -- etcdctl --endpoints https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key snapshot save /var/lib/etcd/snapshot1.db
```
```console
{"level":"info","ts":"2024-03-10T04:38:23.909625Z","caller":"snapshot/v3_snapshot.go:65","msg":"created temporary db file","path":"/var/lib/etcd/snapshot1.db.part"}
{"level":"info","ts":"2024-03-10T04:38:23.942816Z","logger":"client","caller":"v3@v3.5.10/maintenance.go:212","msg":"opened snapshot stream; downloading"}
{"level":"info","ts":"2024-03-10T04:38:23.942946Z","caller":"snapshot/v3_snapshot.go:73","msg":"fetching snapshot","endpoint":"https://127.0.0.1:2379"}
{"level":"info","ts":"2024-03-10T04:38:24.830242Z","logger":"client","caller":"v3@v3.5.10/maintenance.go:220","msg":"completed snapshot read; closing"}
{"level":"info","ts":"2024-03-10T04:38:25.395294Z","caller":"snapshot/v3_snapshot.go:88","msg":"fetched snapshot","endpoint":"https://127.0.0.1:2379","size":"19 MB","took":"1 second ago"}
{"level":"info","ts":"2024-03-10T04:38:25.395687Z","caller":"snapshot/v3_snapshot.go:97","msg":"saved","path":"/var/lib/etcd/snapshot1.db"}
Snapshot saved at /var/lib/etcd/snapshot1.db
```
### Transfer snapshot to the new_master node
### on the OLD_master
```shell
scp /var/lib/etcd/snapshot1.db master2@192.168.1.173:~/bakup
```
```text
snapshot1.db 100% 19MB 44.0MB/s 00:00
```
### Update kubeadm-config
### on the OLD_master
```shell
kubectl get cm -n kube-system kubeadm-config -oyaml
```
```text
apiVersion: v1
data:
  ClusterConfiguration: |
    apiServer:
      extraArgs:
        authorization-mode: Node,RBAC
      timeoutForControlPlane: 4m0s
    apiVersion: kubeadm.k8s.io/v1beta3
    certificatesDir: /etc/kubernetes/pki
    clusterName: kubernetes
    controllerManager: {}
    dns: {}
    etcd:
      local:
        dataDir: /var/lib/etcd
    imageRepository: registry.k8s.io
    kind: ClusterConfiguration
    kubernetesVersion: v1.28.7
    networking:
      dnsDomain: cluster.local
      serviceSubnet: 10.96.0.0/12
    scheduler: {}
kind: ConfigMap
metadata:
  creationTimestamp: "2024-02-22T21:45:42Z"
  name: kubeadm-config
  namespace: kube-system
  resourceVersion: "234"
  uid: c56b87b1-691d-4277-b66c-ab6035cead6a
```
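Rather than copying the `ClusterConfiguration` out of the ConfigMap by hand, `kubectl` can print just that key. Below is a sketch (hypothetical helper name) to run on OLD_master; the output can serve as a starting point for new_master.

```shell
# Hypothetical helper: print only the ClusterConfiguration document stored in the
# kube-system/kubeadm-config ConfigMap (kubeadm keeps it under .data.ClusterConfiguration).
dump_cluster_configuration() {
  kubectl get configmap -n kube-system kubeadm-config \
    -o jsonpath='{.data.ClusterConfiguration}'
}
```

Usage: `dump_cluster_configuration > cluster-configuration.yaml`, then `scp` it to new_master next to `kubeadm-config.yaml`.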
### on the new_master
#### Create kubeadm-config.yaml
```shell
touch kubeadm-config.yaml
```
I used the information from the ConfigMap shown above to create the following file (basically filling in the default kubeadm config file):
Note that the token used differs.
```yaml
apiVersion: kubeadm.k8s.io/v1beta3
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.abcdef0123456789
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.1.9
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///var/run/containerd/containerd.sock
  imagePullPolicy: IfNotPresent
  name: masterk
  taints: null
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta3
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns: {}
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.k8s.io
kind: ClusterConfiguration
kubernetesVersion: 1.29.0
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
scheduler: {}
```
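The "default file" can be generated with `kubeadm config print init-defaults`, which prints an InitConfiguration plus a ClusterConfiguration with placeholder values. Below is a sketch that swaps in this node's address and name. It assumes the placeholders are `advertiseAddress: 1.2.3.4` and `name: node`, which is what I remember kubeadm printing; check your version's output first.

The helper name is hypothetical. The token and `kubernetesVersion` still need a manual look.

```shell
# Hypothetical filter: read `kubeadm config print init-defaults` on stdin and replace
# the placeholder advertise address and node name (assumed to be 1.2.3.4 and "node").
# Usage: kubeadm config print init-defaults | fill_init_defaults ADDRESS NODE_NAME
fill_init_defaults() {
  local addr="$1" name="$2"
  sed -e "s/advertiseAddress: 1\.2\.3\.4\$/advertiseAddress: ${addr}/" \
      -e "s/^\([[:space:]]*\)name: node\$/\1name: ${name}/"
}
```

Usage: `kubeadm config print init-defaults | fill_init_defaults 192.168.1.9 masterk > kubeadm-config.yaml`.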
### Install etcdctl
https://github.com/etcd-io/etcd/releases/tag/v3.5.12
### Restore from snapshot into new_master
This time I will be using the `etcdctl` cli tool.
```shell
mkdir /var/lib/etcd
```
```shell
ETCDCTL_API=3 /tmp/etcd-download-test/etcdctl --endpoints https://127.0.0.1:2379 snapshot restore './bakup/snapshot1.db' && mv ./default.etcd/member/ /var/lib/etcd/
```
```console
Deprecated: Use `etcdutl snapshot restore` instead.
2024-03-10T06:09:17+01:00 info snapshot/v3_snapshot.go:260 restoring snapshot {"path": "./bakup/snapshot1.db", "wal-dir": "default.etcd/member/wal", "data-dir": "default.etcd", "snap-dir": "default.etcd/member/snap"}
2024-03-10T06:09:17+01:00 info membership/store.go:141 Trimming membership information from the backend...
2024-03-10T06:09:18+01:00 info membership/cluster.go:421 added member {"cluster-id": "cdf818194e3a8c32", "local-member-id": "0", "added-peer-id": "8e9e05c52164694d", "added-peer-peer-urls": ["http://localhost:2380"]}
2024-03-10T06:09:18+01:00 info snapshot/v3_snapshot.go:287 restored snapshot {"path": "./bakup/snapshot1.db", "wal-dir": "default.etcd/member/wal", "data-dir": "default.etcd", "snap-dir": "default.etcd/member/snap"}
```
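The log above says `etcdctl snapshot restore` is deprecated in favour of `etcdutl snapshot restore`. The v3.5 release tarball ships `etcdutl` next to `etcdctl`.

Below is a sketch of the same step with `etcdutl`. It restores into a scratch directory first and then moves `member/` into place, just like above. The helper name is hypothetical. Restore works offline, so no `--endpoints` is needed.

```shell
# Hypothetical helper: restore an etcd snapshot with etcdutl and move the resulting
# member/ directory into DATA_DIR (default /var/lib/etcd), as done manually above.
# Usage: restore_etcd_snapshot SNAPSHOT_FILE [DATA_DIR]
restore_etcd_snapshot() {
  local snapshot="$1" data_dir="${2:-/var/lib/etcd}" scratch
  scratch="$(mktemp -d)" || return 1
  # Restore into a directory that does not exist yet so etcdutl can create it
  if ! etcdutl snapshot restore "$snapshot" --data-dir "$scratch/restored"; then
    rm -rf "$scratch"
    return 1
  fi
  mkdir -p "$data_dir"
  mv "$scratch/restored/member" "$data_dir/"
  rm -rf "$scratch"
}
```

With `/tmp/etcd-download-test` on the `PATH`, this would be `restore_etcd_snapshot ./bakup/snapshot1.db`.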
### Do shenanigans to replace the OLD_node with the new_node
Aka the IP replacement maneuvers.
### Start new node
```shell
kubeadm init --ignore-preflight-errors=DirAvailable--var-lib-etcd --config kubeadm-config.yaml
```
```console
kubeadm init --ignore-preflight-errors=DirAvailable--var-lib-etcd --config kubeadm-config.yaml
[init] Using Kubernetes version: v1.29.0
[preflight] Running pre-flight checks
[WARNING DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
W0310 06:42:10.268972 1600 checks.go:835] detected that the sandbox image "registry.k8s.io/pause:3.6" of the container runtime is inconsistent with that used by kubeadm. It is recommended that using "registry.k8s.io/pause:3.9" as the CRI sandbox image.
[certs] Using certificateDir folder "/etc/kubernetes/pki"
```
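The `pause:3.6` warning comes from containerd's CRI `sandbox_image` setting, which kubeadm compares against the pause image it expects (`pause:3.9` here). Below is a sketch to align them. It assumes containerd 1.x with a `/etc/containerd/config.toml` generated by `containerd config default`, where the key lives under the CRI plugin section.

The helper name is hypothetical. containerd has to be restarted afterwards.

```shell
# Hypothetical helper: set containerd's CRI sandbox (pause) image in config.toml.
# Assumes containerd 1.x, where the line looks like: sandbox_image = "registry.k8s.io/pause:3.6"
# Usage: set_sandbox_image IMAGE [CONFIG_FILE]
set_sandbox_image() {
  local image="$1" cfg="${2:-/etc/containerd/config.toml}"
  # Bail out if the key isn't there (e.g. a different containerd config layout)
  grep -q '^[[:space:]]*sandbox_image = ' "$cfg" || return 1
  # Keep the original indentation, replace only the quoted value
  sed -i "s|^\([[:space:]]*sandbox_image = \)\".*\"|\1\"${image}\"|" "$cfg"
}
```

Usage: `set_sandbox_image registry.k8s.io/pause:3.9 && systemctl restart containerd`.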
## Join "old nodes" into the "new masterk"
To my surprise, I didn't need to rejoin the nodes; I only had to remove the old control plane node.
```shell
kubectl get nodes
```
```console
NAME STATUS ROLES AGE VERSION
masterk.filter.home Ready control-plane 4m59s v1.29.2
pi4.filter.home NotReady control-plane 16d v1.29.2
slave01.filter.home Ready <none> 10d v1.29.2
slave02.filter.home Ready <none> 16d v1.29.2
slave03.filter.home Ready <none> 16d v1.29.2
slave04.filter.home Ready <none> 16d v1.29.2
```
```shell
kubectl delete node pi4.filter.home
```
```console
node "pi4.filter.home" deleted
```
```shell
kubectl get nodes
```
```console
NAME STATUS ROLES AGE VERSION
masterk.filter.home Ready control-plane 5m20s v1.29.2
slave01.filter.home Ready <none> 10d v1.29.2
slave02.filter.home Ready <none> 16d v1.29.2
slave03.filter.home Ready <none> 16d v1.29.2
slave04.filter.home Ready <none> 16d v1.29.2
```
So, pretty much done. Since I didn't need to rejoin the nodes, I will be paying extra attention to them for a while.
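For keeping an eye on the nodes, here is a small sketch (hypothetical helper) that prints every node whose STATUS column is not exactly `Ready`. Cordoned nodes (`Ready,SchedulingDisabled`) show up too.

```shell
# Hypothetical helper: list nodes whose STATUS is anything other than "Ready"
# (NotReady, Unknown, Ready,SchedulingDisabled, ...).
not_ready_nodes() {
  kubectl get nodes --no-headers | awk '$2 != "Ready" { print $1 }'
}
```

It could run periodically, for instance from a small cron script. Empty output means every node reports `Ready`.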


@ -93,6 +93,7 @@ slave04.filter.home 1 0
- Jenkins master + dynamic agent(s)
- Docker Registry
- Skaffold (Client/User side, not running on the Kubernetes cluster, yet relies on it to create multiarch docker images)
### Git servers
@ -101,7 +102,14 @@ slave04.filter.home 1 0
### Media related
- Tube
- Firebrowser
- Fireshare
- Filebrowser
- Jellyfin
- qBitTorrent
## Downsides of my current setup
- Only 1 Kubernetes master node, therefore no full High Availability
- Only 1 NFS server / no HA NFS server, therefore if the NFS server goes down, most of the services on the Kubernetes cluster will go down too, since they depend on it