Compare commits: base_setup ... b5508eab97 (6 commits: b5508eab97, 5f3c3b0e91, b367990028, 4606dd3cf5, 96fd561257, 950137040f)

Migrations/Say_HI_to_Proxmox/README.md (new file, 553 lines)
@@ -0,0 +1,553 @@

- This time I won't be doing a "walkthrough" of the process, but instead a progress list.

The plan is to replace the `srv` server, currently used as a standalone Docker/NFS server, with a Proxmox instance, as it would allow some more flexibility.

My current requirements are:

- I need an NFS server (Proxmox can do that)
- A Jenkins agent

## NFS

While I configure the NFS entries, the Kubernetes services will be down.

## Jenkins

The idea is to replace Jenkins with ArgoCD eventually, so for the moment it will be a 🤷

## Core Services

They will be moved to the Kubernetes cluster.

### Jellyfin

Will need to wait until:

- NFS is set up
- A Kubernetes worker node is set up (so far only the ARM64 arch is set up)

### Home DHCP

I'm so good that I was already building a DHCP image for both `amd64` and `arm64`.

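For the record, a minimal sketch of such a multi-arch build with `docker buildx` (the image name/registry are hypothetical, and a buildx builder with QEMU emulation is assumed to be available):

```shell
# Hypothetical image name/registry; assumes a buildx builder with QEMU emulation.
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  -t registry.example.com/home-dhcp:latest \
  --push .
```
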
### Registry

- Wait until NFS is set up

### Tube

- Wait until NFS is set up
- A Kubernetes worker node is set up (so far only the ARM64 arch is set up)

### QBitTorrent

- Wait until NFS is set up

### CoreDNS

- Will be deleted.

### Gitea

- Wait until NFS is set up

## Extra notes

Could create a new NFS pool for media-related data, especially since some data could be stored on an HDD and other data on an SSD.

# Steps

## Make the DHCP server work in/from the Kubernetes cluster

- [x] Done (sketch below)

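Since DHCP relies on L2 broadcasts, running it from the cluster typically means host networking; a minimal sketch under that assumption (the name and image are hypothetical, not the actual manifest):

```shell
# A sketch only; name/image are hypothetical.
# DHCP relies on L2 broadcasts, so the pod runs with the host's network stack.
cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: home-dhcp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: home-dhcp
  template:
    metadata:
      labels:
        app: home-dhcp
    spec:
      hostNetwork: true                # bind directly to the node's interfaces
      containers:
      - name: dhcpd
        image: registry.example.com/home-dhcp:latest   # hypothetical image
EOF
```
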
## Confirm how I can create an NFS server in Proxmox

https://www.reddit.com/r/Proxmox/comments/nnkt52/proxmox_host_as_nfs_server_or_guest_container_as/

https://forum.level1techs.com/t/how-to-create-a-nas-using-zfs-and-proxmox-with-pictures/117375

## Reorganize the local Network distribution/update the DHCP server

- [x] Done

## Update the DHCP server with the new arrangement

- [x] Ready
- [x] Done

## Update the DNS server with the new arrangement

- [x] Ready
- [x] Done

## Delete External service points for the Klussy deployments

- [x] Done

## Install Proxmox

- [x] Done

## Install NFS service on the Proxmox host

- [x] Done (sketch below)

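Presumably along these lines (a minimal sketch; `nfs-kernel-server` is the standard Debian package, and Proxmox is Debian-based):

```shell
# On the Proxmox host; a minimal sketch, not necessarily the exact commands used.
apt update && apt install -y nfs-kernel-server
systemctl enable --now nfs-server
```
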
## Configure NFS mount vols on the NFS server

- [x] Done (sketch below)

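A hedged sketch of what the exports can look like (the paths and subnet are hypothetical, not my actual layout):

```shell
# Hypothetical export paths/subnet; append the exports and reload them.
cat <<'EOF' >> /etc/exports
/srv/nfs/media   192.168.1.0/24(rw,sync,no_subtree_check)
/srv/nfs/backups 192.168.1.0/24(rw,sync,no_subtree_check,no_root_squash)
EOF
exportfs -ra   # re-export everything in /etc/exports
```
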
## Move directories from the old NFS server to the new one

- [x] Done (sketch below)

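A minimal sketch of the copy, assuming hypothetical host and paths:

```shell
# Hypothetical host/paths; -a preserves permissions/ownership, -H preserves hard links.
rsync -aH --info=progress2 root@old-nfs:/srv/nfs/ /srv/nfs/
```
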
## Configure NFS mount vols on the klussy cluster to match the new NFS server

- [x] Done (sketch below)

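A minimal sketch of one static NFS `PersistentVolume` pointing at the new server (the server address, path, and size are hypothetical):

```shell
# Hypothetical server/path/size; one static NFS PersistentVolume for the cluster.
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: PersistentVolume
metadata:
  name: media-nfs
spec:
  capacity:
    storage: 500Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: 192.168.1.61
    path: /srv/nfs/media
EOF
```
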
## Deploy "old" external services (if possible) + their NFS mounts
|
||||
|
||||
- [x] Gitea
|
||||
- [x] Tube (older version)
|
||||
- [x] Registry # Maybe replace Registry for Harbor in the future
|
||||
|
||||
https://ruzickap.github.io/k8s-harbor/part-04/#install-harbor-using-helm
|
||||
|
||||
## Deploy new slave node on the Proxmox server

- [x] Done

## Update the cluster to the latest version, because it's about time

Made this Ansible script:

- https://gitea.filterhome.xyz/ofilter/ansible_update_cluster

- [x] Done (sketch below)

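For reference, a hedged sketch of running it (the playbook and inventory file names are hypothetical, check the repo for the actual ones):

```shell
# Hypothetical file names; clone the playbook and run it against the cluster inventory.
git clone https://gitea.filterhome.xyz/ofilter/ansible_update_cluster.git
cd ansible_update_cluster
ansible-playbook -i inventory.yaml main.yaml
```
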
## Deploy remaining services + their NFS mounts

- [x] Jellyfin
- [x] QBitTorrent
- [x] Filebrowser

## [EXTRA] Deploy new slave node on the Proxmox server (slave04)

Decided to add ANOTHER VM as a slave to allow some flexibility between x64 nodes.

- [x] Created the VM and installed the OS
- [x] Set up GPU passthrough for the newly created VM (see the sketch below)
- [x] Created a Kubernetes node
- [x] Done

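A hedged sketch of the passthrough wiring on the Proxmox side (the VM ID and PCI address are hypothetical):

```shell
# On the Proxmox host; VM ID (104) and GPU PCI address are hypothetical.
# Assumes IOMMU is enabled in BIOS/bootloader and the VM uses the q35 machine type.
qm set 104 --hostpci0 0000:01:00.0,pcie=1,x-vga=1
```
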
## Set up the GPU available in the Kubernetes Node

Very much what the title says. Steps below.

- [x] Done

### Install nvidia drivers

> **Note:**
> - Steps were performed in the VM instance (Slave04). \
> - Snapshots were performed on the Proxmox node, taking a snapshot of the affected VM. \
> - `kubectl` command(s) were performed on a computer of mine, external to the Kubernetes cluster/nodes, to interact with the cluster.

#### Take snapshot

- [x] Done

#### Repo thingies

Enable the `non-free` repo for Debian.

(aka, however you usually do that)

`non-free` and `non-free-firmware` are different things, so if `non-free-firmware` is already listed but `non-free` isn't, slap it in, plus `contrib`.

```md
FROM:
deb http://ftp.au.debian.org/debian/ buster main
TO:
deb http://ftp.au.debian.org/debian/ buster main non-free contrib
```

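A hedged one-liner equivalent, assuming the stock lines of `/etc/apt/sources.list` end in `main`:

```shell
# Assumes the repo lines end in 'main'; keeps a backup of the original file.
sed -i.bak 's/ main$/ main contrib non-free/' /etc/apt/sources.list
```
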
In my case, that was enabled during the installation.

Once the repos are set up, run:

```shell
apt update && apt install nvidia-detect -y
```

##### [Error] Unable to locate package nvidia-detect

Ensure both `non-free` and `contrib` are in the repo file.

(File: `/etc/apt/sources.list`)

#### Run nvidia-detect

```shell
nvidia-detect
```

```text
Detected NVIDIA GPUs:
00:10.0 VGA compatible controller [0300]: NVIDIA Corporation GM206 [GeForce GTX 960] [10de:1401] (rev a1)

Checking card:  NVIDIA Corporation GM206 [GeForce GTX 960] (rev a1)
Your card is supported by all driver versions.
Your card is also supported by the Tesla drivers series.
Your card is also supported by the Tesla 470 drivers series.
It is recommended to install the
    nvidia-driver
package.
```

### Install nvidia driver

```shell
apt install nvidia-driver
```

We might receive a complaint regarding "conflicting modules".

Just restart the VM.

#### Reboot VM

```shell
reboot
```

#### nvidia-smi

Confirm the VM has access to the NVIDIA driver/GPU:

```shell
nvidia-smi
```

```text
Fri Dec 15 00:00:36 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05    Driver Version: 525.147.05    CUDA Version: 12.0   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:00:10.0 Off |                  N/A |
|  0%   38C    P8    11W / 160W |      1MiB /  4096MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```

### Install Nvidia Container Runtime

#### Take snapshot

- [x] Done

#### Install curl

```shell
apt-get install curl
```

#### Add repo

https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#installing-with-apt

```shell
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
```

```shell
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
```

### Update Containerd config

#### Select nvidia-container-runtime as the new runtime for Containerd

> No clue if this is a requirement, as afterward I also made more changes to the configuration.

```shell
sudo sed -i 's/runtime = "runc"/runtime = "nvidia-container-runtime"/g' /etc/containerd/config.toml
```

#### Restart the Containerd service

```shell
sudo systemctl restart containerd
```

#### Check Containerd status

Check that Containerd has initialized correctly after restarting the service.

```shell
sudo systemctl status containerd
```

### Test nvidia runtime

#### Pull nvidia cuda image

I used the Ubuntu-based image since I didn't find one specific to Debian.

```shell
sudo ctr images pull docker.io/nvidia/cuda:12.3.1-base-ubuntu20.04
```

```text
docker.io/nvidia/cuda:12.3.1-base-ubuntu20.04: resolved |++++++++++++++++++++++++++++++++++++++|
index-sha256:0654b44e2515f03b811496d0e2d67e9e2b81ca1f6ed225361bb3e3bb67d22e18: done |++++++++++++++++++++++++++++++++++++++|
manifest-sha256:7d8fdd2a5e96ec57bc511cda1fc749f63a70e207614b3485197fd734359937e7: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:25ad149ed3cff49ddb57ceb4418377f63c897198de1f9de7a24506397822de3e: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:1698c67699a3eee2a8fc185093664034bb69ab67c545ab6d976399d5500b2f44: done |++++++++++++++++++++++++++++++++++++++|
config-sha256:d13839a3c4fbd332f324c135a279e14c432e90c8a03a9cedc43ddf3858f882a7: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:ba7b66a9df40b8a1c1a41d58d7c3beaf33a50dc842190cd6a2b66e6f44c3b57b: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:c5f2ffd06d8b1667c198d4f9a780b55c86065341328ab4f59d60dc996ccd5817: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:520797292d9250932259d95f471bef1f97712030c1d364f3f297260e5fee1de8: done |++++++++++++++++++++++++++++++++++++++|
elapsed: 4.2 s
```

#### Start container

Confirm Containerd already has access to the NVIDIA GPU/drivers:

```shell
sudo ctr run --rm --gpus 0 docker.io/nvidia/cuda:12.3.1-base-ubuntu20.04 nvidia-smi nvidia-smi
```

```text
Thu Dec 14 23:18:55 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05    Driver Version: 525.147.05    CUDA Version: 12.3   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:00:10.0 Off |                  N/A |
|  0%   41C    P8    11W / 160W |      1MiB /  4096MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```

### Set the GPU available in the Kubernetes Node

We **still** don't have the GPU added/available in the node:

```shell
kubectl describe nodes | tr -d '\000' | sed -n -e '/^Name/,/Roles/p' -e '/^Capacity/,/Allocatable/p' -e '/^Allocated resources/,/Events/p' | grep -e Name -e nvidia.com | perl -pe 's/\n//' | perl -pe 's/Name:/\n/g' | sed 's/nvidia.com\/gpu:\?//g' | sed '1s/^/Node Available(GPUs) Used(GPUs)/' | sed 's/$/ 0 0 0/' | awk '{print $1, $2, $3}' | column -t
```

```text
Node                 Available(GPUs)  Used(GPUs)
pi4.filter.home      0                0
slave01.filter.home  0                0
slave02.filter.home  0                0
slave03.filter.home  0                0
slave04.filter.home  0                0
```

#### Update the Containerd config

Set the Containerd config with the following settings.

Obviously, make a backup of the config before proceeding to modify the file.

```toml
# /etc/containerd/config.toml
version = 2
[plugins]
  [plugins."io.containerd.grpc.v1.cri"]
    [plugins."io.containerd.grpc.v1.cri".containerd]
      default_runtime_name = "nvidia"

      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
          privileged_without_host_devices = false
          runtime_engine = ""
          runtime_root = ""
          runtime_type = "io.containerd.runc.v2"
          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
            BinaryName = "/usr/bin/nvidia-container-runtime"
```

#### Restart containerd (again)

```shell
sudo systemctl restart containerd
```

#### Check Containerd status

Check that Containerd has initialized correctly after restarting the service.

```shell
sudo systemctl status containerd
```

#### Set some labels to avoid spread

We will deploy the Nvidia CRDs, so we tag the Kubernetes nodes that **won't** have a GPU available, to avoid running GPU-related workloads on them.

```shell
kubectl label nodes slave0{1..3}.filter.home nvidia.com/gpu.deploy.operands=false
```

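A quick way to double-check the label landed where intended:

```shell
# -L prints the given label as an extra column for every node.
kubectl get nodes -L nvidia.com/gpu.deploy.operands
```
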
#### Deploy nvidia operators

"Why these `--set` flags?"

- Because that's what worked out for me. Don't like it? Want to explore? Just try whichever combination works for you.

```shell
# Assumes the NVIDIA Helm repo is already added, i.e.:
#   helm repo add nvidia https://helm.ngc.nvidia.com/nvidia && helm repo update
# driver/toolkit are disabled because both were installed manually on the node above.
helm install --wait --generate-name \
    nvidia/gpu-operator \
    --set operator.defaultRuntime="containerd" \
    -n gpu-operator \
    --set driver.enabled=false \
    --set toolkit.enabled=false
```

### Check running pods

Check that all the pods are running (or have completed):

```shell
kubectl get pods -n gpu-operator -owide
```

```text
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
gpu-feature-discovery-4nctr 1/1 Running 0 9m34s 172.16.241.67 slave04.filter.home <none> <none>
gpu-operator-1702608759-node-feature-discovery-gc-79d6bb94h6fht 1/1 Running 0 9m57s 172.16.176.63 slave03.filter.home <none> <none>
gpu-operator-1702608759-node-feature-discovery-master-64c5nwww4 1/1 Running 0 9m57s 172.16.86.110 pi4.filter.home <none> <none>
gpu-operator-1702608759-node-feature-discovery-worker-72wqk 1/1 Running 0 9m57s 172.16.106.5 slave02.filter.home <none> <none>
gpu-operator-1702608759-node-feature-discovery-worker-7snt4 1/1 Running 0 9m57s 172.16.86.111 pi4.filter.home <none> <none>
gpu-operator-1702608759-node-feature-discovery-worker-9ngnw 1/1 Running 0 9m56s 172.16.176.5 slave03.filter.home <none> <none>
gpu-operator-1702608759-node-feature-discovery-worker-csnfq 1/1 Running 0 9m56s 172.16.241.123 slave04.filter.home <none> <none>
gpu-operator-1702608759-node-feature-discovery-worker-k6dxf 1/1 Running 0 9m57s 172.16.247.8 slave01.filter.home <none> <none>
gpu-operator-fcbd9bbd7-fv5kb 1/1 Running 0 9m57s 172.16.86.116 pi4.filter.home <none> <none>
nvidia-cuda-validator-xjfkr 0/1 Completed 0 5m37s 172.16.241.126 slave04.filter.home <none> <none>
nvidia-dcgm-exporter-q8kk4 1/1 Running 0 9m35s 172.16.241.125 slave04.filter.home <none> <none>
nvidia-device-plugin-daemonset-vvz4c 1/1 Running 0 9m35s 172.16.241.127 slave04.filter.home <none> <none>
nvidia-operator-validator-8899m 1/1 Running 0 9m35s 172.16.241.124 slave04.filter.home <none> <none>
```

### Done!

```shell
kubectl describe nodes | tr -d '\000' | sed -n -e '/^Name/,/Roles/p' -e '/^Capacity/,/Allocatable/p' -e '/^Allocated resources/,/Events/p' | grep -e Name -e nvidia.com | perl -pe 's/\n//' | perl -pe 's/Name:/\n/g' | sed 's/nvidia.com\/gpu:\?//g' | sed '1s/^/Node Available(GPUs) Used(GPUs)/' | sed 's/$/ 0 0 0/' | awk '{print $1, $2, $3}' | column -t
```

```text
Node                 Available(GPUs)  Used(GPUs)
pi4.filter.home      0                0
slave01.filter.home  0                0
slave02.filter.home  0                0
slave03.filter.home  0                0
slave04.filter.home  1                0
```

### vGPU

I could use vGPU and split my GPU among multiple VMs, but it would also mean that the GPU no longer posts to the physical monitor attached to the Proxmox PC/server, which I would like to avoid.

While it's certainly not a requirement (I only use the monitor in emergencies, or whenever I need to touch the BIOS or install a new OS), I **still** don't own a serial connector, so I will consider making the change to vGPU **in the future** (whenever I receive the package from AliExpress and confirm it works).

[//]: # (```shell)
[//]: # (kubectl events pods --field-selector status.phase!=Running -n gpu-operator)
[//]: # (```)

[//]: # ()
[//]: # (```shell)
[//]: # (kubectl get pods --field-selector status.phase!=Running -n gpu-operator | awk '{print $1}' | tail -n +2 | xargs kubectl events -n gpu-operator pods)
[//]: # (```)

## Jellyfin GPU Acceleration

- [x] Configured Jellyfin with GPU acceleration (a sketch of the relevant bit is below)

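A hedged sketch of the Kubernetes side of this: handing the Jellyfin pod one GPU via the device plugin (the deployment and namespace names are hypothetical, not my actual manifests):

```shell
# Hypothetical names: deployment "jellyfin" in namespace "media".
# A strategic-merge patch that requests one GPU from the NVIDIA device plugin.
kubectl -n media patch deployment jellyfin --patch '
spec:
  template:
    spec:
      containers:
      - name: jellyfin
        resources:
          limits:
            nvidia.com/gpu: 1
'
```
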
## Make Cluster HA

- [ ] Done
- [x] Aborted

Since it would mostly require recreating the cluster, I would first want the DNS/DHCP services externalized from the cluster, or a load balancer external to the cluster, etc. etc.

So, I'd rather have a cluster with 2 points of failure:

- Single control plane
- No HA NFS/NAS

than have an Ouroboros for a cluster.

I also just thought of having a DNS failover, but that's not the current case.

## Update the rest of the stuff/configs as required to match the new network distribution

Which stuff?

IDK, this is an open item in case I'm forgetting something.

- [x] Done, aka everything seems to be running correctly

## Migrate Jenkins

https://devopscube.com/jenkins-build-agents-kubernetes/

https://www.jenkins.io/doc/book/installing/kubernetes/

- [x] Done

## Skaffold

- Learned to use Skaffold, though it requires manual execution.

- It's great tho.

https://skaffold.dev/docs/references/yaml/

https://skaffold.dev/docs/builders/cross-platform/

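A hedged example invocation (the registry is hypothetical); the `--platform` flag is what triggers the cross-platform builds for the mixed-arch nodes:

```shell
# Hypothetical registry; builds and deploys the skaffold.yaml pipeline once,
# cross-building images for both node architectures in the cluster.
skaffold run --platform=linux/amd64,linux/arm64 --default-repo=registry.example.com
```
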
## CI/CD Container creation

I have decided to dump my old Jenkins architecture and rely on Skaffold; it's great.

I will work on integrating it with Jenkins.

Migrations/Say_HI_to_Proxmox/dhcp_notes.md (new file, 25 lines)
@@ -0,0 +1,25 @@

# Initial notes

```
.1       Gateway

.2/3     DHCP-DNS

6-9      Kubernetes masters.
10-15    Kubernetes slaves.

20       Public Ingress
21       Local Ingress
22-38    Kubernetes LBs/Deployments/Services
39       Egress gateway

50-60    Standalone Hosts
61-70    Proxmox

100-120  VMs

140-149  Handpicked client hosts

150-200  DHCP range

250-255  Wifi and stuff
```

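A hedged sketch of how this plan could map onto isc-dhcp config (illustrative only; the subnet, file path, and addresses are assumptions, not my actual config):

```shell
# Illustrative isc-dhcp snippet for the plan above; adjust subnet/path to taste.
cat <<'EOF' >> /etc/dhcp/dhcpd.conf
subnet 192.168.1.0 netmask 255.255.255.0 {
  option routers 192.168.1.1;             # .1      Gateway
  option domain-name-servers 192.168.1.2; # .2      DHCP-DNS
  range 192.168.1.150 192.168.1.200;      # 150-200 DHCP range
}
EOF
```
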
README.md (169 lines)
@@ -3,7 +3,21 @@ gitea: none

include_toc: true
---

## Older patch notes/versions

Select different tags.

## TLDR Changelog

- Replaced the old standalone Docker/NFS server with a Proxmox/NFS instance.

- Added 2 VMs as worker nodes to the cluster; they are intended for x64 images.

- One of the newly added worker VMs receives a GPU through Proxmox PCI passthrough.

- Some services might have been removed or added.

# Devices

## List of current devices:

@@ -11,122 +25,83 @@ include_toc: true

```yaml
Gateway: 192.168.1.1
Pi4: 192.168.1.2
Srv: 192.168.1.3
Proxmox/NFS: somewhere.
```

### Kluster

> Kubernetes Cluster

- Pi 4 with 4GB running as a master (Masterk/Pi4).

- A pair of Orange PI 5, so far both of them the 8GB of RAM version (Slave01-2).

- Proxmox VMs, both with 3 CPU cores and 8GB of RAM (Slave03-4).

- `Slave04` contains a GPU through Proxmox PCI passthrough.

```yaml
Masterk: 192.168.1.9
Slave01: 192.168.1.10
Slave02: 192.168.1.11
Slave03: 192.168.1.12
Slave04: 192.168.1.13
```

## Which services are running where

```text
Node                 Available(GPUs)  Used(GPUs)
pi4.filter.home      0                0
slave01.filter.home  0                0
slave02.filter.home  0                0
slave03.filter.home  0                0
slave04.filter.home  1                0
```

> **Note**:
> `Deprecated` doesn't mean that the service has been obliterated, but that it is no longer run on that specific node/instance.

## Which services I'm hosting

### Pi4 (main reverse proxy)

### Home Network

> Initially the Pi4 would only contain lightweight services, performing "core" functions on the network, as well as providing access to some very specific web services that wouldn't incur much load (such as DNS, DHCP, Gitea, the DuckDNS IP updater, and `Tube` + `Traefik` as the main reverse proxy for the network).

- CoreDNS
- DHCPd

Services run on `docker` / `docker-compose`.

### Discord Bots

#### Containers

- Traefik
- Gitea
- Portainer
- Registry
- containrrr/watchtower
- https://gitea.filterhome.xyz/ofilter/Steam_Invite_Discord (both Master and Dev branches)
- Shlink + ShlinkUI (deployed as it has functionality with the Steam Discord Bot from above)

##### Monitoring

### Public DNS

- grafana
- prometheus
- alert manager
- zcube/cadvisor

##### Home Network

- Coredns
- dhcpd
- Godaddy
- Duckdns

##### Misc

### CRDs

- DuckDNS
- emulatorjs
- [Steam_Invite_Discord](https://gitea.filterhome.xyz/ofilter/Steam_Invite_Discord)

##### Deprecated

- bind9 DNS
- [Internet speedtest metrics](https://github.com/nickmaccarthy/internet-speed-test-metrics)
- kanboard
- mantis
- minecraft server + [Minecraft Discord Bot](https://gitea.filterhome.xyz/ofilter/Minecraft_Discord_Bot)
- [FGO Tools](https://github.com/OriolFilter/FGO_tools)
- muximix
- openvpn
- Plex
- Portainer
- [speedtest_container](https://gitea.filterhome.xyz/ofilter/speedtest_contiainer)
- splunk
- vaultwarden

### Srv (main media server)

> Initially the server would contain media services and some with higher load, like the Minecraft and Factorio servers. Right now this server is the designated media server provider, and it also contains other, more generalized services, as a migration to reorganize the infrastructure is currently being planned.

Services run on `docker` / `docker-compose`.

#### Containers

- Traefik
- Portainer
- Jenkins
- containrrr/watchtower
- zcube/cadvisor

##### Media

- kizaing/kavita
- prologic/tube
- gotson/komga
- lscr.io/linuxserver/qbittorrent
- grafana
- lscr.io/linuxserver/jellyfin
- difegue/lanraragi
- filebrowser/filebrowser

##### Misc

- chesscorp/chess-club

##### Deprecated

##### Notes

Traefik generates public certificates automatically.

> https://doc.traefik.io/traefik/https/acme/

#### Kluster

> Idk, I can run whatever I want.\
> So far it's been an Istio playground for me to create [an Istio documentation](https://gitea.filterhome.xyz/ofilter/Istio_Examples).

- Cilium
- Istio Service Mesh
- Cert Manager
- Istio
- Nvidia Gpu Operator
- NFS Volume Provisioner
- MetalLB

### Observability

##### Services

- Grafana
- Prometheus
- Kiali
- Jaeger

### CI/CD

- Jenkins master + dynamic agent(s)
- Docker Registry

### Git servers

- Gitea

### Media related

- Tube
- Filebrowser
- Jellyfin
- qBitTorrent

@@ -1,42 +0,0 @@

https://github.com/mikeroyal/Self-Hosting-Guide#backups

https://github.com/mikeroyal/Self-Hosting-Guide#snapshots-managementsystem-recovery

https://github.com/mikeroyal/Self-Hosting-Guide#file-systems

https://github.com/mikeroyal/Self-Hosting-Guide#storage

https://goteleport.com/

---

Volumes

https://github.com/seaweedfs/seaweedfs

---

DNS

https://github.com/awesome-selfhosted/awesome-selfhosted#dns

https://github.com/awesome-foss/awesome-sysadmin#dns---control-panels--domain-management

---

#3dp

https://github.com/Floppy/van_dam

---

? https://goteleport.com/

---

Gitea thingies

https://docs.gitea.com/awesome?_highlight=content#sdk