Added simple monitoring examples.

They are based on the community kube-prometheus-stack Helm chart.
savagebidoof 2024-01-15 21:32:14 +01:00
parent 942a3bf8ae
commit 6cb3c9fa50
9 changed files with 647 additions and 0 deletions

View File

@@ -0,0 +1,353 @@
---
gitea: none
include_toc: true
---
# Description
This example deploys a Prometheus stack (Prometheus, Grafana, Alert Manager) through helm.
This will be used as a base for the future examples.
It's heavily recommended to have a base knowledge of Istio before proceeding to modify the settings according to your needs.
## Requirements
- Istio deployed and running at the namespace `istio-system`.
- Helm installed.
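A quick way to verify both requirements (a minimal sketch, assuming the default `istio-system` namespace and the standard binary names):
```shell
# Confirm the Istio control plane is running
kubectl get pods -n istio-system

# Confirm Helm is installed and available on the PATH
helm version
```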
# Istio Files
## Gateway
Simple HTTP gateway.
It only allows traffic for the host `my.home` and the `*.filter.home` subdomains.
Listens to the port 80 and expects HTTP (unencrypted) requests.
> **Note:**
> I assume the Gateway is already deployed, therefore it is not covered in the walkthrough. If you don't have a gateway, deploy one before continuing.
```yaml
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
name: local-gateway
namespace: default
spec:
selector:
istio: local-ingress
servers:
- port:
number: 80
name: http
protocol: HTTP
hosts:
- "my.home"
- "*.filter.home"
```
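If you still need to deploy it, a minimal sketch (assuming the manifest above is saved as `./src/Gateway.yaml`; the path is an assumption):
```shell
# Apply the Gateway manifest (adjust the path to wherever you saved it)
kubectl apply -f ./src/Gateway.yaml

# Confirm the Gateway object exists
kubectl get gateways -n default
```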
## VirtualService.yaml
Two simple VirtualServices for the Grafana and Prometheus services/dashboards.
The URLs are:
- prometheus.my.home
- grafana.my.home
```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
name: grafana-vs
namespace: default
labels:
app: grafana
spec:
hosts:
- "grafana.my.home"
gateways:
- default/local-gateway
http:
- route:
- destination:
host: prometheus-stack-01-grafana.observability.svc.cluster.local
port:
number: 80
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
name: prometheus-vs
namespace: observability
labels:
app: prometheus
spec:
hosts:
- "prometheus.my.home"
gateways:
- default/local-gateway
http:
- route:
- destination:
host: prometheus-stack-01-kube-p-prometheus.observability.svc.cluster.local
port:
number: 9090
```
# Walkthrough
## Create Observability NS
```shell
kubectl create namespace observability
```
For now the namespace is labeled with **istio-injection** disabled; injection will be enabled after the installation is completed.
If istio-injection is enabled during the installation, the Helm install will **fail**.
I still have to check what exactly causes this and why.
```shell
kubectl label namespaces observability istio-injection=disabled --overwrite=true
```
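You can verify the label took effect before moving on:
```shell
kubectl get namespace observability --show-labels
```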
## PersistentVolume
This step is optional. I'm using an NFS provisioner; use whichever storage class you prefer.
In the file `stack_values.yaml` I specified that 2 volumes will be provisioned, one for Prometheus and another one for AlertManager.
If you don't want to provision volumes, leave that file blank, or remove the `--values` line from the installation step.
The file also raises the retention from the default 10 days to 30 days, but without a volume that won't matter much, since the data won't survive a pod restart anyway.
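If you do want persistence, make sure the storage class referenced in `stack_values.yaml` actually exists in your cluster (`slow-nfs-01` is specific to my setup):
```shell
# List the available storage classes; the name in stack_values.yaml must match one of them
kubectl get storageclass
```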
## Installation
I will be installing the kube-prometheus-stack chart (which bundles the Prometheus Operator) through Helm.
```shell
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
```
```text
"prometheus-community" has been added to your repositories
```
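If the repository was already present, refresh the local chart index so the latest chart version is used:
```shell
helm repo update
```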
```shell
helm show values prometheus-community/kube-prometheus-stack
```
```text
A lot of output; it's recommended to save it to a file and go through it (or at least use Ctrl+F or your search of choice to find the settings you might want to change).
```
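For example, dumping the default values to a file for easier searching (the filename is just an example):
```shell
helm show values prometheus-community/kube-prometheus-stack > kube-prometheus-stack-defaults.yaml
```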
My `stack_values.yaml` file is:
```yaml
prometheus:
prometheusSpec:
retention: "30d"
storageSpec:
volumeClaimTemplate:
spec:
storageClassName: slow-nfs-01
accessModes: [ReadWriteOnce]
resources:
requests:
storage: 50Gi
alertmanager:
alertmanagerSpec:
storage:
volumeClaimTemplate:
spec:
storageClassName: slow-nfs-01
accessModes: [ReadWriteOnce]
resources:
requests:
storage: 10Gi
```
Besides the volumes mentioned [here](#persistentvolume), it increases the retention from 10 days to 30.
If you haven't configured PersistentVolume storage, just skip the `--values` line. Note that once the pod is restarted, all data will be lost.
```shell
helm install prometheus-stack-01 prometheus-community/kube-prometheus-stack \
-n observability \
--values ./src/stack_values.yaml
```
```text
NAME: prometheus-stack-01
LAST DEPLOYED: Sun Jan 14 22:34:11 2024
NAMESPACE: observability
STATUS: deployed
REVISION: 1
NOTES:
kube-prometheus-stack has been installed. Check its status by running:
kubectl --namespace observability get pods -l "release=prometheus-stack-01"
Visit https://github.com/prometheus-operator/kube-prometheus for instructions on how to create & configure Alertmanager and Prometheus instances using the Operator.
```
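Optionally, confirm the release shows up as deployed:
```shell
helm list -n observability
```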
### Check running pods in namespace
Everything seems to be deployed and working correctly.
```shell
kubectl get pods -n observability
```
```text
NAME READY STATUS RESTARTS AGE
alertmanager-prometheus-stack-01-kube-p-alertmanager-0 2/2 Running 0 73s
prometheus-prometheus-stack-01-kube-p-prometheus-0 2/2 Running 0 73s
prometheus-stack-01-grafana-69bd95649b-w67xg 3/3 Running 0 76s
prometheus-stack-01-kube-p-operator-b97d5f9cc-cm2pl 1/1 Running 0 76s
prometheus-stack-01-kube-state-metrics-554fd7bf8b-z62gv 1/1 Running 0 76s
prometheus-stack-01-prometheus-node-exporter-7bwbd 1/1 Running 0 76s
prometheus-stack-01-prometheus-node-exporter-dvqc6 1/1 Running 0 76s
prometheus-stack-01-prometheus-node-exporter-nfm5g 1/1 Running 0 76s
prometheus-stack-01-prometheus-node-exporter-ssfkb 1/1 Running 0 76s
```
### Enable Istio Injection
Let's re-enable istio-injection on the namespace.
```shell
kubectl label namespaces observability istio-injection=enabled --overwrite=true
```
### Delete all pods so they are recreated with the Istio sidecar
To get the sidecar injected into the existing workloads, we need to delete the pods so they are recreated.
```shell
kubectl delete pods -n observability --all
```
```text
pod "alertmanager-prometheus-stack-01-kube-p-alertmanager-0" deleted
pod "prometheus-prometheus-stack-01-kube-p-prometheus-0" deleted
pod "prometheus-stack-01-grafana-69bd95649b-w67xg" deleted
pod "prometheus-stack-01-kube-p-operator-b97d5f9cc-cm2pl" deleted
pod "prometheus-stack-01-kube-state-metrics-554fd7bf8b-z62gv" deleted
pod "prometheus-stack-01-prometheus-node-exporter-7bwbd" deleted
pod "prometheus-stack-01-prometheus-node-exporter-dvqc6" deleted
pod "prometheus-stack-01-prometheus-node-exporter-nfm5g" deleted
pod "prometheus-stack-01-prometheus-node-exporter-ssfkb" deleted
```
### Check pods status (again)
Everything seems to be deployed and working correctly.
```shell
kubectl get pods -n observability
```
```text
NAME READY STATUS RESTARTS AGE
alertmanager-prometheus-stack-01-kube-p-alertmanager-0 3/3 Running 0 44s
prometheus-prometheus-stack-01-kube-p-prometheus-0 3/3 Running 0 43s
prometheus-stack-01-grafana-69bd95649b-24v58 4/4 Running 0 46s
prometheus-stack-01-kube-p-operator-b97d5f9cc-5bdwh 2/2 Running 1 (43s ago) 46s
prometheus-stack-01-kube-state-metrics-554fd7bf8b-wjw4d 2/2 Running 2 (41s ago) 46s
prometheus-stack-01-prometheus-node-exporter-4266g 1/1 Running 0 46s
prometheus-stack-01-prometheus-node-exporter-lmxdj 1/1 Running 0 45s
prometheus-stack-01-prometheus-node-exporter-shd72 1/1 Running 0 45s
prometheus-stack-01-prometheus-node-exporter-wjhdr 1/1 Running 0 45s
```
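Each pod now shows one extra container: the Istio sidecar (the node-exporter pods keep 1/1; they run on the host network and are not injected). To confirm the extra container is `istio-proxy`, you can print the container names of, for example, the Grafana pod (the label selector assumes the default labels from the bundled Grafana chart):
```shell
kubectl get pods -n observability -l app.kubernetes.io/name=grafana \
  -o jsonpath='{.items[0].spec.containers[*].name}'
```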
### Gateway
I have my gateways already created (in this scenario I will be using the local gateway).
### VirtualService
I will create two VirtualService entries, one for the Grafana dashboard and another for the Prometheus dashboard:
- Prometheus dashboard URL: "prometheus.my.home"
- Grafana dashboard URL: "grafana.my.home"
```shell
kubectl apply -f ./src/VirtualService.yaml
```
```text
virtualservice.networking.istio.io/grafana-vs created
virtualservice.networking.istio.io/prometheus-vs created
```
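You can verify the VirtualServices exist and, assuming the hostnames resolve to your ingress gateway (via DNS or `/etc/hosts`), reach the dashboards through it:
```shell
kubectl get virtualservices -A

# Expect an HTTP status code from each dashboard (200 or a redirect, depending on the app)
curl -s -o /dev/null -w "%{http_code}\n" http://prometheus.my.home
curl -s -o /dev/null -w "%{http_code}\n" http://grafana.my.home
```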
## Prometheus
As a simple example of accessing Kubernetes metrics, you can run the following PromQL queries:
### Running pods per node
The results include a `node="XXXX"` label, which matches each of the Kubernetes nodes available within the cluster.
```promql
kubelet_running_pods
```
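To roughly cross-check this from the CLI, you can count pods per node (with `-o wide`, the node name is the 8th column):
```shell
kubectl get pods -A -o wide --no-headers | awk '{print $8}' | sort | uniq -c
```
The numbers may differ slightly (static pods, completed pods, and so on are counted differently).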
### Running pods per namespace
Right now, on the namespace "observability" I have a total of 9 pods running.
```promql
sum(kube_pod_status_ready) by (namespace)
```
You can verify this by running:
```shell
kubectl get pods -n observability --no-headers=true | nl
```
```text
1 alertmanager-prometheus-stack-01-kube-p-alertmanager-0 3/3 Running 0 40m
2 prometheus-prometheus-stack-01-kube-p-prometheus-0 3/3 Running 0 40m
3 prometheus-stack-01-grafana-69bd95649b-24v58 4/4 Running 0 40m
4 prometheus-stack-01-kube-p-operator-b97d5f9cc-5bdwh 2/2 Running 1 (40m ago) 40m
5 prometheus-stack-01-kube-state-metrics-554fd7bf8b-wjw4d 2/2 Running 2 (40m ago) 40m
6 prometheus-stack-01-prometheus-node-exporter-4266g 1/1 Running 0 40m
7 prometheus-stack-01-prometheus-node-exporter-lmxdj 1/1 Running 0 40m
8 prometheus-stack-01-prometheus-node-exporter-shd72 1/1 Running 0 40m
9 prometheus-stack-01-prometheus-node-exporter-wjhdr 1/1 Running 0 40m
```
Which returns a total of 9 pods, with the status "Running".
### Running containers per namespace
Currently, this returns 18 containers running in the namespace **observability**.
```promql
sum(kube_pod_container_status_running) by (namespace)
```
Listing the pods running within the namespace again and adding up the READY column, I can confirm the total number of containers running within the namespace is 18, matching the Prometheus data.
```shell
kubectl get pods -n observability
```
```text
NAME READY STATUS RESTARTS AGE
alertmanager-prometheus-stack-01-kube-p-alertmanager-0 3/3 Running 0 45m
prometheus-prometheus-stack-01-kube-p-prometheus-0 3/3 Running 0 45m
prometheus-stack-01-grafana-69bd95649b-24v58 4/4 Running 0 45m
prometheus-stack-01-kube-p-operator-b97d5f9cc-5bdwh 2/2 Running 1 (45m ago) 45m
prometheus-stack-01-kube-state-metrics-554fd7bf8b-wjw4d 2/2 Running 2 (45m ago) 45m
prometheus-stack-01-prometheus-node-exporter-4266g 1/1 Running 0 45m
prometheus-stack-01-prometheus-node-exporter-lmxdj 1/1 Running 0 45m
prometheus-stack-01-prometheus-node-exporter-shd72 1/1 Running 0 45m
prometheus-stack-01-prometheus-node-exporter-wjhdr 1/1 Running 0 45m
```

View File

@@ -0,0 +1,16 @@
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
name: local-gateway
namespace: default
spec:
selector:
istio: local-ingress
servers:
- port:
number: 80
name: http
protocol: HTTP
hosts:
- "my.home"
- "*.filter.home"

View File

@@ -0,0 +1,37 @@
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
name: grafana-vs
namespace: default
labels:
app: grafana
spec:
hosts:
- "grafana.my.home"
gateways:
- default/local-gateway
http:
- route:
- destination:
host: prometheus-stack-01-grafana.observability.svc.cluster.local
port:
number: 80
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
name: prometheus-vs
namespace: observability
labels:
app: prometheus
spec:
hosts:
- "prometheus.my.home"
gateways:
- default/local-gateway
http:
- route:
- destination:
host: prometheus-stack-01-kube-p-prometheus.observability.svc.cluster.local
port:
number: 9090

View File

@@ -0,0 +1,21 @@
prometheus:
prometheusSpec:
retention: "30d"
storageSpec:
volumeClaimTemplate:
spec:
storageClassName: slow-nfs-01
accessModes: [ReadWriteOnce]
resources:
requests:
storage: 50Gi
alertmanager:
alertmanagerSpec:
storage:
volumeClaimTemplate:
spec:
storageClassName: slow-nfs-01
accessModes: [ReadWriteOnce]
resources:
requests:
storage: 10Gi

View File

@@ -0,0 +1,60 @@
## Description
Through the use of Prometheus CRDs, we deploy PodMonitor and ServiceMonitor objects, which will scrape metrics from the Envoy proxies attached to each pod and from the Istiod deployment.
## Requirements
- Complete step [01-Create_Prometheus_Stack](../01-Create_Prometheus_Stack)
## Istio Metrics
Now that a functional Prometheus-Grafana-Alertmanager stack is set up, the next step is to deploy the Prometheus scraping jobs/configs to gather:
- Envoy proxy metrics
- Istiod metrics
> **Note**: \
> The monitors deployed are based on the [Istio Prometheus Operator Example](https://github.com/istio/istio/blob/1.20.2/samples/addons/extras/prometheus-operator.yaml)
```shell
kubectl create -f PrometheusIstioAgent.yaml
```
```text
servicemonitor.monitoring.coreos.com/istiod-metrics-monitor created
podmonitor.monitoring.coreos.com/envoy-stats-monitor created
```
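A quick check that both monitor objects were created:
```shell
kubectl get servicemonitors,podmonitors -n observability
```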
Prometheus picks up the new targets automatically; give it a minute or two (a good moment to get off the PC and stretch your legs).
### Check Targets
Once the Prometheus pod is up and running, open the Prometheus web UI and go to **Status > Targets** to list all the available targets.
Once there, I am able to see the following entries:
- **podMonitor/observability/envoy-stats-monitor/0 (15/15 up)**
- **serviceMonitor/observability/istiod-metrics-monitor/0 (2/2 up)**
### Check through Prometheus queries
Now, back to the **Graph** section, we can confirm if we are receiving metrics from **Istiod** and **Envoy**.
#### Istiod
Very simple and straightforward, the uptime for each one of the **Istiod** pods.
```promql
istiod_uptime_seconds
```
#### Envoy
Requests grouped by `destination_service_name`.
```promql
sum(istio_requests_total) by (destination_service_name)
```
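If the query returns no data yet, some traffic needs to flow through the mesh first. A simple sketch to generate a few requests, assuming the VirtualServices from the previous example and that the hostname resolves to your ingress gateway:
```shell
# Send a handful of requests through the ingress gateway to populate istio_requests_total
for i in $(seq 1 20); do
  curl -s -o /dev/null http://grafana.my.home
done
```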

View File

@@ -0,0 +1,66 @@
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
labels:
app: kube-prometheus-stack-prometheus
release: prometheus-stack-01
name: istiod-metrics-monitor
namespace: observability
spec:
jobLabel: istio
targetLabels: [app]
selector:
matchExpressions:
- {key: istio, operator: In, values: [pilot]}
namespaceSelector:
any: true
endpoints:
- port: http-monitoring
interval: 15s
---
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
name: envoy-stats-monitor
labels:
app: kube-prometheus-stack-prometheus
release: prometheus-stack-01
namespace: observability
spec:
selector:
matchExpressions:
- {key: istio-prometheus-ignore, operator: DoesNotExist}
namespaceSelector:
any: true
jobLabel: envoy-stats
podMetricsEndpoints:
- path: /stats/prometheus
interval: 15s
relabelings:
- action: keep
sourceLabels: [__meta_kubernetes_pod_container_name]
regex: "istio-proxy"
- action: keep
sourceLabels: [__meta_kubernetes_pod_annotationpresent_prometheus_io_scrape]
- action: replace
regex: (\d+);(([A-Fa-f0-9]{1,4}::?){1,7}[A-Fa-f0-9]{1,4})
replacement: '[$2]:$1'
sourceLabels:
- __meta_kubernetes_pod_annotation_prometheus_io_port
- __meta_kubernetes_pod_ip
targetLabel: __address__
- action: replace
regex: (\d+);((([0-9]+?)(\.|$)){4})
replacement: $2:$1
sourceLabels:
- __meta_kubernetes_pod_annotation_prometheus_io_port
- __meta_kubernetes_pod_ip
targetLabel: __address__
- action: labeldrop
regex: "__meta_kubernetes_pod_label_(.+)"
- sourceLabels: [__meta_kubernetes_namespace]
action: replace
targetLabel: namespace
- sourceLabels: [__meta_kubernetes_pod_name]
action: replace
targetLabel: pod_name

View File

@@ -0,0 +1,58 @@
## Description
This example shares some dashboards that are ready to use once the Istio metrics are added to the Prometheus stack.
This is extremely simple, to be honest.
## Requirements
- Complete step [02-Add_Istio_Scrapping_Metrics](../02-Add_Istio_Scrapping_Metrics)
## Grafana
### Default credentials
> **Note:** \
> Since Grafana has no storage/volume, **all changes will be lost** once the pod is restarted.
- User: admin
- Password: prom-operator
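If you overrode the defaults through the chart values, the actual admin password can be read from the secret created by the chart (the secret name below assumes the `<release>-grafana` naming, i.e. `prometheus-stack-01-grafana`):
```shell
kubectl get secret -n observability prometheus-stack-01-grafana \
  -o jsonpath='{.data.admin-password}' | base64 -d; echo
```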
Just check any dashboard to see if it's working correctly.
I personally recommend the dashboard:
- **Node Exporter / USE Method / Node**
Lists the resource utilization for each one of the Nodes.
Otherwise, check whatever you want; there are some good predefined graphs already.
### Want to change credentials?
Just log in as the admin user and change whatever you want.
Username, email, password.
Select different preferences..., whatever.
### Want to manage/create Users/Teams?
Select `Administration` > `Users and Access`.
There you will be able to create/manage **Users**, **Teams** and **Service Accounts**.
### Istio related Dashboards
Here is a list of ready-to-go Istio-related dashboards that you might want to set up on your Grafana deployment.
- https://grafana.com/grafana/dashboards/7630-istio-workload-dashboard/
- https://grafana.com/grafana/dashboards/7636-istio-service-dashboard/
- https://grafana.com/grafana/dashboards/7645-istio-control-plane-dashboard/
- https://grafana.com/grafana/dashboards/7639-istio-mesh-dashboard/
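They can be imported from the Grafana UI's **Import dashboard** option using the dashboard ID, or downloaded as JSON beforehand; a sketch of the latter, assuming grafana.com's public download endpoint (the revision path is an assumption):
```shell
# Download the Istio Control Plane dashboard (ID 7645) as JSON for a manual import
curl -sL https://grafana.com/api/dashboards/7645/revisions/latest/download \
  -o istio-control-plane-dashboard.json
```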
The dashboards were found here:
- https://grafana.com/orgs/istio/dashboards

13-monitoring/README.md
View File

@@ -0,0 +1,28 @@
Currently in progress, more or less.
Note that in this set of examples, steps 1, 2 and 3 are incremental; each one uses the resources set up in the previous examples.
An Alertmanager example could be as simple as "when service X returns 503 for 80% of requests, raise an alert".
## Related links
https://raw.githubusercontent.com/istio/istio/release-1.17/samples/httpbin/sample-client/fortio-deploy.yaml
https://github.com/istio/istio/tree/master/samples/addons
https://github.com/prometheus-operator/prometheus-operator
https://istio.io/latest/docs/ops/integrations/prometheus/
https://istio.io/latest/docs/ops/integrations/prometheus/#option-2-customized-scraping-configurations
https://istio.io/latest/docs/ops/integrations/prometheus/#tls-settings
https://www.reddit.com/r/istio/comments/usld1h/istio_mtls_and_prometheus_the_definitive/
https://superorbital.io/blog/istio-metrics-merging/ <--- Interesting read

View File

@@ -8,6 +8,10 @@ Refer to the specific `README.md` in each example for more information.
# Tree of folders
```shell
tree -d | grep -v src$
```
```text
├── 00-Troubleshooting
├── 01-Getting_Started
@@ -54,6 +58,10 @@ Refer to the specific `README.md` in each example for more information.
│   ├── 01-FaultInjection-delay
│   └── 02-FaultInjection-abort
├── 12-CircuitBreaking
├── 13-monitoring
│   ├── 01-Create_Prometheus_Stack
│   ├── 02-Add_Istio_Scrapping_Metrics
│   └── 03-Grafana_Istio_Dashboards
├── 90-MixConfigs
│   ├── 01-HTTPS-Gateway_Service_Entry
│   └── Minecraft