Added simple monitoring examples.

They are based on the community kube-prometheus-stack Helm chart.
savagebidoof 2024-01-15 21:32:14 +01:00
parent 942a3bf8ae
commit 6cb3c9fa50
9 changed files with 647 additions and 0 deletions

View File

@@ -0,0 +1,353 @@
---
gitea: none
include_toc: true
---
# Description
This example deploys a Prometheus stack (Prometheus, Grafana, Alert Manager) through helm.
This will be used as a base for the future examples.
It's heavily recommended to have a base knowledge of Istio before proceeding to modify the settings according to your needs.
## Requirements
- Istio deployed and running at the namespace `istio-system`.
- Helm installed.
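A quick way to verify both requirements (a minimal sketch, assuming the default `istio-system` namespace and the standard binary names):
```shell
# Confirm the Istio control plane is running
kubectl get pods -n istio-system

# Confirm Helm is installed and available on the PATH
helm version
```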
# Istio Files
## Gateway
Simple HTTP gateway.
It only allows traffic for the host `my.home` and the `*.filter.home` subdomains.
Listens to the port 80 and expects HTTP (unencrypted) requests.
> **Note:**
> I assume the Gateway is already deployed, therefore it is not covered in the walkthrough. If you don't have a gateway, deploy one before continuing.
```yaml
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
name: local-gateway
namespace: default
spec:
selector:
istio: local-ingress
servers:
- port:
number: 80
name: http
protocol: HTTP
hosts:
- "my.home"
- "*.filter.home"
```
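If you still need to deploy it, a minimal sketch (assuming the manifest above is saved as `./src/Gateway.yaml`; the path is an assumption):
```shell
# Apply the Gateway manifest (adjust the path to wherever you saved it)
kubectl apply -f ./src/Gateway.yaml

# Confirm the Gateway object exists
kubectl get gateways -n default
```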
## VirtualService.yaml
Two simple VirtualServices for the Grafana and Prometheus services/dashboards.
The URLs are:
- prometheus.my.home
- grafana.my.home
```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
name: grafana-vs
namespace: default
labels:
app: grafana
spec:
hosts:
- "grafana.my.home"
gateways:
- default/local-gateway
http:
- route:
- destination:
host: prometheus-stack-01-grafana.observability.svc.cluster.local
port:
number: 80
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
name: prometheus-vs
namespace: observability
labels:
app: prometheus
spec:
hosts:
- "prometheus.my.home"
gateways:
- default/local-gateway
http:
- route:
- destination:
host: prometheus-stack-01-kube-p-prometheus.observability.svc.cluster.local
port:
number: 9090
```
# Walkthrough
## Create Observability NS
```shell
kubectl create namespace observability
```
For now the namespace is labeled with **istio-injection** disabled; injection will be enabled after the installation is completed.
If istio-injection is enabled during the installation, the Helm install will **fail**.
I still have to check what exactly causes this and why.
```shell
kubectl label namespaces observability istio-injection=disabled --overwrite=true
```
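You can verify the label took effect before moving on:
```shell
kubectl get namespace observability --show-labels
```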
## PersistentVolume
This step is optional. I'm using an NFS provisioner; use whichever storage class you prefer.
In the file `stack_values.yaml` I specified that 2 volumes will be provisioned, one for Prometheus and another one for AlertManager.
If you don't want to provision volumes, leave that file blank, or remove the `--values` line from the installation step.
The file also raises the retention from the default 10 days to 30 days, but without a volume that won't matter much, since the data won't survive a pod restart anyway.
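If you do want persistence, make sure the storage class referenced in `stack_values.yaml` actually exists in your cluster (`slow-nfs-01` is specific to my setup):
```shell
# List the available storage classes; the name in stack_values.yaml must match one of them
kubectl get storageclass
```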
## Installation
I will be installing the kube-prometheus-stack chart (which bundles the Prometheus Operator) through Helm.
```shell
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
```
```text
"prometheus-community" has been added to your repositories
```
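If the repository was already present, refresh the local chart index so the latest chart version is used:
```shell
helm repo update
```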
```shell
helm show values prometheus-community/kube-prometheus-stack
```
```text
A lot of output; it's recommended to save it to a file and go through it (or at least use Ctrl+F or your search of choice to find the settings you might want to change).
```
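For example, dumping the default values to a file for easier searching (the filename is just an example):
```shell
helm show values prometheus-community/kube-prometheus-stack > kube-prometheus-stack-defaults.yaml
```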
My `stack_values.yaml` file is:
```yaml
prometheus:
prometheusSpec:
retention: "30d"
storageSpec:
volumeClaimTemplate:
spec:
storageClassName: slow-nfs-01
accessModes: [ReadWriteOnce]
resources:
requests:
storage: 50Gi
alertmanager:
alertmanagerSpec:
storage:
volumeClaimTemplate:
spec:
storageClassName: slow-nfs-01
accessModes: [ReadWriteOnce]
resources:
requests:
storage: 10Gi
```
Besides the volumes mentioned [here](#persistentvolume), it increases the retention from 10 days to 30.
If you haven't configured PersistentVolume storage, just skip the `--values` line. Note that once the pod is restarted, all data will be lost.
```shell
helm install prometheus-stack-01 prometheus-community/kube-prometheus-stack \
-n observability \
--values ./src/stack_values.yaml
```
```text
NAME: prometheus-stack-01
LAST DEPLOYED: Sun Jan 14 22:34:11 2024
NAMESPACE: observability
STATUS: deployed
REVISION: 1
NOTES:
kube-prometheus-stack has been installed. Check its status by running:
kubectl --namespace observability get pods -l "release=prometheus-stack-01"
Visit https://github.com/prometheus-operator/kube-prometheus for instructions on how to create & configure Alertmanager and Prometheus instances using the Operator.
```
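Optionally, confirm the release shows up as deployed:
```shell
helm list -n observability
```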
### Check running pods in namespace
Everything seems to be deployed and working correctly.
```shell
kubectl get pods -n observability
```
```text
NAME READY STATUS RESTARTS AGE
alertmanager-prometheus-stack-01-kube-p-alertmanager-0 2/2 Running 0 73s
prometheus-prometheus-stack-01-kube-p-prometheus-0 2/2 Running 0 73s
prometheus-stack-01-grafana-69bd95649b-w67xg 3/3 Running 0 76s
prometheus-stack-01-kube-p-operator-b97d5f9cc-cm2pl 1/1 Running 0 76s
prometheus-stack-01-kube-state-metrics-554fd7bf8b-z62gv 1/1 Running 0 76s
prometheus-stack-01-prometheus-node-exporter-7bwbd 1/1 Running 0 76s
prometheus-stack-01-prometheus-node-exporter-dvqc6 1/1 Running 0 76s
prometheus-stack-01-prometheus-node-exporter-nfm5g 1/1 Running 0 76s
prometheus-stack-01-prometheus-node-exporter-ssfkb 1/1 Running 0 76s
```
### Enable Istio Injection
Let's re-enable istio-injection on the namespace.
```shell
kubectl label namespaces observability istio-injection=enabled --overwrite=true
```
### Delete all pods so they are recreated with the Istio sidecar
To get the sidecar injected into the existing workloads, we need to delete the pods so they are recreated.
```shell
kubectl delete pods -n observability --all
```
```text
pod "alertmanager-prometheus-stack-01-kube-p-alertmanager-0" deleted
pod "prometheus-prometheus-stack-01-kube-p-prometheus-0" deleted
pod "prometheus-stack-01-grafana-69bd95649b-w67xg" deleted
pod "prometheus-stack-01-kube-p-operator-b97d5f9cc-cm2pl" deleted
pod "prometheus-stack-01-kube-state-metrics-554fd7bf8b-z62gv" deleted
pod "prometheus-stack-01-prometheus-node-exporter-7bwbd" deleted
pod "prometheus-stack-01-prometheus-node-exporter-dvqc6" deleted
pod "prometheus-stack-01-prometheus-node-exporter-nfm5g" deleted
pod "prometheus-stack-01-prometheus-node-exporter-ssfkb" deleted
```
### Check pods status (again)
Everything seems to be deployed and working correctly.
```shell
kubectl get pods -n observability
```
```text
NAME READY STATUS RESTARTS AGE
alertmanager-prometheus-stack-01-kube-p-alertmanager-0 3/3 Running 0 44s
prometheus-prometheus-stack-01-kube-p-prometheus-0 3/3 Running 0 43s
prometheus-stack-01-grafana-69bd95649b-24v58 4/4 Running 0 46s
prometheus-stack-01-kube-p-operator-b97d5f9cc-5bdwh 2/2 Running 1 (43s ago) 46s
prometheus-stack-01-kube-state-metrics-554fd7bf8b-wjw4d 2/2 Running 2 (41s ago) 46s
prometheus-stack-01-prometheus-node-exporter-4266g 1/1 Running 0 46s
prometheus-stack-01-prometheus-node-exporter-lmxdj 1/1 Running 0 45s
prometheus-stack-01-prometheus-node-exporter-shd72 1/1 Running 0 45s
prometheus-stack-01-prometheus-node-exporter-wjhdr 1/1 Running 0 45s
```
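Each pod now shows one extra container: the Istio sidecar (the node-exporter pods keep 1/1; they run on the host network and are not injected). To confirm the extra container is `istio-proxy`, you can print the container names of, for example, the Grafana pod (the label selector assumes the default labels from the bundled Grafana chart):
```shell
kubectl get pods -n observability -l app.kubernetes.io/name=grafana \
  -o jsonpath='{.items[0].spec.containers[*].name}'
```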
### Gateway
I have my gateways already created (in this scenario I will be using the local gateway).
### VirtualService
I will create two VirtualService entries, one for the Grafana dashboard and another for the Prometheus dashboard:
- Prometheus dashboard URL: "prometheus.my.home"
- Grafana dashboard URL: "grafana.my.home"
```shell
kubectl apply -f ./src/VirtualService.yaml
```
```text
virtualservice.networking.istio.io/grafana-vs created
virtualservice.networking.istio.io/prometheus-vs created
```
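You can verify the VirtualServices exist and, assuming the hostnames resolve to your ingress gateway (via DNS or `/etc/hosts`), reach the dashboards through it:
```shell
kubectl get virtualservices -A

# Expect an HTTP status code from each dashboard (200 or a redirect, depending on the app)
curl -s -o /dev/null -w "%{http_code}\n" http://prometheus.my.home
curl -s -o /dev/null -w "%{http_code}\n" http://grafana.my.home
```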
## Prometheus
As a simple example of accessing Kubernetes metrics, you can run the following PromQL queries:
### Running pods per node
The results include a `node="XXXX"` label, which matches each of the Kubernetes nodes available within the cluster.
```promql
kubelet_running_pods
```
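To roughly cross-check this from the CLI, you can count pods per node (with `-o wide`, the node name is the 8th column):
```shell
kubectl get pods -A -o wide --no-headers | awk '{print $8}' | sort | uniq -c
```
The numbers may differ slightly (static pods, completed pods, and so on are counted differently).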
### Running pods per namespace
Right now, on the namespace "observability" I have a total of 9 pods running.
```promql
sum(kube_pod_status_ready) by (namespace)
```
You can verify this by running:
```shell
kubectl get pods -n observability --no-headers=true | nl
```
```text
1 alertmanager-prometheus-stack-01-kube-p-alertmanager-0 3/3 Running 0 40m
2 prometheus-prometheus-stack-01-kube-p-prometheus-0 3/3 Running 0 40m
3 prometheus-stack-01-grafana-69bd95649b-24v58 4/4 Running 0 40m
4 prometheus-stack-01-kube-p-operator-b97d5f9cc-5bdwh 2/2 Running 1 (40m ago) 40m
5 prometheus-stack-01-kube-state-metrics-554fd7bf8b-wjw4d 2/2 Running 2 (40m ago) 40m
6 prometheus-stack-01-prometheus-node-exporter-4266g 1/1 Running 0 40m
7 prometheus-stack-01-prometheus-node-exporter-lmxdj 1/1 Running 0 40m
8 prometheus-stack-01-prometheus-node-exporter-shd72 1/1 Running 0 40m
9 prometheus-stack-01-prometheus-node-exporter-wjhdr 1/1 Running 0 40m
```
Which returns a total of 9 pods, with the status "Running".
### Running containers per namespace
Currently, this returns 18 containers running in the namespace **observability**.
```promql
sum(kube_pod_container_status_running) by (namespace)
```
Listing the pods running within the namespace again and adding up the READY column, I can confirm the total number of containers running within the namespace is 18, matching the Prometheus data.
```shell
kubectl get pods -n observability
```
```text
NAME READY STATUS RESTARTS AGE
alertmanager-prometheus-stack-01-kube-p-alertmanager-0 3/3 Running 0 45m
prometheus-prometheus-stack-01-kube-p-prometheus-0 3/3 Running 0 45m
prometheus-stack-01-grafana-69bd95649b-24v58 4/4 Running 0 45m
prometheus-stack-01-kube-p-operator-b97d5f9cc-5bdwh 2/2 Running 1 (45m ago) 45m
prometheus-stack-01-kube-state-metrics-554fd7bf8b-wjw4d 2/2 Running 2 (45m ago) 45m
prometheus-stack-01-prometheus-node-exporter-4266g 1/1 Running 0 45m
prometheus-stack-01-prometheus-node-exporter-lmxdj 1/1 Running 0 45m
prometheus-stack-01-prometheus-node-exporter-shd72 1/1 Running 0 45m
prometheus-stack-01-prometheus-node-exporter-wjhdr 1/1 Running 0 45m
```

View File

@@ -0,0 +1,16 @@
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
name: local-gateway
namespace: default
spec:
selector:
istio: local-ingress
servers:
- port:
number: 80
name: http
protocol: HTTP
hosts:
- "my.home"
- "*.filter.home"

View File

@@ -0,0 +1,37 @@
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
name: grafana-vs
namespace: default
labels:
app: grafana
spec:
hosts:
- "grafana.my.home"
gateways:
- default/local-gateway
http:
- route:
- destination:
host: prometheus-stack-01-grafana.observability.svc.cluster.local
port:
number: 80
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
name: prometheus-vs
namespace: observability
labels:
app: prometheus
spec:
hosts:
- "prometheus.my.home"
gateways:
- default/local-gateway
http:
- route:
- destination:
host: prometheus-stack-01-kube-p-prometheus.observability.svc.cluster.local
port:
number: 9090

View File

@@ -0,0 +1,21 @@
prometheus:
prometheusSpec:
retention: "30d"
storageSpec:
volumeClaimTemplate:
spec:
storageClassName: slow-nfs-01
accessModes: [ReadWriteOnce]
resources:
requests:
storage: 50Gi
alertmanager:
alertmanagerSpec:
storage:
volumeClaimTemplate:
spec:
storageClassName: slow-nfs-01
accessModes: [ReadWriteOnce]
resources:
requests:
storage: 10Gi

View File

@@ -0,0 +1,60 @@
## Description
Through the use of Prometheus CRDs, we deploy PodMonitor and ServiceMonitor objects, which will scrape metrics from the Envoy proxies attached to each pod and from the Istiod deployment.
## Requirements
- Complete step [01-Create_Prometheus_Stack](../01-Create_Prometheus_Stack)
## Istio Metrics
Now that a functional Prometheus-Grafana-Alertmanager stack is set up, the next step is to deploy the Prometheus scraping jobs/configs to gather:
- Envoy proxy metrics
- Istiod metrics
> **Note**: \
> The monitors deployed are based on the [Istio Prometheus Operator Example](https://github.com/istio/istio/blob/1.20.2/samples/addons/extras/prometheus-operator.yaml)
```shell
kubectl create -f PrometheusIstioAgent.yaml
```
```text
servicemonitor.monitoring.coreos.com/istiod-metrics-monitor created
podmonitor.monitoring.coreos.com/envoy-stats-monitor created
```
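A quick check that both monitor objects were created:
```shell
kubectl get servicemonitors,podmonitors -n observability
```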
Prometheus picks up the new targets automatically; give it a minute or two (a good moment to get off the PC and stretch your legs).
### Check Targets
Once the Prometheus pod is up and running, open the Prometheus web UI and go to **Status > Targets** to list all the available targets.
Once there, I am able to see the following entries:
- **podMonitor/observability/envoy-stats-monitor/0 (15/15 up)**
- **serviceMonitor/observability/istiod-metrics-monitor/0 (2/2 up)**
### Check through Prometheus queries
Now, back to the **Graph** section, we can confirm if we are receiving metrics from **Istiod** and **Envoy**.
#### Istiod
Very simple and straightforward, the uptime for each one of the **Istiod** pods.
```promql
istiod_uptime_seconds
```
#### Envoy
Requests grouped by `destination_service_name`.
```promql
sum(istio_requests_total) by (destination_service_name)
```
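If the query returns no data yet, some traffic needs to flow through the mesh first. A simple sketch to generate a few requests, assuming the VirtualServices from the previous example and that the hostname resolves to your ingress gateway:
```shell
# Send a handful of requests through the ingress gateway to populate istio_requests_total
for i in $(seq 1 20); do
  curl -s -o /dev/null http://grafana.my.home
done
```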

View File

@@ -0,0 +1,66 @@
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
labels:
app: kube-prometheus-stack-prometheus
release: prometheus-stack-01
name: istiod-metrics-monitor
namespace: observability
spec:
jobLabel: istio
targetLabels: [app]
selector:
matchExpressions:
- {key: istio, operator: In, values: [pilot]}
namespaceSelector:
any: true
endpoints:
- port: http-monitoring
interval: 15s
---
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
name: envoy-stats-monitor
labels:
app: kube-prometheus-stack-prometheus
release: prometheus-stack-01
namespace: observability
spec:
selector:
matchExpressions:
- {key: istio-prometheus-ignore, operator: DoesNotExist}
namespaceSelector:
any: true
jobLabel: envoy-stats
podMetricsEndpoints:
- path: /stats/prometheus
interval: 15s
relabelings:
- action: keep
sourceLabels: [__meta_kubernetes_pod_container_name]
regex: "istio-proxy"
- action: keep
sourceLabels: [__meta_kubernetes_pod_annotationpresent_prometheus_io_scrape]
- action: replace
regex: (\d+);(([A-Fa-f0-9]{1,4}::?){1,7}[A-Fa-f0-9]{1,4})
replacement: '[$2]:$1'
sourceLabels:
- __meta_kubernetes_pod_annotation_prometheus_io_port
- __meta_kubernetes_pod_ip
targetLabel: __address__
- action: replace
regex: (\d+);((([0-9]+?)(\.|$)){4})
replacement: $2:$1
sourceLabels:
- __meta_kubernetes_pod_annotation_prometheus_io_port
- __meta_kubernetes_pod_ip
targetLabel: __address__
- action: labeldrop
regex: "__meta_kubernetes_pod_label_(.+)"
- sourceLabels: [__meta_kubernetes_namespace]
action: replace
targetLabel: namespace
- sourceLabels: [__meta_kubernetes_pod_name]
action: replace
targetLabel: pod_name

View File

@@ -0,0 +1,58 @@
## Description
This example shares some dashboards that are ready to use once the Istio metrics are added to the Prometheus stack.
This is extremely simple, to be honest.
## Requirements
- Complete step [02-Add_Istio_Scrapping_Metrics](../02-Add_Istio_Scrapping_Metrics)
## Grafana
### Default credentials
> **Note:** \
> Since Grafana has no storage/volume, **all changes will be lost** once the pod is restarted.
- User: admin
- Password: prom-operator
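If you overrode the defaults through the chart values, the actual admin password can be read from the secret created by the chart (the secret name below assumes the `<release>-grafana` naming, i.e. `prometheus-stack-01-grafana`):
```shell
kubectl get secret -n observability prometheus-stack-01-grafana \
  -o jsonpath='{.data.admin-password}' | base64 -d; echo
```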
Just check any dashboard to see if it's working correctly.
I personally recommend the dashboard:
- **Node Exporter / USE Method / Node**
Lists the resource utilization for each one of the Nodes.
Otherwise, check whatever you want; there are some good predefined graphs already.
### Want to change credentials?
Just log in as the admin user and change whatever you want.
Username, email, password.
Select different preferences..., whatever.
### Want to manage/create Users/Teams?
Select `Administration` > `Users and Access`.
There you will be able to create/manage **Users**, **Teams** and **Service Accounts**.
### Istio related Dashboards
Here is a list of ready-to-go Istio-related dashboards that you might want to set up on your Grafana deployment.
- https://grafana.com/grafana/dashboards/7630-istio-workload-dashboard/
- https://grafana.com/grafana/dashboards/7636-istio-service-dashboard/
- https://grafana.com/grafana/dashboards/7645-istio-control-plane-dashboard/
- https://grafana.com/grafana/dashboards/7639-istio-mesh-dashboard/
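They can be imported from the Grafana UI's **Import dashboard** option using the dashboard ID, or downloaded as JSON beforehand; a sketch of the latter, assuming grafana.com's public download endpoint (the revision path is an assumption):
```shell
# Download the Istio Control Plane dashboard (ID 7645) as JSON for a manual import
curl -sL https://grafana.com/api/dashboards/7645/revisions/latest/download \
  -o istio-control-plane-dashboard.json
```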
The dashboards were found here:
- https://grafana.com/orgs/istio/dashboards

13-monitoring/README.md
View File

@@ -0,0 +1,28 @@
Currently in progress, more or less.
Note that in this set of examples, steps 1, 2 and 3 are incremental; each one uses the resources set up in the previous examples.
An Alertmanager example could be as simple as "when service X returns 503 for 80% of requests, raise an alert".
## Related links
https://raw.githubusercontent.com/istio/istio/release-1.17/samples/httpbin/sample-client/fortio-deploy.yaml
https://github.com/istio/istio/tree/master/samples/addons
https://github.com/prometheus-operator/prometheus-operator
https://istio.io/latest/docs/ops/integrations/prometheus/
https://istio.io/latest/docs/ops/integrations/prometheus/#option-2-customized-scraping-configurations
https://istio.io/latest/docs/ops/integrations/prometheus/#tls-settings
https://www.reddit.com/r/istio/comments/usld1h/istio_mtls_and_prometheus_the_definitive/
https://superorbital.io/blog/istio-metrics-merging/ <--- Interesting read

View File

@@ -8,6 +8,10 @@ Refer to the specific `README.md` in each example for more information.
# Tree of folders
```shell
tree -d | grep -v src$
```
```text
├── 00-Troubleshooting
├── 01-Getting_Started
@@ -54,6 +58,10 @@ Refer to the specific `README.md` in each example for more information.
│   ├── 01-FaultInjection-delay
│   └── 02-FaultInjection-abort
├── 12-CircuitBreaking
├── 13-monitoring
│   ├── 01-Create_Prometheus_Stack
│   ├── 02-Add_Istio_Scrapping_Metrics
│   └── 03-Grafana_Istio_Dashboards
├── 90-MixConfigs
│   ├── 01-HTTPS-Gateway_Service_Entry
│   └── Minecraft