diff --git a/13-monitoring/01-Create_Prometheus_Stack/README.md b/13-monitoring/01-Create_Prometheus_Stack/README.md
new file mode 100644
index 0000000..631f2c1
--- /dev/null
+++ b/13-monitoring/01-Create_Prometheus_Stack/README.md
@@ -0,0 +1,353 @@
+---
+gitea: none
+include_toc: true
+---
+
+# Description
+
+This example deploys a Prometheus stack (Prometheus, Grafana, Alertmanager) through Helm.
+
+It will be used as a base for the examples that follow.
+
+A base knowledge of Istio is heavily recommended before you start modifying the settings to fit your needs.
+
+## Requirements
+
+- Istio deployed and running in the namespace `istio-system`.
+- Helm installed.
+
+# Istio Files
+
+## Gateway
+
+Simple HTTP gateway.
+
+It only allows traffic for the host `my.home` and the subdomains of `filter.home`.
+
+It listens on port 80 and expects HTTP (unencrypted) requests.
+
+> **Note:**
+> I assume the Gateway is already deployed, so the walkthrough neither mentions nor specifies it. If you don't have a gateway, deploy one before continuing.
+
+```yaml
+apiVersion: networking.istio.io/v1alpha3
+kind: Gateway
+metadata:
+  name: local-gateway
+  namespace: default
+spec:
+  selector:
+    istio: local-ingress
+  servers:
+    - port:
+        number: 80
+        name: http
+        protocol: HTTP
+      hosts:
+        - "my.home"
+        - "*.filter.home"
+```
+
+## VirtualService.yaml
+
+Two simple VirtualServices for the Grafana and Prometheus services/dashboards.
+
+The URLs are:
+
+- prometheus.my.home
+
+- grafana.my.home
+
+```yaml
+apiVersion: networking.istio.io/v1alpha3
+kind: VirtualService
+metadata:
+  name: grafana-vs
+  namespace: default
+  labels:
+    app: grafana
+spec:
+  hosts:
+    - "grafana.my.home"
+  gateways:
+    - default/local-gateway
+  http:
+    - route:
+        - destination:
+            host: prometheus-stack-01-grafana.observability.svc.cluster.local
+            port:
+              number: 80
+---
+apiVersion: networking.istio.io/v1alpha3
+kind: VirtualService
+metadata:
+  name: prometheus-vs
+  namespace: observability
+  labels:
+    app: prometheus
+spec:
+  hosts:
+    - "prometheus.my.home"
+  gateways:
+    - default/local-gateway
+  http:
+    - route:
+        - destination:
+            host: prometheus-stack-01-kube-p-prometheus.observability.svc.cluster.local
+            port:
+              number: 9090
+```
+
+# Walkthrough
+
+## Create Observability NS
+
+```shell
+kubectl create namespace observability
+```
+
+For now the **istio-injection** label is set to `disabled`; it will be enabled after the installation is completed.
+
+If istio-injection is enabled, the Helm installation will **fail**.
+
+I still have to check why.
+
+```shell
+kubectl label namespaces observability istio-injection=disabled --overwrite=true
+```
+
+## PersistentVolume
+
+(Optional) I'm using an NFS provisioner; you can use whatever you want.
+
+In the file `stack_values.yaml` I specify that two volumes will be provisioned: one for Prometheus and another one for Alertmanager.
+
+If you don't want to provision volumes, leave that file blank, or remove the `--values` line from the installation command.
+
+I also increased the retention from the default 10 days to 30 days; without a volume that hardly matters, since the data is lost whenever the pod restarts.
+
+## Installation
+
+I will be installing the Prometheus Operator through Helm.
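+
+Optionally, before installing, you can sanity-check that the StorageClass referenced in `stack_values.yaml` actually exists; `slow-nfs-01` is the name from my setup, so adjust it to yours:
+
+```shell
+kubectl get storageclass slow-nfs-01
+```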
+
+```shell
+helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
+```
+
+```text
+"prometheus-community" has been added to your repositories
+```
+
+```shell
+helm show values prometheus-community/kube-prometheus-stack
+```
+
+```text
+A lot of text; it's recommended to save the output to a file and go through it (or at least use Ctrl+F or whatever other search option to find the settings you might be interested in changing).
+```
+
+My `stack_values.yaml` file is:
+
+```yaml
+prometheus:
+  prometheusSpec:
+    retention: "30d"
+    storageSpec:
+      volumeClaimTemplate:
+        spec:
+          storageClassName: slow-nfs-01
+          accessModes: [ReadWriteOnce]
+          resources:
+            requests:
+              storage: 50Gi
+alertmanager:
+  alertmanagerSpec:
+    storage:
+      volumeClaimTemplate:
+        spec:
+          storageClassName: slow-nfs-01
+          accessModes: [ReadWriteOnce]
+          resources:
+            requests:
+              storage: 10Gi
+```
+
+Besides the volumes mentioned [here](#persistentvolume), it increases the retention from 10 days to 30.
+
+If you haven't configured PersistentVolume storage, just skip the `--values` line referencing the file. Note that once the pod is restarted, all data will be lost.
+
+```shell
+helm install prometheus-stack-01 prometheus-community/kube-prometheus-stack \
+  -n observability \
+  --values ./src/stack_values.yaml
+```
+
+```text
+NAME: prometheus-stack-01
+LAST DEPLOYED: Sun Jan 14 22:34:11 2024
+NAMESPACE: observability
+STATUS: deployed
+REVISION: 1
+NOTES:
+kube-prometheus-stack has been installed. Check its status by running:
+  kubectl --namespace observability get pods -l "release=prometheus-stack-01"
+
+Visit https://github.com/prometheus-operator/kube-prometheus for instructions on how to create & configure Alertmanager and Prometheus instances using the Operator.
+```
+
+### Check running pods in namespace
+
+Everything seems to be deployed and working correctly.
+
+```shell
+kubectl get pods -n observability
+```
+
+```text
+NAME                                                      READY   STATUS    RESTARTS   AGE
+alertmanager-prometheus-stack-01-kube-p-alertmanager-0    2/2     Running   0          73s
+prometheus-prometheus-stack-01-kube-p-prometheus-0        2/2     Running   0          73s
+prometheus-stack-01-grafana-69bd95649b-w67xg              3/3     Running   0          76s
+prometheus-stack-01-kube-p-operator-b97d5f9cc-cm2pl       1/1     Running   0          76s
+prometheus-stack-01-kube-state-metrics-554fd7bf8b-z62gv   1/1     Running   0          76s
+prometheus-stack-01-prometheus-node-exporter-7bwbd        1/1     Running   0          76s
+prometheus-stack-01-prometheus-node-exporter-dvqc6        1/1     Running   0          76s
+prometheus-stack-01-prometheus-node-exporter-nfm5g        1/1     Running   0          76s
+prometheus-stack-01-prometheus-node-exporter-ssfkb        1/1     Running   0          76s
+```
+
+### Enable Istio Injection
+
+Let's re-enable istio-injection on the namespace.
+
+```shell
+kubectl label namespaces observability istio-injection=enabled --overwrite=true
+```
+
+### Delete all pods so they are recreated with the Istio sidecar
+
+To get the sidecar injected into the existing containers, we will need to delete/recreate all of the pods.
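+
+If you'd rather not delete pods wholesale, a rollout restart of each workload recreates them too; a sketch (note that the Prometheus and Alertmanager StatefulSets are operator-managed, so deleting their pods, as done below, is the surer route):
+
+```shell
+# Recreate the pods of every Deployment, StatefulSet and DaemonSet in the namespace
+kubectl rollout restart deployment -n observability
+kubectl rollout restart statefulset -n observability
+kubectl rollout restart daemonset -n observability
+```
+
+Here I take the simpler route and delete every pod in the namespace: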
+
+```shell
+kubectl delete pods -n observability --all
+```
+
+```text
+pod "alertmanager-prometheus-stack-01-kube-p-alertmanager-0" deleted
+pod "prometheus-prometheus-stack-01-kube-p-prometheus-0" deleted
+pod "prometheus-stack-01-grafana-69bd95649b-w67xg" deleted
+pod "prometheus-stack-01-kube-p-operator-b97d5f9cc-cm2pl" deleted
+pod "prometheus-stack-01-kube-state-metrics-554fd7bf8b-z62gv" deleted
+pod "prometheus-stack-01-prometheus-node-exporter-7bwbd" deleted
+pod "prometheus-stack-01-prometheus-node-exporter-dvqc6" deleted
+pod "prometheus-stack-01-prometheus-node-exporter-nfm5g" deleted
+pod "prometheus-stack-01-prometheus-node-exporter-ssfkb" deleted
+```
+
+### Check pods status (again)
+
+Everything seems to be deployed and working correctly. Note the extra sidecar container in the READY column; the node-exporter pods stay at 1/1 because they run on the host network and are skipped by the injection.
+
+```shell
+kubectl get pods -n observability
+```
+
+```text
+NAME                                                      READY   STATUS    RESTARTS      AGE
+alertmanager-prometheus-stack-01-kube-p-alertmanager-0    3/3     Running   0             44s
+prometheus-prometheus-stack-01-kube-p-prometheus-0        3/3     Running   0             43s
+prometheus-stack-01-grafana-69bd95649b-24v58              4/4     Running   0             46s
+prometheus-stack-01-kube-p-operator-b97d5f9cc-5bdwh       2/2     Running   1 (43s ago)   46s
+prometheus-stack-01-kube-state-metrics-554fd7bf8b-wjw4d   2/2     Running   2 (41s ago)   46s
+prometheus-stack-01-prometheus-node-exporter-4266g        1/1     Running   0             46s
+prometheus-stack-01-prometheus-node-exporter-lmxdj        1/1     Running   0             45s
+prometheus-stack-01-prometheus-node-exporter-shd72        1/1     Running   0             45s
+prometheus-stack-01-prometheus-node-exporter-wjhdr        1/1     Running   0             45s
+```
+
+### Gateway
+
+I have my gateways already created (in this scenario I will be using the local gateway).
+
+### VirtualService
+
+I will create 2 VirtualService entries, one for the Grafana dashboard and another for the Prometheus dashboard:
+
+- Prometheus dashboard URL: "prometheus.my.home"
+- Grafana dashboard URL: "grafana.my.home"
+
+```shell
+kubectl apply -f ./src/VirtualService.yaml
+```
+
+```text
+virtualservice.networking.istio.io/grafana-vs created
+virtualservice.networking.istio.io/prometheus-vs created
+```
+
+## Prometheus
+
+As a simple check that Kubernetes metrics are being collected, you can run the following PromQL queries:
+
+### Running pods per node
+
+Each returned series carries a `node` label, matching one of the Kubernetes nodes available within the cluster.
+
+```promql
+kubelet_running_pods
+```
+
+### Running pods per namespace
+
+Right now, in the namespace "observability" I have a total of 9 pods running.
+
+```promql
+sum(kube_pod_status_ready) by (namespace)
+```
+
+You can verify this by running:
+
+```shell
+kubectl get pods -n observability --no-headers=true | nl
+```
+
+```text
+     1  alertmanager-prometheus-stack-01-kube-p-alertmanager-0    3/3   Running   0             40m
+     2  prometheus-prometheus-stack-01-kube-p-prometheus-0        3/3   Running   0             40m
+     3  prometheus-stack-01-grafana-69bd95649b-24v58              4/4   Running   0             40m
+     4  prometheus-stack-01-kube-p-operator-b97d5f9cc-5bdwh       2/2   Running   1 (40m ago)   40m
+     5  prometheus-stack-01-kube-state-metrics-554fd7bf8b-wjw4d   2/2   Running   2 (40m ago)   40m
+     6  prometheus-stack-01-prometheus-node-exporter-4266g        1/1   Running   0             40m
+     7  prometheus-stack-01-prometheus-node-exporter-lmxdj        1/1   Running   0             40m
+     8  prometheus-stack-01-prometheus-node-exporter-shd72        1/1   Running   0             40m
+     9  prometheus-stack-01-prometheus-node-exporter-wjhdr        1/1   Running   0             40m
+```
+
+This returns a total of 9 pods, all with status "Running".
+
+### Running containers per namespace
+
+Currently, the query below returns 18 containers running in the namespace **observability**. As a quick cross-check before running it, you can count the containers straight from the Kubernetes API.
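+
+A sketch using jsonpath; it prints one line per container reported across all pods in the namespace, then counts them:
+
+```shell
+kubectl get pods -n observability \
+  -o jsonpath='{range .items[*].status.containerStatuses[*]}{.name}{"\n"}{end}' | wc -l
+```
+
+The PromQL equivalent: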
+
+```promql
+sum(kube_pod_container_status_running) by (namespace)
+```
+
+Listing the pods in the namespace once more and adding up the values in the READY column confirms that the containers running within the namespace total up to 18, matching the Prometheus data.
+
+```shell
+kubectl get pods -n observability
+```
+
+```text
+NAME                                                      READY   STATUS    RESTARTS      AGE
+alertmanager-prometheus-stack-01-kube-p-alertmanager-0    3/3     Running   0             45m
+prometheus-prometheus-stack-01-kube-p-prometheus-0        3/3     Running   0             45m
+prometheus-stack-01-grafana-69bd95649b-24v58              4/4     Running   0             45m
+prometheus-stack-01-kube-p-operator-b97d5f9cc-5bdwh       2/2     Running   1 (45m ago)   45m
+prometheus-stack-01-kube-state-metrics-554fd7bf8b-wjw4d   2/2     Running   2 (45m ago)   45m
+prometheus-stack-01-prometheus-node-exporter-4266g        1/1     Running   0             45m
+prometheus-stack-01-prometheus-node-exporter-lmxdj        1/1     Running   0             45m
+prometheus-stack-01-prometheus-node-exporter-shd72        1/1     Running   0             45m
+prometheus-stack-01-prometheus-node-exporter-wjhdr        1/1     Running   0             45m
+```
diff --git a/13-monitoring/01-Create_Prometheus_Stack/src/Gateway.yaml b/13-monitoring/01-Create_Prometheus_Stack/src/Gateway.yaml
new file mode 100644
index 0000000..2700e04
--- /dev/null
+++ b/13-monitoring/01-Create_Prometheus_Stack/src/Gateway.yaml
@@ -0,0 +1,16 @@
+apiVersion: networking.istio.io/v1alpha3
+kind: Gateway
+metadata:
+  name: local-gateway
+  namespace: default
+spec:
+  selector:
+    istio: local-ingress
+  servers:
+    - port:
+        number: 80
+        name: http
+        protocol: HTTP
+      hosts:
+        - "my.home"
+        - "*.filter.home"
\ No newline at end of file
diff --git a/13-monitoring/01-Create_Prometheus_Stack/src/VirtualService.yaml b/13-monitoring/01-Create_Prometheus_Stack/src/VirtualService.yaml
new file mode 100644
index 0000000..69668bb
--- /dev/null
+++ b/13-monitoring/01-Create_Prometheus_Stack/src/VirtualService.yaml
@@ -0,0 +1,37 @@
+apiVersion: networking.istio.io/v1alpha3
+kind: VirtualService
+metadata:
+  name: grafana-vs
+  namespace: default
+  labels:
+    app: grafana
+spec:
+  hosts:
+    - "grafana.my.home"
+  gateways:
+    - default/local-gateway
+  http:
+    - route:
+        - destination:
+            host: prometheus-stack-01-grafana.observability.svc.cluster.local
+            port:
+              number: 80
+---
+apiVersion: networking.istio.io/v1alpha3
+kind: VirtualService
+metadata:
+  name: prometheus-vs
+  namespace: observability
+  labels:
+    app: prometheus
+spec:
+  hosts:
+    - "prometheus.my.home"
+  gateways:
+    - default/local-gateway
+  http:
+    - route:
+        - destination:
+            host: prometheus-stack-01-kube-p-prometheus.observability.svc.cluster.local
+            port:
+              number: 9090
\ No newline at end of file
diff --git a/13-monitoring/01-Create_Prometheus_Stack/src/stack_values.yaml b/13-monitoring/01-Create_Prometheus_Stack/src/stack_values.yaml
new file mode 100644
index 0000000..42cde5a
--- /dev/null
+++ b/13-monitoring/01-Create_Prometheus_Stack/src/stack_values.yaml
@@ -0,0 +1,21 @@
+prometheus:
+  prometheusSpec:
+    retention: "30d"
+    storageSpec:
+      volumeClaimTemplate:
+        spec:
+          storageClassName: slow-nfs-01
+          accessModes: [ReadWriteOnce]
+          resources:
+            requests:
+              storage: 50Gi
+alertmanager:
+  alertmanagerSpec:
+    storage:
+      volumeClaimTemplate:
+        spec:
+          storageClassName: slow-nfs-01
+          accessModes: [ReadWriteOnce]
+          resources:
+            requests:
+              storage: 10Gi
\ No newline at end of file
diff --git a/13-monitoring/02-Add_Istio_Scrapping_Metrics/README.md b/13-monitoring/02-Add_Istio_Scrapping_Metrics/README.md
new file mode 100644
index 0000000..e6a0669
--- /dev/null
+++ b/13-monitoring/02-Add_Istio_Scrapping_Metrics/README.md
@@ -0,0 +1,60 @@
+## Description
+
+Through the use of Prometheus CRDs, we deploy PodMonitor and ServiceMonitor objects, which will scrape metrics from the Envoy proxies attached to each pod and from the Istiod deployment.
+
+## Requirements
+
+- Complete step [01-Create_Prometheus_Stack](../01-Create_Prometheus_Stack)
+
+## Istio Metrics
+
+Now we have a functional Prometheus-Grafana-Alertmanager setup.
+
+The next step is to deploy Prometheus scraping jobs/configs to gather:
+
+- Envoy proxy metrics
+- Istiod metrics.
+
+> **Note**: \
+> The monitors deployed are based on the [Istio Prometheus Operator Example](https://github.com/istio/istio/blob/1.20.2/samples/addons/extras/prometheus-operator.yaml)
+
+```shell
+kubectl create -f ./src/PrometheusIstioAgent.yaml
+```
+
+```text
+servicemonitor.monitoring.coreos.com/istiod-metrics-monitor created
+podmonitor.monitoring.coreos.com/envoy-stats-monitor created
+```
+
+The list of Prometheus targets gets updated automatically after a bit; give it a minute or two (get off the PC and stretch your legs).
+
+### Check Targets
+
+Once the configuration is picked up, open the Prometheus web UI and go to **Status > Targets** to list all the available targets.
+
+Once there, I am able to see the following entries:
+
+- **podMonitor/observability/envoy-stats-monitor/0 (15/15 up)**
+
+- **serviceMonitor/observability/istiod-metrics-monitor/0 (2/2 up)**
+
+### Check through Prometheus queries
+
+Now, back in the **Graph** section, we can confirm whether we are receiving metrics from **Istiod** and **Envoy**.
+
+#### Istiod
+
+Very simple and straightforward: the uptime for each of the **Istiod** pods.
+
+```promql
+istiod_uptime_seconds
+```
+
+#### Envoy
+
+Requests grouped by `destination_service_name`, via the query below. If nothing has passed through the mesh yet, it may come back empty; you can generate a little traffic first.
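+
+A sketch, assuming the `grafana.my.home` host from the previous step resolves to your ingress gateway from wherever you run it:
+
+```shell
+for i in $(seq 1 20); do curl -s -o /dev/null http://grafana.my.home/; done
+```
+
+With some requests flowing, the per-service totals show up: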
+
+```promql
+sum(istio_requests_total) by (destination_service_name)
+```
\ No newline at end of file
diff --git a/13-monitoring/02-Add_Istio_Scrapping_Metrics/src/PrometheusIstioAgent.yaml b/13-monitoring/02-Add_Istio_Scrapping_Metrics/src/PrometheusIstioAgent.yaml
new file mode 100644
index 0000000..cf9d677
--- /dev/null
+++ b/13-monitoring/02-Add_Istio_Scrapping_Metrics/src/PrometheusIstioAgent.yaml
@@ -0,0 +1,66 @@
+apiVersion: monitoring.coreos.com/v1
+kind: ServiceMonitor
+metadata:
+  labels:
+    app: kube-prometheus-stack-prometheus
+    release: prometheus-stack-01
+  name: istiod-metrics-monitor
+  namespace: observability
+spec:
+  jobLabel: istio
+  targetLabels: [app]
+  selector:
+    matchExpressions:
+      - {key: istio, operator: In, values: [pilot]}
+  namespaceSelector:
+    any: true
+  endpoints:
+    - port: http-monitoring
+      interval: 15s
+---
+apiVersion: monitoring.coreos.com/v1
+kind: PodMonitor
+metadata:
+  name: envoy-stats-monitor
+  labels:
+    app: kube-prometheus-stack-prometheus
+    release: prometheus-stack-01
+  namespace: observability
+spec:
+  selector:
+    matchExpressions:
+      - {key: istio-prometheus-ignore, operator: DoesNotExist}
+  namespaceSelector:
+    any: true
+  jobLabel: envoy-stats
+  podMetricsEndpoints:
+    - path: /stats/prometheus
+      interval: 15s
+      relabelings:
+        - action: keep
+          sourceLabels: [__meta_kubernetes_pod_container_name]
+          regex: "istio-proxy"
+        - action: keep
+          sourceLabels: [__meta_kubernetes_pod_annotationpresent_prometheus_io_scrape]
+        - action: replace
+          regex: (\d+);(([A-Fa-f0-9]{1,4}::?){1,7}[A-Fa-f0-9]{1,4})
+          replacement: '[$2]:$1'
+          sourceLabels:
+            - __meta_kubernetes_pod_annotation_prometheus_io_port
+            - __meta_kubernetes_pod_ip
+          targetLabel: __address__
+        - action: replace
+          regex: (\d+);((([0-9]+?)(\.|$)){4})
+          replacement: $2:$1
+          sourceLabels:
+            - __meta_kubernetes_pod_annotation_prometheus_io_port
+            - __meta_kubernetes_pod_ip
+          targetLabel: __address__
+        - action: labeldrop
+          regex: "__meta_kubernetes_pod_label_(.+)"
+        - sourceLabels: [__meta_kubernetes_namespace]
+          action: replace
+          targetLabel: namespace
+        - sourceLabels: [__meta_kubernetes_pod_name]
+          action: replace
+          targetLabel: pod_name
\ No newline at end of file
diff --git a/13-monitoring/03-Grafana_Istio_Dashboards/README.md b/13-monitoring/03-Grafana_Istio_Dashboards/README.md
new file mode 100644
index 0000000..2886d21
--- /dev/null
+++ b/13-monitoring/03-Grafana_Istio_Dashboards/README.md
@@ -0,0 +1,58 @@
+## Description
+
+Shares some dashboards ready to use once the Istio metrics are added to the Prometheus stack.
+
+This is extremely simple, to be honest.
+
+## Requirements
+
+- Complete step [02-Add_Istio_Scrapping_Metrics](../02-Add_Istio_Scrapping_Metrics)
+
+## Grafana
+
+### Default credentials
+
+> **Note:** \
+> Since Grafana has no storage/volume, **all changes will be lost** once the pod restarts.
+
+User: admin
+Password: prom-operator
+
+Just check any dashboard to see if it's working correctly.
+
+I personally recommend the dashboard:
+
+- **Node Exporter / USE Method / Node**
+
+It lists the resource utilization for each one of the nodes.
+
+Check whatever you want, though; there are some good predefined graphs already.
+
+### Want to change credentials?
+
+Just log in as the admin user and change whatever you want: username, email, password, preferences...
+
+### Want to manage/create Users/Teams?
+
+Select `Administration` > `Users and Access`.
+
+There you will be able to create/manage **Users**, **Teams** and **Service Accounts**.
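+
+If you'd rather keep the admin credentials and dashboards across restarts, the chart can configure the Grafana admin password and persistence declaratively; a sketch using the Grafana subchart's standard keys (`my-new-password` is a placeholder, and `slow-nfs-01` is the StorageClass from my setup):
+
+```shell
+helm upgrade prometheus-stack-01 prometheus-community/kube-prometheus-stack \
+  -n observability \
+  --values ./src/stack_values.yaml \
+  --set grafana.adminPassword='my-new-password' \
+  --set grafana.persistence.enabled=true \
+  --set grafana.persistence.storageClassName=slow-nfs-01 \
+  --set grafana.persistence.size=5Gi
+```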
+
+### Istio related Dashboards
+
+Here is a list of ready-to-go Istio related dashboards that you might want to set up on your Grafana deployment:
+
+- https://grafana.com/grafana/dashboards/7630-istio-workload-dashboard/
+- https://grafana.com/grafana/dashboards/7636-istio-service-dashboard/
+- https://grafana.com/grafana/dashboards/7645-istio-control-plane-dashboard/
+- https://grafana.com/grafana/dashboards/7639-istio-mesh-dashboard/
+
+The dashboards were found here:
+
+- https://grafana.com/orgs/istio/dashboards
+
+
diff --git a/13-monitoring/README.md b/13-monitoring/README.md
new file mode 100644
index 0000000..9cac48d
--- /dev/null
+++ b/13-monitoring/README.md
@@ -0,0 +1,28 @@
+Currently in progress, more or less.
+
+Note that in this set of examples, steps 1, 2 and 3 are incremental; each one uses resources set up in the previous examples.
+
+
+
+An Alertmanager example could be as simple as "when service X returns 80% 503s, raise an alert".
+
+
+
+
+## Related links
+
+https://raw.githubusercontent.com/istio/istio/release-1.17/samples/httpbin/sample-client/fortio-deploy.yaml
+
+https://github.com/istio/istio/tree/master/samples/addons
+
+https://github.com/prometheus-operator/prometheus-operator
+
+https://istio.io/latest/docs/ops/integrations/prometheus/
+
+https://istio.io/latest/docs/ops/integrations/prometheus/#option-2-customized-scraping-configurations
+
+https://istio.io/latest/docs/ops/integrations/prometheus/#tls-settings
+
+https://www.reddit.com/r/istio/comments/usld1h/istio_mtls_and_prometheus_the_definitive/
+
+https://superorbital.io/blog/istio-metrics-merging/ <--- Interesting read
\ No newline at end of file
diff --git a/README.md b/README.md
index b02df5b..22c8fdc 100755
--- a/README.md
+++ b/README.md
@@ -8,6 +8,10 @@ Refer to the specific `README.md` in each example for more information.
 # Tree of folders
+```shell
+tree -d | grep -v src$
+```
+
 ```text
 ├── 00-Troubleshooting
 ├── 01-Getting_Started
 │   ├── 01-FaultInjection-delay
 │   └── 02-FaultInjection-abort
 ├── 12-CircuitBreaking
+├── 13-monitoring
+│   ├── 01-Create_Prometheus_Stack
+│   ├── 02-Add_Istio_Scrapping_Metrics
+│   └── 03-Grafana_Istio_Dashboards
 ├── 90-MixConfigs
 │   ├── 01-HTTPS-Gateway_Service_Entry
 │   └── Minecraft