Vladimir Chavkov

Azure AKS Networking and Ingress in Production: Practical Guide


AKS is easy to create and deceptively hard to get networking right, especially at scale. Most AKS operational incidents I see are rooted in one of these:

- Pod or node IP exhaustion from undersized subnets
- Unpredictable outbound SNAT behavior (port exhaustion, surprise egress IPs)
- Private DNS drift in private clusters
- Inconsistent ingress and TLS operations across teams and clusters
This guide focuses on production-grade AKS networking and ingress patterns you can standardize across clusters.

Mental model: AKS networking layers

Think of AKS networking as a stack; most incidents map cleanly to one layer:

- VNet and subnets: the address space everything else draws from
- CNI: how pods get IPs (kubenet vs Azure CNI)
- Services and kube-proxy: in-cluster L4 routing
- Egress: how traffic leaves (Load Balancer SNAT, NAT Gateway, UDR)
- Ingress: how traffic enters (NGINX, AGIC) and where TLS terminates
- DNS: CoreDNS inside the cluster, Azure/private DNS outside

Choose the right CNI: kubenet vs Azure CNI

kubenet (legacy/simple)

Pods get IPs from a NATed, cluster-internal range; only nodes consume VNet IPs.

Trade-offs:

- Low VNet IP consumption and simple planning
- Pods are not directly routable from the VNet (extra NAT hop)
- Depends on a per-node route table, which caps cluster scale (roughly 400 nodes)
- Considered legacy; new production clusters should prefer Azure CNI

Azure CNI

Pods get IPs directly from the VNet subnet, or from a private overlay CIDR with Azure CNI Overlay.

Trade-offs:

- Pods are first-class VNet citizens: routable, visible to NSGs and VNet peering
- Traditional mode consumes VNet IPs aggressively, so subnet sizing is critical
- Azure CNI Overlay keeps VNet IP usage low while retaining the Azure dataplane
IP planning: the most important design step

You must plan:

- Node subnet size, including surge capacity for upgrades
- Pod address space (VNet IPs for traditional Azure CNI, an overlay CIDR otherwise)
- Service CIDR and DNS service IP, non-overlapping with the VNet and anything peered to it
- Headroom for additional node pools and future clusters in the same VNet

Quick rules of thumb

For traditional Azure CNI, the subnet must cover roughly:

(max nodes) * (max pods per node) + (max nodes, for the nodes themselves) + operational buffer

In practice, allocate a subnet much larger than your immediate need; resizing a subnet under a live cluster is painful.
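As a worked example (the node count, pods-per-node, and buffer below are hypothetical):

```shell
# Hypothetical sizing for a traditional Azure CNI cluster:
# 50 nodes, 30 max pods per node, one VNet IP per node, plus a buffer.
NODES=50
PODS_PER_NODE=30
BUFFER=200
REQUIRED=$((NODES * PODS_PER_NODE + NODES + BUFFER))
echo "IPs required: $REQUIRED"   # 1750
# A /21 provides 2048 addresses (2043 usable after Azure's 5 reserved),
# so it fits with headroom; a /24 (251 usable) would exhaust immediately.
```

Azure reserves 5 addresses in every subnet, which is why the usable count is always slightly below the power of two.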

Outbound traffic: understand your egress path

In AKS, outbound behavior depends on how the cluster is set up.

Common outbound types

- loadBalancer (the default): SNAT via outbound rules on the Standard Load Balancer
- managedNATGateway / userAssignedNATGateway: SNAT via an Azure NAT Gateway
- userDefinedRouting: egress follows your route table, typically to Azure Firewall or an NVA

Why NAT Gateway is often best

- Dynamic SNAT port allocation (64,512 ports per public IP) makes port exhaustion far less likely than with Load Balancer outbound rules
- Stable, explicit egress IPs that downstream systems can allowlist
- Separates the egress path from the ingress Load Balancer
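A sketch of wiring a user-assigned NAT Gateway to the node subnet; resource names are placeholders, and the subnet must be the one the cluster's nodes live in:

```shell
# Sketch: route AKS egress through a user-assigned NAT Gateway.
az network public-ip create -g rg-net -n pip-egress --sku Standard
az network nat gateway create -g rg-net -n natgw-prod \
  --public-ip-addresses pip-egress \
  --idle-timeout 4
az network vnet subnet update -g rg-net --vnet-name vnet-prod \
  -n snet-aks --nat-gateway natgw-prod
# For new clusters, make the egress path explicit at create time:
#   az aks create ... --outbound-type userAssignedNATGateway
```

Making the outbound type explicit at cluster creation is what prevents the "surprise egress IP" class of incident.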

Private AKS clusters

Private clusters reduce exposure by keeping the API server private.

Operational considerations:

- The API server resolves via a private DNS zone; peered VNets and on-prem resolvers must be linked to it, or you get the classic private DNS drift
- Operators and CI/CD runners need network line of sight (VPN/ExpressRoute, a jumpbox, or az aks command invoke)
- The cluster still needs a working egress path for image pulls and updates
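A sketch of the two relevant commands, with placeholder names. `az aks command invoke` tunnels a one-off command through the managed plane, which is handy when you have no VNet connectivity yet:

```shell
# Sketch: create a private cluster with a system-managed private DNS zone.
az aks create \
  --resource-group rg-aks-prod \
  --name aks-private \
  --enable-private-cluster \
  --private-dns-zone system

# No network line of sight? Run a command via the managed plane:
az aks command invoke \
  --resource-group rg-aks-prod \
  --name aks-private \
  --command "kubectl get nodes"
```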

DNS in AKS: common pitfalls

CoreDNS basics

AKS runs CoreDNS as the cluster DNS behind the kube-dns service IP. The base Corefile is AKS-managed; customizations belong in the coredns-custom ConfigMap, not in the managed config.

Common issues:

- The default ndots:5 causes several wasted upstream lookups for external names
- UDP conntrack races produce intermittent multi-second lookup delays
- CoreDNS CPU throttling under load when requests/limits are too tight
- Edits to the managed Corefile that get reverted on reconcile

Signals:

- Intermittent ~2-5s DNS latency on otherwise healthy services
- SERVFAIL or timeout spikes in CoreDNS logs
- NXDOMAIN storms from search-domain expansion
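AKS picks up overrides from a ConfigMap named coredns-custom in kube-system. A minimal sketch that forwards an internal zone to a private resolver; the zone name and resolver IP are placeholders:

```yaml
# Sketch: forward an internal zone to a private resolver via the
# AKS coredns-custom ConfigMap (zone and resolver IP are placeholders).
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns-custom
  namespace: kube-system
data:
  internal.server: |
    internal.example.com:53 {
      errors
      cache 30
      forward . 10.240.0.4
    }
```

CoreDNS does not watch this ConfigMap live; restart it to apply changes: `kubectl -n kube-system rollout restart deployment coredns`.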

Ingress options in AKS

You usually pick one of:

- NGINX Ingress Controller (self-managed, portable)
- AGIC (Application Gateway Ingress Controller, Azure-managed L7)
- Something else your platform already standardized on (the managed application routing add-on, Traefik, a service-mesh gateway)

NGINX Ingress Controller

Pros:

- Portable across clouds and clusters, with a large community and ecosystem
- Rich annotation surface: timeouts, body size, rewrites, rate limiting
- Behavior you can reproduce locally and in any environment

Cons:

- You operate it: upgrades, scaling, tuning, and CVE patching are yours
- Fronted by an L4 Azure Load Balancer, so preserving client IPs needs externalTrafficPolicy: Local, with its own trade-offs

AGIC (Application Gateway)

Pros:

- Azure-managed L7 data plane with built-in WAF integration
- TLS termination and certificate handling at the gateway
- No in-cluster proxy to scale or patch

Cons:

- Azure-specific; configuration syncs from the cluster to the gateway with noticeable latency
- Annotation and feature coverage differs from NGINX, so manifests are not portable
- Manual changes to the Application Gateway are overwritten by the controller

Production pattern: NGINX Ingress + cert-manager

This is a widely used approach for flexible ingress + automated TLS.
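A minimal ClusterIssuer sketch for the cert-manager side, assuming cert-manager is already installed and a recent version that supports `ingressClassName` in the HTTP-01 solver; the email is a placeholder:

```yaml
# Sketch: Let's Encrypt ClusterIssuer solving HTTP-01 via the nginx
# ingress class. Requires cert-manager; email is a placeholder.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: ops@example.com
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
      - http01:
          ingress:
            ingressClassName: nginx
```

An Ingress then opts in with the `cert-manager.io/cluster-issuer: letsencrypt-prod` annotation, and cert-manager populates the TLS secret it references.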

Example: NGINX Ingress install (Helm)

```shell
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm upgrade --install ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx \
  --create-namespace \
  --set controller.replicaCount=2 \
  --set controller.service.type=LoadBalancer
```

Example: Ingress resource

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app
  namespace: app
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "50m"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "60"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - app.example.com
      secretName: app-tls
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: app
                port:
                  number: 80
```

Production pattern: AGIC (high-level checklist)

Key items that usually cause incidents:

- The Application Gateway needs its own dedicated subnet, sized with room to scale
- The AGIC identity needs the right RBAC to write gateway configuration
- Never hand-edit the Application Gateway: AGIC reconciles and overwrites manual changes
- WAF policies can block health probes or large request bodies in non-obvious ways
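A sketch of enabling the add-on against an existing Application Gateway; group and resource names are placeholders:

```shell
# Sketch: enable AGIC against an existing Application Gateway.
APPGW_ID=$(az network application-gateway show \
  -g rg-net -n appgw-prod --query id -o tsv)

az aks enable-addons \
  --resource-group rg-aks-prod \
  --name aks-prod \
  --addons ingress-appgw \
  --appgw-id "$APPGW_ID"
```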

TLS strategy

Recommended standard:

- cert-manager with a ClusterIssuer (Let's Encrypt or an internal CA) for automated issuance and renewal
- Azure Key Vault with the Secrets Store CSI driver where certificates must be centrally managed

Operational best practices:

- Alert on certificate expiry well before the renewal window closes
- Standardize issuer names and TLS secret naming across clusters
- Terminate TLS at one consistent layer per ingress pattern, and document it
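A quick way to check what's actually in a TLS secret. The sketch below generates a throwaway self-signed cert so it runs anywhere; against a real cluster you would extract the secret instead (shown in the comment). The namespace and secret name match the earlier Ingress example and are otherwise placeholders:

```shell
# Stand-in: generate a throwaway self-signed cert to inspect.
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout /tmp/tls.key -out /tmp/tls.crt \
  -days 90 -subj "/CN=app.example.com" 2>/dev/null

# Real-cluster equivalent:
#   kubectl -n app get secret app-tls \
#     -o jsonpath='{.data.tls\.crt}' | base64 -d > /tmp/tls.crt

# Print subject and expiry of whatever cert is being served.
openssl x509 -noout -subject -enddate -in /tmp/tls.crt
```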

Troubleshooting runbook

1) Debug ingress routing

```shell
kubectl -n ingress-nginx get pods
kubectl -n ingress-nginx logs deploy/ingress-nginx-controller --tail=200
kubectl -n app describe ingress app
kubectl -n app get endpoints app
```

2) Debug LoadBalancer provisioning

```shell
kubectl -n ingress-nginx get svc
kubectl -n ingress-nginx describe svc ingress-nginx-controller
```

Common causes:

- The cluster identity lacks permissions on a bring-your-own VNet or subnet
- The subnet has run out of IPs for the load balancer frontend
- Conflicting or invalid service annotations (for example, an internal LB pointed at the wrong subnet)
- An NSG blocking the health probe port

3) Debug DNS inside the cluster

```shell
kubectl run -it --rm dnsutils \
  --image=registry.k8s.io/e2e-test-images/jessie-dnsutils:1.3 \
  --restart=Never -- sh
# then, inside the pod:
nslookup kubernetes.default.svc.cluster.local
nslookup app.example.com
```

4) Debug outbound connectivity

```shell
kubectl run -it --rm curl \
  --image=curlimages/curl:8.5.0 \
  --restart=Never -- sh
# then, inside the pod:
curl -I https://example.com
```

If egress fails, check:

- The cluster's outbound type and SNAT port usage (Load Balancer outbound rules vs NAT Gateway metrics)
- UDR/firewall rules for the FQDNs AKS needs (MCR, the API server, package repositories)
- NSGs on the node subnet
- DNS: a failing lookup often masquerades as an egress failure

Guardrails to standardize across clusters

- One CNI choice, standardized and codified in IaC
- Subnet sizing formulas baked into your VNet modules, not recomputed by hand
- An explicit outbound type on every cluster; never rely on the default
- One ingress + cert-manager pattern, rolled out the same way everywhere
- Automated private DNS zone linking for private clusters

Conclusion

AKS production networking is mostly about preventing avoidable problems: IP exhaustion, unpredictable outbound SNAT, private DNS drift, and ingress/TLS operational inconsistency. Standardize your CNI choice, size your subnets conservatively, make outbound explicit (NAT Gateway/UDR), and adopt a repeatable ingress + cert automation pattern.

