New 2
From the provided SNAT connection count chart, the backend IP addresses and
frontend IP address experiencing SNAT port exhaustion are:
Backend IP Addresses:
10.224.1.166
10.224.1.34
10.224.0.69
10.224.0.199
Immediate Mitigation:
Increase the number of pre-allocated SNAT ports per node from the default 1,024 to
3,000.
This can be done without needing additional public IPs: a single outbound public IP
provides 64,000 SNAT ports, which is enough for up to 21 nodes at 3,000 ports each.
Reduce TCP Idle Timeout:
Set the TCP idle timeout to 4 minutes to release idle connections faster.
This adjustment helps to free up SNAT ports more quickly and reduces the chance of
port exhaustion.
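The port budget behind this change can be checked with a few lines of Terraform.
This is a minimal sketch, not part of the original configuration: the variable,
local and output names are hypothetical, and it assumes the Azure limits of 64,000
SNAT ports per outbound public IP and port allocations in multiples of 8.

```hcl
# Hypothetical inputs; the defaults mirror the values proposed in this document.
variable "outbound_ip_count" {
  type    = number
  default = 1
}

variable "ports_per_node" {
  type    = number
  default = 3000

  validation {
    # Assumption: Azure allocates outbound SNAT ports in multiples of 8.
    condition     = var.ports_per_node % 8 == 0
    error_message = "The ports_per_node value must be a multiple of 8."
  }
}

locals {
  ports_per_ip = 64000 # SNAT ports provided by one outbound public IP
  total_ports  = local.ports_per_ip * var.outbound_ip_count
  max_nodes    = floor(local.total_ports / var.ports_per_node)
}

# Largest node count the outbound IPs can serve at this allocation.
# With the defaults above this evaluates to 21.
output "max_nodes_supported" {
  value = local.max_nodes
}
```

Nodes added temporarily during upgrades also need an allocation, so it is probably
wise to keep the planned node count somewhat below this ceiling.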
Use the metrics provided in the current setup to identify patterns in SNAT port
exhaustion.
Focus on the frontend IP address and backend IP address connections, along with the
destination IP addresses and ports, to pinpoint the specific services causing the
issue.
Pay attention to spikes in the metrics around specific times or nodes.
Examine Service Usage:
Analyze logs to identify the specific services or applications causing high SNAT
usage.
Monitor the outbound connection patterns of API services and SQL node pool
services.
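To catch the spikes described above as they happen, one option is a metric alert on
failed outbound connections. The following is a sketch under stated assumptions, not
the configuration used in this cluster: the variable names and alert name are
hypothetical, and it assumes the AKS-managed load balancer is named "kubernetes" and
lives in the cluster's node resource group.

```hcl
# Hypothetical inputs; none of these names come from the original configuration.
variable "resource_group_name" {
  type = string
}

variable "node_resource_group" {
  type        = string
  description = "Node resource group that holds the AKS-managed load balancer."
}

# Assumption: AKS names its managed outbound load balancer "kubernetes".
data "azurerm_lb" "aks_outbound" {
  name                = "kubernetes"
  resource_group_name = var.node_resource_group
}

resource "azurerm_monitor_metric_alert" "snat_failures" {
  name                = "aks-snat-failed-connections" # hypothetical name
  resource_group_name = var.resource_group_name
  scopes              = [data.azurerm_lb.aks_outbound.id]
  description         = "Failed outbound connections, a sign of SNAT port exhaustion."
  frequency           = "PT1M"
  window_size         = "PT5M"

  criteria {
    metric_namespace = "Microsoft.Network/loadBalancers"
    metric_name      = "SnatConnectionCount"
    aggregation      = "Total"
    operator         = "GreaterThan"
    threshold        = 0

    # Count only connections that failed to obtain a SNAT port.
    dimension {
      name     = "ConnectionState"
      operator = "Include"
      values   = ["Failed"]
    }
  }
}
```

No action group is attached here, so the alert would only appear in Azure Monitor;
notifications would need an additional action block.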
Immediate Mitigation
Option 1: Increase Pre-Allocated Ports per Node
Current Configuration:
Proposed Change:
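resource "azurerm_kubernetes_cluster" "example" {
  # The resource label "example" is a placeholder; required arguments such as
  # name, location, resource_group_name and dns_prefix are omitted here.

  default_node_pool {
    name       = "default"
    node_count = 3
    vm_size    = "Standard_DS2_v2"
  }

  network_profile {
    load_balancer_profile {
      managed_outbound_ip_count = 1
      outbound_ip_address_ids   = []
      outbound_ip_prefix_ids    = []

      # SNAT ports pre-allocated to each node (a multiple of 8). The azurerm
      # provider takes a single number here, not a min_count/max_count block.
      outbound_ports_allocated = 3000
    }

    network_plugin    = "azure"
    network_policy    = "calico"
    load_balancer_sku = "standard"
  }

  identity {
    type = "SystemAssigned"
  }
}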
***********************************************
Mid-term Mitigation
Option 2: Set TCP Idle Timeout to 4 Minutes
Current Configuration:
Proposed Change:
Set the TCP idle timeout to 4 minutes to release idle connections faster.
Terraform Configuration:
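resource "azurerm_kubernetes_cluster" "example" {
  # The resource label "example" is a placeholder; required arguments such as
  # name, location, resource_group_name and dns_prefix are omitted here.

  default_node_pool {
    name       = "default"
    node_count = 3
    vm_size    = "Standard_DS2_v2"
  }

  network_profile {
    load_balancer_profile {
      managed_outbound_ip_count = 1
      outbound_ip_address_ids   = []
      outbound_ip_prefix_ids    = []

      # SNAT ports pre-allocated to each node (a multiple of 8).
      outbound_ports_allocated = 3000

      # Release idle outbound flows after 4 minutes.
      idle_timeout_in_minutes = 4
    }

    network_plugin    = "azure"
    network_policy    = "calico"
    load_balancer_sku = "standard"
  }

  identity {
    type = "SystemAssigned"
  }
}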
*************************************************
Long-term Strategy
Option 3: Add Additional Outbound Public IPs
Current Configuration:
Proposed Change:
Add multiple outbound public IPs to increase the total number of available SNAT
ports.
Terraform Configuration:
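resource "azurerm_kubernetes_cluster" "example" {
  # The resource label "example" is a placeholder; required arguments such as
  # name, location, resource_group_name and dns_prefix are omitted here.

  default_node_pool {
    name       = "default"
    node_count = 3
    vm_size    = "Standard_DS2_v2"
  }

  network_profile {
    load_balancer_profile {
      # Use explicitly managed public IPs instead of managed_outbound_ip_count.
      outbound_ip_address_ids = [
        azurerm_public_ip.lb_public_ip[0].id,
        azurerm_public_ip.lb_public_ip[1].id
      ]

      # SNAT ports pre-allocated to each node (a multiple of 8).
      outbound_ports_allocated = 3000
      idle_timeout_in_minutes  = 4
    }

    network_plugin    = "azure"
    network_policy    = "calico"
    load_balancer_sku = "standard"
  }

  identity {
    type = "SystemAssigned"
  }
}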
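The configuration above refers to azurerm_public_ip.lb_public_ip[0] and [1], which
are not defined in this document. A minimal sketch of what that resource might look
like follows; the variables and the naming scheme are hypothetical, and it assumes
the standard load balancer needs static Standard-SKU addresses.

```hcl
# Hypothetical inputs; adjust to the resource group that hosts the cluster.
variable "resource_group_name" {
  type = string
}

variable "location" {
  type = string
}

# Two static Standard-SKU public IPs, addressable as
# azurerm_public_ip.lb_public_ip[0] and azurerm_public_ip.lb_public_ip[1].
resource "azurerm_public_ip" "lb_public_ip" {
  count               = 2
  name                = "aks-outbound-ip-${count.index}" # hypothetical naming
  resource_group_name = var.resource_group_name
  location            = var.location
  allocation_method   = "Static"
  sku                 = "Standard"
}
```

Each additional IP contributes another 64,000 SNAT ports, so two IPs at 3,000 ports
per node would cover up to 42 nodes.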
Monitoring and Diagnostics
Azure Monitor and Log Analytics:
Use Azure Monitor and Log Analytics to collect and analyze logs.
Network Watcher:
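resource "azurerm_monitor_diagnostic_setting" "example" {
  # The resource label "example" is a placeholder; required arguments such as
  # name, target_resource_id and a destination (for example
  # log_analytics_workspace_id) are omitted here.

  log {
    category = "kube-apiserver"
    enabled  = true

    retention_policy {
      enabled = true
      days    = 30
    }
  }

  metric {
    category = "AllMetrics"
    enabled  = true

    retention_policy {
      enabled = true
      days    = 30
    }
  }
}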
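A diagnostic setting needs a destination for the logs it collects. As a sketch only,
assuming a dedicated Log Analytics workspace is acceptable, it could be declared as
follows; the variables and the workspace name are hypothetical and do not come from
the original setup.

```hcl
# Hypothetical inputs; adjust to the resource group used for monitoring.
variable "resource_group_name" {
  type = string
}

variable "location" {
  type = string
}

# Workspace that receives the cluster's diagnostic logs and metrics.
resource "azurerm_log_analytics_workspace" "aks_logs" {
  name                = "aks-diagnostics-logs" # hypothetical name
  resource_group_name = var.resource_group_name
  location            = var.location
  sku                 = "PerGB2018"
  retention_in_days   = 30
}
```

The workspace's id attribute would then be passed to the diagnostic setting through
its log_analytics_workspace_id argument.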
Cost Implications
The cost implications of implementing these changes include:
Public IP Costs:
Adding more public IPs will incur additional costs. Each public IP address is
billed separately.
Log Analytics Costs:
Enabling diagnostic settings and sending logs to a Log Analytics workspace will
incur data ingestion and retention costs.
VM Costs:
There might be increased costs if the cluster scales up due to autoscaling and
increased node count.
Note: To get detailed cost estimates, use the Azure Pricing Calculator to input the
specific services and configurations you plan to use.
Cost Considerations:
Adjusting the idle timeout and pre-allocated ports per node can be done with
minimal cost impact.
Adding more public IPs will incur additional costs. Each Standard Public IP address
has an associated cost as per Azure pricing.
Monitor and optimize usage to balance between cost and performance.
Recommendations:
3. Scaling Considerations:
Regularly review and adjust the load balancer and node configurations based on the
cluster's scaling requirements.
Use the cluster autoscaler to automatically manage the number of nodes in response
to load changes.
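When the cluster autoscaler is enabled, its upper bound interacts with the SNAT port
budget: one outbound IP offers 64,000 ports, so at 3,000 ports per node it can serve
at most 21 nodes. The sketch below illustrates one way to keep the two in step. The
cluster name, DNS prefix, variables and node limits are hypothetical, and the
autoscaling argument name depends on the azurerm provider version.

```hcl
# Hypothetical inputs; none of these names come from the original configuration.
variable "resource_group_name" {
  type = string
}

variable "location" {
  type = string
}

resource "azurerm_kubernetes_cluster" "autoscaled" {
  name                = "aks-autoscaled" # hypothetical name
  location            = var.location
  resource_group_name = var.resource_group_name
  dns_prefix          = "aksautoscaled" # hypothetical prefix

  default_node_pool {
    name    = "default"
    vm_size = "Standard_DS2_v2"

    # Argument name used by azurerm 3.x; 4.x renames it to auto_scaling_enabled.
    enable_auto_scaling = true
    min_count           = 3

    # Kept below the 21 nodes that one outbound IP supports at 3,000 ports each.
    max_count = 20
  }

  network_profile {
    network_plugin    = "azure"
    load_balancer_sku = "standard"

    load_balancer_profile {
      managed_outbound_ip_count = 1
      outbound_ports_allocated  = 3000
      idle_timeout_in_minutes   = 4
    }
  }

  identity {
    type = "SystemAssigned"
  }
}
```

Raising max_count beyond this ceiling would call for either more outbound IPs or a
smaller per-node port allocation.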
Conclusion
By increasing the number of pre-allocated ports per node, reducing the TCP idle
timeout, and adding additional outbound public IPs, you can effectively mitigate
SNAT port exhaustion in your AKS cluster. Implementing monitoring and diagnostic
tools will help you continuously analyze and optimize your setup.