HA RKE2 Kubernetes Cluster on AWS


A production-ready High Availability RKE2 Kubernetes cluster deployed on AWS using Terraform Infrastructure-as-Code. This project implements industry best practices for deploying fault-tolerant Kubernetes clusters with multi-AZ distribution, automatic failover, and secure networking.

Architecture Overview

                                    INTERNET
                                        |
                             +----------+----------+
                             |   Internet Gateway  |
                             +----------+----------+
                                        |
            +---------------------------+---------------------------+
            |                           |                           |
    +-------+-------+           +-------+-------+           +-------+-------+
    | Public Subnet |           | Public Subnet |           | Public Subnet |
    |  10.0.1.0/24  |           |  10.0.2.0/24  |           |  10.0.3.0/24  |
    |     AZ-a      |           |     AZ-b      |           |     AZ-c      |
    +-------+-------+           +-------+-------+           +-------+-------+
            |                           |                           |
    +-------+-------+           +-------+-------+           +-------+-------+
    |  Control      |           |  Control      |           |  Control      |
    |  Plane-1      |           |  Plane-2      |           |  Plane-3      |
    |  (etcd)       |           |  (etcd)       |           |  (etcd)       |
    +-------+-------+           +-------+-------+           +-------+-------+
            |                           |                           |
            +-----------+---------------+-----------+---------------+
                        |                           |
             +----------+----------+     +----------+----------+
             |   Network Load      |     |   Cross-AZ etcd     |
             |   Balancer (NLB)    |     |   Replication       |
             |   :6443, :9345      |     +---------------------+
             +----------+----------+
                        |
    +-------------------+-------------------+
    |                   |                   |
+---+---+           +---+---+           +---+---+
|Worker |           |Worker |           |Worker |
|Node-1 |           |Node-2 |           |Node-3 |
+-------+           +-------+           +-------+

Features

  • High Availability: 3 Control Plane nodes with embedded etcd for quorum-based consensus
  • Multi-AZ Deployment: Nodes distributed across 3 Availability Zones for fault tolerance
  • Network Load Balancer: AWS NLB for API server high availability and automatic failover
  • Cilium CNI: eBPF-based container networking for high performance and advanced features
  • Security Hardened: Encrypted EBS volumes, restrictive security groups, tainted control plane nodes
  • Production Ready: Proper health checks, retry logic, and graceful cluster initialization
  • Modular Design: Clean Terraform module separation for maintainability

Quick Start

Prerequisites

  • Terraform >= 1.5.0
  • AWS CLI configured with appropriate credentials
  • SSH key pair for EC2 access
  • AWS account with permissions for VPC, EC2, ELB, and IAM

Deployment

# Clone the repository
git clone https://github.com/deviant101/ha-rke2-kubernetes-cluster.git
cd ha-rke2-kubernetes-cluster/terraform

# Copy and configure variables
cp terraform.tfvars.example terraform.tfvars
# Edit terraform.tfvars with your settings

# Initialize Terraform
terraform init

# Preview changes
terraform plan

# Deploy the cluster
terraform apply

Access the Cluster

# Fetch the kubeconfig (after deployment completes)
eval "$(terraform output -raw kubeconfig_command)"

# Verify cluster
export KUBECONFIG=./kubeconfig.yaml
kubectl get nodes
kubectl get pods -A

Configuration

| Variable | Default | Description |
|---|---|---|
| `aws_region` | `us-east-1` | AWS region for deployment |
| `cluster_name` | `rke2-ha-cluster` | Name of the Kubernetes cluster |
| `control_plane_count` | `3` | Number of control plane nodes |
| `worker_count` | `3` | Number of worker nodes |
| `control_plane_instance_type` | `t3.medium` | EC2 instance type for control plane |
| `worker_instance_type` | `t3.medium` | EC2 instance type for workers |
| `rke2_version` | `v1.34.6+rke2r1` | RKE2 version to install |
| `vpc_cidr` | `10.0.0.0/16` | VPC CIDR block |
| `pod_cidr` | `10.42.0.0/16` | Kubernetes Pod CIDR |
| `service_cidr` | `10.43.0.0/16` | Kubernetes Service CIDR |

See terraform/variables.tf for all configuration options.
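A minimal terraform.tfvars overriding the defaults in the table above might look like this (all values are illustrative placeholders, not recommendations):

```hcl
# terraform.tfvars -- illustrative values only; variable names follow the table above
aws_region                  = "eu-west-1"
cluster_name                = "rke2-ha-prod"
control_plane_count         = 3
worker_count                = 3
control_plane_instance_type = "t3.large"
worker_instance_type        = "t3.large"
rke2_version                = "v1.34.6+rke2r1"
```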

Documentation

| Document | Description |
|---|---|
| Architecture Guide | Detailed AWS infrastructure architecture and design decisions |
| HA RKE2 Guide | High Availability concepts and RKE2 specifics |
| Deployment Guide | Step-by-step deployment instructions |
| Flow Diagrams | Cluster initialization and join process flows |
| Troubleshooting | Common issues and solutions |

Module Structure

terraform/
├── main.tf                 # Root orchestration and provider config
├── variables.tf            # Input variable definitions
├── outputs.tf              # Output definitions
├── terraform.tfvars.example
└── modules/
    ├── vpc/                # VPC, subnets, IGW, route tables
    ├── security-groups/    # Security group rules for CP and workers
    ├── nlb/                # Network Load Balancer configuration
    ├── control-plane/      # Control plane EC2 instances
    └── workers/            # Worker node EC2 instances
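To sketch how these modules compose in main.tf (module directory names follow the tree above, but the input and output names here are assumptions, not the project's actual interface):

```hcl
# Illustrative module wiring -- input/output names are hypothetical
module "vpc" {
  source   = "./modules/vpc"
  vpc_cidr = var.vpc_cidr
}

module "nlb" {
  source     = "./modules/nlb"
  subnet_ids = module.vpc.public_subnet_ids
}
```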

High Availability Design

Control Plane HA

  • 3 etcd members using RKE2's embedded etcd
  • Raft consensus for leader election
  • Survives single node failure (2/3 quorum maintained)
  • Cross-AZ replication for datacenter fault tolerance
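The quorum arithmetic behind these bullets: an etcd cluster of n members needs floor(n/2) + 1 votes to make progress, so a 3-member cluster tolerates exactly one failure. A quick sketch:

```shell
# Fault tolerance of an etcd cluster: quorum = floor(n/2) + 1,
# tolerated failures = n - quorum
for n in 1 3 5; do
  quorum=$(( n / 2 + 1 ))
  echo "$n members: quorum=$quorum, tolerates $(( n - quorum )) failure(s)"
done
```

This is also why even member counts add no resilience: 4 members still only tolerate one failure, at the cost of an extra node.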

Network HA

  • Network Load Balancer distributes API traffic across all control plane nodes
  • Cross-zone load balancing enabled for even distribution
  • Health checks ensure traffic only routes to healthy nodes
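The health-check mechanism can be sketched as a Terraform target group with TCP checks against the API server port (resource names and threshold values here are illustrative, not taken from this project's modules):

```hcl
# Sketch: NLB target group that only routes to control plane nodes
# answering TCP on 6443 (values are illustrative)
resource "aws_lb_target_group" "kube_api" {
  name     = "rke2-kube-api"
  port     = 6443
  protocol = "TCP"
  vpc_id   = var.vpc_id

  health_check {
    protocol            = "TCP"
    healthy_threshold   = 3
    unhealthy_threshold = 3
  }
}
```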

Failure Scenarios

| Scenario | Impact | Recovery |
|---|---|---|
| 1 control plane failure | Cluster fully operational | Automatic (quorum maintained) |
| 2 control plane failures | API unavailable (etcd quorum lost) | Manual intervention required |
| 1 worker failure | Workloads rescheduled | Automatic (Kubernetes reschedules) |
| 1 AZ failure | Nodes in the remaining two AZs continue operating | Automatic (cross-AZ design) |

Security Considerations

Implemented

  • EBS volumes encrypted at rest
  • Security groups restrict traffic to VPC CIDR
  • RKE2 token marked as sensitive in Terraform
  • Control plane nodes tainted to prevent workload scheduling
  • Separate security groups for control plane and workers
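The encrypted-EBS bullet corresponds to a root block device setting like the following (a sketch using real `aws_instance` arguments, but with illustrative values rather than this project's actual module code):

```hcl
# Sketch: EC2 instance with its root EBS volume encrypted at rest
resource "aws_instance" "control_plane" {
  ami           = var.ami_id
  instance_type = "t3.medium"

  root_block_device {
    encrypted   = true
    volume_size = 50 # GiB, illustrative
  }
}
```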

Recommendations for Production

  • Restrict SSH access to specific IP ranges (not 0.0.0.0/0)
  • Consider VPN or bastion host for API access
  • Enable AWS CloudTrail for audit logging
  • Implement IAM roles for AWS API access from nodes
  • Use private subnets with NAT Gateway
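As a sketch of the first recommendation, an SSH ingress rule locked to an admin network could look like this (the CIDR and variable name are placeholders):

```hcl
# Restrict SSH ingress to a known admin CIDR instead of 0.0.0.0/0
resource "aws_security_group_rule" "ssh_admin_only" {
  type              = "ingress"
  from_port         = 22
  to_port           = 22
  protocol          = "tcp"
  cidr_blocks       = ["203.0.113.0/24"] # replace with your admin network
  security_group_id = var.control_plane_sg_id
}
```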

Outputs

After deployment, Terraform provides:

# Cluster endpoints
terraform output kubernetes_api_endpoint
terraform output rke2_registration_endpoint

# Node IPs
terraform output control_plane_public_ips
terraform output worker_public_ips

# SSH commands
terraform output ssh_control_plane_commands

# Kubeconfig retrieval
terraform output kubeconfig_command

Cleanup

# Destroy all resources
terraform destroy

Contributing

Contributions are welcome! Please read our contributing guidelines and submit pull requests for any enhancements.

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Note: This is a reference implementation. Always review and adapt security configurations for your specific production requirements.
