Commit 26d205e

Merge pull request #249 from Azure/dev/nl/updateFullyConvergedDoc
update and refine the fully converged design after team discussion
2 parents 1ba703b + bf20dc6 commit 26d205e

3 files changed

Lines changed: 49 additions & 62 deletions

TSG/EnvironmentValidator/Networking/Troubleshoot-Network-Test-StorageConnections-ConnectivityCheck.md

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -252,7 +252,7 @@ In converged deployments, the Storage Connections validator will create a tempor
252252

253253
4. If any ping fails, check the following:
254254

255-
- That the VLANs are correctly configured on the TOR switches. In a converged deployment, both storage VLANs should be configured on the interface.
255+
- That the VLANs are correctly configured on the TOR switches. In a converged deployment, each storage VLAN should be configured on its respective ToR switch (Storage VLAN A on ToR-A, Storage VLAN B on ToR-B).
256256
- That physical NICs are connected to the correct ports on the TOR switches.
257257
- That no VLANs are configured on the physical NICs.
258258
- That no firewall rules or other configuration are blocking APIPA traffic.

TSG/Networking/Top-Of-Rack-Switch/Overview-Azure-Local-Deployment-Pattern.md

Lines changed: 13 additions & 20 deletions
@@ -72,7 +72,7 @@ A high-performance design utilizing dedicated NICs for management/compute and st
7272
![Switched with 2 ToRs](images/AzureLocalPhysicalNetworkDiagram_Switched.png)
7373

7474
**Fully Converged Deployment**
75-
A balanced design where all traffic types (management, compute, storage) share the same physical NICs through VLAN segmentation. This pattern minimizes hardware footprint while maintaining high scalability. **Both storage VLANs must be configured on both ToR switches** because SET (Switch Embedded Teaming) may route either storage VLAN through either physical NIC.
75+
A balanced design where all traffic types (management, compute, storage) share the same physical NICs through VLAN segmentation. This pattern minimizes hardware footprint while maintaining high scalability. The **recommended** configuration uses **one storage VLAN per ToR switch**: Storage VLAN A on ToR-A (mapped to one physical NIC) and Storage VLAN B on ToR-B (mapped to the other physical NIC). In failure scenarios (NIC or ToR), SMB/RDMA traffic automatically fails over to the remaining path.
7676

7777
![Fully-Converged with 2 ToRs](images/AzureLocalPhysicalNetworkDiagram_FullyConverged.png)
7878

@@ -82,11 +82,11 @@ A balanced design where all traffic types (management, compute, storage) share t
8282
| Deployment Pattern | Host NIC Configuration | ToR Switch VLAN Configuration | Primary Use Cases |
8383
|---------------------|------------------------|-------------------------------|-------------------|
8484
| **Switchless** | 2 NICs to switches (M+C traffic) + (N−1) direct inter-node NICs (S traffic) | Trunk ports with M, C VLANs only; no storage VLANs on ToRs | Edge deployments, remote sites, cost-sensitive environments |
85-
| **Switched** | 4 NICs per host: 2 for M+C traffic, 2 dedicated for storage | M and C VLANs on both ToRs; S1 VLAN on ToR1 only, S2 VLAN on ToR2 only (dedicated storage NICs) | Enterprise deployments requiring dedicated storage performance and traffic isolation |
86-
| **Fully Converged** | 2 NICs per host carrying all traffic types (M+C+S) via VLAN segmentation | Both storage VLANs (S1, S2) on both ToRs (required for SET) | General-purpose deployments balancing performance, simplicity, and hardware efficiency |
85+
| **Switched** | 4 NICs per host: 2 for M+C traffic, 2 dedicated for storage | M and C VLANs on both ToRs; S1 VLAN on ToR-A only, S2 VLAN on ToR-B only (dedicated storage NICs) | Enterprise deployments requiring dedicated storage performance and traffic isolation |
86+
| **Fully Converged** | 2 NICs per host carrying all traffic types (M+C+S) via VLAN segmentation | S1 VLAN on ToR-A only, S2 VLAN on ToR-B only (recommended) | General-purpose deployments balancing performance, simplicity, and hardware efficiency |
8787

8888
> [!NOTE]
89-
> **Storage VLAN Configuration**: Storage VLANs can be configured as either **Layer 3 (L3) networks with IP subnets** or **Layer 2 (L2) networks without IP subnets**. **Layer 2 configuration is recommended** because it simplifies VLAN tagging, allowing Azure Local hosts to use any IP addresses without hardcoding subnet configurations on the switch or requiring predefined IP ranges. Since Azure Local nodes handle storage traffic tagging, ensure these VLANs are configured as **tagged VLANs on trunk ports** across all ToR switches.
89+
> **Storage VLAN Configuration**: Storage VLANs can be configured as either **Layer 3 (L3) networks with IP subnets** or **Layer 2 (L2) networks without IP subnets**. **Layer 2 configuration is recommended** because it simplifies VLAN tagging, allowing Azure Local hosts to use any IP addresses without hardcoding subnet configurations on the switch or requiring predefined IP ranges. For the recommended deployment patterns in this document, storage VLANs must be configured as **tagged VLANs on trunk ports only on their respective ToR switches**, and **must not be tagged across all ToR switches** unless you are intentionally implementing a non-recommended, legacy, or special-case design that explicitly requires global storage VLAN reachability.
9090
9191

9292
---
@@ -131,27 +131,20 @@ This tool is designed to automate the generation of Azure Local switch configura
131131
### Q: How should Storage VLANs be configured across ToR switches?
132132

133133
**A:**
134-
Storage VLAN configuration depends on the **deployment pattern**:
134+
The recommended baseline design uses **one storage VLAN per ToR switch** for both Switched and Fully Converged deployments:
135135

136136
| Deployment Pattern | ToR VLAN Configuration | Why |
137137
|-------------------|------------------------|-----|
138-
| **Switched** | S1 on ToR1 only, S2 on ToR2 only | Dedicated storage NICs connect to specific ToRs |
139-
| **Fully Converged** | Both S1 & S2 on both ToRs | SET may route either storage VLAN through either physical NIC |
138+
| **Switched** | S1 on ToR-A only, S2 on ToR-B only | Dedicated storage NICs connect to specific ToRs |
139+
| **Fully Converged** | S1 on ToR-A only, S2 on ToR-B only | Each storage VLAN is mapped to one physical NIC; failover occurs automatically |
140140

141-
**Switched Deployment (One Storage VLAN per ToR):**
142-
- Each host has **dedicated storage NICs** (4 NICs total)
143-
- Storage NIC1 connects to ToR1 → only needs VLAN 711
144-
- Storage NIC2 connects to ToR2 → only needs VLAN 712
145-
- This reduces MC-LAG utilization and optimizes RDMA performance
141+
**Storage VLAN Configuration:**
142+
- Storage VLAN A is configured only on ToR-A and mapped to one physical NIC
143+
- Storage VLAN B is configured only on ToR-B and mapped to the other physical NIC
144+
- In failure scenarios (NIC or ToR failure), SMB/RDMA traffic automatically fails over to the remaining path with reduced bandwidth but no functional impact
146145

147-
**Fully Converged Deployment (Both Storage VLANs on Both ToRs):**
148-
- Each host has only **2 NICs** shared for all traffic
149-
- SET (Switch Embedded Teaming) handles vNIC-to-pNIC mapping
150-
- SET may route either storage VLAN through either physical NIC
151-
- **Both ToRs must carry both storage VLANs** to support SET's flexibility
152-
153-
> [!IMPORTANT]
154-
> In Fully Converged deployments, configuring only one storage VLAN per ToR will cause connectivity issues when SET routes a storage vNIC to a physical NIC connected to a ToR that doesn't have that VLAN configured.
146+
> [!NOTE]
147+
> Configuring both storage VLANs on both ToR switches is also supported but optional. Testing has confirmed there is no meaningful resiliency or failover benefit from this configuration, and it increases complexity without improving availability.
155148
156149

157150
### Q: Are **DCB (Data Center Bridging)** features like **PFC** and **ETS** required for RDMA in Azure Local deployments?
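
The VLAN placement table changed in this file can be restated as a small lookup, which may help when reviewing switch configurations against the new guidance. The following is a minimal sketch, not part of the commit: the function names are hypothetical, and the VLAN IDs (7, 201, 711, 712) and switch names (TOR-A, TOR-B) are assumptions borrowed from the sample configuration elsewhere in this commit.

```python
# Illustrative only: VLAN IDs and switch names are assumptions taken from the
# sample configuration in this commit, not fixed requirements.
MGMT_VLAN, COMPUTE_VLAN = 7, 201
STORAGE_VLAN = {"TOR-A": 711, "TOR-B": 712}  # one storage VLAN per ToR


def expected_tor_vlans(pattern: str, tor: str) -> list[int]:
    """Return the VLANs that should be configured on `tor` for a pattern."""
    base = [MGMT_VLAN, COMPUTE_VLAN]
    if pattern == "switchless":
        # Storage uses direct inter-node links, so ToRs carry no storage VLAN.
        return base
    if pattern in ("switched", "fully_converged"):
        # Recommended baseline: each ToR carries only its own storage VLAN.
        return base + [STORAGE_VLAN[tor]]
    raise ValueError(f"unknown deployment pattern: {pattern}")


def surviving_storage_vlans(failed_tor: str) -> list[int]:
    """Storage VLANs still carried by a ToR after `failed_tor` goes down."""
    return [vlan for tor, vlan in STORAGE_VLAN.items() if tor != failed_tor]


if __name__ == "__main__":
    for tor in STORAGE_VLAN:
        print(tor, expected_tor_vlans("fully_converged", tor))
    print("after TOR-A failure:", surviving_storage_vlans("TOR-A"))
```

The second helper only models which storage VLAN remains reachable through a switch after a ToR failure. It does not model SMB/RDMA failover itself, which the documentation describes as automatic.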

TSG/Networking/Top-Of-Rack-Switch/Reference-TOR-Fully-Converged-Storage.md

Lines changed: 35 additions & 41 deletions
Original file line numberDiff line numberDiff line change
@@ -27,7 +27,7 @@ This document provides a comprehensive reference for implementing a fully conver
2727
- [Quality of Service (QoS)](#quality-of-service-qos)
2828
- [BGP Routing](#bgp-routing)
2929
- [Frequently Asked Questions](#frequently-asked-questions)
30-
- [Q: Why must both Storage VLANs be on both ToR switches in Fully Converged?](#q-why-must-both-storage-vlans-be-on-both-tor-switches-in-fully-converged)
30+
- [Q: How should Storage VLANs be configured in Fully Converged deployments?](#q-how-should-storage-vlans-be-configured-in-fully-converged-deployments)
3131
- [Additional Resources](#additional-resources)
3232
- [Official Documentation](#official-documentation)
3333
- [Technical Deep Dives](#technical-deep-dives)
@@ -44,7 +44,7 @@ Azure Local's fully converged network design provides a unified approach to hand
4444

4545
The fully converged physical network architecture integrates **management**, **compute**, and **storage** traffic over the same physical Ethernet interfaces. This design minimizes hardware footprint while maximizing scalability and deployment simplicity.
4646

47-
**Key Design Principle**: In Fully Converged deployments, **both storage VLANs must be configured on both ToR switches**. This is because each host has only 2 NICs (shared for all traffic), and SET (Switch Embedded Teaming) may route either storage VLAN through either physical NIC based on its load balancing algorithm.
47+
**Key Design Principle**: In Fully Converged deployments, the **recommended** baseline design uses **one storage VLAN per ToR switch**: Storage VLAN A is configured only on TOR-A and mapped to one physical NIC, while Storage VLAN B is configured only on TOR-B and mapped to the other physical NIC. In failure scenarios (NIC or ToR), SMB/RDMA traffic automatically fails over to the remaining path with reduced bandwidth but no functional impact. Configuring both storage VLANs on both ToR switches is also supported but optional.
4848

4949
## Architecture Components
5050

@@ -82,7 +82,7 @@ This section demonstrates a **fully converged Azure Local deployment** where man
8282

8383
### Design Characteristics
8484
- **Fully Converged**: All traffic types (Management, Compute, Storage) utilize the same physical links
85-
- **Redundant Infrastructure**: Each node connects to both ToR1 and ToR2 for high availability
85+
- **Redundant Infrastructure**: Each node connects to both TOR-A and TOR-B for high availability
8686
- **Switch Embedded Teaming**: Host-level NIC bonding provides fault tolerance and load balancing
8787
- **VLAN Segmentation**: Traffic isolation using IEEE 802.1Q VLAN tagging
8888

@@ -103,22 +103,22 @@ The following tables demonstrate physical connectivity between Azure Local nodes
103103

104104
| Azure Local Node | Interface | ToR Switch | Interface |
105105
|------------------|-----------|------------|-------------|
106-
| **Host1** | NIC A | ToR1 | Ethernet1/1 |
107-
| **Host1** | NIC B | ToR2 | Ethernet1/1 |
106+
| **Host1** | NIC A | TOR-A | Ethernet1/1 |
107+
| **Host1** | NIC B | TOR-B | Ethernet1/1 |
108108

109109
#### Host 2
110110

111111
| Azure Local Node | Interface | ToR Switch | Interface |
112112
|------------------|-----------|------------|-------------|
113-
| **Host2** | NIC A | ToR1 | Ethernet1/2 |
114-
| **Host2** | NIC B | ToR2 | Ethernet1/2 |
113+
| **Host2** | NIC A | TOR-A | Ethernet1/2 |
114+
| **Host2** | NIC B | TOR-B | Ethernet1/2 |
115115

116116
#### Host 3
117117

118118
| Azure Local Node | Interface | ToR Switch | Interface |
119119
|------------------|-----------|------------|-------------|
120-
| **Host3** | NIC A | ToR1 | Ethernet1/3 |
121-
| **Host3** | NIC B | ToR2 | Ethernet1/3 |
120+
| **Host3** | NIC A | TOR-A | Ethernet1/3 |
121+
| **Host3** | NIC B | TOR-B | Ethernet1/3 |
122122

123123

124124
### VLAN Architecture
@@ -132,14 +132,14 @@ The fully converged design uses VLAN segmentation to isolate different traffic t
132132
| Storage 1 | SMB storage over RDMA (first path) | 711 | Tagged VLAN, L2 only (no SVI) |
133133
| Storage 2 | SMB storage over RDMA (second path) | 712 | Tagged VLAN, L2 only (no SVI) |
134134

135-
> [!IMPORTANT]
136-
> **Storage VLAN Design Pattern for Fully Converged**: In Fully Converged deployments, **both storage VLANs (711 and 712) must be configured on both ToR switches**. This is because:
135+
> [!NOTE]
136+
> **Storage VLAN Design Pattern for Fully Converged**: The **recommended** baseline design uses **one storage VLAN per ToR switch**:
137137
>
138-
> - Each host has only **2 NICs** connecting to both ToRs (no dedicated storage NICs)
139-
> - **SET (Switch Embedded Teaming)** handles vNIC-to-pNIC mapping at the host level
140-
> - SET may route either storage VLAN through either physical NIC based on its load balancing algorithm
138+
> - Storage VLAN 711 is configured only on TOR-A and mapped to one physical NIC
139+
> - Storage VLAN 712 is configured only on TOR-B and mapped to the other physical NIC
140+
> - In failure scenarios (NIC or ToR), SMB/RDMA traffic automatically fails over to the remaining path
141141
>
142-
> This differs from **Switched** deployments where dedicated storage NICs connect to specific ToRs, allowing one storage VLAN per ToR.
142+
> Configuring both storage VLANs on both ToR switches is also supported but optional. Testing has confirmed no meaningful resiliency benefit from this configuration.
143143
144144
### Top-of-Rack Switch Configuration
145145

@@ -168,7 +168,7 @@ This section provides configuration guidance using **Cisco Nexus 93180YC-FX3 (NX
168168
- **VLAN 712 (Storage 2)**: Layer 2 only VLAN (no SVI), tagged on trunk ports for RDMA traffic
169169

170170
> [!NOTE]
171-
> In Fully Converged deployments, **both storage VLANs must be configured on both ToR switches** because SET handles vNIC-to-pNIC mapping at the host level and may route either storage VLAN through either physical NIC.
171+
> In Fully Converged deployments, the recommended design uses **one storage VLAN per ToR switch**: Storage VLAN 711 on TOR-A only, Storage VLAN 712 on TOR-B only. This simplifies configuration while automatic failover handles NIC or ToR failures.
172172
173173
> [!IMPORTANT]
174174
> Storage VLANs 711 and 712 should **NOT** be permitted on the ToR-to-ToR peer-link (vPC peer-link, MLAG inter-switch trunk, or any L2 interconnect between ToR switches). Storage traffic must flow directly from host to ToR to destination host to maintain optimal RDMA performance. Allowing storage VLANs on peer links can cause performance degradation.
@@ -181,16 +181,14 @@ This section provides configuration guidance using **Cisco Nexus 93180YC-FX3 (NX
181181

182182
##### Sample NX-OS Configuration
183183

184-
**ToR1 Configuration:**
184+
**TOR-A Configuration:**
185185
```console
186186
vlan 7
187187
name Management_7
188188
vlan 201
189189
name Compute_201
190190
vlan 711
191191
name Storage_711
192-
vlan 712
193-
name Storage_712
194192

195193
interface Vlan7
196194
description Management
@@ -213,22 +211,20 @@ interface Ethernet1/1-3
213211
switchport
214212
switchport mode trunk
215213
switchport trunk native vlan 7
216-
switchport trunk allowed vlan 7,201,711,712
214+
switchport trunk allowed vlan 7,201,711
217215
priority-flow-control mode on send-tlv
218216
spanning-tree port type edge trunk
219217
mtu 9216
220218
service-policy type qos input AZS_SERVICES
221219
no shutdown
222220
```
223221

224-
**ToR2 Configuration:**
222+
**TOR-B Configuration:**
225223
```console
226224
vlan 7
227225
name Management_7
228226
vlan 201
229227
name Compute_201
230-
vlan 711
231-
name Storage_711
232228
vlan 712
233229
name Storage_712
234230

@@ -253,7 +249,7 @@ interface Ethernet1/1-3
253249
switchport
254250
switchport mode trunk
255251
switchport trunk native vlan 7
256-
switchport trunk allowed vlan 7,201,711,712
252+
switchport trunk allowed vlan 7,201,712
257253
priority-flow-control mode on send-tlv
258254
spanning-tree port type edge trunk
259255
mtu 9216
@@ -262,8 +258,8 @@ interface Ethernet1/1-3
262258
```
263259

264260
> [!NOTE]
265-
> - Both ToR switches have **identical VLAN configurations** (7, 201, 711, 712) in Fully Converged deployments
266-
> - SET at the host level handles vNIC-to-pNIC mapping to optimize storage traffic paths
261+
> - TOR-A has Storage VLAN 711 only, TOR-B has Storage VLAN 712 only (one storage VLAN per ToR)
262+
> - In failure scenarios, SMB/RDMA traffic automatically fails over to the remaining path
267263
> - QoS policies and routing design (e.g., uplinks, BGP/OSPF, default gateway) will be introduced in a separate document
268264
269265

@@ -326,7 +322,7 @@ Host4 c$ Administrator Administrator 3.1.1 2
326322

327323
> [!NOTE]
328324
> **SMB Multichannel Validation Key Points:**
329-
> - Both storage VLANs (711 and 712) are operational with RDMA enabled
325+
> - Storage VLANs 711 and 712 are operational with RDMA enabled (each mapped to its respective ToR)
330326
> - `RdmaConnectionCount = 2` confirms RDMA is being used for storage traffic
331327
> - `TcpConnectionCount = 0` shows no fallback to regular TCP
332328
> - SMB 3.1.1 dialect is being used for optimal performance
@@ -345,7 +341,7 @@ Confirm that storage VLANs 711 and 712 are allowed on the trunk to the host:
345341

346342
```console
347343
# Verify VLANs are allowed on the interface trunk
348-
ToR1# show interface ethernet 1/3 trunk
344+
TOR-A# show interface ethernet 1/3 trunk
349345

350346
Port Native Status Port
351347
Vlan Channel
@@ -364,7 +360,7 @@ Check MAC address table entries for storage VLANs. The example below shows one p
364360

365361
```console
366362
# Check per-VLAN MAC table entries across the ToR
367-
ToR1# show mac address-table vlan 711
363+
TOR-A# show mac address-table vlan 711
368364
Legend:
369365
* - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
370366
age - seconds since last seen,+ - primary entry using vPC Peer-Link,
@@ -373,7 +369,7 @@ Legend:
373369
---------+-----------------+--------+---------+------+----+------------------
374370
* 711 0015.5dc8.2006 dynamic 0 F F Eth1/3
375371

376-
ToR1# show mac address-table vlan 712
372+
TOR-A# show mac address-table vlan 712
377373
Legend:
378374
* - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
379375
age - seconds since last seen,+ - primary entry using vPC Peer-Link,
@@ -405,26 +401,24 @@ For BGP routing configuration and best practices in Azure Local deployments:
405401

406402
## Frequently Asked Questions
407403

408-
### Q: Why must both Storage VLANs be on both ToR switches in Fully Converged?
404+
### Q: How should Storage VLANs be configured in Fully Converged deployments?
409405

410406
**A:**
411-
In Fully Converged deployments, **both storage VLANs (711 and 712) must be configured on both ToR switches**. This is required because:
407+
The recommended baseline design uses **one storage VLAN per ToR switch** for Fully Converged deployments:
412408

413-
1. **Only 2 NICs per host**: Each host connects one NIC to ToR1 and one to ToR2
414-
2. **SET handles traffic routing**: Switch Embedded Teaming maps storage vNICs to physical NICs at the host level
415-
3. **Either VLAN through either NIC**: SET's load balancing may route Storage VLAN 711 or 712 through either physical NIC
409+
- Storage VLAN A (711) is configured only on TOR-A and mapped to one physical NIC
410+
- Storage VLAN B (712) is configured only on TOR-B and mapped to the other physical NIC
411+
- In failure scenarios (NIC or ToR failure), SMB/RDMA traffic automatically fails over to the remaining path with reduced bandwidth but no functional impact
416412

417-
**How it differs from Switched deployment:**
413+
**Storage VLAN Configuration:**
418414

419415
| Deployment Pattern | Storage NICs | ToR VLAN Config | Why |
420416
|-------------------|--------------|-----------------|-----|
421-
| **Fully Converged** | Shared (2 NICs total) | Both VLANs on both ToRs | SET may route either VLAN through either NIC |
422-
| **Switched** | Dedicated (4 NICs total) | One VLAN per ToR | Each storage NIC connects to a specific ToR |
423-
424-
**Key Point:** The "one storage VLAN per ToR" optimization applies to **Switched** deployments where dedicated storage NICs connect to specific ToRs. In Fully Converged, SET's flexibility requires both VLANs on both switches.
417+
| **Fully Converged** | Shared (2 NICs total) | S1 on TOR-A only, S2 on TOR-B only | One storage VLAN per NIC; failover occurs automatically |
418+
| **Switched** | Dedicated (4 NICs total) | S1 on TOR-A only, S2 on TOR-B only | Each storage NIC connects to a specific ToR |
425419

426420
> [!NOTE]
427-
> SET uses vNIC-to-pNIC affinity mapping to optimize traffic paths, but the switches must still be configured to carry both storage VLANs to handle any mapping SET chooses.
421+
> Configuring both storage VLANs on both ToR switches is also supported but optional. Testing has confirmed there is no meaningful resiliency or failover benefit from this configuration, and it increases complexity without improving availability.
428422
429423

430424
## Additional Resources
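
The change to the `switchport trunk allowed vlan` lines in the sample NX-OS configuration can be checked mechanically. The sketch below is a hypothetical helper, not part of the commit: it assumes the configuration is available as plain text, that storage VLAN IDs are 711 and 712 as in the sample, and that allowed-VLAN lists use only comma-separated IDs and simple ranges.

```python
import re

STORAGE_VLANS = {711, 712}  # assumption: IDs from the sample configuration


def allowed_vlans(config: str) -> set[int]:
    """Collect VLAN IDs from 'switchport trunk allowed vlan' lines.

    Handles comma-separated IDs and simple ranges such as 10-12.
    """
    vlans: set[int] = set()
    for match in re.finditer(r"switchport trunk allowed vlan ([\d,\-]+)", config):
        for part in match.group(1).split(","):
            if not part:
                continue  # tolerate a stray trailing comma
            if "-" in part:
                low, high = part.split("-")
                vlans.update(range(int(low), int(high) + 1))
            else:
                vlans.add(int(part))
    return vlans


def check_one_storage_vlan_per_tor(tor_a_cfg: str, tor_b_cfg: str) -> list[str]:
    """Report deviations from the recommended one-storage-VLAN-per-ToR baseline."""
    problems = []
    storage_a = allowed_vlans(tor_a_cfg) & STORAGE_VLANS
    storage_b = allowed_vlans(tor_b_cfg) & STORAGE_VLANS
    if len(storage_a) != 1:
        problems.append(f"TOR-A trunks allow storage VLANs {sorted(storage_a)}")
    if len(storage_b) != 1:
        problems.append(f"TOR-B trunks allow storage VLANs {sorted(storage_b)}")
    if storage_a & storage_b:
        problems.append("the same storage VLAN is allowed on both ToRs")
    return problems


if __name__ == "__main__":
    tor_a = "interface Ethernet1/1-3\n  switchport trunk allowed vlan 7,201,711\n"
    tor_b = "interface Ethernet1/1-3\n  switchport trunk allowed vlan 7,201,712\n"
    print(check_one_storage_vlan_per_tor(tor_a, tor_b) or "matches recommended baseline")
```

Because the documentation states that carrying both storage VLANs on both ToR switches remains supported, anything this check reports is a deviation from the recommended baseline rather than a misconfiguration.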
