Can I aggregate multiple uplink ports between Ethernet switches?

Disclaimer: I'm not a network engineer; I just have to do a good-enough impression of one to solve this problem.

We have two rooms in our office. In the first, I would like to put a large copper 10-GbE switch to provide connectivity to several servers, something like this:

https://www.fs.com/products/69378.html

This switch has 48 10GBASE-T ports and 4 40-GbE uplink ports.

In the second room, we have many devices that require 10-GbE SFP+ connectivity. It appears something like this could work:

https://www.fs.com/products/69226.html

This switch has 48 SFP+ ports and 6 40-GbE uplink ports.

Goal: To the greatest extent possible, I would like to provide 10-GbE line-rate performance between any pair of hosts in both rooms. As long as two hosts are plugged into the same switch, this is straightforward, but I'm wondering how best to accomplish it across the two switches. I don't think I need any fancy features like QoS or VLANs; I just want one flat layer-2 network. Can I simply connect 4 of the 40-GbE uplinks between the two switches and get ~160 Gbps of bandwidth between the two rooms?

My application consists of continuous, very high-rate UDP flows; it's common for a single flow to consume 2-8 Gbps on its own. That means each host will only be communicating with 1-2 other hosts at any time, which limits the total number of flows running simultaneously. I'm not sure whether this simplifies or complicates the issue: I need to ensure that the UDP datagrams are not reordered by any aggregation of the uplink ports.
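For a rough sanity check on the uplink budget, here is the back-of-the-envelope arithmetic I have in mind (a sketch only; the per-host numbers are worst-case assumptions, not measurements):

    # Back-of-the-envelope check: worst-case cross-room demand vs. aggregate
    # uplink capacity. The numbers are assumptions for illustration only.
    HOSTS_PER_ROOM = 48      # each switch has 48 host-facing ports
    PER_HOST_GBPS = 8.0      # worst case: one 8 Gbps flow per host, all cross-room
    UPLINKS = 4
    UPLINK_GBPS = 40.0

    worst_case_demand = HOSTS_PER_ROOM * PER_HOST_GBPS
    aggregate_uplink = UPLINKS * UPLINK_GBPS

    print(f"worst-case cross-room demand: {worst_case_demand:.0f} Gbps")
    print(f"aggregate uplink capacity:    {aggregate_uplink:.0f} Gbps")
    print("oversubscribed" if worst_case_demand > aggregate_uplink else "fits")

So in the absolute worst case the hosts could offer more than twice what four 40-GbE uplinks provide, although in practice not every host will be streaming across the rooms at once.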










ethernet ieee-802.1ax uplinks

asked Jan 25 at 14:52 by Jason R
edited Jan 25 at 16:22 by Zac67

  • Hi Jason and welcome to NE. Can I ask how far apart the rooms are? And, just out of curiosity, what application sector is this in?
    – jonathanjo, Jan 25 at 14:59

  • The uplink run between the rooms would be around 30-40 feet. From what I understand, this should be achievable with QSFP modules in each switch and the appropriate fiber optic cables. This is a digital signal processing application; the signals being processed are very high in bandwidth.
    – Jason R, Jan 25 at 15:02

  • The feature you want is called LACP (Link Aggregation Control Protocol), which can bundle the uplinks into one logical link. From reading the specs, it doesn't appear that this switch does that.
    – Ron Trunk, Jan 25 at 15:07

  • @RonTrunk: Hmm, it looks like LACP is mentioned on each of the above switches' feature lists. Are there any variants of LACP that I need to be aware of, or if a switch states LACP/802.3ad support, should that be sufficient?
    – Jason R, Jan 25 at 15:20

  • @JasonR you may want to check the switches' feature sets in terms of hashing for link selection in a LAG, that is, which combinations of Src/Dst MAC, Src/Dst IP, and Src/Dst ports are used, and which set(s) are configurable. If at all possible, make sure you have as many options available as possible.
    – Marc 'netztier' Luethi, Jan 25 at 18:59

1 Answer

"Goal: To the greatest extent possible, I would like to provide 10-GbE line-rate performance between any pair of hosts in both rooms."

In order to truly guarantee 10G, you'll need 10G of dedicated bandwidth between the rooms for each host: 10 hosts on each side would require a 10*10 = 100G link. Aggregated links might not be enough, because flows are balanced across the member links based on source/destination addresses and ports, so two random flows can easily land on the same physical link and fight for bandwidth while another link sits idle.

That said, LAG trunks most often work fine unless the network is very busy or there's an extreme need to guarantee bandwidth at all times.
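To illustrate how that can happen, here is a minimal Python sketch. The hash function is a made-up stand-in, not any vendor's actual algorithm, but it mimics the behaviour: each flow is pinned to one member link, and the per-link load depends entirely on how the flows happen to hash.

    # Minimal sketch, not any switch's real hash: each flow is pinned to one of
    # four member links by hashing its addresses/ports, so per-link load is luck.
    import random
    import zlib

    LINKS = 4                    # four 40-GbE members in the LAG
    LINK_CAPACITY_GBPS = 40.0

    def pick_link(src_ip, dst_ip, src_port, dst_port):
        """Deterministically map a flow's addresses/ports onto one member link."""
        key = f"{src_ip}-{dst_ip}-{src_port}-{dst_port}".encode()
        return zlib.crc32(key) % LINKS

    random.seed(1)               # change the seed to see different distributions
    load_gbps = [0.0] * LINKS
    # A few dozen hypothetical cross-room flows at 2-8 Gbps (the question's numbers)
    for i in range(30):
        src, dst = f"10.0.1.{i}", f"10.0.2.{i}"
        sport, dport = random.randint(1024, 65535), 5000
        load_gbps[pick_link(src, dst, sport, dport)] += random.uniform(2.0, 8.0)

    for link, gbps in enumerate(load_gbps):
        state = "OK" if gbps <= LINK_CAPACITY_GBPS else "OVERSUBSCRIBED"
        print(f"link {link}: {gbps:5.1f} Gbps  {state}")

Run it with a few different seeds: the total offered load may be well under 160 Gbps, yet a single member link can still end up carrying more than its 40 Gbps share while another has plenty of headroom.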




"Can I simply connect 4 of the 40-GbE uplinks between the two switches and get ~160 Gbps of bandwidth between the two rooms?"

No. Simply running multiple parallel connections between switches creates bridge loops, which in turn cause broadcast storms that will bring down the network.

One option is a spanning tree protocol (RSTP/MSTP), but that only gives you redundancy: all but one of the parallel links is operationally deactivated.

What you need is link aggregation (LAG), preferably LACP, which is the IEEE's vendor-agnostic aggregation protocol. Put the desired interfaces into an LACP trunk group on both sides and then connect the ports. Note that you can only run an LACP trunk between two switches; you can't split and recombine a trunk across three or more switches.

For more than two switches there are various proprietary solutions, or Shortest Path Bridging (IEEE 802.1aq), which sadly hasn't caught on much yet.

edit

"I need to ensure that the UDP datagrams are not reordered by any aggregation of the uplink ports."

That is exactly why traffic is distributed based on SA/DA hashing: every datagram of a given flow always takes the same physical path, so order within the flow is preserved. If you also need to prevent overtaking across different flows between the same two end nodes, make sure the hash uses only the source/destination IP addresses and not the port numbers as well.
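To make the distinction concrete, here is another small sketch (again a made-up stand-in hash, not a real switch's algorithm): an IP-only hash keeps every flow between the same pair of hosts on one member link, whereas a hash that also includes the UDP ports may place those flows on different links, where one flow could overtake another.

    # Illustrative only: compares which header fields feed the LAG hash.
    # Neither function is a vendor's real algorithm.
    import zlib

    LINKS = 4

    def link_ip_only(src_ip, dst_ip):
        """Hash on L3 addresses only: one member link per host pair."""
        return zlib.crc32(f"{src_ip}-{dst_ip}".encode()) % LINKS

    def link_ip_and_ports(src_ip, dst_ip, src_port, dst_port):
        """Hash on L3 + L4 fields: flows between the same hosts may diverge."""
        key = f"{src_ip}-{dst_ip}-{src_port}-{dst_port}".encode()
        return zlib.crc32(key) % LINKS

    # Three UDP flows between the same two hosts, differing only in source port
    for sport in (40000, 40001, 40002):
        print(f"sport {sport}: ip-only -> link {link_ip_only('10.0.1.5', '10.0.2.9')}, "
              f"ip+port -> link {link_ip_and_ports('10.0.1.5', '10.0.2.9', sport, 5000)}")

This is also why the comment above about checking which hash fields your switches let you configure matters: the field selection decides both how evenly traffic spreads and whether flows between the same hosts can take different paths.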



/edit



Whether you also trunk VLANs across the LAG trunk doesn't matter. Using STP on all ports is always a good idea in case something goes wrong with the LAG trunk or someone puts up yet another link.






answered Jan 25 at 16:00 (edited Jan 25 at 16:25) by Zac67

  • Thanks for the answer. I understand that I can't guarantee line rate between any pair of hosts at all times because there isn't enough uplink bandwidth (4x40-GbE) to support that. After reading about it some, it seems that the key will be choosing a good hashing scheme for our application so that flows get mapped to ports in the aggregation group in a way that minimizes the chance of oversubscription of any single port. From what I understand, this hashing scheme is defined in the group configuration, and the protocol doesn't provide a way to dynamically negotiate port usage?
    – Jason R, Jan 25 at 16:06

  • If you can choose the hashing scheme in a way that there can be no congestion, you're set. However, most often you can't. LACP doesn't cover traffic distribution, only the aggregation itself; traffic distribution depends on the hardware at hand.
    – Zac67, Jan 25 at 16:18

  • @JasonR, the traffic can also be asymmetric, and often is, even with the same hashing algorithm on both switches, because the source and destination addresses (both network and transport) are different in each direction.
    – Ron Maupin, Jan 25 at 16:25

  • @RonMaupin that makes sense. For what it's worth, in my case the traffic is essentially unidirectional: a very high rate in one direction and little to none in the other.
    – Jason R, Jan 25 at 16:28

  • @JasonR, OK, I wanted to make sure that you understand that a carefully balanced traffic flow in one direction can be very unbalanced for return traffic. That really drives some people crazy, but it is a fact of life. Some people think that return traffic naturally follows the original path, but that simply isn't true for LAG frames, or even routed packets, as each is independently switched, regardless of any other frames or packets.
    – Ron Maupin, Jan 25 at 16:38