Network Enhancers - "Delivering Beyond Boundaries"


Tuesday, April 2, 2013

Evaluating Network Gear Performance

 
Choosing the right equipment for your network is hard. Beyond the ever-growing roster of features you must account for when evaluating candidate hardware, it's important not to overlook performance limitations. Below are some of the most crucial characteristics to consider when doing your research.

Throughput

Throughput is the rate at which a device can move data from input to output. This is different from bandwidth, which is the rate at which data travels across a medium. An Ethernet switch, for example, might have 48 ports running at an individual bandwidth of 1 Gbps each, but be able to switch only a total of 12 Gbps among the ports at any given time. This is said to be the switch's maximum throughput.
 
Throughput is measured in two units: bits per second (bps) and packets per second (pps). Bits per second is the more familiar of the two: the amount of data which flows through a particular point within a duration of one second, typically expressed as megabits (Mbps) or gigabits (Gbps) per second. Capitalization is important here. A lowercase 'b' indicates bits, whereas an uppercase 'B' indicates bytes. Speed is always measured in bits per second, with a lowercase 'b' (Kbps or Mbps).
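As a quick sanity check of the bits-versus-bytes distinction, here is a minimal Python sketch converting link rates between the two units (the only assumption is 8 bits per byte):

# Bits (lowercase b) vs. bytes (uppercase B): a link rated in megabits
# per second moves one eighth as many megabytes per second.

def mbps_to_mBps(mbps: float) -> float:
    return mbps / 8  # 8 bits per byte

for rate in (10, 100, 1000):
    print(f"{rate} Mbps = {mbps_to_mBps(rate):.2f} MB/s")
# 10 Mbps = 1.25 MB/s, 100 Mbps = 12.50 MB/s, 1000 Mbps = 125.00 MB/s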
 
Packets per second, similarly expressed most often as Kpps or Mpps, is another way of evaluating throughput. It conveys the number of packets or frames which can be processed in one second. This approach to measuring throughput is used to expose limitations of the processing power of devices, as shorter packets demand more frequent forwarding decisions. For example, a router might claim a throughput of 30 Mbps using full-size packets. However, it might also be limited to processing 40 Kpps. If each packet received was the minimum size of 64 bytes (512 bits), the router would be limited to just 20.48 Mbps (512 * 40,000 = 20,480,000 bps) of throughput.
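The same arithmetic, worked in a short Python sketch (the 40 Kpps ceiling and 64-byte packet size are the figures from the paragraph above):

# Figures from the example above: a 40 Kpps ceiling with 64-byte packets.
PACKET_SIZE_BYTES = 64            # minimum Ethernet packet size
PPS_LIMIT = 40_000                # the router's forwarding ceiling in packets/sec

bits_per_packet = PACKET_SIZE_BYTES * 8        # 512 bits
throughput_bps = bits_per_packet * PPS_LIMIT   # 512 * 40,000 = 20,480,000
print(f"{throughput_bps / 1_000_000:.2f} Mbps")  # 20.48 Mbps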
 
Cisco maintains often-cited baseline performance measurements for its most popular routers and switches. If you work out the math, you can see that the Mbps numbers listed in the router performance document were derived using minimum-length (64 byte) packets. These numbers thus represent a worst-case scenario. Packets on a production network typically vary widely in size, and larger packets will yield higher bits-per-second rates.
 
Keep in mind that these benchmarks were taken with no features other than IP routing enabled. Enabling additional features and services such as access control lists or network address translation may reduce throughput. Unfortunately, it's impractical for a vendor to list throughput rates with and without myriad features enabled, so you'll have to do some testing yourself.

Oversubscription

Ethernet switches are often built with oversubscribed backplanes. Oversubscription refers to a point of congestion within a system where the potential rate of input is greater than the potential rate of output. For example, a switch with 48 1 Gbps ports might have a backplane throughput limitation of only 16 Gbps. This means that only 16 ports can transmit at wire rate (the physical maximum throughput) at any point in time. This isn't usually a problem at the network edge, where few users or servers ever need to transmit at these speeds for a prolonged time. However, oversubscription imposes much more critical considerations in the data center or network core.
 
As an example, let's look at the 16-port 10 Gbps Ethernet module WS-X6816-10G-2T for the Cisco Catalyst 6500 switch. Although the module provides an aggregate of 160 Gbps of potential throughput, its connection to the chassis backplane is only 40 Gbps. The module is oversubscribed at a ratio of 4:1. This module should only be used in situations where the aggregate throughput demand from all interfaces is not expected to exceed 40 Gbps.
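A quick Python sketch of the same arithmetic, using the module's numbers from above:

# Oversubscription ratio: aggregate front-panel capacity vs. backplane link.
ports = 16
port_speed_gbps = 10
backplane_gbps = 40

aggregate_gbps = ports * port_speed_gbps         # 160 Gbps
ratio = aggregate_gbps / backplane_gbps          # 4.0
print(f"Oversubscription ratio: {ratio:.0f}:1")  # 4:1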

IP Route Capacity

The maximum number of routes a router can hold in its routing table is limited by the amount of available content-addressable memory (CAM). Although a low-end router may be able to run BGP and exchange routes with BGP peers, it likely won't have sufficient memory to accept the full IPv4 Internet routing table, which comprises over 400 thousand routes. (Of course, low-end routers should never be implemented in a position where they would need to receive the full routing table.) Virtual routing contexts, in which a router stores multiple copies of routes in separate forwarding tables, multiply the routing table's memory footprint, further elevating the importance of properly sizing routers for the role they play.

Maximum Concurrent Sessions

Firewalls and intrusion prevention systems perform stateful inspection of traffic transiting from one trust zone to another. These devices must be able to keep up with the demand for throughput not only in terms of bits per second and packets per second but also in the number of concurrent stateful sessions. A single web request might trigger the initiation of one or two dozen TCP connections to various content servers from an internal host. The firewall or IPS must be able to track the state of and inspect potentially thousands of sessions at any point in time. If the device's maximum capacity is reached, attempts to open new sessions may be rejected until a number of current sessions are closed or expire. Such devices are likewise limited in how fast they can create new sessions.
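As a toy illustration only (not any vendor's implementation), the bookkeeping such a device performs can be modeled as a capped session table: once the cap is reached, new sessions are refused until existing ones close or expire.

# Toy model of a stateful session table with a hard capacity limit.
class SessionTable:
    def __init__(self, max_sessions: int):
        self.max_sessions = max_sessions
        self.sessions = set()

    def open(self, flow) -> bool:
        """Admit a new flow only if capacity remains."""
        if len(self.sessions) >= self.max_sessions:
            return False                 # device rejects the new session
        self.sessions.add(flow)
        return True

    def close(self, flow) -> None:
        self.sessions.discard(flow)      # session closed or expired

table = SessionTable(max_sessions=2)
print(table.open(("10.0.0.1", 443)))     # True
print(table.open(("10.0.0.2", 443)))     # True
print(table.open(("10.0.0.3", 443)))     # False: table full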
 

Basics: What’s the Difference Between STP BPDU Guard and Root Guard

Courtesy - Ethereal Mind


BPDU Guard and Root Guard are enhancements to Spanning Tree Protocol (STP) that improve its resilience to unexpected events.

Why ?

Remember that the purpose of the Spanning Tree algorithm is to create a single path through the network to prevent loops, because the Ethernet frame has no loop prevention mechanism. As a result, an Ethernet network is always designed as an inverted tree, like this:
[Figure: an Ethernet network designed as an inverted tree]

There are loops in this design, but they are implemented for resilience; i.e., STP will block a given path in planned operation, and an alternate path can be activated if the primary path fails.

However, STP is susceptible to various failures due to poor network design [1] or certain types of operational problems. Both BPDU Guard and Root Guard are used to enforce design discipline and ensure that STP operates as designed.

BPDU Guard

BPDU Guard disables the port upon BPDU reception if PortFast is enabled on the port. This effectively prevents devices connected to these ports from participating in the designed STP, thus protecting your data centre core.

Note: In the event of a BPDU being received, the port will typically be shut down in the "errdisable" state and will require manual re-enabling. Alternatively, you can have the port attempt to re-enable itself by configuring the "errdisable timeout".

Root Guard

Root guard allows the device to participate in STP as long as the device does not try to become the root. If root guard blocks the port, subsequent recovery is automatic. Recovery occurs as soon as the offending device ceases to send superior BPDUs.
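To make the contrast concrete, here is a toy Python sketch (not vendor code; the state names merely echo the behavior described above) of how the two guards react to incoming BPDUs:

# BPDU Guard err-disables a PortFast port on ANY BPDU; Root Guard blocks
# only on SUPERIOR BPDUs and recovers automatically when they stop.

def bpdu_guard(port_state: str, bpdu_received: bool) -> str:
    """PortFast edge port: any BPDU at all err-disables the port."""
    if bpdu_received:
        return "errdisable"          # manual (or timed) recovery required
    return port_state

def root_guard(port_state: str, superior_bpdu: bool) -> str:
    """Port may participate in STP, but never accept a new root."""
    if superior_bpdu:
        return "root-inconsistent"   # blocked while superior BPDUs persist
    return "forwarding"              # automatic recovery once they stop

print(bpdu_guard("forwarding", bpdu_received=True))           # errdisable
print(root_guard("forwarding", superior_bpdu=True))           # root-inconsistent
print(root_guard("root-inconsistent", superior_bpdu=False))   # forwarding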

Where ?

Because BPDU Guard and Root Guard are primarily intended to enforce design (integrity/security), they must be configured at specific locations in the network.

[Figure: recommended placement of BPDU Guard and Root Guard in the network]

  [1] By "design" I mean that people add new switches in the wrong places, which breaks that controlled design.
 

Monday, April 1, 2013

The Science of Network Troubleshooting

Courtesy - Jeremy Stretch
 
Troubleshooting is not an art. Along with many other IT methodologies, it is often referred to as an art, but it's not. It's a science, if ever there was one. Granted, someone with great skill in troubleshooting can make it seem like a natural talent, the same way a professional ball player makes hitting a home run look easy, when in fact it is a learned skill. Another common misconception holds troubleshooting as a skill derived entirely from experience with the involved technologies. While experience is certainly beneficial, the ability to troubleshoot effectively arises primarily from the embrace of a systematic process, a science.
 
It's said that troubleshooting can't be taught, but I disagree. More accurately, I would argue that troubleshooting can't be taught easily, or in great detail. This is because traditional education encompasses how a technology functions; troubleshooting encompasses all the ways in which it can cease to function. Given that it's virtually impossible to identify and memorize all the potential points of failure a system or network might hold, engineers must instead learn a process for identifying and resolving malfunctions as they occur. To borrow a cliché analogy, teach a man to identify why a fish is broken, rather than expecting him to memorize all the ways a fish might break.

Troubleshooting as a Process

Essentially, troubleshooting is the correlation between cause and effect. Your proxy server experiences a hard disk failure, and you can no longer access web pages. A backhoe digs up a fiber, and you can't call a branch office. Cause, and effect. Moving forward, the correlation is obvious; the difficulty lies in transitioning from effect to cause, and this is troubleshooting at its core.
 
Consider walking into a dark room. The light is off, but you don't know why. This is the observed effect for which we need to identify a cause. Instinctively, you'll reach for the light switch. If the light switch is on, you'll search for another cause. Maybe the power is out. Maybe the breaker has been tripped. Maybe someone stole all the light bulbs (it happens). Without much thought, you investigate each of these possible causes in order of convenience or likelihood. Subconsciously, you're applying a process to resolve the problem.
 
Even though our light bulb analogy is admittedly simplistic, it serves to illustrate the fundamentals of troubleshooting. The same concepts are scalable to exponentially more complex scenarios. From a high-level view, the troubleshooting process can be reduced to a few core steps:
  • Identify the effect(s)
  • Eliminate suspect causes
  • Devise a solution
  • Test and repeat
  • Mitigate

Step 1: Identify the Effect(s)

If you've been a network engineer for more than a few hours, you've been told at least once that the Internet is down. Yes, the global information infrastructure some forty years in the making has fallen to its knees and is in a state of complete chaos. All this is, of course, confirmed by Mary in accounting. Last time it was discovered her Ethernet cable had come unplugged, but this time she's certain it's a global catastrophe.
 
Correctly identifying the effects of an outage or change is the most critical step in troubleshooting. A poor judgment at this first step will likely start you down the wrong path, wasting time and resources. Identifying an effect is not to be confused with deducing a probable cause; in this step we are focused solely on listing the ways in which network operation has deviated from the norm.
 
Identifying effects is best done without assumption or emotion. While your mind will naturally leap to possible causes at the first report of an outage, you must force yourself to adopt an objective stance and investigate the noted symptoms without bias. In the case of Mary's doomsday forecast, you would likely want to confirm the condition yourself before alerting the authorities.
Some key points to consider:

What was working and has stopped?

An important consideration is whether an absent service was ever present to begin with. A user may report an inability to reach FTP sites as an outage, not realizing FTP connections have always been blocked by the firewall as a matter of policy.

What wasn't working and has started?

This can be a much less obvious change, but it is no less important. One example would be the easing of restrictions on traffic types or bandwidth, perhaps due to routing through an alternate path, or the deletion of an access control mechanism.

What has continued to work?

Has all network access been severed, or merely certain types of traffic? Or only certain destinations? Has a contingency system assumed control from a failed production system?

When was the change observed?

This critical point is very often neglected. Timing is imperative for correlation with other events, as we'll soon see. Also remember that we are often limited to noting the time a change was observed, rather than when it occurred. For example, an outage observed Monday morning could have easily occurred at any time during the preceding weekend.

Who is affected? Who isn't?

Is the change limited to a certain group of users or devices? Is it constrained to a geographical or logical area? Is any person or service immune?

Is the condition intermittent?

Does the condition disappear and reappear? Does this happen at predictable intervals, or does it appear to be random?

Has this happened before?

Is this a recurring problem? How long ago did it happen last? What was the resolution? (You do keep logs of this sort of thing, right?)

Correlation with planned maintenance and configuration changes

Was something else being changed at this time? Was a device added, removed, or replaced? Did the outage occur during a scheduled maintenance window, either locally or at another site or provider?

Step 2: Eliminate Suspect Causes

Once we have a reliable account of the effect or effects, we can attempt to deduce probable causes. I say probable because deducing all possible causes is impractical, if not impossible. One possible cause is a power failure. Another possible cause is spontaneous combustion. Only one of these possible causes is probable.
 
There is a popular mantra of "always start with layer one," suggesting that the physical connectivity of a network should be verified before working on the higher layers. I disagree, as this is misleading and often impractical. You're not going to drive out to a remote site to verify everything is plugged in if a simple ping verifies end-to-end connectivity. Similarly, it's unlikely that any cables were disturbed if you can verify with relative certainty no one has gone near the suspect devices. Perhaps this is an oversimplified argument, but verifying physical connectivity is often needlessly time consuming and superseded by alternative methods.
 
Instead, I suggest narrowing causes in order of combined probability and convenience. For example, there might be nothing to indicate DNS is returning an invalid response, but performing a manual name resolution takes roughly two seconds, so this is easily justified. Conversely, comparing a current device configuration to its week-old backup and accounting for any differences may take a considerable amount of time, but this presents a high probability of exposing a cause, so it too is justified.
 
The order in which you decide to eliminate suspect causes is ultimately dependent on your experience, your familiarity with the infrastructure, and your allowance for time. Regardless of priority, each suspect cause should undergo the same process of elimination:

Define a working condition

You can't test for a condition unless you know what condition to expect. Before performing a test, you should have in mind what outcome should be produced in the absence of an outage. For example, performing a traceroute to a distant node is meaningless if you can't compare it against a traceroute to the same destination under normal conditions.

Define a test for that condition

Ensure that the test you perform is in fact evaluating the suspect cause. For instance, pinging an E-mail server doesn't explicitly guarantee that mail services are available, only the server itself (technically, only that server's address). To verify the presence of mail services, a connection to the relevant daemon(s) must be established.

Apply the test and record the result

Once you've applied the test, record its success or failure in your notes. Even if you've eliminated the cause under suspicion, you have a reference to remind you of this and avoid wasting time repeating the same test again unnecessarily.
It is common to uncover multiple failures in the course of troubleshooting. When this happens, it is important to recognize any dependencies. For example, if you discover that E-mail, web access, and a trunk link are all down, the E-mail and web failures can likely be ignored if they depend on the trunk link to function. However, always remember to verify these supposed secondary outages after the primary outage has been resolved.

Step 3: Devise a Solution

Once we have identified a point of failure, we want to continue our systematic approach. Just as with testing for failures, we can apply a simple methodology to testing for solutions. In fact, the process very closely mirrors the steps performed to eliminate suspect causes.

Define the failure

At this point you should have a comfortable idea of the failure. Form a detailed description so you have something to test against after applying a solution. For example, you would want to refine "The Internet is down" to "Users in building 10 cannot access the Internet because their subnet was removed from the outbound ACL on the firewall."

Define the proposed solution

Describe exactly what changes are to be made, and exactly what the expected outcome is. Blanket solutions such as arbitrarily rebooting a device or rebuilding a configuration from scratch might fix the problem, but they prevent any further diagnosis and consequently impede mitigation.

Apply the solution and record the result

Once we have a defined failure and a proposed solution, it's time to pit the two against each other. Be observant in applying the solution, and record its result. Does the outcome match what you expected? Has the failure been resolved?
In addition to our defined process, some guidelines are well worth mentioning.

Maintain focus

Far too often I encounter a technician who, upon becoming frustrated with a failure or failures, opts to recklessly reboot, reconfigure, or replace a device instead of troubleshooting systematically. This is the high-tech equivalent of pounding something with a hammer until it works. Focus on one failure at a time, and one solution at a time per failure.

Watch out for hazardous changes

When developing a solution, remember to evaluate what effect it might have on systems unrelated to those you are troubleshooting. It's a horrible feeling to realize you've fixed one problem at the expense of causing a much larger one. The best course of action when this happens is typically to immediately reverse the change which was made. Note that this is only possible with a systematic approach.

Step 4: Test and Repeat

Upon implementing a solution and observing a positive effect, we can begin to retrace our steps back toward the original symptoms. If any conditions were overlooked because they were decided to be dependent on the recently resolved failure, test for them again. Refer to your notes from the initial report and verify that each symptom has been resolved. Ensure that the same tests which were used to identify a failure are used to confirm functionality.
 
If you notice that a failure or failures remain, pick up where you left off in the testing cycle, annotate it, and press forward.

Step 5: Mitigate

The troubleshooting process does not end when the problem has been resolved and everyone is happy. All of your hard work up until this point amounts to very little if the same problem occurs again tomorrow. In IT, problems are commonly fixed without ever being resolved. Band-aids and hasty installations are not acceptable substitutes for implementing a permanent and reliable solution. So to speak, many people will go on mopping the floor day after day without ever plugging the leak.
 
A permanent solution may be as complex as redesigning the routed backbone, or as simple as moving a power strip somewhere it won't be tripped on anymore. A permanent solution also doesn't have to be 100% effective, but it should be as effective as is practical. At the absolute minimum, ensure that you record the observed failure and the applied solution, so that if the condition does recur you have an accurate and dated reference.

A Final Word

Everyone has his or her own preference in troubleshooting, and by no means do I consider this paper conclusive. However, if there's only one concept you take away, make it this: above all else, remain calm. You're no good to anyone in a panic. One's ability to manage stress and maintain a professional demeanor even in the face of utter chaos is what makes a good engineer great.
 
Most network outages, despite what we're led to believe by senior management, are not the end of the world. There are instances where downtime can lead to loss of life; fortunately, this isn't the case with the vast majority of networks. Money may be lost, time may be wasted, and feelings may be hurt, but when the problem is finally resolved, odds are you've learned something valuable.
 

Thursday, March 28, 2013

Understanding Spanning Tree Protocol



Spanning-tree Protocols
802.1d (Standard Spanning-tree)
The entire goal of spanning-tree is to create a loop-free layer 2 domain. There is no TTL in a layer 2 frame, so if you don't have spanning-tree, a frame can loop forever. The original 802.1d standard set out to fix this. There are a few main pieces to the 802.1d process. They are…

1. Elect a root bridge.
This bridge is the 'root' of the spanning-tree. In order to elect a root bridge, all of the switches send out BPDUs (Bridge Protocol Data Units). The BPDU carries a bridge ID (which includes a configurable priority) that the switches use to determine which switch should be the root. The lowest bridge ID wins. The original standard specified the bridge ID as…

[Figure: original bridge ID format - a 2-byte bridge priority followed by the switch's 6-byte MAC address]
 
As time progressed there became a need to create multiple spanning-trees for multiple VLANs (we’ll get to that later). So, the bridge ID format had to be changed. What they came up with was..
 
[Figure: new bridge ID format - a 4-bit bridge priority, a 12-bit extended system ID (the VLAN ID), and the switch's 6-byte MAC address]
 
So, now you know why you need to have a bridge priority that's a multiple of 4096 (if you don't: the 4 priority bits give you 16 possible values, and 16 * 4096 = 65,536, the size of the old 16-bit priority range).
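A quick Python check of that arithmetic:

# New-format bridge ID: 4 priority bits + 12 extended system ID bits.
PRIORITY_BITS = 4
EXT_SYSTEM_ID_BITS = 12

step = 2 ** EXT_SYSTEM_ID_BITS       # 4096: the priority increment
values = 2 ** PRIORITY_BITS          # 16 configurable priority values
print(step, values, step * values)   # 4096 16 65536
print([i * step for i in range(4)])  # [0, 4096, 8192, 12288]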
 
So at this point, we have a mess of switches swarming around with BPDUs. If a switch receives a BPDU with a lower bridge priority, it knows that it isn't the root. At that point, it stops sending out its own bridge ID and starts sending out BPDUs with the better (lower) priority that it heard. In the end, all of the switches will be forwarding BPDUs with the lowest bridge ID. At that point, the switch originating the best (lowest) bridge ID knows that it is the root bridge.
 
2. Each switch selects a root port
So now that we know which switch is the root, every non-root switch needs to select its root port. That is, the port with the lowest cost to the root switch. To determine this, the root switch sends 'Hellos' out of all of its ports every 2 seconds. When a non-root switch receives a hello, it does a couple of things. First, it reads the cost from the hello message and updates it by adding the ingress port's cost. So if a hello carrying a cost of 4 came in on a Fast Ethernet port, the switch would add 19 to it, giving a new cost of 23. After hellos have been received on all ports, the switch picks its root port by selecting the port with the lowest calculated cost. Now, a bit about port costs. See the table below…

Interface Speed    Original IEEE Port Cost    New IEEE Port Cost
10 Mbps            100                        100
100 Mbps           10                         19
1000 Mbps          1                          4
10000 Mbps         1                          2

So as you can see, with the increase in speed came an update to the port costs. Now that we have 40 gig interfaces, I'm wondering if they will redo that again. At any rate, if there is a tie (say, two ports that each have a calculated cost of 23), the switch breaks the tie in the following fashion:

1. Pick the lowest bridge ID of the switch that sent the hellos
2. Pick the lowest port priority of the switch that sent the hellos
3. Use the lowest port number of the switch that sent the hellos
(We'll talk about port priorities in a bit.) A short sketch of this selection logic follows below. Once we have a root port, we can move on to step 3.
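Here is the simplified sketch promised above. The costs, bridge IDs, and port numbers are all hypothetical, and real STP compares these fields inside the BPDU itself; the point is only that the comparison order mirrors the tiebreaker list.

# Candidate root ports, keyed by local port name. Each value is the
# comparison tuple: (hello cost + local port cost, sender bridge ID,
# sender port priority, sender port number). All values are made up.
ports = {
    "Fa0/1": (19 + 19, 32768, 128, 3),  # hello cost 19 + FastE cost 19 = 38
    "Gi0/1": (19 + 4, 32768, 128, 1),   # hello cost 19 + GigE cost 4 = 23
    "Gi0/2": (19 + 4, 32768, 128, 2),   # ties Gi0/1 on cost, loses on port number
}

# Tuples compare element by element, matching the tiebreaker order above.
root_port = min(ports, key=ports.get)
print(root_port)  # Gi0/1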

3. Pick a designated port
This part is pretty easy. Basically, each segment can only have one designated port. The switch that forwards the lowest cost hello onto a particular segment becomes the designated switch, and the port that it uses to do that is the designated port. That means each port on the root bridge is a designated port. Ports that are neither root ports nor designated ports (non-designated ports) go into the blocking state. If a tie occurs, the same tiebreaker process occurs as in step 2.

At this point, we have a fully converged spanning-tree!

Normal Operation
Under normal operation, the root sends hellos out of all of its active ports. Each connected switch receives the hello on its root port, updates it, and forwards it out of its designated ports (if it has any). Blocked ports receive the hellos but never forward them.

Topology Changes
When a switch notices a topology change, it is responsible for telling all other connected switches about the change. The most effective way to do this is to tell the root switch so that it can tell all of the other switches. When a switch notices a topology change, it sends a TCN (topology change notification) out its root port. The switch will send the TCN every hello time until the upstream switch acknowledges it. The upstream switch acknowledges by sending a hello with a TCA (topology change acknowledgement) flag set. This process continues until the root is notified. The root will then set the TC flag on its hellos. When switches in the tree see the TC flag set in hellos from the root, they know that there has been a topology change and that they need to age out their CAM tables. Switches aging out their CAM tables is an important part of a topology change and reconvergence.

802.1D Port States

Blocking – The port is blocking all traffic with the exception of receiving STP BPDUs. The port will not forward any frames in this state.
Listening – Same as blocking but will now begin to send BPDUs.
Learning – The switch will begin to learn MAC information in this state.
Forwarding – Normal full up and up port state. Forwarding normal traffic.

Timing
There are three main timers in the STP protocol. These are:
Forward Delay Timer – Default of 15 seconds
Hello – Default of 2 seconds
MaxAge – Default of 20 seconds
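Plugging these defaults into the classic convergence arithmetic (standard 802.1d numbers; a directly detected failure skips the MaxAge wait):

# Default 802.1d timers, from the list above.
HELLO = 2                # seconds between hellos
FORWARD_DELAY = 15       # seconds spent in each of listening and learning
MAX_AGE = 10 * HELLO     # 20 seconds: ten missed hellos

# Direct failure: the port walks listening -> learning -> forwarding.
print(2 * FORWARD_DELAY)             # 30 seconds

# Indirect failure: MaxAge must expire before the state walk begins.
print(MAX_AGE + 2 * FORWARD_DELAY)   # 50 seconds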

Spanning-Tree enhancements (Cisco Proprietary)
PortFast – Immediately puts a port into forwarding mode, bypassing the normal progression of STP states. Should only be used on ports connecting to end hosts.
UplinkFast – Should be used on access layer switches connecting to distribution. Used to fail over the root port in the case of the primary root port failing. CAM entries are timed out by the access layer switch generating multicast frames with attached devices' MACs as the source addresses. This is different from the normal TCN process described earlier. UplinkFast also causes the switch to increase its root priority to 49152 and raise its port costs by 3000.
BackboneFast – Used to detect indirect STP failures, so the switch doesn't have to wait MaxAge to reconverge. The feature needs to be configured on all switches in order for it to work. When the switch stops receiving hellos, it queries its upstream switches with an RLQ (Root Link Query). If the upstream switch had a failure, it can reply to the local switch so that it can converge to another port without waiting for the MaxAge to expire.

802.1w (Rapid Spanning-Tree)
Rapid spanning-tree takes 802.1d and makes it faster. In addition, they take some of the Cisco proprietary features and standardize them. Here are some of the notable changes that 802.1w makes.

-Switches only wait to miss 3 hellos on their root port prior to reconverging. In 802.1d this number was 10 (MaxAge, or 10 times hello).
-Fewer port states. 802.1w takes the number of port states from 5 (I'm counting disabled) down to 3. The new states are discarding, learning, and forwarding.
-Concept of a backup DP when a switch has multiple ports connected to the same segment.
-Standardization of the Cisco proprietary PortFast, UplinkFast, and BackboneFast features.

802.1w Link Types
Point to Point – Connects a switch to another switch in full duplex mode.
Shared – Connects a switch to a hub using half duplex.
Edge – A user access port.

802.1w Port roles
Root Port – The same as in 802.1d
Designated Port – The same as in 802.1d
Alternate Port – Same as the uplink fast feature, backup RP connection
Backup Port – Alternate DP port, can take over if the existing DP fails

802.1s (Multiple Spanning-Tree)
Multiple spanning-tree (MST) lets you map VLANs into a particular spanning tree. These VLANs are then considered to be part of the same MST region. MST uses the same features as RSTP for convergence, so if you are running MST, you are by default also running RSTP. Much like any other ‘group’ technology, there are several parameters that must be met before switches/vlans can become part of the same region.

-MST must be globally enabled
-The MST region name must be configured (and the same on each switch)
-Define the MST revision number (and make it the same on each switch)
-Map the same VLANs into each region (or instance)

MST can co-exist with other switches that don't talk MST. In this case, the entire MST region appears to be a single switch to the other, 'external' spanning-tree. The spanning-tree that connects the region to the 'outside' is considered to be the IST, or Internal Spanning Tree.

Spanning-tree Protection
There are several ‘protection’ mechanisms available that can be implemented in conjunction with spanning-tree to protect the spanning-tree from failure or loops.

BPDU Guard – Should be enabled on all ports that will never connect to anything but an end user port. The configuration will err-disable a port if a BPDU is received on that port. To recover from this condition the port must be shut/no shut.

Root Guard – Protects the switch from choosing the wrong RP. If a superior BPDU is heard on this port the port is placed into root-inconsistent state until the BPDUs are no longer heard.

UDLD – Unidirectional link detection is used to detect when one side (transmit or receive) of a link is lost. States like this can cause loops and loss of connectivity. UDLD functions in two modes, aggressive and normal. Normal mode uses layer 2 messaging to determine if a switch's transmission capabilities have failed. If this is detected, the switch with the failed transmit side goes into err-disable. In aggressive mode, the switch tries to reconnect with the other side 8 times. If this fails, both sides go into err-disable.

Loop Guard – When a port configured with loop guard stops hearing BPDUs it goes into loop-inconsistent state rather than transitioning into forwarding.
 

Sunday, March 24, 2013

Network and Application Root Cause Analysis

 
 
 
[Figure: server delay]

A few years ago, “Prove it’s not the network” was all that a Network Engineer had to do to get a problem off his back. If he could simply show that throughput was good end to end, latency was low, and there was no packet loss, he could throw a performance problem over the wall to the server and application people.

Today, it’s not quite that simple.
 
Network Engineers have access to tools which give them visibility down to the packet level of a problem. This depth of visibility often requires them to work hand in hand with application people all the way through to problem resolution. In order to find performance problems in applications, the network guys have had to take the TCP bull by the horns and simply take ownership of the transport layer.
 
What does that mean?
 
First, it means that “Prove it’s not the network” isn’t enough, as we have already mentioned. But it also means that analyzing TCP windows, TCP flags, and slow server responses has fallen on their shoulders. Since this is the case today, let’s look at a quick list of transport layer issues that the Network Engineer should watch for when analyzing a slow application.
 
1. Check out the TCP Handshake
 
No connection, slow connection, client-to-server round-trip time, TCP options: these can all be analyzed by looking at the first three packets in the TCP conversation. It's important when analyzing an application problem to capture this connection sequence. In the handshake, note the amount of time the server takes to respond to the SYN from the client, as well as the advertised window size on both sides. Retransmissions at this stage of the game are a real killer, as the client will typically wait up to three full seconds to send a retransmission.
 
2. Compare server response time to connection setup time
 
The TCP handshake gave a good idea of the round-trip time between client and server. Now we can use that timer as a benchmark to measure the server response time. For example, if the connection setup time is 10 ms and the server is taking 500 ms to respond to client requests, we can estimate that the server itself accounts for around 490 ms of that delay. This is not a huge deal when the number of client requests is low, but if the application is "chatty" (lots of client requests to the server) and we suffer the server delay on each call, it turns into a bigger performance problem.
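A minimal Python sketch of that estimate; the timing values are the hypothetical ones from the example above:

# Handshake round trip approximates network delay, so anything beyond it
# in the request/response gap is the server's share.
def server_processing_ms(handshake_rtt_ms: float, response_gap_ms: float) -> float:
    """Response gap minus network round trip = time spent in the server."""
    return max(response_gap_ms - handshake_rtt_ms, 0.0)

print(server_processing_ms(handshake_rtt_ms=10, response_gap_ms=500))  # 490.0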
 
3. Watch for TCP Resets
 
There are two ways a client can disconnect from the server: the TCP FIN or the TCP Reset. The FIN is basically a three- or four-packet mutual disconnect between the client and server. It can be initiated by either side, and is done to gracefully release the connection. When a TCP Reset is sent, however, it represents an immediate shutdown of the connection. If it is sent by the server, it could be that an inactivity timer expired or, worse, a problem in the application code was triggered. It's also possible that a device in the middle, such as a load balancer, sent the reset. These issues can be found in the packet trace by setting a filter for TCP Resets and closely analyzing the sender for the root cause.
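A minimal sketch of that triage, assuming the trace has already been parsed into simple records; no capture library is assumed, and the addresses and timestamps here are hypothetical:

# Given packet records parsed into dicts, pull out the TCP Resets and
# note who sent them -- the same triage you would do with a display filter.
packets = [
    {"time": 0.000, "src": "10.0.0.5", "dst": "10.0.0.9", "flags": {"SYN"}},
    {"time": 0.010, "src": "10.0.0.9", "dst": "10.0.0.5", "flags": {"SYN", "ACK"}},
    {"time": 4.200, "src": "10.0.0.9", "dst": "10.0.0.5", "flags": {"RST"}},
]

resets = [p for p in packets if "RST" in p["flags"]]
for p in resets:
    print(f"{p['time']:.3f}s  reset sent by {p['src']}")  # 4.200s ... 10.0.0.9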
 
This list is not exhaustive, but it will get you started in looking for the root cause of a performance problem. In future articles on LoveMyTool, we'll give tips and tricks for solving different problems with several analyzers, showing how these issues can be detected and resolved.
 

Saturday, March 23, 2013

Fiber and 10 Gig Optics


So here's a quick rundown on fiber and 10 gig optics.

Multimode Fiber
-Also referred to as 'OM' type cable
-Less expensive than single mode (‘OS’) fiber
-Several variants common today (OM1, OM2, OM3, OM4)
-OM1 and OM2 are orange in color and suitable for gigabit speeds
-OM3 and OM4 are aqua in color and suitable for 10 gigabit speeds
-OM1 has a core size of 62.5 microns
-OM2 has a core size of 50 microns
-OM3 has a core size of 50 microns and is laser optimized
-OM4 has a core size of 50 microns and is laser optimized
-Wider cores allow you to use less precise light sources such as LEDs
-Typically uses 850nm or 1300nm light sources (LEDs and Lasers)

Single mode Fiber
-Also referred to as 'OS' type cable
-Allows only a single 'mode' (ray of light)
-Generally yellow in color
-A couple of variants common today (OS1, OS2)
-Core is between 8 to 10 microns
-Can support very long distances (up to thousands of kilometers in amplified long-haul systems)
-Requires laser light sources that are in the 1270nm to 1625nm range

10 Gig Optic Standards
10GBaseT – Uses standard Ethernet cabling. Cat6a or 7
10GBaseSR – Uses multimode fiber pairs. Max of 300m with OM3
10GBaseLX4 – Uses 4 different 2.5Gbps lasers on different CWDM wavelengths. Only available in X2 or XENPAK form factors because of the size of the 4 lasers in a single module. The original option for running 10 gig over older (not laser optimized) MMF fiber.
10GBaseLRM – Serves the same purpose as LX4 but uses EDC (electronic dispersion compensation) rather than 4 individual lasers. Uses 1310nm lasers. Can reach as far as 220m on OM1, OM2, and OM3 fiber.
10GBaseLR – 10km over OS fiber. There is no minimum distance, so this type of optic can be used for short runs as well.
10GBaseER – 40km over OS fiber. Links shorter than 20km require the signal to be attenuated so as not to damage the receiving optic.
10GBaseZR – 80km over OS fiber. Significant attenuation is required for runs much shorter than 80km. Not actually an IEEE standard.
10GBaseLW – Same as LR optics, with the additional ability to interface directly with OC-192 transport.
10 Gig Interface types
There are a variety of different module form factors available for 10 gig interfaces:
XENPAK
X2
XFP
SFP+

Thursday, March 21, 2013

6500 Architecture and evolution

 
Since I've recently become more interested in the actual switching and fabric architectures of Cisco devices, I decided to take a deeper look at the 6500 series switches. I've worked with them for years, but until recently I didn't have a solid idea of how they actually switched packets. I had a general idea of how it worked and why DFCs were a good thing, but I wanted to know more. Based on my research, this is what I've come up with. I'd love to hear any feedback on the post, since there is a chance that some of what I've read isn't totally accurate. That being said, let's dive right in…
 
Control vs Data Plane
All actions on a switch can be considered to be part of either the control plane or the data plane. The 6500 series switch is a hardware-based switch, which means it performs switching in hardware rather than software. The pieces of the switch that perform switching in hardware are considered to be part of the data plane. That being said, there still needs to be a software component that tells the data plane how to function. The parts of the switch that function in software are considered to be the control plane. These components make decisions and perform advanced functions which then tell the data plane how to behave. Cisco's implementation of forwarding in hardware is called CEF (Cisco Express Forwarding).
 
Switch Design
The 6500 series switch is a modular switch made up of a few main components. Let's take a look at each briefly.
 
The Chassis
The 6500 series switch comes in many shapes and sizes. The most common (in my opinion) is the 6509. The last digit indicates the number of slots in the chassis. There are also 3, 4, 6, and 13 slot chassis available. The chassis holds all of the other components and facilitates connecting them together. The modules plug into a silicon board called the backplane.
 
The Backplane
The backplane is the most crucial component of the chassis. It has all of the connectors that the other modules plug into. Its main components are highlighted in the diagram below.
 
[Figure: 6500 backplane, showing the crossbar fabric connections, the three shared buses, and the power connections]
The diagram shows the backplane of a standard 9 slot chassis. Each slot has a connection to the crossbar switching fabric, the three buses (D,R,C) that compose the shared bus, and a power connection.
 
The switch fabric in the 6500 is referred to as a 'crossbar' fabric. It provides unique paths for each of the connected modules to send and receive data across the fabric. In initial implementations the SUP didn't have an integrated switch fabric, which required the use of a separate module referred to as the SFM (Switch Fabric Module). With the advent of the SUP720 series of SUPs, the switch fabric is integrated into the SUP itself. The crossbar switching fabric provides multiple non-blocking paths between different modules. The speed of the fabric is a function of both the chassis and the device providing the switch fabric.
Standard 6500 Chassis
Provides a total of 40Gbps per slot

Enhanced (e) 6500 Chassis
Provides a total of 80Gbps per slot

SFM with SUP32 Supervisor
Single 8 gig fabric connection
256Gbps switching fabric
18 fabric channels

SUP720 through SUP720-3B Supervisor
Single 20 gig fabric connection
720Gbps switching fabric
18 fabric channels

SUP720-3C Supervisor
Dual 20 gig fabric connections
720Gbps switching fabric
18 fabric channels

SUP2T Supervisor
Dual 40 gig fabric connections
2.08Tbps switching fabric
26 fabric channels
So as you can see, there are quite a few combinations you can use here. The bottom line is that with the newest SUP2T and the 6500e chassis, you could have a module with eight 10Gbps ports that wasn’t oversubscribed.
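The line-rate claim checks out numerically:

# SUP2T in an e-chassis: dual 40Gbps fabric channels per slot.
slot_capacity_gbps = 2 * 40       # 80 Gbps into the backplane
module_demand_gbps = 8 * 10       # eight 10Gbps front-panel ports

print(module_demand_gbps / slot_capacity_gbps)  # 1.0 -> exactly 1:1, not oversubscribed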
 
The other bus in the 6500 is referred to as the shared bus. In the initial 6500 implementation the fabric bus wasn't used; rather, all communication came across the shared bus. The shared bus is actually comprised of 3 distinct buses.
DBus (Data Bus) – The main bus, on which all data is transmitted. The speed of the DBus is 32Gbps.
RBus (Results Bus) – Used by the supervisor to forward the result of the forwarding operation to each of the attached line cards. The speed of the RBus is 4Gbps.
CBus (Control Bus) – Relays information between line cards and the supervisor. This is also sometimes referred to as Ethernet Out of Band, or EOB or EOBC (Ethernet Out of Band Controller). The speed of the CBus is 100Mbps half duplex.

The Supervisor (or, as we call them, SUPs)
The switch supervisor is the brains of the operation. In the initial implementation of the 6500, the SUP handled the processing of all packets and made all of the forwarding decisions. A supervisor is made up of three main components: the switch fabric, the MSFC (Multilayer Switch Feature Card), and the PFC (Policy Feature Card). The image below shows a top-down view of a SUP720 and the location of each component on the physical card.
 
[Figure: top-down view of a SUP720]
MSFC – The Multilayer Switch Feature Card is considered to be the control plane of the switch. The MSFC runs processes that help build and maintain the layer 3 forwarding table (routing table), process ACLs, run routing protocols, and provide other services that are not run in hardware. The MSFC is actually comprised of two distinct pieces.
SP – The SP (Switch Processor) handles booting the switch. The SP copies the SP part of an IOS image from bootflash, boots itself, and then copies the RP part of the IOS image to the RP. Once the RP is booted, the SP hands control of the switch over to the RP. From that point on, the RP is what the administrator talks to in order to administer the switch. In most cases, the SP still handles layer 2 protocols such as ARP and STP.
RP – The RP (Route Processor) handles all layer 3 functions of the 6500, including running routing protocols and building the RIB, from which the FIB is populated. Once the FIB is built on the RP, it can be downloaded to the data plane TCAM for hardware-based forwarding of packets. The RP runs in parallel with the SP, which provides the layer 2 functions of the switch.
PFC – The Policy Feature Card receives a copy of CEF's FIB from the MSFC. Since the MSFC doesn't actually deal with forwarding any packets, it downloads the FIB into the hardware on the PFC. Basically, the PFC is used to accelerate layer 2 and layer 3 switching, and it learns how to do that from the MSFC. The PFC is considered to be part of the data plane of the switch.
 
Line Cards
The line cards of a 6500 series switch provide the port density to connect end user devices. Line cards come in different port densities and support many different interface types. Line cards connect to the SUP via the backplane.
 
The other pieces…
The 6500 also has a fan tray slot as well as two slots for redundant power supplies. I’m not going to cover these in detail since they don’t play into the switch architecture.
 
Switching Modes
Now that we've discussed the main components of the 6500, let's talk about the different ways in which a 6500 switches packets. There are 5 main modes in which this occurs, and the mode that is used depends heavily on what type of hardware is present in the chassis.
 
Classic Mode
In classic mode, the attached modules make use of the shared bus in the chassis. When a switchport receives a packet, it is first locally queued on the card. The line card then requests permission from the SUP to send the packet onto the DBus. If the SUP says yes, the packet is sent onto the DBus and subsequently copied to the SUP as well as all other line cards. The SUP then performs a lookup on the PFC. The result of that lookup is sent along the RBus to all of the cards. The card containing the destination port receives information on how to forward the packet, while all other cards receive word to terminate processing and delete the packet from their buffers. The speed of classic mode is 32Gbps half duplex, since it's a shared medium.
 
CEF256
In CEF256 mode, each module has a connection to the shared 32Gbps bus as well as an 8Gbps connection to the switch fabric. In addition, each line card has a local 16Gbps bus (LCDBUS) on the card itself. When a switchport receives a packet, it is flooded on the LCDBUS and the fabric interface receives it. The fabric interface floods the packet header onto the DBus. The PFC receives the header and makes the forwarding decision. The result is flooded on the RBus back to the line card, and the fabric interface receives the forwarding information. At that point, the entire packet is sent across the 8Gbps fabric connection to the destination line card. The fabric interface on the egress line card floods the packet on the LCDBUS, and the egress switchport sends the packet on its way out of the switch.
 
dCEF256
In dCEF256 mode, each line card has dual 8Gbps connections to the switch fabric and no connection to the shared bus. In this mode, the line card also has a DFC (Distributed Forwarding Card) which holds a local copy of the FIB as well as its own layer 2 adjacency table. Since the card doesn't need to forward packets or packet headers to the SUP for processing, there is no need for a connection to the shared bus. Additionally, dCEF256 cards have dual 16Gbps local line card buses. The first LCDBUS handles half of the ports on the line card and the second LCDBUS handles the other half. Communication from a port on one LCDBUS to a port on the second LCDBUS goes through the switch fabric. Since the line card has all of the forwarding information that it needs, it can forward packets directly across the fabric to the egress line card without talking to the SUP.
 
CEF720
Identical in operation to CEF256, but with some upgrades. The switch fabric is now integrated into the SUP rather than on an SFM, and the dual fabric connections from each line card are now 20Gbps apiece rather than 8Gbps.
 
dCEF720
Identical to dCEF256, with the addition of the same upgrades present in CEF720 (faster fabric connections and the switch fabric integrated into the SUP).
 
Centralized vs Distributed Forwarding
I indicated earlier that the early implementations of the switch utilized the SUP to make all switching and forwarding decisions. This is considered centralized switching, since the SUP provides all of the functionality required to forward a packet or frame. Let's take a look at how a packet is forwarded using centralized forwarding.
Line cards by default (in most cases) come with a CFC, or Centralized Forwarding Card. The card has enough logic on it to know how to send frames and packets to the supervisor when it needs an answer. In addition, most cards can accept a DFC, or Distributed Forwarding Card. DFCs are the functional equivalent of the PFC located on the SUP and hold an entire copy of CEF's FIB and adjacency tables. With a DFC in place, a line card can perform distributed forwarding, which takes the SUP out of the picture.
 
How centralized forwarding works…
1. Frame arrives at the port on a line card and is passed to the CFC on the local line card.
2. The bus interface on the CFC forwards the headers to the supervisor on the DBus. All other line cards connected to the DBus ignore the headers.
3. The PFC on the supervisor makes a forwarding decision based on the headers and floods the result on the RBus. All other line cards on the RBus ignore the result.
4. The CFC forwards the result, along with the packet, to the line card's fabric interface. The fabric interface forwards the result and the packet onto the switch fabric towards their final destination.
5. The egress line card’s fabric ASIC receives the packet and forwards the data out towards the egress port.
 
How distributed forwarding works…
1. Frame arrives at the port on a line card and is passed to the fabric interface on the local line card.
2. The fabric interface sends just the headers to the DFC located on the local line card.
3. The DFC returns the forwarding decision of its lookup to the fabric interface.
4. The fabric interface transmits the packet onto the switch fabric and towards the egress line card
5. Egress line card receives the packet and forwards the packet on to the egress port.
So as you can see, distributed forwarding is much quicker than centralized forwarding just from a process perspective. In addition, it doesn’t require the use of the shared bus.
 
Conclusion
There are many pieces of the 6500 that I didn't cover in this post, but hopefully it's enough to get you started if you are interested in knowing how these switches work. Hopefully I'll have time soon to do a similar post on the Nexus 7000 series switch.
 

Wednesday, February 27, 2013

Wireless 101 - Part 2


Antenna

An antenna is a device used to transmit and/or receive electromagnetic waves, which are often referred to as radio waves. Most antennas are resonant devices, which operate efficiently over a relatively narrow frequency band. An antenna must be tuned (matched) to the same frequency band as the radio system to which it is connected; otherwise, reception and/or transmission will be impaired.

Types of antenna

There are 3 types of antennas used with mobile wireless: omnidirectional, dish, and panel antennas.
+ Omnidirectional antennas radiate equally in all directions
+ Dish antennas are very directional
+ Panel antennas are not as directional as dishes



Decibels

Decibels (dB) are the accepted method of describing a gain or loss relationship in a communication system. If a level is stated in decibels, then it is comparing a current signal level to a previous level or preset standard level. The beauty of dB is that values may simply be added and subtracted. A decibel relationship (for power) is calculated using the following formula:

dB = 10 * log10 (PB / PA)

"A" might be the power applied to the connector on an antenna, the input terminal of an amplifier, or one end of a transmission line. "B" might be the power arriving at the opposite end of the transmission line, the amplifier output, or the peak power in the main lobe of radiated energy from an antenna. If "B" is larger than "A", the result will be a positive number, or gain. If "B" is smaller than "A", the result will be a negative number, or loss.

You will notice that the “B” is capitalized in dB. This is because it refers to the last name of Alexander Graham Bell.

Note:

+ dBi is a measure of the increase in signal (gain) by your antenna compared to the hypothetical isotropic antenna (which uniformly distributes energy in all directions) -> It is a ratio. The greater the dBi value, the higher the gain and the more acute the angle of coverage.

+ dBm is a measure of absolute signal power. It is the power ratio in decibels (dB) of the measured power referenced to one milliwatt (mW). The "m" stands for "milliwatt".

Example:

At 1700 MHz, 1/4 of the power applied to one end of a coax cable arrives at the other end. What is the cable loss in dB?

Solution:

dB = 10 * log10 (1/4) = 10 * (-0.602)

=> Loss = -6.02 dB

From the formula above we can also calculate that at -3 dB the power is reduced by half: 10 * log10 (1/2) = -3 dB. This is an important number to remember.
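The same decibel arithmetic, checked in a short Python sketch (dBm conversion included, per the note above):

import math

def db(power_out: float, power_in: float) -> float:
    """Power ratio expressed in decibels."""
    return 10 * math.log10(power_out / power_in)

print(round(db(0.25, 1.0), 2))   # -6.02 dB: quartering the power
print(round(db(0.5, 1.0), 2))    # -3.01 dB: halving the power

def dbm_to_mw(dbm: float) -> float:
    """dBm is absolute power referenced to 1 milliwatt."""
    return 10 ** (dbm / 10)

print(dbm_to_mw(0))    # 1.0 mW
print(dbm_to_mw(20))   # 100.0 mW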

Beamwidth

The angle, in degrees, between the two half-power points (-3 dB) of an antenna beam, where more than 90% of the energy is radiated.

[Figure: antenna beamwidth, measured between the -3 dB points]

OFDM

OFDM was proposed in the late 1960s, and in 1970 a US patent was issued. OFDM encodes a single transmission onto multiple sub-carriers. All the slow subchannels are then multiplexed into one fast combined channel.

The trouble with traditional FDM is that the guard bands waste bandwidth and thus reduce capacity. OFDM selects channels that overlap but do not interfere with each other.

[Figure: FDM versus OFDM spectrum usage]

OFDM works because the frequencies of the subcarriers are selected so that, at each subcarrier frequency, all the other subcarriers contribute nothing to the overall waveform.

In this example, three subcarriers are overlapped but do not interfere with each other. Notice that only the peaks of each subcarrier carry data. At the peak of each of the subcarriers, the other two subcarriers have zero amplitude.

[Figure: three overlapping OFDM subcarriers]
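A short numeric check of this idea, restated in the time domain: over one symbol period, distinct subcarriers correlate to zero, which is why they can overlap in frequency without interfering. A Python sketch with hypothetical subcarrier indices:

import numpy as np

t = np.linspace(0, 1, 1000, endpoint=False)   # one symbol period

def subcarrier(k: int) -> np.ndarray:
    return np.sin(2 * np.pi * k * t)          # k cycles per symbol

for j, k in [(1, 1), (1, 2), (2, 3)]:
    corr = np.mean(subcarrier(j) * subcarrier(k))
    print(f"<s{j}, s{k}> = {corr:+.3f}")      # +0.500 when j == k, else ~0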

Types of network in CCNA Wireless

+ A LAN (local area network) is a data communications network that typically connects personal computers within a very limited geographical area (usually within a single building). LANs use a variety of wired and wireless technologies, standards and protocols. School computer labs and home networks are examples of LANs.

+ A PAN (personal area network) is a term used to refer to the interconnection of personal digital devices within a range of about 30 feet (10 meters) and without the use of wires or cables. For example, a PAN could be used to wirelessly transmit data from a notebook computer to a PDA or portable printer.

+ A MAN (metropolitan area network) is a public high-speed network capable of voice and data transmission within a range of about 50 miles (80 km). Examples of MANs that provide data transport services include local ISPs, cable television companies, and local telephone companies.

+ A WAN (wide area network) covers a large geographical area and typically consists of several smaller networks, which might use different computer platforms and network technologies. The Internet is the world’s largest WAN. Networks for nationwide banks and superstore chains can be classified as WANs.

[Figure: relative ranges of PAN, LAN, MAN, and WAN]

Bluetooth

Bluetooth wireless technology is a short-range communications technology intended to replace the cables connecting portable and/or fixed devices while maintaining high levels of security. Connections between Bluetooth devices allow these devices to communicate wirelessly through short-range, ad hoc networks. Bluetooth operates in the 2.4 GHz unlicensed ISM band.

Note:

Industrial, scientific and medical (ISM) band is a part of the radio spectrum that can be used by anybody without a license in most countries. In the U.S., the 902-928 MHz, 2.4 GHz and 5.7-5.8 GHz bands were initially used for machines that emitted radio frequencies, such as RF welders, industrial heaters and microwave ovens, but not for radio communications. In 1985, the FCC Rules opened up the ISM bands for wireless LANs and mobile communications. Nowadays, numerous applications use this band, including cordless phones, wireless garage door openers, wireless microphones, vehicle tracking, amateur radio…

WiMAX

Worldwide Interoperability for Microwave Access (WiMax) is defined by the WiMax forum and standardized by the IEEE 802.16 suite. The most current standard is 802.16e.

Operates in two separate frequency bands, 2-11 GHz and 10-66 GHz
At the higher frequencies, line of sight (LOS) is required – point-to-point links only
In the lower region, the signals propagate without the requirement for line of sight (NLOS) to customers

Basic Service Set (BSS)

A group of stations that share an access point are said to be part of one BSS.

Extended Service Set (ESS)

Some WLANs are large enough to require multiple access points. A group of access points connected to the same WLAN are known as an ESS. Within an ESS, a client can associate with any one of many access points that use the same Extended service set identifier (ESSID). That allows users to roam about an office without losing wireless connection.

IEEE 802.11 standard

A family of standards that defines the physical layers (PHY) and the Media Access Control (MAC) layer.

* IEEE 802.11a: 54 Mbps in the 5 GHz band
* IEEE 802.11b: 11 Mbps in the 2.4 GHz ISM band
* IEEE 802.11g: 54 Mbps in the 2.4 GHz ISM band
* IEEE 802.11i: security. The IEEE initiated the 802.11i project to overcome the problems of WEP (which has many flaws and can be exploited easily)
* IEEE 802.11e: QoS
* IEEE 802.11f: Inter Access Point Protocol (IAPP)

More information about 802.11i:

The new security standard, 802.11i, which was ratified in June 2004, fixes all WEP weaknesses. It is divided into three main categories:

1. Temporal Key Integrity Protocol (TKIP) is a short-term solution that fixes all WEP weaknesses. TKIP can be used with old 802.11 equipment (after a driver/firmware upgrade) and provides integrity and confidentiality.
2. Counter Mode with CBC-MAC Protocol (CCMP), based on CCM mode [RFC 3610], is a new protocol designed from the ground up. It uses AES as its cryptographic algorithm, and, since this is more CPU intensive than RC4 (used in WEP and TKIP), new 802.11 hardware may be required. Some drivers can implement CCMP in software. CCMP provides integrity and confidentiality.
3. 802.1X Port-Based Network Access Control: whether TKIP or CCMP is used, 802.1X provides the authentication.

Wireless Access Points

There are two categories of Wireless Access Points (WAPs):
* Autonomous WAPs
* Lightweight WAPs (LWAPs)

Autonomous WAPs operate independently, and each contains its own configuration file and security policy. Autonomous WAPs suffer from scalability issues in enterprise environments, as a large number of independent WAPs can quickly become difficult to manage.

Lightweight WAPs (LWAPs) are centrally controlled using one or more Wireless LAN Controllers (WLCs), providing a more scalable solution than Autonomous WAPs.

Encryption

Encryption is the process of changing data into a form that can be read only by the intended receiver. To decipher the message, the receiver of the encrypted data must have the proper decryption key (password).

TKIP

TKIP stands for Temporal Key Integrity Protocol. It is basically a patch for the weakness found in WEP. The problem with the original WEP is that an attacker could recover your key after observing a relatively small amount of your traffic. TKIP addresses that problem by automatically negotiating a new key every few minutes — effectively never giving an attacker enough data to break a key. Both WEP and WPA-TKIP use the RC4 stream cipher.

TKIP Session Key

* Different for every pair
* Different for every station
* Generated for each session
* Derived from a “seed” called the passphrase (see the sketch below)
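The list above doesn't show where the "seed" goes, but in WPA's pre-shared-key mode the derivation is well defined: a 256-bit Pairwise Master Key (PMK) is computed from the passphrase and the SSID with PBKDF2, and the per-session keys are then derived from that PMK during the four-way handshake. A minimal Python sketch (the passphrase and SSID values here are made up for illustration):

```python
import hashlib

# WPA-PSK: the 256-bit Pairwise Master Key (PMK) is derived from the
# passphrase, salted with the SSID, using 4096 rounds of PBKDF2-HMAC-SHA1.
passphrase = "my secret passphrase"  # hypothetical passphrase
ssid = "linksys"                     # hypothetical SSID

pmk = hashlib.pbkdf2_hmac("sha1", passphrase.encode(), ssid.encode(), 4096, dklen=32)
print(pmk.hex())  # per-session keys are derived from this PMK in the 4-way handshake
```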

AES

AES stands for Advanced Encryption Standard and is a totally separate cipher system. It is a block cipher with key sizes of 128, 192, or 256 bits, and it is considered the gold standard of encryption systems today. AES takes more computing power to run, so some small devices (the Nintendo DS, for example) don't support it, but it is the most secure option you can pick for your wireless network.

EAP

Extensible Authentication Protocol (EAP) [RFC 3748] is just the transport protocol optimized for authentication, not the authentication method itself:

“EAP is an authentication framework which supports multiple authentication methods. EAP typically runs directly over data link layers such as Point-to-Point Protocol (PPP) or IEEE 802, without requiring IP. EAP provides its own support for duplicate elimination and retransmission, but is reliant on lower layer ordering guarantees. Fragmentation is not supported within EAP itself; however, individual EAP methods may support this.” — RFC 3748, page 3

Some of the most-used EAP authentication mechanisms are listed below:

* EAP-MD5: MD5-Challenge requires a username/password and is equivalent to the PPP CHAP protocol [RFC1994]. This method does not provide dictionary attack resistance, mutual authentication, or key derivation, and therefore has little use in a wireless authentication environment.
* Lightweight EAP (LEAP): A username/password combination is sent to an Authentication Server (RADIUS) for authentication. LEAP is a proprietary protocol developed by Cisco and is not considered secure. Cisco is phasing out LEAP in favor of PEAP.
* EAP-TLS: Creates a TLS session within EAP, between the Supplicant and the Authentication Server. Both the server and the client(s) need a valid (X.509) certificate, and therefore a PKI. This method provides authentication both ways.
* EAP-TTLS: Sets up an encrypted TLS tunnel for safe transport of authentication data. Within the TLS tunnel, (any) other authentication methods may be used. Developed by Funk Software and Meetinghouse, and is currently an IETF draft.
* EAP-FAST: Provides a way to ensure the same level of security as EAP-TLS, but without the need to manage certificates on the client or server side. To achieve this, the same AAA server on which the authentication will occur generates the client credential, called the Protected Access Credential (PAC).
* Protected EAP (PEAP): Like EAP-TTLS, uses an encrypted TLS tunnel. Supplicant certificates for both EAP-TTLS and PEAP are optional, but server (AS) certificates are required. Developed by Microsoft, Cisco, and RSA Security, and is currently an IETF draft.
* EAP-MSCHAPv2: Requires a username/password and is basically an EAP encapsulation of MS-CHAP-v2 [RFC2759]. Usually used inside a PEAP-encrypted tunnel. Developed by Microsoft, and is currently an IETF draft.

RADIUS

Remote Authentication Dial-In User Service (RADIUS) is defined in [RFC2865] (and related RFCs) and was primarily used by ISPs to authenticate a username and password before the user was authorized to use the ISP's network.

802.1X does not specify what kind of back-end authentication server must be present, but RADIUS is the de facto back-end authentication server used in 802.1X.

Roaming

Roaming is the movement of a client from one AP to another while still transmitting. Roaming can be done across different mobility groups, but must remain inside the same mobility domain. There are two types of roaming:

A client roaming from AP1 to AP2. These two APs are in the same mobility group and mobility domain

Roaming_Same_Mobile_Group.jpg

Roaming in the same Mobility Group

A client roaming from AP1 to AP2. These two APs are in different mobility groups but in the same mobility domain

Roaming_Different_Mobile_Group.jpg
 

Monday, February 25, 2013

Wireless 101 - Part 1


In this article we will discuss the wireless technologies covered in CCNA.

Wireless LAN (WLAN) is very popular nowadays. You have probably used some wireless applications on your laptop or cellphone. Wireless LANs enable users to communicate without the need for cables. Below is an example of a simple WLAN:

Wireless_Applications.jpg

Each WLAN network needs a wireless Access Point (AP) to transmit and receive data from users. Unlike a wired network, which operates at full-duplex (sending and receiving at the same time), a wireless network operates at half-duplex, so an AP is sometimes referred to as a Wireless Hub.

The major difference between wired LAN and WLAN is WLAN transmits data by radiating energy waves, called radio waves, instead of transmitting electrical signals over a cable.

Also, WLAN uses CSMA/CA (Carrier Sense Multiple Access with Collision Avoidance) instead of CSMA/CD for media access. WLAN can't use CSMA/CD because a sending device can't transmit and receive data at the same time. CSMA/CA operates as follows (a toy simulation follows the list):

+ Listen to ensure the media is free. If it is free, set a random backoff time before sending data
+ When the random time has passed, listen again. If the media is free, send the data. If not, set another random time and repeat
+ Wait for an acknowledgment that the data was received successfully
+ If no acknowledgment is received, resend the data
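To make the listen/backoff/acknowledge loop concrete, below is a toy Python simulation of the steps above. It is deliberately simplified (real 802.11 uses slotted contention windows, interframe spacings, and exponential backoff), and the channel state and ACKs are just random chance here:

```python
import random
import time

def medium_is_free() -> bool:
    return random.random() < 0.7   # simulated carrier sense: 70% chance idle

def transmit_frame() -> bool:
    return random.random() < 0.8   # simulated send: 80% chance an ACK returns

def csma_ca_send(frame: str, max_attempts: int = 5) -> bool:
    for attempt in range(1, max_attempts + 1):
        # 1. Listen until the medium is free.
        while not medium_is_free():
            time.sleep(0.01)
        # 2. Wait a random backoff, then listen again before sending.
        time.sleep(random.uniform(0.0, 0.05))
        if not medium_is_free():
            continue   # medium became busy: start over with a new backoff
        # 3-4. Send and wait for the acknowledgment; the loop resends if it is missing.
        if transmit_frame():
            print(f"{frame!r} acknowledged on attempt {attempt}")
            return True
    return False

csma_ca_send("hello")
```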

IEEE 802.11 standards:

Nowadays there are three organizations influencing WLAN standards:

+ ITU-R: is responsible for allocation of the RF bands
+ IEEE: specifies how RF is modulated to transfer data
+ Wi-Fi Alliance: improves the interoperability of wireless products among vendors

But the most popular type of wireless LAN today is based on the IEEE 802.11 standard, which is known informally as Wi-Fi.

* 802.11a: operates in the 5 GHz band. Maximum transmission speed is 54 Mbps and approximate wireless range is 25-75 feet indoors.
* 802.11b: operates in the 2.4 GHz ISM band. Maximum transmission speed is 11 Mbps and approximate wireless range is 100-200 feet indoors.
* 802.11g: operates in the 2.4 GHz ISM band. Maximum transmission speed is 54 Mbps and approximate wireless range is 100-200 feet indoors.

ISM Band: In the US, spectrum use is controlled by the FCC, and most bands require a license. To accommodate wireless LANs, the FCC has set aside bandwidth for unlicensed use, including the 2.4 GHz ISM spectrum where many WLAN products operate.

Wi-Fi: stands for Wireless Fidelity and is used to refer to any of the IEEE 802.11 wireless standards. The term Wi-Fi was created by the Wireless Ethernet Compatibility Alliance (WECA). Products certified as Wi-Fi compliant are interoperable with each other even if they are made by different manufacturers.



Access points can support several or all of the three most popular IEEE WLAN standards: 802.11a, 802.11b and 802.11g.

WLAN Modes:

WLAN has two basic modes of operation:

* Ad-hoc mode: In this mode devices send data directly to each other without an AP.

Wireless_Ad-hoc_mode.jpg

* Infrastructure mode: connects to a wired LAN and supports two modes (service sets):

+ Basic Service Set (BSS): uses only a single AP to create a WLAN
+ Extended Service Set (ESS): uses more than one AP to create a WLAN and allows roaming over a larger area than a single AP can cover. Usually there is an overlapping area between two APs; the overlap should be more than 10% (from 10% to 15%) so that users can move between the two APs without losing their connections (called roaming). The two adjacent APs should use non-overlapping channels to avoid interference. The most popular non-overlapping channels are channels 1, 6 and 11 (explained later).

Wireless_Infrastructure_mode.jpg

Roaming: the ability to move a wireless device from one access point's range to another without losing the connection.

When configuring an ESS, each of the APs should be configured with the same Service Set Identifier (SSID) to support roaming. The SSID is the unique name shared among all devices on the same wireless network. In public places, the SSID is set on the AP and broadcast to all wireless devices in range. SSIDs are case-sensitive text strings with a maximum length of 32 characters. The SSID is also the minimum requirement for a WLAN to operate. In most Linksys APs (a Cisco product), the default SSID is “linksys”.

Wireless Encoding

When a wireless device sends data, there are several ways to encode the radio signal, including varying its frequency, amplitude and phase.



Frequency Hopping Spread Spectrum (FHSS): uses all frequencies in the band, hopping to a different one after each fixed time interval. Of course, the next frequency must be predetermined by the transmitter and receiver.

Frequency_Hopping_Spread_Spectrum_FHSS.jpg

The main idea of this method is that signals sent on different frequencies will be received at different levels of quality. By hopping across frequencies, the transmission greatly improves the chance that most of the signal will get through. For example, suppose another device is using the 150-250 kHz range. If our device transmitted only in that range, its signal would suffer significant interference. By hopping across different frequencies, the signal is interfered with only briefly, which is acceptable.
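As a rough illustration of "predetermined" hopping, both ends can generate the same hop sequence from a shared seed, so the receiver always knows which frequency comes next. The channel count below matches classic Bluetooth's 79 one-MHz channels; the seed and hop count are arbitrary:

```python
import random

def hop_sequence(seed: int, num_channels: int, hops: int) -> list[int]:
    # Transmitter and receiver seed identical PRNGs, so both sides
    # land on the same channel at every hop interval.
    rng = random.Random(seed)
    return [rng.randrange(num_channels) for _ in range(hops)]

tx_hops = hop_sequence(seed=42, num_channels=79, hops=8)
rx_hops = hop_sequence(seed=42, num_channels=79, hops=8)
assert tx_hops == rx_hops   # both ends agree on the hopping pattern
print(tx_hops)
```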

Direct Sequence Spread Spectrum (DSSS): this method transmits the signal over a wider frequency band than required, by multiplying the original user data with a pseudo-random spreading code. The result is a wide-band signal which is very “durable” against noise. Even if some bits in the signal are damaged during transmission, statistical techniques can recover the original data without the need for retransmission.

Note: “spread spectrum” here means the bandwidth used to transfer the data is much wider than the bandwidth needed to transfer that data.

Traditional communication systems use narrowband signals to transfer data, because the required bandwidth is minimal; however, the signal must have high power to cope with noise. Spread spectrum does the opposite: it transmits the signal at a much lower power level (it can even transmit below the noise floor) but over a much wider bandwidth. Even if noise affects some parts of the signal, the receiver can easily recover the original data with the right algorithms.
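The power-for-bandwidth trade described above can be quantified with the Shannon-Hartley capacity formula, C = B * log2(1 + S/N). The formula isn't mentioned in the original text, but it expresses the same idea: widening the bandwidth lets a link carry the same data rate at a far lower signal-to-noise ratio, even below the noise floor:

```python
import math

def capacity_bps(bandwidth_hz: float, snr_linear: float) -> float:
    # Shannon-Hartley: C = B * log2(1 + S/N)
    return bandwidth_hz * math.log2(1 + snr_linear)

# A narrowband link needs a strong signal to carry ~1 Mbps...
print(capacity_bps(100e3, 1000.0))   # 100 kHz at +30 dB SNR -> ~1.0 Mbps
# ...while a spread signal carries the same rate below the noise floor.
print(capacity_bps(22e6, 0.0316))    # 22 MHz at -15 dB SNR -> ~1.0 Mbps
```

The 22 MHz figure was chosen because it matches the width of one DSSS channel.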

wireless_Spread_Spectrum_Signal.jpg

Now you understand the basic concept of DSSS. Let's discuss the use of DSSS in the 2.4 GHz unlicensed band.

The 2.4 GHz band has a bandwidth of about 82 MHz, ranging from 2.402 GHz to 2.483 GHz. In the USA, this band has 11 overlapping DSSS channels, while in some other countries it can have up to 14 channels. Channels 1, 6 and 11 have the least interference with each other, so they are preferred over the other channels.

wireless_2_4_GHz_band.png
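The channel layout is easy to compute: channels 1-13 have center frequencies of 2407 + 5n MHz (channel 14, allowed in Japan, sits apart at 2484 MHz), and each DSSS channel is about 22 MHz wide. A quick check in Python shows why 1, 6 and 11 are the preferred trio:

```python
def center_mhz(channel: int) -> int:
    # Channels 1-13 are spaced 5 MHz apart starting at 2412 MHz;
    # channel 14 (Japan only) sits separately at 2484 MHz.
    return 2484 if channel == 14 else 2407 + 5 * channel

for ch in (1, 6, 11):
    c = center_mhz(ch)
    print(f"channel {ch:2d}: center {c} MHz, spans ~{c - 11}-{c + 11} MHz")
# The centers are 25 MHz apart, so the ~22 MHz-wide signals never overlap.
```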

Orthogonal Frequency-Division Multiplexing (OFDM): encodes a single transmission onto multiple subcarriers to save bandwidth. OFDM uses channels that overlap but do not interfere with each other: the subcarrier frequencies are chosen so that, at each subcarrier's center frequency, all of the other subcarriers contribute nothing to the overall waveform.

In the picture below, notice that only the peak of each subcarrier carries data. At the peak of each subcarrier, the other two subcarriers have zero amplitude.

wireless_OFDM.jpg
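This "zero contribution at the peaks" property can be checked numerically: subcarriers spaced at multiples of 1/T, where T is the symbol period, are orthogonal, so integrating the product of two different subcarriers over one symbol yields zero. A small NumPy sketch:

```python
import numpy as np

T = 1.0                                    # symbol period
t = np.linspace(0.0, T, 10_000, endpoint=False)
f0 = 4.0 / T                               # base subcarrier frequency

s1 = np.cos(2 * np.pi * f0 * t)            # subcarrier at f0
s2 = np.cos(2 * np.pi * (f0 + 1 / T) * t)  # neighbor spaced exactly 1/T away

print(np.trapz(s1 * s2, t))   # ~0: the neighbor contributes nothing
print(np.trapz(s1 * s1, t))   # ~T/2: each subcarrier keeps its own energy
```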

Below is a summary of the encoding schemes popularly used in WLANs.

Encoding   Used by
FHSS       The original 802.11 WLAN standard; the current standards (802.11a, 802.11b, and 802.11g) do not use it
DSSS       802.11b
OFDM       802.11a, 802.11g, 802.11n



WLAN Security Standards

Security is one of the biggest concerns for people deploying a WLAN, so we should have a good grasp of the security standards.

Wired Equivalent Privacy (WEP)

WEP is the original security protocol defined in the 802.11b standard, so it is very weak compared to newer security protocols.

WEP is based on the RC4 encryption algorithm, with a secret key of 40 bits or 104 bits combined with a 24-bit Initialization Vector (IV) to encrypt the data (so you will sometimes hear of "64-bit" or "128-bit" WEP keys). But RC4 as used in WEP has been found to have weak keys and can be cracked within minutes, so it is no longer popular.

The weak points of WEP are that the IV is too small and that the secret key is static (the same key is used for both encryption and decryption throughout the whole communication and never expires).
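"The IV is too small" is easy to quantify: 24 bits give only about 16.7 million IVs, so a busy network must reuse an IV (and therefore an RC4 keystream) within hours, which is part of what made WEP cracking practical. A back-of-the-envelope estimate, assuming full-size frames at 802.11b speed:

```python
iv_space = 2 ** 24            # 16,777,216 possible Initialization Vectors
frame_bits = 1500 * 8         # assume full-size 1500-byte frames
rate_bps = 11e6               # 802.11b at 11 Mbps

frames_per_second = rate_bps / frame_bits           # ~917 frames/s
seconds_to_exhaust = iv_space / frames_per_second   # ~18,300 s
print(f"every IV used after ~{seconds_to_exhaust / 3600:.1f} hours")  # ~5 hours
```

Smaller frames or multiple stations sharing the same key exhaust the IV space even faster.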

Wi-Fi Protected Access (WPA)

In 2003, the Wi-Fi Alliance developed WPA to address WEP's weaknesses. Perhaps one of the most important improvements in WPA is Temporal Key Integrity Protocol (TKIP) encryption, which changes the encryption key dynamically for each data transmission. While still using RC4 encryption, TKIP uses a temporal encryption key that is regularly renewed, making it more difficult for a key to be stolen. In addition, data integrity was improved through the use of a more robust hashing mechanism, the Michael Message Integrity Check (MMIC).

In general, WPA still uses RC4 encryption, which is considered an insecure algorithm, so many people viewed WPA as a temporary solution until a new security standard (WPA2) was released.

Wi-Fi Protected Access 2 (WPA2)

In 2004, the Wi-Fi Alliance updated the WPA specification by replacing the RC4 encryption algorithm with Advanced Encryption Standard-Counter with CBC-MAC (AES-CCMP), calling the new standard WPA2. AES is much stronger than the RC4 encryption but it requires modern hardware.
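CCMP pairs AES in counter mode (for confidentiality) with a CBC-MAC (for integrity), a combination known as AES-CCM. The sketch below uses the third-party `cryptography` package's AESCCM primitive to show the mode generically; it is not the actual 802.11 frame format (real CCMP builds its 13-byte nonce from the transmitter address and a packet number, and uses an 8-byte MIC):

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESCCM

key = AESCCM.generate_key(bit_length=128)  # CCMP uses a 128-bit temporal key
aesccm = AESCCM(key)

nonce = os.urandom(13)       # CCM accepts 7-13 byte nonces; CCMP uses 13
header = b"frame header"     # authenticated but not encrypted, like an 802.11 header
payload = b"frame body"

ciphertext = aesccm.encrypt(nonce, payload, header)  # ciphertext + integrity tag
assert aesccm.decrypt(nonce, ciphertext, header) == payload
```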

Standard   Key Distribution          Encryption
WEP        Static pre-shared         Weak
WPA        Dynamic                   TKIP
WPA2       Both (static & dynamic)   AES

Wireless Interference

The 2.4 GHz and 5 GHz spectrum bands are unlicensed, so many applications and devices operate in them, causing interference. Below is a quick view of the devices operating in these bands:

+ Cordless phones: operate in three frequency bands: 900 MHz, 2.4 GHz, and 5 GHz. As you can see, 2.4 GHz and 5 GHz are the frequency bands of 802.11b/g and 802.11a wireless LANs.

Most cordless phones nowadays operate in the 2.4 GHz band and use frequency hopping spread spectrum (FHSS) technology. As explained above, FHSS uses the entire 2.4 GHz spectrum, while 802.11b/g uses DSSS, which occupies only about one third of the 2.4 GHz band (one channel), so the use of cordless phones can cause significant interference to your WLAN.

wireless_cordless_phone.jpg

An example of a cordless phone

+ Bluetooth: like cordless phones, Bluetooth devices also operate in the 2.4 GHz band with FHSS technology. Fortunately, Bluetooth does not cause as much trouble as cordless phones because it usually transfers data for a short time (for example, when copying files from your laptop to your cellphone) over a short range. Moreover, from version 1.2, Bluetooth defines an adaptive frequency hopping (AFH) algorithm, which allows Bluetooth devices to periodically listen and mark channels as good, bad, or unknown, helping to reduce interference with our WLAN.

+ Microwave ovens: do not transmit data but emit high RF power and heating energy. The magnetron tubes used in microwave ovens radiate a continuous-wave-like signal at frequencies close to 2.45 GHz (the center burst frequency is around 2.45-2.46 GHz), so they can interfere with the WLAN.

+ Antennas: there are a number of 2.4 GHz antenna-equipped transmitters on the market today, and they can interfere with your wireless network.

+ Metal materials, or materials that conduct electricity, deflect Wi-Fi signals and create blind spots in your coverage. Some examples are metal siding and decorative metal plates.

+ Game controllers, digital video monitors, wireless video cameras, and wireless USB devices may also operate at 2.4 GHz and cause interference.
