Metering TCP Flows via SNMP

Jaime Chambron Scient

Alix Marchandise-Franquet Alta Vista

Quality of Service (QoS) has become an issue both in research and the marketplace. Not only do they want to provide better service to network users, but they also want to monitor network usage, so to help create guarantees on network quality per service level. An issue with metering network performance is the potential need of ISPs to share information.

In the fall of 1998 we completed a specially designed networking course at Harvard focusing on QoS. This course brought together experts from Carnegie Mellon (via telecasted classes every week), the Harvard computer science department and the Harvard Business School to explain some of the technical concepts behind Differentiated Service (DiffServ) and Truncation and present us with case studies on how technical companies like Microsoft and Yahoo! started their businesses. In this course we focused on DiffServ, and with the help of Harvard professor HT Kung and Harvard graduate student teaching fellow Mark Gaynor, we completed experiments on our own method for metering network usage through an SNMP server.

In this article we present our method and results. We will take you step by step through how we laid out our SNMP server, how it avoids the issue of ISP collaboration, what our server measures, our experimentation setup, and our thoughts for this server’s use and future QoS.

THE PROBLEM WE TACKLED

Metering TCP flows is important to network performance, maintenance and research, for it provides information on the quality of the service. Some researchers have begun looking into metering differentiated services traffic, as this is an important part of service allocation as described in [Alloc97]. One could meter bandwidth, the number of packets dropped, RTT and DiffServ specific information, such as use of the network within each level of service. However, some of this data is very difficult to obtain without collaboration of all the actors within the network. There are several ways that this information could be accessed, such as through a Web page or via SNMP.

In this project we decided to implement an SNMP agent-manager system. We were able to test our SNMP server on an experimental network. Our SNMP agents are located at edge nodes and are transparent to the rest of the network. We referred to [RFC1157] and [RFC1303] in creating our SNMP server to meet the standards. Because our agent is to be transparent, no collaboration is possible in this scenario. Some information that we would like to record about the network will then be "hard" to determine and new methods will have to be elaborated.

One "hard" piece of information gathered by our SNMP agent and recorded by our SNMP manager is the number of packets dropped in a TCP connection. Even though TCP has a way to allow the sender to know when a data packet was dropped, it is hard for one point on the network to find the number of retransmitted and ACK packets being dropped. This is important because dropped packets on a network cause loss of throughput and timeouts, thus affecting performance. Researchers would want to know this information in helping to create better algorithms for TCP. Also, buyers and sellers of network services want to be able to give guarantees on the loss rate. Sellers also need to prove that DiffServ's premium service provides the best QoS, including a smaller packet loss rate.

Thus, we created an algorithm to help estimate dropped data packets on a network. To help in developing our tests of this algorithm we referred to [Basu]. Note that a majority of the network experiments conducted at most institutions have been on simulated networks, which differs from the experimental network we were able to use.

 

THE EXPERIMENTATION SETUP

Network configuration and tools used

We used an experimental network at Harvard specifically built for this project using six PCs running Free BSD version 2.2.7 with a kernel modified here at Harvard by Mark Gaynor. Each computer was equipped with an AMD K6 300 MHz processor and 32 MB of memory. The PCs were networked on 10Mbps links through an Intel switch. Our SNMP agent code was incorporated in traffic conditioner code from Nortel that used BPF and classified TCP traffic per flow. Such a classification is necessary to implement differentiated services, as described in [Arch98]. To establish TCP connections and send packets we used the stcp/rtcp program, which was created at Harvard. We used tcpdump, netstat, and tools built at Harvard to help analyze our experimental data.

Metering bandwidth

To test our implementation of bandwidth estimation and to prove that our SNMP server functions properly, we ran the following experiment: we sent N greedy TCP flows, using stcp/rtcp from the sender, router 5, through the SNMP agent to the receiver, router 2 (Figure 1). During the experiment, the SNMP manager queried the SNMP agent every second to obtain and record the value of the bandwidth. We used the information provided by stcp/rtcp on router 2 (which was the bandwidth seen at router 2 every second) to determine the actual bandwidth.

Figure 1: Configuration of a link with our invisible SNMP agent

Estimating packet drops

There are two different ways of gathering the data necessary to estimate the number of drops. The first approach is to have the sender and the receiver talk to a third party about what each has seen (Figure 2). The third party (in our case, the SNMP agent) would then use the data to estimate the number of dropped packets. However, there is a major flaw in this method: it requires collaboration of all parties, which can be hard to obtain.

Figure 2: Third party configuration

We have taken a different approach to estimate the number of packets dropped, as shown in Figure 3. Our method does not necessitate any active sharing of information from the sender nor the receiver. The third party (our SNMP agent) monitors the traffic transparently.

Figure 3: Invisible agent configuration

However, we needed to put a limitation on the network configuration. There should be no alternate routes, i.e. all the data and ACK packets must go through our SNMP agent. Figure 4 displays an example of an invalid configuration for our testing purposes. (Note: in our discussion section we will explain how this scenario will be possible using multiple agents).

Figure 4: Invalid network configuration

Algorithms for estimating packet drops We used two different algorithms to estimate the number of dropped data packets: the gap method and the retransmission method. The combination of these estimates gave us a range for the number of packets dropped. In both approaches, we made the assumption that there are no out-of-order packets because it is not possible to distinguish between a retransmit and an out-of-order packet without accounting for time. The first algorithm, referred to as the "Gap Method", estimates the number of missing packets by looking for sequence number jumps in the data packets going through the agent. On the other hand, the second algorithm, referred to as the "Retransmission Method", focuses on the number of packets retransmitted to estimate the number of packets dropped (please refer to appendix section 6.1 to view detailed algorithm and explanation).

Local tests We first ran tests to make sure our two bounds were correctly keeping track of what was happening on the network. This was accomplished with one flow running on our local network setup, from our sender, router 5, through our agent, router 3, to the receiver, router 2. Thus, we had no competing flows and no congestion, so no packets could be lost on the network link itself.

Then, we tested our dropped packet estimation implementations using the setup of our experimental network shown in Figure 5.

Figure 5: Experimental network setup for metering drops

We sent 2N greedy TCP flows using stcp/rtcp: N from router 5 and N from router 6. This caused congestion on router 2 and therefore packet drops. Our SNMP agent estimated the number of drops and our SNMP manager recorded it by constantly querying the SNMP agent every second. We determined the actual number of drops from tcpdump and netstat information gathered on routers 5, 6, and 1.

Remote experiments Our remote experiments consisted of sending one greedy TCP flow using stcp/rtcp from a machine on ISDNet, France's network, to a router on our experimental network at Harvard, going through our SNMP agent (Figure 6).

Figure 6: Remote experiment network configuration

As with the local tests, the estimated number of dropped packets was computed by our SNMP agent and recorded by our SNMP manager. Also, we determined the actual number of drops from tcpdump and netstat information gathered on the sender and the receiver.

OUR RESULTS

Bandwidth tests

Figure 7 displays the results of one run of the experiment described in section 2.2. It shows the estimated smoothed bandwidth at the SNMP agent and the actual bandwidth recorded at the destination, router 2. Results were off slightly because the SNMP agent and stcp/rtcp program running at router 2 were not recording the flow bandwidth at the exact same moment, but were both recording on 1second intervals.

Figure 7: Estimated bandwidth at SNMP agent vs. actual bandwidth at destination router 2

Drop packet tests

After running numerous local and remote tests, we were able to show how the two bounds in our algorithm do keep a rough estimate of the number of dropped packets. Also, we showed that characteristics about our estimate bounds that we hypothesized, being one bound is an equivalent to the number of retransmitted packets and the other bound being equivalent to the minimal number of packets in a sequence gap, do hold true.

Local tests We first ran our one flow test. The data collected from these tests prove that the algorithm we encoded works. The retransmit bound on every run was equal to the actual number of packets retransmitted by the sender. A further explanation and data sample from these experiments are in appendix section 6.2.1.

The results of our local experiments with congestion conclude that both of our bounds provide a reasonable estimate of the actual number of dropped packets. Figure 8 displays the actual number of packets dropped and the estimate gap and retransmit bounds for one test run. This was calculated by looking at the number of packets dropped for every X number of packets received. As shown in this figure, and as true in other test runs, the retransmit bound was larger than the gap bound estimate and the actual number of packets dropped. This retransmit estimate was typically 20 - 30% higher than the actual number of packets dropped. This is either because of TCP's window allowing for more packets to be resent than dropped or because of bytes from the originally dropped packet being resent in more than one packet. However, this retransmit estimate was off by 0-5 packets (up to 2% off) from the actual number of packets resent from routers 5 and 6, and this is due to retransmit packets being dropped that the agent does not see.

Also, Figure 8 shows that the gap bound is extremely close to the actual number of packets dropped. In most of the other test cases the gap bound was off 1% from the actual number of packets dropped (one case showed that the gap bound was 10% higher). The gap bound provides a good estimate in our experimental network because there are no packets out of order. More data from these tests is available in appendix section 6.2.1.

Figure 8: Gap, retransmit and actual bounds for 6:23 experiment from routers 5 and 6 to 1

Remote tests Lastly, we ran our remote tests from France to our Harvard experimental network. It should be noted that in all of these tests bandwidth was extremely low (around 10Kbps). One test run is graphed in figure 9. As with the local tests, this was calculated by looking at the number of packets dropped for every X number of packets received. In this particular test case, both of the bounds were extremely close to the actual number of packets dropped. After closely examining other data recorded from this test (such as tcpdump and netstat) we saw that no packets were out of order. This explains why the gap bound was a good extimate for this case. The retransmit bound was extremely close as well, and no packets out of order was part of the reason. Another reason is because few, if any, packets were resent in more than one packet. This bound is lower because it may not have seen a dropped retransmit. Also, because the bandwidth is so low, thus causing a smaller window size from what we saw on the experimental network, very few, if any, extra packets were retransmitted.

In appendix section 6.2.2 is a chart containing more data from the multiple test runs. All of these tests produce gap and retransmit bounds that are anywhere from 0 to +-7 packets off from the actual number of packets dropped. The gap bound is slightly higher in some cases because it does see some packets out of order, and thus thinks a gap occurred. And in the cases where the retransmit bound it higher, it is possible that either a few extra packets were retransmitted or a packet's bytes were retransmitted in more than one packet. It should be noted that in all cases the retransmit bound is lower than the actual number of retransmitted packets seen in netstat at the French router. This is simply because the French router is sending data, and thus retransmitting data, to other receivers than just our experimental network. More data and analysis is still needed on these remote tests, because we are now looking at more complicated scenarios outside of our experimental network.

Figure 9: Gap, retransmit and actual bounds for 7:15 remote experiment

CHALLENGES WE FORESEE

The main challenges of this project resided in estimating the number of dropped packets. This is due to the nature of TCP. Fast retransmit fast recovery implies that when a loss occurs, the sender's window size is divided by two. Therefore, bytes that were sent together the first time are often retransmitted in different packets. Since our SNMP agent only sees the retransmits, it is hard to determine how many packets were originally lost because that number does not match the number of packets retransmitted. Also, some losses are harder to detect, such as retransmit and ACK loss. Retransmit loss is hard to detect because of the following. First, a packet may be retransmitted numerous times, but always dropped before the SNMP agent sees the retransmit. Second, if this packet is dropped after the SNMP agent sees it, this packet could still potentially be lost before reaching the receiver, but the receiver may have previously received a packet containing those bytes, and will thus ACK those missing bytes. ACK loss is hard to detect because of the different implementations of TCP that currently exist (where ACKs could be sent for every packet received, or every other packet received, or with the delayed ACK scheme). Since our agent does not know which implementation of TCP is running on the receiver, it can't make any assumptions on when to expect an ACK.

We just mentioned that determining ACK loss is hard. Thus to begin thinking about providing an ACK loss estimate, we have focused on the loss of data packets. This is because we are assuming that the ACK loss is proportional to the data loss on a given link in a given direction since the link is congested the same way for all packets.

Also, in section 2.3.2 we made an assumption that no packets can be out-of-order. This assumption is very unrealistic for a network. We believed that our remote testing would provide us with out-of-order packets, but because of the link we ended up establishing, few, if any, packets were out of order. Thus, we need to set up a remote experiment with high bandwidth, which also possesses multiple routes to send data on, so that we can see the effects of out-of-order packets to our estimate bounds.

Another challenge pertains to where the SNMP agent resides on the network. As noted in section 2.3.1, we placed a limitation on the network configuration that may not be realistic (refer to figure 4). This was necessary because we needed to make sure that our agent would see all the data and ACK packets, otherwise the estimation of drops would not be possible. However, it is still feasible to use our SNMP server by placing an SNMP agent at each edge node and allowing the SNMP manager to centralize the information gathered at the SNMP agents (figure 10). This will create more challenges for us, such as needing to send more information to the manager, not having real-time data, and allowing the manager to determine when it has received all information from all of its agents for a specific query.

Figure 10: Centralized SNMP manager configuration

Lastly, the transparency of our SNMP agent is insured by its design. This is because we used a traffic conditioner from Nortel that uses BPF and is shown not to slow down the traffic nor drop any packets. However, we were only able to run these tests on 10Mbps links, and thus our SNMP agent may not be as transparent on faster links. Thus, we will need to experiment on a faster link (>45Mbps) and make attempts to move our SNMP agent code from user space to the kernel to help speed up the process.

WHAT WE WANT TO DO NEXT

Our main focus will be to refine our algorithms for estimating the number of packets dropped on a network. First we want to determine the ACK loss more directly by looking at actual ACK packets rather than just the data, and thus update our algorithms to incorporate ACK loss. Second we are going to modify our algorithms to take into account time: this will help determine which packets were retransmitted versus out-of-order, which retransmits were dropped, and detect time outs.

Second, we will add more functionality to our SNMP server. We will include differentiated services specific information, such as percent of traffic used per service level. Plus, our SNMP manager will be modified to enable centralized information management from multiple SNMP agents on a network.

Lastly, we will look at the transparency issue of our SNMP agent. This will require us to conduct tests on faster links and alter code, as explained in the Challenges section above.

OUR CONCLUSION

We have built an SNMP server to meter the quality of differentiated services. This server does not require collaboration among ISP providers. However, it needs to be placed at edge nodes. Currently our SNMP server meters bandwidth and data packet loss information per TCP flow. Metering bandwidth is quite simple, whereas metering the number of data packets dropped is a bit more complicated and necessitates the implementation of specific algorithms to calculate an estimation of this number. We do show that our invisible SNMP agent placed at an edge node can reasonably account for packet loss. Our agent is capable of doing this by implementing two different algorithms, one algorithm looking at the number of packets retransmitted and the other looking at jumps in byte sequence numbers.

From our tests thus far we believe that our gap estimate is a better indicator of data packet loss than our retransmit estimate. First, in all of our tests the gap algorithm was off by at most 14%, where as the retransmit algorithm was off by as much as 30%. Second, we feel that there are many more issues that directly affect the retransmit number than the gap number, such as bytes being resent in different packets, retransmitting packts that were already received, and mistaking an out-of-order packet for a retransmitted packet. The biggest concerns there are with the gap method are out-of-order packets and assuming that all packets are the same size as the MTU. However, our experiments were limited due to the setup at our disposal: limited in bandwidth, network configuration. Thus, our above local tests did not see any out-of-order packets, and we saw very few out-of-order packets with our remote tests as well, and this could have been because of the low bandwidth sent from France.

We still need to run many more tests before refining our algorithm. First, we need to test our server on a faster link. Second, remote tests that cause more packets to be out-of-order should also be performed to see how well both algorithms can deal with this. From here, we will go into refining our algorithm by first taking into consideration time. Time may help the retransmit method take care of mistaking out-of-order packets for retransmits. It could also help the gap method with out-of-order packets as well. We will also investigate if there are ways to combine the two methods to provide not just one estimate, but a better estimate. Other things we still need to look at are ACK loss and retransmission timing.

We do believe that our server properly meters the bandwidth and the number of dropped packets on a very limited setup but can be expanded and work with some restrictions. With more refinement and functionality, our SNMP server will be able to be used as a tool for research, an ISP system utility to monitor TCP traffic, and a helper mechanism to market Differentiated Services by providing guarantees on the quality of a given service level.

APPENDIX

Gap Method

In the gap algorithm, we focused on jumps in the sequence numbers of the data packets going through the SNMP agent. We kept track of the highest data packet sequence number seen so far and detected a gap whenever the current sequence number was greater than the expected one, which was one added to the highest sequence number. To estimate how many data packets a given gap contained, we divided the gap size by the Maximum Transmission Unit (MTU):

drops += (current_seq_num - highest_seq_num) / MTU
when
current_seq_num > highest_seq_num + 1

This algorithm works best when all the packets that are lost are of the size of the MTU (and holding to our assumption that no packets are out-of-order). Figure 11 presents a scenario wherethe MTU is 10 bytes, and the three packets that were dropped and retransmitted, the packets starting with bytes 21, 31, and 41, were all of the size of the MTU. What will happen here is that when the receiver gets the packet starting with byte 11, it believes that the next packet arriving will start with byte 21. However, since the packet that actually arrives starts with byte 51, the destination realizes that there was a gap in the sequence numbers of the packets received. The algorithm will then subtract 11 from 51, giving it 30, and then divide 30 by the MTU, which results in detecting a loss of 3 packets, which is correct for this case.

Figure 11: Transmission with same packets resent

However, if the size of the packets dropped is not equal to the MTU (the packets dropped might be of smaller size than the MTU), the algorithm will underestimate the number of packets dropped. Figure 12 illustrates such a scenario: four of the packets dropped between 11 and 51 are each of size 5 and one of size 10. As previously, our algorithm will determine that 3 packets were lost, not five.

Figure 12: Transmission with MTU causing a miscalculation

Thus, if our assumption that no packets are out-of-order holds, the gap algorithm will always provide an estimate that is equal to or less than the actual number of packets dropped. However, if this assumption does not hold, as shown in figure 13, then the gap algorithm will calculate that packets were dropped, although they were not dropped but just out-of-order. Thus, this algorithm overestimate the number of packets dropped when our assumption does not hold.

Figure 13: Transmission with destination receiving out-of-order packets

Retransmission Method

In this algorithm, we used retransmitted packets to estimate the number of drops. We kept track of the highest data packet sequence number seen and considered any incoming data packet with a lower sequence number to be a retransmission:

drops++
when
current_seq_num < highest_seq_num

The retransmission algorithm will correctly determine the number of packets dropped in the scenario presented in figure 11. However, with the scenario in figure 12, we see the byte issue previously mentioned, with bytes lost being retransmitted in a different number of packets from how they were originally sent. Thus, in this scenario the retransmit algorithm would see three packets with sequence numbers lower than the highest one seen so far (21, 31, 41) and thus calculate that 3 packets were dropped. However, in this case 5 packets were really dropped. Also, there is a chance with resending bytes that more packets will be used to retransmit those bytes. Thus, the retransmit estimate, even with the assumption that no packets are out-of-order, can be an under or overestimate of the actual number of packets dropped.

If our out-of-order assumption does not hold, then there are more complications with the retransmission algorithm as well. First, by referring back to figure 13, we see that when the destination router receives packets 21, 31, and 41, it will think that all three of these are retransmits since all three of these packets have sequence numbers lower than the one the receiver is expecting, 71. It is also possible, when the packets come in out-of-order, that the receiver will signal the sender to resend packets that really were not lost, and thus the retransmit estimate will be even higher. Without the out-of-order assumption, time will need to come into consideration in helping to determine if a packet really was a retransmission or just that it came in out-of-order.

Experimental Data

The following are charts of some of our data from the above local and remote experiments. It should be noted that after running numerous tests on the local network and looking at tcpdump, we found a problem with the stcp/rtcp program sending data on our machines. Basically, stcp/rtcp was creating too many packets before sending them on the wire, and thus these packets that were "dropped" even before hitting the wire and ended up being retransmitted. This caused both of our algorithms to overestimate the number of packets dropped on our local network on the limited number of tests we ran. However, there were no problems found with running the program on our remote network. This is because the system in France was generating packets slower than they were being sent.

Local Experiments

Table 1: Transmission from router 5 to 2 experimental data

Table 2: Transmissions from router 5 and 6 to 2 experimental data

Remote Experiments

Table 3: Remote test experimental data

REFERENCES

[Arch98] S. Blake, D. Black, M. Carlson, E. Davies, Z. Wang, W. Weiss, "An Architecture for Differentiated Services", Internet Draft August 1998.

[Alloc97] D. Clark, J. Wroclawski, "An Approach to Service Allocation in the Internet", Internet Draft , July 1997.

[Basu] A. Basu, Z. Wang, "A Comparative Study of Schemes for Differentiated Services"

[RFC1157] J. Case, M. Fedor, M. Schoffstall, J. Davin, "A Simple Network Management Protocol (SNMP)", Internet RFC 1157, May 1990.

[RFC1303] K. McCloghrie, M. Rose. "A Convention for Describing SNMP-based Agents", Internet RFC 1303, February 1992.

 

Jaime Chambron is a developer for Scient. She graduated 1999 from Harvard with a Bachelors in Computer Science. She can be contacted at chambron@post.harvard.edu

Alix Marchandise-Franquet is a developer for Alta Vista. She graduated 1999 from Harvard with a Bachelors in Computer Science. She can be contacted at alix@panhard.com