2/16/00

Dear Colleagues,

Yesterday, I entertained a visit from Don Wassem of Giganet. Here is a summary. Don is a nice guy. Giganet is a relatively small company which produces gigabit-class network products, primarily for Intel hardware.

Product
-------

The Giganet product includes both hardware and software components, although they are not sold separately.

The hardware consists of network interface cards (NICs), cables, and switches to provide 1.25 Gbps, "non-blocking" connections for a private network. The technology is copper, although an optical option is available for long cable lengths. The products are branded by the name "cLAN". The switches are either 8 port or 32 port. If you have more nodes than fit in their largest switch, then Giganet provides more switches to form what Don called a "fat tree". Basically, by adding one or more intermediate layers of switches one can connect more ports than fit on a single switch. Don claimed that this scheme is non-blocking, i.e. it is equivalent to a wider full-bandwidth switch. It seemed plausible when he said it, but in retrospect I have some mild reservations.

The software consists of drivers and libraries for VI ("Virtual Interface"?), which would replace the TCP/IP stack and provide a relatively direct path between applications and the hardware. This software solution has the benefit of reducing OS-induced latency in message passing between nodes. Don showed some graphs (which I think are on their web site) which show a factor of 2 improvement in latency for VI on Giganet over TCP/IP on gigabit ethernet for small messages. VI drivers are available for both Linux and NT. I think I was looking at an NT benchmark. We should ask for similar benchmarks for Linux, which may be leaner in its TCP/IP implementation.

The NICs are intelligent, and in combination with VI are supposed to reduce the CPU load needed to handle the network traffic.
I believe that in regular ethernet the CPU not only has to set up the message, but also has to handle the read/write of the message data to/from memory. I believe that the cLAN cards can do direct memory access (DMA). As an indicator of the scope of this problem, Don quoted a fast ethernet application where 80% of CPU time went to handling the communication, which dropped to 5-10% CPU utilization for cLAN with VI. I believe that other gigabit solutions do similar DMA tricks, so I am not sure if this is a benefit over other gigabit solutions, or just in comparison with fast ethernet.

Although this looks impressive, it is not clear whether it will have a significant effect on application performance. If the application is communication bound, then the CPU could in principle do something else, but unless the nodes are task switching, it is not clear what they would be doing other than waiting for communication to finish anyway. At the cross-over from CPU-limited to communication-limited applications, I guess CPU utilization can affect things by a factor of 2, and Giganet would give some of that back.

Performance
-----------

So, how well does it work? Don showed some benchmark results from a Dell presentation. Several applications were run with a number of configurations. The network options were Giganet, fast ethernet, and a gigabit ethernet solution. The nodes were Intel Xeon configurations with 1, 2, or 4 processors per node. The number of nodes varied, I think, between 2 and 8, although there might have been some other sizes.

In general, 2 and 4 processor nodes are not as efficient. Presumably, the Intel bus architecture presents a memory bandwidth bottleneck. 2 CPUs may be acceptable, but 4 seems like a waste of money for Intel Xeons. (Note: my previous comparisons were not for Xeons, but for regular Pentium IIIs, which can only be put in dual processor configs. Xeons seem quite expensive to me, so I did not pursue them.)

Some of the applications scaled reasonably with the number of nodes on all networks, some did not scale particularly well on any, and some scaled well on the Giganet solution, less well on gigabit ethernet, and horribly on fast ethernet. Of course, the last group highlights the Giganet product. Even in those applications which scaled reasonably on all networks, the Giganet solution gave the best performance, ranging from a 10% improvement to a factor of 2 better than gigabit ethernet, and more compared to fast ethernet.

Which applications will we be running? My guess is some of each. There was no data showing how the good (from Giganet's perspective) applications continue to scale as the number of nodes increases to 16, 32...

Cost
----

There are component prices, but in a large system with many switches Giganet would be pricing at about $2000 per port (cost/benefit discussion below). It was not clear whether this was a final target, or a starting point from which we could apply educational discounts, marketing incentives, etc.

We discussed a couple of variations for how things could be purchased. Pretty clearly, the preferred model from Don's perspective would be to purchase our whole system through a vendor, who would provide Giganet as part of the package. It was not clear if a direct purchase from Giganet was even possible.

Cost/Benefit (editorial)
------------------------

There are clear benefits of gigabit networking, and Giganet in particular, on some benchmarks. However, it is far from obvious that there is a compelling case for using Giganet on our cluster. First, there is the question of the mix of applications and the overall benefit of gigabit in general. Second, even if we decide we need gigabit, Giganet may not be the best solution for us.

At the beginning of this summary, I stated that Giganet is primarily an Intel solution. The company is not supporting Alpha implementations, although Don implied there were a couple with home grown drivers.
I believe there is no Sun option at all, although we did not explicitly discuss it.

At $2000 per node, Giganet would roughly double the targeted x86 node cost, but to be fair, we should get a quote from a vendor that can put together a complete package for us. For comparison, we have a quote for a Myrinet network configuration from DCG which is approximately $1700 per port; costs from Microway are a bit higher. Although the costs seem similar, on the Alpha systems each port would be supporting a $10000 dual processor Alpha and would therefore represent approximately a 20% increase in cost. From the description of the quoted hardware, this would also be some sort of fat tree, but perhaps not as fat as the one Don described to me. I do not have any information that would allow for a performance comparison of Myrinet and Giganet.

Product Demo
------------

Due to the uncertainty in how much typical applications might improve, Don and I discussed an interesting possibility, namely for Giganet to loan us a 16 port solution to place onto "Baby Beowulf" for a trial period. My sense is that this would be very worthwhile, both for us and for Giganet. We would acquire useful information towards a final purchase. Giganet would potentially acquire application benchmark information that could be used in marketing their product.

A difficulty is timing. My sense is we are not quite ready to play this kind of game with Wulfie. It would likely be a month or more before there are half a dozen apps ready to run useful tests. At the same time, we have been talking about making a purchasing decision for MRI on that time scale or shorter.

Summary
-------

Giganet offers improved networking for Intel platforms, but at significant cost. Cost/benefit analysis is unclear at this point. Other platforms would not be supported by Giganet. Don Wassen, our contact, is interested in further exploring how our applications may benefit from Giganet technology.
I believe he is interested in visiting again to discuss user applications in more detail, and a loaner network to measure performance is a possibility.

Please direct comments to Bill or me, or to the group at large.
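
P.S. For anyone who wants to redo the cost/benefit arithmetic with other quotes, here is a minimal Python sketch. The per-port prices ($2000 for Giganet, about $1700 for the DCG Myrinet quote) and the $10000 dual Alpha node price are the figures quoted in this summary. The $2000 x86 node price is only an assumption, inferred from the remark that Giganet would roughly double the node cost, and the function name is made up for illustration.

```python
def network_overhead(port_cost, node_cost):
    """Return the per-port network cost as a fraction of the node cost."""
    if node_cost <= 0:
        raise ValueError("node_cost must be positive")
    return port_cost / node_cost


# Prices in US dollars: (label, cost per port, cost per node).
# The x86 node price is an ASSUMPTION inferred from "roughly double";
# the other figures are the ones quoted in the summary.
cases = [
    ("Giganet on x86 node", 2000, 2000),
    ("Myrinet on dual Alpha node", 1700, 10000),
]

for label, port_cost, node_cost in cases:
    pct = 100 * network_overhead(port_cost, node_cost)
    print(f"{label}: network adds {pct:.0f}% to the node cost")
```

With these inputs the Myrinet case comes out at 17%, which is the "approximately 20%" figure above. Swapping in a vendor's package quote for either network should make the comparison fairer.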