hehe, ok, ethernet 101
Classic Ethernet is a collision detection network scheme (CSMA/CD). The way it works on a shared wire or hub: machine A sends traffic to machine B, and machines C, D, E, F, and G all hear that traffic but ignore it because it isn't destined for them. If machine A and machine C send at the same time, the frames collide; each machine stops, waits a random amount of time, and retransmits until the frame goes through (or it gives up after too many tries). Since only one machine can talk at a time, this is a half-duplex circuit. Half-duplex = bad.
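To make the "wait a random amount of time" part concrete, here is a rough sketch of the truncated binary exponential backoff that classic half-duplex Ethernet uses. The function name backoff_slots and the constant names are made up for illustration; the numbers (51.2 microsecond slot at 10 Mbit/s, range capped after 10 collisions, give up at 16 attempts) are the usual 802.3 values.

```python
import random

SLOT_TIME_US = 51.2  # one slot = 512 bit times, i.e. 51.2 us at 10 Mbit/s
MAX_EXPONENT = 10    # the random range stops doubling after 10 collisions
MAX_ATTEMPTS = 16    # the sender gives up once the 16th attempt collides


def backoff_slots(collision_count, rng=random):
    """Pick how many slot times to wait after the Nth collision in a row."""
    if collision_count < 1:
        raise ValueError("collision_count starts at 1")
    if collision_count >= MAX_ATTEMPTS:
        raise RuntimeError("too many collisions, frame is dropped")
    k = min(collision_count, MAX_EXPONENT)
    # Wait a whole number of slots chosen uniformly from 0 .. 2^k - 1.
    return rng.randrange(2 ** k)


if __name__ == "__main__":
    for n in (1, 2, 3, 10, 15):
        slots = backoff_slots(n)
        print(f"collision {n}: wait {slots} slots ({slots * SLOT_TIME_US:.1f} us)")
```

The point of doubling the range each time is that two machines that keep colliding become less and less likely to pick the same wait.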
So the engineers said, let's develop a switch: to avoid collisions (collisions beget retries, retries = latency, latency = worse performance), we'll design a system that only sends traffic to the machine or wire that it is destined for. There are two major designs for switches, cut-through and store-and-forward. I think 99% of the switches today are store-and-forward. With either, a machine can run full duplex without having to worry about collisions between the machine and the switch.
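A cut-through switch starts forwarding a frame as soon as it has read the destination address, before the rest of the frame has arrived. It's like a virtual relay that jumpers the two wires together so that traffic flows directly. The catch is that a damaged frame (a collision fragment, say) gets passed along too, since the switch hasn't seen the whole frame yet.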
A store-and-forward switch buffers the whole frame, checks it, and sends it once the outgoing port is free, so the collision is avoided.
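For a feel of what the two designs cost in latency, here is a back-of-the-envelope sketch. It only counts how long the switch has to listen before it can start sending; it assumes the switch needs just the 6-byte destination address for cut-through and ignores the preamble and the switch's own processing time. The function names are invented for this example.

```python
def store_and_forward_delay_us(frame_bytes, link_mbps):
    """Time to receive the whole frame before forwarding can begin."""
    # bits divided by Mbit/s gives microseconds
    return frame_bytes * 8 / link_mbps


def cut_through_delay_us(link_mbps, bytes_needed=6):
    """Time to receive just the destination MAC address (6 bytes)."""
    return bytes_needed * 8 / link_mbps


if __name__ == "__main__":
    for mbps in (10, 100, 1000):
        sf = store_and_forward_delay_us(1518, mbps)  # maximum-size frame
        ct = cut_through_delay_us(mbps)
        print(f"{mbps:>4} Mbit/s: store-and-forward {sf:.2f} us, cut-through {ct:.3f} us")
```

The gap shrinks as links get faster, which fits with store-and-forward being the common design now.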
Of course, if you have 1 internet connection and 23 machines hooked to a switch, that first port has a lot of contention. On a full-duplex port that shows up as frames queuing in the switch (and getting dropped when the buffer fills) rather than as collisions, but overall the rest of the network runs faster because the machines are not listening to traffic unless it is destined for them.
But you need more than 23 machines, so you buy another switch and connect it to port 2. Now the first switch has to figure out where to send the traffic. It does that by watching source MAC addresses: every time a frame arrives on port 2, the switch notes that the sender's MAC address lives down that wire. Once the 8 machines on the second switch have each sent something, the first switch knows that anything destined for those 8 MAC addresses gets sent down port 2's wire for the other switch to handle. Anything it hasn't learned yet gets flooded out every other port.
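The learn-then-forward logic is simple enough to sketch in a few lines. This is a toy model, not how any particular switch is implemented: the class name LearningSwitch and the receive method are made up, ports are numbered from 1, and there is no entry aging or VLAN handling.

```python
class LearningSwitch:
    """Toy model of a switch that learns which port each MAC address is on."""

    def __init__(self, num_ports):
        self.num_ports = num_ports
        self.mac_table = {}  # MAC address -> port it was last seen on

    def receive(self, in_port, src_mac, dst_mac):
        """Handle one frame and return the list of ports it is sent out of."""
        # Learn: the sender is reachable through the port the frame came in on.
        self.mac_table[src_mac] = in_port

        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            # Unknown destination: flood out of every port except the incoming one.
            return [p for p in range(1, self.num_ports + 1) if p != in_port]
        if out_port == in_port:
            # Destination is on the same wire the frame came from: nothing to do.
            return []
        return [out_port]


if __name__ == "__main__":
    sw = LearningSwitch(num_ports=4)
    print(sw.receive(1, "aa:aa:aa:aa:aa:aa", "bb:bb:bb:bb:bb:bb"))  # [2, 3, 4]
    print(sw.receive(2, "bb:bb:bb:bb:bb:bb", "aa:aa:aa:aa:aa:aa"))  # [1]
    print(sw.receive(1, "aa:aa:aa:aa:aa:aa", "bb:bb:bb:bb:bb:bb"))  # [2]
```

The first frame is flooded because the switch has never heard from the destination; after the reply, both directions go out exactly one port.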
Well, pretty soon, a large enough network will have more than 1024 MAC addresses to contend with (or whatever the switch's limit is). The 'core' switch must still figure out where to send that traffic, but once the number of addresses exceeds the entries allowed in its MAC address table, it can't remember where the extra machines are. Frames for those machines get flooded out every port, so that part of the network is back to behaving like a hub. That's sort of the basic theory behind it.
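Here is a small sketch of what a full table does to traffic. It assumes the simplest possible policy: when the table is full, new addresses just aren't learned (real switches also age out old entries, which this ignores). BoundedMacTable and count_flooded are names invented for the example.

```python
class BoundedMacTable:
    """MAC table with a fixed number of entries and no aging or eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}  # MAC address -> port

    def learn(self, mac, port):
        """Record mac -> port. Returns False if the table is full and mac is new."""
        if mac not in self.entries and len(self.entries) >= self.capacity:
            return False
        self.entries[mac] = port
        return True

    def lookup(self, mac):
        """Return the known port for mac, or None (meaning: flood)."""
        return self.entries.get(mac)


def count_flooded(table, frames):
    """frames is a list of (in_port, src_mac, dst_mac); count the flooded ones."""
    flooded = 0
    for in_port, src, dst in frames:
        table.learn(src, in_port)
        if table.lookup(dst) is None:
            flooded += 1
    return flooded


if __name__ == "__main__":
    frames = [(1, "a", "b"), (2, "b", "a"), (3, "c", "a"), (1, "a", "c")]
    print(count_flooded(BoundedMacTable(capacity=3), frames))  # 1
    print(count_flooded(BoundedMacTable(capacity=2), frames))  # 2
```

With room for all three hosts, only the very first frame is flooded. With room for two, host "c" never makes it into the table, so traffic to it is flooded every time.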