TCP for the Uninitiated - Part II (Expanding the Basics / Using Tools)
By: Erik Iverson
Today many of us find ourselves in a common Internet connectivity situation: either at home or at work, we have a local area network (LAN). Your office probably has a LAN, your home may, and if you are a college student in the dorms, your PC is probably on a LAN too. So just how does data get around when you’re chatting on IRC, reading web pages, and sending email? What steps are necessary, why are they necessary, and how does it all work together?
Most of our LANs are Ethernet LANs. Ethernet is an older technology, first introduced in the 1970s. By no means is it state of the art, but it is still very popular, thanks to its speed and interoperability. Practically any device available can be hooked up to an Ethernet network, and when people ask if you have a “network card” they are almost certainly referring to an Ethernet adapter. The standard has been around long enough to gain acceptance, and even though it isn’t the fastest technology, it still holds up well enough for today’s applications. The standard Ethernet speed is 10 megabits per second, but newer cards support 100 megabits per second.
Ethernet can be run across different types of cabling. By far the most common type today is unshielded twisted pair, or UTP. UTP is terminated with an RJ-45 connector, which looks similar to a normal phone jack, but a bit wider. The connectors have 8 pins, of which 4 are used: two for receiving and two for sending.
Ethernet uses unique Media Access Control (MAC) addresses to address data. The MAC address is a property of the actual network card, and although it can be configured, you will very rarely want to change the MAC on your card. Data on the Ethernet is addressed to the 6-byte address on the card, not the IP address. The IP address is contained within the data, but Ethernet doesn’t need to know about it, only the MAC address. Ok, so how does Ethernet fit in with TCP/IP?
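To make the 6-byte MAC address concrete, here is a minimal Python sketch (the function names are hypothetical, chosen for illustration). It parses an address into its six raw bytes and prints it in two notations that appear later in this article: the space-separated form from the Windows route print interface list, and the colon-separated OpenBSD form that drops leading zeros.

```python
def parse_mac(text: str) -> bytes:
    """Parse a MAC address written with ':', '-', or ' ' separators into 6 raw bytes."""
    for sep in (":", "-", " "):
        if sep in text:
            parts = text.split(sep)
            break
    else:
        raise ValueError(f"no separator found in {text!r}")
    if len(parts) != 6:
        raise ValueError(f"expected 6 octets, got {len(parts)}")
    return bytes(int(p, 16) for p in parts)  # each octet is one hex byte (0-255)


def format_windows(mac: bytes) -> str:
    """Two hex digits per byte, space-separated, as in the route print interface list."""
    return " ".join(f"{b:02x}" for b in mac)


def format_bsd(mac: bytes) -> str:
    """Colon-separated with leading zeros dropped, as in the OpenBSD arp -a output."""
    return ":".join(f"{b:x}" for b in mac)


mac = parse_mac("00:10:5A:19:CD:BC")
print(len(mac) * 8)          # 48 bits
print(format_windows(mac))   # 00 10 5a 19 cd bc
print(format_bsd(mac))       # 0:10:5a:19:cd:bc
```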
Pinging a Machine
Let’s start with something simple: I want to ping a machine. If you don’t know what ping is, it is a helpful and incredibly simple network diagnostic tool. The main function of the ping command is to check whether a remote host is “alive,” that is, able to send and receive datagrams on a network. I have a home network with a half dozen PCs; let’s try pinging one. To ping a machine, I must know its host name or IP address. I know the machine’s IP address (I assigned it myself), so let’s ping it and see what happens. Below is a dump of what ping looks like on my Windows 2000 machine. I am going to ping the IP address 192.168.0.1.
Microsoft Windows 2000 [Version 5.00.2195]
(C) Copyright 1985-1999 Microsoft Corp.
Pinging 192.168.0.1 with 32 bytes of data:
Reply from 192.168.0.1: bytes=32 time<10ms TTL=255
Reply from 192.168.0.1: bytes=32 time<10ms TTL=255
Reply from 192.168.0.1: bytes=32 time<10ms TTL=255
Reply from 192.168.0.1: bytes=32 time<10ms TTL=255
Ping statistics for 192.168.0.1:
Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 0ms, Maximum = 0ms, Average = 0ms
And here is a ping dump from an OpenBSD machine.
bash-2.03$ ping 192.168.0.10
PING 192.168.0.10 (192.168.0.10): 56 data bytes
64 bytes from 192.168.0.10: icmp_seq=0 ttl=128
64 bytes from 192.168.0.10: icmp_seq=1 ttl=128
64 bytes from 192.168.0.10: icmp_seq=2 ttl=128
64 bytes from 192.168.0.10: icmp_seq=3 ttl=128
--- 192.168.0.10 ping statistics ---
4 packets transmitted, 4 packets received, 0% packet loss
round-trip min/avg/max = 0.413/0.434/0.450 ms
So what just happened here?
Pinging a machine may seem trivial in practice, but what is going on
behind the scenes? How does data
get from my machine to the other machine, and then how does it know what to do
with that data?
Ethernet has no concept of IP addresses; any number of different protocols can be run on top of Ethernet. It is both a layer 2 (link layer) and a layer 1 (physical layer) protocol. If you think about this, it makes sense. You shouldn’t be locked into using TCP/IP just because you use Ethernet cabling and network cards. Novell LANs use IPX instead of IP, and IPX works just fine across Ethernet.
So how do Ethernet adapters communicate with one another? Clearly there must be some type of addressing mechanism available. The answer is that each adapter has a unique MAC address. A MAC address is a 6-byte (48-bit) number, usually represented in hex. All traffic on the Ethernet destined for a particular machine will have that machine’s MAC address as the destination in the Ethernet header. But I didn’t tell the ping program the remote machine’s MAC address, so how did it know what MAC address to send the data to if I only gave it the remote IP? The answer is the Address Resolution Protocol, or ARP.
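Here is a rough Python sketch of the Ethernet header just described. It assumes the common Ethernet II framing (destination MAC, source MAC, then a 2-byte EtherType saying what the payload is) and leaves out the preamble and trailing CRC, which the network card normally handles. Notice that no IP address appears anywhere in the header; the IP datagram is just opaque payload. The MAC addresses come from this article’s machines, and the payload is a placeholder.

```python
import struct

ETHERTYPE_IPV4 = 0x0800  # payload is an IP datagram
ETHERTYPE_ARP = 0x0806   # payload is an ARP message


def build_ethernet_header(dst: bytes, src: bytes, ethertype: int) -> bytes:
    """Return the 14-byte Ethernet II header: destination MAC, source MAC, EtherType."""
    if len(dst) != 6 or len(src) != 6:
        raise ValueError("MAC addresses must be exactly 6 bytes")
    return struct.pack("!6s6sH", dst, src, ethertype)


def parse_ethernet_header(frame: bytes):
    """Split a frame into (destination MAC, source MAC, EtherType, payload)."""
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return dst, src, ethertype, frame[14:]


my_mac = bytes.fromhex("00105a19cdbc")    # the Windows 2000 machine's card
lith_mac = bytes.fromhex("08000985bcc1")  # lith.dragonmount.net, from the arp output
payload = b"placeholder IP datagram"      # the real IP header and data would go here

frame = build_ethernet_header(lith_mac, my_mac, ETHERTYPE_IPV4) + payload
dst, src, etype, body = parse_ethernet_header(frame)
print(dst.hex(":"), src.hex(":"), hex(etype))  # 08:00:09:85:bc:c1 00:10:5a:19:cd:bc 0x800
```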
Address Resolution Protocol (ARP)
ARP is a simple protocol used to map network layer (IP) addresses to physical (MAC) addresses. It can be used to map other addressing schemes too, but we’re using IP and MAC addresses, by far the most popular use of ARP. How it does this is quite simple. Each machine has an ARP table that we, the users, can look at. Go to a command prompt in your operating system (a DOS prompt for Windows 9x users) and type arp to get a list of parameters. Most likely, all you’ll want to do is view your ARP table, so type “arp -a” (without the quotes). Here is what happens on my machines.
Interface: 192.168.0.10 on Interface 0x2
bash-2.03$ arp -a
24-216-87-1.hsacorp.net (126.96.36.199) at
furby.dragonmount.net (192.168.0.10) at 0:10:5a:19:cd:bc
paradise.dragonmount.net (192.168.0.11) at
lith.dragonmount.net (192.168.0.12) at 8:0:9:85:bc:c1
Ok, the output is similar. The Internet Address column in the Windows 2000 screen tells us the machine’s IP address. The Physical Address column tells us the MAC address of the Ethernet adapter. Finally, Type can be either dynamic or static. Except in rare situations, you won’t see static entries in the ARP table. Dynamic entries eventually time out and need to be looked up again. This is a good idea because IP addresses can change over time, and you wouldn’t want to be sending data to the wrong MAC address.
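The dynamic-versus-static behavior described above can be sketched as a small cache. This Python toy model is not how any particular operating system stores its ARP table: the 120-second timeout is an arbitrary placeholder (real systems choose their own values), and the injectable clock exists only so the expiry can be demonstrated without waiting.

```python
import time


class ArpCache:
    """Toy ARP table: dynamic entries expire after a timeout, static ones never do."""

    def __init__(self, timeout_seconds: float = 120.0, clock=time.monotonic):
        self.timeout = timeout_seconds  # placeholder value, not any OS's real default
        self.clock = clock              # injectable so the example can fake time passing
        self.entries = {}               # IP -> (MAC, "dynamic" or "static", time added)

    def add(self, ip: str, mac: str, static: bool = False) -> None:
        self.entries[ip] = (mac, "static" if static else "dynamic", self.clock())

    def lookup(self, ip: str):
        """Return the MAC for ip, or None if it is unknown or has expired."""
        entry = self.entries.get(ip)
        if entry is None:
            return None                 # the caller would now broadcast an ARP request
        mac, kind, added = entry
        if kind == "dynamic" and self.clock() - added > self.timeout:
            del self.entries[ip]        # stale: that IP may now belong to another card
            return None
        return mac


now = [0.0]                             # fake clock, in seconds
cache = ArpCache(timeout_seconds=120.0, clock=lambda: now[0])
cache.add("192.168.0.12", "08-00-09-85-bc-c1")
print(cache.lookup("192.168.0.12"))     # 08-00-09-85-bc-c1
now[0] = 121.0                          # a little over two minutes later
print(cache.lookup("192.168.0.12"))     # None: time to ARP again
```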
Great, but how did the information get into the ARP table? Well, remember that we originally wanted to send a ping to 192.168.0.1. At first our ARP table had no entry for that IP address, so how did it get one? It sent out an ARP request. ARP requests are broadcast to every machine on the network. We’re still talking about Ethernet LANs, so the broadcast address is FF:FF:FF:FF:FF:FF, a 6-byte MAC address reserved for broadcast messages. So in plain English, here is what my machine did:
“The user wants me to ping 192.168.0.1, so I’m going to look up its physical (MAC) address in my ARP table. I don’t have an entry for 192.168.0.1 in my ARP table yet, so I don’t know which physical address to send that request to; I have to ask. I’m going to ask every adapter out there ‘Who has the IP 192.168.0.1?’ by sending an ARP request to the broadcast address.”
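That broadcast “Who has 192.168.0.1?” question can be sketched as raw bytes. This Python example follows the ARP message layout from RFC 826 for Ethernet and IPv4, and wraps it in an Ethernet header addressed to FF:FF:FF:FF:FF:FF. It only builds the frame: actually putting it on the wire would need a raw socket and administrator privileges, which are not shown, and padding up to Ethernet’s minimum frame size is assumed to be left to the driver. The sender addresses are those of my Windows 2000 machine (192.168.0.10, 00:10:5A:19:CD:BC).

```python
import socket
import struct

BROADCAST_MAC = b"\xff" * 6


def build_arp_request(sender_mac: bytes, sender_ip: str, target_ip: str) -> bytes:
    """Return an Ethernet frame carrying an ARP 'who has target_ip?' request."""
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,                            # hardware type: Ethernet
        0x0800,                       # protocol type: IPv4
        6,                            # hardware address length (MAC)
        4,                            # protocol address length (IPv4)
        1,                            # operation: 1 = request, 2 = reply
        sender_mac,                   # sender hardware address
        socket.inet_aton(sender_ip),  # sender protocol address
        b"\x00" * 6,                  # target hardware address: unknown, that's the question
        socket.inet_aton(target_ip),  # target protocol address
    )
    ethernet_header = struct.pack("!6s6sH", BROADCAST_MAC, sender_mac, 0x0806)
    return ethernet_header + arp


frame = build_arp_request(bytes.fromhex("00105a19cdbc"), "192.168.0.10", "192.168.0.1")
print(len(frame))                       # 42: 14-byte Ethernet header + 28-byte ARP message
print(frame[:6].hex())                  # ffffffffffff (the broadcast destination)
print(socket.inet_ntoa(frame[38:42]))   # 192.168.0.1 (the IP being asked about)
```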
Next, the machine that has that IP (192.168.0.1) sends an ARP reply back, saying “I have this IP address, and here is my physical (MAC) address.” Then my machine, using the remote machine’s physical address, sends out an Ethernet frame on the wire, destined for the remote machine’s MAC address, which carries an IP datagram containing an ICMP ping request. All that, and a bit more, happens just to ping a machine. I didn’t go into the details of how the data is actually sent out on the wire. Ethernet uses something called Carrier Sense Multiple Access with Collision Detection (CSMA/CD) to regulate which machine is "in control" of the LAN at any point in time.
The basic idea is that no two machines can transmit on the same LAN at the same time, or a collision will occur. In the case of a collision, both machines detect it, stop, and wait a "random" amount of time before retransmitting, thus the "collision detection." A machine will not attempt to transmit if it sees that another machine is already transmitting, thus the "carrier sense." And multiple machines share the same medium, thus the "multiple access." Easy, eh? Like I said, the example is slightly simplified; I’m assuming the data doesn’t have to cross any routers. In a small home or office network, it most likely wouldn’t have to. We’ll soon look at an example that needs to leave the LAN, where the situation differs slightly.
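In IEEE 802.3, the "random" wait after a collision is a truncated binary exponential backoff: after the nth collision a station waits a random number of slot times between 0 and 2^min(n, 10) - 1, and it gives up on the frame after 16 attempts. Below is a simplified Python model of that rule. Real adapters do this in hardware and also send a jam signal, which is not modeled; `resolve_collision` is a hypothetical helper that just counts how many collisions two stations need before they pick different slots.

```python
import random

SLOT_TIME_US = 51.2  # one slot = 512 bit times = 51.2 microseconds at 10 Mbit/s
MAX_ATTEMPTS = 16    # after 16 collisions the frame is abandoned


def backoff_slots(collisions: int, rng: random.Random) -> int:
    """Slots to wait after the nth collision (truncated binary exponential backoff)."""
    if collisions >= MAX_ATTEMPTS:
        raise RuntimeError("excessive collisions: frame dropped")
    exponent = min(collisions, 10)            # the range stops growing after 10 collisions
    return rng.randint(0, 2 ** exponent - 1)  # randint includes both endpoints


def resolve_collision(rng: random.Random) -> int:
    """Two stations collide and back off until they pick different slots.

    Returns how many collisions that took. Simplification: once the slots differ,
    the station with the shorter wait transmits and the other senses the carrier
    and defers.
    """
    collisions = 1
    while True:
        a, b = backoff_slots(collisions, rng), backoff_slots(collisions, rng)
        if a != b:
            print(f"after collision {collisions}: waits of "
                  f"{a * SLOT_TIME_US:.1f} and {b * SLOT_TIME_US:.1f} microseconds")
            return collisions
        collisions += 1


rng = random.Random(2024)  # seeded so the run is repeatable
print(resolve_collision(rng), "collision(s) before the stations got out of each other's way")
```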
Ok great, so where do TCP and the three-way handshake fit into this picture? They don’t, in this example. To ping a machine, we don’t have to use TCP. Ping uses ICMP, which is a protocol on about the same level as IP (ICMP messages are carried inside IP datagrams). TCP would come into play if we were doing more than a simple ping. Say I wanted to access a web page on that computer. The same sequence of events would occur, except that instead of sending a ping request, I’d send the first part of the three-way handshake described in my previous article. From that point on, the MAC address would be mapped to the IP in the ARP table, and would remain there until the entry timed out. LAN-level protocols such as Ethernet couldn’t care less whether they are carrying TCP, UDP, or anything else. These things, for the most part, only matter to the machines at either end of the connection.
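For the curious, here is a Python sketch of the ICMP echo request that ping sends. It builds the 8-byte ICMP header (type 8, code 0, checksum, identifier, sequence number) followed by the data, and computes the standard Internet checksum from RFC 1071. The identifier value and the all-zero payload are arbitrary choices for illustration. The 8-byte header is why OpenBSD’s 56 data bytes show up as “64 bytes from” in the output above. Sending the packet for real would need a raw socket, which usually requires administrator privileges.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum used by IP, ICMP, UDP and TCP (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"                 # pad odd-length data with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:                  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes) -> bytes:
    """ICMP type 8 (echo request), code 0; the checksum covers header and payload."""
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)  # checksum field zeroed
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence) + payload


packet = build_echo_request(identifier=0x1234, sequence=0, payload=b"\x00" * 32)
print(len(packet))                # 40: 8-byte ICMP header + 32 bytes of data
print(internet_checksum(packet))  # 0: a correct checksum makes the whole packet verify
```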
More Tools: traceroute, netstat, route
I’ve slowly been introducing various tools, including ping and arp. These are just two of the many tools provided with most operating systems for network management and diagnostics. Another tool sure to come in handy is traceroute on UNIX-based systems, called tracert on Windows-based systems. Traceroute does just what its name says: it traces the route datagrams take to get from point A (your computer) to point B (some remote computer). I’ll show you a couple of examples and you’ll get the hang of it. Then I’ll briefly explain how traceroute does its job.
Tracing route to 192.168.0.1 over a maximum of 30 hops
  1   <10 ms   <10 ms   <10 ms
Tracing route to www.hampsterdance2.com [188.8.131.52]
over a maximum of 30 hops:
  1   <10 ms   <10 ms   <10 ms  192.168.0.1
  2    10 ms    10 ms    10 ms  24-240-42-1.hsacorp.net
  3    10 ms    10 ms
  4    80 ms    50 ms    40 ms  512.Hssi5-0-0.GW1.MSP1.ALTER.NET
  5    60 ms    70 ms    50 ms  151.ATM1-0.XR2.CHI4.ALTER.NET
  6    90 ms    90 ms    40 ms  194.ATM2-0.TR2.CHI4.ALTER.NET
  7    91 ms    80 ms    90 ms  106.ATM7-0.TR1.DCA8.ALTER.NET
  8    90 ms    80 ms    71 ms  196.ATM5-0.XR2.TCO1.ALTER.NET
  9   100 ms      *     161 ms  192.ATM8-0-0.GW3.DCA3.ALTER.NET
 10    70 ms    70 ms    80 ms  dn4-gw.customer.ALTER.NET
 11    91 ms   120 ms    80 ms  184.108.40.206
Ok, the first output is from me tracing to a machine on my LAN. Notice that it only takes one “hop” to get there, as there are no intermediate computers between point A and point B. The second trace, to www.hampsterdance2.com, is a different story. Here it takes 11 hops, meaning my data goes through 10 nodes on the Internet before reaching hampsterdance2.com. These nodes may be other computers, or simply routers. Each node is queried three times, and the resulting times are shown along with the machine’s host name and/or IP address. Asterisks (“*”) in the data represent lost datagrams, where either my datagram never made it to the machine, or its reply never made it back.
Classic UNIX traceroute sends its probes over UDP (Windows tracert uses ICMP echo requests instead); neither is TCP, the reliable protocol, so no effort was made to retransmit the lost packets. UDP is an unreliable, best-effort protocol: unlike with TCP, the receiving end does not acknowledge the data. Only TCP can ensure data delivery, at least within the protocols of the TCP/IP suite.
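Here is how traceroute does its job, sketched as a Python simulation. Every IP datagram carries a time-to-live (TTL) that each router decrements; the router that brings it to zero discards the datagram and sends back an ICMP “time exceeded” message, revealing its address. Traceroute sends probes with TTL 1, 2, 3, and so on until the destination itself answers (classic UNIX traceroute gets an ICMP “port unreachable” for its UDP probe; Windows tracert gets an echo reply). The path below is a shortened, hypothetical version of the hampsterdance2.com trace, and the model ignores timing, the three probes per hop, and lost replies.

```python
def simulate_probe(path, ttl):
    """Send one probe with the given TTL along path, where path[-1] is the destination.

    Returns (who answered, reached_destination).
    """
    for router in path[:-1]:
        ttl -= 1                  # every router decrements the TTL before forwarding
        if ttl == 0:
            return router, False  # this router discards the probe: "time exceeded"
    return path[-1], True         # the probe survived all routers; the destination replies


def traceroute(path, max_hops=30):
    """Probe with TTL 1, 2, 3, ... and record who answers each one."""
    hops = []
    for ttl in range(1, max_hops + 1):
        node, done = simulate_probe(path, ttl)
        hops.append(node)
        if done:
            break
    return hops


# Shortened, hypothetical path based on the hampsterdance2.com trace
path = [
    "192.168.0.1",
    "24-240-42-1.hsacorp.net",
    "512.Hssi5-0-0.GW1.MSP1.ALTER.NET",
    "www.hampsterdance2.com",
]
for hop, node in enumerate(traceroute(path), start=1):
    print(hop, node)
```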
You might wonder: did an ARP request occur when trying to access hampsterdance2.com? The answer is yes, but not for hampsterdance2.com; it was for my default gateway. Don’t get confused. To see why, we’re going to introduce the basics of routing tables, how to view them, how routing decisions are made, and default gateways.
Routing is what it sounds like: routers direct your packets toward their destination. Routers have routing tables, and these are often very large. Complex protocols and algorithms are used to update routers’ large and ever-changing tables. Your machine connected to the Internet also has a routing table, though it is most likely small. Let’s look at my routing table and then explain what it means. To see the routing table, open a cmd.exe shell and type “route print”. Windows 9x users use command.com.
Routing Table of Windows 2000 Machine.
0x1 ........................... MS TCP Loopback
0x2 ...00 10 5a 19 cd bc ...... 3Com EtherLink PCI
255.255.255.0 192.168.0.10 192.168.0.10
192.168.0.10 192.168.0.10 1
Look under Active Routes; there are four fields. First, though, at the top there is an interface list. Each interface on a machine has one or more IP addresses associated with it. My machine has two interfaces. One of these is simply the loopback interface, which is used to connect to my own machine. Using the IP address 127.0.0.1 will connect me back to my machine, which helps for a variety of debugging and testing reasons. The other interface is the 3Com EtherLink PCI card, which is the network card in my computer. You’ll see that its MAC address is 00:10:5A:19:CD:BC. Look back at the ARP output from the OpenBSD machine: see how the IP address matches up with the MAC address? When I show the OpenBSD output, you’ll see that there are two network cards.
So here’s the situation. Routing decisions depend on two things. The first is the destination of the data; clearly this makes sense. The second is a more difficult idea to grasp: subnet masking. You have to know a little about the binary representation of numbers for this to make sense, so I’ll hold off on that for now and concentrate on the Network Destination column.
Usually, traffic won’t be destined for your local LAN. Whether you are sending email to correspondents across the globe, surfing the web, or sending ICQ messages to your friend across town doesn’t really matter. The point is, traffic needs to leave your LAN and go over the Internet.
The Network Destination of 0.0.0.0 is the default entry in the routing table. When data needs to be sent and its destination doesn’t match anything else in the routing table, the default route is used. In our example above, the default route goes to 192.168.0.1, which is the routing machine on my LAN. When I need to send data to hampsterdance2.com, my machine first looks up hampsterdance2.com’s IP address via DNS. It then determines that to reach hampsterdance2.com, the traffic has to leave the LAN: no other entry in the routing table matches, so it uses the default gateway. It will issue an ARP request for my default gateway and address the Ethernet frame to the default gateway, but the destination IP address will not be the default gateway’s; it is hampsterdance2.com’s. See how the different layers work together? We’ll see in a bit how and why my OpenBSD machine, the default gateway, knows where to send the data. It has to do with something called NAT.
Gateways are very important. The Gateway column gives the IP address of the machine to send the data to when a match is found in the routing table. If no other match is found, the default route is used, and the data is sent to the IP address of the gateway for the default route. This machine, the gateway for the 0.0.0.0 (default) destination, is called the default gateway. You’ll see here that my default gateway is 192.168.0.1, the IP of my OpenBSD machine, which indeed acts as a gateway to the Internet for my machine and the other machines on my LAN. No traffic goes to the Internet without passing through my OpenBSD machine, and no traffic gets back to my LAN without first coming back through it. Hopefully you now understand how to read the default entry in the routing table, but what about the rest of the entries?
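The lookup described above can be sketched with Python’s standard ipaddress module. The three routes are an assumption based on the entries discussed in the text: the default route via 192.168.0.1, loopback, and 192.168.0.0 with netmask 255.255.255.0 via my own address, 192.168.0.10. Matching uses the netmask, and when several routes match, the one with the longest mask wins, which is why 0.0.0.0/0 only catches leftovers; Windows also weighs a Metric column, which this sketch ignores. `arp_target` is a hypothetical helper showing whose MAC address the machine has to ARP for, and 203.0.113.5 is a documentation address standing in for a remote web server.

```python
import ipaddress

MY_IP = ipaddress.ip_address("192.168.0.10")

# (network destination/netmask, gateway): assumed from the entries discussed in the text
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), ipaddress.ip_address("192.168.0.1")),  # default
    (ipaddress.ip_network("127.0.0.0/8"), ipaddress.ip_address("127.0.0.1")),  # loopback
    (ipaddress.ip_network("192.168.0.0/255.255.255.0"), MY_IP),                # my LAN
]


def lookup(destination):
    """Return the matching (network, gateway) route with the longest netmask."""
    dst = ipaddress.ip_address(destination)
    matches = [(net, gw) for net, gw in ROUTES if dst in net]  # netmask test on each route
    return max(matches, key=lambda route: route[0].prefixlen)  # 0.0.0.0/0 always matches


def arp_target(destination):
    """Hypothetical helper: whose MAC must be learned (via ARP) to send to destination?"""
    net, gateway = lookup(destination)
    if net.is_loopback:
        return None         # loopback traffic never reaches the Ethernet card
    if gateway == MY_IP:
        return destination  # on my LAN: ARP for the destination itself
    return str(gateway)     # off the LAN: ARP for the gateway; the IP destination is unchanged


print(arp_target("192.168.0.12"))  # 192.168.0.12
print(arp_target("203.0.113.5"))   # 192.168.0.1
```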
As mentioned before, the next line is for the destination 127.0.0.1. A packet destined for this address uses the machine itself as the gateway. This address is reserved and will connect you back to your own machine. I’ll mention in a bit why this might be useful.
This next line is important: it is how I communicate with machines on my LAN. The first field means “this information applies to data addressed to machines starting with 192.168.0.*”. The 0s effectively act as wildcards in the routing table. (Note that this isn’t completely accurate, but it will suffice for this discussion; not every 0 is a wildcard. What really decides this is the netmask.) Notice that the gateway is 192.168.0.10. That happens to be my machine’s IP address on my private LAN. Machines on the Internet can’t connect to this address, but on my private LAN it works just fine. So I’m using myself as the gateway. This is because data headed for another machine on my LAN doesn’t need to pass through any intermediate machines: my machine simply puts the data out on the wire addressed to the receiving machine, and the machine interested in the data receives it. Keep in mind that my machine would still be doing ARP requests and actually addressing the data to the other machine’s MAC address; the IP address is only used to make routing decisions. Now let’s look at my OpenBSD machine’s routing table to get the hang of it.
The command to view it on this machine is “route show”. I believe Linux varies, but I’ll leave that for you to find out. I think the command “netstat -r” usually shows the same output, too. The “-n” parameter tells route not to look up host names for the IP addresses.
bash-2.03$ route -n show
See how there are four quasi-categories in the Destination column. First is the default entry, which means that when data doesn’t match any of the other entries, it is sent onward. In this case, it is being sent to 220.127.116.11. This is my gateway to the Internet. When I can’t get on the Internet, I try pinging this host. If I can’t ping it, then the problem almost certainly lies either on my network or with that machine. If I can ping that machine (or router, as the case may be) and still can’t reach web sites and the like, the problem most likely lies farther away than I can fix. Pinging your default gateway should be one of the first steps you take when diagnosing network problems.
Ok, the other groups we can see are the ones starting with 24.217.87.*. This makes sense; that is my IP address on the Internet, which is 18.104.22.168 right now. Link#1 refers to the first network card in the machine; there are two. You’ll see that the default gateway has its own entry in the routing table, but that entry points to a MAC address, not an IP address. Also notice how 22.214.171.124 uses 127.0.0.1 (the localhost) as the interface. This is because that is my IP on the Internet, so sending data to it is equivalent to sending it to the machine itself.
The next group is the local loopback entries in the routing table. 127.0.0.1 is the localhost, your own machine. You can ping your own machine and connect to open ports on it. It is a very handy IP address to know if you’re developing a network application: you can start a server and then connect to it from the same machine by directing the client to localhost, or 127.0.0.1. It is always best to test network applications across a real network, but sometimes testing locally is the best you can do. When working on machines not connected to the Internet, it works wonders.
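Here is a minimal Python sketch of that localhost workflow using the standard socket and threading modules: an echo server bound to 127.0.0.1 and a client on the same machine that connects to it. Asking for port 0 lets the operating system pick any free port, an illustrative choice so the example doesn’t collide with a real service; `serve_once` is a hypothetical helper.

```python
import socket
import threading


def serve_once(server: socket.socket) -> None:
    """Accept a single client and echo back whatever it sends."""
    conn, _addr = server.accept()
    with conn:
        data = conn.recv(1024)  # small message; a real protocol would loop until done
        conn.sendall(data)


# Bind to the loopback address only; port 0 asks the OS for any free port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]

thread = threading.Thread(target=serve_once, args=(server,))
thread.start()

# The client connects to 127.0.0.1 exactly as it would connect to a remote host.
with socket.create_connection(("127.0.0.1", port)) as client:
    client.sendall(b"hello, localhost")
    reply = client.recv(1024)

thread.join()
server.close()
print(reply)  # b'hello, localhost'
```

Swapping "127.0.0.1" for another machine’s address is all it takes to move the same client onto a real network.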
The next group is the 192.168.0.* group. This is my local network. 192.168.*.* addresses were set aside by the IANA as private addresses. Any organization or home user can use these addresses and then route their traffic through a device that translates them into usable Internet addresses. This is the function of the OpenBSD machine in my network: it translates private addresses into my real Internet address. This is called Network Address Translation (NAT). The same functionality is obtained on Linux with ipchains. To other machines on the Internet, my network appears to have only one machine, the OpenBSD one with the 126.96.36.199 IP address. They send data back to that machine, which passes it on to my private machines. You’ll see that there are four entries in the 192.168.0.* range. These are the four machines on my LAN, addressable by MAC address. 192.168.0.1 is the private address of my OpenBSD machine, so really it has two addresses, one for each network card. With IP, get in the habit of thinking about an address as the address of an interface, not of the machine itself.
The first network card receives all the data on the private LAN and determines whether it needs to change the IP address to the real Internet address. If needed, it then sends the data out on the Internet through the second network card. It keeps track of all the connections, so when data comes back addressed to the OpenBSD machine, the program running on it works some magic: it knows to send the data out on the first network card to the right machine on my private network.
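The connection bookkeeping the OpenBSD machine does can be sketched as a translation table. This Python toy assumes the port-rewriting style of NAT (what Linux calls IP masquerading), where many private (IP, port) pairs share one public IP and are told apart by the public port. Real implementations also track the protocol and remote address, expire idle entries, and fix up checksums, none of which is modeled. The public address 203.0.113.7 and the starting port 50000 are hypothetical placeholders.

```python
import ipaddress
import itertools


class Nat:
    """Toy port-translating NAT: many private (IP, port) pairs share one public IP."""

    def __init__(self, public_ip: str, first_port: int = 50000):
        self.public_ip = public_ip
        self.next_port = itertools.count(first_port)
        self.outbound = {}  # (private IP, private port) -> public port
        self.inbound = {}   # public port -> (private IP, private port)

    def translate_out(self, src_ip: str, src_port: int):
        """Rewrite the source of a packet leaving the LAN; reuse the mapping if it exists."""
        # is_private covers 192.168.0.0/16 and the other RFC 1918 blocks (plus a few more)
        if not ipaddress.ip_address(src_ip).is_private:
            raise ValueError(f"{src_ip} is not a private address")
        key = (src_ip, src_port)
        if key not in self.outbound:
            public_port = next(self.next_port)
            self.outbound[key] = public_port
            self.inbound[public_port] = key
        return self.public_ip, self.outbound[key]

    def translate_in(self, dst_port: int):
        """Find which private machine a reply sent to the public address belongs to."""
        return self.inbound[dst_port]  # KeyError: no connection was started from inside


nat = Nat("203.0.113.7")
print(nat.translate_out("192.168.0.10", 1025))  # ('203.0.113.7', 50000)
print(nat.translate_out("192.168.0.12", 1025))  # ('203.0.113.7', 50001)
print(nat.translate_in(50001))                  # ('192.168.0.12', 1025)
```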
You’ll remember my Windows 2000 machine had the private
address 192.168.0.1 as its default gateway, which is my OpenBSD machine.
All the computers on my LAN have this as the default gateway, except the
OpenBSD machine itself, which has a router owned by someone other than myself as
the default gateway.
Ok, I know, that’s a lot of information. One more thing for this installment, and it’s easy. Sometimes working with IP addresses is cumbersome. Typing in 192.168.0.1, although second nature for system admin types, takes time, and addresses can be difficult to remember. There is a very simple solution to this: a hosts file. Windows comes with a hosts.sam (sample) file. I believe it is located in C:\windows; for Windows 2000, the file is c:\winnt\system32\drivers\etc\hosts.sam. On all versions of Windows, the .sam just means sample; you’ll want a file called just hosts. On Unix-like operating systems, the file is /etc/hosts. This file simply maps IP addresses to names that you choose. Here is mine on this machine.
This is why localhost is synonymous with 127.0.0.1. You’ll see that instead of pinging 192.168.0.1, I can type “ping gateway” and get the same results. This file comes in handy; you can put any IP address you want in there. If you visit a certain web site quite often, make a shortcut to it in your hosts file, or just bookmark it.
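The hosts file format is simple enough to parse in a few lines: an IP address, then one or more names, with # starting a comment. Below is a Python sketch of that lookup. The sample file is hypothetical, built from names used in this article, and keeping the first entry seen for each name is this sketch’s choice rather than a guarantee about every resolver.

```python
def parse_hosts(text: str) -> dict[str, str]:
    """Map each host name to its IP address using hosts-file syntax."""
    names = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        ip, *aliases = line.split()           # first field is the IP, the rest are names
        for name in aliases:
            names.setdefault(name.lower(), ip)  # host names are case-insensitive
    return names


SAMPLE = """\
# hypothetical hosts file, using names from this article
127.0.0.1     localhost
192.168.0.1   gateway
192.168.0.12  lith lith.dragonmount.net
"""

hosts = parse_hosts(SAMPLE)
print(hosts["gateway"])    # 192.168.0.1
print(hosts["localhost"])  # 127.0.0.1
```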
One more program that is really, really useful is netstat. Your homework is to use netstat, find out what it does, and learn how to read its output. Also mess around with some of the command line switches available on your operating system.
You have learned how to use some of the utilities provided with most operating systems to see protocols in action. We took a look at interactions between various layers of the OSI model. Our example used a tool called ping and showed how IP interfaces with the lower-level LAN protocols; in this case, our LAN was running Ethernet. We also took a first look at routing IP addresses, and at how you can tell where data addressed to different places will go on the Internet or your LAN. This is a fairly complex subject to get a good handle on; below are links to help further your knowledge, along with recommended books.
The big idea I want you to take away is that things like ARP and the six-byte LAN addresses are only used on the LAN. Once your data leaves the LAN, routing decisions are made using the network layer protocol, IP. IP addresses let you talk to anyone else on any connected network running IP, theoretically speaking. Things such as firewalls can and will prevent you from talking directly with another machine running IP, but the point is that the layer 3 (network layer) protocol is what allows you to leave the LAN. Within the LAN, your data will get where it’s going by using the MAC address. Off the LAN, protocols other than Ethernet will most likely come into play, none of which your LAN needs to know about. At that point, the IP address helps the intermediate routers decide where your data goes. Note that between the routers, addressing schemes other than Ethernet might be used, but from the IP layer up (TCP/UDP and application layers), the data in the packet remains unmolested. As always, I appreciate any feedback and criticism, along with any errors you discover in this document. I can be reached via email at firstname.lastname@example.org.
I want to do a tutorial specifically showing what ARP packets look like going
over the network compared to other packets.
Links and References
Ok, with all this information, I can imagine you’d like some other ways of learning it. After all, the way I arrange my words might not work best for everyone. Below are links relevant to this discussion that I found helpful when learning this material. Good luck, and as always, ask me anything, and correct me if I’m wrong. I have received many wonderful comments about my last tutorial, along with some suggestions for improving its technical accuracy. I appreciate all of them and hope you’ll continue to get something beneficial out of these tutorials. The best way to learn these things is to run the programs described here with different command line options and see what output you get. Please don’t hesitate to ask me anything; I’m more than willing to help. Once again, my email is email@example.com.
I hate linking to links pages, but this one is worth it:
Explains subnetting very well.
IBM’s redbook on TCP/IP, yah! Free and 700+ pages!
I didn’t cover netmasks here, but this does!
Still confused on subnetting?
Microsoft takes its stab at explaining addressing and
Great tutorials on TCP/IP, routing, ATM, and frame relay!
The Ethernet Authority!
TCP/IP Illustrated: Volume 1
Stevens’ classic; you still don’t own it?
Interconnections, Second Edition
Concentrates on connecting networks into larger networks,
also a classic!
Designing Routing and Switching Architectures
Need to build a high performance network by next month?
Get some caffeine and read this book.