09/01/2021 | Press release | Distributed by Public on 09/01/2021 02:24
Network operators now need more information than ever just to get the job done; occasionally some of the foundational concepts can get hazy as time passes. With that in mind, I decided to revisit what really went into creating some of the core concepts used in routing.
In Internet Protocols (IP), the network 'prefix' is a really useful concept that allows efficient use of the Internet address space in IPv4 and IPv6. They're fundamental to how the network address plan and routing system works.
What is a prefix in this context? Before we explore the idea, it's important to understand the problem prefixes are solving. This problem is the routing problem: How is it possible to efficiently and effectively send packets between source and destination, in the Internet?
To start with, the basic answer is lists. Yep. That's it. The magic technology the Internet uses for routing, is lists of numbers. The routing problem is simply a list problem.
How big is that list? How do you look things up in it?
Is Internet routing basically just looking up lists?
Pretty much. In fact, you can make a network 'work' by just giving out unique IP addresses, of any kind, to each host (a host can be a device, like your smartphone). Then, you'll need a bigger table listing every single unique assignment to a host and the physical network link it, on which it can be reached.
But that would be kind of like giving every house on a block it's own street name instead of a number. It would still work but be a nightmare to look up, so wouldn't it be better to have just a number for each house, and they all share the same street name? That way, you can find your way to the street first, and go from there.
Now, imagine that the street has more than a dozen houses, it has millions or even billions. You'd have no hope of finding the right house if they're not arranged in some logical fashion, and imagine how long the 'master list' of all those houses has to be!
A list of street names, each with numbered houses, starts to make some sense. You can see where I'm going with this. The street names are a lot like our prefixes.
When the Internet was tiny, it wasn't so bad. You could have a messy neighbourhood without a logical structure for addresses. This meant a bigger table than was necessary to find the IP address where you needed to go.
But today's Internet isn't small. A chaotic list of IP addresses is not fine at all. If addresses weren't sorted into sub-lists, each under its own prefix, the memory requirements imposed on each router would be enormous.
That is the entire point of the Regional Internet Registry (RIR) system's address distribution efforts. RIRs maintain the address space by preserving the block structure in the method of delegation. RIRs delegate addresses in continuous blocks, and those blocks can be called prefixes if they obey one simple rule.
The only rule of prefixes: The addresses within share part of the address
A prefix is just a way of putting IP addresses in a logical sequence. But a sequence can't just be any sequence. For example, the numbers 72, 84, 101, and 103 form a sequence: They're in increasing order.
But that kind of sequence isn't good for sorting in this manner. Let's look at them in a core language of the Internet, binary notation. When we refer to these numbers in binary, we call them 'bits'.
Table 1 - Some numbers with binary equivalents that aren't in a useful order.
Looked at in binary, there's no useful pattern to these numbers (which uniquely identify them). They're in order but they're not complete, logically speaking, when looked at this way.
That doesn't make it necessary to discard the idea of working through numbers in order, though. It's just necessary to be a little more selective. Is it possible to find numbers that are in order but share certain characteristics in binary? Let's look at 72 to 75.
Table 2 - Some numbers with binary equivalents that share the first five digits in common.
Ok, that's better. Let's look at the shared digits at the beginning of each binary. They're all starting with 10010. Then there are two bits (marked XX above) that change. Congratulations, that can work as a prefix! This is a prefix with 'length five' because it's got five shared numbers. The prefix is five bits long and there are two bits of interchangeable values.
This is a set of four things that we can group together as one. One prefix describes all four things here, and usefully, it doesn't describe things when counting further. Counting further gives you 76, 77, 78, and 79 (1001100, 10001101, 1001110, and 1001111 in binary), so the prefix 'length five' part doesn't work anymore. They don't start with 10010. That's handy, because it's more useful to have a range with a start and a finish point.
However, what if it's necessary to bundle the group of 72 to 75 together with the group of 76 to 79? That can be done, but it might bundle in more numbers as well. The binary of 76 to 79 all start with 1001. This leads to a prefix with length four.
You could go further and take your prefix down to length three, but as you reduce the length of the prefix it becomes a bigger and less specific 'street' of addresses because there are more addresses bundled up there.
The magic of prefixes is being able to refer to a block of addresses as one thing, but you need to keep in mind how big of a 'chunk' you want to refer to. The prefix becomes more and more 'efficient' (doubling its exponential growth) as its prefix length gets bigger, because you've got a smaller 'street', so it's easier to navigate.
A prefix is shorthand for a specific part of the global address space, or even all of it
In our example there was a prefix of 10010XX, then when it got a bit broader the prefix changed to 1001XXX. There were more X values, and it was less specific. This means that 10010XX is actually 'inside' 1001XXX, because 1001XXX includes the number combinations afterwards (including 10010XX).
Imagine somebody is using prefix 192.168.1.0/24 (this is a block of 255 addresses, inside the RFC 1918 private address space and you probably use it in your home Wi-Fi). Then somebody might refer to 192.168.0.0/16, which is the block of 65,535 addresses, which includes the smaller 255 block. This is sometimes referred to as the 'larger' prefix 'covering' the 'smaller' prefix.
You could say that the 'smaller' prefix is 'more specific' inside the 'less specific' larger prefix. Or you could say it's a 'longer match' because more of the bits of the fixed part are matched.
These ideas of being more specific or 'inside' are strongly aligned to the property of being in a tree structure of branching decisions. They're like Venn Diagram set theory. If you remember your set theory from school, you might remember the Total Set: Everything.
There is a special prefix sometimes called '0/0' (although in IPv4 it's really 0.0.0.0/0 and in IPv6 you'd call it ::0/0). That special prefix refers to ALL the address space in that family. In IPv4, it's all four billion addresses - one prefix, the entire Internet. We will talk more about that shortly.
Why are you talking about binary?
You can refer to an address as 192.168.0.1 for example but it's actually the binary value 11000000101010000000000000000001. I don't know about you, but I struggle to remember more than three or four numbers. Phone numbers are more than three or four digits but they typically get recalled in groups of two, three, or four digits.
Internet addresses are no different. The binary value of 11000000101010000000000000000001 is the same as the 32-bit numerically encoded value 3,232,235,521. You could have used that, but it's hard to remember. You can't see the prefix easily.
If you group the binary into four chunks (32-bits is four 8-bit bytes) you get 11000000 10101000 00000000 00000001 which then converts to four 8-bit values of 192, 168, 0, and 1.
192.168.0.1 is a way to show the binary in values that are easier to think about. Dots are just how we separate the chunks of numbers. In the early days, some computer systems (Univac for instance) used commas to separate these values, and it's routine to refer to an Internet address by the hexadecimal value if you are booting computers over the network, but the defined standard for coding is the 'dotted quad' form.
Figure 1 shows an IPv4 address, an IPv6 address, the binary equivalents, and the hexadecimal for the IPv4 address.
How do the prefixes work with IP addresses?
IPv4 addresses consist of 32 bits. IPv6 addresses consist of 128 bits. This is important.
Generally, no prefixes smaller than a /24 are delegated in current RIR address policy for IPv4. But what does this mean in bits? Remember the examples we talked about in binary? There were five numbers in the prefix and two changing ones denoted by XX. The XX is the range of addresses available.
In these small blocks of IPv4 addresses, there will be 24 numbers in the prefix. Subtract that from 32 to figure out that you will have eight bits of changing XXXXXXXX values. That works out to 256 addresses.
Generally in IPv6, the standard unit of address is /32 but for argument's sake let's use /24. In IPv6, you would subtract 24 from 128 to get a lot more addresses, which is why there's a much bigger pool of IPv6 address space for every delegate. RIRs generally don't delegate /24 prefixes, /32 is the normal option. RIRs delegate /32 in a way that ensures one prefix can keep growing if needed.
Right now, /24 is the smallest unit of routing that global BGP speakers want to routinely see in the public routing space. They want nothing smaller, but it's not policed well and that has caused problems over time.
How do the prefixes work in practice?
Looking up the table consists of doing a simple binary operation. For the given length and prefix, do the 'bits' of the address you want to find match the prefix, at the stated length? If they do, you've found the table entry for that prefix, and once you 'go' there, you can find the individual address on that street. If they don't match, you have to look elsewhere.
But what does that mean in practice?
If nothing else matches, the magic 0/0 default will match everything. You don't get the search benefit of a prefix match so you've got to find the address another way.
Table lookup can (and has) been encoded very efficiently using data structures called 'radix trees' that represent the prefixes as a binary tree structure. Radix trees are often how computer systems represent this kind of data. They are used in the BSD and Linux Kernel to maintain routing data, and by BGP systems written to run on general purpose hardware.
Faster routers use Ternary Content-Addressable Memory (T/CAM). These special memory chips can be used to make the prefix part be the address, in memory, to fetch a value at the speed of the hardware. However, T/CAM chips are very expensive, so there is a direct cost/performance trade off. For a given size and price of router, you can either use general purpose computing models like radix trees to hold the table or you can use special purpose hardware to hold the table. The trade off is size (memory) and cost (memory) and time (lookup in T/CAM versus computer indexing in the tree).
No matter which you choose, having the ability to refer to blocks of address by prefix reduces the 'costs' of finding the block. The bigger the block, the easier it is to find the block, but there are more addresses in it.
Route default is equivalent to 0.0.0.0/0
You may be familiar with the idea of a default route from your computer's routing table. It's the place you send things when you don't have a more specific (better) way to send them. This is exactly the same as 0/0. It means that you can use the same tree and table lookup method you use for the better routes you have to find one last, default route, to handle things you don't know.
That's a useful and powerful idea. Every home router that connects to the global Internet (unless you've paid extra to do complex routing) uses default routing to send things out into the world. Every local area network tends to have one device that can handle the more complex routing, and it in turn, becomes the default route for other hosts on that segment of the network. If you run Virtual Private Network it might be sending all traffic over the VPN but not the traffic local to your network. In that case, it has recast the default route 0/0 to point into the VPN instead of sending it to the local router to be handled.
Right in the middle of the address plan is one special block, an entire /8 prefix (16 million addresses), that is 'you' no matter where you are. It's the localhost address, it's always about you. Every host, from 127.0.0.0 to 127.255.255.255, is there in 127.0.0.0/8, although most people only refer to it by the first host, 127.0.0.1.
IPv6 might look different, but it's just bigger
If you've enabled IPv6 on your home network (and if not, why not?), then you'll know that IPv6 addresses don't look like IPv4.
192.168.0.0 is an IPv4 address in dotted quad notation. IPv6 has to refer to 128 bits of address, and rejected the dotted-quad form to go with hexadecimal encoded values for 16 bit fields, separated by the colon character. So, you get an IPv6 address like 2401:2000:6660::1 instead.
That double colon :: isn't a typo, we just didn't want to write out an enormous number of zeros.
In IPv6, a single continuous run of 'all-zeros' can be represented by the :: notation, to shorten the length. These IPv6 addresses are not that hard to decode into binary form, and usefully easy to decode for bit masks is along the prefix/length rule.
IPv6 hosts typically use 64 bits of this entire 128 bit address space to mark the locally assigned host address. So, the smallest unit we normally expect to see talked about in most of the global Internet is far bigger than a single host of 128 bits. It's actually a /64 worth of address uniquely identifying a small network.
IPv6 gives people more addresses and more freedom but we've kept the basic prefix/length model. We expect it to continue to serve as an 'efficiency' method, for the foreseeable future of routing.
Not bad for a simple trick of arithmetic.
For more in-depth lessons on routing, check out the APNIC Academy.
The views expressed by the authors of this blog are their own and do not necessarily reflect the views of APNIC. Please note a Code of Conduct applies to this blog.