The URL is a handy way to navigate to a specific web page. But where did it come from? How does it work, and why are URLs almost always in English?
Welcome to TechStuff, a production from iHeartRadio. Hey there, and welcome to TechStuff. I am your host, Jonathan Strickland. I'm an executive producer with iHeartRadio, and how the tech are you? It's time for a TechStuff classic episode. This episode originally published on June 29, 2015. It is titled What's My Address? It's all about Internet addresses and network addresses and why those are important. Let's sit back and listen. So to get the obvious out of the way, the Internet is a network of computer networks. That's why it's called the Internet. It's what allows your computer to communicate with other computers. But in order for there to be any communication, you have to have a couple of things. You need rules that all the computers are going to abide by. You need some sort of common language that all the different types of computers can understand, because otherwise you would only be able to receive information from machines similar to your own, since different machines run different operating systems and use different file types, that sort of thing. And you also need a method for computers to know where to send a message, because if you didn't have that, the Internet would just be a bunch of computers shouting into the void, hoping that the machine they are trying to reach actually hears them, while simultaneously hoping that everyone else ignores them. And it might seem like that's how things work in your typical Internet forum, but that's not what's really going on. So every device connected to the Internet has an Internet Protocol address, an IP address. Now, an IP address is sort of like a phone number or a physical address. It's the number that includes the information needed for data to arrive at that particular device. It tells the network where that device is in the grand connection of the network of networks.
Using the old IPv4 method, version 4 of the Internet Protocol, an IP address is a 32-bit number. Now that creates a hard limit for addresses, which is specifically 4,294,967,296 of them. Of those, hundreds of millions are reserved for specific uses, and so you effectively have fewer than 4 billion addresses that could be assigned. An IPv4 address is represented by four groups of numbers. Those range between 0 and 255, meaning there are 256 potential values for each one, and each of those four groups is separated from the others by dots. Each group represents eight bits. So here's an example of an IPv4 address. It could be 216.27.61.137. That's just a random example. And 4 billion addresses sounds like a bunch, but it didn't take that long for those addresses to start getting scarce. The central address pool was exhausted in February of 2011. And you might have heard some stories about how certain companies, like big software companies that were relying on IPv4, have had some issues running out of IP addresses to assign to people on their networks, so their folks couldn't actually connect. And not just regular people: executives weren't having any luck connecting to the Internet, because the company had run out of IP addresses to assign to its employees. It's one of the reasons there's been a big push to move from IPv4 to IPv6, which uses 128-bit numbers, not 32-bit numbers. So what does that actually mean? Well, if you go with IPv6, there are about 3.403 times 10 to the 38th power, or 2 to the 128th power, addresses available.
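To make the arithmetic above concrete, here is a minimal Python sketch. It assumes nothing beyond the standard library; the helper name `dotted_quad_to_int` is made up for illustration. It shows how four 8-bit groups pack into one 32-bit number and how large the two address spaces are.

```python
import ipaddress

# IPv4 addresses are 32 bits wide; IPv6 addresses are 128 bits wide.
IPV4_TOTAL = 2 ** 32
IPV6_TOTAL = 2 ** 128


def dotted_quad_to_int(address: str) -> int:
    """Pack the four 8-bit groups of an IPv4 address into one 32-bit number.

    Hypothetical helper for illustration; ipaddress.IPv4Address does the
    same job (with stricter validation) in real code.
    """
    groups = address.split(".")
    if len(groups) != 4:
        raise ValueError(f"expected four groups, got {len(groups)}")
    value = 0
    for group in groups:
        octet = int(group)
        if not 0 <= octet <= 255:
            raise ValueError(f"group out of range: {group}")
        # Shift the running value left by 8 bits and append this group.
        value = (value << 8) | octet
    return value


example = "216.27.61.137"
as_int = dotted_quad_to_int(example)

# Cross-check against the standard library's own parser.
assert as_int == int(ipaddress.IPv4Address(example))

print(IPV4_TOTAL)            # size of the IPv4 address space
print(f"{IPV6_TOTAL:.3e}")   # size of the IPv6 address space, in scientific notation
```

The point of the shift-and-or loop is simply that each dotted group is one byte of the 32-bit address; the dots are there for human readers only.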
Why am I putting the number in those terms? Because to actually spell out the number in full would probably take most of this podcast, and that would only be interesting for a few seconds. So how can we put into terms what that actually means? Well, there'd be so many addresses that even if you were assigning a huge number every picosecond, the supply would probably last until our sun actually burns out. So I think we'll be good in the long term. When everyone switches over, we'll be in great shape until the sun burns out, at which point we'll have other problems to worry about. Now, an IPv6 address is a little more complicated than the IPv4 one. It is written as eight groups of four hexadecimal digits, separated by colons. So if you were to look at one of these addresses, an example would look something like this: 2001:cdba:0000:0000:0000:0000:3257:9652. It is significantly longer. Now, you can actually simplify things by omitting the groups of zeros. You don't have to include all of those. There is a shorthand way to express that number, and the way you do that is you drop the consecutive groups of zeros, but you include an extra colon to signify that there is an omission. So the long address I mentioned just a bit ago could be shortened to 2001:cdba::3257:9652. That double colon represents that everything in between was just groups of zeros. So that's one way of shortening it. Now, your own computer's IP address is not likely to stay the same across multiple uses. Instead, it receives an IP address from a Dynamic Host Configuration Protocol server, a DHCP server, that's on your network.
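The double-colon shorthand for IPv6 described above can be checked with Python's standard `ipaddress` module. This is just a small sketch using the example address from the episode; the variable names are illustrative.

```python
import ipaddress

# The long form: eight groups of four hexadecimal digits.
full = "2001:cdba:0000:0000:0000:0000:3257:9652"
addr = ipaddress.IPv6Address(full)

# .compressed drops the run of all-zero groups and marks the gap with "::".
short = addr.compressed

# .exploded goes the other way and restores every group in full.
long_again = addr.exploded

print(short)
print(long_again)

# Both spellings name the same 128-bit number.
assert ipaddress.IPv6Address(short) == addr
```

Because both spellings parse to the same 128-bit value, software can accept either one; the shorthand only saves typing.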
The server gives the network a bit more freedom because it can shift IP addresses around whenever necessary. But other computers on the network, like web servers, have to have the same IP address all the time, because otherwise they would just get lost in the clutter. They would have to constantly update all the registries to alert them of their new IP address whenever it changed, or else any incoming traffic would never find the server. So while your personal computer might have one IP address one day and a different one the next day, the web server that you want to visit, the one that has the website you are interested in, is going to have the same IP address day to day. It has what is called a static IP address, so it's going to be assigned to that machine and only that machine. You'll never find another machine with that same IP address, at least as long as the original one is active. Now, the IP address is associated with the media access control address for that specific network interface on the server. That's a MAC address. You probably have heard that term. You typically find the MAC address written on a little sticker that tends to be on these servers. Sometimes it's written down in someone's notebook somewhere. But that is what is permanently associated with the static IP address, and this is how the Internet keeps track of where everything is. It's all machine-readable language, but it's not really useful to humans. It's not human readable. Most of us can't remember IP addresses that easily, particularly once you start adding lots and lots of websites. So if you had to remember the IP address of every single web server that you wanted to visit in order to, you know, access a specific web page, after a while you would find it very difficult to keep them all straight. So we needed to have a different means of accessing these things to make it easier for people.
And that method ends up being one that can correspond to those IP addresses but doesn't require us to remember the strings of numbers. And that's where the URL comes in. URL stands for Uniform Resource Locator, and that's what allows us to use human language to reach the information we want when using the web. It consists of a few different pieces, and all of these correspond to machine language, which the computers understand, and it's important to be able to match the two up. So, the pieces that make up a URL. First, if you look at a web browser and you're looking at the address bar and you're looking at a specific web page, the first thing you're probably going to see, depending upon the browser and which version you're using, is the http:// prefix. Now that stands for Hypertext Transfer Protocol. That little string of letters tells the computer which set of rules to follow when exchanging information across the Internet. Because there are lots of different sets of rules, different protocols, it all depends upon what you are actually trying to do. So there's the File Transfer Protocol, which is FTP, and that's not the only other one. There are also ones like the Internet Message Access Protocol, or IMAP. But HTTP, that Hypertext Transfer Protocol, is the protocol we use primarily in web browsers. Now, the middle bit of the web address, after the http://, corresponds to the server or group of servers you want to access within a top-level domain. Well, that means we need to learn what a top-level domain is. If you look at the .com part of your typical web address, that is the top-level domain. Top-level domains generally tell you something about the site you are visiting. So for example, .com suggests a commercial business, .org is an organization, .gov is government, .mil is military.
Different countries have their own top-level domains, like .ru for Russia or .uk for the United Kingdom. But these have become a lot more fluid over the last few years, particularly with the release of new top-level domains that really opened the floodgates and made this not quite as cut and dry as it was when the web was brand new. But that is the top-level domain of the site that you're trying to visit. So for www.howstuffworks.com, .com would be the top-level domain, howstuffworks would be the second-level domain off of the .com top-level domain, and www indicates the host name, the specific machine inside that second-level domain that contains the information you want. And it's not always going to be www. That's the most common, but it's not always going to be that. If you're looking at a long web address, then there's a slash after that top-level domain, so maybe it was howstuffworks.com, slash, and then another name there. Well, everything that follows that slash after the top-level domain is a reference to the directory in the file system that contains the specific file you are interested in. So this is just a means of organization. It's a way for the computer to know where to look to pull the specific file you want to look at. So remember that when you're using a web browser and you're looking at a web page, you're really looking at a file, and everything that follows that top-level domain is just a way of pointing the computer, the server, to the specific file you are interested in. So it sends it to your browser and you can see it or experience it, however that may be. So for that to work, you can't have duplicate web addresses. Otherwise servers wouldn't know which machine you actually wanted to contact. So you can't have two different sites that both use www.howstuffworks.com. But how do you prevent duplicates from happening?
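The pieces of a URL described above (protocol, host name, second-level domain, top-level domain, and the path after the slash) can be pulled apart with Python's standard `urllib.parse`. This is a minimal sketch: the function name `describe_url` and the path `/tech/example.htm` are made up for illustration, and the naive split on dots ignores multi-part suffixes such as `.co.uk`.

```python
from urllib.parse import urlsplit


def describe_url(url: str) -> dict:
    """Split a URL into the pieces discussed above.

    Hypothetical helper: it assumes the last dotted label is the
    top-level domain, which is not true for suffixes like .co.uk.
    """
    parts = urlsplit(url)
    labels = parts.hostname.split(".")
    return {
        "protocol": parts.scheme,                          # e.g. http
        "host_name": labels[0] if len(labels) > 2 else None,  # e.g. www
        "second_level_domain": labels[-2],                 # e.g. howstuffworks
        "top_level_domain": labels[-1],                    # e.g. com
        "path": parts.path,                                # directory and file on the server
    }


# The path below is an invented example, not a real page.
info = describe_url("http://www.howstuffworks.com/tech/example.htm")
for name, value in info.items():
    print(f"{name}: {value}")
```

Real software that needs the registrable domain usually consults a maintained suffix list rather than splitting on dots, which is why the limitation is called out in the docstring.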
How can you make sure that someone doesn't go out and create a website that, you know, uses an address that's already in use? Well, that's why there is a specific process you have to follow when you establish a web address, and that process begins with a registrar. Registrars are entities that are authorized to assign names under one or more top-level domains, like .com and .org and that sort of stuff. Registrars then register those names with InterNIC. This is a service under ICANN. ICANN is an organization. The acronym stands for the Internet Corporation for Assigned Names and Numbers, and ICANN's job is to maintain order in all this chaos by overseeing the root name servers, among other things. ICANN actually has a lot of different responsibilities, but one of them is to make sure that this system remains orderly. So registration secures the web address for the server containing that relevant information. No one else will be allowed to use that web address, at least no one will be allowed to use it legitimately. So anyone who uses www.howstuffworks.com should, in theory, go straight to the HowStuffWorks homepage and nowhere else. I'll talk about an exception to this a little bit later, but it involves something hinky. So how does the Internet know which computer you need to contact when you type in the web address in human language? Because, like I said, it's human language, not machine language. Machines don't read human language the way we do, at least not natively. You can build in natural language algorithms that can parse language and understand, in a way, or at least map, in a way, what that language means and then respond to it. But that's not the way machines typically communicate. So the way machines do this is through the Domain Name System, or DNS. Now, in the early days of the Internet there was no DNS.
The Network Information Center maintained a text file that had host names mapped to IP addresses. So, in other words, if you had a machine on the network back in those very early days of the Internet, its name appeared in this text file, and it corresponded to the actual IP address of that machine. But as you probably can imagine, this text file got really big, really quickly, as more and more entities started jumping on the Internet. This text file grew to an unmanageable size, so it was inefficient to keep a single text document as the reference. It was taking too long to cross-reference names to IP addresses, and that increased the amount of time it took to reach another machine, because it was just taking too long to resolve the name. So that's when the Domain Name System came along. It was designed in the early 1980s by Paul Mockapetris at the University of Southern California's Information Sciences Institute, and it automatically maps web addresses to IP addresses. So when you type in an address, your request goes out over your network, the Internet service provider that you use. It goes over their network to their domain name server. Now, not every computer on the Domain Name System has every web address and IP address stored in it. It would be crazy if they did. So you type in a web address, and it goes out over your Internet service provider's network to the DNS server. That server consults its registry to see if it in fact has the information needed, and if it doesn't, it works with the other servers on the DNS to find that information and send it to your computer, so that your computer can contact the appropriate web server and get the file that it wants.
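The lookup flow just described (check your own registry first, otherwise ask another server and remember the answer) can be sketched as a toy model in Python. Everything here is an illustrative assumption: `ToyNameServer` is not a real DNS implementation, real DNS walks a hierarchy of root, top-level-domain, and authoritative servers, and the name and address used are reserved documentation values.

```python
from typing import Dict, Optional


class ToyNameServer:
    """A toy stand-in for a DNS server (illustration only, not real DNS)."""

    def __init__(self, records: Dict[str, str],
                 upstream: Optional["ToyNameServer"] = None) -> None:
        self.records = dict(records)   # names this server knows directly
        self.upstream = upstream       # another server to ask when it doesn't know
        self.cache: Dict[str, str] = {}  # answers learned from upstream

    def resolve(self, name: str) -> Optional[str]:
        name = name.lower().rstrip(".")
        # 1. Consult this server's own registry.
        if name in self.records:
            return self.records[name]
        # 2. Reuse an answer learned earlier.
        if name in self.cache:
            return self.cache[name]
        # 3. Otherwise work with another server and remember the result.
        if self.upstream is not None:
            answer = self.upstream.resolve(name)
            if answer is not None:
                self.cache[name] = answer
            return answer
        return None


# 192.0.2.10 and www.example.com are reserved documentation values.
authoritative = ToyNameServer({"www.example.com": "192.0.2.10"})
isp_server = ToyNameServer({}, upstream=authoritative)

print(isp_server.resolve("www.example.com"))  # first lookup goes upstream
print(isp_server.resolve("www.example.com"))  # second lookup is served from the cache
print(isp_server.resolve("no-such-host.example"))
```

Caching is the design choice that keeps the system fast: once the ISP's server has learned an answer, it does not need to ask again for a while. Real servers also expire cached answers after a time limit, which this sketch leaves out.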
So, if you're using a browser to look up a web page, your request goes to the DNS, and once the appropriate server is identified, the DNS sends that server's IP address to your browser. Your browser then essentially sends a request to the web server that has that particular page, that file, and says, hey, can I see that? And the web server, assuming everything is on the up and up, says, of course you can, and sends the file, like the web page, to your browser. And, you know, it travels through the network in this way. It's not necessarily a direct pathway from the server to your computer. And then it shows up in your browser. Now, all this stuff happens in fractions of a second. Sometimes it can take a little longer, depending upon the state of the network and the amount of traffic involved, but it's still an incredibly fast process, especially when you consider how much is actually going on here, with all the cross-referencing to go from web address to IP address to sending the signal to responding to it. It's amazing. We'll be back with more of this classic episode of TechStuff after this quick break. Now, I've got a couple of other things I wanted to mention. One of those is, why are so many web addresses written in English? Now, not all of them are, and in fact this has changed quite a bit over the last few years. But for a long time, English was the dominant language in web addresses, even web addresses in countries that don't use English as their primary language. And the reason is, well, pretty cut and dry. Really, it's because the people who developed the standards we use for creating web addresses were mostly English-speaking Americans, or native English speakers, whether from America or other countries. Now, the people who established those rules included Tim Berners-Lee, who worked for CERN and is who we consider the inventor of the World Wide Web.
He designed the first web page. And then there's the Internet Engineering Task Force, or IETF. They established the set of standards for web addresses, and in setting up the standards, the IETF limited web addresses to upper and lowercase Latin letters, in other words, the letters that appear in the English alphabet. You could also use digits from 0 to 9, and also a few symbols, not all of them, but a few of them. And if you spoke English and you happened to have an English keyboard, a keyboard that had Latin alphabetical letters on it, that was okay. But if you happened to live in one of those countries that doesn't use the Latin alphabet, it made using the web more difficult. So for you folks out there using English keyboards, such as myself, imagine if instead the Internet relied upon a different alphabet, like an Arabic alphabet or a Cyrillic alphabet, and you only had the English alphabet, or Latin alphabet, to work with. It would be much harder for you to navigate the web. You would have to, you know, possibly use a mapping system, so that's mapping English letters, or Latin letters rather, to these other alphabets, or you might have to insert the letters one by one using an insert option. It's not the easiest thing in the world to do. So that was one of the drawbacks to the web for many years. And it wasn't until 2009 that ICANN approved the use of internationalized domain names. That meant that web addresses could finally include non-English characters in them. And coincidentally, perhaps, 2009 was also the year that the United States government gave up control of ICANN, and they transitioned it to a multi-stakeholder governance model. And you might wonder what that means. It essentially means ICANN is a nonprofit organization that answers only to its stakeholders, rather than having to answer to the United States government. And you might wonder why ICANN was ever answering to the U.S. government in the first place.
But keep in mind, the Internet itself is essentially the product of a US government project. It all started back with ARPA (back in those days it wasn't even DARPA yet), with ARPANET, which established the general structure and protocols that would be used later and evolve into the ones that we use for the Internet. So the United States was very heavily involved in the construction and the standardization of the Internet, which is why things are the way they are. Now, besides non-Latin characters, URLs can now also contain emojis, those little symbols that mean all sorts of stuff these days. It used to be just smiley and frowny faces and winky faces, but now it's all sorts of stuff. And we've seen a couple of different examples of this. Coke launched an ad campaign that used a single emoji Internet address, and recently Norwegian Airlines did this for a special announcement: they were launching direct flights from Copenhagen to Las Vegas. So their URL was www dot airplane emoji, slot machine emoji, money emoji dot ws, which is adorable and maddening. So this could lead to a new era of emoji URLs aimed at people younger than I am, so get off my lawn. Okay, but seriously, this actually does sound like a pretty nifty idea to me. The limitation really is that unless you have a device that has the emojis available, it makes it harder to access these sites, at least harder to access them directly. You could still get there through other means, like a direct link from another site or search results from a search engine like Google. But maybe it's not a big deal anyway, because, I mean, how many people actually bother to type in the web address for the websites they are going to? Besides me? I know I do it. Maybe some of you out there do it a lot too. But in my mind, this is very similar to the limitations we saw when we could only use Latin alphabetical figures, or characters rather, when typing in web addresses.
It's very similar to that, because if you don't have a smartphone or other device that has these emojis built into it, you then have to construct them some other way. So I imagine most of these web addresses will have a language variant of them, not just the emoji ones. All right, so we've got a couple of other things we need to cover before I can wrap up here. One of those is, what about URL shorteners? So these are techniques that redirect traffic to a domain name by using a short string to conserve characters. So, in other words, instead of having a long web address, you might have a much shorter one. Sometimes the shorter one is a vanity URL, so that it's very easy for you to tell somebody, hey, use this very short web address and you will go straight to my site. Sometimes it's more of a random-seeming string of letters and numbers, which makes it a little harder to communicate, at least verbally. But at any rate, these techniques are meant to make it easier to navigate to a specific page that otherwise has a long or cumbersome web address. It can be used for lots of different reasons. It can also be used to track traffic. So, in other words, it's a strategy so that you can tag traffic for one reason or another in order to keep an eye on what's going on with a site. It's not just a redirect. It's also kind of tagging to get an idea of traffic patterns. Sometimes it's used in a sneaky way to disguise the actual destination of the redirect. So in other words, I could say, hey, look at this really cool website, and I use a URL shortener, and unless you have a means of previewing where that shortener is pointing to, you might click on it not knowing what the destination is, and it might end up being a place where there's a lot of malware, or maybe it's a website that has tons of pop-up ads and I'm really just trying to drive traffic to it, because that way I can drive up revenue.
There are a lot of ways to abuse shortened URLs. Sometimes it's just to shorten the address for the purpose of messages that have a really strict character limit, like on Twitter. And of course Twitter has its own. Well, it purchased a URL shortener, so that if you are posting a web address into Twitter, it will automatically shorten it for you so that it can conserve some of those characters, because with 140 precious characters, you know, you've got to maximize that as much as you can in order to get your point across. We've got more to say in this classic episode of TechStuff after these quick messages. Now, the redirect gets cross-referenced to the URL, and thus the IP address, of the shortened web address. So that means there's a registry database for the shortened links. So you have the short version that's cross-referenced to the long version of the web address, which in turn is referenced to the IP address of the actual web server that the page lives on. It's a lot of different degrees of separation, but it's still the same basic principle. Now, this idea was first patented in 2005. The patent was filed back in 2000, so this was something people were thinking about pretty early on, keeping in mind that the web really only got started in the early nineties. One issue with these services is that if the entity that maintains the registry database goes out of business, the links go dead. And that happens whether or not the destination website is working or has gone away as well.
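The registry idea above, and the dead-link problem that comes with it, can be sketched as a toy shortener in Python. This is an assumption-laden illustration, not how any particular service works: the class `ToyShortener`, the base-62 counter scheme, and the domain `short.example` are all invented for the example.

```python
import string
from typing import Dict, Optional

# Digits plus lowercase and uppercase letters: 62 symbols for short codes.
ALPHABET = string.digits + string.ascii_letters


def encode(number: int) -> str:
    """Write a non-negative integer in base 62 (hypothetical code scheme)."""
    if number == 0:
        return ALPHABET[0]
    chars = []
    while number:
        number, remainder = divmod(number, len(ALPHABET))
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


class ToyShortener:
    """A toy registry mapping short codes to long web addresses."""

    def __init__(self, base: str = "https://short.example/") -> None:
        self.base = base                      # invented domain for illustration
        self.registry: Dict[str, str] = {}    # short code -> long address
        self.next_id = 1000

    def shorten(self, long_url: str) -> str:
        code = encode(self.next_id)
        self.next_id += 1
        self.registry[code] = long_url
        return self.base + code

    def expand(self, short_url: str) -> Optional[str]:
        if not short_url.startswith(self.base):
            return None
        return self.registry.get(short_url[len(self.base):])


service = ToyShortener()
long_url = "http://www.howstuffworks.com/some/long/path"  # invented path
short_url = service.shorten(long_url)
print(short_url, "->", service.expand(short_url))

# If the service shuts down, the registry is gone and the short link is dead,
# even though the destination site itself is untouched.
service.registry.clear()
print(short_url, "->", service.expand(short_url))
```

The second lookup returning nothing is the whole criticism in miniature: the short link depends on a third party's database staying online, while the full address does not.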
So in other words, let's say that I publish a blog post where I use a shortened URL in it, but the company that actually made the URL shortener and maintained the registry goes out of business. Then that link ends up being dead, even if the website I linked to is still perfectly fine, whereas if I had used the full web address, then the link would presumably still work just fine, assuming that no one had changed anything on the other end. So that is a downside. In fact, some people have really criticized URL shorteners for that very reason, saying that you are destabilizing the Internet by using them. Here's a little bit of trivia about URLs. Tim Berners-Lee has some regrets about how URLs are actually structured. He feels some sense of responsibility for this, having played a part in creating the standards, and for one, he says that he really wishes that he had not used a double slash after http colon. He says that with the colon there, they could have used a single slash. Think of all the time you would have saved in your life not having to put two slashes in. I mean, two slashes would be awesome in some cases, like it would be an amazing version of Guns N' Roses, but for web addresses that could get pretty irritating. He also says he wishes that he had used slashes instead of dots to separate each element in a web address. So for the example of http://www.howstuffworks.com, that would instead become http colon slash www slash howstuffworks slash com. So things would look a lot different if Tim Berners-Lee could go back and do it all over again. But now we've already established what the standards are, so it's too late, and we just have to struggle through with our dots and extra slashes, and that's all we can do, really. Here's another fun bit of trivia.
So in May, news broke that Google's Chrome browser now has an experimental new feature, an option that uses ultrasonic sound waves to transmit URL data to nearby devices that have microphones. Ultrasonic sound waves are well outside the range of human hearing, so you wouldn't hear anything when you use this feature. And instead of having to copy and paste a URL into a message and then send that message on to somebody to say, hey, check out this link, assuming that that person is in the same area that you are in, you could press a little button and your computer speaker would emit this ultrasonic chirp, which again you would be unable to hear. But someone else with a computer or mobile device that has a microphone attached to it could have that chirp get picked up by the device, and it would translate the chirp into a URL, which then they could visit. So if I found a really awesome website and I wanted to share it with folks here at HowStuffWorks, I could tell people, hey, you know, get ready to listen with your computers, and then press a little button and transmit it. Just kind of neat. Now here's some not-so-fun trivia. One common practice that has been an issue since web addresses have become a thing is for competitors to register URLs that are misspellings or typos of their chief competition, so that they themselves can grab that traffic. In other words, imagine that you are Coca-Cola and you end up registering pwpsi.com, so you're one letter off from Pepsi. Instead of typing E, you've typed W, which is one key over from the E key. And the reason you've done it is so that anyone who makes the typo trying to visit Pepsi's website instead goes to your website, and your website might just be filled with propaganda about how Coke is awesome and Pepsi is stupid and you should just buy Coke products and not Pepsi products.
This kind of URL hijacking was really common. It still is fairly common, though not as common today as it used to be, in large part because companies have gotten savvy to it. So a lot of companies will buy variations of their brand names, including common misspellings of them, so that way, if someone types in the wrong URL, they get redirected to the actual website they wanted to go to, as opposed to going to some other site that is unrelated to the brand. So we're seeing it less and less, simply because companies are taking the effort to prevent it from happening, but it still can happen. There's nothing that protects the system from that sort of stuff. In fact, it would be kind of antithetical to the spirit of the Internet to build in restrictions based on that. But worse than that, worse than URL hijacking by far, is DNS hijacking. It's also known as DNS redirection. This is when someone redirects traffic to a rogue DNS server instead of the legitimate one that's on your Internet service provider. So remember, earlier I said that if you typed a web address in your browser and you hit enter, normally your computer would send this message along to the DNS server that's on your Internet service provider, which would then follow the set of rules to make sure it found the correct IP address to send to your browser, and then you would end up retrieving the proper web page, the one that you wanted. But there are some types of malware that you can encounter that will make fundamental changes to your computer or the web browser. There are a lot of different types of malware that can do this, and the ones I'm specifically talking about here would change the DNS settings on your computer so it's pointed to a different DNS server, one that's owned by somebody else and not the ISP.
So when you open up your web browser and you type in a web address, and this malware has affected your computer, instead of sending it to the DNS server on the ISP, it sends it to this rogue DNS server, which could point you anywhere. It does not have to correlate your web address to the proper IP address and send you to the right place. It might send you someplace, you know, random, which would kind of be a case of someone being mischievous and just sort of destructive for no particular reason. Or it might send you to a website that has other malware on it, so that your computer gets infected by even more malware. Or you might end up on one of those websites that just has tons of ads on it, because that's how the hacker is getting revenue. Every time you land on that page, all these different ad impressions happen, and the hacker is getting paid on a per-impression basis. Or, you know, you might end up on a mirror site, one that looks like an official site but is there in order to phish data from you, to convince you that you are on a legitimate website when in fact you're on a fake one, and the data you are sharing is going straight to the hacker, giving them even more power over you. That's a particularly nasty attack. Now, fortunately, it's the sort of thing you can largely prevent, because if you're careful, if you have virus protection on your computer, if you have a good firewall set up, you are limiting your exposure to that sort of stuff. If you're careful about the links you visit, you know, all of these things, the basic security rules of using the Internet, if you follow those, you should be in pretty good shape. You probably won't encounter the DNS redirect attack. You could have cases of hackers actually targeting DNS servers, but that's something that we as users have no control over, and in fact, ISPs put a lot of money into protecting the servers they have.
Obviously their entire business depends upon the viability of those machines, so it's one that we don't have to worry about quite as much. I hope that you enjoyed that classic episode of TechStuff about addresses, one of those topics that I think people take for granted unless they're, you know, in the IT field, in which case they think about it a lot. If you have any suggestions for topics I should cover in future episodes of TechStuff, please reach out and let me know. The best way to do that is on Twitter. The handle for the show is TechStuffHSW, and I'll talk to you again really soon. TechStuff is an iHeartRadio production. For more podcasts from iHeartRadio, visit the iHeartRadio app, Apple Podcasts, or wherever you listen to your favorite shows.