TechStuff Tidbits: Bits and Bytes

Published Feb 2, 2022, 9:56 PM

What's the difference between bits and bytes? Has a byte always been 8 bits (no, it has not)? And is a kilobyte 1,000 bytes, or 1,024 bytes? It depends! Let's all get confused with this refresher course in bits and bytes!


Welcome to TechStuff, a production from iHeartRadio. Hey there, and welcome to TechStuff. I'm your host, Jonathan Strickland. I'm an executive producer with iHeartRadio, and how the tech are you? It's time for a TechStuff Tidbits, and I thought it would be a good idea to do an episode to talk about bits and bytes, largely because I think a lot of folks can confuse the two, and I include myself in that. To be fair, it is confusing; even when you're in the computer science field, this can get confusing. It's totally understandable. We talk about data transfer rates in terms like megabits or gigabits per second, but we talk about data storage in terms of gigabytes or terabytes. And then we talk about memory in terms of megabytes or gigabytes, and we mean different megabytes and gigabytes than we do with data storage. So it's not hard to get this stuff mixed up.

But let's start with the bit. It's the easiest one to understand. In computer terms, a bit is the smallest unit of information. In computer data analogies, you would call a bit the same thing as an atom: the building block for computer information. A bit is a binary digit. It's how we can talk about individual pieces of data, and we represent it as being either a zero or a one. It's base two: it's a zero or a one. I often say you can think of a bit kind of like an on or off switch, and you can say, well, one means the switch is on and zero means the switch is off. The mathematician John Wilder Tukey suggested the term bit back in the late nineteen forties, so we're talking about the very early days of modern computer science. A lot of groundwork had been laid by people like Charles Babbage and Ada Lovelace, but the late forties is really where we start getting the foundation of modern computer science. Computers had been around a little bit, but not by a lot; the field was pretty young, and a lot of the computer technology actually evolved from earlier machines that were meant to do everything from count ballots to guide a loom when weaving a specific pattern. The evolution of the bit involves stuff like punch cards, but to get into all of that would be a little bit too much for a Tidbits episode.

So Tukey coined this term, but it was Claude Shannon who really popularized it in his work titled "A Mathematical Theory of Communication." He credited Tukey in that work, so Shannon didn't try and pass this off as his own idea. I think that's awesome, because I don't see that a lot; I see people using terms and not indicating that, hey, someone else actually thought this up. That wasn't Claude Shannon's style. Shannon was quick to credit Tukey with coming up with the idea. Anyway, Shannon laid out the idea that a device capable of two stable positions or states, such as off and on, can store one bit of information, and that this means n of such devices can store n bits of information. In other words, if you have twenty switches, and each of the switches has an on or off position, you can store up to twenty bits with that system. From there, Shannon dives into how this approach can be used to communicate on a computational basis. The paper itself is free to read online. You can find it; just search for the title, "A Mathematical Theory of Communication." It'll pop right up and you can read it.
It's a technical document, and honestly, it's the kind of paper where I need to have a separate tab open so I can look up terms and meanings just to try to keep up, and even that is being generous. To someone in computer science it totally makes sense, like a no-brainer; for someone like me, with my background in English literature, it requires a bit more homework on my part. But it is a truly fascinating and foundational piece of work in the computer science discipline.

But then, what can you represent if you have just a single bit, with a switch that's off or on? Well, with just two states, you can't really represent anything terribly useful for communication. You could do yes or no, but that's it. You couldn't form a question; you could just maybe give an answer that's very, very simple. But you couldn't really process information with a single bit; a processor that can only handle one bit would be useless. So let's look at what happens when we have more bits at our disposal. Each bit again has two potential states: off or on, zero or one. But if you have two bits together, well, then you can get a shave and a haircut. Sorry, that's a very dated dad joke; if you happen to know the whole "shave and a haircut, two bits" thing, good for you, you might have appreciated that very bad joke I made. No, if you have two bits, you technically can represent four states with those two bits. You can have zero zero, that's the first one; you can have zero one; you could have one zero; or you could have one one. So with two bits you can represent four things. What if you had four bits? That means you could represent sixteen different outcomes, and they would range from zero zero zero zero to one one one one, and there would be sixteen different ones. If you had eight bits, you could represent up to two hundred fifty six versions or outcomes.

The easy way to represent this is to take the number two, which is the number of states each bit can have (a zero or a one, that's two states), and raise that two to a power equal to the number of bits you're talking about. So eight bits is the same as saying two to the eighth power, or two hundred fifty six potential values. This means that as you increase bits, you are exponentially increasing the number of potential states. So if we double eight bits to get sixteen bits, that doesn't mean we double two hundred fifty six to five hundred twelve. No, no, no, it is two to the power of sixteen, and that is sixty five thousand, five hundred thirty six. So you see that as you add bits to a system, you dramatically increase the number of states. In fact, every time you add a bit to a system's capability of handling bits, you double the number of potential values you can represent. That's what I mean by an exponential increase.
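To make that arithmetic concrete, here's a minimal Python sketch (an illustration added here, not something from the episode) that runs through the same powers of two:

    # Each bit has two states, so n bits can represent 2 ** n distinct values.
    for n_bits in (1, 2, 4, 8, 16):
        print(f"{n_bits:>2} bits -> {2 ** n_bits:,} possible values")

    # Adding one more bit always doubles the count, which is why the growth
    # is exponential: 2 ** (n + 1) == 2 * 2 ** n.

Running it prints 2, 4, 16, 256, and 65,536 possible values, matching the figures above.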
All right, so once you go with binary digits, you start to look at how many bits you need to do whatever it is you need to do. Let's say that you want to start off just by representing the Latin alphabet. You want to be able to use bits to designate letters of the alphabet and say this combination of bits means A, this one means B. Well, if you're just looking at the number of letters in the Latin alphabet, we would need at least twenty six values; we would need twenty six different combinations in order to represent just the basic alphabet. Four bits would give us sixteen values, so that's not enough, but five bits would give us thirty two, because two to the fifth power is thirty two. So with five bits we could represent all the letters of the alphabet, and we'd have a handful of values left over where we could represent simple punctuation. However, we wouldn't be able to have upper and lower case letters. All letters would have to be the same case, because each case would be a state or a value of its own. A capital J and a lower case J would each require their own designations, and we don't have enough bits to do that. With thirty two values, we would have to have at least fifty two just for the letters, and we don't have that; we've got thirty two. Plus, we wouldn't be able to represent any numerals, or at least not all of them, with just thirty two values, unless we were to sacrifice some letters of the alphabet, because otherwise the alphabet takes up too many of the states or values.

Now, the term byte began to pop up a little bit around this time to describe the number of bits engineers were using to represent a character set. So, for example, if we used five bits to represent all the characters in our set, which again would limit us to thirty two characters, then we would naturally refer to five bits as a byte in our system. And you've probably heard that eight bits are a byte. Well, they are now, but in the early days, what you referred to as a byte depended upon the system architecture you were working with at the time. So the byte was not always and forever eight bits; that's not the way that worked. There were five bit bytes, six bit bytes, seven bit bytes. These were all kind of hashed out over time as various companies were building out computer systems.
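As an illustration of that five-bit character set idea, here's a small Python sketch; the 32-entry character set below is made up for the example and isn't any historical code:

    import string

    # A hypothetical 5-bit character set: 2 ** 5 = 32 code points in total.
    # 26 go to single-case letters; the handful left over go to punctuation.
    charset = list(string.ascii_uppercase) + [" ", ".", ",", "?", "!", "-"]
    assert len(charset) <= 2 ** 5          # everything fits in 5 bits

    encode = {ch: i for i, ch in enumerate(charset)}

    message = "HELLO WORLD."
    print([f"{encode[ch]:05b}" for ch in message])   # each character as a 5-bit pattern

    # Upper plus lower case would need 52 letter codes on their own, more than
    # the 32 values that 5 bits offer, which is why mixed case is off the table.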
Also, in the early days of computing, designers created machines that had differing instruction set architectures, and some systems used a five bit word size. A word is a collection of bits that becomes the native unit of storage. By storage, we're not just talking about storing data like on a hard drive. We're really talking about storage in the sense that a computer has to be able to hold onto a certain amount of information in order to process it, you know, to execute some sort of operation on the data. So you're essentially talking about the processor's capacity: how much information can it hold in order to do operations on that data? A smaller word size means you are working with smaller amounts of information that the processor can handle, and it limits what your processor can do and thus limits what your computer can do. So when we talk words, we're talking about stuff like CPU registers, which temporarily store small pieces of information while the CPU executes some sort of process on that data. Some early systems used five bit words, some used six bit words, and then they grew very quickly from there, because those were so limited that you couldn't do much with them.

All right, we were just getting started. We're gonna take a quick break. When we come back, we will continue to talk about bits and bytes.

Okay, so in the early days of computing, the technological limitations would determine word length. So let's review really quickly. A bit is a single unit of information; it's either a zero or a one. A byte is a consecutive group of bits, which we use to represent characters. And a word refers to a consecutive number of bits or bytes used primarily for CPU registers, and you can think of word size as an indication of how much information a computer's CPU can handle for individual operations; a larger word size means the computer can handle bigger operations effectively. By the nineteen fifties, various companies began using a character set called BCD, which had at least forty eight characters in it, and to encode those forty eight characters you would need to have at least six bits. So the six bit byte became kind of a standard for a while. By the time IBM was ready to introduce the System/360, the company had gravitated towards an eight bit size for bytes. Now, technically the System/360 could get by with just seven bits to represent all the characters it was going to use, but programming is way easier if you're dealing with bytes and words that are based on powers of two. Seven is not a power of two, but eight is, so bumping up the byte size from seven bits to eight bits made more practical sense, and it also meant you had two hundred fifty six values to play with instead of a hundred twenty eight, which is what you would get if you were using seven bits to a byte. IBM's move would end up creating the foundation for bytes moving forward, though it didn't catch on immediately. So by the time I was a kid learning about personal computers in the late seventies and early eighties, a byte was pretty firmly established as being eight bits, and in fact I don't remember ever seeing anything that suggested that had not always been the case. I walked away with the impression that a bit was always a zero or a one, and a byte had always been eight bits, but the eight bit byte really wasn't standardized until, say, the early nineteen seventies.

So you've got your bit, you've got your byte, which is eight bits. And super quick, we should reference what prefixes like kilo (or "killa," as most people say it), mega, giga, and tera mean when we're talking about bytes and bits. They come from metric prefixes, but bits and bytes are not metric units, and when it comes to bytes in particular, it gets real confusing. So kilo means one thousand, mega means million, giga means billion, tera means trillion, and you can go further: peta would be quadrillion, exa would be quintillion. So if you hear something like an exabyte, they're talking about a quintillion bytes, and it keeps on going up from there. But for the average person, tera is kind of where we max out when we're talking about modern personal computers, like a terabyte hard drive, that kind of thing.
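As a quick reference, here's a short Python sketch (illustrative only) laying out those decimal prefixes as plain numbers:

    # The decimal prefixes, as used for bits (and, loosely, for bytes).
    PREFIXES = {
        "kilo": 10 ** 3,    # thousand
        "mega": 10 ** 6,    # million
        "giga": 10 ** 9,    # billion
        "tera": 10 ** 12,   # trillion
        "peta": 10 ** 15,   # quadrillion
        "exa": 10 ** 18,    # quintillion
    }

    for prefix, value in PREFIXES.items():
        print(f"1 {prefix}byte = {value:,} bytes (decimal convention)")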
Now, let's talk about data transfer speeds and computer storage and computer memory, and why we use terms like gigabit or gigabyte and what that actually means. When we're talking about transmission speeds, we are framing the discussion in terms of bits per second. This can get a little confused with bandwidth, which is pretty easy to mix up with transmission speed. Bandwidth describes the capacity of a network; in other words, the amount of data that network can handle or transmit at any given time. Networks are not infinitely capable of handling data; they do have a cap. This is one of those things that ISPs like to reference when they're talking about having to charge people an arm and a leg to access the Internet, because there are capacity limits that a network will hit. Most of the time we don't get to the full capacity limit, but that's still the story we get told whenever it's time to pay the bill. So there's that. But transmission speed is literally the rate at which data crosses from one point in a network to another, and that depends on a whole bunch of stuff. It can depend upon the distance between the two points: if you are transferring data from a computer that's on the opposite side of the world from where you are, that's going to affect the transmission speed as opposed to, say, a computer that's a mile away. Then there are the kinds of connections, like what kinds of wires are connecting you. Is it copper? Is it fiber? That sort of stuff also determines the transmission speed. There are lots of other elements that do too, but ultimately you figure out what your transmission speed is.

Here in the United States, the Federal Communications Commission currently defines broadband internet speed as twenty five megabits per second down. That's the data transfer rate that applies to information coming from the Internet to your computer: if it's twenty five megabits per second or faster, then you have broadband access, at least on the download side. And three megabits per second up, which would be the speed at which your computer sends data back up to the Internet. So if you have twenty five megabits down and three megabits up, that counts as broadband in the United States. And since mega means million, that means twenty five million bits per second downloading and three million bits per second uploading. By the way, there are a lot of folks out there, including myself, who say this definition is way too low and we should have a higher standard to qualify for the broadband designation. And that is important; it's not just semantics. It's important because there are various government initiatives dedicated to extending broadband service to underserved regions and populations in the United States, because people have recognized that access to the Internet is one of the most important elements to participating in modern society, particularly during a pandemic. And so if you define broadband with a very low standard, you're not really helping out people with these programs; companies are going to do the bare minimum they need to do in order to get that very low standard of access to those people. So you could argue that this definition keeps people at a technological disadvantage, and that we really should change the definition to be more reflective of what broadband really is.

Anyway, transfer speeds are all in bits per second, and the prefixes kilo, mega, giga, and so on are very straightforward. A kilobit is a thousand bits, so a kilobit per second transfer speed would be terrible, but it would be a thousand bits per second, and a megabit is a million bits. Very easy to follow. But it is a very different story when we talk about bytes, and it's confusing as all heck to a lot of folks, including folks in computer science.
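Here's a minimal Python sketch (just an illustration; the helper function is made up for the example) showing what those broadband figures look like once you convert bits per second into bytes per second:

    def megabits_to_megabytes_per_second(mbps: float) -> float:
        """Convert a rate in megabits per second to megabytes per second (8 bits per byte)."""
        return mbps / 8

    # The FCC broadband thresholds mentioned above: 25 Mbps down, 3 Mbps up.
    print(megabits_to_megabytes_per_second(25))   # 3.125 megabytes per second down
    print(megabits_to_megabytes_per_second(3))    # 0.375 megabytes per second up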
All right. So in the early days, there was this real need to stick to powers of two when you were talking about bytes. Again, this dates all the way back to the introduction of the IBM System/360, where executives at IBM were saying no, no, no, let's just deal with powers of two; it simplifies things, and otherwise stuff breaks down. So rather than describe one thousand bytes, you know, say that a kilobyte is a thousand bytes, it was more elegant within the computer science model to describe a kilobyte as one thousand twenty four bytes. A thousand twenty four is the same thing as two to the tenth power, but there's no easy naming convention to describe one thousand twenty four bytes, and the computer science world kind of appropriated kilobyte to describe it, because a thousand twenty four is kind of like a thousand if you squint your eyes a little. From a computational standpoint, sticking with powers of two made things easier. From a semantic standpoint, it did mess things up, because a kilobit is a thousand bits, but a kilobyte, at least originally, was a thousand twenty four bytes. But wait, it'll get worse. I'll explain after we come back from this quick break.

Okay, a kilobyte is a thousand twenty four bytes because we wanted to stick to that powers of two thing. Well, then we get to megabyte. Mega means million, so megabyte should mean one million bytes. But in those same areas of computer science, particularly those dealing with computer memory, that kind of stuff, a megabyte was really seen as actually being one million, forty eight thousand, five hundred seventy six bytes. And you might say, what? Why? Well, again, it's those powers of two. A kilobyte was two to the tenth power; a megabyte was two to the twentieth power, or one thousand twenty four squared. But hey, one million, forty eight thousand, five hundred seventy six bytes is kind of hard to say, right? So let's just call it a megabyte. Who's gonna care?

By the late nineties, the International Electrotechnical Commission had had enough of this nonsense, because it was causing tons of confusion. I mean, the computer science world was going all Humpty Dumpty on the rest of us. If you don't understand that reference: in Through the Looking-Glass, Alice has this encounter with Humpty Dumpty, you know, the egg that sat on the wall, and Humpty Dumpty says words mean whatever he wants them to mean. He says the only question is who is to be the master, the words or me, and I'm not letting the words push me around, so when I use words, they mean exactly what I want them to mean. That's kind of what the computer science world was doing to the rest of us, and the brave among us said, yo, you can't do that; words mean things. So anyway, the IEC made a recommendation that kilo, mega, giga, etcetera would mean the same thing they mean in metric systems. In other words, if you used the word megabyte, you meant one million bytes. And if you wanted to go the power of two route, like if you really wanted to call one million, forty eight thousand, five hundred seventy six bytes something, the IEC said, well, don't use megabyte, that's confusing; we'll create a new designation. Call it a mebibyte, M-E-B-I-B-Y-T-E. And the one thousand twenty four version wouldn't be called a kilobyte anymore; that would be called a kibibyte. They also went ahead and said the one thousand twenty four to the third power, or two to the thirtieth power, would no longer be a gigabyte; that would be a gibibyte. And one thousand twenty four to the fourth power, or two to the fortieth power, would be a tebibyte, not a terabyte, and so on. So if you said gigabyte, you meant a billion bytes, and terabyte would mean a trillion bytes.
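To see how far apart the decimal and binary meanings drift, here's a small Python sketch (illustrative only) putting the metric prefixes next to the IEC binary prefixes:

    # The decimal (metric) sizes next to the IEC binary sizes they get confused with.
    decimal = {"kilobyte": 10 ** 3, "megabyte": 10 ** 6, "gigabyte": 10 ** 9, "terabyte": 10 ** 12}
    binary = {"kibibyte": 2 ** 10, "mebibyte": 2 ** 20, "gibibyte": 2 ** 30, "tebibyte": 2 ** 40}

    for (dec_name, dec_val), (bin_name, bin_val) in zip(decimal.items(), binary.items()):
        print(f"1 {dec_name} = {dec_val:,} bytes, while 1 {bin_name} = {bin_val:,} bytes")

    # 1 kilobyte = 1,000 bytes, while 1 kibibyte = 1,024 bytes
    # 1 megabyte = 1,000,000 bytes, while 1 mebibyte = 1,048,576 bytes
    # 1 gigabyte = 1,000,000,000 bytes, while 1 gibibyte = 1,073,741,824 bytes
    # 1 terabyte = 1,000,000,000,000 bytes, while 1 tebibyte = 1,099,511,627,776 bytes

Notice the gap widens as the prefixes get bigger, which is part of why the terminology causes so much trouble.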
This was meant to clarify things so that everyone knew what people were talking about when they were using a specific designation, and that would clear everything up, except the computer science world at large kind of ignored the suggestion. So there remains this use of the terminology that, depending upon the context, will mean one number of bytes or a different number of bytes. If you're talking about RAM, for example, you're really referring to the power of two version, the base two description, in other words, like the one thousand twenty four for a kilobyte. But if you're talking about hard drives, well, you're typically talking about the base ten version, because these days hard drives, when they're marketed, are marketed that way. So a five hundred gigabyte hard drive is supposed to be five hundred billion bytes, although it's usually slightly off from that, but it's supposed to be in that neighborhood, and it's not the power of two variation of that. So yeah, it's all clear as mud, right? A kilobit is different from a kilobyte, and sometimes a kilobyte is different from a different kilobyte, depending on the context. Words mean things, huh, science geeks? Now, I'm actually not so sure about that anymore, the more I read into this. And it gets more muddy, too, because there's not really a universal standard on how to abbreviate things like megabits versus megabytes, so it can be confusing when you're reading a document about whether the author means megabits or megabytes. Some folks will tell you it all depends on which letter in the abbreviation happens to be capitalized, but really, that's not universal. You really need to define those abbreviations up front for any given piece, because there's no formal agreement on which one should be used where. There are some schools and some scientists who have a preference that they demand folks follow, but again, it's not universal, so it doesn't really help.

All right, but let's talk quickly about using bytes and bits in a practical example, why you would care. Let's say that you had a single-sided, single-layer DVD. A DVD like that has a data storage capacity of four point seven gigabytes, and in this case we do mean the base ten version, so giga does mean billion, not one thousand twenty four to the third power. So we're talking four point seven billion bytes of data. Now, let's say we want to send a copy of the data that's on this DVD to a computer that's on our network, and let's say that the connection between the two computers has a transmission speed of three hundred megabits per second, which means it can transfer three hundred million bits every second. How long will it take us to transfer the information on the DVD to the other computer? Well, we have to remember that a byte is eight bits, so four point seven billion bytes is actually thirty seven point six billion bits, and we can move three hundred million bits per second. So we do some division, and we see that it will take us about a hundred twenty five seconds, or just over two minutes, to transfer that full DVD to the other computer.
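Here's that DVD estimate as a minimal Python sketch (an illustration of the arithmetic above; the helper function is made up for the example):

    def transfer_time_seconds(size_bytes: float, bits_per_second: float) -> float:
        """Rough transfer time: convert the size to bits (8 bits per byte), divide by the rate."""
        return size_bytes * 8 / bits_per_second

    dvd_bytes = 4.7e9        # 4.7 billion bytes on a single-sided, single-layer DVD
    link_speed = 300e6       # 300 megabits per second, i.e. 300 million bits per second

    seconds = transfer_time_seconds(dvd_bytes, link_speed)
    print(f"About {seconds:.0f} seconds, or roughly {seconds / 60:.1f} minutes")
    # About 125 seconds, or roughly 2.1 minutes -- the "just over two minutes" from above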
That's, of course, assuming that we have a steady transfer speed, which never happens in real life, but for the sake of this example, we'll just assume it works. A lot of different stuff affects transmission speed, including how many other devices are transmitting data over that same network at that moment, which again gets back to the network's bandwidth capacity. But you get the idea. Now, remembering bits versus bytes is really handy if you want to make some rough estimates of how long it's going to take you to download a specific something. I used the DVD example, but perhaps one that would be more applicable to folks listening to this show would be if you wanted to download a game, say, to your PC or to a console. Some games, like Call of Duty: Black Ops Cold War, are well over one hundred gigabytes in size, and since digital download has become a prevalent way that gamers get access to games, that means you have to download more than a hundred billion bytes, or eight hundred billion bits, of information to your console. And I imagine a lot of folks out there don't have access to an Internet connection that has a gigabit per second or faster transfer speed. For example, I live in a pretty nice area of Atlanta, I mean, it's not the nicest, but it's pretty nice, and I max out at around a hundred megabits per second under ideal conditions. Most of the time I'm somewhere in the fifty to sixty megabits per second range. I don't have access to gigabit fiber, I cannot get gigabit speeds, so fifty to sixty megabits per second is the best I can hope for. So knowing your transmission speed, plus remembering that it's eight bits to a byte, can help you estimate how long it's going to take you to download that latest game. For me, it usually means I'll start the download and it'll finish shortly before the sequel to whatever it is I'm trying to download comes out. Okay, that's hyperbole, but not by much.

Anyway, I hope that this episode has cleared up bits versus bytes for most of you. It does get way more technical than what I went into, and we didn't talk about things like what a thirty two bit system versus a sixty four bit system is, and what that means effectively. Does that have anything to do with the computer's speed? What about things like processors and how many bits they can handle, does that mean they're faster? That we might cover in a separate TechStuff Tidbits episode. This was really just more of the basics of bits versus bytes and the confusing nature of things once you start getting into the kilobyte and megabyte and gigabyte world, particularly if you're talking about RAM, because then you get back to that powers of two thing. And I get it from the computer science world; I get the idea that working within base two simplifies things massively, and that therefore it makes way more sense to look at large collections of numbers in terms of base two. But using the exact same terminology that we use to describe other groups, like data storage space, that's where it grinds my gears, as some of my fellow podcasters like to say. Anyway, hope that was useful information for you. If you have any suggestions for topics I should cover on TechStuff, whether it's a tidbit, a company, a trend in tech, anything like that, send me a message on Twitter. The handle for the show is TechStuff HSW, and I'll talk to you again really soon.
TechStuff is an iHeartRadio production. For more podcasts from iHeartRadio, visit the iHeartRadio app, Apple Podcasts, or wherever you listen to your favorite shows.
