Lecture Notes

Introduction to C Programming (COP 2220)

Internal Data Representation

Objectives

  • To understand how computer memory is structured
  • To understand how the contents of computer memory can be represented in binary or hexadecimal notation
  • To understand how character data is encoded and stored in memory
  • To understand how integer data is encoded and stored in memory
  • To understand how floating point data is encoded and stored in memory

Introduction

  • All instructions and the data they operate on are stored in computer memory
  • Computer memory consists of identical 8-bit bytes; each of the 8 bits can be in one of two states, which we can visualize as "0" and "1"; for example:
0 1 0 1 1 0 1 0
  • Because there are 8 bits in a byte, and each bit can have either the value 0 or 1, there are a total of 256 possible different arrangements, or bit patterns that each byte can hold (2 x 2 x 2 x 2 x 2 x 2 x 2 x 2, or 28); the above illustration shows one of them
  • Each byte has an address in memory, beginning with byte 0
  • Memory addresses are usually given in hexadecimal (base 16); use the Windows Calculator (Scientific View) to convert between decimal and hexadecimal:
Byte Position
(decimal)
Memory Address
(hexadecimal)
0 0x0000
100 0x0064
1,234 0x04D2
10,000 0x2710
  • Instructions and data can be represented in a single byte, or in groups of sequential bytes – you can't tell by just looking at a byte what it represents
  • Memory (and external storage like disk drives) are packaged in groups of bytes:
Unit Abbr Nbr of bytes
kilobyte KB 210 = 1,024
megabyte MB 220 = 1,048,576
gigabyte GB 230 = 1,073,741,824
terabyte TB 240 = 1,099,511,627,776

How many bytes in 15 MB? (15 x 1,048,567 = 15,728,505 bytes)


Representing Bit Patterns as Hexadecimal Digits

  • Because long strings of 0's and 1's are difficult to work with, it is convenient to use a hexadecimal digit to represent a 4-bit nibble
  • This table shows the equivalence of binary (base-2), decimal (base-10) and hexadecimal (base-16) values:
Decimal Binary Hex
0 0000 0
1 0001 1
2 0010 2
3 0011 3
4 0100 4
5 0101 5
6 0110 6
7 0111 7
8 1000 8
9 1001 9
10 1010 A
11 1011 B
12 1100 C
13 1101 D
14 1110 E
15 1111 F
  • The byte displayed above can be represented as:
0 1 0 1 1 0 1 0
  (an 8-bit byte)

  or

0 1 0 1 1 0 1 0
  (two 4-bit nibbles)

  or

5 A
  (two hex digits)
  •  The last version (one byte represented as two hex digits) is that most often used in memory dumps or file dumps as with this program: hexdump3.c
    • Download and compile the program
    • Use the program to "dump" a file
    • Adresses are on the left, in hex (relative to the first byte in the file)
    • Byte contents are shown in hex, 16 bytes per row
    • ASCII (character) interpretation of each byte is shown on the right

Encoding Character Data

  • Character data is used to represent text
  • To encode character data, a coding scheme is used that assigns to each character a binary code.
  • Most common coding schemes are EBCDIC (IBM mainframes), ASCII (teletypes and PCs), and UNICODE (modern, expanded version of ASCII)
  • The standard ASCII coding scheme uses numbers 0-127 (decimal) to represent characters:
    • Codes 0-31 are "non-displayable" characters that are used to control communication display of the data (also called control characters)
    • Codes 32-127 "displayable" characters – those that can be typed on a keyboard (see the ASCII table, here)
  • The 128 values (0-127) can be represented in 7 bits (because 27 = 128), so each ASCII character can be represented in one byte (the first bit of which would be 0)
  • The ASCII table shows the decimal code for each character; for example, 65 (decimal) is the code for the character 'A', and 97 is the code for 'a'
  • The Windows Calculator (in "Scientific View") can be used to convert from decimal to binary (use "digit grouping" for clarity)
  • 65 (decimal) = 0100 0001 (binary, after adding the leading 0), and 97 (decimal) is 0110 0001 (binary)
  • The ASCII table also shows the hex equivalent for each character, so we can directly write the hex encoding for each character (you can verify this with the Calculator, which also has a Hex mode), so 65 (decimal) is 41 (hex), and 97 (decimal) is 61 (hex)
  • Using the hex codes, what would the text "Cat-2" be encoded as? Using the table:
C a t - 2
43 61 74 2D 32
  • What do the actual bytes look like? (Not that we often want to see this, but it's easy to translate a hex digit into a binary nibble.)
C a t - 2
43 61 74 2D 32
0100 0011 0110 0001 0111 0100 0010 1101 0011 0010

Interpreting Character Data

  • The above table can also be used to translate a hex memory dump into the characters it represents –  if the bytes are ASCII coded characters; for example, what text do the five bytes containing the bit patterns 5B 45 3C 6B represent?
5B 45 3C 6B
[ E < k
  • The current character coding standard is the Unicode Character Set, that uses 16-bit (2-byte) codes, which can represent more than 65,000 characters; the first 128 are the same as the ASCII codes, so Unicode is a "superset" of ASCII
  • Most of the ASCII characters with codes 0-31 (control characters) are not often used; of those that are, these are the most common:
Name Hex Literal*
Null 0x00 '\0'
HTab 0x09 '\t'
Newline 0x0A '\n'

* A "literal" is the form we use to write the control character in C source code

  • Another popular coding standard for PC's is "Code Page 437" used on the original IBM PC (1981)

Encoding Integer Values

  • An integer is an exact, whole number; we use integer data to represent values we can count – the number of students in a class, for example
  • An unsigned integer is one that has no sign, only a magnitude
  • A signed integer is one that has a sign – negative, positive, or zero
  • Encoding unsigned decimal integers is easy: just convert it to its base-2 (binary) equivalent (using the Windows Calculator, for example), and prefix it with zeros to fit evenly into the number of bytes dedicated to hold it; for example, to encode the unsigned decimal integer 78 in one byte:

78 (decimal) = 100 1110 (binary) = 0100 1110 (byte) = 4E (hex)

  • The range of unsigned integers that can be represented depends on the the number of bytes used; the smallest value is 0, the largest is 2n, where n is the number of bits for the given number of bytes:

Unsigned Integers
Bytes Bits Smallest Largest
1 8 0 255
2 16 0 65,535
4 32 0 4,294,967,295
b n = 8 * b 0 2n -1
In modern computers, 4-byte integers are commonly used, as this is the width of CPU registers; dedicating more bytes to an integer than this usually means software will be required to perform some calculations, which is less efficient than if hardware alone is sufficient
  • Most of our calculations will require signed integers – positive and negative (and, of course, zero). To do this, one common coding scheme reserves the most significant bit (the leftmost bit) to indicate the sign of  the integer – 1 for negative, and 0 for non-negative. (Note that the integer 0 is grouped with the non-negative integers). The magnitude of the number is represented by the remaining rightmost bits.
  • Here is how the signed integer value +46 would be represented in a 1-byte data item:

  • Reserving one bit as the sign bit does not change the total number of integers that can be represented in n-bits – half of them will now be negative, and half will be non-negative; for signed integers, the range of values that can be represented:
Signed Integers
Bytes Bits Smallest Largest
1 8 -128 +127
2 16 -32,768 +32,767
4 32 -2,147,483,648 +2,147,483,647
b n = 8 * b -(2n-1) +2n-1 - 1

Note that since zero is included in the non-negative half by convention, one more negative integer can be represented than positive

  • Non-negative signed integers are encoded by converting them to binary and prefixing with 0's as necessary to fill the number of allocated bits; note that the left-most bit – the sign bit – must always be 0
  • Negative signed integers are encoded in "two's complement" notation, which allows for more efficient logic circuits. It is done as follows:
  1. Start with the stored form of the positive of the integer in binary
  2. Invert the bits (use the Not operation in the calculator)
  3. Add 1
  • The result is the encoded version of the negative integer. For example, show how the integer value -46 would be represented in two's complement notation in one byte:

    +46 = 0010 1110 -> 1101 0001 (invert) -> 1101 0010 (+1) = D2 (hex)

Here is another example: Show how the integer value -1234 would be stored in two bytes:

   +1234 = 0000 0100  1101 0010
        ->      1111 1011  0010 1101  (inverted)
        ->      1111 1011  0010 1110  (after +1)
        =        FB 2E (hex)


Decoding Integer Values

  • To interpret (decode) an encoded signed integer, use these rules:
    • If the leftmost bit (the sign bit) is 0, the integer is non-negative, and its magnitude is the binary representation specified by the bit pattern

    • If the leftmost bit is 1, the integer is negative, and its magnitude is is the binary representation specified by the two's complement of the bit pattern

    Example 1: Determine the value in decimal of the one-byte signed integer represented by the stored bit pattern 3B.

    3B = 0011 1011 (binary)

    Since the sign bit is 0, this is a non-negative integer, and we can convert it directly to its decimal equivalent: +59

    Example 2: Determine the value in decimal of the one-byte signed integer represented by the stored bit pattern 9A.

    9A = 1001 1010 (binary)

    Since the sign bit is 1, this is a negative integer, and we need to find the two's complement before converting to its decimal equivalent:

    1001 1010 -> 0110 0101 (inverted) -> 0110 0110 (+1) = +102 (dec)

    So the original number must have been the negative of this, or -102.

    Example 3: Determine the value in decimal of the integer represented by the stored bit pattern FF F2 4A B2.

    This is a 4-byte (32-bit) integer, and the bit patterns are given in hex. The sign bit is still the most-significant (leftmost) bit in the most-significant byte, in this case:

    FF ->  1111 1111

    Since the leftmost bit is 1, this represents a negative integer. To find its magnitude, take the two's complement of the bit pattern and convert the resulting binary number to decimal:

      F      F        F      2        4       A        B      2       (hex)
    1111 1111  1111 0010  0100 1010  1011 0010  (bit pattern)
    0000 0000  0000 1101  1011 0101  0100 1110  (2`s comp)

       = +898,382 (decimal), so the encode value must be -898,382


Encoding Floating Point Values

  • A floating point is an approximate number value with a fractional component
  • We use floating point data to represent data we measure, like weight or distance
  • Floating points are always signed, and typically represented in 4 bytes (single-precision) or 8 bytes (double-precision)
  • The number of bytes used affects both the magnitude of the number that can be represented, and its precision: single-precision floating point values are accurate to 6 or 7 significant figures, double-precision to 15 significant figures
  • A floating point number is represented in normalized notation – similar to scientific notation (like 6.02 x 1023), except, of course, in binary, and with a few other complications to make processing more efficient
  • A notional floating point encoding in 4 bytes (32 bits) looks like:

  • Different computers use variations of this scheme: a different number of bits for the exponent or fraction, or a different means of encoding the exponent, but the process described below is similar for all
  • Note that the range of floating point values that can be represented with this scheme (using a 7-bit exponent) is roughly ±1020. The 24-bit normalized binary fraction will accurately represent only 7 decimal significant figures. To increase either the range of values that can be  represented or the accuracy (in significant figures), more bits would be required for the exponent or fraction, respectively.
  • Encoding a decimal floating point number, say -35.2234, requires several steps, including converting both the integer and fractional portions to binary, and normalizing the result to determine the exponent; we won't pursue the details, but note that this indicates that handling floating point values is much less efficient than integers – hence, we'll use integers whenever possible for speed of execution

Decoding Floating Point Values

  • Interpreting (decoding) a 4- or 8-byte bit pattern representing a floating point number requires several steps: extracting and decoding the exponent to denormalize the fractional part, then converting both the integer and fractional portions to decimal; again, we won't pursue the details


Comparing Stored Data

  • Given the above, suppose a part of computer memory contains the bytes whose bit patterns are given (in hex) as:

4D 49 54 21

What values do these bytes represent? The simple answer is that you cannot tell without knowing if these bytes represent character data (in ASCII), a 4-byte integer, or a floating point value. You can decode the first two using the information above, the last is given below:

Decoding 4D 49 54 21
ASCII ?
Integer ?
Floating Point +7.192541 x 10-19
  • So you can see that it is important for us to be consistent in our choice and use of data types in our programming – bit patterns in bytes have no inherent value


Data Types in C

  • There are several valid data types used in C programming; the most common are:

    • char: used to hold signed integers in one byte, most often used to hold the ASCII character code (0-127)
    • int: a signed integer in four bytes
    • double: a double-precision floating point value in eight bytes

There are also several variations of these, but don't unless there is a specific reason to do so.


Self-test Questions

  1. To understand how computer memory is structured

    1. How is computer memory addressed; i.e., how are memory locations assigned?

    2. What is the memory address (in hex) of the byte in position 112 (decimal)?

    3. How many bytes are there in 18 KB? How many bits?

  2. To understand how the contents of computer memory can be represented in binary or hexadecimal notation

    1. Convert the following byte contents from binary to hex:

      1. 1000 1010  1101 0011
      2. 0000 0001  1011 1111
    2. Convert the following byte contents from hex to binary:

      1. AB CD
      2. 12 3F  2C D1
  3. To understand how character data is encoded and stored in memory

    1. If these bytes contain ASCII coded character data, convert to text:

      1. 46 43 43 4A 20 52 6F 63 6B 73 33
      2. 4D 79 20 5A 49 50 20 63 6F 64 65 20 69 73 20 33 32 32 30 34
    2. Convert this character text to ASCII codes; show the conversion in hex.

      1. Hello, world!
      2. Call 1-800-COMCAST
  4. To understand how integer data is encoded and stored in memory

    1. How would these integers (decimal) be stored in memory in a 4-byte, two's-complement form? (Show the answer in hex.)

      1. 12,345
      2. -345
    2. If these bytes represent integers in 4-byte, two's complement form, what integers do they represent? (Show the result in decimal.)

      1. 00 00 0B F0
      2. FF F1 10 A2
  5. To understand how floating point data is encoded and stored in memory

    1. If these memory contents represent floating point values in 8 bytes, are they negative or non-negative?

      1. 8A 23 33 B2 F3 77 00 00
      2. 6E 4B 9C 00 00 00 10 2F
 Updated 01.20.2010