Lecture Notes
Introduction to C Programming (COP 2220)
Internal Data Representation
Objectives
- To understand how computer memory is structured
- To understand how the
contents of computer memory can be represented in binary or hexadecimal
notation
- To understand how character data is encoded and stored in memory
-
To understand how integer data is encoded and stored in memory
- To
understand how floating point data is encoded and stored in memory
Introduction
- All instructions and the data they operate on are stored in computer
memory
- Computer memory consists of identical 8-bit bytes; each of the 8
bits can be in one of two states, which we can visualize as "0" and "1"; for
example:
- Because there are 8 bits in a byte, and each bit can have either the
value 0 or 1, there are a total of 256 possible different arrangements, or
bit patterns that each byte can hold (2 x 2 x 2 x 2 x 2 x 2 x 2 x 2,
or 28); the above illustration shows one of them
- Each byte has an address in memory, beginning with byte 0
- Memory addresses are usually given in hexadecimal (base 16); use the
Windows Calculator (Scientific View) to convert between decimal and
hexadecimal:
Byte Position
(decimal) |
Memory Address
(hexadecimal) |
| 0 |
0x0000 |
| 100 |
0x0064 |
| 1,234 |
0x04D2 |
| 10,000 |
0x2710 |
- Instructions and data can be represented in a single byte, or in groups
of sequential bytes – you can't tell by just looking at a byte what it
represents
- Memory (and external storage like disk drives) are packaged in groups of
bytes:
| Unit |
Abbr |
Nbr of bytes |
| kilobyte |
KB |
210 = 1,024 |
| megabyte |
MB |
220 = 1,048,576 |
| gigabyte |
GB |
230 = 1,073,741,824 |
| terabyte |
TB |
240 = 1,099,511,627,776 |
How many bytes in 15 MB? (15 x 1,048,567 = 15,728,505 bytes)
Representing Bit Patterns as Hexadecimal Digits
- Because long strings of 0's and 1's are difficult to work with, it is
convenient to use a hexadecimal digit to represent a 4-bit nibble
- This table shows the equivalence of binary (base-2), decimal (base-10)
and hexadecimal (base-16) values:
| Decimal |
Binary |
Hex |
| 0 |
0000 |
0 |
| 1 |
0001 |
1 |
| 2 |
0010 |
2 |
| 3 |
0011 |
3 |
| 4 |
0100 |
4 |
| 5 |
0101 |
5 |
| 6 |
0110 |
6 |
| 7 |
0111 |
7 |
| 8 |
1000 |
8 |
| 9 |
1001 |
9 |
| 10 |
1010 |
A |
| 11 |
1011 |
B |
| 12 |
1100 |
C |
| 13 |
1101 |
D |
| 14 |
1110 |
E |
| 15 |
1111 |
F |
- The byte displayed above can be represented as:
or
or
- The last version (one byte represented as two hex digits) is that
most often used in memory dumps or file dumps as with this
program: hexdump3.c
- Download and compile the program
- Use the program to "dump" a file
- Adresses are on the left, in hex (relative to the first byte in the
file)
- Byte contents are shown in hex, 16 bytes per row
- ASCII (character) interpretation of each byte is shown on the right
Encoding Character Data
- Character data is used to represent text
- To encode character data, a coding scheme is used that assigns to each character
a binary code.
- Most common coding schemes are EBCDIC (IBM mainframes), ASCII (teletypes and
PCs), and UNICODE (modern, expanded version of ASCII)
- The standard ASCII coding scheme uses numbers 0-127 (decimal) to represent
characters:
- Codes 0-31 are "non-displayable" characters that are used to control
communication display of the data (also called control characters)
- Codes 32-127 "displayable" characters – those that can be typed on a
keyboard (see the ASCII table, here)
- The 128 values (0-127) can be represented in 7 bits (because 27
= 128), so each ASCII character can be represented in one byte (the first
bit of which would be 0)
- The ASCII table shows the decimal code for each character; for example,
65 (decimal) is the code for the character 'A', and 97 is the code for 'a'
- The Windows Calculator (in "Scientific View") can be used to convert
from decimal to binary (use "digit grouping" for clarity)
- 65 (decimal) = 0100 0001 (binary, after adding the leading 0), and 97
(decimal) is 0110 0001 (binary)
- The ASCII table also shows the hex equivalent for each character, so we
can directly write the hex encoding for each character (you can verify this
with the Calculator, which also has a Hex mode), so 65 (decimal) is 41
(hex), and 97 (decimal) is 61 (hex)
- Using the hex codes, what would the text "Cat-2" be encoded as? Using
the table:
- What do the actual bytes look like? (Not that we often want to see this,
but it's easy to translate a hex digit into a binary nibble.)
| C |
a |
t |
- |
2 |
| 43 |
61 |
74 |
2D |
32 |
| 0100 0011 |
0110 0001 |
0111 0100 |
0010 1101 |
0011 0010 |
Interpreting Character Data
- The above table can also be used to translate a hex memory dump into the
characters it represents – if the bytes are ASCII coded characters;
for example, what text do the five bytes containing the bit patterns 5B 45
3C 6B represent?
- The current character coding standard is the
Unicode Character Set, that uses
16-bit (2-byte) codes, which can represent more than 65,000 characters; the
first 128 are the same as the ASCII codes, so Unicode is a "superset" of
ASCII
- Most of the ASCII characters with codes 0-31 (control characters) are
not often used; of those that are, these are the most common:
| Name |
Hex |
Literal* |
| Null |
0x00 |
'\0' |
| HTab |
0x09 |
'\t' |
| Newline |
0x0A |
'\n' |
* A "literal" is the form we use to write the control character in C
source code
- Another popular coding standard for PC's is
"Code Page
437" used on the original IBM PC (1981)
Encoding Integer Values
-
An integer is an exact, whole number; we use
integer data to represent values we can count – the number of students
in a class, for example
-
An unsigned integer is one that has no sign, only
a magnitude
-
A signed integer is one that has a sign –
negative, positive, or zero
-
Encoding unsigned decimal integers is easy: just convert
it to its base-2 (binary) equivalent (using the Windows Calculator, for example),
and prefix it with zeros to fit evenly into the number of bytes
dedicated to hold it; for example, to encode the unsigned decimal
integer 78 in one byte:
78 (decimal) = 100 1110 (binary) = 0100 1110 (byte)
= 4E (hex)
-
The range of unsigned integers that can be represented
depends on the the number of bytes used; the smallest value is 0, the
largest is 2n, where n is the number of bits
for the given number of bytes:
| Unsigned Integers |
| Bytes |
Bits |
Smallest |
Largest |
| 1 |
8 |
0 |
255 |
| 2 |
16 |
0 |
65,535 |
| 4 |
32 |
0 |
4,294,967,295 |
| b |
n = 8 * b |
0 |
2n -1 |
In modern computers, 4-byte integers are commonly used, as this
is the width of CPU registers; dedicating more bytes to an integer than
this usually means software will be required to perform some
calculations, which is less efficient than if hardware alone is
sufficient
- Most of our calculations will require signed
integers – positive and negative (and, of course, zero). To do this, one
common coding scheme reserves the most significant bit (the leftmost
bit) to indicate the sign of the integer – 1 for negative, and 0 for
non-negative. (Note that the integer 0 is grouped with the non-negative
integers). The magnitude of the number is represented
by the remaining rightmost bits.
- Here is how the signed integer value +46 would be represented in a
1-byte data item:

- Reserving one bit as the sign bit does not change the total number
of integers that can be represented in n-bits – half of them will
now be negative, and half will be non-negative; for signed integers, the
range of values that can be represented:
| Signed Integers |
| Bytes |
Bits |
Smallest |
Largest |
| 1 |
8 |
-128 |
+127 |
| 2 |
16 |
-32,768 |
+32,767 |
| 4 |
32 |
-2,147,483,648 |
+2,147,483,647 |
| b |
n = 8 * b |
-(2n-1) |
+2n-1 - 1 |
Note that since zero is included in the non-negative half by
convention, one more negative integer can be represented than positive
- Non-negative signed integers are encoded by converting them
to binary and prefixing with 0's as necessary to fill the number of
allocated bits; note that the left-most bit – the sign bit – must always
be 0
- Negative signed integers are encoded in "two's complement"
notation, which allows for more efficient logic circuits. It is done as
follows:
- Start with the stored form of the positive of the
integer in binary
- Invert the bits (use the Not operation in the calculator)
- Add 1
- The result is the encoded version of the negative integer. For
example, show how the integer value -46 would be represented in two's
complement notation in one byte:
+46 = 0010 1110 -> 1101 0001 (invert) -> 1101 0010
(+1) = D2 (hex)
Here is another example: Show how the integer value -1234 would be
stored in two bytes:
+1234 = 0000 0100 1101 0010
-> 1111
1011 0010 1101 (inverted)
-> 1111
1011 0010 1110 (after +1)
=
FB 2E (hex)
Decoding Integer Values
- To interpret (decode) an encoded signed integer, use
these rules:
Example 1: Determine the value in decimal of the one-byte
signed integer represented by the stored bit pattern 3B.
3B = 0011 1011 (binary)
Since the sign bit is 0, this is a non-negative integer,
and we can convert it directly to its decimal equivalent:
+59
Example 2: Determine the value in decimal of the one-byte
signed integer represented by the stored bit pattern 9A.
9A = 1001 1010 (binary)
Since the sign bit is 1, this is a negative integer, and
we need to find the two's complement before converting to
its decimal equivalent:
1001 1010 -> 0110 0101 (inverted) -> 0110 0110 (+1) =
+102 (dec)
So the original number must have been the negative of
this, or -102.
Example 3: Determine the value in decimal of the integer
represented by the stored bit pattern FF F2 4A B2.
This is a 4-byte (32-bit) integer, and the bit patterns
are given in hex. The sign bit is still the most-significant
(leftmost) bit in the most-significant byte, in this case:
FF -> 1111 1111
Since the leftmost bit is 1, this represents a negative
integer. To find its magnitude, take the two's complement of
the bit pattern and convert the resulting binary number to
decimal:
F F F 2 4 A B
2 (hex)
1111 1111 1111 0010 0100 1010 1011 0010 (bit pattern)
0000 0000 0000 1101 1011 0101 0100 1110 (2`s comp)
= +898,382 (decimal), so the encode value must be
-898,382
Encoding Floating Point Values

- Different computers use variations of this scheme: a
different number of bits for the exponent or fraction, or a different
means of encoding the exponent, but the process described below is
similar for all
- Note that the range of floating point values that
can be represented with this scheme (using a 7-bit exponent) is roughly
±1020. The 24-bit normalized binary fraction will accurately
represent only 7 decimal significant figures. To increase either
the range of values that can be represented or the accuracy (in
significant figures), more bits would be required for the exponent or
fraction, respectively.
- Encoding a decimal floating point number, say
-35.2234, requires several steps, including converting both the
integer and fractional portions to binary, and normalizing the
result to determine the exponent; we won't pursue the details, but
note that this indicates that handling floating point values is much
less efficient than integers – hence, we'll use integers whenever
possible for speed of execution
Decoding Floating Point Values
-
Interpreting (decoding) a 4- or 8-byte bit
pattern representing a floating point number requires several
steps: extracting and decoding the exponent to denormalize the
fractional part, then converting both the integer and fractional
portions to decimal; again, we won't pursue the details
Comparing Stored Data
4D 49 54 21
What values do these bytes represent? The simple
answer is that you cannot tell without knowing if these bytes
represent character data (in ASCII), a 4-byte integer, or a
floating point value. You can decode the first two using the
information above, the last is given below:
| Decoding 4D 49 54 21 |
| ASCII |
? |
| Integer |
? |
| Floating Point |
+7.192541 x 10-19 |
Data Types in C
There are also several variations of these, but
don't unless there is a specific reason to do so.
Self-test Questions
-
To understand how computer memory is structured
-
How is
computer memory addressed; i.e., how are memory
locations assigned?
-
What is the memory address (in hex) of the byte in
position 112 (decimal)?
-
How many
bytes are there in 18 KB? How many bits?
-
To understand how the
contents of computer memory can be represented in binary or hexadecimal
notation
-
Convert the
following byte contents from binary to hex:
-
1000 1010
1101 0011
-
0000 0001
1011 1111
-
Convert the
following byte contents from hex to binary:
-
AB CD
-
12 3F
2C D1
-
To understand how character data is encoded and stored in memory
-
If these
bytes contain ASCII coded character data, convert to text:
-
46 43 43
4A 20 52 6F 63 6B 73 33
-
4D 79 20
5A 49 50 20 63 6F 64 65 20 69 73 20 33 32 32 30 34
-
Convert this
character text to ASCII codes; show the conversion in hex.
-
Hello, world!
-
Call 1-800-COMCAST
-
To understand how integer data is encoded and stored in memory
-
How would
these integers (decimal) be stored in memory in a 4-byte,
two's-complement form? (Show the answer in hex.)
-
12,345
-
-345
-
If these
bytes represent integers in 4-byte, two's complement form,
what integers do they represent? (Show the result in
decimal.)
-
00 00 0B
F0
-
FF F1 10
A2
-
To
understand how floating point data is encoded and stored in memory
-
If these
memory contents represent floating point values in 8 bytes,
are they negative or non-negative?
-
8A 23 33
B2 F3 77 00 00
-
6E 4B 9C 00 00 00 10 2F
|