Signed Numbers

To evaluate how we store negative numbers, we measure against four key requirements:

  1. Sign bit: Clear indication of polarity ($0 = +$, $1 = -$).
  2. Consistency: Incrementing the bit pattern corresponds to a logical increase in value.
  3. Single Zero: Avoids logical ambiguity (prevents $+0$ and $-0$ logic errors).
  4. Simple Arithmetic: Subtraction can use the same hardware as addition.

Comparison Table

| Method | Sign Bit? | Consistent? | Single Zero? | Simple Math? |
|---|---|---|---|---|
| Sign-Magnitude | Yes | No | No | No |
| One’s Complement | Yes | Yes | No | No |
| Two’s Complement | Yes | Yes | Yes | Yes |
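
The "Single Zero?" and "Consistent?" columns can be checked by brute force over all 4-bit patterns. The sketch below is illustrative only: the decoder functions and `summarize` are hypothetical helpers written for these notes, and "consistent" is tested narrowly as "within the negative half (1000 to 1111), a larger bit pattern means a larger value".

```python
def sign_magnitude(p: int) -> int:
    # Top bit is the sign, low three bits are the magnitude.
    magnitude = p & 0b0111
    return -magnitude if p & 0b1000 else magnitude

def ones_complement(p: int) -> int:
    # Negative patterns: the magnitude is the bitwise inverse of the pattern.
    return -(~p & 0b1111) if p & 0b1000 else p

def twos_complement(p: int) -> int:
    # Negative patterns: subtract 2^4 from the unsigned reading.
    return p - 16 if p & 0b1000 else p

def summarize(decode):
    values = [decode(p) for p in range(16)]
    zero_patterns = sum(1 for v in values if v == 0)
    negatives = values[8:]  # patterns 1000..1111
    consistent = all(a < b for a, b in zip(negatives, negatives[1:]))
    return zero_patterns, consistent

for name, decode in [("sign-magnitude", sign_magnitude),
                     ("one's complement", ones_complement),
                     ("two's complement", twos_complement)]:
    print(name, summarize(decode))
```

Only two's complement reports a single zero pattern, and only sign-magnitude fails the ordering check, which matches the table.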

Two’s Complement

  • The Rule: To negate a number, invert all bits (NOT) and add 1.
  • Why it wins: The CPU uses the same adder circuit for signed and unsigned integers. Subtraction is simply $A + (-B)$.
  • Example (4-bit): $-5 \text{ (1011)} + 3 \text{ (0011)} = -2 \text{ (1110)}$
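
The rule and the 4-bit example above can be reproduced in a few lines. This is a minimal sketch, assuming a 4-bit word; `BITS`, `MASK`, `negate`, and `to_signed` are names chosen here for illustration, and the `& MASK` step stands in for hardware discarding the carry out of the top bit.

```python
BITS = 4
MASK = (1 << BITS) - 1  # 0b1111, keeps every result inside 4 bits

def negate(pattern: int) -> int:
    """Two's complement negation: invert all bits, then add 1."""
    return (~pattern + 1) & MASK

def to_signed(pattern: int) -> int:
    """Read a 4-bit pattern as a signed two's complement value."""
    return pattern - (1 << BITS) if pattern & (1 << (BITS - 1)) else pattern

minus_five = negate(0b0101)            # 0101 -> 1010 -> 1011
total = (minus_five + 0b0011) & MASK   # ordinary unsigned addition
print(format(minus_five, "04b"), format(total, "04b"), to_signed(total))
# prints: 1011 1110 -2
```

The addition itself never looks at the sign bit, which is the point of the "same adder circuit" claim.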

Bias (Offset) Encoding

Store value as: $Value_{Stored} = Value_{Actual} + Bias$.

  • Purpose: Shifts the range so all stored bit patterns are non-negative.
  • Benefit: Allows for unsigned comparison of signed values. This is why it is used for exponents in IEEE 754: a larger exponent always has a larger stored bit pattern, so floating-point magnitudes can be compared and sorted with plain integer comparison.
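
A small sketch of the ordering property, assuming an 8-bit field with a bias of 127 (the single-precision exponent bias). The `encode` helper and the sample exponents are made up for illustration, and the sketch ignores the fact that IEEE 754 reserves the all-zeros and all-ones exponent patterns.

```python
BIAS = 127

def encode(actual: int) -> int:
    """Bias encoding: stored = actual + bias, must fit in 8 unsigned bits."""
    stored = actual + BIAS
    if not 0 <= stored <= 255:
        raise ValueError("exponent out of range for an 8-bit biased field")
    return stored

actual_exponents = [3, -126, 0, -1, 127]
stored = [encode(e) for e in actual_exponents]

# Sorting the unsigned stored values gives the same order as sorting
# the signed actual values, so no sign handling is needed.
decoded_in_stored_order = [s - BIAS for s in sorted(stored)]
print(sorted(stored))            # [1, 126, 127, 130, 254]
print(decoded_in_stored_order)   # [-126, -1, 0, 3, 127]
```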

Floating Point (IEEE 754)

Scientific Notation

Standard base-2: $1.xxxx \times 2^{exp}$

  • The leading 1 is implicit (not stored) to maximize precision.

Single Precision (32-bit) Format

  • Sign (1 bit): $0 = +$, $1 = -$
  • Exponent (8 bits): Biased by $127$.
  • Significand (23 bits): The fractional part (mantissa).

Normalized Formula: $Value = (-1)^{Sign} \times (1 + Significand) \times 2^{(Exponent - 127)}$
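
The normalized formula can be applied by hand to the raw bits of a float. The sketch below uses Python's standard `struct` module to get the 32-bit pattern; `decode_normalized` is a hypothetical helper name, and it assumes the input is a normalized value (stored exponent between 1 and 254).

```python
import struct

def decode_normalized(x: float):
    """Split a single-precision value into fields and rebuild it from the formula."""
    # Pack as a 32-bit float, then reread the same 4 bytes as an unsigned integer.
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF          # stored (biased) exponent
    significand = (bits & 0x7FFFFF) / 2**23  # fractional part, in [0, 1)
    value = (-1) ** sign * (1 + significand) * 2.0 ** (exponent - 127)
    return sign, exponent, significand, value

print(decode_normalized(-6.5))
# prints: (1, 129, 0.625, -6.5)   since -6.5 = -1.625 x 2^2
```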

Special Cases

| Category | Exponent | Significand | Value/Purpose |
|---|---|---|---|
| Zero | 0000 0000 | 0 | $\pm 0.0$ |
| Denormal | 0000 0000 | Non-zero | Underflow protection; no implicit $1$ |
| Infinity | 1111 1111 | 0 | $\pm \infty$ |
| NaN | 1111 1111 | Non-zero | Not a Number (e.g., $0/0$) |

Denormalized Formula: Used for values too small for the standard format. The exponent is fixed at $-126$. $Value = (-1)^{Sign} \times (0 + Significand) \times 2^{-126}$
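
The special cases reduce to two checks on the exponent and significand fields. The sketch below is one way to express them, with `classify` and `decode_denormal` as hypothetical helper names; it works directly on 32-bit patterns given as integers.

```python
import struct

def classify(bits: int) -> str:
    """Name the category of a 32-bit single-precision pattern."""
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0x00:
        return "zero" if fraction == 0 else "denormal"
    if exponent == 0xFF:
        return "infinity" if fraction == 0 else "nan"
    return "normalized"

def decode_denormal(bits: int) -> float:
    """Denormalized formula: no implicit 1, exponent fixed at -126."""
    sign = bits >> 31
    significand = (bits & 0x7FFFFF) / 2**23
    return (-1) ** sign * significand * 2.0 ** -126

smallest = 0x00000001  # only the lowest significand bit set
hardware_value = struct.unpack(">f", struct.pack(">I", smallest))[0]
print(classify(smallest), decode_denormal(smallest) == hardware_value)
# prints: denormal True
```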

Precision and Step Size

Step Size: The gap between consecutive floating-point numbers (ULP - Unit in the Last Place).

  • Normalized Step: $2^{(Exponent - 127 - 23)}$
  • Denormalized Step: $2^{(-126 - 23)}$ (Constant gap)
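
Both step formulas can be checked by measuring the distance to the next representable value. This is a rough sketch, assuming a non-negative finite input that is exactly representable in single precision; `next_float32` and `step` are hypothetical helpers that rely on the fact that adding 1 to the bit pattern of such a value yields the next float.

```python
import struct

def next_float32(x: float) -> float:
    """Next single-precision value above a non-negative finite x."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits + 1))[0]

def step(x: float) -> float:
    """Gap between x and the next representable single-precision value."""
    return next_float32(x) - x

# Normalized: 1.0 has stored exponent 127, 1024.0 has stored exponent 137.
print(step(1.0) == 2.0 ** (127 - 127 - 23))      # True
print(step(1024.0) == 2.0 ** (137 - 127 - 23))   # True
# Denormalized: the gap is the same everywhere below 2^-126.
print(step(0.0) == step(2.0 ** -140) == 2.0 ** (-126 - 23))  # True
```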

Key Implications

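  1. Relative Precision: The absolute step size is smallest near zero and grows with magnitude, while relative precision stays roughly constant (about 24 significant bits for normalized values).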
  2. Inexact Representation: Most decimal numbers (like $0.1$) cannot be represented exactly in binary floating point.
  3. Absorption: If a number is large enough, adding $1.0$ to it does nothing because the “step” is larger than $1.0$.
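
Inexact representation and absorption are easy to observe in single precision. Python floats are double precision, so the sketch below rounds through a 32-bit float with `struct`; the `f32` helper is an assumption made for this illustration, not a standard function.

```python
import struct
from fractions import Fraction

def f32(x: float) -> float:
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

# Inexact representation: the stored value is close to 1/10 but not equal to it.
stored_tenth = f32(0.1)
print(Fraction(stored_tenth) == Fraction(1, 10))  # False

# Absorption: at 2^25 the step is 4, so adding 1.0 rounds back to the same value.
big = f32(2.0 ** 25)
absorbed = f32(big + 1.0)
print(absorbed == big)  # True

# Near 1.0 the step is far smaller than 1.0, so the addition is exact.
print(f32(f32(1.0) + 1.0))  # 2.0
```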