O.S.K. BLOG: Arithmetics For Computers (Number System & Operations)

In floating point arithmetic, addition and subtraction are more complex than multiplication and division. This is because of the need for alignment. There are four basic phases of the algorithm for addition and subtraction:

1. Check for zeros

2. Align the significands.

3. Add or subtract the significands.

4. Normalize the result.

Example:

X = 0.3 x 10² = 30

Y = 0.2 x 10³ = 200

X + Y = (0.3 x 10^(2-3) + 0.2) x 10³ = 0.23 x 10³ = 230

X - Y = (0.3 x 10^(2-3) - 0.2) x 10³ = (-0.17) x 10³ = -170

X x Y = (0.3 x 0.2) x 10^(2+3) = 0.06 x 10⁵ = 6000

X ÷ Y = (0.3 ÷ 0.2) x 10^(2-3) = 1.5 x 10^-1 = 0.15
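The four results above can be checked with a small Python sketch that works on (significand, exponent) pairs in base 10. It is only an illustration, not a hardware algorithm: the helper names `value`, `fp_add`, `fp_mul` and `fp_div` are made up for this example, `fractions.Fraction` is used so the decimal values stay exact, and the final normalization step is left out.

```python
from fractions import Fraction

def value(sig, exp):
    """Real value of a base-10 (significand, exponent) pair."""
    return sig * Fraction(10) ** exp

def fp_add(x, y, sign=1):
    """Align both operands to the larger exponent, then add.

    Pass sign=-1 to subtract y from x instead.
    """
    (xs, xe), (ys, ye) = x, y
    e = max(xe, ye)
    xs_aligned = xs * Fraction(10) ** (xe - e)
    ys_aligned = ys * Fraction(10) ** (ye - e)
    return (xs_aligned + sign * ys_aligned, e)

def fp_mul(x, y):
    """Multiply the significands and add the exponents (no alignment needed)."""
    return (x[0] * y[0], x[1] + y[1])

def fp_div(x, y):
    """Divide the significands and subtract the exponents."""
    return (x[0] / y[0], x[1] - y[1])

X = (Fraction(3, 10), 2)   # 0.3 x 10^2
Y = (Fraction(2, 10), 3)   # 0.2 x 10^3

print(float(value(*fp_add(X, Y))))           # 230.0
print(float(value(*fp_add(X, Y, sign=-1))))  # -170.0
print(float(value(*fp_mul(X, Y))))           # 6000.0
print(float(value(*fp_div(X, Y))))           # 0.15
```

Only `fp_add` has to shift a significand before operating, which is why addition and subtraction need the extra alignment phase.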

Floating Point

We need a way to represent:

- Numbers with fractions, e.g. 3.142

- Very small numbers, e.g. 0.0000001

- Very large numbers, e.g. 3.1428 x 10⁹

Representation:

- sign, exponent, significand: (–1)^sign × significand × 2^exponent

- more bits for the significand give more precision

- more bits for the exponent increase the range

IEEE 754 floating point standard:

- single precision: 8 bit exponent, 23 bit significand

- double precision: 11 bit exponent, 52 bit significand

IEEE 754 floating point standard

The leading “1” bit of the significand is implicit.

The exponent is “biased” to make sorting easier:

- all 0s is the smallest exponent, all 1s is the largest

- bias of 127 for single precision and 1023 for double precision

- summary: (–1)^sign × (1+significand) × 2^{exponent – bias}

Example:

- decimal: -0.75 = -3/4 = -3/2²

- binary: -0.11 = -1.1 x 2^-1

- floating point: biased exponent = -1 + 127 = 126 = 01111110

- IEEE single precision: 10111111010000000000000000000000
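The -0.75 example can be reproduced in Python with the standard `struct` module, which packs a number into IEEE 754 single precision. This is a small sketch: `float32_fields` is a helper name invented for this post, and the rebuild formula only covers normalized numbers.

```python
import struct

def float32_fields(x):
    """Split x (rounded to single precision) into sign, exponent, fraction."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF   # stored (biased) exponent, 8 bits
    fraction = bits & 0x7FFFFF       # 23 stored significand bits
    return sign, exponent, fraction

sign, exponent, fraction = float32_fields(-0.75)
print(sign, exponent, format(fraction, "023b"))
# 1 126 10000000000000000000000

# Rebuild the value: (-1)^sign x (1 + fraction) x 2^(exponent - bias)
rebuilt = (-1) ** sign * (1 + fraction / 2**23) * 2.0 ** (exponent - 127)
print(rebuilt)  # -0.75
```

The implicit leading 1 shows up as the `1 +` term: only the `.1` part of `1.1` is stored in the fraction field.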

IEEE 754 Standard

Representation of floating point numbers in IEEE 754 standard:

Magnitude of numbers that can be represented is in the range:

2^-126 × (1.0) to 2^127 × (2 - 2^-23)

This is approximately:

1.18 x 10^-38 to 3.40 x 10³⁸
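Both bounds are easy to evaluate in Python, since Python floats have more range than single precision. The sketch below also round-trips the two values through `struct` to confirm that they fit in single precision exactly; `roundtrip32` is a helper name made up for this check.

```python
import struct

smallest_normal = 2.0 ** -126
largest = (2 - 2.0 ** -23) * 2.0 ** 127

print(smallest_normal)  # about 1.1755e-38
print(largest)          # about 3.4028e+38

def roundtrip32(x):
    """Convert x to single precision and back to a Python float."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

# Both bounds survive the conversion unchanged.
print(roundtrip32(smallest_normal) == smallest_normal)  # True
print(roundtrip32(largest) == largest)                  # True
```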

Floating Point Complexities

In addition to overflow we can have “underflow”.

Accuracy can be a big problem:

- IEEE 754 keeps two extra bits, guard and round

- four rounding modes

- a positive number divided by zero yields “infinity”

- zero divided by zero yields “not a number”

- other complexities
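Overflow and underflow can be observed with ordinary Python floats, which are IEEE 754 double precision on common platforms (that is an assumption about the platform, not something Python guarantees). One Python-specific detail: dividing a float by zero raises `ZeroDivisionError` instead of returning infinity or NaN, so the sketch below catches that case.

```python
import math
import sys

# Overflow: the result is too large for a double and becomes +infinity.
overflowed = sys.float_info.max * 10
print(overflowed)   # inf

# Underflow: the result is too small for a double and rounds to zero.
tiny = 5e-324       # smallest positive (subnormal) double
underflowed = tiny / 2
print(underflowed)  # 0.0

# Python traps float division by zero rather than returning infinity.
division_trapped = False
try:
    1.0 / 0.0
except ZeroDivisionError:
    division_trapped = True
print(division_trapped)  # True
```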

Floating Point Addition Example

e.g.: Add 9.999 x 10¹ and 1.610 x 10^-1, assuming 4 decimal digits

1. Align the decimal point of the number with the smaller exponent

1.610 × 10^-1= 0.161 × 10⁰ = 0.0161 × 10¹

Shift smaller number to right

2. Add the significands

9.999 + 0.016 = 10.015 → SUM = 10.015 × 10¹

NOTE: One digit of precision is lost during shifting. Also, the sum is not normalized.

3. Shift the sum to put it in normalized form: 1.0015 × 10²

4. Since the significand has only 4 digits, we need to round the sum:

SUM = 1.002 × 10²

NOTE: Normalization may be needed again after rounding,

e.g., rounding 9.9999 gives 10.000
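The four steps can be sketched in Python with plain integers. Here a number is a pair (digits, exponent) in which the 4-digit integer stands for a d.ddd significand, so 9.999 x 10¹ is `(9999, 1)`. The function name `add4` and this representation are choices made for the example; it handles positive operands only and rounds half up.

```python
P = 4  # number of significand digits (d.ddd)

def add4(a, b):
    """Add two positive decimal floats given as (digits, exponent) pairs."""
    (am, ae), (bm, be) = a, b
    if ae < be:
        # Make `a` the operand with the larger exponent.
        (am, ae), (bm, be) = (bm, be), (am, ae)

    # Step 1: align by shifting the smaller number right.
    # Digits shifted out are lost (no guard digits in this sketch).
    bm //= 10 ** (ae - be)

    # Step 2: add the significands.
    m, e = am + bm, ae

    # Steps 3 and 4: normalize and round half up. The loop condition is
    # checked again after rounding in case the carry adds a new digit.
    while m >= 10 ** P:
        m, dropped = divmod(m, 10)
        if dropped >= 5:
            m += 1
        e += 1
    return m, e

print(add4((9999, 1), (1610, -1)))  # (1002, 2), i.e. 1.002 x 10^2
```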

Accurate Arithmetic – Guard & Round bits

IEEE 754 standard specifies the use of 2 extra bits on the right during intermediate calculations: the guard bit and the round bit.

Example: Add 2.56 × 10⁰ and 2.34 × 10², assuming 3 significant digits and without guard and round bits

2.56 × 10⁰ = 0.0256 × 10²

(2.34 + 0.02) × 10² = 2.36 × 10²

With guard and round bits:

(2.34 + 0.0256) × 10² = 2.3656 × 10²

ROUND → 2.37 × 10²
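The effect of the two extra digits can be reproduced with integer arithmetic in Python. In this sketch the 3-digit significands are stored as integers (2.34 becomes 234), `add_aligned` is a name invented for the example, and the final rounding is round half up; these are illustrative assumptions, and decimal digits stand in for the guard and round bits of the binary case.

```python
def add_aligned(big, small, shift, extra_digits):
    """Add two 3-digit significands after shifting `small` right.

    big, small   -- significands as integers (d.dd scaled by 100)
    shift        -- number of decimal places `small` is shifted right
    extra_digits -- guard/round digits kept during the addition
    Returns the 3-digit sum (scaled by 100) after rounding half up.
    """
    scale = 10 ** extra_digits
    # Anything shifted below the kept digits is truncated.
    small_shifted = small * scale // 10 ** shift
    total = big * scale + small_shifted
    # Round the extra digits away.
    return (total + scale // 2) // scale

# 2.34 x 10^2 + 2.56 x 10^0: the smaller operand shifts right by 2 places.
print(add_aligned(234, 256, shift=2, extra_digits=0))  # 236 -> 2.36 x 10^2
print(add_aligned(234, 256, shift=2, extra_digits=2))  # 237 -> 2.37 x 10^2
```

The exact sum is 2.3656 x 10², so the result computed with the extra digits is the correctly rounded one.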

Infinity arithmetic

Infinity arithmetic is treated as the limiting case of real arithmetic, with the infinity values given the following interpretation:

-∞ < (every finite number) < +∞

With the exception of the special cases discussed subsequently, any arithmetic operation involving infinity yields the obvious result.

For Example:

5 + (+∞) = +∞        5 ÷ (+∞) = +0

5 - (+∞) = -∞        (+∞) + (+∞) = +∞

5 + (-∞) = -∞        (-∞) + (-∞) = -∞

5 - (-∞) = +∞        (-∞) - (+∞) = -∞

5 x (+∞) = +∞        (+∞) - (-∞) = +∞
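These rules can be tried out in Python, where `math.inf` is the floating point infinity. The sketch below evaluates a few of the cases and then one of the special cases, (+∞) - (+∞), which has no obvious limit and gives a NaN.

```python
import math

inf = math.inf

results = {
    "5 + (+inf)": 5 + inf,
    "5 - (+inf)": 5 - inf,
    "5 x (+inf)": 5 * inf,
    "5 / (+inf)": 5 / inf,
    "(+inf) + (+inf)": inf + inf,
    "(-inf) + (-inf)": -inf + -inf,
    "(+inf) - (-inf)": inf - (-inf),
}
for expression, result in results.items():
    print(f"{expression} = {result}")

# Every finite number lies strictly between the two infinities.
print(-inf < -1e308 < 1e308 < inf)  # True

# Special case: infinity minus infinity is not a number.
special = inf - inf
print(math.isnan(special))          # True
```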

Quiet And Signaling NaNs

Table Operations that Produce a Quiet NaN

Yu Hong Sheng
B031210099


Sunday, 21 October 2012

Arithmetics For Computers (Number System & Operations) - Floating Point Arithmetic
