Sunday, 21 October 2012

Arithmetics For Computers (Number System & Operations) - Floating Point Arithmetic


In floating-point arithmetic, addition and subtraction are more complex than multiplication and division, because the exponents of the two operands must be aligned before their significands can be combined. The algorithm for addition and subtraction has four basic phases:
1. Check for zeros.
2. Align the significands.
3. Add or subtract the significands.
4. Normalize the result.
Example:
X = 0.3 × 10^2 = 30
Y = 0.2 × 10^3 = 200
X + Y = (0.3 × 10^(2-3) + 0.2) × 10^3 = 0.23 × 10^3 = 230
X - Y = (0.3 × 10^(2-3) - 0.2) × 10^3 = (-0.17) × 10^3 = -170
X × Y = (0.3 × 0.2) × 10^(2+3) = 0.06 × 10^5 = 6000
X ÷ Y = (0.3 ÷ 0.2) × 10^(2-3) = 1.5 × 10^-1 = 0.15
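
The four operations in this example can be reproduced with a minimal Python sketch. It assumes each number is held as a (significand, exponent) pair in base 10 and uses exact fractions so that no rounding hides the arithmetic; the helper name value and the variable names are made up for illustration.

```python
from fractions import Fraction

def value(sig, exp):
    """Exact value of sig x 10^exp (hypothetical helper for this sketch)."""
    return sig * Fraction(10) ** exp

# X = 0.3 x 10^2 and Y = 0.2 x 10^3 as (significand, exponent) pairs.
xs, xe = Fraction(3, 10), 2
ys, ye = Fraction(2, 10), 3

# Addition and subtraction: align X to Y's exponent, then combine significands.
xs_aligned = xs * Fraction(10) ** (xe - ye)   # 0.3 x 10^(2-3) = 0.03
total = value(xs_aligned + ys, ye)
diff = value(xs_aligned - ys, ye)

# Multiplication and division: no alignment, the exponents add or subtract.
prod = value(xs * ys, xe + ye)
quot = value(xs / ys, xe - ye)

print(total, diff, prod, quot)
```

Only the addition and subtraction need the alignment step, which is why they are the more complex pair.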

Floating Point
• We need a way to represent
- Numbers with fractions, e.g.: 3.142
- Very small numbers, e.g.: 0.0000001
- Very large numbers, e.g.: 3.1428 × 10^9

• Representation:
- sign, exponent, significand: (-1)^sign × significand × 2^exponent
- more bits for the significand gives more accuracy
- more bits for the exponent increases range
• IEEE 754 floating point standard
- single precision: 8 bit exponent, 23 bit significand
- double precision: 11 bit exponent, 52 bit significand

IEEE 754 floating point standard

• The leading “1” bit of the significand is implicit
• The exponent is “biased” to make sorting easier
- all 0s is the smallest exponent, all 1s is the largest
- bias of 127 for single precision and 1023 for double precision
- summary: (-1)^sign × (1 + significand) × 2^(exponent - bias)
• Example:
- decimal: -0.75 = -3/4 = -3/2^2
- binary: -0.11 = -1.1 × 2^-1
- floating point: exponent = -1 + 127 = 126 = 01111110
- IEEE single precision: 10111111010000000000000000000000
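
The encoding of -0.75 can be checked with a short Python sketch that relies on the standard struct module to produce the single-precision bit pattern. The function name float32_bits is an assumption made for this illustration, not part of any library.

```python
import struct

def float32_bits(x):
    """Return the IEEE 754 single-precision pattern of x as a 32-character bit string."""
    (n,) = struct.unpack(">I", struct.pack(">f", x))
    return format(n, "032b")

bits = float32_bits(-0.75)
sign, exponent, fraction = bits[0], bits[1:9], bits[9:]
print(sign, exponent, fraction)

# Rebuild the value from the three fields:
# (-1)^sign x (1 + fraction) x 2^(exponent - bias), with bias = 127.
rebuilt = (
    (-1) ** int(sign)
    * (1 + int(fraction, 2) / 2**23)
    * 2.0 ** (int(exponent, 2) - 127)
)
print(rebuilt)
```

Splitting the string at positions 1 and 9 gives the 1-bit sign, the 8-bit biased exponent, and the 23-bit fraction.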





IEEE 754 Standard
Representation of floating point numbers in IEEE 754 standard:

The magnitude of normalized single-precision numbers lies in the range:
2^-126 × 1.0 to 2^127 × (2 - 2^-23)
This is approximately:
1.18 × 10^-38 to 3.40 × 10^38
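
As a quick numerical check, the two bounds can be evaluated directly in Python. This is only a sketch: the values are computed in Python's double-precision floats, which represent both bounds exactly.

```python
# Smallest normalized magnitude: significand 1.0, exponent -126.
smallest_normal = 1.0 * 2.0 ** -126

# Largest magnitude: significand 2 - 2^-23 (all fraction bits set), exponent 127.
largest = (2 - 2.0 ** -23) * 2.0 ** 127

print(f"{smallest_normal:.3e}")
print(f"{largest:.3e}")
```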
Floating Point Complexities
• In addition to overflow we can have “underflow”
• Accuracy can be a big problem
- IEEE 754 keeps two extra bits, guard and round
- four rounding modes
- a positive number divided by zero yields “infinity”
- zero divided by zero yields “not a number” (NaN)
- other complexities
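
Overflow and underflow can be observed in a minimal Python sketch. It uses Python's built-in float, which is IEEE 754 double precision on common platforms, so the limits differ from the single-precision figures but the behaviour is the same. Note that Python itself raises ZeroDivisionError for a float division by zero instead of returning infinity or NaN, so that case is not shown here.

```python
import math
import sys

# Overflow: the result is too large to represent and becomes +infinity.
overflowed = sys.float_info.max * 2

# Underflow: the result is too close to zero to represent and becomes 0.0.
# 5e-324 is the smallest positive (subnormal) double.
underflowed = 5e-324 / 2

print(overflowed, underflowed)
print(math.isinf(overflowed), underflowed == 0.0)
```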
Floating Point Addition Example
e.g.: Add 9.999 × 10^1 and 1.610 × 10^-1, assuming 4 decimal digits
1. Align the decimal point of the number with the smaller exponent
1.610 × 10^-1 = 0.161 × 10^0 = 0.0161 × 10^1
Shift the smaller number to the right
2. Add the significands
9.999 + 0.016 = 10.015, so SUM = 10.015 × 10^1
NOTE: One digit of precision is lost during shifting. Also, the sum is not normalized
3. Shift the sum to put it in normalized form: 1.0015 × 10^2
4. Since the significand only has 4 digits, we need to round the sum
SUM = 1.002 × 10^2
NOTE: normalization may be needed again after rounding,
e.g. rounding 9.9999 gives 10.000
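
The four steps above can be sketched in Python with the standard decimal module. This is an illustration under stated assumptions, not a full implementation: it handles only positive, normalized, non-zero operands, it truncates the digits shifted out during alignment as the example does, and the names fp_add and round_significand are hypothetical.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

def round_significand(sig, exp, digits=4):
    """Round sig to `digits` significant digits, renormalizing if rounding carries."""
    step = Decimal(1).scaleb(-(digits - 1))       # 0.001 for 4 digits
    sig = sig.quantize(step, rounding=ROUND_HALF_UP)
    if sig >= 10:                                 # e.g. 9.9999 rounds to 10.000
        sig, exp = sig.scaleb(-1).quantize(step), exp + 1
    return sig, exp

def fp_add(sig_a, exp_a, sig_b, exp_b, digits=4):
    """Add sig_a x 10^exp_a and sig_b x 10^exp_b (positive, normalized operands)."""
    step = Decimal(1).scaleb(-(digits - 1))
    # 1. Align: shift the number with the smaller exponent to the right.
    if exp_a < exp_b:
        sig_a, exp_a, sig_b, exp_b = sig_b, exp_b, sig_a, exp_a
    shifted = sig_b.scaleb(exp_b - exp_a).quantize(step, rounding=ROUND_DOWN)
    # 2. Add the significands.
    total, exp = sig_a + shifted, exp_a
    # 3. Normalize the sum.
    while total >= 10:
        total, exp = total.scaleb(-1), exp + 1
    # 4. Round to the available digits.
    return round_significand(total, exp, digits)

result = fp_add(Decimal("9.999"), 1, Decimal("1.610"), -1)
carried = round_significand(Decimal("9.9999"), 0)
print(result, carried)
```

The second call shows the case from the final note: rounding 9.9999 carries into a new digit, so the significand is shifted once more and the exponent is incremented.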
Accurate Arithmetic – Guard & Round bits
• The IEEE 754 standard specifies the use of 2 extra bits on the right during intermediate calculations: the guard bit and the round bit
• Example: Add 2.56 × 10^0 and 2.34 × 10^2, assuming 3 significant digits, without guard and round bits
2.56 × 10^0 = 0.0256 × 10^2
2.34 + 0.02 = 2.36, giving 2.36 × 10^2
• With guard and round bits
2.34 + 0.0256 = 2.3656, giving 2.3656 × 10^2
ROUND: 2.37 × 10^2
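
The effect of the two extra positions can be reproduced with a small Python sketch using the decimal module. It assumes that digits beyond the kept positions are simply truncated during alignment and that the final result is rounded half up; the function name add_aligned is made up for illustration.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

def add_aligned(big, small_aligned, kept_decimals, result_decimals=2):
    """Truncate the aligned smaller operand to kept_decimals places, add, then round."""
    keep = Decimal(1).scaleb(-kept_decimals)
    kept = small_aligned.quantize(keep, rounding=ROUND_DOWN)
    final = Decimal(1).scaleb(-result_decimals)
    return (big + kept).quantize(final, rounding=ROUND_HALF_UP)

big = Decimal("2.34")       # significand of 2.34 x 10^2
small = Decimal("0.0256")   # 2.56 x 10^0 aligned to the exponent 2

without_guard = add_aligned(big, small, kept_decimals=2)   # adds 0.02
with_guard = add_aligned(big, small, kept_decimals=4)      # adds 0.0256

print(without_guard, with_guard)
```

Keeping the two extra positions changes the last digit of the rounded result, which is the whole point of carrying them through the intermediate calculation.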
Infinity arithmetic
Infinity arithmetic is treated as the limiting case of real arithmetic, with the infinity values given the following interpretation:

-∞ < (every finite number) < +∞

With the exception of the special cases discussed subsequently, any arithmetic operation involving infinity yields the obvious result.

For Example:
5 + (+∞) = +∞                  5 ÷ (+∞) = +0
5 - (+∞) = -∞                  (+∞) + (+∞) = +∞
5 + (-∞) = -∞                  (-∞) + (-∞) = -∞
5 - (-∞) = +∞                  (-∞) - (+∞) = -∞
5 × (+∞) = +∞                  (+∞) - (-∞) = +∞
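
Several of these identities can be checked directly in Python, whose float type follows IEEE 754 infinity arithmetic for these operations. The list name checks is an assumption for this sketch.

```python
import math

inf = math.inf

# Each pair is (expression, value Python computes for it).
checks = [
    ("5 + (+inf)", 5 + inf),
    ("5 - (+inf)", 5 - inf),
    ("5 + (-inf)", 5 + (-inf)),
    ("5 - (-inf)", 5 - (-inf)),
    ("5 * (+inf)", 5 * inf),
    ("5 / (+inf)", 5 / inf),
    ("(+inf) + (+inf)", inf + inf),
    ("(-inf) - (+inf)", -inf - inf),
    ("(+inf) - (-inf)", inf - (-inf)),
]
for label, value in checks:
    print(f"{label:>16} = {value}")

# The ordering -inf < (every finite number) < +inf also holds.
ordered = -inf < -1e308 < 1e308 < inf
print(ordered)
```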

Quiet And Signaling NaNs
Table Operations that Produce a Quiet NaN





      Yu  Hong Sheng 
B031210099  
