Gems | nonoptimalrobot

These days, the old standby of approximating trig functions with look-up tables is a questionable optimization. If your inputs are not spatially coherent (successive calls don't land on nearby table entries), or you are only making a single call, you are likely to pay for a cache miss. Fetching from memory, or even pulling a line from L2 into L1, is slow, while processors are fast, really fast. With this in mind, smoothly varying functions are usually better served by curve fitting than by look-up tables.
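For contrast, here is a minimal sketch of the look-up-table approach being argued against: a hypothetical 257-entry acos table with linear interpolation. `AcosTable`, `InvCosTable` and the table size are illustrative choices, not from the original. The table occupies about 1 KB (16 to 17 cache lines, assuming 64-byte lines), and every call reads one or two entries at an input-dependent address, whereas a cubic fit needs only four constants.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical look-up-table acos, shown only to contrast with a curve fit.
constexpr int kAcosTableSize = 256;

struct AcosTable
{
    // 257 samples of acos over [-1, 1]; the extra entry lets x == 1 interpolate.
    std::array<float, kAcosTableSize + 1> values;

    AcosTable()
    {
        for (int i = 0; i <= kAcosTableSize; ++i)
            values[i] = static_cast<float>(std::acos(-1.0 + 2.0 * i / kAcosTableSize));
    }
};

// Linear interpolation between the two entries bracketing x. The entries read
// depend on x, so scattered inputs touch scattered cache lines.
inline float InvCosTable(float x)
{
    static const AcosTable table;     // built once, on first use
    assert(x >= -1.0f && x <= 1.0f);  // the index math needs in-range input

    const float t = (x + 1.0f) * 0.5f * kAcosTableSize;
    int i = static_cast<int>(t);
    if (i >= kAcosTableSize) i = kAcosTableSize - 1;  // x == 1 uses the last segment
    const float frac = t - static_cast<float>(i);
    return table.values[i] + frac * (table.values[i + 1] - table.values[i]);
}
```

The accuracy of such a table is far better than the cubics below; the argument is purely about memory traffic versus arithmetic.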

The following functions hardly deserve a write-up, so I’ll simply say: for the computationally lean and numerically sloppy…

//--------------------------------------------------------------
// InvCosFast(...)
//
// a_fX         cos(theta)
//
// Notes:       Returns theta = acos(a_fX) by evaluating the cubic
//              polynomial that interpolates these control points:
//
//              y0 = acos( x0 = -3/3 )
//              y1 = acos( x1 = -1/3 )
//              y2 = acos( x2 =  1/3 )
//              y3 = acos( x3 =  3/3 )
//
//              * Average absolute error is about 2% of pi (~0.06 rad).
//              * Maximum error is about 6% of pi (~0.19 rad) at
//                a_fX = +/-0.9182.
//              * Speed up over calling acos() with a randomly
//                distributed input is around 4x.
//--------------------------------------------------------------

inline float InvCosFast(float a_fX)
{
    static const float a = -6.201964617e-001f;
    static const float b =  2.825264289e-008f;
    static const float c = -9.505999088e-001f;
    static const float d =  1.570796371e+000f;

    // Evaluate f(x) = a*x*x*x + b*x*x + c*x + d. Since acos(x) - pi/2 is
    // odd, b is zero up to float rounding and d is pi/2.

    float fParam(a_fX);
    float fResult(d);

    fResult += c * fParam; fParam *= a_fX;
    fResult += b * fParam; fParam *= a_fX;
    fResult += a * fParam;

    return fResult;
} //------------------------------------------------------------
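The constants in InvCosFast() can be reproduced by solving for the unique cubic through the four control points listed in its comment. A sketch, assuming double precision and plain Gauss-Jordan elimination on the 4x4 Vandermonde system; `Cubic` and `FitCubicAtControlPoints` are hypothetical names, and this is not necessarily how the original constants were produced.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <utility>

// Coefficients {a, b, c, d} of f(x) = a*x^3 + b*x^2 + c*x + d.
using Cubic = std::array<double, 4>;

// Returns the unique cubic through (x_i, fn(x_i)) at the control-point
// abscissas used in the listing: x_i = -1, -1/3, 1/3, 1. Solves the
// Vandermonde system by Gauss-Jordan elimination with partial pivoting.
template <typename Fn>
Cubic FitCubicAtControlPoints(Fn fn)
{
    const double xs[4] = {-1.0, -1.0 / 3.0, 1.0 / 3.0, 1.0};

    double m[4][5];  // augmented rows: [x^3, x^2, x, 1 | fn(x)]
    for (int i = 0; i < 4; ++i) {
        const double x = xs[i];
        m[i][0] = x * x * x;
        m[i][1] = x * x;
        m[i][2] = x;
        m[i][3] = 1.0;
        m[i][4] = fn(x);
    }

    for (int col = 0; col < 4; ++col) {
        // Partial pivoting: move the largest remaining entry onto the diagonal.
        int pivot = col;
        for (int r = col + 1; r < 4; ++r)
            if (std::fabs(m[r][col]) > std::fabs(m[pivot][col])) pivot = r;
        for (int k = 0; k < 5; ++k) std::swap(m[col][k], m[pivot][k]);
        assert(m[col][col] != 0.0);  // distinct abscissas => non-singular

        // Eliminate this column from every other row.
        for (int r = 0; r < 4; ++r) {
            if (r == col) continue;
            const double f = m[r][col] / m[col][col];
            for (int k = col; k < 5; ++k) m[r][k] -= f * m[col][k];
        }
    }

    Cubic coeffs{};
    for (int i = 0; i < 4; ++i) coeffs[i] = m[i][4] / m[i][i];
    return coeffs;
}
```

Called as `FitCubicAtControlPoints([](double x) { return std::acos(x); })`, this gives b = 0 up to double rounding and a, c, d that agree with the listed constants to about 1e-7, i.e. at the level of float precision. Passing `std::asin` instead yields the InvSinFast() constants.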

//--------------------------------------------------------------
// InvSinFast(...)
//
// a_fX         sin(theta)
//
// Notes:       Returns theta = asin(a_fX) by evaluating the cubic
//              polynomial that interpolates these control points:
//
//              y0 = asin( x0 = -3/3 )
//              y1 = asin( x1 = -1/3 )
//              y2 = asin( x2 =  1/3 )
//              y3 = asin( x3 =  3/3 )
//
//              * Average absolute error is about 2% of pi (~0.06 rad).
//              * Maximum error is about 6% of pi (~0.19 rad) at
//                a_fX = +/-0.9182.
//              * Speed up over calling asin() with a randomly
//                distributed input is around 4x.
//--------------------------------------------------------------

inline float InvSinFast(float a_fX)
{
    static const float a =  6.201964617e-001f;
    static const float b =  5.274972015e-009f;
    static const float c =  9.506000280e-001f;
    static const float d = -5.274971571e-009f;

    // Evaluate f(x) = a*x*x*x + b*x*x + c*x + d. Since asin(x) is odd,
    // b and d are zero up to float rounding.

    float fParam(a_fX);
    float fResult(d);

    fResult += c * fParam; fParam *= a_fX;
    fResult += b * fParam; fParam *= a_fX;
    fResult += a * fParam;

    return fResult;
} //------------------------------------------------------------
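Because asin(x) = pi/2 - acos(x), the two listings are the same odd cubic up to sign and an offset, which is why b and d are essentially zero. The sketch below drops those terms, evaluates x*(c + a*x^2) in Horner form, and adds a small harness for re-checking the error figures quoted in the comments. `InvSinOdd`, `InvCosOdd`, `ErrorStats`, `MeasureError` and the default sample count are hypothetical; the coefficients are copied from the listings, and evenly spaced samples stand in for "randomly distributed input".

```cpp
#include <cassert>
#include <cmath>

// Hypothetical odd-symmetric form of InvSinFast: same a and c as the listing,
// with the ~1e-9 b and d terms dropped, evaluated in Horner form.
inline float InvSinOdd(float x)
{
    const float a = 6.201964617e-001f;
    const float c = 9.506000280e-001f;
    return x * (c + a * x * x);
}

// acos(x) = pi/2 - asin(x), so the acos fit is pi/2 minus the asin fit.
inline float InvCosOdd(float x)
{
    return 1.570796371e+000f - InvSinOdd(x);
}

struct ErrorStats
{
    double mean;    // mean absolute error in radians
    double max;     // maximum absolute error in radians
    double argMax;  // input at which the maximum was first seen
};

// Compares approx(x) against exact(x) at `samples` evenly spaced points in
// [-1, 1] (endpoints included, rounded to float).
template <typename Approx, typename Exact>
ErrorStats MeasureError(Approx approx, Exact exact, int samples = 200001)
{
    assert(samples >= 2);
    ErrorStats s{0.0, 0.0, 0.0};
    for (int i = 0; i < samples; ++i) {
        const float x = static_cast<float>(-1.0 + 2.0 * i / (samples - 1));
        const double err = std::fabs(static_cast<double>(approx(x)) -
                                     exact(static_cast<double>(x)));
        s.mean += err;
        if (err > s.max) { s.max = err; s.argMax = x; }
    }
    s.mean /= samples;
    return s;
}
```

For example, `MeasureError(InvCosOdd, [](double x) { return std::acos(x); })` should come out close to the quoted figures: a mean near 2% of pi and a maximum near 6% of pi at |x| of about 0.918. The speed-up claim needs a separate benchmark.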

Note that, unlike acos() and asin() from the C++ standard library, InvCosFast() and InvSinFast() need no range check on their input: a value that lands infinitesimally outside the interval [-1, 1], for example through floating-point rounding, yields a finite result instead of NaN.
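A small sketch of the failure mode in question: a float one ulp above 1, the kind of value a dot product of two normalized vectors can produce after rounding, makes std::acos return NaN, while the acos cubic returns a finite value near zero. `InvCosCubic` (the InvCosFast() cubic in Horner form with the near-zero b dropped), `JustAboveOne` and `AcosClamped` are hypothetical helpers.

```cpp
#include <cassert>
#include <cmath>

// The acos cubic from the listing in Horner form (b ~ 0 dropped), repeated
// here so the snippet compiles on its own.
inline float InvCosCubic(float x)
{
    return 1.570796371e+000f + x * (-9.505999088e-001f - 6.201964617e-001f * x * x);
}

// Smallest float greater than 1.0f, i.e. 1 + 2^-23.
inline float JustAboveOne()
{
    return std::nextafter(1.0f, 2.0f);
}

// The clamp std::acos typically needs on such input. std::fmin/std::fmax
// return the non-NaN argument, so a NaN input would silently become -1
// (and the result pi); reject that explicitly in debug builds.
inline float AcosClamped(float x)
{
    assert(!std::isnan(x) && "clamping would hide a NaN input");
    return std::acos(std::fmin(1.0f, std::fmax(-1.0f, x)));
}
```

With `x = JustAboveOne()`, `std::acos(x)` is NaN, `AcosClamped(x)` is 0, and `InvCosCubic(x)` is a finite value within about 1e-6 of 0 without any clamp.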


Musings by a Graphics Programmer


Simple Inverse Trig Functions