Brain float

  • There is a trend in DL towards using FP16 instead of FP32, because lower-precision computation does not seem to be critical for neural networks: the extra precision gains little, while it is slower, takes more memory, and slows down communication.

  • This format is a truncated (16-bit) version of the 32-bit IEEE 754 single-precision floating-point format (binary32) with the intent of accelerating machine learning and near-sensor computing.

  • It preserves the approximate dynamic range of 32-bit floating-point numbers by retaining the 8 exponent bits, but offers only 8 bits of significand precision (7 explicitly stored) rather than the 24-bit significand of the binary32 format.

  • Bfloat16 is used to reduce the storage requirements and increase the calculation speed of machine learning algorithms.

  • The bfloat16 format, being a truncated IEEE 754 FP32, allows fast conversion to and from IEEE 754 FP32. When converting to bfloat16, the exponent bits are preserved, while the significand field can simply be truncated; a conversion sketch follows the advantages list below.

    Advantages of bf16

    Hence, the advantages of using bfloat16 are:

    • Reduced hardware multiplier size compared to FP16 and FP32.

    • Accelerated matrix multiplication performance.

    • Reduced memory requirements.
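
A minimal sketch of the truncation-based conversion described above, using only the Python standard library (the helper names float32_to_bfloat16_bits and bfloat16_bits_to_float32 are illustrative, not from any particular framework):

```python
import struct

def float32_to_bfloat16_bits(x: float) -> int:
    """Truncate an IEEE 754 binary32 value to its upper 16 bits (bfloat16)."""
    # Reinterpret the float32 bit pattern as a 32-bit unsigned integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    # Keep the sign bit, all 8 exponent bits, and the top 7 significand bits;
    # the lower 16 significand bits are simply dropped (truncation).
    return bits >> 16

def bfloat16_bits_to_float32(bits: int) -> float:
    """Widen bfloat16 bits back to float32 by appending 16 zero significand bits."""
    (x,) = struct.unpack(">f", struct.pack(">I", bits << 16))
    return x

# Example: 3.140625 has a short significand, so the round trip is exact;
# a value like 3.14159 would lose its lower significand bits instead.
bf16 = float32_to_bfloat16_bits(3.140625)
print(hex(bf16))                       # 0x4049
print(bfloat16_bits_to_float32(bf16))  # 3.140625
```

Because the exponent field is untouched, both directions of the conversion are just 16-bit shifts, which is why hardware and frameworks can move between FP32 and bfloat16 so cheaply.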