MAX changelog
The MAX platform is a unified set of tools and libraries that unlock performance, programmability, and portability for your AI inference pipeline. It comprises several products, including MAX Engine, MAX Serving, and the Mojo programming language.
This page describes all the changes in each version of the MAX platform.
To learn more about the platform, read What is MAX.
If you already have MAX, see how to update. If you don't have it yet, see the install guide.
v24.4 (2024-06-07)
🔥 Legendary
- MAX is now available on macOS! Install now.
- New quantization APIs for MAX Graph. You can now build high-performance graphs in Mojo that use the latest quantization techniques, enabling even faster performance and more system compatibility for large models.
  Learn more in the guide to quantize your graph weights.
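To give a rough sense of what block-wise 4-bit quantization involves, here is a simplified sketch in plain Python. It is not the MAX Graph quantization API and does not reproduce any specific encoding exactly: the block size of 32 is borrowed from GGML-style formats such as Q4_0, and the function names quantize_block, dequantize_block, and pack_nibbles are hypothetical, chosen only for this illustration.

```python
BLOCK_SIZE = 32  # values per block; assumed here, following GGML-style 4-bit formats


def quantize_block(values):
    """Map one block of floats to 4-bit codes in [1, 15] plus one shared scale."""
    if len(values) != BLOCK_SIZE:
        raise ValueError("expected exactly BLOCK_SIZE values")
    max_abs = max(abs(v) for v in values)
    # One scale per block; fall back to 1.0 for an all-zero block.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    # Round to the nearest integer in [-7, 7], then shift by 8 to get an unsigned code.
    codes = [max(-7, min(7, round(v / scale))) + 8 for v in values]
    return scale, codes


def dequantize_block(scale, codes):
    """Recover approximate floats from the codes and the block scale."""
    return [(c - 8) * scale for c in codes]


def pack_nibbles(codes):
    """Pack adjacent pairs of 4-bit codes into bytes (real formats may order nibbles differently)."""
    return bytes(codes[i] | (codes[i + 1] << 4) for i in range(0, len(codes), 2))
```

Each block of 32 weights is stored as 16 bytes of packed codes plus one scale, instead of 128 bytes as float32, and the reconstruction error per weight is at most half the block scale. Production encodings refine this basic idea, for example by storing the scale in 16 bits or by grouping blocks under additional shared scales.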
⭐️ New
MAX Mojo APIs
- Added AI pipeline examples in the max repo, with Mojo implementations for common transformer layers, including quantization support.
  - New Llama3 pipeline built with MAX Graph.
  - New Replit Code pipeline built with MAX Graph.
  - New TinyStories pipeline (based on TinyLlama) that offers a simple demo of the MAX Graph quantization API.
- Added Mojo API inference example with the TorchScript BERT model.
- Added max.graph.checkpoint package to save and load model weights. All weights are stored in a TensorDict. You can save and load a TensorDict to disk with save() and load() functions.
- Added MAX Graph quantization APIs:
  - Added quantization encodings BFloat16Encoding, Q4_0Encoding, Q4_KEncoding, and Q6_KEncoding.
  - Added the QuantizationEncoding trait so you can build custom quantization encodings.
  - Added Graph.quantize() to create a quantized tensor node.
  - Added qmatmul() to perform matrix multiplication with a float32 and a quantized matrix.
- Added some MAX Graph ops:
- Added a layer() context manager and current_layer() function to aid in debugging during graph construction. For example:

      with graph.layer("foo"):
          with graph.layer("bar"):
              print(graph.current_layer())  # prints "foo.bar"
              x = graph.constant[DType.int64](1)
              graph.output(x)

  This adds a path foo.bar to the added nodes, which is reported in error messages.
- Added the format_system_stack() function to format the stack trace, which we use to print better error messages from error().
- Added TensorMap.keys() to get all the tensor key names.
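The layer() and current_layer() behavior described in the list above can be modeled with a simple stack of names. The following is a toy Python sketch of that idea only, not how MAX Graph implements it: the Graph class, its nodes list, and the constant() method shown here are hypothetical stand-ins written for this illustration.

```python
from contextlib import contextmanager


class Graph:
    """Toy stand-in for a graph builder; not the MAX Graph class."""

    def __init__(self):
        self._layers = []  # stack of currently active layer names
        self.nodes = []    # (layer path, node description) pairs

    @contextmanager
    def layer(self, name):
        """Enter a named scope; nodes created inside it are tagged with its path."""
        self._layers.append(name)
        try:
            yield
        finally:
            self._layers.pop()  # restore the outer scope even if an error is raised

    def current_layer(self):
        """Return the dotted path of the active scopes, e.g. "foo.bar"."""
        return ".".join(self._layers)

    def constant(self, value):
        """Record a node together with the layer path that was active when it was added."""
        node = (self.current_layer(), f"constant({value!r})")
        self.nodes.append(node)
        return node


graph = Graph()
with graph.layer("foo"):
    with graph.layer("bar"):
        inner = graph.constant(1)  # tagged with the nested path
    outer = graph.constant(2)      # tagged with the enclosing path only
```

Because every node carries the path that was active when it was created, an error raised later can name the layer that produced the offending node rather than only the node itself.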
MAX C API
Miscellaneous new APIs:
- M_cloneCompileConfig()
- M_copyAsyncTensorMap()
- M_tensorMapKeys() and M_deleteTensorMapKeys()
- M_setTorchLibraries()
🦋 Changed
MAX Mojo API
- EngineNumpyView.data() and EngineTensorView.data() functions that return a type-erased pointer were renamed to unsafe_ptr().
- TensorMap now conforms to the CollectionElement trait, making it copyable and movable.
- custom_nv() was removed, and its functionality moved into custom() as a function overload, so it can now output a list of tensor symbols.
For all the Mojo language and library changes in this release, see the Mojo changelog.