Roadmap & known issues

Although MAX is not ready for production deployment (the MAX SDK is a preview), it's a significant step forward in our mission to remove complexity in the AI infrastructure stack and accelerate the pace of AI innovation. It provides significant performance improvements for models in multiple formats on a wide range of hardware. That said, we're also aware that it has some rough edges and missing features.

On this page, we'll list all the features we plan to deliver in the near future, and describe all the notable issues that we know of and plan to fix.

MAX priorities

The MAX platform offers significant value for a variety of AI inferencing workloads, but there's a lot more we plan to deliver in the months ahead. Here are some of the things we're working on.

Support for macOS and Windows. Currently, the MAX SDK is available for Ubuntu Linux only (see the requirements).
A commercial license for MAX. Currently, the MAX SDK is for local development only, but a future update will allow you to deploy the same MAX components in a production environment. This will include support for production services such as EC2, EKS, and SageMaker on AWS.
Support for GPUs. At ModCon we announced our partnership with NVIDIA to bring NVIDIA GPUs (A100s, H100s, etc.) to the MAX platform.
Support for additional quantized models. The quantization feature set is broad, so we will start with the most common low-precision data types, such as Bfloat16 and Int8.
Support for more CPUs. Today, we support a wide variety of CPU architectures, but we will continue to add support for additional architectures and newer generations of CPUs, to maximize performance with the latest hardware features.
Support for PyTorch 2.0 torch.compile(). Currently, PyTorch models must be converted to TorchScript or ONNX format for execution in MAX Engine.
And much more that we're not prepared to announce yet... 😁

Also check out the Mojo roadmap & sharp edges.

Known issues

We're aware of the following issues and are working to resolve them.

Error: "cannot allocate memory in static TLS block"

When executing a model with the MAX Engine Python API, you might encounter an error that says "cannot allocate memory in static TLS block". This is caused by a bug related to the order in which Python modules are loaded, and it affects specific targets, including aarch64, on systems with glibc 2.31 or lower, such as Ubuntu 20.04.

If you encounter this issue, try one of the following workarounds:

Re-order the Python import statements so that import max.engine appears first (before torch and transformers).

Or, LD_PRELOAD the shared library that fails to allocate memory in a static TLS block. For example, if you see this error message:

/usr/local/lib/python3.8/dist-packages/max/lib/libgomp-9c79e370.so.1: cannot allocate memory in static TLS block

Then, re-run the command prefixed with:

LD_PRELOAD=/usr/local/lib/python3.8/dist-packages/max/lib/libgomp-9c79e370.so.1

Both workarounds ensure that the MAX Engine library has access to static TLS block memory before it is all used up by the other modules, which may not require static TLS but still use the surplus static TLS as an optimization.
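The LD_PRELOAD workaround can also be applied from a launcher script. The following is a minimal sketch, not part of MAX: the helper name preload_env_from_error is hypothetical, and it assumes the error line has the same shape as the example above (a library path, a colon, then the message).

```python
import os
import re

# Hypothetical helper (not a MAX API): finds the shared-library path in a
# "cannot allocate memory in static TLS block" error line.
_TLS_ERROR = re.compile(
    r"(?P<lib>/\S+?):\s*cannot allocate memory in static TLS block"
)


def preload_env_from_error(error_line, base_env=None):
    """Return a copy of the environment with LD_PRELOAD set to the failing
    library, or None if the line is not a static TLS allocation error."""
    match = _TLS_ERROR.search(error_line)
    if match is None:
        return None
    env = dict(os.environ if base_env is None else base_env)
    lib = match.group("lib")
    # Keep any libraries that were already being preloaded.
    existing = env.get("LD_PRELOAD")
    env["LD_PRELOAD"] = f"{lib}:{existing}" if existing else lib
    return env


# Usage idea (the script name is a placeholder):
#   env = preload_env_from_error(error_line)
#   subprocess.run([sys.executable, "run_model.py"], env=env)
```

The returned dictionary can be passed as the env argument of subprocess.run to re-run the failing command with the library preloaded.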

Glibc 2.32 and newer reserve 128 bytes of surplus static TLS for modules that require it (more detail), so this should not be a problem on systems with glibc >= 2.32, such as Ubuntu 22.04.
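To see whether a system falls in the affected range, you can check the glibc version from Python. This is a rough sketch using only the standard library; the function names are hypothetical, and a glibc older than 2.32 only means the error is possible, since the issue affects specific targets.

```python
import platform
import re


def parse_version(text):
    """Turn a string such as '2.31' into a tuple such as (2, 31)."""
    parts = []
    for piece in text.split("."):
        leading_digits = re.match(r"\d+", piece)
        if leading_digits is None:
            break
        parts.append(int(leading_digits.group()))
    return tuple(parts)


def may_hit_static_tls_error(libc_name, libc_version):
    """Return True when the C library is glibc older than 2.32."""
    if libc_name != "glibc" or not libc_version:
        return False
    return parse_version(libc_version) < (2, 32)


# platform.libc_ver() returns a (name, version) pair, which may be empty
# strings on systems where it cannot detect the C library.
name, version = platform.libc_ver()
print(name, version, may_hit_static_tls_error(name, version))
```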

MAX Serving inputs/outputs must be tensors

MAX Serving does not support model inputs/outputs as a dictionary or any other collection type; you must use tensors.

When converting a PyTorch model to TorchScript format, you must disable dictionary formats in the model configuration. For example:

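# Import the Hugging Face model class used below.
from transformers import AutoModelForSequenceClassification
model = AutoModelForSequenceClassification.from_pretrained(HF_MODEL_NAME)
model.config.return_dict = False  # return tuples of tensors, not a dict-like output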

For more detail, see this example for PyTorch BERT.

MAX Engine can't load multiple model formats

MAX Engine does not allow you to load models with different formats in the same inference session or server instance. For example, you can't load one model from PyTorch and then another one from ONNX. Doing so results in a failed to load error.

Currently, if you want to load a different model format, you must restart the process with MAX Engine or restart MAX Serving (the Triton server).
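Until this is resolved, an application can refuse a second format up front instead of waiting for the load to fail. The sketch below is illustrative only: the class name SingleFormatGuard is hypothetical, and the mapping from file suffix to model format is an assumption you should adjust to match your own files.

```python
from pathlib import Path

# Assumed mapping from file suffix to model format (not defined by MAX).
SUFFIX_TO_FORMAT = {
    ".onnx": "onnx",
    ".torchscript": "torchscript",
    ".pt": "torchscript",
}


class SingleFormatGuard:
    """Remembers the first model format seen in this process and rejects
    any later model that uses a different format."""

    def __init__(self):
        self.loaded_format = None

    def check(self, model_path):
        suffix = Path(model_path).suffix.lower()
        fmt = SUFFIX_TO_FORMAT.get(suffix)
        if fmt is None:
            raise ValueError(f"unrecognized model file suffix: {suffix!r}")
        if self.loaded_format is None:
            self.loaded_format = fmt
        elif fmt != self.loaded_format:
            raise RuntimeError(
                f"this process already loaded a {self.loaded_format} model; "
                f"restart it before loading a {fmt} model"
            )
        return fmt
```

Call check() with each model path before handing it to the engine, so a mismatch surfaces as a clear error that tells the user to restart the process.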

MAX Engine Mojo API `symbol lookup error`

Importing the Python torch packages—or other libraries that transitively import them (such as transformers)—does not interoperate with the max.engine Mojo library. Importing these together might result in a symbol resolution error message that starts with mojo: symbol lookup error.

This is a temporary issue that will be fixed, and it applies only to the Mojo API for MAX Engine—you can safely use these Python packages with the Python API for MAX Engine.

Compiler fails for PyTorch models with custom ops

MAX Engine currently doesn't support PyTorch models that include custom ops.

If you load a PyTorch model (either TorchScript or ONNX format) that includes user-defined custom ops or other ops from domain libraries such as torchvision or torchaudio, the compiler will fail.

If it's a TorchScript model, you'll see a generic message that says, Error compiling model: unable to convert PyTorch model—beware that this is the same message that appears if you try to load a PyTorch model in a format other than TorchScript.

If it's an ONNX model, you might see a segmentation fault message with a stack dump.

We're working on this at both ends: We're creating better error messages that are more helpful, and building support for PyTorch custom ops (and the ability for you to write your own custom ops in Mojo).
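In the meantime, you can look for likely custom ops before loading a TorchScript model. The sketch below works on a list of op names such as torchvision::nms; with PyTorch installed, such names can be collected by calling kind() on the nodes of a loaded module's graph. The function name is hypothetical, and treating the aten and prim namespaces as core ops is an assumption made for illustration, not a list of ops that MAX supports.

```python
# Assumption: ops in these namespaces are core TorchScript ops. Anything
# else is flagged as a possible custom or domain-library op.
CORE_NAMESPACES = frozenset({"aten", "prim"})


def find_non_core_ops(op_kinds):
    """Return the sorted, de-duplicated op names outside the core namespaces."""
    flagged = set()
    for kind in op_kinds:
        namespace, separator, _ = kind.partition("::")
        if not separator or namespace not in CORE_NAMESPACES:
            flagged.add(kind)
    return sorted(flagged)


kinds = ["aten::linear", "prim::Constant", "torchvision::nms", "aten::relu"]
print(find_non_core_ops(kinds))  # prints ['torchvision::nms']
```

An empty result does not guarantee that the model compiles; it only rules out the most obvious custom ops.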

Mojo JIT session error

You might encounter certain code configurations that result in a JIT session error, which happens when the Mojo JIT compiler fails to find a specific symbol. We've seen this happen recently when using the MAX Engine Mojo API. In some cases, you can work around it by rearranging the code and moving some of it to a separate function.

We're making significant changes to the way that Mojo generates code, and this is one of the known JIT issues that we're working on.

MAX Engine custom ops are not available for PyTorch models yet

As we announced at ModCon, we're excited to deliver extensibility that allows you to implement custom ops written in Mojo, which the compiler can natively analyze, optimize, and fuse into the graph. It's just not ready yet.

In v24.3, we released the first version of custom ops for ONNX and MAX Graph models, but support for PyTorch models is still in progress. We're also still working on some of the optimizations in the compiler, and fusing for custom ops is not available yet.

Python interop with Mojo APIs has issues

When calling Python libraries from MAX Engine Mojo APIs, you must first set the MOJO_PYTHON_LIBRARY environment variable to use the included Python library:

export MOJO_PYTHON_LIBRARY=$(modular config mojo-max.python_lib)

In a future release, this will be set automatically.
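Before running code that depends on Python interop, it can help to confirm that the variable is actually set and points to a real file. This is a small sketch using only the standard library; the function name and its return values are hypothetical.

```python
import os


def check_mojo_python_library(env=None):
    """Report the state of MOJO_PYTHON_LIBRARY.

    Returns "unset" if the variable is missing or empty, "missing" if it
    does not point to an existing file, and "ok" otherwise.
    """
    env = os.environ if env is None else env
    value = env.get("MOJO_PYTHON_LIBRARY")
    if not value:
        return "unset"
    if not os.path.isfile(value):
        return "missing"
    return "ok"


print(check_mojo_python_library())
```

A result of "ok" only confirms that the path exists; it does not verify that the library is the one included with MAX.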

However, we've discovered a separate issue on Ubuntu 20.04 in which this environment variable is not effective. For example, if you are on Ubuntu 20.04 and try running our ONNX custom op example, you might get the following error due to a bug with the Python interop feature (this custom op example uses Python for the op implementation):

free(): invalid pointer
Aborted (core dumped)

We're investigating this issue now.

MAX Graph does not support empty graphs

Currently, graphs that directly return their inputs may return incorrect values for those outputs.
