Define environment variables for input workers

In a Wafer-Scale Cluster execution, the input pipeline runs on input worker nodes within the cluster. The input workers are separate processes that the Appliance starts on different CPU nodes, so environment variables set in the user process are not visible to them. However, because input workers run user code (for example, the dataloader), it is sometimes desirable for environment variables set on the user side to be visible to the workers as well. This section describes how to set environment variables on the user node so that they are visible to the input workers that stream data to the system.

For this, we provide a drop-in replacement for os.environ with an identical interface. This object can be imported as follows:

>>> from cerebras_appliance.environment import appliance_environ

The main difference between this object and os.environ is that it tracks every environment variable set through it and sends those variables to the input workers. Setting an environment variable through appliance_environ stores the key; when an execute job is created, the values of the tracked variables are read and sent to the input workers. The input workers then see these variables as if they had been set through os.environ.
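
To make the tracking behavior concrete, here is a minimal conceptual sketch. It is not the actual appliance_environ implementation: the class name TrackingEnviron and its snapshot method are hypothetical, and a plain dict stands in for os.environ so the example has no side effects.

```python
from collections.abc import MutableMapping


class TrackingEnviron(MutableMapping):
    """Hypothetical stand-in for appliance_environ (illustration only)."""

    def __init__(self, backing):
        self._backing = backing  # the real object would wrap os.environ
        self._tracked = set()    # keys that were set through this object

    def __getitem__(self, key):
        return self._backing[key]

    def __setitem__(self, key, value):
        self._backing[key] = value
        self._tracked.add(key)   # remember the key, not the value

    def __delitem__(self, key):
        del self._backing[key]
        self._tracked.discard(key)

    def __iter__(self):
        return iter(self._backing)

    def __len__(self):
        return len(self._backing)

    def snapshot(self):
        """Read the values of tracked keys (assumed to happen at job creation)."""
        return {k: self._backing[k] for k in self._tracked if k in self._backing}


backing = {}                  # stands in for os.environ
env = TrackingEnviron(backing)
env["foo"] = "1"              # set through the tracking object: tracked
backing["bar"] = "2"          # set directly on the backing store: not tracked
to_send = env.snapshot()      # only "foo" would be sent to the workers
```

In this sketch, the object can still read "bar" because it shares the same backing store, but only keys written through it end up in the snapshot.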

Note

This object does not track environment variables that are not set through it. For example, if you set an environment variable through os.environ, while this object will see the value, it will not transfer this variable to the input workers.

Note

These environment variables are sent to the Appliance when an execute job is requested. Setting variables after an execute job is created has no effect until the next execute job is created (if any).
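
The timing rule above can be pictured with a small simulation. This is a sketch under stated assumptions: the function snapshot_for_job is hypothetical and only models "values are read when an execute job is created"; it does not call any Cerebras API.

```python
def snapshot_for_job(env, tracked_keys):
    """Copy the current values of tracked keys (hypothetical job-creation step)."""
    return {key: env[key] for key in tracked_keys if key in env}


env = {"foo": "1"}       # stands in for the user-side environment
tracked_keys = {"foo"}   # keys previously set through the tracking object

# The first execute job captures the values as they are right now.
first_job_env = snapshot_for_job(env, tracked_keys)

# A change made after the first job was created does not reach that job.
env["foo"] = "2"

# The next execute job, if any, picks up the new value.
second_job_env = snapshot_for_job(env, tracked_keys)
```

Under this model, the practical guidance is to set every variable the workers need before requesting the execute job.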

Note

Environment variables set through appliance_environ are injected into the input workers after the worker process has been created. This means that variables that must exist before the Python interpreter starts will not work. For example, setting PYTHONPATH will not have the intended effect. This object is mainly intended for user-level environment variables set within a run.
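
The PYTHONPATH limitation reflects general Python behavior rather than anything specific to the Appliance: the interpreter reads PYTHONPATH once at startup, so changing it in an already running process does not alter sys.path. The following sketch demonstrates this locally; the directory name is hypothetical and is used only for illustration.

```python
import os
import sys

path_before = list(sys.path)
previous = os.environ.get("PYTHONPATH")

# Hypothetical directory; setting the variable now is too late to matter
# for this interpreter, because sys.path was built at startup.
os.environ["PYTHONPATH"] = "/tmp/hypothetical_extra_modules"
path_after = list(sys.path)

# Restore the original environment so the example leaves no side effects.
if previous is None:
    del os.environ["PYTHONPATH"]
else:
    os.environ["PYTHONPATH"] = previous
```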

Here’s an example of how to use this object:

>>> import os

>>> # Let's start with a clean environment
>>> assert "foo" not in os.environ and "bar" not in os.environ

>>> # Import the appliance_environ object
>>> from cerebras_appliance.environment import appliance_environ

>>> # Now let's set env variable "foo" through appliance_environ.
>>> # When the input workers are created, they will see this updated
>>> # value.
>>> appliance_environ["foo"] = "1"
>>> assert appliance_environ["foo"] == "1"  # appliance_environ sees the updated value
>>> assert os.environ["foo"] == "1"  # os.environ also sees the updated value

>>> # Now let's set env variable "bar" through os.environ.
>>> # Since "bar" is set through "os.environ", when the input workers are
>>> # created, they won't have access to this key.
>>> os.environ["bar"] = "2"
>>> assert appliance_environ["bar"] == "2"  # appliance_environ sees the updated value
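
On the worker side, user code reads a transferred variable through os.environ like any other environment variable. The sketch below shows one way dataloader code might do this with a fallback for the case where the variable was not transferred. The variable name DATA_SHUFFLE_SEED and the helper get_shuffle_seed are hypothetical and not part of any Cerebras API.

```python
import os


def get_shuffle_seed(default=0):
    """Return the shuffle seed from the environment, or a default if unset.

    DATA_SHUFFLE_SEED is a hypothetical user-level variable that would be
    set on the user node through appliance_environ before the execute job
    is created.
    """
    return int(os.environ.get("DATA_SHUFFLE_SEED", default))
```

Using a default keeps the dataloader usable when the variable was set through os.environ by mistake and therefore never reached the workers.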
