Define environment variables for input workers#
In a Wafer-Scale Cluster execution, the input pipeline runs on input worker nodes within the cluster, which are separate processes started by the Appliance on different CPU nodes. As such, any environment variables set on the user process are not seen by the input workers as they run on different nodes. But since input workers run user code (e.g., dataloader), it is desirable in some cases to have some environment variables set on user side to be visible by workers as well. This section describes how to set environment variables on the user node that are visible by the input workers that stream data to the system.
For this, we provide a drop-in replacement for os.environ
with an identical
interface. This object can be imported as follows:
>>> from cerebras_appliance.environment import appliance_environ
The main difference of this object compared to os.environ
is that it tracks
any environment variables set through it and sends them to the input
workers. Setting an environment variable through appliance_environ
effectively stores the key and when an execute job is created, values of these
environment variables are read and sent to the input workers. The input workers
then see these variables as if they were set through os.environ
.
Note
This object does not track environment variables that
are not set through it. For example, if you set an environment
variable through os.environ
, while this object will see the value,
it will not transfer this variable to the input workers.
Note
These environment variables are sent to the Appliance when an execute job is requested. Setting variables after an execute job is created has no effect until the next execute job is created (if any).
Note
Environment variables set through appliance_environ
are injected
into the input workers after the worker process has been created.
This means setting environment variables that need to exist before the
python interpreter is started will not work. For example, setting
PYTHONPATH
will not have the intended effect. This object is mainly
for user-level environment variables set within a run.
Here’s an example of how to use this object:
>>> # Let's start with a clean environment
>>> assert "foo" not in os.environ and "bar" not in app.environ
>>> # Import the appliance_environ object
>>> from cerebras_appliance.environment import appliance_environ
>>> # Now let's set env variable "foo" through appliance_environ.
>>> # When the input workers are created, they will see this updated
>>> # value.
>>> appliance_environ["foo"] = "1"
>>> assert appliance_environ["foo"] == "1" # appliance_environ sees the updated value
>>> assert os.environ["foo"] == "1" # os.environ also sees the updated value
>>> # Now let's set env variable "bar" through os.environ.
>>> # Since "bar" is set through "os.environ", when the input workers are
>>> # created, they won't have access to this key.
>>> os.environ["bar"] = "2"
>>> assert appliance_environ["bar"] == "2" # appliance_environ sees the updated value