2. piw-master

The piw-master script is intended to be run on the database and file-server machine. It is recommended that you do not run piw-slave on the same machine as the piw-master script. The database specified in the configuration must exist, and must have been configured with the piw-initdb script. It is recommended you run piw-master as an ordinary unprivileged user, although it will obviously need write access to the output directory.

2.1. Synopsis

piw-master [-h] [--version] [-c FILE] [-q] [-v] [-l FILE] [-d DSN]
           [--pypi-xmlrpc URL] [--pypi-simple URL] [-o PATH]
           [--index-queue ADDR] [--status-queue ADDR]
           [--control-queue ADDR] [--builds-queue ADDR]
           [--db-queue ADDR] [--fs-queue ADDR] [--slave-queue ADDR]
           [--file-queue ADDR] [--import-queue ADDR]

2.2. Description

-h, --help

show this help message and exit

--version

show program’s version number and exit

-c FILE, --configuration FILE

Specify a configuration file to load

-q, --quiet

produce less console output

-v, --verbose

produce more console output

-l FILE, --log-file FILE

log messages to the specified file

-d DSN, --dsn DSN

The database to use; this database must be configured with piw-initdb and the user should not be a PostgreSQL superuser (default: postgres:///piwheels)

--pypi-xmlrpc URL

The URL of the PyPI XML-RPC service (default: https://pypi.python.org/pypi)

--pypi-simple URL

The URL of the PyPI simple API (default: https://pypi.python.org/simple)

-o PATH, --output-path PATH

The path under which the website should be written; must be writable by the current user

--index-queue ADDR

The address of the IndexScribe queue (default: inproc://indexes)

--status-queue ADDR

The address of the queue used to report status to monitors (default: ipc:///tmp/piw-status)

--control-queue ADDR

The address of the queue a monitor can use to control the master (default: ipc:///tmp/piw-control)

--builds-queue ADDR

The address of the queue used to store pending builds (default: inproc://builds)

--db-queue ADDR

The address of the queue used to talk to the database server (default: inproc://db)

--fs-queue ADDR

The address of the queue used to talk to the file-system server (default: inproc://fs)

--slave-queue ADDR

The address of the queue used to talk to the build slaves (default: tcp://*:5555)

--file-queue ADDR

The address of the queue used to transfer files from slaves (default: tcp://*:5556)

--import-queue ADDR

The address of the queue used by piw-import (default: ipc:///tmp/piw-import); this should always be an ipc address

2.3. Development

Although the piwheels master appears to be a monolithic script, it’s actually composed of numerous (often extremely simple) tasks. Each task runs in its own thread and all communication between tasks takes place over ZeroMQ sockets. This is also how the master communicates with piw-slave and piw-monitor.
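
As a rough sketch of the mechanism (assuming pyzmq is installed; the queue name and message contents here are invented for illustration, not the master’s actual protocol), two tasks in one process can exchange pickled messages over an inproc PUSH/PULL pair like so:

```python
import zmq

# Both sockets must share one context; the inproc transport only
# works within a single process, which is exactly the master's case
ctx = zmq.Context.instance()

pull = ctx.socket(zmq.PULL)
pull.bind('inproc://demo')      # the receiving task binds first

push = ctx.socket(zmq.PUSH)
push.connect('inproc://demo')

# send_pyobj() pickles its argument, matching the message format
# described later in this chapter
push.send_pyobj(['LOG', 'pkg', '1.0'])
action, *params = pull.recv_pyobj()

push.close()
pull.close()
ctx.term()
```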

The following diagram roughly illustrates all the tasks in the system (including those of the build slaves and the monitor), along with details of the type of ZeroMQ socket used to communicate between them:

[Diagram: master architecture (_images/master_arch.svg)]

It may be confusing that the file server and database server appear to be separate from the master in the diagram. This is deliberate: the system’s architecture is such that certain tasks can easily be broken off into entirely separate processes (potentially on separate machines), should that be required in future (for performance or security reasons).

2.4. Tasks

The following sections document the tasks shown above (listed from the “front” at PyPI to the “back” at Users):

2.4.1. Cloud Gazer

Implemented in: piwheels.master.cloud_gazer.CloudGazer.

This task is the “front” of the system. It follows PyPI’s event log for new package and version registrations, and writes those entries to the database. It does this via The Oracle.

2.4.2. The Oracle

Implemented in: piwheels.master.the_oracle.TheOracle.

This task is the main interface to the database. It accepts requests from other tasks (“register this new package”, “log this build”, “what files were built with this package”, etc.) and executes them against the database. Because database requests are extremely variable in their execution time, there are actually several instances of the oracle which sit behind Seraph.

2.4.3. Seraph

Implemented in: piwheels.master.seraph.Seraph.

Seraph is a simple load-balancer for the various instances of The Oracle. This is the task that actually accepts database requests. It finds a free oracle and passes the request along, passing back the reply when it’s finished.
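
The dispatch logic can be sketched in plain Python (the real Seraph uses ZeroMQ ROUTER sockets; the function names and the “ready” convention below are illustrative):

```python
from collections import deque

free_oracles = deque()   # oracles that have announced they are idle
pending = deque()        # requests waiting for a free oracle

def oracle_ready(oracle_id):
    """An oracle reports it is idle (or has just returned a reply)."""
    free_oracles.append(oracle_id)
    return dispatch()

def client_request(request):
    """A task submits a database request."""
    pending.append(request)
    return dispatch()

def dispatch():
    """Pair waiting requests with free oracles, oldest first."""
    pairs = []
    while free_oracles and pending:
        pairs.append((free_oracles.popleft(), pending.popleft()))
    return pairs
```

In use, `oracle_ready('oracle-1')` followed by `client_request('NEWPKG')` hands that request to `oracle-1`; a request arriving while no oracle is free simply waits in `pending`.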

2.4.4. The Architect

Implemented in: piwheels.master.the_architect.TheArchitect.

This task is the final database-related task in the master script. Unlike The Oracle, it simply queries the database for the packages that need building. Whenever Slave Driver needs a task to hand to a build slave, it asks The Architect for one matching the build slave’s ABI.
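
The kind of per-ABI query involved can be illustrated with an in-memory SQLite database (the table, columns, and ABI values below are invented for the example and do not reflect the real piwheels schema):

```python
import sqlite3

# Toy stand-in for the piwheels database
db = sqlite3.connect(':memory:')
db.executescript("""
    CREATE TABLE versions (package TEXT, version TEXT, abi TEXT, built INTEGER);
    INSERT INTO versions VALUES
        ('numpy', '1.15.0', 'cp35m', 0),
        ('numpy', '1.15.0', 'cp34m', 1),
        ('flask', '1.0.2',  'cp35m', 0);
""")

def next_build(abi):
    """Return one (package, version) still needing a build for *abi*."""
    return db.execute(
        "SELECT package, version FROM versions "
        "WHERE abi = ? AND built = 0 LIMIT 1", (abi,)).fetchone()
```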

2.4.5. Slave Driver

Implemented in: piwheels.master.slave_driver.SlaveDriver.

This task is the main coordinator of the build slave’s activities. When a build slave first comes online it introduces itself to this task (with information including the ABI it can build for), and asks for a package to build. As described above, this task asks The Architect for the next package matching the build slave’s ABI and passes this back.

Eventually the build slave will communicate whether or not the build succeeded, along with information about the build (log output, files generated, etc.). This task writes this information to the database via The Oracle. If the build was successful, it informs the File Juggler that it should expect a file transfer from the relevant build slave.

Finally, when all files from the build have been transferred, the Slave Driver informs the Index Scribe that the package’s index will need (re)writing.

2.4.6. Mr. Chase

Implemented in: piwheels.master.mr_chase.MrChase.

This task talks to piw-import and handles importing builds manually into the system. It is essentially a cut-down version of the Slave Driver with a correspondingly simpler protocol.

This task writes information to the database via The Oracle. If the imported build was successful, it informs the File Juggler that it should expect a file transfer from the importer.

Finally, when all files from the build have been transferred, it informs the Index Scribe that the package’s index will need (re)writing.

2.4.7. File Juggler

Implemented in: piwheels.master.file_juggler.FileJuggler.

This task handles file transfers from the build slaves to the master. Files are transferred in multiple (relatively small) chunks and are verified with the hash reported by the build slave (retrieved from the database via The Oracle).
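
The verification step can be sketched with the standard library (the chunk size, function names, and choice of SHA-256 are illustrative assumptions, not the actual transfer protocol):

```python
import hashlib

CHUNK_SIZE = 65536  # assumed chunk size for illustration

def split_chunks(data, size=CHUNK_SIZE):
    """Slice *data* into relatively small chunks for transfer."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def receive(chunks, expected_sha256):
    """Reassemble chunks, verifying against the slave-reported hash."""
    digest = hashlib.sha256()
    received = bytearray()
    for chunk in chunks:
        digest.update(chunk)
        received.extend(chunk)
    if digest.hexdigest() != expected_sha256:
        raise IOError('transfer corrupted')
    return bytes(received)
```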

2.4.8. Big Brother

Implemented in: piwheels.master.big_brother.BigBrother.

This task is a bit of a miscellaneous one. It sits around periodically generating statistics about the system as a whole (number of files, number of packages, number of successful builds, number of builds in the last hour, free disk space, etc.) and sends these off to the Index Scribe.

2.4.9. Index Scribe

Implemented in: piwheels.master.index_scribe.IndexScribe.

This task generates the web output for piwheels. It generates the home-page with statistics from Big Brother, the overall package index, and individual package file lists with messages from Slave Driver.
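
A toy version of one such package file list (the markup below is a guess at a PEP 503 style “simple” index page, not the Index Scribe’s actual output):

```python
def render_package_index(package, filenames):
    """Render a minimal simple-index page linking each built file."""
    links = '\n'.join(
        f'<a href="{name}">{name}</a><br>' for name in filenames)
    return (f'<html><head><title>Links for {package}</title></head>'
            f'<body><h1>Links for {package}</h1>\n{links}\n</body></html>')
```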

2.5. Queues

It should be noted that the diagram omits several queues for the sake of brevity. For instance, there is a simple PUSH/PULL control queue between the master’s “main” task and each sub-task which is used to relay control messages like PAUSE, RESUME, and QUIT.
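
The effect of those control messages on a task can be modelled minimally (the message names come from the text above; the task-state fields are invented for the sketch):

```python
def handle_control(task, msg):
    """Apply a control message (a list, action first) to task state."""
    action = msg[0]
    if action == 'QUIT':
        task['running'] = False
    elif action == 'PAUSE':
        task['paused'] = True
    elif action == 'RESUME':
        task['paused'] = False
    return task
```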

Most of the protocols used by the queues are (currently) undocumented with the exception of those between the build slaves and the Slave Driver and File Juggler tasks (documented in the piw-slave chapter).

However, all protocols share a common basis: messages are lists of Python objects. The first element is always a string containing the action; further elements are parameters specific to that action. Messages are encoded with pickle. This format is unsafe to use with untrusted input, but it was the quickest to get started with (and the inter-process queues aren’t exposed to the internet). A future version may switch to something safer like JSON, or better still CBOR.
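
For illustration, a pair of helpers showing how such a message round-trips through pickle (the helper names are invented, and real code should only unpickle data from trusted peers):

```python
import pickle

def encode(action, *params):
    """Encode a message: a list whose first element is the action."""
    return pickle.dumps([action, *params])

def decode(data):
    """Decode a message back into its action and parameter list."""
    action, *params = pickle.loads(data)
    return action, params
```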