
Distributed Rendering with Ranimate

Using Ranimate to Distribute Single-Frame Rendering Processes in an Openmosix Cluster

Dipl.-Ing. Lars O. Grobe

Copyright © 2006-2011 Dipl.-Ing. Lars O. Grobe

Table of Contents

Why Use Control Programs with Radiance
Alternatives to Ranimate
Ranimate - A Control Program for Rendering of Frame Sequences
Openmosix - A Virtual Multiprocessor System
Ranimate in an Openmosix environment
Troubleshooting

Why Use Control Programs with Radiance

Radiance is not a single program but a set of tools, a toolchain of sorts, which allows the simulation software to be adapted to all kinds of tasks. Each of these tools can be used as a stand-alone program, and most of them can be linked by pipes, a well-known practice in unix-like environments. So it is in fact possible to run the whole simulation process by hand, starting programs, linking them by pipes or temporary files where necessary, and keeping an eye on the required resources and generated results as well as on the options.
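As a minimal sketch of such a pipeline (all file names are hypothetical), a complete rendering could be expressed like this:

oconv materials.rad scene.rad > scene.oct                      # compile the scene into an octree
rpict -vf scene.vf scene.oct | pfilt -x /2 -y /2 > scene.pic   # render a view and anti-alias the result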

Still, everyone who has run a few simulations with Radiance knows that while there are repetitive tasks one really doesn't want to bother with again and again, some settings and techniques may be crucial for the simulation and are worth special attention. So many people use simple, some even quite advanced, scripting techniques to get rid of the repetitive work, allowing them to concentrate on the parameters that might need some fine-tuning. Most of us start our raytracing processes with a set of defaults, some of them calculated from scene properties such as size, illumination characteristics and required accuracy, and correct some parameters later when the first results have been analyzed.

At this point, control programs that set default parameters for our scene (maybe even by analyzing some of its characteristics), keep track of changes in the input files and read all the different parameters, file names and views from a central resource become an extremely convenient tool. Instead of worrying about the default parameters, searching for the file where we defined our views, or wondering which material libraries we had included to produce that nice rendering last Friday, we just set as many options as necessary, write down the file names and start the simulation.

Alternatives to Ranimate

The most common control program for Radiance simulations is rad. It takes one configuration file, which by convention has the extension .rif, analyzes the scene, sets some default parameters according to the scene, overrides these if they are defined in the configuration, and then renders all the defined views. In fact, the configuration files used by rad collect all the information necessary to render single-frame pictures.
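For a first impression (assuming a control file named scene.rif), rad can also just print the commands it would execute, which is a convenient way to inspect the parameters it has chosen:

rad -n scene.rif    # show the commands only, do not render
rad scene.rif       # render all views defined in the control file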

trad is a simple graphical user interface (GUI) for rad, written in Tcl/Tk and appearing a bit dated nowadays. Still, it allows setting the options in a rad configuration file from a GUI, gives help on the parameters, and allows starting the simulations as well as viewing results and process messages.

Rayfront is a very powerful environment for simulations, including a front end that not only controls the simulation processes as e.g. rad does, but also covers the import of CAD files (DXF), material assignment and light source definitions. While most control programs only handle the simulation processes, Rayfront knows a lot more about the actual content; it is meant to change not only the parameters of the simulation but, to a certain extent, the model itself. As an option, it can be used as an integrated interface to Radiance from within AutoCAD.

ranimove was developed as the successor of ranimate, though it has not replaced it. Like ranimate, ranimove can be used to control the calculation of a series of pictures (frames), but it adds functionality to allow animated objects in the scene. ranimove won't be described further here, as we concentrate on single images that are not to be linked into a movie.

Ranimate - A Control Program for Rendering of Frame Sequences

ranimate is part of the Radiance distribution and is most commonly used to render the frames of an animation. It takes a rad control file (.rif) and a second configuration file (usually ending in .ran). The rad control file determines how the frames are rendered, so it provides us with the convenience we are used to from rad. Instead of setting all kinds of indirect, direct and output control parameters, we give the scene, some values describing the input as well as the desired output, and let rad do the work. But in this case we won't call rad ourselves - ranimate will use the rad configuration. Beyond what rad does, ranimate considers some advanced problems that come with rendering large numbers of pictures, such as available disk space (it won't just complain, but will optimize its work if disk space becomes a limiting factor), processing resources, and how to call the processes. There is also functionality specific to animations; it is not in the scope of this document, so it shall just be noted that you can set up time-dependent cameras, interpolate between frames of an animation, and let ranimate do some of the work of creating a movie.

As ranimate is able to control the generation of multiple views of a scene, it is a nice tool to control the generation of still pictures as well as to create movies. We will use ranimate simply to start more than one rendering, allowing parallel calculation on multiprocessor systems, to control the resources, and to do some basic error handling.

For our example, we will take the following as given: a complete scene, compiled into a frozen octree (using oconv -f); a view file describing the views we want to render (as written by rvu); and a bunch of machines running Linux with an Openmosix-patched kernel and a recent Radiance distribution.
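As a sketch with hypothetical file names, the frozen octree is created with

oconv -f materials.rad scene.rad > scene.oct

and the view file scene.vf holds one view specification per line, in the format written by the view command of rvu (the values here are made up):

rvu -vtv -vp 0 -3 1.8 -vd 0 1 0 -vu 0 0 1 -vh 45 -vv 30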

The first step is to write a rad control file, just as we are used to doing for serial rendering (serial as opposed to the parallel processing we want to achieve here). The setup for rad includes some basic parameters we want to set: the octree name (we don't want rad to update the octree, as our model is complete and we just want pictures, without any changes to the scene), the dimensions, some settings for the desired output, as well as settings influencing the rendering quality (and time, beware!). An example might look like this:

OCTREE= scene.oct
ZONE= Interior -10 10 -10 10 0 4
RESOLUTION= 1024
QUALITY= HIGH
INDIRECT= 1
AMBFILE= scene.amb
VARIABILITY= MEDIUM

To understand the exact meaning of the settings above, consult the manpage of rad and take a look at the literature available on Radiance. In fact, these are really the most basic settings and should be understood by everyone using Radiance. Call the file scene.rif and place it in the same directory as the octree of your scene (called scene.oct).
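Before committing to a long run, it may be worth a quick test. rad accepts variable overrides on the command line, so as a sketch (here selecting the first view by its number and reducing quality and resolution for a fast preview):

rad -v 1 QUALITY=LOW RESOLUTION=512 scene.rif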

While writing a control file for rad is everyday work for Radiance users, most of them may never have written a control file for ranimate. Think of it as an extension of rad: the style of the configuration file is the same, and some keywords can even be defined in either of the two. We will call our ranimate control file scene.ran and write the following:

DIRECTORY= results
DISKSPACE= 512
RIF= scene.rif
VIEWFILE= scene.vf
RSH= bash
host= localhost

Again, read the manpage, this time for ranimate. The DIRECTORY should exist as a subdirectory; it will be used by ranimate to write the results as well as to store temporary data such as information about the process or the options passed to rpict. The DISKSPACE setting specifies an amount of disk space (in megabytes); ranimate can in some cases optimize the rendering process to fit into the space set here. Still, if you have sufficient space, do not restrict the rendering process, and use a high setting. RIF points to our rad control file, while VIEWFILE is the name of the view file that we created, e.g. from rvu. RSH is a reminiscence of rsh, an old-fashioned remote shell used to start processes on other computers in the network. In fact, you can define any shell that may be available here, from ssh, to start the process on another machine (remember that starting a process there makes sense only if all the required data is available on the remote machine), to bash, which will simply start the process via the Linux standard shell on localhost. If you have checked again that all required files are present and your Radiance setup is working, you are ready to start ranimate.
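One detail first: the output directory must exist, so create it using the name given in the control file above:

mkdir results

With that in place, start ranimate to get all views in your view file rendered: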

ranimate scene.ran

This should work as expected, rendering all your views serially. So no parallel rendering so far, but don't be disappointed - we will set up your super-computer right now...

Openmosix - A Virtual Multiprocessor System

Linux has been a popular system for distributed computing for years. There are rather different approaches, ranging from applications that are linked against special libraries to allow parallel execution (e.g. mpi, pvm), over simple remote execution with optionally shared data (e.g. the classical approach in Radiance: starting processes with rsh and sharing ambient and scene data via nfs), to virtual multi-processor systems based on clusters of nodes connected in a local network, which have recently become more and more popular. There are several systems to create such clusters, and we will describe Openmosix here, which is the open-source project forked from Mosix. Another interesting approach is OpenSSI, though I do not have any experience with it to share.

To set up your Openmosix cluster, consult the documentation available from the project. The basic steps will in most cases include the installation of a patched Linux kernel on your systems, the creation of boot disks (e.g. CD-ROMs) for nodes that are not permanently used as render nodes, and the connection of all machines by a reliable network. It is important that you use the same version of both the Linux kernel and the Openmosix patch on all machines in your cluster. If you want to use an existing boot-CD distribution, this will determine which kernel to install on your master node. Still, with e.g. Plump-OS it is really easy to build a custom boot-CD. Another problem may be memory on CD-booted nodes if you do not have a swap partition available. So take care that both the processes you want to distribute and the system on the nodes have enough RAM, or create a swap partition even if you plan to boot from CD. Take a look at the commands available to control a cluster, like mtop, the cluster-aware top, and omps, which gives information on processes running in the cluster, and especially at the /proc interface, where you can find information about the status of the cluster.
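As a quick sketch of the monitoring side (the tool names are from the openMosix userland; the exact /proc paths vary between versions, so treat them as an assumption):

mtop                          # cluster-aware top, shows the node each process runs on
mosmon                        # curses monitor graphing the load of all nodes
cat /proc/hpc/nodes/*/load    # raw per-node load values from the /proc interface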

Openmosix (unlike OpenSSI) does not allow the use of shared memory, which the -PP option of rpict requires. So every rpict process will have to load the scene into its own memory; but as the processes can migrate to other nodes, they will not all occupy memory on the master, as long as they are not started at the same time. You should also make sure that your processes won't try to migrate back to the master all at once, as this might result in a lack of resources and thus in killed processes! For large scenes, it may be a good idea to start the rendering processes with a delay between them, so that there is enough time for each to migrate.

Ranimate in an Openmosix environment

The native process distribution mechanism of ranimate is the use of a remote shell to start rpict processes, and a shared file system to share the scene and the cached results of the indirect illumination calculation. This is a completely generic mechanism that will work even in heterogeneous networks of unix-like machines, as long as incompatibilities (the nfs lock manager is a famous example) do not affect the coordination of the processes. If you are really in a hurry and want to distribute your renderings over all available hardware, ranging from your Alpha number-cruncher to the Sun application server and the Linux desktop, this is the way to go... Just use as many host lines in your ranimate control file as you have machines, make them point to the host names, and optionally add the number of processors available on each host to the end of the line. If you are connected to a public network such as the internet, do not use rsh but ssh, and set up key-based authentication.
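A sketch of such a control file fragment (the host names are made up, and render1 is assumed to have two processors):

RSH= ssh
host= render1 2
host= render2
host= sunbox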

Still, this means you have to become some kind of network administrator. As caring about nfs and common logins can become a tedious task, and as scalability issues have been found with some implementations of nfs, many people would prefer to ignore the network altogether and use the machines like one large multi-processor computer. This is what Openmosix allows us to do: we can start all processes on the local computer and let the system distribute them over the cluster. Unfortunately, however, Openmosix does not provide a full SMP system, as we will see in a moment.

If we take a look at our ranimate control file, we see that for each host line, ranimate will call the shell specified in the RSH line to start the process. If we add a number behind the hostname, ranimate will start that number of processes on the host. So if we have two processors in the machine, we may want to change the host line as follows:

host= localhost 2

Still, if we have two machines in an Openmosix cluster with one processor each, we cannot just use this line. While two rpict processes would be started, they would not be distributed. The reason is that on multi-processor systems, rpict is started with the -PP option, keeping the scene in shared memory accessed by both processes. This is possible on most multi-processor systems, also called shared memory processing (SMP) systems, but Openmosix does not allow shared memory. So instead of putting the number behind the hostname, we have to make ranimate start several independent processes on our machine, which can then migrate to other hosts after start-up. To do so, we do something incredibly simple - we add more host lines, each pointing to localhost, where the processes will be started. In our cluster with three nodes, we will have these lines:

host= localhost
host= localhost
host= localhost

Now ranimate will start three rendering processes on localhost, but these are all independent and do not share memory. Still, the cached ambient data in the file scene.amb can be accessed by all processes; but instead of using nfs for the distribution of the data, all processes see this shared file as residing on a local file system. To watch the rpict processes, we can use the top equivalent in Openmosix clusters called mtop, or see a graph showing the system load with mosmon.

Troubleshooting

The bad news first. If you want your cluster to appear like a single system, network stability is a must. Do not expect an efficient cluster if you use unreliable hardware, a broken network setup or unsupported configurations such as different versions of the Linux kernel in one cluster. Also, all the nice and fast processors in your network can't be used without memory. If your super-fast number-crunching node has just 512MB of RAM installed, but your rendering process, started on your poor slow laptop, takes 640MB of memory, it will stay on your laptop. Memory gets especially tricky if you use diskless nodes (or nodes that have disks but do not use them, such as CD-booted systems). We are used to having our systems swap if there is not enough memory installed, but a system without a swap partition cannot fall back on this virtual memory and will kill processes instead.
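If a node has a local disk but boots from CD, a swap file is a quick alternative to a dedicated swap partition (a generic Linux sketch; the size of 1GB and the path are arbitrary):

dd if=/dev/zero of=/swapfile bs=1M count=1024    # create a 1GB file
mkswap /swapfile                                 # format it as swap space
swapon /swapfile                                 # activate it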

The good news is that ranimate uses the recovery features provided by Radiance and does some error checking of its own. If your rendering processes broke and you simply invoke ranimate again, it will find out that there has been a problem and try to recover the results of the broken process, then render the unfinished pictures frame by frame. As the image files contain information about the rendering parameters (try getinfo on an unfinished frame), rpict can continue rendering a frame, taking the view options from the image file. But if you are in a hurry, you will not want to wait for your hundred pictures to be rendered serially on your local host while your cluster is doing nothing. To accelerate the process, we will start the rpict processes manually. But as we are experienced users - meaning we have made enough stupid mistakes in our past with Radiance - we will first make a back-up of the unfinished frames! It is always worth taking a copy of your rendering results, as Radiance will usually be able to reuse what it has calculated once. We use tar, and you should be familiar with the syntax:

tar cvf ~/unfinished.tar results/*.unf
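The rendering parameters mentioned above can be inspected at any time with getinfo (the frame name here is hypothetical):

getinfo results/frame001.unf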

Now let's see what we can find out about the processes we had started. Take a look at the contents of the directory results, where ranimate wrote all its output before the process broke. You will find a file STATUS, showing the progress of the rendering task, as well as a very interesting file called render.opt. The content of this file is exactly what you need to make sure that the new process continues with the same settings the image had been started with. Go to the directory from which you had started the processes, and start rpict with its recovery option and the file results/render.opt as the option file:

rpict @results/render.opt -w0 -ro results/frame001.unf scene.oct

This will start rpict and make it continue where the previous process was stopped. You can now start as many processes as you have processors available; Openmosix will distribute them over your cluster. You may want to send the processes to the background, or use nohup so that you can close your shell and have the processes keep working - see the manpage of nohup for details. Of course, if the cause of the interruption is still present, rpict won't work reliably; but if the set-up of your cluster as well as of your scene is clean, you can get the renderings completed now. Note, however, that ranimate itself cannot currently use multiple processes while recovering - it will always take the safe path after an error.
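Putting the pieces together, a small shell loop can restart all unfinished frames as staggered background processes. This is only a sketch, assuming the bash shell, the file names used above, and no more unfinished frames than available processors; adjust the sleep time to the loading time of your scene:

for f in results/*.unf; do
    nohup rpict @results/render.opt -w0 -ro "$f" scene.oct >/dev/null 2>"$f.err" &
    sleep 60    # give each process time to load the scene and migrate before the next starts
done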