Distributed Rendering with Ranimate
Using Ranimate to distribute Single Frame Rendering Processes in an Openmosix Cluster
Dipl.-Ing. Lars O. Grobe
Copyright © 2006-2011 Dipl.-Ing. Lars O. Grobe
Table of Contents
Why to use Control Programs with Radiance
Alternatives to Ranimate
Ranimate - A Control Program for Rendering of Frame Sequences
Openmosix - A Virtual Multiprocessor System
Ranimate in an Openmosix environment
Troubleshooting
Why to use Control Programs with Radiance
Radiance is not a single program but a set of tools, a toolchain that can be adapted to all kinds of simulation tasks. Each of these tools can be used as a stand-alone program, and most of them can be linked by pipes, a well-known practice in unix-like environments. So in fact it is possible to carry out the whole simulation process by starting programs, linking them where necessary by pipes or temporary files, and keeping an eye on the required resources and the generated results as well as on the options.
Still, everyone who has run some simulations with Radiance knows that while there are repetitive tasks one really doesn't want to bother with again and again, some settings and techniques are crucial for the simulation and worth special attention. So many people use simple, some even quite advanced, scripting techniques to get rid of the repetitive work, allowing them to concentrate on the parameters that might need some fine-tuning. Most of us start our raytracing processes with a set of defaults, some of them derived from scene properties such as size, illumination characteristics and the required accuracy, and correct some parameters later once the first results have been analyzed.
At this point, control programs become an extremely convenient tool: they set default parameters for our scene (maybe even by analyzing some of its characteristics), keep track of changes in the input files, and read all the different parameters, file names and views from a central resource. Instead of worrying about the default parameters, searching for the file where we defined our views, or wondering which material libraries we had included to produce that nice rendering last Friday, we just set as many options as necessary, write down the file names and start the simulation.
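As an illustration, the kind of toolchain described above might, for a single still image, consist of steps like the following. File names and rendering parameters are placeholders; the sequence is stored in a variable and printed rather than executed, so it can be inspected first:

```shell
#!/bin/sh
# A sketch of the manual toolchain that control programs automate:
# compile the scene, trace one view, filter the raw picture.
# File names and rendering parameters are placeholders.
PIPELINE='oconv -f room.rad lights.rad > scene.oct
rpict -vf view.vf -ab 1 -ad 512 scene.oct > frame.unf
pfilt -x /2 -y /2 frame.unf > frame.hdr'
echo "$PIPELINE"
```

A control program such as rad runs essentially this sequence for you, deriving the rpict parameters from a handful of high-level settings.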
Alternatives to Ranimate
The most common control program for Radiance simulations is rad. It takes a single configuration file, which by convention has the ending .rif, analyzes the scene, sets some default parameters accordingly, overrides these with any values defined in the configuration, and then renders all the defined views. In fact, the configuration files used by rad collect all the information necessary to render single-frame pictures.
trad is a simple graphical user interface (GUI) for rad, written in Tcl/Tk and appearing a bit outdated these days. Still, it allows you to set the options in a rad configuration file from a GUI, gives help on the parameters, and lets you start the simulations as well as view the results and the process messages.
Rayfront is a very powerful environment for simulations, including a front end that not only controls the simulation processes as e.g. rad does, but also handles the import of CAD files (dxf), material assignment and light source definitions. While most control programs only handle the simulation processes, Rayfront knows a lot more about the actual content; it is meant to change not only the parameters of the simulation but, to a certain extent, the model itself. As an option, it can be used as an integrated interface to Radiance from within Autocad.
ranimove was developed as the successor of ranimate, though it has not replaced it. Like ranimate, ranimove can be used to control the calculation of series of pictures (frames), but it adds functionality to allow animated objects in the scene. ranimove won't be described further here, as we concentrate on single images that are not to be linked into a movie.
Ranimate - A Control Program for Rendering of Frame Sequences
ranimate is part of the Radiance distribution and most commonly used to render the frames of an animation. It takes a rad control file (.rif) and a second configuration file (usually ending in .ran). The rad control file determines how the frames are rendered, so it provides us with the convenience we are used to from rad. Instead of setting all kinds of indirect, direct and output control parameters, we give the scene and some values describing the input as well as the desired output, and let rad do the work. But in this case we won't call rad directly; instead, ranimate will use the rad configuration. In addition to what rad does, ranimate will consider some problems that arise when rendering large numbers of pictures, such as available disk space (it won't just complain, but will optimize its work if disk space becomes a limiting factor), processing resources, and how to call the processes. There is also functionality specific to animations, but as that is outside the scope of this document, it shall just be noted that you can set up time-dependent cameras, interpolate between frames of an animation and let ranimate do some of the work of creating an animation.
As ranimate is able to control the generation of multiple views of a scene, it is a nice tool for generating still pictures of a scene as well as for creating movies. We will use ranimate just to start more than one rendering, allowing parallel calculation on multiprocessor systems, to control the resources, and to do some basic error handling.
For our example, we will take the following as given: a complete scene, compiled into a frozen octree (using oconv -f); a view file describing the views we want to render (as written by rvu); and a bunch of machines running Linux with an Openmosix-patched kernel and a recent Radiance distribution.
The first step is to write a rad control file, just as we are used to doing for serial rendering (serial, as opposed to the parallel processing we want to achieve here). The setup for rad includes some basic parameters we want to set: the octree name (we don't want rad to update the octree, as our model is complete and we just want pictures without any changes to the scene), the dimensions, some settings for the desired output, and settings influencing the rendering quality (and time, beware!). An example might look like this:
OCTREE= scene.oct
ZONE= Interior -10 10 -10 10 0 4
RESOLUTION= 1024
QUALITY= HIGH
INDIRECT= 1
AMBFILE= scene.amb
VARIABILITY= MEDIUM
To understand the exact meaning of the settings above, consult the manpage of rad and take a look at the literature available for Radiance. In fact, these are really the most basic settings and should be understood by everyone using Radiance. Call the file scene.rif and place it in the same directory as the octree of your scene (called scene.oct).
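Before handing the control file to rad (or later to ranimate), it can save a failed overnight run to check that everything the file refers to is actually in place. A minimal sketch; the file list is an assumption, so extend it to match your own scene:

```shell
#!/bin/sh
# Report any missing input files before starting a long rendering run.
check_files() {
    missing=0
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    return $missing
}

check_files scene.oct scene.rif || echo "fix the file list before rendering"
```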
While writing a control file for rad is everyday work for Radiance users, most of them may never have written a control file for ranimate. Think of it as an extension of rad: the style of the configuration file is the same, and some keywords can even be defined in either of the two. We will call our ranimate control file scene.ran, and write the following:
DIRECTORY= results
DISKSPACE= 512
RIF= scene.rif
VIEWFILE= scene.vf
RSH= bash
host= localhost
Again, read the manpage, this time for ranimate. The DIRECTORY should exist as a subdirectory; it will be used by ranimate to write the results as well as to store temporary data such as information about the process or the options passed to rpict. The DISKSPACE setting specifies an amount of disk space in megabytes; Radiance can optimize the rendering process in some cases to fit into the space set here. Still, if you have sufficient space, do not restrict the rendering process; use a high setting here. RIF points to our rad control file, while VIEWFILE is the name of the view file that we created, e.g. from rvu. RSH is a reminiscence of rsh, an old-fashioned remote shell used to start processes on other computers in the network. In fact, you can define any shell that may be available, from ssh to start the process on another machine (remember that starting a process there only makes sense if all the required data is available on the remote machine) to bash, which will simply start a process via the Linux standard shell on localhost. If you have checked again that all the required files are present and your Radiance setup is working, you can start ranimate now to get all views in your view file rendered:
ranimate scene.ran
This should work as expected, rendering all your views serially. So no parallel rendering so far, but don't be disappointed, we will set up your super-computer right now...
Openmosix - A Virtual Multiprocessor System
Linux has been a popular system for distributed computing for years. There are rather different approaches, ranging from applications that are linked against special libraries to allow parallel execution (e.g. MPI, PVM), to simple remote execution with optionally shared data (e.g. the classical approach in Radiance: starting processes with rsh and sharing ambient and scene data over NFS), to virtual multiprocessor systems based on clusters of nodes connected in a local network, which have recently become more and more popular. There are several systems for creating such clusters; we will describe Openmosix here, the open-source project forked from Mosix. Another interesting approach is OpenSSI, but I do not have any experiences to share about it.
To set up your Openmosix cluster, consult the documentation available from the project. The basic steps will in most cases include the installation of a patched Linux kernel on your systems, the creation of boot disks (e.g. CD-ROMs) for nodes that are not permanently used as render nodes, and connecting them all by a reliable network. It is important that you use the same version of both the Linux kernel and the Openmosix patch on all machines in your cluster. If you want to use an existing boot-CD distribution, this will determine which kernel to install on your master node. Still, with e.g. Plump-OS it is really easy to build your own custom boot-CD. Another problem may be memory on CD-booted nodes if you do not have a swap partition available. So take care that both the processes you want to distribute and the system on the nodes have enough RAM, or create a swap partition even if you plan to boot from CD. Take a look at the available commands for controlling a cluster, such as mtop, the cluster-aware top, and omps, which gives information on processes running in the cluster, and especially at the /proc interface, where you can find information about the status of the cluster.
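A quick way to see whether a machine is part of a running Openmosix cluster is to look for that /proc interface. The paths below are those of typical Openmosix 2.4 kernels and may differ on your installation, so treat this as a sketch:

```shell
#!/bin/sh
# Report whether this machine exposes the Openmosix /proc interface.
om_status() {
    if [ -d /proc/hpc ]; then
        echo "openMosix /proc interface found"
        # Per-node load figures live under /proc/hpc/nodes/<node-id>/
        # on typical installations; list what this kernel exposes:
        ls /proc/hpc
    else
        echo "no openMosix /proc interface on this machine"
    fi
}

om_status
```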
Openmosix (unlike OpenSSI) does not allow the use of shared memory, which the -PP option of rpict requires. So every rpict will have to load the scene into its own memory; but as the processes can migrate to other nodes, they will not all occupy memory on the master as long as they are not started at the same time. You should also make sure that your processes won't try to migrate back to the master all at once, as this might result in a lack of resources and thus killed processes! For large scenes, it may be a good idea to start the rendering processes with a delay, so that there is enough time for them to migrate.
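Such a delayed start is easy to script. The sketch below only prints what it would start: the actual rpict call is commented out, and the one-second delay stands in for something like a minute per process on large scenes. View file and option file names are placeholders:

```shell
#!/bin/sh
# Stagger the start of independent rendering processes so each one
# has time to migrate off the master before the next loads the scene.
stagger_start() {
    delay=$1; shift
    for vf in "$@"; do
        echo "starting render for $vf"
        # nohup rpict @render.opt -vf "$vf" scene.oct > "${vf%.vf}.unf" &
        sleep "$delay"
    done
}

stagger_start 1 view1.vf view2.vf   # use a much larger delay for large scenes
```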
Ranimate in an Openmosix environment
The native process distribution mechanism of ranimate is the use of a remote shell to start rpict processes and a shared file system to share the scene and the cached results of the indirect illumination calculation. This is a completely generic mechanism that will work even in heterogeneous networks of unix-like machines, as long as incompatibilities (the NFS lock manager is a famous example) do not affect the coordination of the processes. If you are really in a hurry and want to distribute your renderings over all available hardware, ranging from your Alpha number-cruncher to the Sun application server and the Linux desktop, this is the way to go... Just use as many host lines in your ranimate control file as you have machines, make them point to the host names, and optionally add the number of processors available on each host to the end of the line. If you are connected to a public network such as the internet, do not use rsh but ssh, and set up key-based authentication.
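In such a heterogeneous setup, the host section of the ranimate control file might look like the fragment below. The host names and processor counts are, of course, placeholders for your own machines:

```
RSH= ssh
host= alpha.example.org 2
host= sunbox.example.org
host= desktop.example.org
```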
Still, this means you have to become some kind of network administrator, and as caring about NFS and common logins can become a tedious task, and scalability issues have been found with some implementations of NFS, many people would like to ignore the network altogether and use the machines like one large multiprocessor computer. This is what Openmosix allows us to do: we can start all processes on the local computer and let the system distribute them. Unfortunately, Openmosix does not provide a real SMP system.
If we take a look at our ranimate control file, we see that for each host line, ranimate will call the shell specified in the RSH line to start the process. If we add a number after the hostname, ranimate will start that number of processes on the host. So if we have two processors in the machine, we might want to change the host line as follows:
host= localhost 2
Still, if we have two machines in an Openmosix cluster with one processor each, we cannot just use this line. While two rpict processes would be started, they would not be distributed. The reason is that on multiprocessor systems, rpict is started with the -PP option, keeping the scene in shared memory accessed by both processes. This is possible on most multiprocessor systems, also called shared memory processing (SMP) systems, but Openmosix does not allow shared memory. So instead of putting a number after the hostname, we have to make ranimate start several independent processes on our machine, which can then migrate to other hosts after start-up. To do so, we do something incredibly simple: we add more host lines, each pointing to localhost, where the processes will be started. In our cluster with three nodes, we will have these lines:
host= localhost
host= localhost
host= localhost
Now ranimate will start three rendering processes on localhost, but these are all independent and do not share memory. The cached ambient data in the file scene.amb can still be accessed by all processes, but instead of using NFS for the distribution of the data, all processes see this shared file as if on a local file system. To watch the rpict processes, we can use the top equivalent in Openmosix clusters called mtop, or see a graph showing the system load with mosmon.
Troubleshooting
The bad news first. If you want your cluster to appear as a single system, network stability is a must. Do not expect an efficient cluster if you use unreliable hardware, a broken network setup, or unsupported configurations such as different versions of the Linux kernel in one cluster. Also, all the nice and fast processors in your network can't be used without memory. If your super-fast number-crunching node has just 512MB RAM installed, but your rendering process, started on your poor slow laptop, takes 640MB of memory, it will stay on your laptop. Memory gets especially tricky if you use diskless nodes (or nodes that have disks but do not use them, such as CD-booted systems). We are used to having our systems swap when there is not enough memory installed, but a system without a swap partition cannot fall back on this virtual memory and will kill processes instead.
The good news is that ranimate uses the recovery features provided by Radiance and also does some error checking. If your rendering processes broke and you simply invoke ranimate again, it finds out that there has been a problem and will try to recover the results of the interrupted process, then render the unfinished pictures frame by frame. As the image files contain information about the rendering parameters (try getinfo on an unfinished frame), rpict can continue rendering a frame, taking views and options from the image file. But if you are in a hurry, you will not want to wait while your hundred pictures are all rendered serially on your local host and your cluster is doing nothing. To accelerate the process, we will start the rpict processes manually; but as we are experienced users, meaning we have made enough stupid mistakes with Radiance in the past, we will first make a back-up of the unfinished frames! It is always worth taking a copy of your rendering results, as Radiance will usually be able to reuse what it has calculated once. We use tar, and you should be familiar with the syntax:
tar cvf ~/unfinished.tar results/*.unf
Now let's see what we can find out about the processes we had started. Take a look at the contents of the directory results, where ranimate was writing all its output before the process broke. You will find a file STATUS, showing the progress of the rendering task, as well as a very interesting file called render.opt. The content of this file is exactly what you need to make sure that the new process continues with the same settings the image was started with. Go to the directory from which you had started the processes, and start rpict with its recovery option and the file results/render.opt as option file:
rpict @results/render.opt -w0 -ro results/frame001.unf scene.oct
This will start rpict and make it continue where the previous process stopped. You can now start as many processes as you have processors available; Openmosix will distribute them over your cluster. You may want to send the processes to the background, or use nohup so that you can close your shell and still have the processes working. See the manpage of nohup for details. Of course, if the cause of the interruption is still present, rpict won't work reliably; but if the set-up of your cluster as well as of your scene is clean, you can get the renderings completed now. However, using multiple processes with ranimate is currently not possible when recovering; ranimate will always take the safe path after an error.
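To restart all unfinished frames in parallel by hand, a loop like the following can help. A sketch: the actual rpict call is commented out, so the loop only reports what it would launch until you have reviewed it:

```shell
#!/bin/sh
# Launch one recovery process per unfinished frame; with Openmosix,
# the processes will migrate across the cluster after start-up.
recover_frames() {
    for unf in results/*.unf; do
        [ -f "$unf" ] || continue       # skip if the glob matched nothing
        echo "recovering $unf"
        # nohup rpict @results/render.opt -w0 -ro "$unf" scene.oct \
        #     >/dev/null 2>&1 &
    done
}

recover_frames
```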