Best Practice

5. ADM Render Farm


Latest Notice

Software versions currently supported:

Maya 2020, 2022, 2023 with Arnold

Maya 2024 only on classroom workstations (still use the farm to distribute)

Nuke 14.0 and 12.2

Houdini 19.5 with Mantra: Job submission is only possible through the Deadline Monitor. Submission from within Houdini is currently not supported. Redshift support is hopefully coming soon.


Red Button on Desktop: Please use it instead of logging off or shutting down. The workstation will automatically restart and join the network.




Distributed Rendering Introduction

We are using Thinkbox Deadline for distributed rendering

Submitting a job means sending your Maya or Nuke script to a Render Manager

The Render Manager will distribute your job to a network of computers, called the Clients, Nodes or Slaves

Your submitting workstation doesn’t have to stay turned on. Every PC workstation in the Animation area (FYP and classrooms) can join the network and become a render node. This is important during crunch time: we only have 24 dedicated render nodes but 120 workstations

After submitting your job, follow the progress and manage your job with the Deadline Monitor


Requirements

Video files such as QuickTime can only be rendered by one machine. Render image sequences to make use of distributed rendering or, if a video file is needed, limit the job to one machine

Jobs can be submitted directly from Maya and Nuke

All source files used in your setup, such as footage, textures and referenced models, have to be stored in a network location. If one source file is inaccessible, e.g. still on your desktop, the render job will fail

Accepted source locations are all network shares: animation, share, resources, projects, projects-fyp

The render destination has to be on the network, too. Make sure the Maya project path is set correctly for a network share

Follow the File Naming Convention and avoid spaces and special characters in all folder and file names
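
A minimal sketch, in Python, of how the naming and location rules above could be checked before submitting. Only the share names come from this page. The server name `server` used in the examples and the exact set of "safe" characters are assumptions for illustration.

```python
import re
from pathlib import PureWindowsPath

# Share names listed above. How they map to servers or drive letters is not
# stated on this page, so only the share name itself is compared.
ACCEPTED_SHARES = {"animation", "share", "resources", "projects", "projects-fyp"}

# Assumption: "safe" means letters, digits, underscore, hyphen and dot only.
SAFE_NAME = re.compile(r"^[A-Za-z0-9_.\-]+$")


def unsafe_parts(path: str) -> list[str]:
    """Return folder and file names in `path` containing spaces or special characters."""
    win_path = PureWindowsPath(path)
    parts = win_path.parts
    # Skip the anchor (drive letter or \\server\share) if the path has one.
    names = parts[1:] if win_path.anchor else parts
    return [name for name in names if not SAFE_NAME.match(name)]


def on_accepted_share(path: str) -> bool:
    """True if `path` is a UNC path whose share name is in ACCEPTED_SHARES."""
    drive = PureWindowsPath(path).drive  # '\\\\server\\share' for UNC paths
    if not drive.startswith("\\\\"):
        return False
    share = drive.rstrip("\\").split("\\")[-1]
    return share.lower() in ACCEPTED_SHARES
```

For example, `unsafe_parts(r"C:\Users\me\Desktop\my scene.ma")` flags `my scene.ma`, and `on_accepted_share` rejects the same path because it is not on a network share.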


Rules

The render farm is a shared resource for all animation projects and during crunch time the render queue can get longer

You are required to monitor your job
Do not just submit and walk away, hoping everything will be fine. Monitor closely and wait for first frames to finish to confirm the job is working. Suspend failing jobs and check error reports

Don’t start a higher priority race
A few days in, everyone will submit with high priority. Increasing priority without limiting number of nodes is not allowed

Be considerate, don’t waste resources while others are queuing
Wasteful is:

  • Blocking farm with a loop of failing tasks instead of suspending
  • A first test render at full-size quality
  • Long render without having done a short test first
  • Extremely long render times per task on many slaves

Join your workstation to the render network
If the queue is long, let your workstation join the render network before going home
See Deadline Slave – Join your Workstation for how to do this

FYP projects granted priority
Every FYP team has one nominated member with elevated permissions to control jobs


Deadline Parameters Explained

A brief introduction to the most relevant parameters, which you need to set during job submission. They can be changed in the Monitor after submission while rendering or queuing

Priority

Higher priority comes first, but it’s not the only factor determining the queue order. Submission time and task size weigh in as well
We have rules on setting the job priority, e.g. average Maya job: Priority 50
More in Priorities, Machine Limit and Task Size

Machine Limit

The maximum number of slaves that render the job. The 24 dedicated render slaves have to be shared. If other jobs are in the queue, the limit is 8

Pool

Select the software package to render with, e.g. maya, nuke, houdini, etc.
Not every render farm slave or workstation has all software installed. Selecting the Pool ensures your job is not sent to a machine that lacks the software needed

Group

Select the group of slaves to render on

Recommended selection: farmplus

farm: The 24 dedicated render slaves, ADM 1 – 24 only
farmplus: The 24 farm nodes and all rarely used workstations
64gb: Only machines with 64 GB RAM
all: All workstations including classrooms, e.g. B1-5G

Frames Per Task
The job is distributed to all slaves in task chunks, which can be 1 frame or many. After a task is completed, the slave requests a new task. This can take a moment, so using 1 frame per task for fast renders wastes time on getting a new assignment and reloading the same setup
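
The effect of Frames Per Task on fast renders can be sketched with a simple model in which every task pays a fixed start-up cost for the new assignment and scene reload. The model and the 0.5 minute overhead in the usage note are assumptions for illustration, not measured values from this farm.

```python
import math


def job_machine_minutes(frames, minutes_per_frame, frames_per_task, overhead_per_task):
    """Total machine time when every task pays a fixed start-up overhead.

    Hypothetical model: overhead_per_task covers requesting the task and
    reloading the scene, in minutes.
    """
    tasks = math.ceil(frames / frames_per_task)
    return frames * minutes_per_frame + tasks * overhead_per_task
```

With 100 frames at 1 minute per frame and an assumed overhead of 0.5 minutes per task, 1 frame per task costs 150 machine minutes, while 10 frames per task costs 105.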

Priorities, Machine Limit and Task Size

First come, first served does not always work on a render farm. One 3D job takes an hour per frame, another just 10 minutes, and a simple compositing job maybe only one minute

Your job has a long task time – play fair and don’t block the entire farm

Lower your job’s Priority and Machine Limit and accept that other jobs need to render too. Submitting a render with many tasks and long task time, and high priority, without machine limit, is unfair


Which Priority, Machine Limit and Task Size should I use?

Calculate your total render time: 10 minutes per frame x 100 frames = 1000 minutes

Task Size: Aim for about 10 minutes per task. Divide 10 by the render time per frame in minutes, e.g. 10 / 5 min per frame = Task Size 2
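
The two calculations above can be sketched in Python as follows. The function names are made up for this example, and rounding to a whole number of frames with a minimum of 1 is an assumption.

```python
def total_render_minutes(minutes_per_frame, frames):
    """Total render time of the job on a single machine, in minutes."""
    return minutes_per_frame * frames


def task_size(minutes_per_frame, target_task_minutes=10):
    """Frames per task so that one task takes roughly target_task_minutes.

    Never returns less than 1, so slow frames get one frame per task.
    """
    return max(1, round(target_task_minutes / minutes_per_frame))
```

`total_render_minutes(10, 100)` gives 1000, and `task_size(5)` gives 2, matching the examples above.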


Fast
<200 min

Short Duration and Task Time
2 minutes per frame x 100 frames = 200 min
On 6 slaves done in 30 minutes
Average compositing job

Priority >60
Machine Limit 6
Task Size 5-10

Average
<2000 min

Average Duration and Task Time
20 minutes per frame x 100 frames = 2000 min
On 10 slaves done in 3 hours

Priority 50
Machine Limit 10
Task Size 1-5

Slow
>2000 min

Long Duration or Task Time
30 minutes per frame x 100 frames = 3000 min
On 8 slaves done in 6 hours

Priority <50
Machine Limit 8
Task Size 1
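
As a rough sketch, the three categories above can be turned into a small Python lookup. Treating the 200 and 2000 minute boundaries as inclusive is an assumption based on the examples, and the wall-clock figure is an ideal value that ignores queuing and task overhead.

```python
def suggest_settings(minutes_per_frame, frames):
    """Map total render time onto the Fast / Average / Slow guidelines above."""
    total_minutes = minutes_per_frame * frames
    # Assumption: boundaries are inclusive, matching the 200 and 2000 minute examples.
    if total_minutes <= 200:
        category, priority, machine_limit, size = "Fast", ">60", 6, "5-10"
    elif total_minutes <= 2000:
        category, priority, machine_limit, size = "Average", "50", 10, "1-5"
    else:
        category, priority, machine_limit, size = "Slow", "<50", 8, "1"
    return {
        "category": category,
        "total_minutes": total_minutes,
        "priority": priority,
        "machine_limit": machine_limit,
        "task_size": size,
        # Ideal wall-clock time if all allowed machines render in parallel.
        "wall_clock_minutes": total_minutes / machine_limit,
    }
```

For the Slow example, `suggest_settings(30, 100)` reports 3000 total minutes and 375 wall-clock minutes on 8 machines, which is a little over 6 hours.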



Maya Job Submission

Locate Deadline Shelf

Green icon is the Deadline Submitter

Submit to Deadline

Be patient, the window might take a few seconds to open

1 Pool: Software + version used, e.g. maya, nuke

2 Group: Select farmplus or, during crunch time, all

3 Priority: Set to 50. Average 3D jobs shall be submitted with priority 50. Slow jobs have to go lower!

4 Machine Limit: Set to 8 to fairly share the farm with others. If the render queue is empty, you are allowed to set to 0, which uses all nodes

5 Comment: Let others know that you are using only a certain number of nodes or that you have an urgent but very fast render job and therefore increased the priority

6 Frames Per Task: Set to 1. Increase if render time per frame is very short, e.g. at 2 minutes per frame, set Frames Per Task to 5 so that one task takes 10 minutes

7 Project and Output Path: Confirm these are network paths. If not, change them

8 Submit Maya Scene File: Yes, enable. Don’t enable it if your setup contains relative paths

9 Strict Error Checking: Disable

Submit Job: Send the job and close the window manually
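
For orientation only: Deadline also accepts plain-text job info files for manual submission, and the settings above correspond to keys in such a file. The Python sketch below builds that text. The key names follow Deadline's manual job submission format and should be checked against the documentation for the installed version. The job name, frame range and plugin name in the example are hypothetical. The submitter on the Maya shelf remains the way to submit on this farm.

```python
def job_info_text(name, frames, pool="maya", group="farmplus", priority=50,
                  machine_limit=8, frames_per_task=1, comment="",
                  plugin="MayaBatch"):
    """Build the text of a Deadline job info file from the submission settings.

    Defaults mirror the recommendations above for an average Maya job.
    The plugin name is an assumption and depends on the farm setup.
    """
    lines = [
        f"Plugin={plugin}",
        f"Name={name}",
        f"Comment={comment}",
        f"Pool={pool}",
        f"Group={group}",
        f"Priority={priority}",
        f"MachineLimit={machine_limit}",
        f"ChunkSize={frames_per_task}",  # Frames Per Task
        f"Frames={frames}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical job name and frame range.
    print(job_info_text("shot010_beauty_v01", "1-100", comment="limited to 8 nodes"))
```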



Nuke Job Submission

Locate Render Menu

Select Submit Nuke Job To Deadline

Be patient, the window might take a few seconds to open

Submit to Deadline

1 Pool: Select nuke, or if available, the specific version

2 Group: Select farmplus

3 Priority: Set to 60-70. Nuke jobs are usually fast and are allowed to use higher priority

4 Machine Limit: Set to 6-8 to fairly share the farm with others. If the render queue is empty, you are allowed to set to 0, which uses all nodes

5 Comment: Let others know that you are using only a certain number of nodes or that you have an urgent but very fast render job and therefore increased the priority

6 Frames Per Task: Set to 10. If render time per frame is 1 minute, one task will take 10 minutes

7 Submit Nuke Script File: Yes, enable. Don’t enable it if your setup contains relative paths



Deadline Applications

These are the 3 Deadline apps on your workstation

Monitor: Control jobs and monitor render queue

Slave: Make your workstation a render node

Launcher: Icon in the notification area to control Deadline settings and start the apps above

Start Deadline Launcher

The Deadline Launcher is a taskbar icon only. It might be hidden if you haven’t changed the icon behavior

Launcher Taskbar Icon

Click the Show hidden icons up-arrow if the icons are not visible

Change Icon Behavior

Change to show icon and notifications

Way more convenient with icons visible

Note

If you can’t start the Deadline Launcher, it means another user is still signed in to the workstation, as only one account can run Deadline apps. Restart the workstation or, if you have admin permissions, sign off the other user

Task Manager

In the Task Manager > Users tab: Confirmed, another user is still signed-in

With admin permissions, right-click on user and Sign Off



Deadline Launcher

Right-click on Launcher icon

3 menu items are useful for us

Launch Monitor

Launch Slave

Launch Slave at Startup: The green box here means it’s enabled. On your workstation, you usually want it disabled, without green box


Deadline Monitor


Job Window

Showing all jobs rendering and queuing, called the Render Queue
Also showing completed, suspended and failed

Completed jobs are auto-archived after 3 days. Please delete or archive suspended and failed jobs yourself

Enable Ego-Centric Sorting and sort by Status

Locate your job and check how many jobs are queuing ahead of you

If slaves are available, your job should start immediately


Job Control

Right-click on job

Suspend Job: Pauses job
Menu will then offer Resume Job

Modify Job Properties: Opens Job Properties window to change many submission parameters such as Pool, Group, Priority, Machine Limit, Machine Whitelist and Blacklist etc.

View Job Report: If your job shows an error count, read the error reports to debug

Job Output: Jump right to the output folder

Task Window

How long does one task render?
Are these expected or suspiciously long times?
Do certain slaves fail?

Right-click on task to access task report, jump to output folder or re-queue task


How to Monitor?

Do not just submit and walk away, hoping everything will be fine. Monitor closely and wait for first frames to finish and confirm the job is working

 

Is my job rendering?

If there is already a finished frame, check the output immediately to confirm the render is good

 

Does my job create errors?

Or even fail, maybe on certain slaves? Read the error report and investigate
Suspend your job and avoid blocking the farm with a loop of failing tasks

 

How long are the render times per task?

Is there enough time for the available machines to finish the job by your deadline? If not, resubmit with a smaller render size and lower quality settings

Are any tasks unusually longer than others? Maybe that slave is hanging or it’s a slower classroom workstation. Remove that slave from your job by adding it to the Blacklist (see below) and re-queue the task
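
To judge whether the job can finish in time, a rough estimate can be made from the task times shown in the Task Window. The Python sketch below assumes every remaining task takes about the same time and that the number of machines stays constant, which is a simplification.

```python
import math


def estimated_minutes_remaining(tasks_left, minutes_per_task, machines):
    """Rough wall-clock estimate: tasks are rendered in rounds of `machines` at a time."""
    if machines < 1:
        raise ValueError("machines must be at least 1")
    rounds = math.ceil(tasks_left / machines)
    return rounds * minutes_per_task
```

For example, 100 remaining tasks at 30 minutes each on 8 machines need 13 rounds, so about 390 minutes.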


Monitor Error Count

This job has created 13 errors. The limit is 100; after that, the job will fail

Job Report

If your job creates errors, read job or task report

Here it’s workstation S3D70 causing problems

Bad slaves are marked and excluded automatically, but it’s better to add them to the blacklist so they don’t inflate the error count unnecessarily


Job Properties, Black & Whitelist

Open Job Properties with right-click on job

In Machine Limit, select slaves from the Slave List and add them to the right-side list, then specify whether this is a Blacklist or Whitelist

Other useful settings in Job Properties

Machine Limit: Limit how many machines render the job, here 8

General: Change Pool or Group

Dependencies: Link to another render job which has to finish first before this one starts

Failure Detection: List of slaves marked bad


Deadline Slave – Join your Workstation

During crunch time it is essential to join your workstation to the render farm. More slaves mean faster renders. We have 120 workstations, start as many as you can

Two easy ways to join a workstation to the farm

1. Start Slave from Launcher

If you don’t want to sign out of your account, this is the way

If you don’t see the Launcher icon, refer to section Deadline Applications to change the icon visibility behavior

Or simply search Start Menu for Deadline Slave

Deadline Slave Window

Shows rendering status and progress

Simply close window to un-join your workstation

If the render is close to 100%, don’t close the window. Instead use
Control menu > Stop After Current Task Completion

2. Restart in Render Account

Click the RenderStart icon on desktop

With one click, everything is done

The workstation will restart, automatically sign into our render account and join the render farm

Locked Workstation

The workstation is still signed in to our render account

If you want to use the workstation again, you need to restart

Select Switch Users

Restart Workstation

Click the options arrow of the Shut Down button and select Restart


Render Queue Order, Exceptions and FAQ

Render Queue Order

The render order is determined by Pool, Weight, and First-In-First-Out

Weight is calculated from Priority, the number of tasks, submission time and error count

So Deadline is smart: Priority is not everything. Job submission time and errors factor in, too
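
Deadline's weighted scheduling is described in its documentation as a weighted sum of priority, error count, time since submission and currently rendering tasks. The Python sketch below illustrates that idea. The weight values are made up, since the values configured on this farm are not stated here.

```python
def job_weight(priority, errors, seconds_since_submission, rendering_tasks,
               priority_weight=100.0, error_weight=-10.0,
               submission_weight=0.01, rendering_task_weight=-5.0):
    """Weighted sum used to rank jobs; a higher weight is served first.

    All four weight values are hypothetical. Negative weights mean that
    errors and already-rendering tasks push a job back in the queue.
    """
    return (priority * priority_weight
            + errors * error_weight
            + seconds_since_submission * submission_weight
            + rendering_tasks * rendering_task_weight)
```

With these example weights, two jobs at the same Priority 50 are ranked differently if one of them has already produced 13 errors and occupies 8 slaves: the clean, waiting job gets the higher weight.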




Exceptions

Exceptions for 3D jobs to go higher with priority can only be made for extremely short renders, such as a test, with Machine Limit below 4

If someone in front of you has a super long render and yours is a really quick one, limit your job to a few machines and go higher in priority


FAQ

What if there’s no other job in the queue, do I still need Machine Limit?

If you take all machines, the next job can only get any slaves at all by going higher in priority

Still, if no one else is rendering at all, take all machines by setting Machine Limit to 0, but use a priority no higher than 50. Also, please monitor the queue. If other jobs appear, decrease your Machine Limit in the Job Properties

 

Some slow jobs are in front of me, my job is not fast either, but I don’t want to wait 1 day, can I go ahead with higher priority?

Check the expected finishing time of the jobs in front. Is it really 1 day or maybe just a few hours? And wouldn’t it be good enough to have your result next morning? If so, why bother?

Otherwise, if you know who submitted them, talk to them and ask for a few slaves

 

There are too many equally important jobs of several FYP teams

First, talk to each other and complain about the crazy deadline 😉

If everything is equally urgent, please share the farm equally by assigning each team a number of slaves. If 24 slaves are available and 3 teams want to render, each team simply gets 8 slaves

You can always add more slaves by starting the Deadline Slave on classroom and FYP workstations. We have 120 workstations, start as many as you can. See Deadline Slave – Join your Workstation for how to do this
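
The equal split described above is a simple division. A small Python sketch, with a hypothetical function name, that also reports any slaves left over when the numbers don't divide evenly:

```python
def split_slaves(available_slaves, teams):
    """Return (slaves per team, slaves left over) for an equal split."""
    if teams < 1:
        raise ValueError("teams must be at least 1")
    return divmod(available_slaves, teams)
```

`split_slaves(24, 3)` returns `(8, 0)`, so each team sets Machine Limit 8. With 5 teams the result is `(4, 4)`, and the teams would need to agree on who gets the remaining 4.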


Optimizing Render Times

Don’t just render your first test sequence with the highest settings

Render size, quality, complex materials and motion blur are the most time-costly settings

 

What is the render for?

Keep in mind, most of the time the first render is not the final. Depending on the project, several versions will be rendered before it’s really final

Does the first render really have to be full resolution with best quality settings? Most likely not

When combining CG with live action in VFX projects, the CG will be slightly defocused in compositing, so rendering at full resolution is almost never necessary

Rendering 720p instead of Full HD might already cut the render time by half
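
The pixel counts behind this estimate can be checked with a short Python calculation. Render time does not scale exactly with pixel count, so treat the ratio as a rough guide only.

```python
def pixel_ratio(width_a, height_a, width_b, height_b):
    """Pixel count of resolution A as a fraction of resolution B."""
    return (width_a * height_a) / (width_b * height_b)


if __name__ == "__main__":
    # 720p (1280 x 720) compared to Full HD (1920 x 1080)
    print(f"{pixel_ratio(1280, 720, 1920, 1080):.0%} of the pixels")
```

720p has 921,600 pixels against 2,073,600 for Full HD, which is about 44% of the pixels to render.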

Decide first what you are trying to achieve with the render and which aspects you want to inspect before pushing the settings to full quality

Is it to check the animation or is it a first lighting pass for compositing?

 

Motion Blur

Enabling Motion Blur in Maya causes much slower renders, and you really have to push the quality settings to the max to get rid of the noise. In most cases, it’s better and much faster to render the beauty without motion blur but render the vector motion pass and use it to create the motion blur in Nuke. You can find many tutorials on this out there, here is one: Vector Motion Blur Tutorial on YouTube


Report Problems

If you see any problems, e.g. a slave is offline or hanging, please report to Prof Ben, Naga or your render farm work-study