Distributed Processing

This page outlines one possible distributed processing model for Transcode.

Design targets for this model will be ease of implementation and efficiency (N machines should be able to do the basic task of decoding/encoding in something approaching t/N time).

The component strategies are of two types:

  1. distributed decoding/encoding of video/audio streams (one machine decodes, another encodes).
  2. distributed transcoding of chunks of video stream (each slave machine processes t/N seconds of video).
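
As a rough illustration of the second strategy, the sketch below divides a clip into N contiguous chunks, one per node. It is not part of the proposed model as written: the proc name chunk_ranges, the use of frame counts rather than seconds, and the rule for handing out leftover frames are all assumptions made for the example. How each range would then be passed to transcode on the remote node is left out.

```tcl
#!/usr/bin/tclsh
#
# Split a clip of total_frames frames into contiguous chunks,
# one per node. Returns a list of {node first_frame last_frame}.
# (proc name and frame-based units are illustrative assumptions)
#
proc chunk_ranges { nodes total_frames } {
   set n [ llength $nodes ]
   if { $n == 0 } { error "no nodes given" }
   set base   [ expr { $total_frames / $n } ]
   set extra  [ expr { $total_frames % $n } ]
   set start  0
   set i      0
   set ranges [ list ]
   foreach node $nodes {
      # the first $extra nodes each take one additional frame
      set len [ expr { $base + ( $i < $extra ? 1 : 0 ) } ]
      # nodes left with nothing to do get no entry at all
      if { $len > 0 } {
         lappend ranges [ list $node $start [ expr { $start + $len - 1 } ] ]
      }
      incr start $len
      incr i
   }
   return $ranges
}

# 1000 frames across three nodes
foreach r [ chunk_ranges { node1 node2 node5 } 1000 ] {
   puts $r
}
```

With 1000 frames and three nodes this prints "node1 0 333", "node2 334 666" and "node5 667 999". Equal-sized chunks only approach t/N if the machines are of similar speed; a mixed cluster would presumably want weighted chunk sizes.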

MPI/PVM are not components of the proposed methods.

Gigabit ethernet connectivity between the machines will be assumed.

Stay tuned for more.


Example mid-level script for running all of the remote steps (setup, execution, takedown, and remote log recovery) over ssh, with ssh-agent supplying the keys:

#!/usr/bin/tclsh
#
# Execute arbitrary commands on all cluster nodes in
# /etc/nodes OR scp local files/dirs to a target
# location on all nodes.
#
# Example calls:
#
#  ./do-on-all-nodes.tcl 'ps -Ao fname |grep ash'
#
#  ./do-on-all-nodes.tcl scp /local/file1 /local/file2 /remote/file
#
# The file /etc/nodes contains a list of machine names,
# one per line, and comments delimited with the '#' character.
# Any line beginning with '#' is ignored, and comments
# may appear after valid entries.
#
# Examples of /etc/nodes entries (first example should be
# read as beginning in column zero of the /etc/nodes file):
#
# node1
# node2
# #node3
# #node4 # removed from service
# node5 # returned to service 05/05/06 with new dimm 0. 
#
# In these examples nodes 3 and 4 will be ignored when the
# command is executed.
#
# By: Phil Ehrens <pehrens@ligo.caltech.edu>
#
# WARNING: It is possible for a remote machine to be in a crashed
#          state where an ssh connection will be established but
#          the client connection will hang forever doing a blocking
#          read - this is a known behaviour of ssh that is not
#          considered a bug.
#

set nodelist /etc/nodes
set ssh "/usr/bin/ssh -x -n -obatchmode=yes -oconnecttimeout=2"
set scp "/usr/bin/scp -r -B -p -q -o ConnectTimeout=2 \$local \${node}:\$target"

set fid [ open $nodelist r ]
set data [ read $fid [ file size $nodelist ] ]
close $fid

puts stderr "\n\nCommand: $argv"
puts stderr "\nreading node names from $nodelist\n"

foreach line [ split $data "\n" ] {
   set node [ lindex [ string trim $line ] 0 ]
   if { ! [ string length $node ] } { continue }
   if { ! [ string match #* $node ] } {
      if { [ string match scp [ lindex $argv 0 ] ] } {
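         # everything between "scp" and the last argument is a
         # local path; the last argument is the remote target
         set local  [ lrange $argv 1 end-1 ]
         set target [ lindex $argv end ]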
         catch { eval eval exec $scp } result
      } else {
         catch { eval exec $ssh $node $argv } result
      }
      puts stderr "$node : '$result'"
   } else {
      puts stderr "skipping line: '$line'"
   }
}
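
The WARNING in the script header notes that a crashed node can leave the ssh client hanging in a blocking read. One possible guard, sketched here and not part of the original script, is to run the command through a non-blocking pipe and give up after a fixed wall-clock limit. The names exec_with_timeout, xt_read and the global array xt are hypothetical, and unlike exec this version does not capture the command's stderr or exit status.

```tcl
#!/usr/bin/tclsh
#
# Run a command (given as a list) with a wall-clock limit in
# milliseconds. Returns the command's stdout, or raises an
# error if the limit expires first.
# (all names here are illustrative, not from the script above)
#
proc exec_with_timeout { cmd ms } {
   global xt
   set xt(out)   ""
   set xt(state) running
   set pipe [ open "|$cmd" r ]
   fconfigure $pipe -blocking 0
   fileevent $pipe readable [ list xt_read $pipe ]
   set timer [ after $ms { set ::xt(state) timeout } ]
   # run the event loop until xt_read or the timer changes state
   vwait ::xt(state)
   after cancel $timer
   if { [ string equal $xt(state) timeout ] } {
      # kill the stuck client; closing a non-blocking pipe
      # does not wait for the child to exit
      catch { eval exec kill [ pid $pipe ] }
      catch { close $pipe }
      error "timed out after $ms ms"
   }
   catch { close $pipe }
   return [ string trim $xt(out) ]
}

# fileevent handler: collect output until the pipe reaches eof
proc xt_read { pipe } {
   global xt
   append xt(out) [ read $pipe ]
   if { [ eof $pipe ] } {
      fileevent $pipe readable {}
      set xt(state) done
   }
}

# a command that finishes in time
puts [ exec_with_timeout { echo hello } 2000 ]

# a command that overruns its limit
if { [ catch { exec_with_timeout { sleep 5 } 500 } msg ] } {
   puts stderr "gave up: $msg"
}
```

In the loop above, the ssh branch could then call something like exec_with_timeout [ concat $ssh $node $argv ] 10000 inside its catch, with the 10 second limit chosen arbitrarily here.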

Edited June 20, 2006 1:08 am by tarazed.ligo.caltech.edu