Distributed Processing

This page outlines one possible distributed processing model for Transcode.

The design targets for this model are ease of implementation and efficiency: N machines should be able to do the basic task of decoding/encoding in something approaching t/N time.

The component strategies are of two types:

  1. distributed decoding/encoding of video/audio streams (one machine decodes, another encodes).
  2. distributed transcoding of chunks of video stream (each slave machine processes t/N seconds of video).
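
As a rough illustration of the second strategy, the sketch below divides a stream into N contiguous chunks, one per machine. The proc name chunk_ranges, the whole-second granularity, and the 3600 second example stream are assumptions made for illustration; none of them come from Transcode itself.

```tcl
# Split a stream of $total whole seconds into $n contiguous chunks,
# one per machine. Returns a list of {start length} pairs. The first
# ($total % $n) chunks get one extra second, so the chunks cover the
# stream exactly with no gaps or overlap.
# (Hypothetical helper, not part of Transcode.)
proc chunk_ranges { total n } {
   if { $n < 1 } { error "need at least one machine" }
   set base  [ expr { $total / $n } ]
   set extra [ expr { $total % $n } ]
   set start 0
   set ranges [ list ]
   for { set i 0 } { $i < $n } { incr i } {
      set len [ expr { $base + ( $i < $extra ? 1 : 0 ) } ]
      lappend ranges [ list $start $len ]
      incr start $len
   }
   return $ranges
}

# Assumed example: a 3600 second stream on 4 machines.
puts [ chunk_ranges 3600 4 ]
# prints: {0 900} {900 900} {1800 900} {2700 900}
```

In practice the chunk boundaries would probably have to be aligned to frame positions that the decoder can seek to; the sketch does not attempt that.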

MPI/PVM are not components of the proposed methods.

Gigabit Ethernet connectivity between the machines is assumed.

Stay tuned for more.


The following example mid-level script can be used to execute all the remote steps required for setup, execution, takedown, and remote log recovery, using ssh with ssh-agent:

#!/usr/bin/tclsh
#
# Execute arbitrary commands on all cluster nodes in
# /etc/nodes OR scp local files/dirs to a target
# location on all nodes.
#
# Example calls:
#
#  ./do-on-all-nodes.tcl 'ps -Ao fname |grep ash'
#
#  ./do-on-all-nodes.tcl scp /local/file1 /local/file2 /remote/dir
#
# The file /etc/nodes contains a list of machine names,
# one per line, and comments delimited with the '#' character.
# Any line beginning with '#' is ignored, and comments
# may appear after valid entries.
#
# Examples of /etc/nodes entries (first example should be
# read as beginning in column zero of the /etc/nodes file):
#
# node1
# node2
# #node3
# #node4 # removed from service
# node5 # returned to service 05/05/06 with new dimm 0. 
#
# In these examples nodes 3 and 4 will be ignored when the
# command is executed.
#
# By: Phil Ehrens <pehrens@ligo.caltech.edu>
#

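set nodelist /etc/nodes
# ssh: no X11 forwarding, no stdin, never prompt, short connect timeout.
set ssh "/usr/bin/ssh -x -n -obatchmode=yes -oconnecttimeout=2"
# The escaped variables are left unsubstituted here. In the loop below,
# the outer eval of "eval eval exec $scp" substitutes them for each node
# and the inner eval splits the list of local files into separate words.
set scp "/usr/bin/scp -r -B -p -q -o ConnectTimeout=2 \$local \${node}:\$target"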

set fid [ open $nodelist r ]
set data [ read $fid [ file size $nodelist ] ]
close $fid

puts stderr "\n\nCommand: $argv"
puts stderr "\nreading node names from $nodelist\n"

foreach line [ split $data "\n" ] {
   set result [ list ]
   set node [ lindex [ string trim $line ] 0 ]
   if { ! [ string length $node ] } { continue }
   if { [ string match #* $node ] } {
      set msg "skipping line: '$line'"
   } elseif { [ string match scp [ lindex $argv 0 ] ] } {
      set local [ lrange $argv 1 end-1 ]
      set target [ lindex $argv end ]
      catch { eval eval exec $scp } result
      set msg "scp $local -> $target"
   } else {
      catch { eval exec $ssh $node $argv } result
      set msg "ssh $node $argv"
   }
   if { [ string length $result ] } {
      set msg "$msg : '$result'"
   }
   puts stderr $msg
}
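
The /etc/nodes parsing rule described in the script header can be tried on its own, without contacting any machines. The sketch below pulls that one step out into a proc; the name node_from_line and the sample lines are assumptions made for illustration and do not appear in the script above.

```tcl
# Return the node name found on one line of a nodes file, or an
# empty string for blank lines and for lines beginning with '#'.
# Uses the same lindex / string match logic as the loop above.
# (Hypothetical helper, for illustration only.)
proc node_from_line { line } {
   set node [ lindex [ string trim $line ] 0 ]
   if { [ string match #* $node ] } { return "" }
   return $node
}

# Assumed sample lines in the style of the header comment.
foreach line { "node1" "#node3" "#node4 # removed from service" "node5 # returned to service" "" } {
   puts "'$line' -> '[ node_from_line $line ]'"
}
```

Because lindex treats the line as a Tcl list, a trailing comment containing unbalanced braces or quotes would raise an error; this applies equally to the original loop.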

Last edited July 14, 2006 12:55 am by tarazed.ligo.caltech.edu