
When client starts:
======================

send_subscribe():
  R_SUBSCRIBE   =>   {broadcast}
  {anyone}  =>  ACK_SUBSCRIBE(src)  
  src => host_list


When client terminates:
==========================

send_unsubscribe():
  R_UNSUBSCRIBE  =>  {broadcast}


Normal operation:
===================

recv_subscribe():
  {anyone}  =>  R_SUBSCRIBE(src)
  host_list <- host_list + src
  if( last_message_time + resubscription_timeout < now )
    send_subscribe(src)
  last_message_time <- now

send_subscribe(host):  // must be called at regular intervals
  R_SUBSCRIBE(self) => host

recv_unsubscribe():
  {anyone}  =>  R_UNSUBSCRIBE(src)
  host_list <- host_list\src
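The subscribe/unsubscribe bookkeeping above can be sketched as follows. This is an illustrative model, not the real jobd code: the class and method names (`Membership`, `on_subscribe`, ...) are invented, and the pseudocode leaves it ambiguous whether `last_message_time` is global or per-peer; this sketch keeps it per-peer.

```python
import time

class Membership:
    """Tracks cluster membership per the subscribe/unsubscribe pseudocode."""

    def __init__(self, resubscription_timeout=30.0, clock=time.monotonic):
        self.host_list = set()
        self.last_message_time = {}   # per-peer timestamp of last message seen
        self.resubscription_timeout = resubscription_timeout
        self.clock = clock

    def on_subscribe(self, src, send_subscribe):
        # {anyone} => R_SUBSCRIBE(src): record the sender, and re-announce
        # ourselves to it if we have been silent past the timeout.
        now = self.clock()
        last = self.last_message_time.get(src, float("-inf"))
        self.host_list.add(src)
        if last + self.resubscription_timeout < now:
            send_subscribe(src)       # R_SUBSCRIBE(self) => src
        self.last_message_time[src] = now

    def on_unsubscribe(self, src):
        # {anyone} => R_UNSUBSCRIBE(src): host_list <- host_list \ src
        self.host_list.discard(src)
        self.last_message_time.pop(src, None)
```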


send_job_alloc(type):
  nodeset  <-  host_list
  do 
    h <- find_best_node_for(type)
    R_ALLOC(type)  =>  h
    h  =>  ACK_ALLOC, return ok
    || h =>  NACK_ALLOC or timeout,  nodeset <-  nodeset\h
  repeat
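The retry loop above can be sketched like this, with the termination case (every candidate refused) made explicit. All parameter names are illustrative; `request_alloc` stands in for the R_ALLOC send plus waiting for the ACK/NACK reply.

```python
def send_job_alloc(job_type, host_list, find_best_node_for, request_alloc):
    """Try the best candidate node; on NACK or timeout, drop it from the
    candidate set and try the next one, per the pseudocode above."""
    nodeset = set(host_list)
    while nodeset:
        h = find_best_node_for(job_type, nodeset)
        ack = request_alloc(h, job_type)   # R_ALLOC(type) => h
        if ack is not None:                # h => ACK_ALLOC(serial#)
            return h, ack
        nodeset.discard(h)                 # h => NACK/timeout: nodeset \ h
    return None                            # no node could take the job
```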


recv_job_alloc(type):
  {anyone}  =>  R_ALLOC(src,type)
  if(alloc_ok(type))
    ...generate serial#...
    ACK_ALLOC(serial#)  =>  src
    ...update local metric...
    send_new_metric()
    ...prepare accept TCP with serial#...
  else
    NACK_ALLOC  =>  src
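The receiver side can be sketched as below. Names are illustrative, not from the real jobd; `send` stands in for a UDP reply and `on_metric_change` for the send_new_metric() broadcast.

```python
import itertools

_serials = itertools.count(1)    # per-node serial# generator (illustrative)

def recv_job_alloc(src, job_type, alloc_ok, send, on_metric_change):
    """ACK with a fresh serial# only if the job fits current (real)
    limits; otherwise NACK, per the pseudocode above."""
    if alloc_ok(job_type):
        serial = next(_serials)
        send(src, ("ACK_ALLOC", serial))
        on_metric_change()       # local metric changed: send_new_metric()
        return serial            # caller prepares the TCP accept for this serial#
    send(src, ("NACK_ALLOC",))
    return None
```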


send_new_metric():
    foreach i in host_list
      R_NEWMETRIC  =>  i       // doesn't have to succeed, as remote metrics are guidance only

recv_new_metric():
  {anyone}  =>  R_NEWMETRIC(src)
  ...just update local state...
  last_message_time <- now

req_new_metric(host):
   REQ_NEWMETRIC => host

recv_req_new_metric():
   {anyone}  => REQ_NEWMETRIC(src)
   metrics => src     // reply with current metric state

// same calls for limits as for metrics

send_jobtype_limits():
  foreach i in host_list
     foreach jobt in job_limits
        R_NEWLIMITS(job_limit) => i   // will be resent after every N send_new_metric() calls

recv_jobtype_limits():
   {anyone} => R_NEWLIMITS(src,lims)
   ...validate src...
   ...register new lims...
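The metric and limit fan-outs share the same best-effort shape, which a short sketch makes concrete. Illustrative names; the tuples stand in for the UDP datagrams.

```python
def broadcast_state(host_list, metric, job_limits, send_to):
    """Best-effort fan-out used by send_new_metric() and
    send_jobtype_limits(): delivery failures are ignored, since remote
    metrics are guidance only and limits are resent periodically anyway."""
    for host in host_list:
        try:
            send_to(host, ("R_NEWMETRIC", metric))
            for limit in job_limits:
                send_to(host, ("R_NEWLIMITS", limit))
        except OSError:
            pass          # lost updates are corrected by the next round
```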


==========================================
Protocol notes:
  Each node holds the metrics received from all other nodes. This is
used when allocating jobs.  But a node cannot allocate a job slot on a
remote node without acknowledgement, and this acknowledgement will
only be given if the job can be spawned within the node's current
(real) metric constraints.  Should a node attempt an allocation based
on obsolete metric information, the remote node will send a NACK, and
the node will retry the calculation+request in the node set _minus_
the node on which the allocation failed.

Usage:
  At node initialization, send_subscribe() must be called.  At job
start, send_job_alloc() is used to allocate a job slot.  Periodically
metrics are updated to reflect the state of the cluster; this means
that send_new_metric() must be called periodically whether there are
jobs scheduled or not.

==========================================
Job execution:
  make will execute a ``remote compile'' command, which will contact
the local jobd daemon via a UNIX socket. The local jobd daemon is told
to execute the job.
  First a job slot is allocated using send_job_alloc(). When a
slot is successfully allocated on some host (h), the remote jobd will
send an ACK holding a seq# of the allocated job slot, and a port number
on which a process will be awaiting the rant connection. The local
jobd will send back the seq# of the job slot and the port number to
the rant command.
  The remote jobd will have spawned a job-handler sub-process that is
listening on the previously specified port. This process will be killed
by the antsd after a few seconds if no connection is made to it.
  Now rant can start up a TCP connection to the remote jobd daemon
validating the connection by sending the seq# back to the remote
jobd daemon.
  If no TCP connection is received at the remote end some N seconds
after the seq# of the allocated job slot was sent away, the remote antsd
daemon will kill the sub-process spawned, and reclaim the job-slot.
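The seq#-validated TCP handshake with the reclaim timeout can be sketched as below. This is illustrative code, not the real jobd/antsd implementation; the real daemon spawns a sub-process rather than a thread, and the wire format of the seq# is an assumption.

```python
import socket
import threading

def serve_job_slot(serial, timeout=5.0):
    """Remote-side sketch: listen on an ephemeral port for the rant
    connection; if nobody presents the right seq# within `timeout`
    seconds, give up so the caller can reclaim the job slot."""
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    srv.settimeout(timeout)
    port = srv.getsockname()[1]
    result = []

    def handler():
        try:
            conn, _ = srv.accept()
            conn.settimeout(timeout)
            got = conn.recv(64).decode()
            result.append(got == str(serial))   # validate the seq#
            conn.close()
        except socket.timeout:
            result.append(False)                # no connection: reclaim the slot
        finally:
            srv.close()

    t = threading.Thread(target=handler)
    t.start()
    return port, t, result

def connect_job_slot(port, serial):
    """Local rant side: open the TCP connection and send the seq# back."""
    c = socket.socket()
    c.connect(("127.0.0.1", port))
    c.sendall(str(serial).encode())
    c.close()
```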

(not yet implemented:)
  Environment from the remote compile command is transferred to the
remote jobd daemon over the TCP connection. The job will be started on
the remote node, and stderr+stdout from the job will be sent back
over the TCP connection to the local remote-compile command which will
just output it.

========================================
Local Job Spawn:
  
rcmd()...
 - old crap = slow
 - suid jsh
 + proven tech.

jobd feature ?
 - security
 + simplicity & performance

jobd will execute the command given in the established input stream
if the serial# matches etc.

====================================
Security:
When jsh requests a seq# from the local jobd, the jobd will check
the UID of the jsh.   This UID is sent to the remote jobd, which will
check that the source port of the received packet is a privileged port
(and eventually some key / sequence number / whatever).  When a seq#
is returned to the local jobd, it forwards the seq# to the jsh, which
then opens a TCP connection to the remote jobd.  The remote jobd already
knows the UID to spawn the job as (and it could even check uid <-> port
mapping using identd).

Attacks: If the attacker sends a spoofed UDP datagram (spoofed to an
address which is valid in the cluster) to a server, _and_ manages to
sniff the packet sent back to the spoofed address (holding the
serial#), the attacker could launch code as any user from the valid
cluster machine (with the spoofed source address).

No purely remote attack is possible. A successful attack requires
that the attacker can guess the serial# being sent, can send
arbitrary UDP datagrams to the target machine, and has program execution
capability on one machine in the cluster.

Alternatively, if the attacker has physical access to the network
coupling the nodes, program execution capability on one of the nodes
is unnecessary.

====================================
Configuration:

Each node must have a list of allowable (trusted) nodes. The jobd
system is based on mutual trust between the daemons.

We use /etc/jobd.hosts as this white-list of hosts.

Each job-type has its own resource limits etc. Those can be configured
either in a /etc/jobd.conf file, or auto-detected by the jobd.
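A minimal reader for the white-list might look like this. The source only says /etc/jobd.hosts is a list of allowable nodes; the one-hostname-per-line format with `#` comments is an assumption.

```python
def load_trusted_hosts(path="/etc/jobd.hosts"):
    """Parse the white-list of trusted nodes: one hostname per line,
    blank lines and '#' comments ignored (format is an assumption)."""
    hosts = set()
    with open(path) as f:
        for line in f:
            line = line.split("#", 1)[0].strip()
            if line:
                hosts.add(line)
    return hosts
```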


