###Discussed:###

  • OpenStack environment:
    • NFS, maintain state, etc using Salt.
    • Everything to provision and deprovision nodes could be done in Salt.
    • Idea of shared files:
      • May not be able to put hosts in shared directory.
      • Can put slurm.conf in shared directory
      • Different opinions on links to shared path to be a hard link or soft link. (May need to turn off SELinux to allow soft links.) This is a test for after the special sever VM (see below) is made.
    • Going forward: put things (for OpenStack) into “Salt form”.
      • This can be done via separate Salt recipes.
      • Figure out how to split up the salt scripts? Provide a list of separate scripts to be written?
  • With salt, monitoring Python can still exist!
    • Salt scripts get called in place of provision/deprovision scripts.
  • Instead of a hosts file, run a DNS server.
    • DNS server can run on same machine as salt-master.
    • Uses “NameD” for DNS server, though DNSmasq may be easier. Hosts (compute nodes) check if server is up, dynamically get hostname -> IP mapping
      • There’s a static file you put the DNSmasq server in. Find this out?
      • Ask BMI group, etc, how do they handle this? Probably using DNSmasq. Map ip address to hostname. How does node know what IP, hostname to get?
  • Remark on killing and resubmititng job: give users a choice.
    • find out if there’s an sbatch/srun option for this?

###To-do:###

  • Check on hard link vs soft link to be used to a path in shared in location
  • Check with BMI team on how to use Dnsmasq for IP and hostname mapping with the node
  • Explore the job resubmission options to without killing a job
  • This week: Special VM with salt-master, DNSmasq, NFS master. Separate VM as gateway.
  • Provision/deprovision from gateway:
    • Using salt-cloud, bring up 5 VMs.
    • Using salt-server, configure VMs.
  • Put the configuration into Salt formula
  • Down the line: Two different copies of this: one for engage1 work, one for BMI work.
  • Next week: Test slurm 16 (for Cloud features).