####1. Current CSAIL Practices

(this was essentially a Q&A with Jon P from CSAIL)

- CSAIL faces constant, but generally non-targeted attacks via SSH
- They have not experienced targeted attacks on their OpenStack cluster
- Typical attacks are just people trawling for a chance to exploit known vulnerabilities

**CSAIL standard practices**

- secure the hosts
- look at logs for uneven spikes
- they do not currently use any honeypots
- they use sshguard "all over the place"
- limit outbound access to internal networks with iptables
    - internal networks = the whole building, there are a number of IPs; it's not restricted to specific admin workstations
    - Nothing currently prevents someone from coming onsite and plugging into their own port to bypass this
    - A number of non-OpenStack nodes are also on this network, such as Linux workstations used by students

**What CSAIL does with successful attacks**

- These nearly always involve one of the following:
    - Network services poorly written by a grad student
    - Web applications poorly written by a grad student
    - end users who have used the same password/credentials somewhere less secure, and they were compromised
- How they know a machine is affected
    - Spikes in inbound or outbound traffic
    - Sometimes an internal report from one of their systems
- When a successful attack is detected
    - reinstall OS
        - in the past, they have not worried about rootkits, etc., but it is a possibility
    - The level of response/analysis depends on staff/resources, and also on which system was attacked
        - on a web server, they track down what CGI scripts caused the problem and who wrote them
        - anything running arbitrary code that users write gets looked at more closely, to make sure the relevant code is not run again after OS reinstall

**Jon P's suggested baseline for MOC security procedures**

- Regular patch schedule
- use good passwords
    - for passwords you never type, use a 32-character random key string
    - enforce reasonable password complexity for users

***

####2. Network Design

- Initially, we are looking at 2 extremes
    - put it all on the Internet, with selective filtering
    - or, only nodes that definitely need access to the internet are publicly accessible.
- Separated Control Plane
    - When networking the physical host, an isolated control network is good practice
    - API endpoints on the internet are needed for people to talk to the server
    - Right now, CSAIL's internal and external endpoints are the same; there is not a separate internal network.

**Ceph Storage Networking**

- CSAIL Ceph has a backend network which is used for replication, and a frontend user network which is open to the internet
- MOC Fujitsu Ceph has 4 networks: storage, replication, management, and a network used by Fujitsu. None are publicly accessible.
- Ceph NIC distribution proposed (this is what CSAIL has right now)
    - one for OBM that doesn't go through the OS
    - one for external data network
    - one for internal replication
- There has not been a case at CSAIL where Ceph had such high bandwidth demand that they couldn't get in
    - if that happened, you could still get a virtual serial console through the OBM.
- On Lenovo Ceph we will have two pools:
    - Production
    - Big Data
- Ceph does not encrypt data network traffic
    - Suggestion: Data should be separate from the public interfaces

**Ceph Storage Vulnerabilities**

- If a compute node is compromised, the attacker will be able to see all the data going in and out.
    - It could be possible to design things so that there would still be components of the system that we trust.
- A successful attack on a compute node leaves Ceph vulnerable
    - the attacker could do anything Ceph can do, such as delete all data
    - Suggestion: Have a backup system that allows going back 1 day, or similar
    - Suggestion: put data traffic on a separate VLAN, so it can be encrypted
        - This would not work. Compute nodes have a Ceph key that has access to a large set of the Ceph data, and if attackers got this they could use it to get the data regardless of encryption
    - Suggestion: Isolate the VLAN traffic for each of the 2 OpenStack environments, so that if one is compromised the other is not.
        - Possibly not something that can be enforced
        - Two public IP addresses. Each side gets access to one, and not the other.
        - doing this on Fujitsu would require messing with the internal Fujitsu switch
        - Fujitsu does not need to support 2 environments in the near term, but we can explore this on Lenovo
        - the switch could be configured for DHCP snooping

*NB: Here there was a brief discussion that I did not fully capture in my notes:*

- Brief mention of network pass through; it was decided not to worry about this right now
- Use VLAN to create an isolated network over which VXLAN traffic will flow

**NETWORK DESIGN RESULT**

- Components
    - IPMI network for non-haas-managed resources
        - In terms of multiple CSAIL/MOC IPMI, if there is no security concern, we can do what makes the most sense.
        - Stuff in the same pod can be on the same IPMI network, stuff in different pods can have its own IPMI network
    - separate IPMI network for haas-managed resources
        - We can create non-administrative IPMI users for haas, which only have access to the specific set of actions needed by haas, not full access to IPMI.
    - external API endpoints
        - horizon, rgw
    - internal API endpoints
    - storage data VLAN
    - storage management VLAN
    - neutron networks (VXLANs)
- A second NIC may be useful for monitoring, but this will be discussed later as it is not a security issue.

**Access to IPMI from the OS**

- Can we / should we disable access to IPMI from OS?
- CSAIL has dedicated IPMI NICs, but also has OBM access from inside the OS
    - this is useful to access things like sensors
    - however, if the OS is compromised, this allows flashing the firmware, as well as OBM access to other machines
- Suggestion: the OS should access IPMI as a limited-access user, like we discussed above for haas
    - No. Credentials are only checked for remote access, so this is not possible
- Suggestion: Identify the uses of this, and see if there is a better way to do it that doesn't require accessing IPMI from the OS
    - e.g. there are utilities to get at sensor data
    - one thing CSAIL uses internal IPMI access for is to set up remote IPMI access, but this is a one-time task

**Switch Configuration**

- MIT's external networking setup should be rejecting things like BGP before it gets to us
    - However, the switch might be listening on that port, which could in theory be exploited
- Suggestion: we should turn off everything we have no use for on the switches
    - Does CSAIL do this?
        - it depends on the switch, but typically they start clean and make a new config
        - Garrett has a standard set of configs that he pushes, with tweaks to the IP addresses, etc.
    - UMASS does something similar to CSAIL
        - Joe can get us some of their standard configs to look at
    - Joe would like to talk to the person at Northeastern who configured the switches, because they have routing capability
        - they are currently routing the 192.168.28 network used by Fujitsu
        - Reportedly the switches were using a standard NU in-house config, with some extra things turned on for us by Anand
- Suggestion: Only turn on the very specific things we need, for both switches and router
    - We don't own the current router, NU owns it
    - Once we use the MIT router instead, we will have more control
    - Jon P: Once we get a list of VLANs together, we can talk to Garrett and get some numbers.
- Everything will have to be reconfigured after switching to the MIT router
- Discussion: Are we concerned that the switches might have gotten broken into?
    - They were previously open to the internet with a bad password. This has been fixed and the password changed.
    - They had old firmware with known vulnerabilities. Nothing has been done to fix any lower-level attacks that might have happened.
    - Question to Jon P: how comfortable are you with this?
        - Update the firmware if possible, to eliminate those known vulnerabilities; that should be fine.
    - Consensus on what to do:
        - write new configs from scratch
        - only trunk shared VLANs up
        - upgrade firmware
        - Joe has experience with these switches, so if he gets specs from Garrett he can work on them

***

####3. Responsibility for infrastructure

We have 3 sets of infrastructure. Who is in charge of hardware, firmware, etc.?

- CSAIL
    - CSAIL takes care of all their own stuff
- Northeastern
    - MOC team in charge of OS stuff
    - NU will do firmware, but we need to talk to them about how proactive they will be
- Harvard
    - we haven't talked to them in a while, but we should also ask them to proactively upgrade firmware
- In theory, the person with the support contract is the one who should be doing the firmware etc.
    - We need to clarify this with NU and HU.

***

####4. Route to the IPMI network

How do we keep this accessible while protecting it from attacks?

- Suggestion: 1 hardened box that you SSH into, then from there to the OBM
    - we should have 2, one as a backup
    - we currently have haas master, and cisco-emergency-recovery as a backup
    - No concerns with CSAIL having access to all the IPMI networks as well
- Suggestion: Could our hardened box be a virtual machine?
    - No. There are too many requirements before the VM is actually working; we want something that works low level.
    - Better to use a cheap box
    - Graphical access to the NU IPMI requires outdated Flash and Java, so that is best done from a VM
        - some of the BIOS stuff requires the graphical console, which is why we have this special VM.
        - On CSAIL Dell equipment, they can SSH to the DRACs and run 'console com2' to get a virtual console, which is easier
            - Maybe we did not fully explore alternatives and there is some way we can do this too
            - However, CSAIL's stuff is in the building, so they don't use a remote solution when going into BIOS.
            - The stuff CSAIL has at MGHPCC is dumb and does not require a graphical interface
- Suggestion: a few hardened boxes that are not very powerful, running VNC servers
    - you SSH into the boxes
    - you can then run a VM for the outdated Flash/Java stuff anywhere you want, and use port forwarding with VNC

***

####5. Publishing Security Practices

Should our practices be recorded on the public wiki, or do we keep them private?

- In theory, if our stuff is well locked down, recording how we locked it down doesn't matter
- Consensus: for the stuff we discussed today, the public wiki is probably fine
- There may be some things we need to put on the private wiki later
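
***

For the baseline item in section 1 about passwords you never type, a 32-character random key could be generated roughly as follows. This is only a minimal sketch: the function name `random_key` and the letters-plus-digits alphabet are assumptions for illustration, not something specified in the discussion.

```python
import secrets
import string

# Assumed alphabet: ASCII letters and digits only, so the key is safe to
# paste into config files without quoting. Not specified in the meeting.
ALPHABET = string.ascii_letters + string.digits


def random_key(length: int = 32) -> str:
    """Return a random key of `length` characters drawn from ALPHABET.

    Uses the `secrets` module, which draws from the operating system's
    cryptographically secure random source.
    """
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


if __name__ == "__main__":
    # Print one 32-character key, e.g. for a service account.
    print(random_key())
```

`secrets` is used here rather than `random` because `random` is not intended for security-sensitive values.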