Campus "Group Buy" for the Next High Performance Computing Cluster

ISU researchers interested in participating in a campus group purchase of the next high-performance computing cluster are encouraged to contact HPC steering committee chair Arun Somani or HPC operations lead Jim Coyle.

Researchers are responsible only for purchasing the compute nodes and storage they need, using grant, startup, departmental, or other funds. Infrastructure costs (racks, power, cooling, networking, and the HPC interconnect) will be covered centrally. The entire cluster will operate as a communal resource with proportional usage allocations: each researcher has access to the whole cluster, up to the total compute hours contributed by the nodes they purchased (and more when the cluster is underutilized).
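As a rough illustration of proportional allocation (the exact accounting policy is set by the HPC steering committee; the core count and hours below are hypothetical), a researcher's monthly compute-hour share could be estimated as:

```python
# Hypothetical sketch of a proportional compute-hour allocation.
# The actual accounting policy is defined by the cluster operators;
# cores_per_node and hours_per_month here are illustrative assumptions.

def monthly_allocation(nodes_owned: int, cores_per_node: int = 16,
                       hours_per_month: int = 730) -> int:
    """Core-hours per month contributed by a researcher's purchased nodes."""
    return nodes_owned * cores_per_node * hours_per_month

# A researcher buying the minimum group of 4 nodes:
print(monthly_allocation(4))  # 4 * 16 * 730 = 46720 core-hours
```

Under this scheme a researcher could burst beyond their steady-state share whenever other owners' nodes sit idle, which is the main efficiency argument for a communal condo cluster.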

The primary benefits of the Condo cluster for researchers are:

  • Significant cost reduction
    • No infrastructure costs.
    • No staffing needed to manage the cluster.
    • No staffing needed for diagnosis and repair of hardware problems.
    • Hardware purchases can be sized to average usage rather than peak usage.
  • Access to a major compute resource capable of solving problems that a small resource cannot.
  • Flexibility and scalability: until maximum capacity is reached, nodes can be purchased as need arises or funding becomes available.

The HPC steering committee is partnering with ISU Purchasing to issue a Request for Proposals to vendors and anticipates awarding the bid in late June 2014 with install by late August 2014. Costs from a recently acquired cluster are available below for your planning purposes.

Anticipated Node specifications:

  • Node: about $5,500/node, purchased in groups of 4.
    • Intel E5-2650 v2 processors, 2.0 to 2.6 GHz
    • 128 GB RAM
    • 3 TB RAID 1 local storage
    • 40 Gbit/s low-latency interconnect

Anticipated storage specifications (about $90K for a pair of fileservers):

  • Four usable 36 TB RAID-6 partitions across the pair
  • 500 MB/s peak throughput
  • High-availability RAID servers with redundant controllers.
  • Purchased in pairs with fileserver failover, to ensure continuous access to data.
  • A copy of each partition is made each night in case of catastrophic RAID or filesystem failure.

Supported communal software:

  • Intel compilers, including OpenMP and MPI support and the optimized Intel Math Kernel Library (MKL)
  • DDT parallel debugger
  • RedHat 6.x OS and current software for that OS.

June 15 Update: HPC Condo Cluster: Information for Participants [DOCX]

September 29 Update: Purchases may be made using the Condo Cluster Purchase Form.