XSEDE ALLOCATION REQUESTS Open Submission available until April 15, 2021 for awards starting July 1, 2021

Mar 18, 2021

XSEDE is now accepting Research Allocation Requests for the allocation period, July 1, 2021 to June 30, 2022.

The submission period is from March 15, 2021 through April 15, 2021. The XRAC panel will convene on June 7, 2021, and notifications will be sent by June 15, 2021. Please review the new XSEDE systems and important policy changes (see below) before you submit your allocation request through the XSEDE User Portal.

More information on submitting an XSEDE Research request can be found at https://portal.xsede.org/allocations/research. That page links to details of the required documents, examples of well-written Research Requests, review criteria and guidelines, and other useful information on submitting a successful Research request.

New Resources Available: Open Storage Network, Johns Hopkins’ Rockfish, NCSA’s Delta, University of Kentucky’s KyRIC, PSC's Bridges-2, and SDSC's Expanse

Estimated Available Service Units/GB for upcoming meeting:

* Indiana University/TACC (Jetstream): 16,000,000
* Johns Hopkins’ Rockfish: TBA
* NCSA (Delta CPU): TBA
* NCSA (Delta GPU): TBA
* NCSA (Delta Storage): TBA
* Open Science Grid (OSG): 2,000,000
* Open Storage Network (OSN): TBA
* PSC Bridges-2 Regular Memory (Bridges-2 RM): 127,000,000
* PSC Bridges-2 Extreme Memory (Bridges-2 EM): 750,000
* PSC Bridges-2 GPU (Bridges-2 GPU): 189,000
* PSC Bridges-2 AI (Bridges GPU-AI): 363,000
* PSC Bridges-2 Storage (Bridges-2 Ocean): TBA
* SDSC Dell Cluster with AMD Rome HDR IB (Expanse): 170,000,000
* SDSC Dell Cluster with NVIDIA V100 GPUs NVLINK and HDR IB (Expanse GPU): 350,000
* SDSC Expanse Projects Storage: TBA
* TACC Dell/Intel Knight's Landing System (Stampede2): 10,000,000 node hours
* TACC Long-term tape Archival Storage (Ranch): 2,000,000
* University of Kentucky (KyRIC): 400,000

Publications that have resulted from the use of XSEDE resources should be entered into your XSEDE portal profile, from which you will be able to attach them to your Research submission. Please cite XSEDE in all publications that utilized XSEDE resources. See https://www.xsede.org/for-users/acknowledgement

After the Panel Discussion of the XRAC meeting, the total Recommended Allocation is determined and compared to the total Available Allocation across all resources. Transfer of allocations may be made for projects that are more suitable for execution on other resources; transfers may also be made for projects that can take advantage of other resources, hence balancing the load. When the total Recommended Allocation considerably exceeds the total Available Allocation, a reconciliation process adjusts all Recommended Allocations to remove over-subscription. This adjustment process reduces large allocations more than small ones and gives preference to NSF-funded projects or project portions. Under the direction of NSF, additional adjustments may be made to achieve a balanced portfolio of awards to diverse communities, geographic areas, and scientific domains.
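The announcement does not state the reconciliation formula. Purely as an illustration of how "large allocations are reduced more than small ones", the sketch below uses a cap-based (water-filling) reduction. This is an assumed stand-in, not XSEDE's actual procedure: the function name and sample numbers are hypothetical, and the preference for NSF-funded projects is deliberately left out.

```python
def reconcile(recommended, available):
    """Hypothetical cap-based reduction (NOT the XRAC formula).

    Finds a common cap C so that sum(min(r, C)) is approximately equal
    to `available`. Requests below the cap are untouched; larger ones
    are cut down to the cap.
    """
    if sum(recommended) <= available:
        # No over-subscription, so nothing needs adjusting.
        return list(recommended)

    lo, hi = 0.0, float(max(recommended))
    for _ in range(100):  # bisection on the cap value
        cap = (lo + hi) / 2
        if sum(min(r, cap) for r in recommended) > available:
            hi = cap
        else:
            lo = cap
    return [min(r, lo) for r in recommended]


# Three recommendations totalling 1,500 against 900 available.
print([round(x) for x in reconcile([100, 400, 1000], 900)])  # [100, 400, 400]
```

In this toy case the smallest request is unchanged while the largest absorbs the entire reduction, which is the qualitative behavior the paragraph describes.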

If you would like to discuss your plans for submitting a research request, please send email to the XSEDE Help Desk at help@xsede.org. Your questions will be forwarded to the appropriate XSEDE staff for their assistance.

----------------

Open Storage Network (OSN) is a distributed data sharing and transfer service intended to facilitate exchanges of active scientific data sets between research organizations, communities and projects. It provides easy access and high bandwidth delivery of large data sets to researchers who leverage the data to train machine learning models, validate simulations, and perform statistical analysis of live data. OSN is funded by NSF Collaboration Awards #1747507, #1747490, #1747483, #1747552, and #1747493. The OSN is intended to serve two principal purposes: (1) enable the smooth flow of large data sets between resources such as instruments, campus data centers, national supercomputing centers, and cloud providers; and (2) facilitate access to long tail data sets by the scientific community. Examples of data currently available on the OSN include synthetic data from ocean models; the widely used Extracted Features Set from the Hathi Trust Digital Library; open access earth sciences data from Pangeo; and Geophysical Data from BCO-DMO.

OSN data is housed in storage pods located at Big Data Hubs and interconnected by national, high-performance networks. The data is accessed via a RESTful interface following Simple Storage Service (S3) conventions. The result is well-connected, cloud-like storage with data transfer rates comparable to or exceeding those of public cloud storage providers, where users can park data, back data up, and/or create readily accessible storage for active data sets. 5 PB of storage are currently available for allocation. Allocations of a minimum of 10 TB and a maximum of 300 TB can be requested.
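The request limits above can be checked before drafting a proposal. The following is a minimal sketch, not an official tool: the function and constant names are hypothetical, and the pool size assumes decimal units (1 PB = 1,000 TB).

```python
OSN_MIN_TB = 10      # minimum request stated in the announcement
OSN_MAX_TB = 300     # maximum request stated in the announcement
OSN_POOL_TB = 5_000  # 5 PB available, ASSUMING 1 PB = 1,000 TB


def check_osn_request(requested_tb):
    """Return (ok, message) for a proposed OSN storage request in TB."""
    if requested_tb < OSN_MIN_TB:
        return False, f"below the {OSN_MIN_TB} TB minimum"
    if requested_tb > OSN_MAX_TB:
        return False, f"above the {OSN_MAX_TB} TB maximum"
    share = 100 * requested_tb / OSN_POOL_TB
    return True, f"within limits ({share:.1f}% of the 5 PB pool)"


print(check_osn_request(50))   # (True, 'within limits (1.0% of the 5 PB pool)')
print(check_osn_request(500))  # (False, 'above the 300 TB maximum')
```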

----------------

Johns Hopkins University, through the Maryland Advanced Research Computing Center (MARCC), will participate in the XSEDE Federation with its new flagship cluster, Rockfish ("rockfish.jhu.edu"), funded by NSF MRI award #1920103. Rockfish integrates high-performance and data-intensive computing while developing tools for generating, analyzing and disseminating data sets of ever-increasing size. The cluster will contain compute nodes optimized for different research projects and complex, optimized workflows. Rockfish consists of 368 regular compute nodes with 192GB of memory, 10 large memory nodes with 1.5TB of memory, and 10 GPU nodes with 4 Nvidia A100 GPUs each. Nodes feature Intel Cascade Lake 6248R processors (48 cores per node, 3.0GHz base frequency) and 1TB of NVMe local storage. All compute nodes have HDR100 connectivity. In addition, the cluster has access to several GPFS file systems totaling 10PB of storage. 20% of these resources will be allocated via XSEDE.

More information about the Rockfish resource can be found at: (https://portal.xsede.org/jhu-rockfish)
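For a sense of scale, the Rockfish node counts can be turned into a rough upper bound on what the 20% XSEDE share represents. This is a back-of-the-envelope sketch under stated assumptions: every node type has 48 cores, each GPU node has 4 GPUs, the machine is available around the clock for the 365-day allocation period, and the share applies evenly to core-hours. The actual service-unit accounting may differ.

```python
# Node counts from the announcement.
NODES = {"regular": 368, "large_memory": 10, "gpu": 10}
CORES_PER_NODE = 48        # ASSUMED to apply to all three node types
GPUS_PER_GPU_NODE = 4      # ASSUMED: 4 A100 GPUs in each GPU node
XSEDE_SHARE_PERCENT = 20
HOURS_PER_YEAR = 365 * 24  # July 1, 2021 to June 30, 2022 spans 365 days

total_nodes = sum(NODES.values())
total_cores = total_nodes * CORES_PER_NODE
total_gpus = NODES["gpu"] * GPUS_PER_GPU_NODE

# Upper bound only: ignores downtime, maintenance, and scheduling gaps.
xsede_core_hours = total_cores * HOURS_PER_YEAR * XSEDE_SHARE_PERCENT // 100

print(total_nodes, total_cores, total_gpus)  # 388 18624 40
print(f"{xsede_core_hours:,}")               # 32,629,248
```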

----------------

NCSA’s Delta: The National Center for Supercomputing Applications (NCSA) is pleased to announce the availability of its newest resource, Delta, which is designed to deliver a highly capable GPU-focused compute environment for GPU and CPU workloads. Delta will provide three new resources for allocation, as specified below:

* Delta CPU: The Delta CPU resource will support general purpose computation across a broad range of domains able to benefit from the scalar and multi-core performance provided by the CPUs, such as appropriately scaled weather and climate, hydrodynamics, astrophysics, and engineering modeling and simulation, and other domains that have algorithms that have not yet moved to the GPU. Delta also supports domains that employ data analysis, data analytics or other data-centric methods. Delta will feature a rich base of preinstalled applications, based on user demand. The system will be optimized for capacity computing, with rapid turnaround for small to modest scale jobs, and will feature support for shared-node usage. Local SSD storage on each compute node will benefit applications that have random access data patterns or that require fast access to significant amounts of compute-node local scratch space.

* Delta GPU: The Delta GPU resource will support accelerated computation across a broad range of domains such as soft-matter physics, molecular dynamics, replica-exchange molecular dynamics, machine learning, deep learning, natural language processing, textual analysis, visualization, ray tracing, and accelerated analysis of very large in-memory datasets. Delta is designed to support the transition of applications from CPU-only to using the GPU or hybrid CPU-GPU models. Delta will feature a rich base of preinstalled applications, based on user demand. The system will be optimized for capacity computing, with rapid turnaround for small to modest scale jobs, and will feature support for shared-node usage. Local SSD storage on each compute node will benefit applications that have random access data patterns or that require fast access to significant amounts of compute-node local scratch space.

* Delta Projects Storage: The Delta Storage resource provides storage allocations for allocated projects using the Delta CPU and Delta GPU resources. Unpurged storage is available for the duration of the allocation period.

More information about the Delta resource can be found at: (https://portal.xsede.org/ncsa-delta)

----------------

KyRIC (Kentucky Research Informatics Cloud) Large Memory nodes will provide 3TB of shared memory for processing massive NLP data sets, genome sequencing, bioinformatics and memory intensive analysis of big data. Each of KyRIC's 5 large memory nodes will consist of Intel(R) Xeon(R) CPU E7-4820 v4 @ 2.00GHz processors (4 sockets, 10 cores/socket), 3TB RAM, 6TB SSD storage drives and 100G Ethernet interconnects.

*PIs requesting allocations should consult the KyRIC website (https://docs.ccs.uky.edu/display/HPC/KyRIC+Cluster+User+Guide) for additional details and the most current information.

----------------

PSC’s Bridges-2 platform will address the needs of rapidly evolving research by combining high-performance computing (HPC), high-performance artificial intelligence (HPAI), and high-performance data analytics (HPDA) with a user environment that prioritizes researcher productivity and ease of use.

* Hardware highlights of Bridges-2 include HPC nodes with 128 cores and 256 to 512GB of RAM; scalable AI with 8 NVIDIA Tesla V100-32GB SXM2 GPUs per accelerated node and dual-rail HDR-200 InfiniBand between GPU nodes; a high-bandwidth, tiered data management system to support data-driven discovery and community data; and dedicated database and web servers to support persistent databases and domain-specific portals (science gateways).

* User environment highlights include interactive access to all node types for development and data analytics; Anaconda support and optimized containers for TensorFlow, PyTorch, and other popular frameworks; and support for high-productivity languages and tools such as Jupyter notebooks, Python, R, and MATLAB, including browser-based (OnDemand) use of Jupyter, Python, and RStudio. A large collection of applications, libraries, and tools will often make it unnecessary for users to install software, and when users would like to install other applications, they can do so independently or with PSC assistance. Novices and experts alike can access compute resources ranging from 1 to 64,512 cores, up to 192 V100-32GB GPUs, and up to 4TB of shared memory.

* Bridges-2 will support community datasets and associated tools, or Big Data as a Service (BDaaS), recognizing that democratizing access to data opens the door to unbiased participation in research. Similarly, Bridges-2 is available to support courses at the graduate, undergraduate, and even high school levels. It is also well-suited to interfacing to other data-intensive projects, instruments, and infrastructure.

* Bridges-2 will contain three types of nodes: Regular Memory (RM), Extreme Memory (EM), and GPU (Graphics Processing Unit). These are described in turn below.

* Bridges-2 Regular Memory (RM) nodes will provide extremely powerful general-purpose computing, machine learning and data analytics, AI inferencing, and pre- and post-processing. Each of Bridges-2’s 504 RM nodes will consist of two AMD 7742 “Rome” CPUs (64 cores, 2.25-3.4 GHz, 3.48 Tf/s peak), 256-512 GB of RAM, 3.84 TB NVMe SSD, and one HDR-200 InfiniBand adaptor. 488 Bridges-2 RM nodes have 256 GB RAM, and 16 have 512 GB RAM for more memory-intensive applications. Bridges-2 RM nodes will be HPE Apollo 2000 Gen11 servers.

* Bridges-2 Extreme Memory (EM) nodes will provide 4TB of shared memory for genome sequence assembly, graph analytics, statistics, and other applications that need a large amount of memory and for which distributed-memory implementations are not available. Each of Bridges-2’s 4 EM nodes will consist of four Intel Xeon Platinum 8260M CPUs, 4 TB of DDR4-2933 RAM, 7.68 TB NVMe SSD, and one HDR-200 InfiniBand adaptor. Bridges-2 EM nodes will be HPE ProLiant DL385 Gen10+ servers.

* Bridges-2 GPU (GPU) nodes will be optimized for scalable artificial intelligence (AI). Each of Bridges-2’s 24 GPU nodes will contain 8 NVIDIA Tesla V100-32GB SXM2 GPUs, providing 40,960 CUDA cores and 5,120 tensor cores. In addition, each GPU node will contain two Intel Xeon Gold 6248 CPUs, 512 GB of DDR4-2933 RAM, 7.68 TB NVMe SSD, and two HDR-200 adaptors. Their 400 Gbps connection will enhance scalability of deep learning training across up to 192 GPUs. The GPU nodes can also be used for other applications that make effective use of the V100 GPUs’ tensor cores. Bridges-2 GPU nodes will be HPE Apollo 6500 Gen10 servers.

* The Bridges-2 Ocean data management system will provide a unified, high-performance filesystem for active project data, archive, and resilience. Ocean will consist of two tiers (disk and tape) transparently managed by HPE DMF (Data Management Framework) as a single, highly usable namespace, and a third all-flash tier will accelerate AI and genomics. Ocean’s disk subsystem, for active project data, is a high-performance, internally resilient Lustre parallel filesystem with 15 PB of usable capacity, configured to deliver up to 129 GB/s and 142 GB/s of read and write bandwidth, respectively. Its flash tier will provide 9M IOps and an additional 100 GB/s. The disk and flash tiers will be implemented as HPE ClusterStor E1000 systems. Ocean’s tape subsystem, for archive and additional resilience, is a high-performance tape library with 7.2 PB of uncompressed capacity (estimated 8.6 PB compressed, with compression done transparently in hardware with no performance overhead), configured to deliver 50TB/hour. The tape subsystem will be an HPE StoreEver MSL6480 tape library, using LTO-8 Type M cartridges. (The tape library is modular and can be expanded, if necessary, for specific projects.)

* Bridges-2, including both its compute nodes and its Ocean data management system, is internally interconnected by HDR-200 InfiniBand in a fat tree Clos topology. Bridges-2 RM and EM nodes each have one HDR-200 link (200 Gbps), and Bridges-2 GPU nodes each have two HDR-200 links (400 Gbps) to support acceleration of deep learning training across multiple GPU nodes.

* Bridges-2 will be federated with Neocortex, an innovative system also at PSC that will provide revolutionary deep learning capability that accelerates training by orders of magnitude. This will complement the GPU-enabled scalable AI available on Bridges-2 and provide transformative AI capability for data analysis and to augment simulation and modeling.

More information about the Bridges-2 resource can be found at: (https://www.psc.edu/resources/bridges-2/user-guide/)
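Several of the Bridges-2 headline figures follow directly from the per-node specifications quoted above. The short check below recomputes them as a sanity aid when sizing a request. Variable names are illustrative, and the tape estimate assumes 1 PB = 1,000 TB and a sustained transfer at the quoted rate.

```python
# Regular Memory (RM) nodes: 488 with 256 GB RAM plus 16 with 512 GB RAM.
rm_nodes = 488 + 16
rm_cores = rm_nodes * 2 * 64         # two 64-core CPUs per RM node

# GPU nodes: 24 nodes with 8 V100 GPUs each.
gpu_nodes, gpus_per_node = 24, 8
total_gpus = gpu_nodes * gpus_per_node
cuda_cores_per_gpu = 40_960 // gpus_per_node    # per-node total / GPUs
tensor_cores_per_gpu = 5_120 // gpus_per_node   # per-node total / GPUs

# Interconnect: two HDR-200 links (200 Gbps each) per GPU node.
gpu_node_gbps = 2 * 200

# Ocean tape tier: hours to write the full uncompressed capacity.
# ASSUMES 1 PB = 1,000 TB and a sustained 50 TB/hour.
tape_fill_hours = 7_200 // 50

print(rm_nodes, rm_cores)                        # 504 64512
print(total_gpus)                                # 192
print(cuda_cores_per_gpu, tensor_cores_per_gpu)  # 5120 640
print(gpu_node_gbps, tape_fill_hours)            # 400 144
```

The results agree with the totals stated in the text: 64,512 cores, 192 GPUs, and 400 Gbps per GPU node.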

----------------

SDSC is pleased to announce its newest supercomputer, Expanse. Expanse will be a Dell integrated cluster, composed of compute nodes with AMD Rome processors and GPU nodes with NVIDIA V100 GPUs (with NVLINK), interconnected with Mellanox HDR InfiniBand in a hybrid fat-tree topology. The Expanse supercomputer will provide three new resources for allocation. Limits noted below are subject to change, so consult the Expanse website for the most up-to-date information. (https://expanse.sdsc.edu)

* (1) Expanse Compute: The compute portion of Expanse features AMD Rome processors, interconnected with Mellanox HDR InfiniBand in a hybrid fat-tree topology. There are 728 compute nodes, each with two 64-core AMD EPYC 7742 (Rome) processors for a total of 93,184 cores in the full system. Each compute node features 1TB of NVMe storage, 256GB of DRAM, and PCIe Gen4 interfaces. Full bisection bandwidth will be available at the rack level (56 nodes) with HDR100 connectivity to each node. HDR200 switches are used at the rack level and are configured for a 3:1 over-subscription between racks. In addition, Expanse has four 2 TB large memory nodes. There are two allocation request limits for the Expanse Compute resource: 1) a maximum request limit of 15M SUs, except for Science Gateway requests, which may request larger amounts (up to 30M SUs); and 2) a maximum job size of 4,096 cores, with higher core counts possible by special request.

* (2) Expanse GPU: The GPU component of Expanse has 52 GPU nodes each containing four NVIDIA V100s (32 GB SXM2), connected via NVLINK, and dual 20-core Intel Xeon 6248 CPUs. Each GPU node has 1.6TB of NVMe storage, 256GB of DRAM, and HDR100 connectivity.

* (3) Expanse Projects Storage: Lustre-based allocated storage will be available as part of an allocation request. The filesystem will be available on both the Expanse Compute and GPU resources. Storage resources, as with compute resources, must be requested and justified, both in the XRAS application and the proposal’s main document.

* Expanse will feature two new innovations: 1) scheduler-based integration with public cloud resources; and 2) composable systems, which support workflows that combine Expanse with external resources such as edge devices, data sources, and high-performance networks.

* Since the Expanse AMD Rome CPUs are currently not available for benchmarking, PIs are requested to use Comet (or any comparable system) performance/scaling information in their benchmarking and scaling section. For the Expanse GPU nodes, PIs can use performance info on V100 GPUs (if available) or use 1.3X speed up over Comet P100 GPU (or comparable GPU) performance as a conservative estimate. The time requested must be in V100 GPU hours.
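The conservative GPU estimate described above amounts to dividing measured P100 GPU hours by the 1.3X speedup to obtain the V100 GPU hours to request. A minimal sketch follows; the function name and the sample figure of 13,000 P100 GPU hours are hypothetical.

```python
V100_SPEEDUP_OVER_P100 = 1.3  # conservative factor given in the announcement


def v100_gpu_hours(p100_gpu_hours, speedup=V100_SPEEDUP_OVER_P100):
    """Convert GPU hours measured on P100s into estimated V100 GPU hours."""
    if p100_gpu_hours < 0 or speedup <= 0:
        raise ValueError("hours must be non-negative and speedup positive")
    return p100_gpu_hours / speedup


# A workload benchmarked at 13,000 P100 GPU hours (hypothetical figure).
print(round(v100_gpu_hours(13_000)))  # 10000
```

If direct V100 benchmarks are available, they should be used instead, as the text notes.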

*PIs requesting allocations should consult the Expanse website (https://expanse.sdsc.edu) for additional details and the most current information.
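The Expanse Compute figures quoted earlier can be cross-checked with simple arithmetic, which is also useful for translating the 4,096-core job limit into whole nodes. This is a sketch only: the helper name is hypothetical, and the limits are those stated in this announcement, which the text says are subject to change.

```python
COMPUTE_NODES = 728
CORES_PER_NODE = 2 * 64          # two 64-core AMD EPYC 7742 processors
NODES_PER_RACK = 56
MAX_JOB_CORES = 4_096            # default job-size limit
MAX_REQUEST_SUS = 15_000_000     # standard request limit
MAX_GATEWAY_SUS = 30_000_000     # Science Gateway request limit

total_cores = COMPUTE_NODES * CORES_PER_NODE
full_racks = COMPUTE_NODES // NODES_PER_RACK
max_job_nodes = MAX_JOB_CORES // CORES_PER_NODE


def within_request_limit(requested_sus, science_gateway=False):
    """Check a request against the SU limits stated in the announcement."""
    limit = MAX_GATEWAY_SUS if science_gateway else MAX_REQUEST_SUS
    return requested_sus <= limit


print(total_cores, full_racks, max_job_nodes)  # 93184 13 32
print(within_request_limit(20_000_000))        # False
print(within_request_limit(20_000_000, True))  # True
```

The computed total matches the 93,184 cores stated for the full system, and the default job limit corresponds to 32 full compute nodes.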