This NSF PI meeting aims to further build the community around the NSF CSSI program and its precursor programs (DIBBs, SI2) toward a national cyberinfrastructure ecosystem. The meeting provides a forum for PIs to share technical information about their projects with each other, NSF program directors, and others; to explore innovative topics emerging in the software and data infrastructure communities; to discuss and learn about best practices across projects; and to stimulate new collaborations and new ideas for achieving software and data sustainability.

At least one representative (PI/Co-PI/senior personnel) from each active CSSI, SI2 and DIBBs project is required by NSF to attend the meeting and give a one-minute lightning talk and a poster presentation on their project. For a collaborative project with multiple awards, one representative from the entire project is required. Anyone who is interested in attending the meeting is welcome and encouraged to do so. There is no registration fee, but participants need to register by the deadline to allow for planning logistics such as space and food.

The meeting venue is close to the SIAM Conference on Parallel Processing for Scientific Computing (PP20), and the two meetings run concurrently. Attendees of PP20 are encouraged to join and interact with CSSI investigators and visit the poster sessions in the CSSI PI meeting.

Report
The final workshop report is available here: Report
Your Feedback
Please provide your feedback on the meeting by answering a few questions on this form. The organizing committee will pass on your feedback to future CSSI PI meeting organizers. We will also include your aggregated feedback in the meeting report to NSF.
Future CSSI Program
The organizing committee would like to hear your thoughts on the future directions of the CSSI programs. Please enter your suggestions and comments into this Google doc by Friday, February 28, 2020. We will include a summary of your suggestions in the meeting report to NSF.
Mailing List
We now have a mailing list for future communications (e.g., announcements and requests). This list, renamed from the previous SI2-PI mailing list, already includes many of the 2020 PI meeting participants. If you are not on the list already, you may opt in by going to the group page and clicking "Subscribe to this group".

For how to join a Google Group without using a Gmail address, please refer to this Google help page
Remote participation
Remote participation will be provided via Zoom.
Only talks will be available via Zoom.
Join Zoom meeting:
Lightning Talks
The presentation schedule is available at the Lightning Talks page. Please note your session and order number. Slides must be submitted by Sunday, Feb. 9. We will not be able to incorporate late submissions into the slide deck.
Posters
Each poster session will follow a lightning talk round for that group of projects. Bring your poster to the poster hall before the poster session starts. Detailed information regarding posters is available at the Posters page.
Important Dates
Registration December 20, 2019
Registration closed. Wait-list only.
Hotel group rate cutoff January 22, 2020
Poster (pdf) upload February 4, 2020 (extended)
Lightning talk slide (pdf) upload February 4, 2020 (extended)
Meeting dates February 13-14, 2020


Due to an enthusiastic response, the official registration for the 2020 NSF CSSI PI Meeting is now closed, as we have reached the maximum capacity planned for this event. You may request to be placed on a waiting list and will be accepted on a first-come, first-served basis if additional openings become available. Thank you for your understanding.

To enroll on the waiting list, please email us at

NSF project personnel attending the meeting need to register by the deadline. The designated presenter for each project needs to include the NSF award number, the project title, and a poster abstract of no more than 150 words on the registration form.

Since the registration form is currently turned off, you cannot edit your registration information.

To share your lightning talk slide and poster, please fill out this form.


The meeting venue will be Residence Inn by Marriott Seattle Downtown/Convention Center, 1815 Terry Avenue, Seattle, Washington 98101 (map direction).

Book your stay with group discount rate before January 22, 2020!

Locations of the CSSI PI Meeting and SIAM PP20:



Thursday, February 13, 2020 --- Ballroom, 2nd Floor
Time Event Speaker Title
8:00 AM to 8:30 AM Registration
8:30 AM Welcome & Announcements
8:45 AM Opening remarks Vipin Chaudhary NSF/OAC NSF Presentation
9:00 AM Lightning Talk Session #1
10:00 AM Coffee Break
10:20 AM Poster Session #1
11:45 AM Lunch
1:00 PM Invited talk Marianna Safronova University of Delaware Community Portal for High-Precision Atomic Physics Data and Computation
Abstract Slides Poster

The goal of this project is to provide the scientific community with easily accessible high-quality atomic data and user-friendly, broadly applicable modern relativistic code to treat electronic correlations. The code will be capable of calculating a very broad range of atomic properties to answer the significant needs of atomic, plasma, and astrophysics communities. We also propose the creation of an online portal for high-precision atomic physics data and computation that will provide a variety of services to address the needs of the widest possible community of users. The portal will contribute a novel element to today's U.S. cyberinfrastructure ecosystem, improving usability and access for the atomic physics community and their fields of application.

1:20 PM Invited talk Fred Hansen Nexight Group Charting a Path Forward: Insights from a CSSI PI Survey
Abstract Slides

In developing investment priorities, the CSSI program seeks to engage the capabilities, curiosity and creativity of CI research community PIs. Ongoing feedback from and dialogue with PIs from the CI research community is therefore critical. Surveys are an efficient and effective mechanism for staying connected and collecting input. This session describes the methodology and results of a survey of principal investigators (PIs), co-PIs, and others in the CI research community. The survey, which was carried out by the Nexight Group under the Office of Advanced Cyberinfrastructure (OAC) Award (#1930025), had two primary purposes. First, the survey was designed to inform decisions about changes to be made to the National Science Foundation (NSF) Cyberinfrastructure for Sustained Scientific Innovation (CSSI) solicitation. Second, the survey was designed to inform decisions about the future direction and focus of the NSF CSSI umbrella program. The survey results provide insights that enhance CSSI’s support of a cyberinfrastructure for scientific research.

1:40 PM Lightning Talk Session #2
2:40 PM Coffee Break
3:00 PM Poster Session #2
4:00 PM NSF Presentation Stefan Robila NSF/OAC Future Steps of CSSI
4:15 PM Panel Discussion Moderator: Haiying Shen
Panelists: Geoffrey Charles Fox, Juliana Freire, Philip Harris, Andreas Mueller
ML for CI and CI for ML
5:00 PM to 8:00 PM Reception
Friday, February 14, 2020 --- Ballroom, 2nd Floor
Time Event Speaker Title
8:00 AM to 8:30 AM Registration
8:30 AM Recap & Day 2 Agenda
8:45 AM Lightning Talk Session #3
9:45 AM Coffee Break
10:00 AM Poster Session #3
11:00 AM Open-Mic session Moderator: Ritu Arora
12:00 PM Lunch
1:00 PM Invited talk Gordon Watts University of Washington The Institute for Research and Innovation in Software for High Energy Physics (IRIS-HEP)
Abstract Slides Poster

The Institute for Research and Innovation in Software for High Energy Physics (IRIS-HEP) is a Software Institute funded by the National Science Foundation. It aims to develop the state-of-the-art software cyberinfrastructure required for the challenges of data-intensive scientific research at the High Luminosity Large Hadron Collider (HL-LHC) at CERN, and other planned HEP experiments of the 2020s. These facilities are discovery machines which aim to understand the fundamental building blocks of nature and their interactions. In this talk I will discuss a bit of this history and some highlights from our first 15 months of operation (OAC-1836650).

1:20 PM Invited talk Madhav Marathe University of Virginia Networks, Simulation Science and Advanced Computing
Abstract Slides

Reasoning about real-world social habitats, often represented as multiplexed co-evolving networks, is complicated and scientifically challenging due to their size, co-evolutionary nature and the need for representing multiple dynamical processes simultaneously. The 2014 Ebola epidemic, 2009 financial crisis, global migration, societal impacts of natural and human-initiated disasters and the effect of climate change provide examples of the many challenges faced when developing such environments. Advances in computing have fundamentally altered how such networks can be synthesized, analyzed and reasoned about.

We will briefly describe our work on co-evolving socio-technical networks, drawing on our work in urban transport planning, national security and public health epidemiology to guide the discussion. We will conclude by describing an exciting new project funded by the NSF that aims to develop CINES: a scalable cyberinfrastructure for network science. CINES builds on a prior NSF project, CINET, and will serve as a community resource for network science. CINES provides an extensible platform for producers and consumers of network science data, information, and software.

1:40 PM Lightning Talk Session #4
2:30 PM Poster Session #4
3:30 PM Coffee Break
4:00 PM Invited Talk Michela Taufer University of Tennessee Knoxville Cyberinfrastructure Tools for Precision Agriculture in the 21st Century
Abstract Slides

Soil moisture is a critical variable that links climate dynamics with water and food security. It regulates land-atmosphere interactions (e.g., via evapotranspiration---the loss of water from evaporation and plant transpiration to the atmosphere), and it is directly linked with plant productivity and plant survival. The currently available soil moisture data over large areas comes from remote sensing (i.e., satellites with radar sensors), which provides nearly global coverage of soil moisture at a spatial resolution of tens of kilometers. Satellite soil moisture data has two main shortcomings. First, although satellites can provide daily global information, they are limited to coarse spatial resolution (at the multi-kilometer scale). Second, satellites are unable to measure soil moisture in areas of dense vegetation, snow cover, or extremely dry surfaces; this results in gaps in the data.

In this talk, we will present how we address these two shortcomings with a modular SOil MOisture Spatial Inference Engine (SOMOSPIE). SOMOSPIE consists of modular components including input of available data at its native spatial resolution, selection of a geographic region of interest, prediction of missing values across the entire region of interest (i.e., gap-filling), analysis of generated predictions, and visualization of both predictions and analyses. To predict soil moisture, our engine leverages hydrologically meaningful terrain parameters (e.g., slope and topographic wetness index) calculated using an open source platform for standard terrain analysis (i.e., SAGA-GIS or System for Automated GeoScientific Analysis-Geographical Information System) and a suite of machine learning methods. We will present empirical studies of the engine's functionality including an assessment of data processing and fine-grained predictions over United States ecological regions with a highly diverse soil moisture profile.
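As a rough illustration of one terrain parameter named above, the topographic wetness index is commonly defined as TWI = ln(a / tan(beta)), where a is the specific catchment area and beta is the local slope angle. The sketch below is not SOMOSPIE or SAGA-GIS code; the function name, units, and sample values are assumptions made only for this example.

```python
import math


def topographic_wetness_index(specific_catchment_area, slope_radians):
    """Return TWI = ln(a / tan(beta)).

    specific_catchment_area: upslope area per unit contour length (assumed metres).
    slope_radians: local slope angle beta, in radians.
    """
    tan_beta = math.tan(slope_radians)
    if specific_catchment_area <= 0 or tan_beta <= 0:
        # Flat or invalid cells need special handling that this sketch omits.
        raise ValueError("TWI needs a positive catchment area and a positive slope")
    return math.log(specific_catchment_area / tan_beta)


# Hypothetical cells with the same catchment area but different slopes.
gentle = topographic_wetness_index(500.0, math.radians(2.0))
steep = topographic_wetness_index(500.0, math.radians(20.0))
print(gentle > steep)  # gentler slopes give a higher index
```

Under this definition, cells that collect a lot of upslope flow on gentle slopes score high, which is why the index is a plausible predictor of where soil stays wet.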

4:20 PM Closing Remarks NSF
4:30 PM Meeting Adjourned

Poster Presentation

Each active CSSI/DIBBs/SI2 project is expected to present a poster on the project at the PI meeting. You will need to print out and bring a physical copy of your poster on your own; we will not be printing any posters. Collaborative projects (including across multiple institutions) should only bring one poster.

Your poster should be no bigger than 24 inches (60 cm) wide and 36 inches (90 cm) tall. A PowerPoint poster template is available here: CSSI Poster Template.pptx.

We will use Figshare to share the posters digitally. Please follow these steps to upload your poster by February 4, 2020:

  1. Create or log into your Figshare account.
  2. Follow steps from "My Data" -> "Create a new item" to bring up the content upload form.
  3. Fill in the appropriate metadata (authors, title).
  4. Set the "Item type" to poster, and at the keyword stage put "NSF-CSSI-2020-Poster" as one of the chosen keywords. (You must hit return/enter after typing each keyword.)
  5. You may also want to add your NSF award # to the "Funding" section.
  6. For license, we recommend selecting "CC BY" (which should be the default).
  7. Please also add a brief abstract describing your project.
  8. Hit publish! (so your poster will be accessible by others)

After you have uploaded your poster to Figshare, please use this form to fill in the URL pointing to your poster.

View 2020 CSSI PI meeting posters on Figshare

Obtaining the URL to your poster pdf on Figshare:

  1. Sign in to Figshare and go to "My data". Your poster PDF should be displayed among the files that you have uploaded.
  2. Click on your poster in the list. Your poster and its associated information will be displayed. Copy the URL and update your registration form.

Lightning Talks

Each project will also give a brief, one-minute lightning talk to introduce their poster. This is an opportunity to drive meeting participants to your poster. To avoid any technical issues and minimize delays between talks, one slide per lightning talk will need to be submitted by February 4, 2020.

Each slide should include the following information:

  • Project title
  • Names of the investigators and presenter
  • NSF award number
  • NSF program that funds the project

A PowerPoint template is available here: CSSI Slide Template.pptx.

We will use Figshare to gather your one-slide PDF files. Follow the instructions on the "Posters" page to upload your slide (PDF) to Figshare, but use the keyword "NSF-CSSI-2020-Talk" as one of the chosen keywords. Your PDF slide will be shown during your one-minute Lightning Talk.

After you have uploaded your slide to Figshare, please use this form to fill in the URL pointing to your slide.

View 2020 CSSI PI meeting lightning talk slides on Figshare

Presentation Schedule

Session 1 - Feb 13, Morning
# Name Organization NSF Award Abstract Poster Talk
Nagarajan Kandasamy Drexel University Collaborative Research: SI2-SSE: High-Performance Workflow Primitives for Image Registration and Segmentation Award #: 1642380 Abstract

Image registration is an inherently ill-posed problem that lacks a unique mapping between voxels of the two images being registered. As such, we must confine the registration to only physically meaningful transforms by regularizing it via an appropriate penalty term which can be calculated numerically or analytically. The numerical approach, however, is computationally expensive depending on the image size, and therefore analytical methods are preferable. Using cubic B-splines as the basis for registration, we develop a generalized mathematical framework that accommodates five distinct types of regularizers: diffusion, curvature, linear elastic, third-order, and total displacement. We validate our approach by testing the accuracy achieved by each of the regularizers against their numerical counterpart. We also provide benchmarking results showing that the analytic solutions run significantly faster --- up to two orders of magnitude --- than central-differencing based numerical solutions.

As a case study, we show how to use deformable registration to improve the accuracy of multi-atlas based segmentation (MABS). The effect of various regularizers on MABS accuracy is compared using the Dice coefficient and Hausdorff distance metrics.
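For readers unfamiliar with the two accuracy metrics named in this abstract, here is a minimal sketch of their standard definitions. It is not the project's code: it treats each segmentation as a small set of voxel coordinates, and the sample sets `reference` and `candidate` are made up for illustration.

```python
import math


def dice_coefficient(a, b):
    """Dice overlap 2|A ∩ B| / (|A| + |B|) between two sets of voxel coordinates."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention used here: two empty segmentations agree
    return 2.0 * len(a & b) / (len(a) + len(b))


def directed_hausdorff(a, b):
    """Largest distance from any point in a to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff_distance(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Hypothetical 2x2 segmentations, offset by one voxel along the second axis.
reference = {(0, 0), (0, 1), (1, 0), (1, 1)}
candidate = {(0, 1), (1, 1), (0, 2), (1, 2)}

print(dice_coefficient(reference, candidate))    # 0.5
print(hausdorff_distance(reference, candidate))  # 1.0
```

Dice rewards volume overlap, while the Hausdorff distance reports the worst boundary mismatch, so the two metrics can rank segmentations differently. The brute-force distance search is quadratic in the number of points and is only suitable for small examples.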

Poster Slides
Cheryl Tiahrt University of South Dakota The South Dakota Data Store, a Modular, Affordable Platform to Enable Data-Intensive Research and Education Award #: 1659282 Abstract

Expanding opportunities for data-driven research and increasing requirements for data management in sponsored research have resulted in a growing need for retention of both long-term archival data sets that are infrequently accessed, as well as 'active archives' of data that are accessed periodically to revisit, revise, and share experimental results. For this project, the University of South Dakota acquired, deployed, and maintains the South Dakota Data Store (SDDS), a network-accessible, sharable, multi-campus storage resource integrated with existing campus cyberinfrastructure. SDDS supports twelve STEM projects across eight departments at four institutions in South Dakota, including 30 faculty, 43 postdocs, and 303 students. SDDS provides South Dakota researchers with a centralized, efficient, high-performance platform for both archival of and shared access to large quantities of electronic data.

Poster Slides
Ewa Deelman University of Southern California SI2-SSI: Pegasus: Automating Compute and Data Intensive Science Award #: 1664162 Abstract

For almost 20 years the Pegasus Workflow Management System has been designed, implemented and supported to provide abstractions that enable scientists to focus on structuring their computations without worrying about the details of the target cyberinfrastructure. To support these workflow abstractions Pegasus provides automation capabilities that seamlessly map workflows onto target resources, sparing scientists the overhead of managing the data flow, job scheduling, fault recovery and adaptation of their applications. Automation enables the delivery of services that consider criteria such as time-to-solution and that take into account efficient use of resources, management of task throughput, and data transfer requests.

Poster Slides
Klaus Bartschat Drake University Elements: NSCI-Software -- A General and Effective B-Spline R-Matrix Package for Charged-Particle and Photon Collisions with Atoms, Ions, and Molecules Award #: 1834740 Abstract

The project deals with the further development and subsequent distribution of a suite of computer codes that can accurately describe the interaction of charged particles (mostly electrons) and light (mostly lasers and synchrotrons) with atoms and ions. The results are of significant importance for the understanding of fundamental collision dynamics, and they also fulfill the urgent practical need for accurate atomic data to model the physics of stars, plasmas, lasers, and planetary atmospheres. With the rapid advances currently seen in computational resources, such studies can now be conducted for realistic systems, as opposed to idealized models. Due to the high demand for data generated with the present program, the source code as well as instructional material to make the program usable by other interested researchers will be made publicly available. Versions to run on desktop workstations as well as massively parallel supercomputers will be created.

Poster Slides
David Sandwell Univ. of California, San Diego Elements: Software - Harnessing the InSAR Data Revolution: GMTSAR Award #: 1834807 Abstract

GMTSAR is an open source InSAR processing system for generating wide-area mapping of the deformation of the surface of the Earth using repeated synthetic aperture radar (SAR) images collected by spacecraft. The major deformation signals of interest are associated with earthquakes, volcanoes, glacier flow, and subsidence due to withdrawal of crustal fluids (e.g., water and hydrocarbons).

Poster Slides
Richard Evans University of Texas at Austin NSCI Elements: Software - PFSTRASE - A Parallel FileSystem TRacing and Analysis SErvice to Enhance Cyberinfrastructure Performance and Reliability Award #: 1835135 Abstract

Parallel Filesystems are a critical yet fragile component in modern HPC systems. As a resource that is shared and required by most jobs running on an HPC system, a single job can adversely impact all other jobs. This impact can vary from performance degradation to job failure. PFSTRASE monitors the filesystem at all times and provides the necessary information to immediately identify the contribution of every node, job, and user to the filesystem load. The monitoring agent that collects data generates negligible load on the filesystem servers and clients while the backend that analyzes and presents the data is scalable to the largest filesystems. The infrastructure currently supports Lustre but is designed to be extensible to other filesystems. A filesystem has been deployed to verify and validate the infrastructure. An Ansible-based provisioning system has been developed to enable the rapid deployment and reconfiguration of a Lustre filesystem.

Poster Slides
Kennie Merz Michigan State University CSSI: Efficient GPU Enabled QM/MM Calculations: AMBER Coupled with GPU Enabled QUICK Award #: 1835144 Abstract

The goal of our project is to continue to improve our software cyberinfrastructure aimed at solving important molecular-level problems in catalysis, drug design, and energy conversion. Combined quantum mechanical/molecular mechanical (QM/MM) models have enabled significant advances in our understanding of chemical reactivity. The shortcoming of QM/MM models when using ab initio or density functional theory (DFT) methods is the computational expense, which limits QM/MM modeling. The performance of QM methods has been greatly improved over the years through algorithmic and hardware improvements. In our poster we describe the enhancements and performance of our GPU-enabled Quantum Interaction Computational Kernel (QUICK) QM program combined with the Sander molecular dynamics (MD) engine from AMBER. AMBER is one of the most popular simulation packages and has been supported and sustained by the AMBER developer community for ~30 years. The developed software is available to the community via the open source AMBERTools package (see

Poster Slides
Ute Herzfeld University of Colorado Boulder Element: Software: Data-Driven Auto-Adaptive Classification of Cryospheric Signatures as Informants for Ice-Dynamic Models Award #: 1835256 Abstract

Both collection of Earth observation data from satellites and modeling of physical processes have seen unprecedented advances in recent years, but data-derived information is not used to inform modeling in a systematic and automated fashion. This creates a bottleneck that is growing with the data revolution. The objective of this project is to develop a connection between Earth observation data and numerical models of Earth system processes, through the use of an automated classification and parameterization system prototyped by the PI's group. To take matters another step forward towards a transformation of the data-modeling connection, we are developing a data-driven auto-adaptive classification system that will utilize satellite data to derive informants for numerical models. The cyberinfrastructure will be implemented in a general and transportable way, but its functionality will be demonstrated by addressing a concrete open problem in glaciology: the acceleration during a glacier surge, which is characterized by an increase to 100-200 times the normal flow velocity. Glacial accelerations are important because they constitute the largest uncertainty in sea-level-rise assessment.

Poster Slides
Carlo Piermarocchi Michigan State University Elements: Software: NSCI: A Quantum Electromagnetics Simulation Toolbox (QuEST) for Active Heterogeneous Media by Design Award #: 1835267 Abstract

Designing novel optical materials with enhanced properties would impact many areas of science and technology, leading to new lasers, better components for photonics, and to a deeper understanding of how light interacts with matter. This project will develop software that simulates how light would propagate in yet to be made complex optical materials. The final product will be a software toolbox that computes the dynamics of each individual light emitter in the materials rather than calculating an average macroscopic field. This toolbox will permit the engineering and optimization of optical properties by combining heterogeneous components at the nanoscale.

Poster Slides
Yuanfang Cai Drexel University Collaborative Research: Elements: Software: Software Health Monitoring and Improvement Framework Award #: 1835292 Abstract

This project seeks to bridge the gap between the software engineering community and other science and engineering communities in general. It will provide quantitative comparisons of software projects against an industrial benchmark, enable users to pinpoint software issues responsible for high maintenance costs, visualize the severity of the detected issues, and refactor them using the proposed interactive refactoring framework. The proposed framework will bring together software users and software developers by enabling non-software experts to post software challenges for the software community to solve, which will, in turn, boost research and advances in software research.

Poster Slides
Daniel Shapero University of Washington icepack: an open-source glacier flow modeling library in Python Award #: 1835321 Abstract

I will present a new software package for modeling the flow of glaciers and ice sheets called icepack. Icepack is developed using the finite element modeling package firedrake, which provides a domain-specific language (DSL) embedded into Python for the specification of differential equations. The use of this DSL lowers the barrier to entry for development of new physics models for practicing scientists who are not experts in scientific computing.

Poster Slides
Mohamed Soliman Oklahoma State University Element: Data: HDR: Enabling data interoperability for NSF archives of high-rate real-time GPS and seismic observations of induced earthquakes and structural damage detection in Oklahoma Award #: 1835371 Abstract

This project focuses on enabling research into hazard mitigation for vulnerable buildings in Oklahoma subjected to the recent increase in induced seismicity, and cumulative damage due to successive earthquakes. This is being conducted by expanding cyberinfrastructure that transmits real-time geophysical and engineering data for use in algorithms to provide low frequency and static deformation measurements of ground motion and building response. The investigation covers differences between source processes and frequency content of earthquakes in tectonically active environments versus induced earthquakes in tectonically passive regions of oil and gas exploration and wastewater injection, which could also have significant implications for seismic hazard mitigation. The objectives will be accomplished by developing and demonstrating modules for the Antelope Environmental Monitoring System that transmit additional sensor data and products created in combination with seismic data. The system architecture will assure data integrity, reduce bandwidth requirements, and alleviate telecommunication bottlenecks from remote multi-sensor stations.

Poster Slides
Bruce Berriman California Institute of Technology Elements: Bringing Montage To Cutting Edge Science Environments Award #: 1835379 Abstract

We describe the use of Montage to create all-sky astronomy maps compliant with the Hierarchical Progressive Survey (HiPS) sky-tessellation scheme. These maps support panning and zooming across the sky to progressively smaller scales, and are used widely for visualization in astronomy. They are, however, difficult to create at infrared wavelengths because of high background emission. Montage is an ideal tool for creating infrared maps for two reasons: it uses background modeling to rectify the time-variable image backgrounds to a common level; and it uses an adaptive image stretch algorithm to convert the image data to display values for visualization. The creation of the maps involves the use of existing Montage tools in tandem with four new tools to support HiPS. We will present images of infrared sky surveys in the HiPS scheme.

Poster Slides
James Bordner University of California, San Diego Collaborative Research: Framework: Software: NSCI: Enzo for the Exascale Era (Enzo-E) Award #: 1835402 Abstract

Cello is a highly scalable "array-of-octree" based adaptive mesh
refinement (AMR) software framework, implemented using Charm++, an
object-oriented message-driven parallel programming system. Enzo-E,
being developed concurrently with Cello, is a branch of the ENZO
astrophysics and cosmology application that has been modified to use
the Cello scalable AMR framework. The Cello framework provides a
scientific application with mesh adaptivity, data-driven ghost cell
refresh, generic field and particle data types, and task-based
asynchronous distributed computation on block data. In this
presentation we describe Cello's distributed data structures and
asynchronous algorithms, which include a revised buffered refresh
scheme, and a recently-implemented domain-decomposition based scalable
gravity solver. We also present parallel scaling results of Enzo-E
simulations of cosmological structure formation, including more
recent experiments with dynamic load balancing.

Poster Slides
Bryna Hazelton University of Washington Collaborative Research: Elements: Software: Accelerating Discovery of the First Stars through a Robust Software Testing Infrastructure Award #: 1835421 Abstract

The birth of the first stars and galaxies 13 billion years ago -- our "Cosmic Dawn" -- is one of the last unobserved periods in the history of the Universe. Scientists are working to observe the 21 cm radio light emitted by the primeval neutral hydrogen fog as the first stars formed and reionized the universe. One of the biggest challenges for the detection of the Epoch of Reionization is the presence of bright astrophysical foregrounds that obscure the signal of interest, requiring extraordinarily precise modeling and calibration of the radio telescopes performing these observations. The 21 cm cosmology community is rapidly developing new techniques for instrument calibration, foreground removal and analysis, but thorough testing and integration into existing data analysis pipelines has been slow. This project provides a software infrastructure that enables rigorous, seamless testing of novel algorithmic developments within a unified framework. This infrastructure also ensures a new level of reliability and reproducibility not previously possible and accelerates the speed at which developments become integrated into production-level code, providing an invaluable foundation for bringing our field into the next decade and for leveraging the current NSF investments in these experiments.

Poster Slides
Juan Pablo Vielma MIT Framework: Software: Next-Generation Cyberinfrastructure for Large-Scale Computer-Based Scientific Analysis and Discovery Award #: 1835443 Abstract

This project seeks to develop methods and software for computer-based scientific analysis that are sufficiently powerful, flexible and accessible to (i) enable domain experts to achieve significant advancements within their domains, and (ii) enable innovative use of advanced computational techniques in unexpected scientific, technological and industrial applications. In this poster we report on the progress towards this goal by describing advancements both in the development and application of the associated cyberinfrastructure. In particular, we report on the use of mathematical optimization techniques for the analysis of economically viable pathways for decarbonization of electrical power networks. We also report on the development of next-generation interior point algorithms for convex optimization and their potential to revolutionize applications in machine learning and optimal control. Finally, we report on various community building activities.

Poster Slides
Thomas Hacker Purdue University Elements: Data: Integrating Human and Machine for Post-Disaster Visual Data Analytics: A Modern Media-Oriented Approach Award #: 1835473 Abstract

Our poster describes VISER (Visual Structural Expertise Replicator), a visual image service we are investigating based on automated image classification and a scalable cyberinfrastructure.

Poster Slides
Jordan Powers National Center for Atmospheric Research CSSI Software Elements: Cloud WRF for the Atmospheric Research and Education Communities Award #: 1835511 Abstract

The Weather Research and Forecasting (WRF) Model is the world's most popular numerical weather prediction model and is supported by the National Center for Atmospheric Research (NCAR) for a community of users across universities, research labs, and operational centers. This effort is exploiting the emerging technology of cloud computing for the critical WRF support effort and for the benefit of the worldwide user community. The work has established an officially supported version of WRF in the cloud that extends system accessibility, improves model support and training, and facilitates model development. The components include: cloud-configured WRF system code; cloud WRF tutorial materials; and a cloud-based testing capability for code contributions. To date, the cloud WRF materials have been used for tutorials on the modeling system, for student instruction at universities, and for new system version releases.

Poster Slides
Asti Bhatt SRI International Integrated Geoscience Observatory Award #: 1835573 Abstract

Geoscientists arrive at scientific results by analyzing observations from a diverse set of instrumentation and often assimilate them into a model. Effective collaboration between scientists can be hampered when they are using different resources, resulting in a lengthy and laborious process. Individual researchers need to assemble many of the community resources on their own before they can conduct successful research or get credit for their work. The Integrated Geoscience Observatory (InGeO) project tackles the problem of seamless collaboration between geoscientists by creating a platform where the data from disparate instruments can be brought together with software tools for data interpretation provided by the instrument operators.

The primary goals of the InGeO project are to:
1. Provide tools that make it easy for geospace researchers to collaborate, share work, reproduce results, and build on tools that have already been developed.
2. Educate the geospace community on best practices for software development and data archiving to ensure data and the tools needed to work with it are available to a broad range of researchers in the community with minimal barriers to entry.

Our solution to the first goal has been Resen, a tool to enable reproducibility and collaboration. Resen allows community-developed toolkits to be easily accessed and creates a convenient mechanism for researchers to save and share their results along with the analysis that produced them. Resen is written in Python and uses Docker containers. The current software packages in Resen allow you to access common geospace data sources including MANGO, Madrigal and SuperDARN. We plan to add functionality for other data sources in future releases.

Poster Slides
Genevieve Bartlett ISI/University of Southern CA Elements: Software: Distributed Workflows for Cyberexperimentation (Elie) Award #: 1835608 Abstract

Distributed Workflows for Cyberexperimentation (Elie) is a new experiment representation. Elie enables the researcher to abstract the definition of an experiment from its realization. It encodes the desired behavior of an experiment at a high level as a scenario (e.g. “generate attack from A to B, wait 10 seconds, turn on defense at C”), and provides sufficient details as to how each action in a scenario can be realized on the testbed, via bindings (e.g. use script with specific parameters for the attack action). Elie further encodes only those features of testbed topology that matter for the experiment, via constraints (e.g. “use Ubuntu OS on C”).

When an experiment is to be realized on the testbed, the constraints section of Elie is used to generate a resource request for the testbed. Once the experiment is allocated—physical nodes are reserved and loaded with the operating system—the scenario and bindings are used, along with allocation details, to produce scripts, which run on the nodes.

When the researcher runs the experiment on the testbed they parameterize and run these scripts, possibly interspersed with manual actions, which produces a run history. Together, the Elie representation, node allocations and the run history represent a complete record of an experiment, which can be shared and reused by others.
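
The separation of scenario, bindings and constraints described above can be illustrated with a minimal Python sketch. This is not Elie's actual format or API: the class names, field names, script names (`flood.sh`, `enable_defense.sh`) and host names are all hypothetical, chosen only to mirror the examples in the abstract.

```python
from dataclasses import dataclass


@dataclass
class Action:
    """One high-level scenario step (hypothetical structure)."""
    name: str                # e.g. "attack"
    nodes: tuple             # logical node labels the step involves
    wait_after: float = 0.0  # seconds to wait before the next step


@dataclass
class Experiment:
    scenario: list     # ordered list of Action objects
    bindings: dict     # action name -> command template
    constraints: dict  # logical node -> required testbed features


def resource_request(exp):
    """Build a testbed resource request from the constraints section only."""
    return [{"node": n, **features} for n, features in sorted(exp.constraints.items())]


def render_scripts(exp, allocation):
    """Combine scenario, bindings and allocation details into command lines."""
    lines = []
    for act in exp.scenario:
        # Map placeholders n0, n1, ... to the physical hosts allocated
        # for this action's logical nodes.
        hosts = {f"n{i}": allocation[n] for i, n in enumerate(act.nodes)}
        lines.append(exp.bindings[act.name].format(**hosts))
        if act.wait_after:
            lines.append(f"sleep {act.wait_after:g}")
    return lines


# "Generate attack from A to B, wait 10 seconds, turn on defense at C."
exp = Experiment(
    scenario=[Action("attack", ("A", "B"), wait_after=10), Action("defense_on", ("C",))],
    bindings={
        "attack": "ssh {n0} ./flood.sh --target {n1}",   # hypothetical script
        "defense_on": "ssh {n0} ./enable_defense.sh",    # hypothetical script
    },
    constraints={"A": {}, "B": {}, "C": {"os": "Ubuntu"}},
)
request = resource_request(exp)
scripts = render_scripts(exp, {"A": "pc12", "B": "pc17", "C": "pc03"})
```

In this sketch the same `Experiment` object can be rendered against a different allocation without touching the scenario, which is the kind of reuse the representation is meant to support.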

Poster Slides
Chris Hill MIT Collaborative Research: Framework: Data: Toward Exascale Community Ocean Circulation Modeling Award #: 1835618 Abstract

We are developing a community model virtual solution for ocean and climate studies geared toward both classical analysis and modern synthetic training activities.

Poster Slides
Dan Negrut University of Wisconsin-Madison Collaborative Research: Elements:Software:NSCI: Chrono - An Open-Source Simulation Platform for Computational Dynamics Problems Award #: 1835674 Abstract

The lightning talk and poster are tied to a software infrastructure called Project Chrono, which is developed through a joint project between the University of Wisconsin-Madison and University of Parma, Italy. Chrono provides simulation support for applications that roughly belong to the computational dynamics field: flexible and rigid body dynamics (Newton-Euler equations), fluid-solid interaction problems (mass balance and Navier-Stokes), and granular dynamics (friction/contact/impact constitutive laws). The software leverages parallel computing paradigms (AVX vectorization, GPU cards, multi-core shared memory, and distributed memory). It has been used for Mars rover simulation, vehicle dynamics, autonomous vehicle and robotics simulation, wind energy harvesting, granular dynamics, off-road mobility analysis, planet formation, farming and food processing. It has an online forum with more than 300 registered users and it is released on GitHub as open source under a permissive BSD3 license. Release 5.0 is slated for February 2020.

Poster Slides
Cate Brinson Duke University Collaborative Research: Framework: Data: HDR: Nanocomposites to Metamaterials: A Knowledge Graph Framework Award #: 1835677 Abstract

A team of experts from five universities (Duke, RPI, Caltech, Northwestern and Univ of Vermont) develops an open-source materials resource, NanoMine, to enable discovery of fundamental processing-structure-property (p-s-p) relationships for polymer nanocomposites and demonstrates extensibility through the creation of a sister resource for metamaterials, MetaMine. The framework enables annotation, organization and storage of composition, processing, microstructure and property data, along with an array of analysis tools and advanced learning algorithms that facilitate discovery of quantitative p-s-p relationships. A broad spectrum of users can query the system, identify materials that may have certain characteristics, and automatically produce information about these materials. The effort demonstrates the capability of the designed data framework through two domain case studies: discovery of factors controlling dissipation in nanocomposites, and tailored mechanical response in metamaterials motivated by an application to personalize running shoes. The project will significantly improve the representation of data and the robustness with which expanding user communities can identify promising materials applications, enabling new collaborations in materials discovery and design. Strong connections with the National Institute of Standards and Technology (NIST), the Air Force Research Laboratory (AFRL), and Lockheed Martin facilitate industry and government use of the developing knowledge graph.

Poster Slides
Byung-Jun Yoon Texas A&M University Elements: Software: Autonomous, Robust, and Optimal In-Silico Experimental Design Platform for Accelerating Innovations in Materials Discovery Award #: 1835690 Abstract

Accelerating the development of novel materials that have desirable properties is a critical challenge as it can facilitate advances in diverse fields across science, engineering, and medicine. However, the current prevailing practice in materials discovery relies on trial-and-error experimental campaigns and/or high-throughput screening approaches, which cannot efficiently explore the huge design space to develop materials with the targeted properties. Furthermore, measurements of material composition, structure, and properties often contain considerable errors due to technical limitations in materials synthesis and characterization, making this exploration even more challenging. This project aims to develop an effective in-silico experimental design platform to accelerate the discovery of novel materials. The platform is built on optimal Bayesian learning and experimental design methodologies that can translate scientific principles into predictive models, in a way that takes model and data uncertainty into account. The optimal Bayesian experimental design framework will enable the collection of smart data that can help explore the material design space efficiently, without relying on slow and costly trial-and-error and/or high-throughput screening approaches.

Poster Slides
Xiaogang Ma University of Idaho Elements: Software: HDR: A knowledge base of deep time to facilitate automated workflows in studying the co-evolution of the geosphere and biosphere Award #: 1835717 Abstract

Geologic time, though widely used as a fundamental framework in geoscience, is a system of concepts that faces the issue of semantic heterogeneity. Our work aims to build a machine-readable knowledge base of deep time. The planned objectives and activities are multi-fold. Semantic technologies will be used to model and encode the knowledge base for the collected standards. A service of the knowledge base will be set up for both humans and machines to access and query the precise meaning of each concept. Through the support of the knowledge base, a few existing geoscience data facilities will be used to carry out case studies in data integration and analysis. Workflow platforms such as Jupyter Notebook will be used in those case studies. The project outputs will be shared on community repositories, such as the ESIP community ontology repository, to support a national open knowledge network.

Poster Slides
Michael Shirts University of Colorado Collaborative Research: NSCI Framework: Software: SCALE-MS - Scalable Adaptive Large Ensembles of Molecular Simulations Award #: 1835720 Abstract

The goal of this project is to create a framework for expression and execution of adaptive ensemble molecular simulation algorithms, based on requirements elicited from the molecular dynamics community. The user-facing and developer-facing aspects of this framework are an adaptive ensemble API which is then backed by a capable runtime layer that can execute on production-scale cyberinfrastructure. Our framework design is grounded in both the application science and methods development communities and is designed as a community code. This effort is linked to driving scientific applications with very long time scales in biophysics, materials science, chemistry, and chemical engineering from the PIs’ laboratories and others in the molecular simulation community.

Poster Slides
David Hudak Ohio State University Frameworks: Software NSCI-Open OnDemand 2.0: Advancing Accessibility and Scalability for Computational Science through Leveraged Software Cyberinfrastructure Award #: 1835725 Abstract

High performance computing (HPC) has led to remarkable advances in science and engineering and has become an indispensable tool for research. Unfortunately, HPC use and adoption by many researchers is often hindered by the complex way in which these resources are accessed. Indeed, while the web has become the dominant access mechanism for remote computing services in virtually every computing area, it has not for HPC. Open OnDemand is an open source project to provide web-based access to HPC resources. The primary goal of OnDemand is to lower the barrier to entry and ease access to HPC resources for both new and existing users. Through OnDemand, users can create, edit, upload, and download files; create, edit, submit, and monitor jobs; create and share apps; run GUI applications; and connect to a terminal, all via a web browser, with no client software to install and configure.

Poster Slides
Marouane Kessentini University of Michigan-Dearborn Collaborative Research: Elements: Software: Software Health Monitoring and Improvement Framework Award #: 1835747 Abstract

software quality, computational intelligence, artificial assistant

Poster Slides
Xian-He Sun Illinois Institute of Technology Framework: Software: NSCI: Collaborative Research: Hermes: Extending the HDF Library to Support Intelligent I/O Buffering for Deep Memory and Storage Hierarchy Systems Award #: 1835764 Abstract

Modern HPC and distributed systems come equipped with deep memory and storage hierarchies (DMSH). The expert knowledge required to manage these multi-tier storage environments puts their benefits out of reach for most scientists and researchers. In this project, we propose the design and development of Hermes, a new, heterogeneous-aware, multi-tiered, dynamic, and distributed I/O buffering platform which provides: 1) Vertical and horizontal distributed buffering in the DMSH; 2) Selective buffering; 3) Adaptive and dynamic buffering via system and application profiling. We are developing new buffering algorithms and mechanisms that address the challenges of a DMSH ecosystem. This effort will eventually boost HDF5 core technology and facilitate an agile architecture that will allow the evolution of next generation I/O and will address the increasingly challenging scale and complexity of future systems. Hermes software is intended to support new scientific/engineering methodologies and will be carefully designed, implemented, and thoroughly tested.
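
To make the idea of buffering across a multi-tier hierarchy concrete, the following is a generic Python sketch of a "fastest tier with room" placement rule. It is not the Hermes algorithm or API; the tier names, capacities, and the `Tier` and `place` names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Tier:
    """One level of a memory/storage hierarchy (hypothetical model)."""
    name: str
    capacity: int  # bytes available for buffering
    used: int = 0
    blobs: dict = field(default_factory=dict)

    def free(self):
        return self.capacity - self.used


def place(tiers, key, data):
    """Buffer `data` in the first (fastest) tier that still has room.

    `tiers` must be ordered from fastest to slowest. Returns the name
    of the tier that accepted the data.
    """
    for tier in tiers:
        if tier.free() >= len(data):
            tier.blobs[key] = data
            tier.used += len(data)
            return tier.name
    raise MemoryError(f"no tier can hold {len(data)} bytes for {key!r}")


# Illustrative three-level hierarchy with tiny capacities.
tiers = [Tier("RAM", 4), Tier("NVMe", 16), Tier("disk", 1024)]
first = place(tiers, "a", b"1234")   # fits in the fastest tier
second = place(tiers, "b", b"12")    # fastest tier is full, spills down
```

A production system would also need eviction, data movement between tiers, and placement informed by profiling, none of which this sketch attempts.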

Poster Slides
Yao Liang Indiana University Purdue University Indianapolis  CyberWater—An open and sustainable framework for diverse data and model integration with provenance and access to HPC Award #: 1835817 Abstract

To advance our fundamental understanding of the complex behaviors and interactions among the various Earth processes that are critical to the health, resilience and sustainability of water resources, scientists need to be able to use diverse data and integrate models outside their own disciplines with sufficient model accuracy and predictability. Currently, however, this is very difficult to accomplish: (1) a vast quantity of diverse data are not readily accessible to models; and (2) diverse models developed individually by different research groups are difficult to share and integrate across disciplines. To address these critical challenges, we propose to develop an open and sustainable framework software in cyberinfrastructure (CI) that enables easy and incremental integration of diverse data and models for knowledge discovery and interdisciplinary teamwork, and also enables reproducible computing and seamless, on-demand access to various HPC resources, which are essential and desirable for communities. Our proposed project is fundamentally important and addresses an urgent need in enabling new scientific advances for all water-related issues.

The goal of this NSF-funded multi-institution, multi-discipline project is to build an open data, open modeling framework software: a new cyberinfrastructure called CyberWater. CyberWater will expedite the process for fundamental knowledge discoveries and significantly reduce the time and effort on the part of users. It not only eases diverse data and model integration, model testing, validation, and comparison while ensuring reproducible computing, but also enables on-demand access to HPC facilities. Our proposed framework is based on scientific workflows.

Poster Slides
Zhenming Liu William & Mary Elements: Software: NSCI: A high performance suite of SVD related solvers for machine learning Award #: 1835821 Abstract

We present our recent research progress on joint optimization of ML algorithms and SVD solvers. Our major discovery is that the "stopping criteria" of an SVD algorithm can be directly optimized for downstream ML applications. We will present a few examples in which changing the stopping criteria of the "inner" SVD algorithm yields a significant performance gain (in running time) in the "outer" ML algorithm.
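
The effect of an inner solver's stopping criterion on running time can be illustrated with a small, self-contained Python sketch. It is not the project's solver: it uses plain power iteration on A^T A to estimate the largest singular value, and the function name, the relative-change stopping rule, and the test matrix are assumptions made for illustration.

```python
import math
import random


def top_singular_value(A, tol, max_iter=10_000, seed=0):
    """Estimate the largest singular value of A by power iteration on A^T A.

    Stops when the relative change in the estimate is at most `tol`.
    Returns (sigma_estimate, iterations_used).
    """
    rng = random.Random(seed)
    rows, cols = len(A), len(A[0])
    v = [rng.gauss(0.0, 1.0) for _ in range(cols)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    sigma_prev = 0.0
    sigma = 0.0
    for it in range(1, max_iter + 1):
        # u = A v
        u = [sum(a * x for a, x in zip(row, v)) for row in A]
        # w = A^T u
        w = [sum(A[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        norm_w = math.sqrt(sum(x * x for x in w))
        if norm_w == 0.0:
            return 0.0, it
        v = [x / norm_w for x in w]
        # For a unit vector v, ||A^T A v|| approaches sigma_max squared.
        sigma = math.sqrt(norm_w)
        if abs(sigma - sigma_prev) <= tol * sigma:
            return sigma, it
        sigma_prev = sigma
    return sigma, max_iter


# Diagonal test matrix whose largest singular value is 3 by construction.
A = [[3.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]]
loose_sigma, loose_iters = top_singular_value(A, tol=1e-2)
tight_sigma, tight_iters = top_singular_value(A, tol=1e-12)
print(loose_iters, tight_iters)
```

A looser tolerance stops the inner loop earlier at the cost of a less accurate estimate. Whether that trade is acceptable depends on how sensitive the outer algorithm is to the error, which is the question the abstract addresses.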

Poster Slides
Carol Song Purdue University Framework: Data: HDR: Extensible Geospatial Data Framework towards FAIR (Findable, Accessible, Interoperable, Reusable) Science Award #: 1835822 Abstract

Multidisciplinary solutions to address the 21st century’s grand challenges in resource sustainability and resilience are increasingly geospatial data-driven, but researchers spend a significant amount of their time “wrangling data”, i.e., accessing and processing data to make them usable in their modeling and analysis tools. Our NSF CSSI project is developing GeoEDF, an extensible geospatial data framework, that aims to reduce and possibly remove this barrier by creating seamless connections among platforms, data and tools, making large distributed scientific and social geospatial datasets directly usable in models and tools. Through an extensible set of modular and reusable data connectors and processors, GeoEDF is designed to abstract away the complexity of acquiring and utilizing data from remote sources. Researchers can string them together into a workflow that can be executed in various environments including HUBzero tools, HPC resources, or Jupyter Notebooks. By bringing data to the science, GeoEDF will help accelerate data-driven discovery, while ensuring that data is not siloed and improving the practice of FAIR science.

Poster Slides
Alexey Akimov University at Buffalo, SUNY Elements: Libra: The Modular Software for Nonadiabatic and Quantum Dynamics Award #: 1931366 Abstract

Sustained progress in scientific endeavors in solar energy, functional, and nanoscale material sciences requires advanced methods and software components that can be used to model the complex dynamics of excited states, including charge and energy transfer. Within the current project, a range of advanced nonadiabatic and quantum dynamics (NA/QD) techniques such as independent and collective trajectory surface hopping methods, non-equilibrium Fermi golden rule rate calculations, and new quantum-classical decoherence schemes will be implemented and made available for future reuse via the open-source Libra library. These “building blocks” will enable the testing of new ideas and theories of NA/QD. The infrastructure of Libra methods and the related database of model problems will enable systematic assessment of various NA/QD methods, in order to standardize and rank a “zoo” of the presently available methods. The first “Jacob’s ladder” of NA/QD methods will be built to serve as a roadmap to theorists and practitioners.

Poster Slides
Alexey Akimov University at Buffalo, SUNY CyberTraining: Pilot: Modeling Excited State Dynamics in Solar Energy Materials Award #: 1924256 Abstract

The design and discovery of new efficient and inexpensive solar energy materials can be accelerated via computational modeling of excited states dynamics in these systems. Nonetheless, training in this area remains relatively scarce; the community is often unaware of the available cyberinfrastructure, lacks the best practice guidelines, and may experience entry barriers to employing these advanced tools. This pilot project aims to fill the above gaps by providing targeted training to young scientists in the proficient use of advanced cyberinfrastructure for modeling excited states dynamics in solar energy materials. The project will leverage and combine the general-purpose Libra code library for modeling excited states dynamics and the Virtual Infrastructure for Data Intensive Analysis (VIDIA) platform for web-based data analysis and visualization, as well as the existing electronic structure packages. The resulting versatile gateway will enable advanced training in modeling excited states dynamics of solar energy materials.

Poster Slides
Shrideep Pallickara Colorado State University Frameworks: Collaborative Proposal: Software Infrastructure for Transformative Urban Sustainability Research Award #: 1931363 Abstract

The NSF has invested in several strategic research efforts in the area of urban sustainability, all of which generate, collect, and manage large volumes of spatiotemporal data. This project produces an enabling software infrastructure, Sustain, that facilitates and accelerates discovery by significantly alleviating data-induced inefficiencies. The effort innovatively leverages spatiotemporal sketching to decouple data and information.

Poster Slides
Hanna Terletska Middle Tennessee State University Collaborative Research: Element: Development of MuST, A Multiple Scattering Theory based Computational Software for First Principles Approach to Disordered Materials. Award #: 1931367 Abstract

Disorder is inevitably present in real materials. Understanding and harnessing the role of disorder is critical for controlling and utilizing the functional properties of quantum systems with disorder, and doing so requires careful theoretical and numerical analysis. The product of this project is the open-source MuST software for ab initio study of disorder effects in real materials. We aim to accomplish the following goals: 1) Provide an open-source ab initio numerical framework for systems with disorder; 2) Create a truly scalable multiple-scattering theory approach for the first-principles study of quantum materials; 3) Expand the existing capabilities of ab initio codes to study strong disorder effects, i.e., disorder-driven quantum phase transitions, transport and electron localization (currently available at the model Hamiltonian level only); 4) Perform the method development to enable exploration of disorder effects in a variety of materials: disordered metals, high entropy alloys, semiconductors, and topological insulators; 5) Enable researchers to perform ab initio calculations for disordered systems that are presently out of reach for most researchers.

Poster Slides
In-Ho Cho Iowa State University Elements: Development of Assumption-Free Parallel Data Curing Service for Robust Machine Learning and Statistical Predictions Award #: 1931380 Abstract

The new era of big data and machine learning (ML) is arising. Data- and ML-driven research is becoming a primary paradigm in broad science and engineering. However, the pervasive issue of missing data may hamper robust ML and statistical inference. The negative impact of incomplete data on ML and statistical learning (SL) may depend strongly on data types and ML/SL methods. Existing data curing (imputation) methods are difficult for general researchers to use and often unsuitable for large, complex data. To overcome these challenges, this project is developing a new community-level data-curing platform that can be deployed on NSF cyberinfrastructure and local HPC facilities. The novelty of this project lies in the fact that it requires no (or minimal) expert-level assumptions and has no restrictions on size, dimension, type, and complexity of data. This project’s service will pursue a novel combination of assumption-free, big data-oriented imputation theories, and parallel algorithms.

Poster Slides
Robert Harrison Stony Brook University Collaborative Research: Frameworks: Production quality Ecosystem for Programming and Executing eXtreme-scale Applications (EPEXA) Award #: 1931387 Abstract

EPEXA is creating a production-quality, general-purpose, community-supported, open-source software ecosystem to attack the twin challenges of programmer productivity and portable performance for advanced scientific applications. Through application-driven codesign we focus on the needs of irregular and sparse applications that are poorly served by current programming and execution models on massively-parallel, hybrid systems.

Poster Slides
Nicholas Murphy Center for Astrophysics | Harvard & Smithsonian Collaborative Research: Frameworks: An open source software ecosystem for plasma physics Award #: 1931388 Abstract

PlasmaPy is an open source Python package for plasma physics that is currently under development. The ultimate goal of this project is to foster a community-wide open source software ecosystem for plasma research and education. Following Astropy's model, functionality needed across disciplines is being implemented in the PlasmaPy core package, while more specialized functionality will be developed in affiliated packages. We strive to use best practices from software engineering that have heretofore been uncommon in plasma physics, including continuous integration testing and code review. We will describe code development and community building activities from the first few months of our NSF CSSI award and plans for the next five years.

Poster Slides
Hasan Babaei University of California-Berkeley ENABLING ACCURATE THERMAL TRANSPORT CALCULATIONS IN LAMMPS Award #: 1931436 Abstract

Molecular dynamics (MD) simulations are used extensively to study thermal transport in materials, and one of the most widely used MD software packages is LAMMPS. However, the most common MD technique to compute thermal conductivity, the Green-Kubo method, yields incorrect results in LAMMPS for many-body potentials. The primary aim of this NSF CSSI project is to create and carefully implement the correct heat flux computation in LAMMPS, a problem made challenging by the fact that this software has hundreds of thousands of users and the solution must be merged into the core LAMMPS code. The objectives of the project are: (1) to implement a corrected heat flux computation for all supported many-body potentials in LAMMPS, (2) to identify the types of molecular systems most affected by the changed heat flux computations, and (3) to apply and refine the methodology to predict thermal conductivity for several novel nanomaterials.
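
For reference, the Green-Kubo relation mentioned above is commonly written as follows for an isotropic system. The notation is an assumption added here for illustration, since the abstract does not state a convention: V is the system volume, T the temperature, k_B the Boltzmann constant, and J the heat flux vector normalized per unit volume.

```latex
\kappa = \frac{V}{3 k_B T^2} \int_0^{\infty} \langle \mathbf{J}(0) \cdot \mathbf{J}(t) \rangle \, dt
```

Because the thermal conductivity depends on the time autocorrelation of J, any error in how the heat flux is evaluated for many-body potentials carries directly into the predicted conductivity.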

Poster Slides
Denis Zorin NYU Open-Source Robust Geometry Toolkit for Black-Box Finite Element Analysis Award #: 1835712 Abstract

The numerical solution of PDEs using finite elements and similar methods is ubiquitous in engineering and science applications. Ideally, a PDE solver should be a “black box”: the user provides as input the domain boundary and boundary conditions, and the code returns an evaluator that can compute the value of the solution at any point of the domain. This is surprisingly far from being the case for almost all existing open-source or commercial software. One important source of non-robustness in solvers is treating meshing and FEM basis construction as two disjoint problems. We present our work towards an integrated fully robust pipeline, considering meshing and basis construction as a single challenge. We will demonstrate that tackling the two problems jointly offers many advantages, based on testing with several PDEs on a large dataset.

Poster Slides
Elsa Olivetti Massachusetts Institute of Technology The Synthesis Genome: Data Mining for Synthesis of New Materials Award #: 1922311 Abstract

Successes in accelerated materials design, made possible in part through the Materials Genome Initiative, have shifted the bottleneck in materials development towards the synthesis of novel compounds. Existing databases do not contain information about the synthesis recipes necessary to produce compounds. As a result, much of the momentum and efficiency gained in the design process becomes gated by trial-and-error synthesis techniques. This delay in going from promising materials concept to validation, optimization, and scale-up is a significant burden to the commercialization of novel materials. This research is developing a framework to do for materials synthesis what modern computational methods have done for materials properties: Build predictive tools for synthesis so that targeted compounds can be synthesized in a matter of days, rather than months or years. This proposal extends efforts to include synthetic confirmation of hypotheses generated by predictive models.

Poster Slides
David Elbert Johns Hopkins University DMREF: Data-Driven Integration of Experiments and Multi-Scale Modeling for Accelerated Development of Aluminum Alloys Award #: 1921959 Abstract

This project seeks to establish a new paradigm for the materials design loop in which the flow of data, rather than individual modeling or experimental tasks, is viewed as central. The work centers on development of an open semantic infrastructure and streaming data platform to integrate the processing, experimental, and modeling components of materials design. Infrastructure development is embedded in a science program focused on creating aluminum alloys resistant to spall failure in high-energy environments. Such alloys have high value in aircraft and spacecraft, while understanding the underlying mechanism of failure has broad scientific importance for understanding ultimate material strength. The tight linkage of infrastructure development and science in this project is central to creating infrastructure that works to close the design loop and encourage more meaningful collaboration between domain experts. Instantiation in a multi-scale modeling framework will provide open tools with broad applicability in the materials domain.

Poster Slides
Session 2 - Feb 13, Afternoon
# Name Organization NSF Award Abstract Poster Talk
Chaowei Yang George Mason University Developing On-Demand Service Module for Mining Geophysical Properties of Sea Ice from High Spatial Resolution Imagery Award #: 1835507 Abstract

Sea ice acts as both an indicator and an amplifier of climate change. Multiple sources of sea ice observations are obtained from a variety of networks of sensors (in situ, airborne, and space-borne). To help the science community better extract important geophysical parameters for climate modeling, we are developing a smart cyberinfrastructure module for the analyses of high spatial resolution (HSR) remote sensing images of sea ice. The project contributes new domain knowledge to the sea ice community. It integrates HSR images that are spatiotemporally discrete to produce a rapid and reliable identification of ice types, and standardizes image processing so as to create compatible sea ice products. The cyberinfrastructure module is a value-added on-demand web service, e.g., reliable classification of sea ice, that can be easily integrated with existing infrastructure. The key objective is to develop a cyberinfrastructure to efficiently collect, search, explore, visualize, organize, analyze and share HSR Arctic sea ice imagery.

Poster Slides
Reed Maxwell Colorado School of Mines Collaborative Research: Framework: Software: NSCI : Computational and data innovation implementing a national community hydrologic modeling framework for scientific discovery Award #: 1835903 Abstract

Hydrologic science studies the movement of water in the earth system. Continental scale simulation of this flow of water through rivers, streams and groundwater is an identified grand challenge in hydrology. Decades of model development, combined with advances in solver technology and software engineering, have enabled large-scale, high-resolution simulations of the hydrologic cycle over the US, yet substantial technical and communication challenges remain. Our interdisciplinary team of computer scientists and hydrologists is developing a framework to leverage advances in computer science transforming simulation and data-driven discovery in the Hydrologic Sciences and beyond. This project is advancing the science behind these national scale hydrologic models, accelerating their capabilities and building novel interfaces for user interaction. Our framework brings computational and domain science (hydrology) communities together in order to move more quickly from tools (models, big data, high-performance computing) to discoveries. Our framework facilitates decadal, national scale simulations, which are an unprecedented resource for both the hydrologic community and the much broader community of people working in water dependent systems (e.g., biological systems, energy and food production). These simulations will enable the community to address scientific questions about water availability and dynamics from the watershed to the national scale. Additionally, this framework is designed to facilitate multiple modes of interaction and engage a broad spectrum of users outside the hydrologic community. We will provide easy-to-access pre-processed datasets that can be visualized and plotted using built-in tools that will require no computer science or hydrology background.
Recognizing that most hydrology training does not generally include High Performance Computing and data analytics or software engineering, this framework will provide a gateway for computationally enhanced hydrologic discovery. Additionally, for educators we will develop packaged videos and educational modules on different hydrologic systems geared towards K-12 classrooms.

Poster Slides
Ron Soltz Wayne State University Jet Energy-loss Tomography with a Statistically and Computationally Advanced Program Envelope Award #: 1550300 Abstract

The Jet Energy-loss Tomography with a Statistically and Computationally Advanced Program Envelope (JETSCAPE) collaboration is an NSF-funded multi-institutional effort to design the next generation of event generators to study the physics of jets within the quark-gluon plasma created in ultra-relativistic heavy-ion collisions. Integrated advanced statistical analysis tools provide non-expert users with quantitative methods to validate novel theoretical descriptions of jet modification, by comparison with the complete set of current experimental data. To improve the efficiency of this computationally intensive task, the collaboration has developed trainable emulators that can accurately predict experimental observables by interpolation between full model runs, and employs accelerators such as Graphics Processing Units (GPUs) for both the fluid dynamical simulations and the modification of jets. This framework exists within a user-friendly envelope that allows for continuous modifications, updates and improvements of each of its components.

Poster Slides
Wolfgang Bangerth Colorado State University Collaborative Research: Frameworks: Software: Future Proofing the Finite Element Library Deal.II -- Development and Community Building Award #: 1835673 Abstract

Finite element methods (FEMs) are widely used for the solution of
partial differential equations (PDEs). The deal.II project is an open
source FEM software library that supports simulation and computational
discovery in virtually all parts of the sciences and engineering by
providing tools to solve essentially all PDEs amenable to the FEM. It
is also a project with a thriving, world-wide user and developer community.

We will provide an overview of the project, what we have done for this
grant to extend functionality and still plan to do, and also comment
on how we build a community of users and developers.

Poster Slides
Mike Pritchard University of California, Irvine Collaborative Research: HDR Elements: Software for a new machine learning based parameterization of moist convection for improved climate and weather prediction using deep learning Award #: 1835863 Abstract

Machine learning based representation of subgrid processes in climate simulation was proved in concept in 2018 by several research groups in the limit of an idealized aquaplanet. I will review some of the interesting findings from follow-on tests during the past two years that have clarified the potential for such an approach to work in more realistic settings, focusing on the engineering lessons learned and the challenges that remain. Along the way I will discuss emerging diagnostics for testing the physical credibility of prototype emulators, the importance of formal hyperparameter tuning, the strategy we developed to incorporate physical constraints into hybrid machine learning models, and our new CSSI software that makes prognostic testing simpler. I will also share our latest measurements of skill when emulating explicit convection in a modern version of the Community Earth System Model that includes real geography, seasons, and diurnal cycles.

Poster Slides
Theodore Kisner Lawrence Berkeley National Laboratory Collaborative Research: Elements: Software: NSCI: HDR: Building An HPC/HTC Infrastructure For The Synthesis And Analysis Of Current And Future Cosmic Microwave Background Datasets Award #: 1835865 Abstract

This project aims to develop new software infrastructure to bridge the gap between HPC (TOAST) and HTC (SPT3G) software frameworks currently used by Cosmic Microwave Background experiments. We are developing code to allow running simulation and analysis modules from one framework in the other, supporting data translation in memory between the models used by the two frameworks, and unifying the data representations and conventions in both frameworks. This project will immediately benefit current and future CMB experiments such as SPT, ACT, BICEP/Keck, Simons Array, Simons Observatory, and CMB-S4.

Poster Slides
Dmitry Pekurovsky University of California San Diego Elements: Software: Multidimensional Fast Fourier Transforms on the path to Exascale Award #: 1835885 Abstract

The Fast Fourier Transform (FFT) is a ubiquitous tool in scientific simulations, from CFD to plasma, astrophysics, ocean modeling, materials research, medical imaging, molecular dynamics and many others. This CSSI project aims at creating a highly efficient, scalable and portable library for FFTs in multiple dimensions. The prototype library is available as open source. It is designed to be highly adaptable to a wide range of uses and platforms. This poster presents the efforts of the early stages of this work, namely designing and testing the core of the package and the types of use cases it is designed to handle.

Poster Slides
Yong Chen Texas Tech University Elements:Software:NSCI: Empowering Data-driven Discovery with a Provenance Collection, Management, and Analysis Software Infrastructure Award #: 1835892 Abstract

We envision creating a software infrastructure to collect, manage, and analyze provenance data for high performance computing (HPC) systems. Provenance data refers to entities, such as users, jobs, and files, and the relationships among them. Such a provenance software infrastructure can describe the history of a piece of data; for instance, a user runs a job that produces a dataset, which is later used by another user when running another job. It makes possible advanced data management functionalities such as identifying the data sources, parameters, or assumptions behind a given result, auditing data history and usage, or understanding in detail how different input data are transformed into outputs. We will introduce our project progress so far and discuss further tasks.
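The entity-and-relationship view described above can be illustrated with a small in-memory graph. This is only a minimal sketch under stated assumptions, not the project's software: the class `ProvenanceGraph`, the relation labels (`ran`, `produced`, `read_by`), and all entity names are hypothetical and chosen to mirror the user/job/dataset example in the abstract.

```python
from collections import defaultdict


class ProvenanceGraph:
    """Hypothetical in-memory store: typed entities plus directed, labeled edges."""

    def __init__(self):
        self.kinds = {}                     # entity id -> "user" | "job" | "file"
        self.in_edges = defaultdict(list)   # target id -> [(relation, source id)]

    def add_entity(self, entity_id, kind):
        self.kinds[entity_id] = kind

    def relate(self, source, relation, target):
        # Edges point in the direction of control/data flow: source influences target.
        if source not in self.kinds or target not in self.kinds:
            raise KeyError("both entities must be registered before relating them")
        self.in_edges[target].append((relation, source))

    def lineage(self, entity_id):
        """Return every entity the given entity depends on, directly or indirectly."""
        seen = set()
        stack = [entity_id]
        while stack:
            current = stack.pop()
            for _relation, source in self.in_edges[current]:
                if source not in seen:
                    seen.add(source)
                    stack.append(source)
        return seen


def build_example():
    """Alice's job produces a dataset that Bob's job later reads (names are made up)."""
    g = ProvenanceGraph()
    for entity_id, kind in [
        ("alice", "user"), ("bob", "user"),
        ("job1", "job"), ("job2", "job"),
        ("dataset.h5", "file"), ("result.csv", "file"),
    ]:
        g.add_entity(entity_id, kind)
    g.relate("alice", "ran", "job1")
    g.relate("job1", "produced", "dataset.h5")
    g.relate("bob", "ran", "job2")
    g.relate("dataset.h5", "read_by", "job2")
    g.relate("job2", "produced", "result.csv")
    return g
```

Calling `build_example().lineage("result.csv")` walks the edges backward and collects both jobs, both users, and the intermediate dataset, which is the kind of "what is behind this result" query the abstract mentions.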

Poster Slides
Miriah Meyer University of Utah Reproducible Visual Analysis of Multivariate Networks with MultiNet Award #: 1835904 Abstract

Multivariate networks -- datasets that link together entities that are associated with multiple different variables -- are a critical data representation for a range of high-impact problems, from understanding how our bodies work to uncovering how social media influences society. These networks capture information about relationships between entities as well as attributes of the entities and the connections. Tools used in practice today provide very limited support for reasoning about networks. This project aims to fill this critical gap in the existing cyber-infrastructure ecosystem for reasoning about multivariate networks by developing MultiNet, a robust, flexible, secure, and sustainable open-source visual analysis system. The web-based tool, along with an underlying plug-in-based framework, will support three core capabilities: 1) interactive, task-driven visualization of both the connectivity and attributes of networks, 2) reshaping the underlying network structure to bring the network into a shape that is well suited to address analysis questions, and 3) leveraging provenance data to support reproducibility, communication, and integration in computational workflows. These capabilities will allow scientists to ask new classes of questions about network datasets, and lead to insights about a wide range of pressing topics. To meet this goal, we will ground the design of MultiNet in four deeply collaborative case studies with domain scientists in biology, neuroscience, sociology, and geology.

Poster Slides
Luke Nambi Mohanam University of California, Irvine Elements: libkrylov, a Modular Open-Source Software Library for Extremely Large Eigenvalue and Linear Problems Award #: 1835909 Abstract

Dense linear systems and eigenvalue problems with extremely large dimensions, i.e., well over a million degrees of freedom or unknowns, underlie many grand challenges in science and engineering, from quantum molecular and materials sciences to fluid dynamics. This project develops, validates, and deploys the general-purpose open-source software library libkrylov for solving these linear systems and eigenvalue problems based solely on vector operations. We will give an overview of the already implemented and planned functionality of libkrylov including the recently developed non-orthonormal Krylov subspace methods, as well as design, data structures, and interfaces. The current implementation uses compile-time polymorphism and user-defined procedure encapsulation to enable high degrees of efficiency, generic coding, and ease of use. Examples of applications to X-ray absorption spectroscopy of single-molecule magnets will be discussed.
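To illustrate what solving a linear system "based solely on vector operations" can look like, here is a minimal sketch of a textbook Krylov method, the conjugate gradient iteration, written against a user-supplied matrix-vector product. It is not libkrylov's API or algorithm: the function names (`conjugate_gradient`, `apply_a`, `laplacian_1d`) are hypothetical, and plain conjugate gradient assumes a symmetric positive definite operator, which is narrower than the problems the library targets.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def axpy(alpha, x, y):
    """Return alpha * x + y as a new vector."""
    return [alpha * a + b for a, b in zip(x, y)]


def conjugate_gradient(apply_a, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A.

    The matrix is never formed; it is only touched through apply_a(v) -> A v.
    """
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x for the initial guess x = 0
    p = list(r)            # first search direction
    rr = dot(r, r)
    for _ in range(max_iter):
        if rr ** 0.5 <= tol:
            break
        ap = apply_a(p)
        alpha = rr / dot(p, ap)
        x = axpy(alpha, p, x)       # x <- x + alpha * p
        r = axpy(-alpha, ap, r)     # r <- r - alpha * A p
        rr_new = dot(r, r)
        p = axpy(rr_new / rr, p, r)  # p <- r + beta * p
        rr = rr_new
    return x


def laplacian_1d(v):
    """Apply the tridiagonal (-1, 2, -1) operator without storing a matrix."""
    n = len(v)
    return [
        2.0 * v[i]
        - (v[i - 1] if i > 0 else 0.0)
        - (v[i + 1] if i < n - 1 else 0.0)
        for i in range(n)
    ]
```

For example, `conjugate_gradient(laplacian_1d, laplacian_1d([1.0, 2.0, 3.0, 4.0, 5.0]))` recovers the original vector to within the tolerance. Because the solver only needs `apply_a`, the same code works whether the operator is a stored matrix or a procedure, which is the design point the abstract emphasizes.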

Poster Slides
Krister Shalm University of Colorado, Boulder RAISE-TAQS: Randomness Expansion Using a Loophole-Free Bell Test Award #: 1839223 Abstract

Our team is building a certifiable random number generator using quantum entanglement. This is the only known method for producing random bits whose quality and true randomness can be directly certified. To achieve this, we are researching high-speed, low-loss (>98% transmission), bulk optical switches. Such switches are a critical piece of infrastructure in any quantum network based on photons. We are also working to incorporate our quantum-entangled random number generator into a public randomness beacon, and developing tools for the public to access and use these random bits. The first application being developed is an online app that will use our random bits to draw fair voting district maps that satisfy constitutional requirements to prevent gerrymandering.

Poster Slides
Saul Teukolsky Cornell University Elements:Collaborative Proposal: A task-based code for multiphysics problems in astrophysics at exascale Award #: 1931280 Abstract

We describe the development of SpECTRE, an open-source community code for multi-scale, multi-physics problems in astrophysics and gravitational physics. The code uses discontinuous Galerkin methods and task-based parallelism to run at exascale. SpECTRE will allow astrophysicists to explore the mechanisms driving core-collapse supernovae, to understand electromagnetic transients and gravitational-wave phenomena for black holes and neutron stars, and to reveal the dense matter equation of state.

Poster Slides
Jeff Horsburgh Utah State University Collaborative Research: Elements: Advancing Data Science and Analytics for Water (DSAW) Award #: 1931297 Abstract

Scientific and management challenges in the water domain are multi-disciplinary, requiring synthesis of data from multiple domains. Many data analysis tasks performed by water scientists are difficult because datasets are large and complex; standard formats for common data types are not always agreed upon nor mapped to an efficient structure for analysis; and water scientists generally lack training in scientific methods needed to efficiently tackle large and complex datasets. This project is advancing Data Science and Analytics for Water (DSAW) by developing: (1) an advanced object data model that maps common water-related data types to high performance Python data structures based on standard file, data, and content types established by the CUAHSI HydroShare system; and (2) new Python packages that enable scientists to automate retrieval of water data, loading it into high performance memory objects, and performing reproducible analyses that can be shared, collaborated around, and formally published for reuse.

Poster Slides
Dane Morgan University of Wisconsin - Madison Collaborative Research: Framework: Machine Learning Materials Innovation Infrastructure Award #: 1931298 Abstract

Our project seeks to support rapid development of machine learning applications in Materials Science and Engineering through (i) easy access to data, (ii) cloud-based tools for application of machine learning, and (iii) support for human- and machine-accessible, sustainable access to disseminated machine learning models.

Poster Slides
Yinzhi Wang University of Texas at Austin Elements: PASSPP: Provenance-Aware Scalable Seismic Data Processing with Portability Award #: 1931352 Abstract

Most of our understanding about the Earth’s interior comes from seismology. Over the past decade, the huge success of many large-scale projects like the USArray component of Earthscope gave rise to a massive increase in the data volume available to the seismology community. Such data sets have revealed the limitations of the existing data processing infrastructure available to seismologists. As a step towards addressing the issue, we devised a new framework we call Massive Parallel Analysis System for Seismologists (MsPASS), for seismic data processing and management. MsPASS leverages existing big data technologies: (1) a scalable parallel processing framework based on a dataflow computation model (Spark), (2) a NoSQL database system centered on document store (MongoDB), and (3) a container-based virtualization environment (Docker and Singularity). The preliminary development indicates the basic components can be easily deployed on systems ranging from desktops to large modern high-performance computing systems.

Poster Slides
David Lange Princeton University C++ as a service - rapid software development and dynamic interoperability with Python and beyond Award #: 1931408 Abstract

A key enabler of innovation and discovery for many scientific researchers is the ability to explore data and express ideas quickly as software prototypes. Tools and techniques that reduce the "time to insight" are essential to the productivity of researchers. At the same time, massive increases in data volumes and computational needs require a continual focus on maximizing code performance to realize the potential science from novel scientific apparatus. Programming language usability and interoperability are omni-disciplinary issues affecting today's scientific research community. As a result, a common approach across many scientific research fields is for scientists to program in Python, while steering kernels written in C++. This C++ as a service (CaaS) project brings a novel interpretative technology to science researchers through a state-of-the-art C++ execution environment. CaaS will enable both beginners and experts in C++. It enables higher productivity in development and extends the interactive education and training platform for programming languages. CaaS will enable existing technologies as well as truly new development and analysis approaches. CaaS will directly support the growth of cyber-capabilities that advance scientific research across a broad range of pursuits.

Poster Slides
Ashok Srinivasan University of West Florida Cyberinfrastructure for Pedestrian Dynamics-Based Analysis of Infection Propagation Through Air Travel Award #: 1931511 Abstract

Pedestrian dynamics provides mathematical models that can accurately simulate the movement of individuals in a crowd. These models allow scientists to understand how different policies, such as boarding procedures on planes, can prevent, or make worse, the transmission of infections. This project seeks to develop novel software that will provide a variety of pedestrian dynamics models, infection spread models, as well as data so that scientists can analyze the effect of different mechanisms on the spread of directly transmitted diseases in crowded areas. The initial focus of this project is on air travel. However, the software can be extended to a broader scope of applications in movement analysis and epidemiology, such as in theme parks and sports venues.

Poster Slides
Shantenu Jha Rutgers University RADICAL-Cybertools: Middleware Building Blocks for NSF's Cyberinfrastructure Ecosystem Award #: 1931512 Abstract

RADICAL-Cybertools (RCT) embodies the building block approach to middleware. It builds upon a prior prototype investment, which developed a pilot system for leadership-class HPC machines, and a Python implementation of SAGA, a distributed computing standard. The current effort is organized around three activities: (i) Extending RCT functionality to reliably support a range of novel applications at scale; (ii) Enhancing RCT to be ready to support new NSF systems, such as the Frontera supercomputing system and other new systems; (iii) Prototyping a new component: a campaign manager for computational resource management.

Poster Slides
Shantenu Jha Rutgers University S2I2: Impl: The Molecular Sciences Software Institute Award #: 1547580 Abstract

The Molecular Sciences Software Institute serves as a nexus for science, education, and cooperation serving the worldwide community of computational molecular scientists -- a broad field including biomolecular simulation, quantum chemistry, and materials science.

Poster Slides
Amneet Pal Singh Bhalla San Diego State University Collaborative Research: Frameworks: Multiphase Fluid-Structure Interaction Software Infrastructure to Enable Applications in Medicine, Biology, and Engineering Award #: 1931368 Abstract

This project aims to enhance the IBAMR computer modeling and simulation infrastructure that provides advanced implementations of the immersed boundary method and its extensions with support for adaptive mesh refinement. Most current IBAMR models assume that the properties of the fluid are uniform, but many physical systems involve multiphase fluid models with inhomogeneous properties, such as air-water interfaces or the complex fluid environments of biological systems. This project aims to extend recently developed support in IBAMR for treating multiphase flows and enhance the modeling capability to treat multiphase polymeric fluid flows, which are commonly encountered in biological systems, and to treat reacting flows with complex chemistry, which are relevant to models of combustion, astrophysics, and additive manufacturing using stereolithography. This project also aims to re-engineer IBAMR for massive parallelism, so that it may effectively use very large computational resources in service of applications that require very high fidelity.

Poster Slides
Neil Heffernan Worcester Polytechnic Institute Collaborative Research: Frameworks: Cyber Infrastructure for Shared Algorithmic and Experimental Research in Online Learning Award #: 1931523 Abstract

Research on Adaptive Intelligent Learning for K-12 and MOOCs (RAILKaM) cyber infrastructure will enable 20 researchers to run large-scale field experiments on basic principles in the educational contexts of K-12 mathematics learning and university Massive Open Online Courses (MOOCs). RAILKaM will integrate ASSISTments, an online learning platform used by more than 100,000 K-12 students, with MOOCs offered by the University of Pennsylvania and used by hundreds of thousands of learners each year, in order to enable broader populations, more robust student interactions, and more bountiful data collection than currently feasible in either environment alone. RAILKaM will also support 75 data scientists by supplying carefully redacted datasets that protect student privacy. In facilitating 1) high-power, replicable experiments with diverse student populations and 2) extensive measurement, RAILKaM will increase the efficiency and ease of conducting quality educational research in online learning environments, bringing research methods and long-term learning outcomes to 21st-century classrooms.

Poster Slides
Mahmut Kandemir Penn State University Frameworks: Re-Engineering Galaxy for Performance, Scalability and Energy Efficiency Award #: 1931531 Abstract

Galaxy is an open source, web-based framework that is extensively used by more than 20,000 researchers world-wide for conducting research in many areas such as genomics, molecular dynamics, chemistry, drug discovery, and natural language processing. It provides a web-based environment in which scientists perform various computational analyses on their data, exchange results from these analyses, explore new research concepts, facilitate student training, and preserve their results for future use. Galaxy currently runs on a large variety of high-performance computing (HPC) platforms including local clusters, supercomputers in national labs, public datacenters and the cloud. Unfortunately, while most of these systems supplement conventional CPUs with significant accelerator capabilities (in the form of Graphics Processing Units (GPUs) and/or Field-Programmable Gate Arrays (FPGAs)), the current Galaxy implementation does not take advantage of these powerful accelerators. This is a missed opportunity because many Galaxy applications (e.g., sequence analysis, metabolomics, and metagenomics) are inherently parallelizable and can benefit from significant latency and throughput improvements when mapped to GPUs and FPGAs.
The main objectives of this proposed work are to (i) enable existing Galaxy tools to take full advantage of the immense computational capabilities offered by state-of-the-art GPUs and FPGAs, and at the same time, (ii) enlarge the Galaxy community by bringing the unique tool, analytics, data preservation and sharing capabilities provided by Galaxy to existing GPU and FPGA based applications from various domains that currently do not use Galaxy.

Poster Slides
Dhabaleswar K (DK) Panda Ohio State University Collaborative Research: Frameworks: Designing Next-Generation MPI Libraries for Emerging Dense GPU Systems Award #: 1931537 Abstract

Modern HPC platforms are using multiple CPUs, GPUs and high-performance interconnects per node. Unfortunately, state-of-the-art production quality implementations of the popular Message Passing Interface (MPI) programming model do not have the appropriate support to deliver the best performance and scalability for applications (HPC and DL) on such dense GPU systems. The project involves a synergistic and comprehensive research plan, involving computer scientists from OSU and OSC and computational scientists from TACC, SDSC and UCSD. The proposed innovations include: 1) Designing high-performance and scalable communication operations that fully utilize multiple network adapters and advanced in-network computing features for GPU and CPU; 2) Designing novel datatype processing and unified memory management; 3) Designing CUDA-aware I/O; 4) Designing support for containerized environments; and 5) Carrying out integrated evaluation with a set of driving applications. Initial results from this project using the MVAPICH2 MPI library will be presented.

Poster Slides
Rafal Angryk Georgia State University Elements: Comprehensive Time Series Data Analytics for the Prediction of Solar Flares and Eruptions Award #: 1931555 Abstract

We report on progress made by our interdisciplinary Data Mining Lab at Georgia State University on this recently funded (October 1, 2019) project. We present a brief overview of our project and focus on the first two phases of our research: (1) Data & Metadata Acquisition, and (2) Generation of Data Sets for Benchmarking.
Solar flares, along with accompanying solar eruptions, have the potential to disrupt the technology we rely on, such as GPS, radars, high-frequency radio communications, communication satellites (cell phones and Internet), and electricity distribution networks.
Our objectives are:
(1) to improve understanding of the time-dependent physical behavior of solar active regions to the point that we can predict whether, when and how strongly they will flare; and
(2) in doing this, to perform comparative, reproducible, and data-driven prediction of solar magnetic eruptions.

Poster Slides
Andreas Kloeckner University of Illinois at Urbana-Champaign Elements: Transformation-Based High-Performance Computing in Dynamic Languages (also: SHF-1911019: SHF: Small: Collaborative Research: Transform-to-perform: languages, algorithms, and solvers for nonlocal operators --- represented by Rob Kirby) Award #: 1931577 Abstract

Poster Slides
Hendrik Heinz University of Colorado at Boulder Collaborative Research: Frameworks: Cyberloop for Accelerated Bionanomaterials Design Award #: 1931587 Abstract

This project aims at building a sustainable computational infrastructure for all-atom simulations of compounds and multiphase materials across the periodic table with high accuracy up to the 1000 nm scale. Cyberloop consolidates previously disconnected platforms for soft matter and solid state simulations (IFF, OpenKIM, and CHARMM-GUI) into a single unified framework. The new integrated infrastructure will enable users to set up complex bionanomaterial configurations, select reliable force fields, generate input scripts for popular simulation platforms, and assess the uncertainty in the results. Innovations include automated charge assignment protocols and file conversions, expansion of the Interface force field (IFF) and surface model databases, extension of the Open Knowledgebase of Interatomic Models (OpenKIM) to bonded force fields and AI-based force field selection tools, and development of new Nanomaterial Builder and Bionano Builder modules in CHARMM-GUI. Cyberloop supports the discovery of the next generation of therapeutics, materials for energy conversion, and ultrastrong composites, and trains an interdisciplinary, diverse, and cyber-savvy workforce.

Poster Slides
Sameer Shende University of Oregon CSSI: Elements: First Workshop on NSF and DOE High Performance Computing Tools Award #: 1939486 Abstract

High Performance Computing (HPC) software has become increasingly complex to
install. The complex inter-package dependency can lead to significant loss of
productivity. DOE's Exascale Computing Project (ECP) has produced an
Extreme-Scale Scientific Software Stack (E4S) of HPC libraries
and tools. It is available through a containerized distribution as well as
Spack. Spack includes recipes for building packages from source code and is the
primary means of deploying ECP software. This talk will focus on an annual
two-day workshop that brings together teams of NSF and DOE researchers. The
outcome of this meeting will be actual deployment of the ECP SDK software stack
and container-based runtimes on the HPC systems and an understanding of how to
develop custom recipes for Spack based builds. The proposed workshop will have
a significant impact on enabling the delivery of HPC software to NSF and other
supercomputing sites.

Poster Slides
Sameer Shende University of Oregon SI2-SSI: Collaborative Research: A Software Infrastructure for MPI Performance Engineering: Integrating MVAPICH and TAU via the MPI Tools Interface Award #: 1450471 Abstract

This project creates an MPI programming infrastructure that can integrate performance analysis capabilities more directly, through the MPI Tools (MPI_T) Information Interface, monitor performance metrics during run time, and deliver greater optimization opportunities for scientific applications. It integrates MVAPICH2 and the TAU Performance System using the MPI_T interface. MVAPICH2 exports performance variables (PVARs) and exposes key control variables (CVARs) to TAU using the MPI_T interface. MVAPICH2 has multiple optimized designs for collective operations. Choosing the algorithm that delivers the best performance for a given application is complicated and depends on several factors such as message size, size of the job, and availability of advanced hardware features. TAU provides a plugin framework where plugins interact with MVAPICH2 and read PVARs and set CVARs to effect runtime adaptation based on performance data.

Poster Slides
Naveen Sharma Rochester Institute of Technology Citizenly: Empowering Communities by Democratizing Urban Data Science Award #: 1943002 Abstract

RIT and the City of Rochester are collaborating to address challenges commonly present in midsize cities. Democratizing data science is the notion that anyone, with little to no technical expertise, can do data science if provided the right data and user-friendly tools. The Citizenly project aims to realize this broad vision in the context of urban challenges. This project will extract data from NY open data sets, readily available city data (e.g. building, crime, and transportation), and citizen-generated data to develop a hyper-relevant data set for the community. Citizens and city leaders, without requiring technical expertise, will be able to create, share, and take advantage of urban data and applications for their respective communities. As part of initial work for this project, using the Citizenly approach, two urban use cases will be developed: optimal urban services resource allocation and health impact assessment of socio-economic factors.

Poster Slides
Rajiv Ramnath Ohio State University EAGER: Bridging the last mile; Towards an assistive cyberinfrastructure for accelerating computationally driven science Award #: 1945347 Abstract

At the onset of any research project, most of the time is spent on data acquisition (published and verified data sets suitable for the study), pre-processing (noise reduction, visualizing and manipulating the data to fit our research) and tool exploration (state-of-art techniques). Some preliminary analysis along with small-scale computations is needed to compare the tools' results while adjusting relevant software parameters and model parameters. During this entire process, one has to explore various resources (how-to guides, research papers/journals, textbooks, internet) and/or seek ad-hoc advice from colleagues, collaborators, advisors. Most of the suggestions/recommendations go undocumented unless they were implemented and thus recorded. Also, different researchers need guidance during different stages of their research progress and in various forms. This project proposes the use of artificial intelligence to build a cyberinfrastructure tool that assists by utilizing past experiences and other resources to cater to individual researchers' needs.

Poster Slides
Natalia Villanueva Rosales University of Texas at El Paso ELEMENTS: DATA: HDR: SWIM to a Sustainable Water Future Award #: 1835897 Abstract

Water sustainability is a key challenge worldwide and one of the United Nations’ seventeen Sustainable Development Goals, with 40% of the global population experiencing water scarcity. The American Southwest will be impacted by more intense drought expected in the coming decades. The Sustainable Water through Integrated Modeling (SWIM) framework will advance water sustainability research capabilities by automating the integration and execution of decoupled water models, facilitating the interpretation of such models and enabling participatory reasoning processes. Convergent research in SWIM is achieved through three synergistic subprojects: 1) SWIM-SEM that focuses on formally described semantics to enhance the automated execution and understanding of data and models generated by SWIM; 2) SWIM-PM that addresses the challenges of enabling participatory analysis of the socio-economic-environmental water system through research on data- and model-based reasoning with biophysical and social models; and 3) SWIM-IT that focuses on cyberinfrastructure for engaging stakeholders, advancing research, and ensuring usability, reproducibility, and sustainability of products.

Poster Slides
Kenton McHenry University of Illinois Urbana-Champaign Collaborative Research: CSSI: Framework: Data: Clowder Open Source Customizable Research Data Management, Plus-Plus Award #: 1835834 Abstract

Preserving, sharing, navigating, and reusing large and diverse collections of data are now essential to scientific discoveries in all areas. To support these needs effectively, new methods are required that simplify and reduce the amount of effort needed by researchers to find and utilize data, support community accepted data practices, and bring together the breadth of standards, tools, and resources utilized by a community. Clowder, an active curation based data management system, addresses these needs and challenges by distributing much of the data curation overhead throughout the lifecycle of the data, augmenting this with social curation and automated analysis tools, and providing extensible community-dependent means of viewing and navigating data. The project enhances Clowder's core systems and aims to transition the grassroots Clowder user community and Clowder's other stakeholders (such as current and potential developers) into a larger organized community, with a sustainable software resource supporting convergent research data needs.

Poster Slides
Ken Koedinger Carnegie Mellon University CIF21 DIBBs: Building a Scalable Infrastructure for Data-Driven Discovery and Innovation in Education Award #: 1443068 Abstract

We aim to transform scientific discovery and innovation in education through a scalable data infrastructure that bridges across the many disciplines now contributing to learning science, discipline-based education research, and educational technology innovation (e.g., intelligent tutoring, dialogue systems, MOOCs). The data infrastructure building blocks (DIBBs) we are developing and integrating are available online at LearnSphere. LearnSphere spans existing educational data silos through sharing learning analytic components, or DIBBs, that scientists can use with or without programming. The key is a web-based workflow authoring tool called Tigris. To develop the user community, we have held ten workshops reaching hundreds of participants. LearnSphere has over 7000 unique user logins, and 78 DIBBs components have been created in Tigris. With this critical mass of re-composable DIBBs, over 1000 workflows have been created and are being shared and used for learning R&D in academia and industry.

Poster Slides
Amit Chourasia University of California, San Diego CIF21 DIBBs: Ubiquitous Access to Transient Data and Preliminary Results via the SeedMe Platform Award #: 1443083 Abstract

SeedMeLab is a powerful data management and data sharing software suite. It enables collaboration teams to manage, share, search, visualize, and present their data using an access-controlled, branded, and customizable website that they own and control. It supports storing and viewing data in a familiar tree hierarchy but also supports formatted annotations, lightweight visualizations, and threaded comments on any file/folder. The system can be easily extended and customized to support metadata, job parameters, and other domain and project-specific contextual items. The software is open source and available as an extension to the popular Drupal content management system.

Poster Slides
Zachary Ives University of Pennsylvania mProv: Provenance-Based Data Analytics Cyberinfrastructure for High-frequency Mobile Sensor Data Award #: 1640813 Abstract

The mProv project develops (1) extensions to the PROV standard for capturing metadata vital to reproducibility for streaming data; (2) instrumentation to open-source components commonly used in streaming, mobile, and big data settings; (3) capabilities for reasoning about data history and quality based on provenance.

Poster Slides
Bill Tolone University of North Carolina at Charlotte Virtual Information-Fabric Infrastructure (VIFI) for Data-Driven Decisions from Distributed Data Award #: 1640818 Abstract

VIFI presents a novel infrastructure that empowers data users to discover, analyze, transform, and evaluate distributed, fragmented data without direct access to or movement of large amounts of data, enabling analyses that are otherwise impossible, infeasible, or impractical.

Poster Slides
Shawn McKee University of Michigan CC*DNI DIBBs: Multi-Institutional Open Storage Research InfraStructure (MI-OSiRIS) Award #: 1541335 Abstract

We will report on the status of the OSiRIS project (NSF Award #1541335, UM, IU, MSU and WSU) during its fifth and final year. OSiRIS is delivering a distributed Ceph storage infrastructure coupled together with software-defined networking to support multiple science domains across Michigan’s three largest research universities. The project’s goal is to provide a single scalable, distributed storage infrastructure that allows researchers at each campus to work collaboratively with other researchers across campus or across institutions. The NSF CC*DNI DIBBs program which funded OSiRIS is seeking solutions to the challenges of multi-institutional collaborations involving large amounts of data and we are exploring the creative use of Ceph and networking to address those challenges.

Poster Slides
Margo Seltzer University of British Columbia SI2-SSI: Collaborative Research: Bringing End-to-End Provenance to Scientists Award #: 1450277 Abstract

The End-to-end Provenance project has produced a collection of tools that use data provenance. Our tools make life easier for data scientists programming in R and Python, they help make science more reproducible, and they improve system security. (1-slide presentation)

Poster Slides
Erkan Istanbulluoglu University of Washington Collaborative Research: SI2-SSI: Landlab: A Flexible, Open-Source Modeling Framework for Earth-Surface Dynamics Award #: 1450412 Abstract

This project catalyzes research in earth-surface dynamics by developing a software framework that enables rapid creation, refinement, and reuse of two-dimensional (2D) numerical models. The phrase earth-surface dynamics refers to a remarkably diverse group of science and engineering fields that deal with our planet's surface and near-surface environment: its processes, its management, and its responses to natural and human-made perturbations. Scientists who want to use an earth-surface model often build their own unique model from the ground up, re-coding the basic building blocks of their model rather than taking advantage of codes that have already been written. Whereas the end result may be novel software programs, many person-hours are lost rewriting existing code, and the resulting software is often idiosyncratic, poorly documented, and unable to interact with other software programs in the same scientific community and beyond, leading to lost opportunities for exploring an even wider array of scientific questions than those that can be addressed using a single model. The Landlab model framework seeks to eliminate these redundancies and lost opportunities, and simultaneously lower the bar for entry into numerical modeling, by creating a user- and developer-friendly software library that provides scientists with the fundamental building blocks needed for modeling earth-surface dynamics. The framework takes advantage of the fact that nearly all surface-dynamics models share a set of common software elements, despite the wide range of processes and scales that they encompass. Providing these elements in the context of a popular scientific programming environment, with strong user support and community engagement, contributes to accelerating progress in the diverse sciences of the earth's surface.

Poster Slides
Edward Valeev Virginia Tech Collaborative Research: SI2-SSI: Software Framework for Electronic Structure of Molecules and Solids Award #: 1550456 Abstract

The project focuses on the development of fast and accurate methods for simulation of molecules and solids, bringing under one umbrella three electronic structure codes (PySCF, BAGEL, and MPQC). Unique capabilities developed with the project's support include robust coupled-cluster methods for solids, high-end parallel coupled-cluster capabilities for molecules, and fast techniques for density functional theory in molecules and solids.

Poster Slides
Catherine Zucker Harvard University SI2-SSE: Collaborative Research: A Sustainable Future for the Glue Multi-Dimensional Linked Data Visualization Package Award #: 1739657 Abstract

glue is an open-source Python-based software package that enables scientists to explore relationships within and across related datasets. Without merging any data, glue makes it easy to make multi-dimensional linked visualizations of datasets, select subsets of data interactively or programmatically in 1, 2, or 3 dimensions, and see those selections propagate live across all open visualizations (e.g., histograms, 2-d and 3-d scatter plots, images, 3-d volume renderings, etc.). While glue was originally designed as a desktop application, we have built a new prototype interface for it in JupyterLab, a browser-based environment that supports multi-panel interactive visualizations alongside narrative text, code, and figures. We demonstrate the functionality of glue in the JupyterLab environment, and show how it can be a powerful multi-disciplinary tool for making discoveries across many scientific disciplines.

Poster Slides
Anton Van der Ven University of California Santa Barbara SI2-SSE: Automated statistical mechanics for the first-principles prediction of finite temperature properties in hybrid organic-inorganic crystals Award #: 1642433 Abstract

The CASM software package (a Clusters Approach to Statistical Mechanics) automates first-principles statistical mechanics simulations of crystalline solids. CASM is designed to algorithmically formulate effective Hamiltonians for a wide variety of chemical, electronic and vibrational degrees of freedom within arbitrarily complex crystal structures. The CASM software then automates the tasks to parameterize the generalized effective Hamiltonians to first-principles training data. Subsequently it generates highly optimized (kinetic) Monte Carlo codes tailored for each effective Hamiltonian to enable finite temperature statistical mechanics simulations. These tools are ideally suited to study the coupling between a wide range of chemical and electronic excitations within crystalline solids at finite temperature and have found applications in multi-component alloys, battery materials, magnetic materials that are alloyed and strained and quantum materials.

Poster Slides
David Mencin UNAVCO Inc. and University of Colorado Collaborative Research: Framework: Data: NSCI: HDR: GeoSCIFramework: Scalable Real-Time Streaming Analytics and Machine Learning for Geoscience and Hazards Research Award #: 1835791 Abstract

Update on the first year of GeoSCIFramework.

Poster Slides
Andreas Kloeckner University of Illinois at Urbana-Champaign SHF:Small:Collaborative Research: Transform-to-perform: languages, algorithms, and solvers for nonlocal operators Award #: 1911019 Abstract

Poster Slides
Session 3 - Feb 14, Morning
# Name Organization NSF Award Abstract Poster Talk
Petr Sulc Arizona State University Elements: Models and tools for online design and simulations of DNA and RNA nanotechnology Award #: 1931487 Abstract

We develop a set of tools for interactive visualization and analysis of large DNA and RNA nanostructures, simulated by a coarse-grained model of DNA and RNA. The tools allow users to interactively edit and visualize simulations consisting of up to 2 million nucleotides, the largest system that has ever been simulated at the nucleotide level. The tools can further quantify the properties of the simulated structures, allowing experimental groups to probe in silico large numbers of different designs.

Poster Slides
Rachana Ananthakrishnan University of Chicago Automate: A Distributed Research Automation Platform Award #: 1835890 Abstract

Exponential increases in data volumes and velocities are overwhelming finite human capabilities. Continued progress in science and engineering demands that we automate a broad spectrum of currently manual research data manipulation tasks, from transfer and sharing to acquisition, publication, indexing, analysis, and inference. To address this need, which arises across essentially all scientific disciplines, this project is working with scientists in astronomy, engineering, geosciences, materials science, and neurosciences to develop and apply Globus Automate, a distributed research automation platform. Its purpose is to increase productivity and research quality across many science disciplines by allowing scientists to offload the management of a broad range of data acquisition, manipulation, and analysis tasks to a cloud-hosted distributed research automation platform.

Poster Slides
Lucy Fortson University of Minnesota Collaborative Research: Framework: Software: HDR: Building the Twenty-First Century Citizen Science Framework to Enable Scientific Discovery Across Disciplines Award #: 1835530 Abstract

For citizen science as a research framework to fulfill its promise in supporting hundreds of researchers across many disciplines in harnessing the data revolution and in enabling new science not previously possible, this grant develops new Citizen Science Cyberinfrastructure (CSCI) for (1) Combining Modes of Citizen Science - linking field-based citizen science with online analysis citizen science; (2) Smart Task Assignment – combining human and machine intelligences; and (3) exploring new data models presenting Data as Subject to volunteers. Building on the demonstrated success of substantial CI investments in citizen science, namely and, while leveraging Science Gateways Community Institute (SGCI) CI resources, the development of the new CSCI Framework is being driven by three science use cases in biomedicine, ecology, and astronomy; specifically, 3-D reconstructions of bioimaged cell organelles, species monitoring through identifying individual animals via non-invasive imaging, and characterizing astronomical light curves in anticipation of large upcoming surveys.

Poster Slides
Carol Hall North Carolina State University Element: Software: Enabling Millisecond-Scale Biomolecular Dynamics Award #: 1835838 Abstract

The goal of this proposed research is to develop an open software framework to enable multi-millisecond dynamic simulations of peptides and peptidomimetic polymers. This will be achieved by implementing a parallel discontinuous molecular dynamics (DMD) package, developing a suite of DMD interaction potentials, and providing tools for translating continuous atomistic models into DMD models. Although there are many coarse-grained potentials and codes available to simulate large biomolecular systems, the longest time scales that can typically be accessed are on the order of tens of microseconds, and most are unable to predict the formation of structures such as fibrils in a reasonable time frame. Our tools will allow the scientific/engineering community to study long time-scale phenomena such as biopolymer folding, aggregation, and fibril formation. The code will be tested by volunteer users and validated both by comparison with literature results and by an experimental case study on peptoid-based inhibition of antibody aggregation.

Poster Slides
Carol Hall North Carolina State University Element: Computational Toolkit to Discover Peptides that Self-assemble into User-selected Structures Award #: 1931430 Abstract

Many peptides are known to adopt beta-strand conformations and assemble spontaneously into a variety of nanostructures with applications in a variety of fields. The goal of this project is to develop an open software toolkit that enables the identification of peptide sequences that are capable of assembling into user-selected beta-sheet-based structures. Users will be able to screen potentially thousands of peptide sequences that assemble spontaneously into the structure of their choosing, and rank order their stability. Discontinuous molecular dynamics (DMD) simulation software along with the PRIME20 force field will also be made available to enable analysis of the designed structures’ assembly kinetics. To establish efficacy and a basis for future improvement of computational tools, selected designs will be validated experimentally using biophysical characterization techniques and solid-state nuclear magnetic resonance (ssNMR) spectroscopy. Our software tool, “Peptide Assembly Designer” (PepAD), will be a “plugin” on the NSF-sponsored Molecular Simulation and Design Framework (MoSDeF).

Poster Slides
Gerard Lemson Johns Hopkins University Long Term Access to Large Scientific Data Sets: The SkyServer and Beyond Award #: 1261715 Abstract

SciServer is a science platform built and supported by the Institute for Data Intensive Engineering and Science at the Johns Hopkins University. SciServer extends the SkyServer system of server-side tools that introduced the astronomical community to SQL and has been serving the Sloan Digital Sky Survey catalog data to the public. SciServer uses a Docker based architecture to provide interactive and batch mode server-side analysis with scripting languages like Python and R in various environments including Jupyter (notebooks), RStudio and command-line. Users have access to private file storage as well as personal SQL database space. A flexible resource access control system allows users to share their resources with collaborators, a feature that has also been very useful in classroom environments. All these services, wrapped in a layer of REST APIs, constitute a scalable collaborative data-driven science platform that is attractive to science disciplines beyond astronomy.

Poster Slides
Geoffrey Charles Fox Indiana University Bloomington Middleware and High-Performance Analytics Libraries for Scalable Data Science Award #: 1443054 Abstract

NSF 1443054 “Middleware and High-Performance Analytics Libraries for Scalable Data Science” is a collaboration among seven universities: Arizona State, Indiana (lead), Kansas, Rutgers, Stony Brook, Virginia, and Utah. It addresses the intersection of HPC and Big Data computing with several different application areas or communities driving the requirements for software systems and algorithms. The base architecture includes the HPC-ABDS, High-Performance Computing Enhanced Apache Big Data Stack, and application use cases identifying key features that determine software and algorithm requirements. The middleware includes the Harp-DAAL collective communication layer, Twister2 Big Data toolkit, and RADICAL pilot jobs for batch and streaming applications. The SPIDAL Scalable Parallel Interoperable Data Analytics Library covers core machine learning, image processing, and the application communities: Network Science, Polar Science, Biomolecular Simulations, Pathology, and Spatial Systems. Recent work focuses on the integration of ML with HPC in HPCafterML in biomolecular simulations and a broad study of HPCforML.

Poster Slides
Anand Padmanabhan University of Illinois at Urbana Champaign CIF21 DIBBs: Scalable Capabilities for Spatial Data Synthesis Award #: 1443080 Abstract

Spatial data, often embedded with geographic references, are important to numerous scientific domains (e.g., ecology, geography and spatial sciences, geosciences, and social sciences, to name just a few), and also beneficial to solving many critical societal problems (e.g., environmental and urban sustainability). In recent years, this type of data has exploded to massive size and significant complexity as increasingly sophisticated location-based sensors and devices (e.g., social networks, smartphones, and environmental sensors) are widely deployed and used. However, the tools and computational platforms available for processing and synthesizing such data remain limited. Over the past couple of years, this project has helped establish CyberGIS-Jupyter as a platform for making geospatial data processing and analytics capabilities accessible. CyberGIS-Jupyter is an online geospatial computation platform for a large number of users to conduct and share scalable cyberGIS analytics via Jupyter Notebooks supported by advanced cyberinfrastructure resources such as those provisioned by the Extreme Science and Engineering Discovery Environment (XSEDE). This poster presents details of the CyberGIS-Jupyter platform in terms of both the technical progress made and the enabling role it plays in enhancing cyberGIS research and education.

Poster Slides
Rich Wolski University of California, Santa Barbara CC*DNI DIBBs: Data Analysis and Management Building Blocks for Multi-Campus Cyberinfrastructure through Cloud Federation Award #: 1541215 Abstract

The poster will outline the project's development and deployment of a cloud federation spanning several university campuses. Science users are able to draw cloud resources from multiple campus clouds using a single-sign-on credentialing capability. The poster will outline the structure and maintenance of the federation, discuss the technological approach, and highlight the science achievements that it has enabled.

Poster Slides
Thomas A DeFanti UC San Diego CC*DNI DIBBs: The Pacific Research Platform Award #: 1541349 Abstract

The goal of the Pacific Research Platform (PRP) Cooperative Agreement is to expand the campus Science DMZ network systems model developed by the Department of Energy's ESnet into a regional DMZ model supporting data-intensive science. The PRP is enabling researchers to quickly and easily move data between collaborator labs, supercomputer centers, instruments, and data repositories, creating a big-data freeway that allows the data to traverse multiple, heterogeneous networks with minimal performance degradation. The PRP’s data-sharing architecture, with end-to-end 10-100 Gb/s connections, is enabling examples of regionwide, nationwide, and worldwide virtual co-location of data with computing.

Poster Slides
Bertram Ludaescher University of Illinois, Urbana-Champaign CC*DNI DIBBS: Merging Science and Cyberinfrastructure Pathways: The Whole Tale Award #: 1541450 Abstract

Poster Title: Developing, Packaging and Sharing Reproducible Research Objects: The Whole Tale Approach.

Abstract. A tale is an executable research object for the dissemination of computational scientific findings that captures information needed to facilitate understanding, transparency, and re-execution for review and computational reproducibility at the time of publication. We describe the Whole Tale open source project and platform, and describe different use cases supported by the current release. A current development focus is on advanced provenance capture and querying capabilities.

Poster Slides
George Alter University of Michigan Continuous Capture of Metadata for Statistical Data Award #: 1640575 Abstract

The C2Metadata (“Continuous Capture of Metadata”) Project automates the documentation of data transformations performed by statistical software. Researchers in many fields use statistical software (SPSS, Stata, SAS, R, Python) for data transformation and data management as well as analysis. C2Metadata tools translate scripts used by statistical software into an independent Structured Data Transformation Language (SDTL), which serves as an intermediate language for describing data transformations. SDTL is incorporated into standard metadata formats (Data Documentation Initiative (DDI), Ecological Markup Language (EML), and JSON-LD), which are used for data discovery, codebooks, and auditing data management scripts. C2Metadata differs from most previous approaches to provenance by focusing on documenting transformations at the variable level.

Poster Slides
Bonnie Hurwitz University of Arizona Ocean Cloud Commons Award #: 1640775 Abstract

Next-generation sequencing has led to the generation of massive genomic datasets to explore the roles and functions of microorganisms in ecosystems. Comparative metagenomics aims to explore these datasets by comparing them to one another and measuring their similarity globally. We developed an algorithm called Libra that uses Hadoop to perform all-vs-all sequence analysis on hundreds of metagenomes to identify microbial and viral signatures linked to key biological processes. Libra performs with unparalleled accuracy compared to existing tools on both simulated and real datasets using billions of reads. Libra’s state-of-the-art algorithm and its implementation on Hadoop allow it to achieve remarkable compute times and accuracy without requiring a reduction in dataset size or simplified distance metrics. Our tool is integrated into iMicrobe, where users can run Libra with their CyVerse account on their own datasets or on those that are integrated into the OCC.

Poster Slides
Juliana Freire New York University CIF21 DIBBs: EI: Vizier, Streamlined Data Curation Award #: 1640864 Abstract

Vizier is an open-source tool for data debugging and exploration that combines the flexibility of notebooks with the easy-to-use data manipulation interface of spreadsheets. Combined with advanced provenance tracking for both data and computational steps, this enables reproducibility, versioning, and streamlined data exploration.

Poster Slides
Krishna Rajan University at Buffalo DIBBS: EI: Data Laboratory for Materials Engineering Award #: 1640867 Abstract

The primary outcomes of this project are: (i) Creation of AI tools to make valuable experimental data, hitherto inaccessible for analysis, available to materials scientists - these tools are domain agnostic and facilitate extraction of data from information-rich sources such as scientific charts and diagrams in academic papers; examples include automatically extracting eutectic points from phase diagrams, which was used to identify potential metallic glass forming compounds. (ii) Creation of a machine learning framework for materials scientists to accelerate the discovery of advanced materials. This framework contains synergistic building blocks that enable scientists to gather, model and visualize data. An easy-to-use graphical interface has also been developed to apply different state-of-the-art machine learning models on existing materials data. Performance comparison of different models as well as descriptors in terms of predicted properties is supported by the interface along with visualization of data and results to better understand physical phenomena.

Poster Slides
Shyam Dwaraknath Lawrence Berkeley National Laboratory The Local Spectroscopy Data Infrastructure (LSDI) Award #: 1640899 Abstract

The Local Spectroscopy Data Infrastructure (LSDI) Project is developing a completely integrated platform for first-principles calculations of the so-called “local” environment at atomic positions within a material, which can be revealed through NMR spectroscopy and local-probe X-ray absorption spectra. The infrastructure broadly addresses the needs of the growing community of chemists and materials scientists who rely on local-environment probe spectroscopy methods, which are relevant for defective, non-stoichiometric, and nano-crystalline materials and interfaces. These classes of materials are increasingly important across a range of applications, and no standardized spectral measurements are available or catalogued to accelerate characterization, understanding and design. Our project created robust, benchmarked workflows for calculating NMR, XAS, EELS, and other spectra and developed tools to use these massive data sets to better understand the local environments that these techniques probe.

Poster Slides
Ann Christine Catlin Purdue University Creating a Digital Environment for Enabling Data-Driven Science (DEEDS) Award #: 1724728 Abstract

The digital environment for enabling data-driven science (DEEDS) is a cyberinfrastructure for big data and high-performance computing that offers systematic, reliable, and secure support for scientific investigations end-to-end. DEEDS datasets provide a shared research environment for data acquisition, preservation, exploration and analysis, together with the integration and HPC execution of data science research tools, interactive analytics, and the capture of computing workflows for data provenance, results traceability, and reproducibility. User-friendly interfaces on the dataset dashboard are used to create and connect file repositories, multi-dimensional data tables, computational software, scientific workflows, outcomes, and analytics – offering interactive search, exploration, and visualization. Datasets are FAIR-compliant and can be published for discovery, exploration, reuse, and reinterpretation. DEEDS is effective across science domains and is being used to support collaborative research projects in electrical engineering, biological engineering, civil engineering, computational chemistry, agriculture, and health & human sciences.

Poster Slides
Haiying Shen University of Virginia CIF21 DIBBs: PD: Building High-Availability Data Capabilities in Data-Centric Cyberinfrastructure Award #: 1724845 Abstract

Both high performance computing (HPC) clusters and Hadoop clusters use file systems. A Hadoop cluster uses the Hadoop Distributed File System (HDFS) that resides on compute nodes, while an HPC cluster usually uses a remote storage system. Despite years of efforts on research and application development on HPC and Hadoop clusters, the file systems in both types of clusters still face a formidable challenge, that of achieving exascale computing capabilities. The centralized data indexing in HDFS and HPC storage architectures cannot provide high scalability and reliability, and both HDFS and HPC storage architectures have shortcomings such as single point of failure and insufficiently efficient data access. This project builds scalable high-availability data capabilities in data-centric cyberinfrastructure to overcome the shortcomings and create a highly scalable file system with new techniques for distributed load balancing, data replication and consistency maintenance.

Poster Slides
Kimberly Claffy University of California San Diego, San Diego Supercomputer Center CIF21 DIBBs: EI: Integrated Platform for Applied Network Data Analysis (PANDA) Award #: 1724853 Abstract

We are developing a new Platform for Applied Network Data Analysis (PANDA) that will offer researchers more accessible calibrated user-friendly tools for collecting, analyzing, querying, and interpreting measurements of the Internet ecosystem.

Poster Slides
Tevfik Kosar University at Buffalo CIF21 DIBBs: PD: OneDataShare: A Universal Data Sharing Building Block for Data-Intensive Applications Award #: 1724898 Abstract

As data has become more abundant and data resources have become more heterogeneous, accessing, sharing, and disseminating these data sets has become a bigger challenge. Using simple tools to remotely log on to computers and manually transfer data sets between sites is no longer feasible. Managed file transfer (MFT) services have allowed users to do more, but these services still rely on the users providing specific details to control this process, and they suffer from shortcomings, including low transfer throughput, inflexibility, limited protocol support, and poor scalability. OneDataShare is a universal data sharing building block for data-intensive applications, with three primary goals: (1) optimization of end-to-end data transfers and reduction of the time to delivery of the data; (2) interoperation across heterogeneous data resources and on-the-fly inter-protocol translation; and (3) prediction of the data delivery time to decrease the uncertainty in real-time decision-making processes. These capabilities are being developed as a cloud-hosted service.

Poster Slides
Tevfik Kosar University at Buffalo EAGER: GreenDataFlow: Minimizing the Energy Footprint of Global Data Movement Award #: 1842054 Abstract

The annual electricity consumed by the global data movement is estimated to be more than 200 terawatt-hours at the current rate, costing more than 40 billion U.S. dollars per year. The GreenDataFlow project aims to reduce the energy footprint of the global data movement by (1) analyzing the energy vs. performance tradeoffs of end-system and protocol parameters during active data transfers; (2) investigating the accurate prediction of the network device power consumption due to increased data transfer rate on the active links and dynamic readjustment of the transfer rate to balance the energy over performance ratio; and (3) exploring service level agreement (SLA) based energy-efficient transfer algorithms, which will help the service providers to minimize the energy consumption during data transfers without compromising the SLA with the customer in terms of the promised performance level, but still execute the transfers with minimal energy levels given the requirements.

Poster Slides
Saul Youssef Boston University CIF21 DIBBs: EI: North East Storage Exchange Award #: 1753840 Abstract

The Northeast Storage Exchange is an NSF/DIBBs project to create shared storage facilities for research, engineering and education projects in the Northeast. NESE is a collaboration between Boston University, Harvard University, MIT, MGHPCC, Northeastern University and UMass. With our main deployment at the MGHPCC data center, we are uniquely situated to provide cost effective high performance storage with economics that allows long term growth.

Poster Slides
Jose Fortes University of Florida SI2-SSE: Human- and Machine-Intelligent Software Elements for Cost-Effective Scientific Data Digitization Award #: 1535086 Abstract

Biodiversity information extraction (IE) from imaged text in digitized museum specimen records is a challenging task due to both the large number of labels and the complexity of the characters and information to be extracted.
The HuMaIN project investigates software-enabled solutions that support the combination of machine and human intelligence to accelerate IE from specimen labels.
Among other contributions, the project proposed the use of self-aware workflows to orchestrate machines and human tasks (the SELFIE model), Optical Character Recognition (OCR) ensembles and Natural Language Processing (NLP) methods to increase confidence in extracted text, named-entity recognition (NER) techniques for Darwin Core (DC) terms extraction, and a simulator for the study of these workflows with real-world data. The software has been tested and applied on large datasets from museums in the USA and Australia.

Poster Slides
Upulee Kanewala Montana State University CRII: SHF: Toward Sustainable Software for Science - Implementing and Assessing Systematic Testing Approaches for Scientific Software Award #: 1656877 Abstract

Custom scientific software is widely used in science and engineering. Often such software plays an important role in critical decision making. But, due to the lack of systematic testing in scientific software, subtle faults can remain undetected. One of the greatest challenges for systematic testing of scientific software is the oracle problem. We aim to develop automated testing techniques to overcome this challenge. These techniques will be implemented in METtester: a publicly available testing tool that can be used in the day-to-day scientific development

Poster Slides
Philip A. Wilsey University of Cincinnati III: Small: Partitioning Big Data for the High Performance Computation of Persistent Homology Award #: 1909096 Abstract

Persistent Homology (PH) is computationally expensive and cannot be directly applied to more than a few thousand data points. This project aims to develop mechanisms to allow the computation of PH on large, high-dimensional data sets. The proposed method will significantly reduce the run-time and memory requirements for the computation of PH without significantly compromising the accuracy of the results. This project explores techniques to map a large point cloud P to another point cloud P' with fewer total points such that the topological spaces characterized by P and P' are nearly equivalent. The mapping from P to P' will potentially hide some of the smaller topological features during the PH computation on P'. Restoration of accurate PH results is achieved by (i) upscaling data for the identified large topological features, and (ii) partitioning the data to run concurrent PH computations that locate the smaller topological features.
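The abstract does not say how P is mapped to P'. As a rough illustration only, the sketch below uses greedy farthest-point (maxmin) sampling, a generic way to pick a small, well-spread subset of a point cloud. It is a swapped-in technique, not the project's method, and the function name `farthest_point_sample` is hypothetical.

```python
import math


def farthest_point_sample(points, k):
    """Pick up to k well-spread points from `points` (greedy maxmin rule).

    Illustrative stand-in for a P -> P' reduction; not the project's method.
    """
    if k <= 0 or not points:
        return []
    chosen = [0]  # start from the first point (an arbitrary choice)
    # Distance from every point to its nearest already-chosen point.
    nearest = [math.dist(p, points[0]) for p in points]
    while len(chosen) < min(k, len(points)):
        nxt = max(range(len(points)), key=nearest.__getitem__)
        if nearest[nxt] == 0.0:
            break  # every remaining point duplicates a chosen one
        chosen.append(nxt)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], math.dist(p, points[nxt]))
    return [points[i] for i in chosen]


cloud = [(0.0, 0.0), (0.1, 0.0), (10.0, 0.0), (10.0, 0.1), (5.0, 5.0)]
reduced = farthest_point_sample(cloud, 3)
print(reduced)
```

On this toy cloud the two tight clusters each collapse to one representative, which is the kind of reduction that can hide small features while keeping large ones.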

Poster Slides
Tim Menzies NC State University Elements: Can Empirical SE be Adapted to Computational Science? Award #: 1931425 Abstract

Poster Slides
Christopher Paciorek University of California, Berkeley SI2-SSI: Integrating the NIMBLE statistical algorithm platform with advanced computational tools and analysis workflows Award #: 1550488 Abstract


Poster Slides
Jason Leigh University of Hawaii at Manoa SI2-SSI: SAGEnext: Next Generation Integrated Persistent Visualization and Collaboration Services for Global Cyberinfrastructure Award #: 1441963 Abstract

SAGE2, the Scalable Amplified Group Environment, is the world’s most advanced software for cyber-infrastructure-enabled visualization, analysis, and distance collaboration on scalable display walls. SAGE's ease of use and affordability make it an excellent platform on which to display a variety of related, high-resolution information in the form of visualizations, enabling collaborators to reach conclusions and make decisions with greater speed, accuracy, comprehensiveness, and confidence. The SAGE user community comprises ~4000 users located at ~800 sites in over 17 countries worldwide, ranging from high schools to universities to national research laboratories. Disciplines using SAGE2 include: Archaeology, Architecture, Art, Atmospheric Science, Biology, Chemistry, Civil Engineering, Communications, Computer Science, Education, Geoscience, Health, Library Science, Medical, Meteorology, Network Engineering, Neuroscience, Physics, Psychology, and Statistics.

Poster Slides
Volker Blum Duke University Collaborative Research: SI2-SSI: ELSI-Infrastructure for Scalable Electronic Structure Theory Award #: 1450280 Abstract

Routine applications of electronic structure theory to molecules and periodic systems need to compute the electron density from given Hamiltonian and overlap matrices. System sizes can range from a few to thousands or (in some cases) millions of atoms. Different discretization schemes (basis sets) and different system geometries (finite non-periodic vs. infinite periodic boundary conditions) dictate matrices with different structure. The ELectronic Structure Infrastructure (ELSI) project provides an open-source software interface to facilitate the implementation and optimal use of high-performance solver libraries covering cubic scaling eigensolvers, linear scaling density-matrix-based algorithms, and other reduced scaling methods in between. We cover the ELSI interface software itself, solvers connected to the interface, as well as practical handling (e.g., routines for density matrix extrapolation in geometry optimization and molecular dynamics calculations and general utilities such as parallel matrix I/O and JSON output). Finally, we present benchmarks comparing different solvers, carried out using the ELSI infrastructure on massively parallel supercomputers.

Poster Slides
Volker Blum Duke University DMREF: Collaborative Research: HybriD3: Discovery, Design, Dissemination of Organic-Inorganic Hybrid Semiconductor Materials for Optoelectronic Applications Award #: 1729297 Abstract

This project, called "HybriD3", aims to accelerate the "Design, Discovery, and Dissemination" (D3) of new crystalline organic-inorganic hybrid semiconductors. This presentation will focus on the software- and data-related aspects of the project. We describe the web-facing database infrastructure "MatD3", a database and online presentation package for research data supporting materials discovery, design, and dissemination, developed as a generic package allowing individual research groups or projects to share materials data of any kind in a reproducible, easily accessible way. The package can be connected to the "Qresp" (“Curation and Exploration of Reproducible Scientific Papers”) software, which facilitates the organization, annotation and exploration of data presented in scientific papers. Finally, we describe the use of this infrastructure and our broader scientific activities as reflected in the open, hybrid organic-inorganic materials database "HybriD3".

Poster Slides
David Wells University of North Carolina, Chapel Hill SI2-SSI: Collaborative Research: Scalable Infrastructure for Enabling Multiscale and Multiphysics Applications in Fluid Dynamics, Solid Mechanics, and Fluid-Structure Interaction Award #: 1450327 Abstract

Many biological and biomedical systems involve the interaction of a flexible structure and a fluid. These systems range from the writhing and coiling of DNA to the locomotion of birds. The immersed boundary (IB) method is a broadly applicable framework for modeling and simulating fluid-structure interaction (FSI). To improve the efficiency of the IB method, the PI has developed adaptive versions of the IB method that employ structured adaptive mesh refinement (AMR) to deploy high spatial resolution only where needed. These methods have been implemented within the IBAMR software framework, which provides parallel implementations of the IB method and its extensions that leverage high-quality computational libraries including SAMRAI, PETSc, and libMesh. We present recent work demonstrating improved performance and scalability of IBAMR and showcase some applications made possible by these improvements.

Poster Slides
Anthony Danalis University of Tennessee SI2-SSI: Collaborative Proposal: Performance Application Programming Interface for Extreme-scale Environments (PAPI-EX) Award #: 1450429 Abstract

The PAPI team is developing PAPI support to stand up to the challenges posed by next-generation systems by (1) widening its applicability and providing robust support for newly released hardware resources; (2) extending PAPI’s support for monitoring power usage and setting power limits on GPUs; and (3) applying semantic analysis to hardware counters so that the application developer can better make sense of the ever-growing list of raw hardware performance events that can be measured during execution. The poster presents how the team is channeling the monitoring capabilities of hardware counters, power usage, and software-defined events into a robust PAPI software package.

Poster Slides
Yung-Hsiang Lu Purdue University SI2-SSE: Analyze Visual Data from Worldwide Network Cameras Award #: 1535108 Abstract

Many network cameras have been deployed for a wide range of purposes. The data from these cameras can provide rich information about the natural environment and human activities. To extract valuable information from this network of cameras, complex computer programs are needed to retrieve data from the geographically distributed cameras and to analyze the data. This project creates an open source software infrastructure by solving many problems common to different types of analysis programs. By using this infrastructure, researchers can focus on scientific discovery, not writing computer programs. This project can improve efficiency and thus reduce the cost for running programs analyzing large amounts of data. This infrastructure promotes education because students can obtain an instantaneous view of the network cameras and use the visual information to understand the world. Better understanding of the world may encourage innovative solutions for many pressing issues, such as better urban planning and lower air pollution.

Poster Slides
Michael Zentner University of California, San Diego S2I2: Impl: The Science Gateways Community Institute (SGCI) for the Democratization and Acceleration of Science Award #: 1547611 Abstract

The Science Gateways Community Institute is in its fourth year of operation, and has demonstrated substantial success in terms of the volume and recognized value of the services it provides. This poster will outline those services, present metrics on performance, and provide a view of future sustainability strategies of the SGCI.

Poster Slides
Kesong YANG University of California San Diego SI2-SSI: Collaborative Research: A Robust High-Throughput Ab Initio Computation and Analysis Software Framework for Interface Materials Science Award #: 1550404 Abstract

A three-year SI2-SSI project is proposed to develop a Python-based open-source software framework for data-driven interface materials science. This framework will be built on the existing pymatgen, custodian and FireWorks software libraries, integrating them into a complete, user-friendly, and flexible system for high-throughput ab initio computations and analysis. This SSI will greatly expand the capabilities of this framework beyond ground state bulk electronic structure calculations, targeting developmental efforts on three key focus areas of great interest to interface materials science: (i) ab initio thermodynamics of surfaces and interfaces; (ii) advanced methods for materials kinetics and diffusion at materials interfaces; and (iii) automated algorithms for structural construction of grain boundaries. This project has yielded more than 18 peer-reviewed research articles, including one recent article published in the journal Energy and Environmental Science (impact factor 33.250), which has been widely reported in the media.

Poster Slides
Mark Ghiorso OFM Research SI2-SSI: Collaborative Research: ENKI: Software infrastructure that ENables Knowledge Integration for modeling coupled geochemical and geodynamical processes Award #: 1550482 Abstract

ENKI is an open source software framework designed to facilitate the construction and maintenance of thermodynamic models of naturally occurring materials. It provides the capability of accessing these models with a standardized user interface built upon Jupyter notebooks that are hosted on a cloud-based server. The ENKI API is written in Python. The ENKI platform is designed to provide a uniform access to existing thermodynamic databases. The interface provides a straightforward way of calculating and comparing the thermodynamic properties of phases, the ability to construct phase diagrams, and the ability to perform generalized equilibrium calculations. A key aspect of ENKI is the ability to formulate thermodynamic models as symbolic expressions, and to automatically generate from these expressions compatible computer code. This capability supports calibration of thermochemical models from experimental data and encourages replicable and reproducible science.

Poster Slides
Dan Katz University of Illinois at Urbana-Champaign Collaborative Research: SI2-SSI: Swift/E: Integrating Parallel Scripted Workflow into the Scientific Software Ecosystem Award #: 1550588 Abstract

Parsl is an open source parallel programming library for Python, used by both small and large projects (e.g., LSST-DESC in astronomy, ArcticDEM and EarthDEM in geoscience). Parsl augments Python with simple, scalable, and flexible constructs for encoding parallelism. Developers annotate Python functions to create apps, which represent pure Python functions or calls to external applications, whether sequential, multicore, or multi-node MPI. Parsl further allows calls to these apps, called tasks, to be connected by shared input/output data (e.g., Python objects or files) via which Parsl can construct a dynamic dependency graph of tasks. Parsl scripts can be easily moved between different execution resources: local systems, clouds, clusters, supercomputers, and Kubernetes clusters. Developers define a Python-based configuration that outlines where and how to execute tasks. Parsl scripts can scale from a single core through to O(100k) nodes on one or more supercomputers.
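The idea of annotated functions whose calls return futures and chain into a dependency graph can be sketched with the Python standard library alone. The sketch below does not use Parsl and does not reproduce its API; the `app` decorator and the thread pool are assumptions made purely for illustration.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from functools import wraps

# Hypothetical stand-in for an execution resource: a small local thread pool.
_pool = ThreadPoolExecutor(max_workers=4)


def app(func):
    """Hypothetical decorator: calling the wrapped function returns a Future."""
    @wraps(func)
    def submit(*args):
        def run():
            # Wait for upstream futures, then call the real function.
            resolved = [a.result() if isinstance(a, Future) else a for a in args]
            return func(*resolved)
        return _pool.submit(run)
    return submit


@app
def square(x):
    return x * x


@app
def add(a, b):
    return a + b


# Passing futures as arguments creates the task dependencies implicitly.
total = add(square(3), square(4))
print(total.result())  # prints 25
```

Because a future must exist before it can be passed as an argument, upstream tasks are always queued ahead of the tasks that depend on them, so this toy scheduler cannot deadlock on its own dependencies.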

Poster Slides
Umberto Villa Washington University in St Louis Collaborative Research: SI2-SSI: Integrating Data with Complex Predictive Models under Uncertainty: An Extensible Software Framework for Large-Scale Bayesian Inversion Award #: 1550593 Abstract

Recent years have seen a massive explosion of datasets across all areas of science, engineering, technology, medicine, and the social sciences. The central questions are: How do we optimally learn from data through the lens of models? And how do we do so taking into account uncertainty in both data and models? These questions can be mathematically framed as Bayesian inverse problems. While powerful and sophisticated approaches have been developed to tackle these problems, such methods are often challenging to implement and typically require first and second order derivatives that are not always available in existing computational models. We present an extensible software framework that overcomes this hurdle by providing unprecedented access to state-of-the-art algorithms for deterministic and Bayesian inverse problems and the ability to compute derivatives using adjoint-based methods. Our goal is to make these advanced inversion capabilities available to a broader scientific community, to provide an environment that accelerates scientific discovery.

Poster Slides
Michael Dixon National Center for Atmospheric Research SI2-SSI: Lidar Radar Open Software Environment (LROSE) Award #: 1550597 Abstract

The LROSE project aims to make high-quality open source software available to users in the lidar and radar atmospheric sciences research community. This NSF-funded project is a collaboration between Colorado State University in Fort Collins and the National Center for Atmospheric Research in Boulder, Colorado.

Poster Slides
Rafael Ferreira da Silva University of Southern California Collaborative Research: SI2-SSE: WRENCH: A Simulation Workbench for Scientific Workflow Users, Developers, and Researchers Award #: 1642335 Abstract

WRENCH enables novel avenues for scientific workflow use, research, development, and education in the context of large-scale scientific computations and data analyses. WRENCH is an open-source library for developing simulators. WRENCH exposes several high-level simulation abstractions to provide high-level building blocks for developing custom simulators. WRENCH provides a software framework that makes it possible to simulate large-scale hypothetical scenarios quickly and accurately on a single computer, obviating the need for expensive and time-consuming trial and error experiments. WRENCH enables scientists to make quick and informed choices when executing their workflows, software developers to implement more efficient software infrastructures to support workflows, and researchers to develop novel efficient algorithms to be embedded within these software infrastructures.

Poster Slides
Andreas Goetz University of California, San Diego SI2-SSE: Enabling Chemical Accuracy in Computer Simulations: An Integrated Software Platform for Many-Body Molecular Dynamics Award #: 1642336 Abstract

We present software elements that enable computer simulations of molecular systems with unprecedented accuracy based on the many-body molecular dynamics (MB-MD) methodology. MB-MD is built upon a rigorous many-body expansion of interaction energies resulting in a fully transferable representation of potential energy surfaces that are derived entirely from correlated electronic structure data without resorting to empirical parameters. Our software includes a Python based workflow system for machine learning of many-body potential energy functions (PEFs) that integrates numerical tools for generating molecular configurations, electronic structure calculations, training set generation, PEF code generation, PEF parameter training, and PEF export for simulation codes, facilitated via centralized database storage. We also present a high-performance, vectorized and OpenMP parallel C++ code for MB-MD simulations including periodic boundary conditions. It contains an API for easy integration with simulation codes and is coupled to the open source i-PI MD driver and free energy toolkit PLUMED.

Poster Slides
Grey Ballard Wake Forest University High Performance Low Rank Approximation for Scalable Data Analytics Award #: 1642385 Abstract

With the advent of internet-scale data, the data mining and machine learning community has adopted Nonnegative Matrix Factorization (NMF) for performing numerous tasks such as topic modeling, background separation from video data, hyper-spectral imaging, web-scale clustering, and community detection. The goals of this project are to develop efficient parallel algorithms for computing nonnegative matrix and tensor factorizations (NMF and NTF) and their variants using a unified framework, and to produce a software package called Parallel Low-rank Approximation with Nonnegative Constraints (PLANC) that delivers the high performance, flexibility, and scalability necessary to tackle the ever-growing size of today's data sets. The algorithms have been generalized to NTF problems and extend the class of algorithms we can efficiently parallelize; our software framework allows end-users to use and extend our techniques.
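The abstract does not name the specific algorithms in PLANC. As a small serial illustration of what an NMF computes, the sketch below uses the classic multiplicative update rule to factor a nonnegative matrix A into nonnegative factors W and H. It is pure Python and not parallel, and the function names are hypothetical.

```python
import random


def matmul(a, b):
    """Dense matrix product of two lists-of-lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def nmf(a, rank, iters=200, seed=0, eps=1e-9):
    """Approximate a ~= w @ h with nonnegative w, h (multiplicative updates)."""
    rng = random.Random(seed)
    m, n = len(a), len(a[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    h = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T A) / (W^T W H), elementwise
        wt = transpose(w)
        num, den = matmul(wt, a), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(rank)]
        # W <- W * (A H^T) / (W H H^T), elementwise
        ht = transpose(h)
        num, den = matmul(a, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(m)]
    return w, h


def frobenius_error(a, w, h):
    approx = matmul(w, h)
    return sum((x - y) ** 2 for ra, rb in zip(a, approx) for x, y in zip(ra, rb)) ** 0.5


A = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [1.0, 0.0, 1.0]]
W, H = nmf(A, rank=2)
print(frobenius_error(A, W, H))
```

The dominant cost is the matrix products inside each update, which is where a high-performance implementation would spend its parallelization effort.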

Poster Slides
Chad Hanna Penn State University Hearing the signal through the static: Real-time noise reduction in the hunt for binary black holes and other gravitational wave transients Award #: 1642391 Abstract

We show results of a real-time classifier that uses auxiliary sensor data to characterize gravitational wave detector noise.

Poster Slides
Ritu Arora University of Texas at Austin, Texas Advanced Computing Center SI2-SSE: An Interactive Parallelization Tool Award #: 1642396 Abstract

The Interactive Parallelization Tool (IPT) is a high-productivity tool that can semi-automatically parallelize certain types of serial C/C++ programs and is currently being used for teaching parallel programming to students and domain experts. It solicits the specifications for parallelization from the users, such as what to parallelize and where. On the basis of these specifications, IPT translates the serial programs into working parallel versions using one of three popular parallel programming paradigms: MPI, OpenMP, or CUDA. Hence, IPT can free users from the burden of learning the low-level syntax of the different parallel programming paradigms and from any manual reengineering required for parallelizing existing serial programs. The performance of the parallel versions generated using IPT is within 10% of the performance of the best hand-written parallel versions available to us.

Poster Slides
Cameron Smith Rensselaer Polytechnic Institute Fast Dynamic Load Balancing Tools for Extreme Scale Systems Award #: 1533581 Abstract

High performance simulations running on distributed memory, parallel systems require even work distributions with minimal communications. To efficiently maintain these distributions on systems with accelerators, the balancing and partitioning procedures must utilize the accelerator. This work presents algorithms and speedup results using OpenCL and Kokkos to accelerate critical portions of the EnGPar hypergraph-based diffusive load balancer. Focus is given to basic hypergraph traversal and selection procedures.
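To give a feel for what "diffusive" balancing means, the sketch below runs a simple first-order diffusion on a plain graph of parts: in every step, each part sends a fraction of its load surplus to each lighter neighbor. This is a textbook simplification under assumed names (`diffuse`, `alpha`), not EnGPar's hypergraph algorithm or its accelerator implementation.

```python
def diffuse(loads, neighbors, alpha=0.25, steps=50):
    """Iteratively move load from heavier parts to lighter neighboring parts.

    loads:     list of per-part work amounts
    neighbors: dict mapping part index -> list of adjacent part indices
               (assumed symmetric)
    alpha:     fraction of each pairwise imbalance moved per step; it should
               stay below 1 / (max degree) for the iteration to be stable
    """
    loads = list(loads)
    for _ in range(steps):
        delta = [0.0] * len(loads)
        for u, nbrs in neighbors.items():
            for v in nbrs:
                if loads[u] > loads[v]:  # only the heavier side sends
                    flow = alpha * (loads[u] - loads[v])
                    delta[u] -= flow
                    delta[v] += flow
        loads = [load + d for load, d in zip(loads, delta)]
    return loads


# Four parts connected in a ring, with all work initially on part 0.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
balanced = diffuse([100.0, 0.0, 0.0, 0.0], ring)
print(balanced)
```

Each step only needs information from neighboring parts, which is why diffusive schemes map well onto distributed-memory systems.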

Poster Slides
Suresh Marru Indiana University Collaborative Research: SI2-SSI: Open Gateway Computing Environments Science Gateways Platform as a Service (OGCE SciGaP) Award #: 1339774 Abstract


Poster Slides
Shawn Douglas UC San Francisco SI2:SSE: Collaborative Research: Integrated Tools for DNA Nanostructure Design and Simulation Award #: 1740212 Abstract

DNA origami, a method for constructing nanoscale objects, relies on a long single strand of DNA to act as the 'scaffold' to template assembly of numerous short DNA oligonucleotide 'staples', which assemble into megadalton-sized nanostructures comprised of tens of thousands of DNA bases. Designing and experimentally testing a DNA origami nanostructure can take more than two weeks of effort and cost over $1000 per design in materials and labor. Fast simulation tools can improve efficiency in this process. We present a GPU-powered simulation tool for DNA origami that can provide a robust 3D structure prediction in a matter of minutes.

Poster Slides
Session 4 - Feb 14, Afternoon
# Name Organization NSF Award Abstract Poster Talk
Edgar Solomonik University of Illinois at Urbana-Champaign Collaborative Research: Frameworks: Scalable Modular Software and Methods for High-Accuracy Materials and Condensed Phase Chemistry Simulation Award #: 1931258 Abstract

The goal of our project is to bring high-accuracy methods to state of practice for materials and condensed phase chemistry by equipping PySCF with robust periodic mean-field and wave-function methods, leveraging reduced-scaling approximations, and innovating in tensor abstractions. We describe preliminary results that include introduction of QMC methods to PySCF, a new algorithmic technique to handle group symmetry in tensors, as well as innovations to tensor decomposition and automatic differentiation methods.

Poster Slides
Philip Harris MIT Collaborative Research: Frameworks: Machine learning and FPGA computing for real-time applications in big-data physics experiments Award #: 1931561 Abstract

Machine learning and FPGA computing for real-time applications in big-data physics experiments

Poster Slides
Gianfranco Ciardo Iowa State University SI2-SSE: A Next-Generation Decision Diagram Library Award #: 1642397 Abstract

The need to store and manipulate massive data is ubiquitous. Data arising from man-made artifacts such as hardware and software may exhibit a structure amenable to compact and efficient decision diagram techniques, a prime example being BDDs for symbolic model checking.

In the last two decades, many extensions of BDDs have been proposed, but libraries implementing them have been lacking. Our project addresses this need with Meddly, a software library that supports arbitrary discrete variables (including variables with unknown bounds and even with infinite domains, under certain restrictions), integer or real numerical ranges for the encoded functions (implemented using either multi-terminal or edge-valued decision diagrams), and various reduction and variable order techniques (to reduce the diagrams' size and the time to manipulate them).

Applications of Meddly include traditional symbolic model checking, generation of minimal counterexamples, numerical solution of Markov chains, analysis of underspecified systems, integer constraint problems, and combinatorial optimization.
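For readers unfamiliar with decision diagrams, the sketch below builds a small reduced ordered BDD for a Boolean function. It shows the two reduction rules that make the structure compact: skipping redundant tests and sharing identical nodes through a unique table. The `BDD` class is hypothetical and unrelated to Meddly's actual interface, and the brute-force construction is exponential and meant only for tiny examples.

```python
class BDD:
    """Minimal reduced ordered BDD over variables 0..num_vars-1 (illustrative)."""

    def __init__(self, num_vars):
        self.num_vars = num_vars
        self.unique = {}  # (var, low, high) -> node id, enforces node sharing
        self.nodes = {}   # node id -> (var, low, high)
        self.next_id = 2  # ids 0 and 1 are the terminal nodes False / True

    def mk(self, var, low, high):
        if low == high:  # redundant test: both branches agree, skip the node
            return low
        key = (var, low, high)
        if key not in self.unique:
            self.unique[key] = self.next_id
            self.nodes[self.next_id] = key
            self.next_id += 1
        return self.unique[key]

    def build(self, func, var=0, assignment=()):
        """Build by Shannon expansion over all assignments (exponential)."""
        if var == self.num_vars:
            return int(bool(func(*assignment)))
        low = self.build(func, var + 1, assignment + (False,))
        high = self.build(func, var + 1, assignment + (True,))
        return self.mk(var, low, high)

    def evaluate(self, node, assignment):
        while node > 1:
            var, low, high = self.nodes[node]
            node = high if assignment[var] else low
        return bool(node)


bdd = BDD(3)
root = bdd.build(lambda a, b, c: (a and b) or c)
print(len(bdd.nodes))  # 3 internal nodes instead of a 7-node decision tree
```

The same sharing idea carries over to the multi-valued and edge-valued variants the abstract mentions, where terminals and edges carry richer values than True and False.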

Poster Slides
Anthony Danalis University of Tennessee SI2-SSE: PAPI Unifying Layer for Software-Defined Events (PULSE) Award #: 1642440 Abstract

The PAPI Unifying Layer for Software-defined Events (PULSE) project focuses on enabling cross-layer and integrated monitoring of whole application performance by extending PAPI with the capability to expose performance metrics from key software components found in the HPC software stack. On one end, PULSE provides a standard, well-defined and well-documented API that high-level profiling software can utilize to acquire and present to application developers performance information about the libraries used by their application. On the other end, it provides standard APIs that library and runtime writers can utilize to communicate to higher software layers information about the behavior of their software.

Poster Slides
I-Te Lu California Institute of Technology SI2-SSE: PERTURBO: a software for accelerated discovery of microscopic electronic processes in materials Award #: 1642443 Abstract

PERTURBO: a software package for electron interactions, charge transport and ultrafast dynamics

Poster Slides
P. Bryan Heidorn University of Arizona SI2-SSE: Visualizing Astronomy Repository Data using WorldWide Telescope Software Systems Award #: 1642446 Abstract

There are two main outcomes of this project. The first is a port of the WorldWide Telescope (WWT) from a standalone Windows OS VR application to a Web-based portal. The second outcome is the development of astronomy-focused data management and processing on the CyVerse computational infrastructure. The WorldWide Telescope (WWT) provides a powerful data-visualization interface for data exploration and presentation. Through the open source WWT visualization software systems, this project enables the broader use of institutional and community-based, researcher-oriented astronomy data repositories and computational tools. The astronomy researcher workflow incorporates depositing data to make it discoverable through search and browsing, accessible through open access, actionable through connections to existing tools as well as community-developed tools running on CyVerse, and finally visualizing or citing data. We have added cloud-based access to Jupyter Notebooks, R, JS9, and other tools for astronomy, added a Virtual Observatory compliant server, and modified IPAC’s Firefly software for use by the James Webb Space Telescope NIRCam.

Poster Slides
Xiaosong Li University of Washington SI2-SSI: Sustainable Open-Source Quantum Dynamics and Spectroscopy Software Award #: 1663636 Abstract

The overarching goal of the project is to develop an innovative software platform, namely Chronus Quantum (ChronusQ), that is capable of modeling any type of time-resolved multidimensional spectroscopy using quantum electronic and nuclear dynamics. ChronusQ performs quantum dynamic simulations of the same light-matter interactions that occur in time-resolved multidimensional spectroscopies directly in the time-domain. The software is unique, in that it seamlessly integrates time-dependent quantum mechanical theories, spectral analysis tools and modular high-performance numerical libraries that are highly parallelized, extensible, reusable, community-driven, and open-sourced. The ChronusQ software is well-designed and well-documented to promote reusability, composability, maintainability, and sustainability. ChronusQ will make predictions and interpretations of multi-dimensional spectral features as routine as it currently is for linear spectra, yielding a direct path to the discovery and design of molecules and materials that demonstrate new or enhanced high-order optical, magnetic, electronic, and plasmonic features.

Poster Slides
Edgar Gabriel University of Houston Collaborative Research: SI2-SSI: EVOLVE: Enhancing the Open MPI Software for Next Generation Architectures and Applications Award #: 1663887 Abstract

Open MPI is a widely used open source implementation of the Message Passing Interface specification. The goal of this project is to enhance the Open MPI software library, focusing on two aspects. First, the project extends Open MPI to support new features of the MPI specification, such as improved support for hybrid programming models and support for fault tolerance in MPI applications. Second, it enhances the Open MPI core to support new architectures and improve scalability. This includes a rework of the startup environment that will improve process launch scalability, increase support for asynchronous progress of operations, enable support for accelerators, and reduce sensitivity to system noise. The project will also enhance the support for file I/O operations as part of the Open MPI package by expanding our work on highly scalable collective I/O operations.

Poster Slides
David Tarboton Utah State University Collaborative Research: SI2-SSI: Cyberinfrastructure for Advancing Hydrologic Knowledge through Collaborative Integration of Data Science, Modeling and Analysis Award #: 1664061 Abstract

HydroShare is a domain specific data and model repository operated by the Consortium of Universities for the Advancement of Hydrologic Science Inc. (CUAHSI) to advance hydrologic science by enabling individual researchers to more easily and freely share data and models from their research. HydroShare supports Findable, Accessible, Interoperable and Reusable (FAIR) principles. It is comprised of two sets of functionalities: (1) a repository for users to share and publish data and models in a variety of formats, and (2) tools (web apps) that can act on content in HydroShare and support web-based access to compute capability. Together these move us towards a platform for collaboration and computation that integrates data storage, organization, discovery, and analysis through web applications (web apps) and that allows researchers to employ services beyond the desktop to make data storage and manipulation more reliable and scalable, while improving their ability to collaborate and reproduce results.

Poster Slides
B.S. Manjunath University of California, Santa Barbara SI2-SSI: LIMPID: Large-Scale IMage Processing Infrastructure Development Award #: 1664172 Abstract

The primary goal is to create a large-scale distributed image processing infrastructure, LIMPID, through a broad, interdisciplinary collaboration of researchers in databases, image analysis, and the sciences. In order to create a resource of broad appeal, the focus will be on three types of image processing: simple detection and labelling of objects based on detection of significant features and leveraging recent advances in deep learning; semi-custom pipelines and workflows based on popular image processing tools; and fully customizable analysis routines. Popular image processing pipeline tools will be leveraged to allow users to create or customize existing pipeline workflows and easily test these on large-scale cloud infrastructure from their desktop or mobile devices. In addition, a core cloud-based platform will be created where custom image processing can be created, shared, modified, and executed on large-scale datasets, applying novel methods to minimize data movement. Usage test cases will be created for three specific user communities: materials science, marine science and neuroscience.

Poster Slides
Andrew Schultz University at Buffalo SI2-SSE: Infrastructure Enabling Broad Adoption of New Methods That Yield Orders-of-Magnitude Speedup of Molecular Simulation Averaging Award #: 1739145 Abstract

Mapped averaging is a recently published scheme for the reformulation of ensemble averages. The framework uses approximate results from statistical mechanical theory to derive new ensemble averages (mapped averages) that represent exactly the error in the theory. Well-conceived mapped averages can be computed by molecular simulation with remarkable precision and efficiency, and in favorable cases the speedup factors are several orders of magnitude.

Harmonically mapped averaging (HMA) is the application of mapped averaging to crystalline systems. It enables simulation to compute directly the anharmonic contribution to the properties, without noise contributed by harmonic behavior. The result is a technique for computing crystalline properties with unprecedented, transformative efficiency.

The aim of this project is to implement these methods on well-established and widely used software packages for simulation of crystalline systems, and furthermore to develop mapped averages for new applications of interest to the users of these systems.

Poster Slides
Andrew Connolly University of Washington An Ecosystem of Reusable Image Analytics Pipelines Award #: 1739419 Abstract

The data volumes associated with image processing in astronomy can range from small sets of images taken by individual observers to large survey telescopes generating tens of petabytes of data per year. The tools used by researchers to analyze their images are often bespoke, tailored to specific tasks or science use cases. As part of an initiative to share analysis tools across astronomy (and broader communities) we are developing a cloud-aware analysis framework (the astronomy commons). We demonstrate here an image analysis system (built to process data from the Large Synoptic Survey Telescope; LSST) that can be deployed on the cloud using Amazon's S3, RDS, Lambda, and EBS services together with HTCondor and Pegasus to manage the overall workflow. We demonstrate the scaling of this system (and associated processing costs) to the size of nightly data volumes expected from the LSST.

Poster Slides
Christina Peterson University of Central Florida SI2-SSE: TLDS: Transactional Lock-Free Data Structures Award #: 1740095 Abstract

Traditionally, non-blocking data structures provide linearizable operations, but these operations are not composable. Transactional data structures can perform a sequence of operations that appears to execute atomically, which facilitates modular design and software reuse. TLDS encompasses: (1) a scalable methodology for transforming non-blocking data structures into transactional containers; (2) a library of transactional data structures; and (3) a tool to validate their correctness.

Poster Slides
Michael Sokoloff University of Cincinnati Collaborative Research: SI2:SSE: Extending the Physics Reach of LHCb in Run 3 Using Machine Learning in the Real-Time Data Ingestion and Reduction System Award #: 1740102 Abstract

This poster describes a hybrid machine learning algorithm for finding primary vertices in proton-proton collisions produced in the LHCb detector at CERN in Run 3. A proof-of-principle has been demonstrated using a kernel density estimator that transforms sparse 3D data into a rich 1D data set that is processed by a convolutional neural network. The algorithm learns target histograms that serve as proxies for the primary vertex positions. Basic concepts are illustrated. Results to date are summarized. Plans for future work are presented.
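
The idea of turning sparse 3D points into a dense 1D histogram can be illustrated with a toy example. The sketch below is not the LHCb algorithm: it assumes a plain Gaussian kernel along the beam (z) axis, ignores track uncertainties, and stops before any neural network. The function name `kde_histogram`, the bandwidth, and the sample points are all hypothetical.

```python
import math


def kde_histogram(points, z_min, z_max, n_bins, bandwidth):
    """Project 3D points onto z and evaluate a Gaussian kernel density at each bin center."""
    width = (z_max - z_min) / n_bins
    centers = [z_min + (i + 0.5) * width for i in range(n_bins)]
    norm = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi))
    hist = []
    for c in centers:
        # Sum the kernel contribution of every point; x and y are discarded here.
        density = sum(
            norm * math.exp(-0.5 * ((c - z) / bandwidth) ** 2)
            for (_x, _y, z) in points
        )
        hist.append(density)
    return centers, hist


# Toy data (made up): two small clusters of points along z.
points = [
    (0.0, 0.1, -5.3), (0.1, 0.0, -5.2), (0.0, 0.0, -5.2),
    (0.1, 0.1, 12.0), (0.0, -0.1, 12.1),
]
centers, hist = kde_histogram(points, -20.0, 20.0, 80, 0.3)

# The bin with the largest density is a crude proxy for a vertex position.
peak_z = centers[hist.index(max(hist))]
```

In a pipeline like the one described, a histogram of this kind would be the input to the convolutional network rather than the final answer.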

Poster Slides
Hyowon Park University of Illinois at Chicago SI2-SSE: Collaborative Research: Software Framework for Strongly Correlated Materials: from DFT to DMFT Award #: 1740112 Abstract

Dynamical Mean Field Theory (DMFT) has been successful in computing electronic structure of strongly correlated materials specially when it is combined with density functional theory (DFT). Here, we present an open-source computational package combining DMFT with various DFT codes interfaced to a Wannier90 package for adopting maximally localized Wannier functions as local orbitals to describe a correlated subspace. Our package provides the library mode for computing a DMFT density matrix such that it can be efficiently linked to various DFT codes and achieve the charge-self-consistency within DFT+DMFT loops. We used our package for the study of well-known correlated materials, namely LaNiO3, SrVO3, and NiO to compute the density of states, the band structure, the total energy, the atomic force, and the Fermi surface within DFT+DMFT.

Poster Slides
Brian Broll Vanderbilt University SI2-SSE: Deep Forge: a Machine Learning Gateway for Scientific Workflow Design Award #: 1740151 Abstract

DeepForge is a gateway to deep learning for the scientific community. It provides an easy-to-use, yet powerful visual interface to facilitate the rapid development of deep learning models. This includes a carefully designed hybrid textual-visual programming interface to support novices as well as experts. Utilizing an extensible cloud-based infrastructure, DeepForge is designed to integrate with external compute and storage APIs to enable reuse of existing HPC resources including the SciServer from Johns Hopkins. The driving design principles are promoting reproducibility, ease of access, and enabling remote execution of machine learning pipelines. The tool currently supports TensorFlow/Keras, but its extensible architecture enables integrating additional platforms easily.

Poster Slides
Yosuke Kanai University of North Carolina at Chapel Hill Collaborative Research: NSCI: SI2-SSE: Time Stepping and Exchange-Correlation Modules for Massively Parallel Real-Time Time-Dependent DFT Award #: 1740204 Abstract

Our goal is to build, test, and broadly disseminate new software modules for the real-time time-dependent density functional theory (RT-TDDFT) component in the massively parallel open-source Qb@ll code, to mitigate the two most pressing limitations: (i) large computational cost due to the small time steps needed to control the numerical error of real-time integration, and (ii) limited accuracy of the electronic structure computed by commonly used exchange-correlation approximations. We will address these by developing:
(1) New modules for improved numerical integration of the underlying non-linear partial differential equations via strong stability-preserving Runge-Kutta methods.
(2) New modules that compute the electronic structure through a modern implementation of advanced exchange-correlation functionals.
Our second objective is to disseminate these developments by building an academic-research HPC community around the Qb@ll code:
(3) Development of convenient interfaces between Qb@ll and other mainstream electronic-structure codes for data conversion.
(4) Engagement of early-career scientists (PhDs and post-docs) by incorporating hands-on training on RT-TDDFT using the Qb@ll code during TDDFT workshops.
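
As a rough illustration of item (1), the sketch below shows one well-known strong stability-preserving scheme, the three-stage third-order Runge-Kutta method in Shu-Osher form. The abstract does not say which SSP scheme the Qb@ll modules use, so this choice is an assumption, and the scalar decay equation stands in for the TDDFT equations purely for demonstration. The function names are hypothetical.

```python
import math


def ssp_rk3_step(f, u, dt):
    """Advance u by one step of the third-order SSP Runge-Kutta scheme (Shu-Osher form)."""
    u1 = u + dt * f(u)                                 # forward Euler stage
    u2 = 0.75 * u + 0.25 * (u1 + dt * f(u1))           # convex combination with a second Euler stage
    return u / 3.0 + (2.0 / 3.0) * (u2 + dt * f(u2))   # final convex combination


def integrate(f, u0, dt, n_steps):
    """Apply ssp_rk3_step repeatedly, starting from u0."""
    u = u0
    for _ in range(n_steps):
        u = ssp_rk3_step(f, u, dt)
    return u


def decay(u):
    """Right-hand side of the test equation du/dt = -u."""
    return -u


# Integrate from t = 0 to t = 1 and compare with the exact solution exp(-1).
approx = integrate(decay, 1.0, 0.01, 100)
error = abs(approx - math.exp(-1.0))
```

Each stage is a convex combination of forward Euler steps, which is what lets the scheme inherit the stability properties of forward Euler under a suitable time-step restriction.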

Poster Slides
Brian Demsky University of California, Irvine SI2-SSE: C11Tester: Scaling Testing of C/C++11 Atomics to Real-World Systems Award #: 1740210 Abstract

We have long relied on increased raw computing power to drive technological progress. However, processors are now reaching their limits in terms of raw computing power, and continuing progress will require increased productivity in developing parallel software. Fully leveraging the performance of multi-core processors will in many cases require developers to make use of low-level "atomic" (or indivisible) operations such as those provided by the C11 and C++11 languages, so that they can make very fine-grained optimizations to their code and take full advantage of the computing power these processors offer. Writing code using C/C++ atomics is extremely difficult to do correctly, and it is very easy to introduce subtle bugs in the use of these constructs. Testing for concurrency bugs in code that uses C/C++11 atomics can be extremely difficult, as a bug can depend on the schedule, the state of the processor's memory subsystem, the specific processor, and the compiler. The C11Tester project will develop tools for testing concurrent code that makes use of C/C++11 atomics and make these tools available to both researchers and practitioners.

Poster Slides
Alex Pak University of Chicago Highly Efficient and Scalable Software for Coarse-Grained Molecular Dynamics Award #: 1740211 Abstract

The Voth Group has pioneered the rigorous design and application of systematic molecular coarse-graining (CG) to study biomolecular, condensed phase, and novel materials systems. For example, we have used simulations to study protein-protein self-assembly, membrane-protein interactions, biomolecular and liquid state charge transport, complex fluids, nanoparticle self-assembly, and charge-mediated energy storage. We are currently developing a software infrastructure to make the processes underlying systematic CG modeling accessible to other researchers and the public. These models are characterized by novel and unique challenges in their parameterization and simulation. By integrating our methods into standard simulation packages and workflow environments, and by creating a portal and data depository for accurate models, we aim to make these scientific tools more widely used.

Poster Slides
Stanimire Tomov University of Tennessee, Knoxville SI2:SSE: MAtrix, TEnsor, and Deep-Learning Optimized Routines (MATEDOR) Award #: 1740250 Abstract

The MAtrix, TEnsor, and Deep-learning Optimized Routines (MATEDOR) project seeks to develop software technologies and standard APIs, along with a sustainable and portable library for large-scale computations, the individual parts of which are very small matrix or tensor computations. The main target is the acceleration of science and engineering applications that fit this profile, including deep learning, data mining, astrophysics, image and signal processing, hydrodynamics, and more. Working closely with affected application communities, we have defined modular, language agnostic APIs for batched computations. We incorporated the MATEDOR developments in a high-performance numerical library for batched linear algebra computations, autotuned for modern processor architectures and system designs. MATEDOR includes LAPACK routine equivalents for small dense problems, tensors, and application-specific operations, e.g., for deep learning. Routines are constructed as much as possible out of calls to batched BLAS routines and their look-alikes required in sparse computations. The software is released through the open source MAGMA library.
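
The "batched" pattern mentioned above, one call that performs many independent small matrix operations, can be sketched in a few lines. This is only an illustration of the calling pattern in pure Python. It is not the MAGMA or batched BLAS API, and the names `matmul` and `batched_matmul` are hypothetical. A tuned library would typically process the whole batch in a single optimized kernel instead of a Python loop.

```python
def matmul(a, b):
    """Multiply two small dense matrices given as lists of rows."""
    inner = len(b)
    cols = len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def batched_matmul(batch_a, batch_b):
    """Multiply matrices pairwise across two equal-length batches."""
    if len(batch_a) != len(batch_b):
        raise ValueError("batches must have the same length")
    return [matmul(a, b) for a, b in zip(batch_a, batch_b)]


# Two independent 2x2 problems handled by one call.
batch_a = [[[1, 2], [3, 4]], [[0, 1], [1, 0]]]
batch_b = [[[1, 0], [0, 1]], [[5, 6], [7, 8]]]
result = batched_matmul(batch_a, batch_b)
```

The point of the interface is that the caller hands over the whole batch at once, which gives the library freedom to schedule the small problems together.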

Poster Slides
Ryan May University Corporation for Atmospheric Research SI2-SSE: MetPy - A Python GEMPAK Replacement for Meteorological Data Analysis Award #: 1740315 Abstract

GEMPAK is a legacy scripted weather analysis package used extensively in education and research within the meteorology community. The goal of MetPy is to provide a modern framework that replicates this scripted analysis functionality while leveraging the extensive, community-driven scientific Python ecosystem. To serve as a viable GEMPAK replacement, MetPy has grown many features, including support for cross-sections and a variety of added calculations. MetPy's low-level infrastructure has also standardized on xarray as its data model and leverages unit support to ensure correctness of calculations. MetPy has also developed a simplified plotting syntax that mimics the syntax of GEMPAK. This work discusses the details of these additions, including challenges encountered, as well as future plans for development as we wrap up the final year of this project.

Poster Slides
Serban Porumbescu University of California Davis Gunrock: High-Performance GPU Graph Analytics. Award #: 1740333 Abstract

Our goal with this award was to develop the "Gunrock" programmable, high-performance graph analytics library for programmable graphics processors (GPUs) from a working prototype to a robust, sustainable, open-source component of the GPU computing ecosystem. Our open-source initiatives are strong, and we noticed significant spikes in traffic (over 1400 clones) in the two weeks following our 1.0 release alone. DARPA has adopted Gunrock as the benchmark that its next-generation parallel processor must beat. MIT's GraphIt domain-specific language is generating Gunrock code. NVIDIA is in the process of incorporating Gunrock into RAPIDS, their open GPU data science initiative. We believe this work is a real success story that is a direct result of this NSF award.

Poster Slides
Shaowen Wang University of Illinois at Urbana-Champaign SI2-S2I2 Conceptualization: Geospatial Software Institute Award #: 1743184 Abstract

Many scientific and societal grand challenges (e.g., emergency management, environmental sustainability, population growth, and rapid urbanization) are inherently geospatial as articulated in a number of visionary reports such as the NSF's Ten Big Ideas and the United Nations Sustainable Development Goals. A variety of fields (e.g., environmental engineering and sciences, geosciences, and social sciences) are increasingly dependent on geospatial software to tackle these challenges. Critical and urgent efforts are also needed to prepare the next-generation workforce for computation- and/or data-intensive geospatial-related research and education, technological innovation, and real-world problem solving and decision making. In response, we have engaged diverse communities that develop and use geospatial concepts and software for conceptualizing a national Geospatial Software Institute (GSI). The mission of the GSI should be to transform geospatial software, cyberinfrastructure (CI), and data science across many fields to revolutionize diverse discovery and innovation by enhancing computational transparency and reproducibility. Its vision is a sustainable social and technical ecosystem to enable geospatial-inspired innovation and discovery. Overall, GSI is well-positioned to revolutionize many science domains while nurturing a high-performance, open, and sustainable geospatial software ecosystem across academia, government, and industry.

Poster Slides
Karthik Ram University of California, Berkeley SI2-S2I2 Conceptualization: Conceptualizing a US Research Software Sustainability Institute (URSSI) Award #: 1743188 Abstract

Many scientific advances have been made possible by the use of software. This software, also known as "research software", has become essential to progress in science and engineering. The scientists who develop the software are experts in their discipline, but do not have sufficient understanding of the practices that make software development easier, and the software more robust, reliable, maintainable and sustainable. This is an unfortunate state of affairs, as 90-95% of researchers in the UK and the US report relying on research software for their work, and 63-70% of these researchers also believe that their work would not be possible if such software were to become unavailable.
Through a grant funded by the US National Science Foundation, we have been engaged in a series of activities to understand specific challenges that make research software unsustainable and why researchers who develop software face uncertain career paths. In this talk I'd like to discuss some solutions based on surveys, ethnographic studies, and workshops that we carried out over an 18-month period in 2018-2019.

Poster Slides
Ivan Rodero Rutgers University CIF21 DIBBs: EI: Virtual Data Collaboratory: A Regional Cyberinfrastructure for Collaborative Data Intensive Science Award #: 1640834 Abstract

The Virtual Data Collaboratory (VDC) is a federated data cyberinfrastructure that is designed to drive data-intensive, interdisciplinary and collaborative research, and enable data-driven science and engineering discoveries. VDC accomplishes this by providing seamless access to data and tools to researchers, educators, and entrepreneurs across a broad range of disciplines and scientific domains as well as institutional and geographic boundaries. In addition to enabling researchers to advance research frontiers across multiple disciplines, VDC also focuses on (1) training the next generation of scientists with deep disciplinary expertise and a high degree of competence in leveraging data, cyberinfrastructure, and tools to address research problems and (2) helping data scientists and engineers develop and apply advanced federated data management and analysis tools for high impact scientific applications.

Poster Slides
Daniel G Aliaga Purdue University Elements: Data: U-Cube: A Cyberinfrastructure for Unified and Ubiquitous Urban Canopy Parameterization Award #: 1835739 Abstract

As countries around the world rapidly urbanize and continue investing in infrastructure, the vulnerability to extreme weather continues to grow. Due to their large infrastructure and population base, cities are disproportionately affected by weather extremes, as was witnessed during recent storms. Because cities are complex entities, current computational models have a bottleneck in providing a robust means of generating the parameter statistics that define a city's morphology. The U-Cube project will utilize a novel inverse modeling approach that addresses this bottleneck by using satellite images and population, elevation, road, and typology data about the various zones in a city to infer a 3D model of the city. From this digital city, a set of urban canopy parameters (UCP) will be distilled for use in simulation models to predict meteorology and, in the long run, the air quality, health and behavior of a city.

Poster Slides
Roland Haas University of Illinois SI2-SSI: Collaborative Research: Einstein Toolkit Community Integration and Data Exploration Award #: 1550514 Abstract

The Einstein Toolkit is a community-driven software platform of core computational tools to advance and support research in relativistic astrophysics and gravitational physics. We are developing and supporting open software for relativistic astrophysics. Our aim is to provide the core computational tools that can enable new science, broaden our community, facilitate interdisciplinary research and take advantage of emerging petascale computers and advanced cyberinfrastructure. I will report on the growth and activity in the Einstein Toolkit User community and scientific results obtained using the Toolkit software.

Poster Slides
Xiaozhu Meng Rice University SI2-SSI: Collaborative Research: A Sustainable Infrastructure for Performance, Security, and Correctness Tools Award #: 1450273 Abstract

Software has become indispensable to society. However, the properties of software systems cannot be understood without accounting for code transformations applied by optimizing compilers used to compose algorithm and data structure templates, and libraries available only in binary form. To address this need, we have been enhancing the Dyninst binary analysis and instrumentation toolkit to provide a foundation for performance, correctness, and security tools. We accelerate Dyninst to analyze large binaries using multiple threads to parse machine code and ingest symbol tables. Using Dyninst, we are building data race detection tools for OpenMP programs. In HPCToolkit performance tools, we use Dyninst to help map performance measurements back to source code, and to analyze execution traces to pinpoint, quantify and diagnose performance bottlenecks in parallel programs.

Poster Slides
Wenchang Lu North Carolina State University NSCI SI2-SSE: Multiscale Software for Quantum Simulations of Nanostructured Materials and Devices Award #: 1740309 Abstract

The development of robust, adaptive software and algorithms that can fully exploit exascale capabilities and future computing architectures is critical to designing advanced materials and devices with targeted properties. We have developed an open-source code that discretizes the DFT equations on real-space grids that are distributed over the nodes of a massively parallel system via domain decomposition. Multigrid techniques are used to dramatically accelerate convergence while only requiring nearest neighbor communications. The real-space multigrid (RMG) code achieves full plane wave accuracy and scales from desktops and clusters to supercomputers consisting of ~200k cores and 20k GPUs, including the Cray XE-XK systems and the new IBM-NVIDIA pre-exascale Summit. Multilevel parallelization with MPI, threads and/or CUDA/HIP programming enables adaptation to future exascale supercomputers. RMG is distributed as open source, with over 3,800 downloads to date. Advanced functionalities are provided through interfaces to other codes, including QMCPACK, BerkeleyGW, Phonopy, and ALAMODE.

Poster Slides
Hari Subramoni Ohio State University SI2-SSI: FAMII: High Performance and Scalable Fabric Analysis, Monitoring and Introspection Infrastructure for HPC and Big Data Award #: 1664137 Abstract

As heterogeneous computing (CPUs, GPUs, etc.) and networking (NVLinks, X-Bus, etc.) hardware continue to advance, it becomes increasingly essential and challenging to understand the interactions between High-Performance Computing (HPC) and Deep Learning applications/frameworks, the communication middleware they rely on, the underlying communication fabric these high-performance middlewares depend on, and the schedulers that manage HPC clusters. Such understanding will enable application developers/users, system administrators, and middleware developers to maximize the efficiency and performance of the individual components that comprise a modern HPC system and solve different grand challenge problems. Moreover, determining the root cause of performance degradation is complex for the domain scientist. The scale of emerging HPC clusters further exacerbates the problem. These issues lead to the following broad challenge: How can we design a tool that enables in-depth understanding of the communication traffic on the interconnect and GPU through tight integration with the MPI runtime at scale?

Poster Slides
Matthew Turk University of Illinois at Urbana-Champaign Collaborative Research: SI2-SSI: Inquiry-Focused Volumetric Data Analysis Across Scientific Domains: Sustaining and Expanding the yt Community Award #: 1663914 Abstract

We present recent progress on our project to develop a cross-domain analysis platform.

Clare McCabe Vanderbilt University Collaborative Research: NSCI Framework: Software for Building a Community-Based Molecular Modeling Capability Around the Molecular Simulation Design Framework (MoSDeF) Award #: 1835874 Abstract

Molecular simulation plays an important role in many sub-fields of science and engineering. Systems composed of soft matter are ubiquitous in science and engineering, but molecular simulations of such systems pose particular computational challenges since the differences in potential energy between distant configurations are on the same order as the thermal motion, requiring time and/or ensemble-averaged data to be collected over long simulation trajectories for property evaluation. Furthermore, performing a molecular simulation of a soft matter system involves multiple steps, which have traditionally been performed by researchers in a "bespoke" fashion. The result is that many soft matter simulations are not reproducible based on the information provided in a publication, and large-scale screening of soft materials systems is a formidable challenge. To address the issues of reproducibility and computational screening capability, we have been developing the Molecular Simulation and Design Framework (MoSDeF) software suite. We also propose a set of principles to render molecular simulations Transparent, Reproducible, Usable by others, and Extensible (TRUE). While it is not required to use MoSDeF to create TRUE simulations, MoSDeF facilitates the publication and dissemination of TRUE simulations by automating many of the critical steps in performing molecular simulations, thus enhancing their reproducibility.

Poster Slides
Douglas Thain University of Notre Dame DataSwarm: A User-Level Framework for Data Intensive Scientific Applications Award #: 1931348 Abstract

The DataSwarm framework will support the construction of large, data intensive scientific applications that must run on top of national cyberinfrastructure, such as large campus clusters, NSF extreme-scale computing facilities, the Open Science Grid, and commercial clouds. Building on a prior SI2 project, DataSwarm will bring several new techniques (molecular task composition, in-situ data management, and precision provenance) to lightweight task-execution environments.

Poster Slides
Joe Stubbs University of Texas, Austin Collaborative Proposal: Frameworks: Project Tapis: Next Generation Software for Distributed Research Award #: 1931439 Abstract

Tapis is a web-based API framework for securely managing computational workloads across institutions, so that experts can focus on their research instead of the technology needed to accomplish it. In addition to job execution and data management, Tapis is providing capabilities to enable distributed workflows, including a multi-site Security Kernel, Streaming Data APIs, and first-class support for containerized applications.

Poster Slides
Frank Timmes Arizona State University Collaborative Research: SI2-SSI: Modules for Experiments in Stellar Astrophysics Award #: 1663684 Abstract

Modules for Experiments in Stellar Astrophysics (MESA)

Poster Slides
Greg Newman, Stacy Lynn Colorado State University SI2-SSI: Advancing and Mobilizing Citizen Science Data through an Integrated Sustainable Cyber-Infrastructure Award #: 1550463 Abstract

Citizen science engages members of the public in science. It advances the progress of science by involving more people and embracing new ideas. Recent projects use software and apps to do science more efficiently. However, existing citizen science software and databases are ad hoc, non-interoperable, non-standardized, and isolated, resulting in data and software siloes that hamper scientific advancement. This project will develop new software and integrate existing software, apps, and data for citizen science - allowing expanded discovery, appraisal, exploration, visualization, analysis, and reuse of software and data. Over the three phases, the software of two platforms, and CyberTracker, will be integrated and new software will be built to integrate and share additional software and data. The project will: (1) broaden the inclusivity, accessibility, and reach of citizen science; (2) elevate the value and rigor of citizen science data; (3) improve interoperability, usability, scalability and sustainability of citizen science software and data; and (4) mobilize data to allow cross-disciplinary research and meta-analyses.

Poster Slides
Rion Dooley Chapman University The Agave Platform: An Open Science-As-A-Service Cloud Platform for Reproducible Science Award #: 1450459 Abstract

In today's data-driven research environment, the ability to easily and reliably access compute, storage, and derived data sources is as much a necessity as the algorithms used to make the actual discoveries. The earth is not shrinking, it is digitizing, and the ability of US researchers to stay competitive in the global research community will increasingly be determined by their ability to reduce the time from theory to discovery. The Agave Platform addresses this need by providing a Science-as-a-Service cloud platform that allows researchers to run code, manage data, collaborate meaningfully, and integrate virtually anything. In doing so, it eases the process of conducting reproducible science in today's distributed, collaborative digital labs. Agave is available as a public, cloud-hosted PaaS, as well as a self-hosted service for internal use. CLI, SDK, and web applications are available from the website.

Poster Slides
Joe Breen University of Utah CIF21 DIBBs: EI: SLATE and the Mobility of Capability Award #: 1724821 Abstract

Much of science today is propelled by multi-institutional research collaborations that require computing environments that connect instrumentation, data, and computational resources. These resources are distributed among university research computing centers, national-scale high performance computing facilities, and commercial cloud service providers. The scale of the data and complexity of the science drive this diversity, and the need to aggregate resources from many sources into scalable computing systems. Services Layer At The Edge (SLATE) provides technology that simplifies connecting university and laboratory data center capabilities to the national cyberinfrastructure ecosystem and thus expands the reach of domain-specific science gateways and multi-site research platforms.

Poster Slides
Andrew Lumsdaine University of Washington CSSI Element: GraphPack: Unified Graph Processing with Parallel Boost Graph Library, GraphBLAS and High-Level Generic Algorithm Award #: 1716828 Abstract

Poster Slides


As of February 14, 2020

Organizing Committee

Haiying Shen (Chair)

University of Virginia

Carol Song

Purdue University

Natalia Villanueva Rosales

University of Texas at El Paso

Ritu Arora

University of Texas, Austin

Sandra Gesing

University of Notre Dame

Upulee Kanewala

Montana State University

Contact the organizers via email at CSSI-PI-Meeting2020 at googlegroups dot com.

Code of Conduct

The 2020 NSF CSSI PI Meeting is an interactive environment for listening and considering new ideas from a diverse group, with respect for all participants without regard to gender, gender identity or expression, race, color, national or ethnic origin, religion or religious belief, age, marital status, sexual orientation, disabilities, veteran status, or any other aspect of how we identify ourselves. It is the policy of the NSF CSSI PI Meeting that all participants will enjoy an environment free from all forms of discrimination, harassment, and retaliation.

Definition of Sexual Harassment:
Sexual harassment refers to unwelcome sexual advances, requests for sexual favors, and other verbal or physical conduct of a sexual nature. Behavior and language that are welcome/acceptable to one person may be unwelcome/offensive to another. Consequently, individuals must use discretion to ensure that their words and actions communicate respect for others. This is especially important for those in positions of authority since individuals with lower rank or status may be reluctant to express their objections or discomfort regarding unwelcome behavior.

Sexual harassment does not refer to occasional compliments of a socially acceptable nature. It refers to behavior that is not welcome, is personally offensive, debilitates morale, and therefore interferes with work effectiveness. The following are examples of behavior that, when unwelcome, may constitute sexual harassment: sexual flirtations, advances, or propositions; verbal comments or physical actions of a sexual nature; sexually degrading words used to describe an individual; a display of sexually suggestive objects or pictures; sexually explicit jokes; unnecessary touching.
Definition of Other Harassment:
Harassment on the basis of any other protected characteristic is also strictly prohibited. This conduct includes, but is not limited to, the following: epithets, slurs, or negative stereotyping; threatening, intimidating, or hostile acts; denigrating jokes; and display or circulation of written or graphic material that denigrates or shows hostility or aversion toward an individual or group.
Definition of Discrimination:
Discrimination refers to bias or prejudice resulting in denial of opportunity, or unfair treatment regarding selection, promotion, or transfer. Discrimination is commonly practiced on the grounds of age, disability, ethnicity, origin, political belief, race, religion, sex, and other factors that are irrelevant to a person's competence or suitability.
Definition of Retaliation:
Retaliation refers to taking action to negatively impact another person because they reported an act of discrimination or harassment.
Reporting an Incident:
Violations of this code of conduct policy should be reported immediately to the Organizing Committee Members (email: CSSI-PI-Meeting2020 at googlegroups dot com). All complaints will be treated seriously and investigated promptly. Confidentiality will be honored to the extent permitted as long as the rights of others are not compromised. Sanctions may range from a verbal warning, to ejection from the 2020 NSF CSSI PI Meeting, to the notification of appropriate authorities. Retaliation for complaints of inappropriate conduct will not be tolerated.