This NSF PI meeting aims to further build the community around the NSF CSSI program and its precursor programs (DIBBs, SI2) toward a national cyberinfrastructure ecosystem. The meeting provides a forum for PIs to share technical information about their projects with each other, NSF program directors, and others; to explore innovative topics emerging in the software and data infrastructure communities; to discuss and learn about best practices across projects; and to stimulate new ideas for achieving software and data sustainability and to foster new collaborations.
At least one representative (PI/Co-PI/senior personnel) from each active CSSI, SI2, and DIBBs project is required by NSF to attend the meeting and give a one-minute lightning talk and a poster presentation on their project. For a collaborative project with multiple awards, one representative from the entire project is required. Anyone who is interested in attending the meeting is welcome and encouraged to do so. There is no registration fee, but participants need to register by the deadline to allow for planning of logistics such as space and food.
The meeting venue is in close proximity to, and the meeting runs concurrently with, the SIAM Conference on Parallel Processing for Scientific Computing (PP20). Attendees of PP20 are encouraged to join and interact with CSSI investigators and to visit the poster sessions at the CSSI PI meeting.
Report | The final workshop report is available here. |
---|---|
Your Feedback | Please provide your feedback on the meeting by answering a few questions on this form: https://shorturl.at/bJT16. The organizing committee will pass on your feedback to future CSSI PI meeting organizers. We will also include your aggregated feedback in the meeting report to NSF. |
Future CSSI Program | The organizing committee would like to hear your thoughts on the future directions of the CSSI programs. Please enter your suggestions and comments into this Google doc by Friday, February 28, 2020. We will include a summary of your suggestions in the meeting report to NSF. |
Mailing List |
We now have a mailing list, nsf-cssi-pi@googlegroups.com, for future communications (e.g., announcements and requests). This list, renamed from the previous SI2-PI mailing list, already includes many of the 2020 PI meeting participants. If you are not on the list already, you may opt in by going to https://groups.google.com/d/forum/nsf-cssi-pi and clicking on "Subscribe to this group". For how to join a Google Group without using a Gmail address, please refer to this Google help page. |
Remote participation | Remote participation will be provided via Zoom. Only talks will be available via Zoom. Join Zoom meeting: https://notredame.zoom.us/j/2435257192 |
---|---|
Lightning Talks | The presentation schedule is available at the Lightning Talks page. Please note your session and order number. Slides must be submitted by Sunday, Feb. 9. We will not be able to incorporate late submissions into the slide deck. |
Posters | Each poster session will follow a lightning talk round for that group of projects. Bring your poster to the poster hall before the poster session starts. Detailed information regarding posters is available at the Posters page. |
Registration | Registration closed. Wait-list only. |
---|---|
Hotel group rate cutoff | January 22, 2020 |
Poster (pdf) upload | February 4, 2020 (extended) |
Lightning talk slide (pdf) upload | February 4, 2020 (extended) |
Meeting dates | February 13-14, 2020 |
Registration
Due to an enthusiastic response, the official registration for the 2020 NSF CSSI PI Meeting is now closed, as we have reached the maximum capacity planned for this event. You may request to be placed on a waiting list; wait-listed requests will be accepted on a first-come, first-served basis if additional openings become available. Thank you for your understanding.
To enroll on the waiting list, please email us at cssi-pi-meeting2020@googlegroups.com.
NSF project personnel attending the meeting need to register by the deadline. The designated presenter for each project needs to include the NSF award number, the project title, and an abstract of no more than 150 words for the poster on the registration form.
Since the registration form is currently turned off, you cannot edit your registration information.
To share your lightning talk slide and poster, please fill out this form.
Venue
The meeting venue will be Residence Inn by Marriott Seattle Downtown/Convention Center, 1815 Terry Avenue, Seattle, Washington 98101 (map direction).
Book your stay with group discount rate before January 22, 2020!
Locations of the CSSI PI Meeting and SIAM PP20:
Agenda
Time | Event | Speaker | Title |
---|---|---|---|
8:00 AM to 8:30 AM | Registration | ||
8:30 AM | Welcome & Announcements | ||
8:45 AM | Opening remarks | Vipin Chaudhary NSF/OAC | NSF Presentation Slides |
9:00 AM | Lightning Talk Session #1 | ||
10:00 AM | Coffee Break | ||
10:20 AM | Poster Session #1 | ||
11:45 AM | Lunch ||
1:00 PM | Invited talk | Marianna Safronova University of Delaware | Community Portal for High-Precision Atomic Physics Data and Computation Abstract Slides Poster The goal of this project is to provide the scientific community with easily accessible, high-quality atomic data and a user-friendly, broadly applicable modern relativistic code to treat electronic correlations. The code will be capable of calculating a very broad range of atomic properties to answer the significant needs of the atomic, plasma, and astrophysics communities. We also propose the creation of an online portal for high-precision atomic physics data and computation that will provide a variety of services to address the needs of the widest possible community of users. The portal will contribute a novel element to today's U.S. cyberinfrastructure ecosystem, improving usability and access for the atomic physics community and their fields of application. |
1:20 PM | Invited talk | Fred Hansen Nexight Group | Charting a Path Forward: Insights from a CSSI PI Survey Abstract Slides In developing investment priorities, the CSSI program seeks to engage the capabilities, curiosity, and creativity of PIs in the CI research community. Ongoing feedback from and dialogue with PIs from the CI research community is therefore critical. Surveys are an efficient and effective mechanism for staying connected and collecting input. This session describes the methodology and results of a survey of principal investigators (PIs), co-PIs, and others in the CI research community. The survey, which was carried out by the Nexight Group under the Office of Advanced Cyberinfrastructure (OAC) Award (#1930025), had two primary purposes. First, the survey was designed to inform decisions about changes to be made to the National Science Foundation (NSF) Cyberinfrastructure for Sustained Scientific Innovation (CSSI) solicitation. Second, the survey was designed to inform decisions about the future direction and focus of the NSF CSSI umbrella program. The survey results provide insights that enhance CSSI's support of a cyberinfrastructure for scientific research. |
1:40 PM | Lightning Talk Session #2 | ||
2:40 PM | Coffee Break | ||
3:00 PM | Poster Session #2 | ||
4:00 PM | NSF Presentation | Stefan Robila NSF/OAC | Future Steps of CSSI Slides |
4:15 PM | Panel Discussion | Moderator: Haiying Shen; Panelists: Geoffrey Charles Fox, Juliana Freire, Philip Harris, Andreas Mueller | ML for CI and CI for ML |
5:00 PM to 8:00 PM | Reception |
Time | Event | Speaker | Title |
---|---|---|---|
8:00 AM to 8:30 AM | Registration | ||
8:30 AM | Recap & Day 2 Agenda | ||
8:45 AM | Lightning Talk Session #3 | ||
9:45 AM | Coffee Break | ||
10:00 AM | Poster Session #3 | ||
11:00 AM | Open-Mic session | Moderator: Ritu Arora | |
12:00 PM | Lunch | ||
1:00 PM | Invited talk | Gordon Watts University of Washington | The Institute for Research and Innovation in Software for High Energy Physics (IRIS-HEP) Abstract Slides Poster The Institute for Research and Innovation in Software for High Energy Physics (IRIS-HEP) is a Software Institute funded by the National Science Foundation. It aims to develop the state-of-the-art software cyberinfrastructure required for the challenges of data-intensive scientific research at the High Luminosity Large Hadron Collider (HL-LHC) at CERN and other planned HEP experiments of the 2020s. These facilities are discovery machines that aim to understand the fundamental building blocks of nature and their interactions. In this talk I will discuss a bit of this history and some highlights from our first 15 months of operation (OAC-1836650). |
1:20 PM | Invited talk | Madhav Marathe University of Virginia | Networks, Simulation Science and Advanced Computing Abstract Slides Reasoning about real-world social habitats, often represented as multiplexed co-evolving networks, is complicated and scientifically challenging due to their size, their co-evolutionary nature, and the need to represent multiple dynamical processes simultaneously. The 2014 Ebola epidemic, the 2009 financial crisis, global migration, the societal impacts of natural and human-initiated disasters, and the effects of climate change provide examples of the many challenges faced when developing such environments. Advances in computing have fundamentally altered how such networks can be synthesized, analyzed, and reasoned about. We will briefly describe our work on co-evolving socio-technical networks by drawing on our work in urban transport planning, national security, and public health epidemiology to guide the discussion. We will conclude by describing an exciting new project funded by the NSF that aims to develop CINES: a scalable cyberinfrastructure for network science. CINES builds on a prior NSF project, CINET, and will serve as a community resource for network science. CINES provides an extensible platform for producers and consumers of network science data, information, and software. |
1:40 PM | Lightning Talk Session #4 | ||
2:30 PM | Poster Session #4 | ||
3:30 PM | Coffee Break | ||
4:00 PM | Invited Talk | Michela Taufer University of Tennessee Knoxville | Cyberinfrastructure Tools for Precision Agriculture in the 21st Century Abstract Slides Soil moisture is a critical variable that links climate dynamics with water and food security. It regulates land-atmosphere interactions (e.g., via evapotranspiration---the loss of water from evaporation and plant transpiration to the atmosphere), and it is directly linked with plant productivity and plant survival. Currently, soil moisture data over large areas come from remote sensing (i.e., satellites with radar sensors), which provides nearly global coverage of soil moisture at a spatial resolution of tens of kilometers. Satellite soil moisture data has two main shortcomings. First, although satellites can provide daily global information, they are limited to coarse spatial resolution (at the multi-kilometer scale). Second, satellites are unable to measure soil moisture in areas of dense vegetation, snow cover, or extremely dry surfaces; this results in gaps in the data. In this talk, we will present how we address these two shortcomings with a modular SOil MOisture Spatial Inference Engine (SOMOSPIE). SOMOSPIE consists of modular components including input of available data at its native spatial resolution, selection of a geographic region of interest, prediction of missing values across the entire region of interest (i.e., gap-filling), analysis of generated predictions, and visualization of both predictions and analyses. To predict soil moisture, our engine leverages hydrologically meaningful terrain parameters (e.g., slope and topographic wetness index) calculated using an open-source platform for standard terrain analysis (i.e., SAGA-GIS, or System for Automated GeoScientific Analysis-Geographical Information System) and a suite of machine learning methods. We will present empirical studies of the engine's functionality, including an assessment of data processing and fine-grained predictions over United States ecological regions with a highly diverse soil moisture profile. |
4:20 PM | Closing Remarks | NSF | |
4:30 PM | Meeting Adjourned |
Poster Presentation
Each active CSSI/DIBBs/SI2 project is expected to present a poster on the project at the PI meeting. You will need to print and bring a physical copy of your poster yourself; we will not be printing any posters. Collaborative projects (including those spanning multiple institutions) should bring only one poster.
The size of your poster should be no bigger than 24 inches (60 cm) wide and 36 inches (90 cm) tall. A Powerpoint poster template is available here: CSSI Poster Template.pptx.
We will use Figshare to share the posters digitally. Please follow these steps to upload your poster by February 4, 2020:
- Create or log into your Figshare account.
- Follow steps from "My Data" -> "Create a new item" to bring up the content upload form.
- Fill in the appropriate metadata (authors, title).
- Set the "Item type" to poster, and at the keyword stage put "NSF-CSSI-2020-Poster" as one of the chosen keywords. (You must hit return/enter after typing each keyword.)
- You may also want to add your NSF award # to the "Funding" section.
- For license, we recommend selecting "CC BY" (which should be the default).
- Please also add a brief abstract describing your project.
- Hit publish! (so your poster will be accessible to others)
After you have uploaded your poster to Figshare, please use this form to fill in the URL pointing to your poster.
View 2020 CSSI PI meeting posters on Figshare
Obtaining the URL to your poster PDF on Figshare:
- Sign in to figshare.com and go to "My data"; your poster PDF should be displayed among the files you have uploaded.
- Click on your poster in the list; your poster and its associated information will be displayed. Copy the URL and update your registration form.
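If you prefer scripting to the web UI, the same metadata can in principle be set through Figshare's REST API. The sketch below is an unofficial illustration of the upload checklist above: the endpoint path and field names (`tags`, `defined_type`, `funding`, the CC BY license id) are assumptions based on the public Figshare v2 API and should be verified against the current API documentation; the token, names, and award number are placeholders.

```python
"""Sketch: creating a poster record via the Figshare v2 REST API instead of
the web UI. Endpoint and field names are assumptions -- verify against the
current Figshare API docs before use."""
import json
import urllib.request

API_BASE = "https://api.figshare.com/v2"

def build_poster_metadata(title, authors, abstract, award_number=None):
    """Assemble the metadata covering steps 3-7 of the checklist above."""
    meta = {
        "title": title,
        "defined_type": "poster",                   # item type: poster
        "tags": ["NSF-CSSI-2020-Poster"],           # the required keyword
        "authors": [{"name": a} for a in authors],  # author list
        "description": abstract,                    # brief project abstract
        "license": 1,                               # assumed id for CC BY -- verify
    }
    if award_number:                                # optional funding note
        meta["funding"] = f"NSF award {award_number}"
    return meta

def create_article(token, meta):
    """POST the metadata to create a private draft item on Figshare."""
    req = urllib.request.Request(
        f"{API_BASE}/account/articles",
        data=json.dumps(meta).encode(),
        headers={"Authorization": f"token {token}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)  # response includes the new item's location

if __name__ == "__main__":
    meta = build_poster_metadata(
        "Example CSSI Project Poster",
        ["Jane Doe", "John Roe"],
        "A brief abstract describing the project.",
        award_number="1234567",
    )
    print(json.dumps(meta, indent=2))
    # create_article("MY_FIGSHARE_TOKEN", meta)  # uncomment with a real token
```

Note that attaching the poster PDF itself is a separate multi-step file-upload exchange in the Figshare API (initiate, upload parts, complete), and publishing the item remains a distinct final step, as in the web UI.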
Lightning Talks
Each project will also give a brief, one-minute lightning talk to introduce their poster. This is an opportunity to drive meeting participants to your poster. To avoid any technical issues and minimize delays between talks, one slide per lightning talk will need to be submitted by February 4, 2020.
Each slide should include the following information:
- Project title
- Names of the investigators and presenter
- NSF award number
- NSF program that funds the project
A Powerpoint template is available here: CSSI Slide Template.pptx.
We will use Figshare to gather your one-slide PDF files. Follow the instructions on the Posters page to upload your slide (PDF) to Figshare, but use the keyword "NSF-CSSI-2020-Talk" as one of the chosen keywords. Your PDF slide will be shown during your one-minute Lightning Talk.
After you have uploaded your slide to Figshare, please use this form to fill in the URL pointing to your slide.
View 2020 CSSI PI meeting lightning talk slides on Figshare
Presentation Schedule
# | Name | Organization | NSF Award | Abstract | Poster | Talk |
---|---|---|---|---|---|---|
Nagarajan Kandasamy | Drexel University | Collaborative Research: SI2-SSE: High-Performance Workflow Primitives for Image Registration and Segmentation Award #: 1642380 |
Abstract
Image registration is an inherently ill-posed problem that lacks a unique mapping between voxels of the two images being registered. As such, we must confine the registration to only physically meaningful transforms by regularizing it via an appropriate penalty term which can be calculated numerically or analytically. The numerical approach, however, is computationally expensive depending on the image size, and therefore analytical methods are preferable. Using cubic B-splines as the basis for registration, we develop a generalized mathematical framework that accommodates five distinct types of regularizers: diffusion, curvature, linear elastic, third-order, and total displacement. We validate our approach by testing the accuracy achieved by each of the regularizers against their numerical counterpart. We also provide benchmarking results showing that the analytic solutions run significantly faster --- up to two orders of magnitude --- than central-differencing based numerical solutions. |
Poster | Slides | |
Cheryl Tiahrt | University of South Dakota | The South Dakota Data Store, a Modular, Affordable Platform to Enable Data-Intensive Research and Education Award #: 1659282 |
Abstract
Expanding opportunities for data-driven research and increasing requirements for data management in sponsored research have resulted in a growing need for retention of both long-term archival data sets that are infrequently accessed, as well as 'active archives' of data that are accessed periodically to revisit, revise, and share experimental results. For this project, the University of South Dakota acquired, deployed, and maintains the South Dakota Data Store (SDDS), a network-accessible, sharable, multi-campus storage resource integrated with existing campus cyberinfrastructure. SDDS supports twelve STEM projects across eight departments at four institutions in South Dakota, including 30 faculty, 43 postdocs, and 303 students. SDDS provides South Dakota researchers with a centralized, efficient, high-performance platform for both archival of and shared access to large quantities of electronic data. |
Poster | Slides | |
Ewa Deelman | University of Southern California | SI2-SSI: Pegasus: Automating Compute and Data Intensive Science Award #: 1664162 |
Abstract
For almost 20 years the Pegasus Workflow Management System has been designed, implemented, and supported to provide abstractions that enable scientists to focus on structuring their computations without worrying about the details of the target cyberinfrastructure. To support these workflow abstractions, Pegasus provides automation capabilities that seamlessly map workflows onto target resources, sparing scientists the overhead of managing the data flow, job scheduling, fault recovery, and adaptation of their applications. Automation enables the delivery of services that consider criteria such as time-to-solution, while also taking into account efficient use of resources, task throughput, and data transfer requests. |
Poster | Slides | |
Klaus Bartschat | Drake University | Elements: NSCI-Software -- A General and Effective B-Spline R-Matrix Package for Charged-Particle and Photon Collisions with Atoms, Ions, and Molecules Award #: 1834740 |
Abstract
The project deals with the further development and subsequent distribution of a suite of computer codes that can accurately describe the interaction of charged particles (mostly electrons) and light (mostly lasers and synchrotrons) with atoms and ions. The results are of significant importance for the understanding of fundamental collision dynamics, and they also fulfill the urgent practical need for accurate atomic data to model the physics of stars, plasmas, lasers, and planetary atmospheres. With the rapid advances currently seen in computational resources, such studies can now be conducted for realistic systems, as opposed to idealized models. Due to the high demand for data generated with the present program, the source code as well as instructional material to make the program usable by other interested researchers will be made publicly available. Versions to run on desktop workstations as well as massively parallel supercomputers will be created. |
Poster | Slides | |
David Sandwell | Univ. of California, San Diego | Elements: Software - Harnessing the InSAR Data Revolution: GMTSAR Award #: 1834807 |
Abstract
GMTSAR is an open source InSAR processing system for generating wide-area mapping of the deformation of the surface of the Earth using repeated synthetic aperture radar (SAR) images collected by spacecraft ( https://topex.ucsd.edu/gmtsar/ ). The major deformation signals of interest are associated with earthquakes, volcanoes, glacier flow, and subsidence due to withdrawal of crustal fluids (e.g., water and hydrocarbons). |
Poster | Slides | |
Richard Evans | University of Texas at Austin | NSCI Elements: Software - PFSTRASE - A Parallel FileSystem TRacing and Analysis SErvice to Enhance Cyberinfrastructure Performance and Reliability Award #: 1835135 |
Abstract
Parallel Filesystems are a critical yet fragile component in modern HPC systems. As a resource that is shared and required by most jobs running on an HPC system, a single job can adversely impact all other jobs. This impact can vary from performance degradation to job failure. PFSTRASE monitors the filesystem at all times and provides the necessary information to immediately identify the contribution of every node, job, and user to the filesystem load. The monitoring agent that collects data generates negligible load on the filesystem servers and clients while the backend that analyzes and presents the data is scalable to the largest filesystems. The infrastructure currently supports Lustre but is designed to be extensible to other filesystems. A filesystem has been deployed to verify and validate the infrastructure. An Ansible-based provisioning system has been developed to enable the rapid deployment and reconfiguration of a Lustre filesystem. |
Poster | Slides | |
Kennie Merz | Michigan State University | CSSI: Efficient GPU Enabled QM/MM Calculations: AMBER Coupled with GPU Enabled QUICK Award #: 1835144 |
Abstract
The goal of our project is to continue to improve our software cyberinfrastructure aimed at solving important molecular-level problems in catalysis, drug design, and energy conversion. Combined quantum mechanical/molecular mechanical (QM/MM) models have enabled significant advances in our understanding of chemical reactivity. The shortcoming of QM/MM models when using ab initio or density functional theory (DFT) methods is the computational expense, which limits QM/MM modeling. The performance of QM methods has been greatly improved over the years through algorithmic and hardware improvements. In our poster we describe the enhancements and performance of our GPU-enabled Quantum Interaction Computational Kernel (QUICK) QM program combined with the Sander molecular dynamics (MD) engine from AMBER. AMBER is one of the most popular simulation packages and has been supported and sustained by the AMBER developer community for ~30 years. The developed software is available to the community via the open source AMBERTools package (see http://ambermd.org/AmberTools.php). |
Poster | Slides | |
Ute Herzfeld | University of Colorado Boulder | Element: Software: Data-Driven Auto-Adaptive Classification of Cryospheric Signatures as Informants for Ice-Dynamic Models Award #: 1835256 |
Abstract
Both collection of Earth observation data from satellites and modeling of physical processes have seen unprecedented advances in recent years, but data-derived information is not used to inform modeling in a systematic and automated fashion. This creates a bottleneck that is growing with the data revolution. The objective of this project is to develop a connection between Earth observation data and numerical models of Earth system processes, through the use of an automated classification and parameterization system prototyped by the PI's group. To take matters another step forward towards a transformation of the data-modeling connection, we are developing a data-driven auto-adaptive classification system that will utilize satellite data to derive informants for numerical models. The cyberinfrastructure will be implemented in a general and transportable way, but its functionality will be demonstrated by addressing a concrete open problem in glaciology: the acceleration during a glacier surge, which is characterized by an increase to 100-200 times the normal flow velocity. Glacial accelerations are important because they constitute the largest uncertainty in sea-level-rise assessment. |
Poster | Slides | |
Carlo Piermarocchi | Michigan State University | Elements: Software: NSCI: A Quantum Electromagnetics Simulation Toolbox (QuEST) for Active Heterogeneous Media by Design Award #: 1835267 |
Abstract
Designing novel optical materials with enhanced properties would impact many areas of science and technology, leading to new lasers, better components for photonics, and to a deeper understanding of how light interacts with matter. This project will develop software that simulates how light would propagate in yet to be made complex optical materials. The final product will be a software toolbox that computes the dynamics of each individual light emitter in the materials rather than calculating an average macroscopic field. This toolbox will permit the engineering and optimization of optical properties by combining heterogeneous components at the nanoscale. |
Poster | Slides | |
Yuanfang Cai | Drexel University | Collaborative Research: Elements: Software: Software Health Monitoring and Improvement Framework Award #: 1835292 |
Abstract
This project seeks to bridge the gap between the software engineering community and the broader science and engineering communities. It will provide quantitative comparisons of software projects against an industrial benchmark, enable users to pinpoint software issues responsible for high maintenance costs, visualize the severity of the detected issues, and refactor them using the proposed interactive refactoring framework. The proposed framework will bring together software users and software developers by enabling non-software experts to post software challenges for the software community to solve, which will, in turn, boost research and advances in software engineering. |
Poster | Slides | |
Daniel Shapero | University of Washington | icepack: an open-source glacier flow modeling library in Python Award #: 1835321 |
Abstract
I will present a new software package for modeling the flow of glaciers and ice sheets called icepack. Icepack is developed using the finite element modeling package firedrake, which provides a domain-specific language (DSL) embedded into Python for the specification of differential equations. The use of this DSL lowers the barrier to entry for development of new physics models for practicing scientists who are not experts in scientific computing. |
Poster | Slides | |
Mohamed Soliman | Oklahoma State University | Element: Data: HDR: Enabling data interoperability for NSF archives of high-rate real-time GPS and seismic observations of induced earthquakes and structural damage detection in Oklahoma Award #: 1835371 |
Abstract
This project focuses on enabling research into hazard mitigation for vulnerable buildings in Oklahoma subjected to the recent increase in induced seismicity, and cumulative damage due to successive earthquakes. This is being conducted by expanding cyberinfrastructure that transmits real-time geophysical and engineering data for use in algorithms to provide low frequency and static deformation measurements of ground motion and building response. The investigation covers differences between source processes and frequency content of earthquakes in tectonically active environments versus induced earthquakes in tectonically passive regions of oil and gas exploration and wastewater injection, which could also have significant implications for seismic hazard mitigation. The objectives will be accomplished by developing and demonstrating modules for the Antelope Environmental Monitoring System that transmit additional sensor data and products created in combination with seismic data. The system architecture will assure data integrity, reduce bandwidth requirements, and alleviate telecommunication bottlenecks from remote multi-sensor stations. |
Poster | Slides | |
Bruce Berriman | California Institute of Technology | Elements: Bringing Montage To Cutting Edge Science Environments Award #: 1835379 |
Abstract
We describe the use of Montage to create all-sky astronomy maps compliant with the Hierarchical Progressive Survey (HiPS) sky-tessellation scheme. These maps support panning and zooming across the sky to progressively smaller scales, and are used widely for visualization in astronomy. They are, however, difficult to create at infrared wavelengths because of high background emission. Montage is an ideal tool for creating infrared maps for two reasons: it uses background modeling to rectify the time-variable image backgrounds to a common level, and it uses an adaptive image stretch algorithm to convert the image data to display values for visualization. The creation of the maps involves the use of existing Montage tools in tandem with four new tools to support HiPS. We will present images of infrared sky surveys in the HiPS scheme. |
Poster | Slides | |
James Bordner | University of California, San Diego | Collaborative Research:Framework:Software:NSCI:Enzo for the Exascale Era (Enzo-E) Award #: 1835402 |
Abstract
Cello is a highly scalable "array-of-octree" based adaptive mesh refinement framework, on which Enzo-E is built. |
Poster | Slides | |
Bryna Hazelton | University of Washington | Collaborative Research: Elements: Software: Accelerating Discovery of the First Stars through a Robust Software Testing Infrastructure Award #: 1835421 |
Abstract
The birth of the first stars and galaxies 13 billion years ago -- our "Cosmic Dawn" -- is one of the last unobserved periods in the history of the Universe. Scientists are working to observe the 21 cm radio light emitted by the primeval neutral hydrogen fog as the first stars formed and reionized the universe. One of the biggest challenges for the detection of the Epoch of Reionization is the presence of bright astrophysical foregrounds that obscure the signal of interest, requiring extraordinarily precise modeling and calibration of the radio telescopes performing these observations. The 21 cm cosmology community is rapidly developing new techniques for instrument calibration, foreground removal, and analysis, but thorough testing and integration into existing data analysis pipelines has been slow. This project provides a software infrastructure that enables rigorous, seamless testing of novel algorithmic developments within a unified framework. This infrastructure also ensures a new level of reliability and reproducibility not previously possible and accelerates the speed at which developments become integrated into production-level code, providing an invaluable foundation for bringing our field into the next decade and for leveraging the current NSF investments in these experiments. |
Poster | Slides | |
Juan Pablo Vielma | MIT | Framework: Software: Next-Generation Cyberinfrastructure for Large-Scale Computer-Based Scientific Analysis and Discovery Award #: 1835443 |
Abstract
This project seeks to develop methods and software for computer-based scientific analysis that are sufficiently powerful, flexible, and accessible to (i) enable domain experts to achieve significant advancements within their domains, and (ii) enable innovative use of advanced computational techniques in unexpected scientific, technological, and industrial applications. In this poster we report on the progress towards this goal by describing advancements both in the development and in the application of the associated cyberinfrastructure. In particular, we report on the use of mathematical optimization techniques for the analysis of economically viable pathways for decarbonization of electrical power networks. We also report on the development of next-generation interior point algorithms for convex optimization and their potential to revolutionize applications in machine learning and optimal control. Finally, we report on various community building activities. |
Poster | Slides | |
Thomas Hacker | Purdue University | Elements: Data: Integrating Human and Machine for Post-Disaster Visual Data Analytics: A Modern Media-Oriented Approach Award #: 1835473 |
Abstract
Our poster describes VISER (Visual Structural Expertise Replicator), a visual image service we are investigating based on automated image classification and a scalable cyberinfrastructure. |
Poster | Slides | |
Jordan Powers | National Center for Atmospheric Research | CSSI Software Elements: Cloud WRF for the Atmospheric Research and Education Communities Award #: 1835511 |
Abstract
The Weather Research and Forecasting (WRF) Model is the world's most popular numerical weather prediction model and is supported by the National Center for Atmospheric Research (NCAR) to a community of users across universities, research labs, and operational centers. This effort is exploiting the emerging technology of cloud computing for the critical WRF support effort and for the benefit of the worldwide user community. The work has established an officially-supported version of WRF in the cloud that extends system accessibility, improves model support and training, and facilitates model development. The components include: cloud-configured WRF system code; cloud WRF tutorial materials; and a cloud-based testing capability for code contributions. To date, the cloud WRF materials have been used for tutorials on the modeling system, for student instruction at universities, and for new system version releases. |
Poster | Slides | |
Asti Bhatt | SRI International | Integrated Geoscience Observatory Award #: 1835573 |
Abstract
Geoscientists arrive at scientific results by analyzing observations from a diverse set of instrumentation and often assimilate them into a model. Effective collaboration between scientists can be hampered when they are using different resources, resulting in a lengthy and laborious process. Individual researchers need to assemble many of the community resources on their own before they can conduct successful research or get credit for their work. The Integrated Geoscience Observatory (InGeO) project tackles the problem of seamless collaboration between geoscientists by creating a platform where the data from disparate instruments can be brought together with software tools for data interpretation provided by the instrument operators. |
Poster | Slides | |
Genevieve Bartlett | ISI/University of Southern CA | Elements: Software: Distributed Workflows for Cyberexperimentation (Elie) Award #: 1835608 |
Abstract
Distributed Workflows for Cyberexperimentation (Elie) is a new experiment representation. Elie enables the researcher to abstract the definition of an experiment from its realization. It encodes the desired behavior of an experiment at a high level as a scenario (e.g. “generate attack from A to B, wait 10 seconds, turn on defense at C”), and provides sufficient details as to how each action in a scenario can be realized on the testbed, via bindings (e.g. use script attack.py with specific parameters for the attack action). Elie further encodes only those features of the testbed topology that matter for the experiment, via constraints (e.g. “use Ubuntu OS on C”). |
Poster | Slides | |
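The scenario/bindings/constraints split described above can be sketched concretely. The encoding below is a hypothetical Python illustration (Elie's actual representation and field names may differ); it shows a scenario of abstract actions, bindings that realize them on a testbed, a topology constraint, and a trivial check that every action is realizable.

```python
# Hypothetical sketch of an Elie-style experiment description; all field
# names are invented for illustration.
experiment = {
    "scenario": [
        {"action": "attack", "src": "A", "dst": "B"},
        {"action": "wait", "seconds": 10},
        {"action": "defense_on", "node": "C"},
    ],
    "bindings": {
        "attack": {"script": "attack.py", "args": ["--rate", "100"]},
        "defense_on": {"script": "enable_ids.sh"},
    },
    "constraints": [
        {"node": "C", "os": "Ubuntu"},
    ],
}

def realizable(exp):
    """Check that every non-trivial scenario action has a binding."""
    bound = set(exp["bindings"])
    return all(step["action"] in bound or step["action"] == "wait"
               for step in exp["scenario"])

print(realizable(experiment))  # True
```

The point of the split is visible even in this toy: the scenario can be re-run on a different testbed by swapping only the bindings and constraints.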
Chris Hill | MIT | Collaborative Research: Framework: Data: Toward Exascale Community Ocean Circulation Modeling Award #: 1835618 |
Abstract
We are developing a community-model virtual solution for ocean and climate studies, geared toward both classical analysis and modern synthetic training activities. |
Poster | Slides | |
Dan Negrut | University of Wisconsin-Madison | Collaborative Research: Elements:Software:NSCI: Chrono - An Open-Source Simulation Platform for Computational Dynamics Problems Award #: 1835674 |
Abstract
The lightning talk and poster are tied to a software infrastructure called Project Chrono, which is developed through a joint project between the University of Wisconsin-Madison and University of Parma, Italy. Chrono provides simulation support for applications that roughly belong to the computational dynamics field: flexible and rigid body dynamics (Newton-Euler equations), fluid-solid interaction problems (mass balance and Navier-Stokes), and granular dynamics (friction/contact/impact constitutive laws). The software leverages parallel computing paradigms (AVX vectorization, GPU cards, multi-core shared memory, and distributed memory). It has been used for Mars rover simulation, vehicle dynamics, autonomous vehicle and robotics simulation, wind energy harvesting, granular dynamics, off-road mobility analysis, planet formation, farming and food processing. It has an online forum with more than 300 registered users and it is released on GitHub as open source under a permissive BSD3 license. Release 5.0 is slated for February 2020. |
Poster | Slides | |
Cate Brinson | Duke University | Collaborative Research: Framework: Data: HDR: Nanocomposites to Metamaterials: A Knowledge Graph Framework Award #: 1835677 |
Abstract
A team of experts from five universities (Duke, RPI, Caltech, Northwestern and Univ of Vermont) develops an open-source materials resource, NanoMine, to enable discovery of fundamental processing-structure-property (p-s-p) relationships for polymer nanocomposites and demonstrates extensibility through the creation of a sister resource for metamaterials, MetaMine. The framework enables annotation, organization and storage of composition, processing, microstructure and property data, along with an array of analysis tools and advanced learning algorithms that facilitate discovery of quantitative p-s-p relationships. A broad spectrum of users can query the system, identify materials that may have certain characteristics, and automatically produce information about these materials. The effort demonstrates the capability of the designed data framework through two domain case studies: discovery of factors controlling dissipation in nanocomposites, and tailored mechanical response in metamaterials motivated by an application to personalize running shoes. The project will significantly improve the representation of data and the robustness with which expanding user communities can identify promising materials applications, enabling new collaborations in materials discovery and design. Strong connections with the National Institute of Standards and Technology (NIST), the Air Force Research Laboratory (AFRL), and Lockheed Martin facilitate industry and government use of the developing knowledge graph. |
Poster | Slides | |
Byung-Jun Yoon | Texas A&M University | Elements: Software: Autonomous, Robust, and Optimal In-Silico Experimental Design Platform for Accelerating Innovations in Materials Discovery Award #: 1835690 |
Abstract
Accelerating the development of novel materials that have desirable properties is a critical challenge as it can facilitate advances in diverse fields across science, engineering, and medicine. However, the current prevailing practice in materials discovery relies on trial-and-error experimental campaigns and/or high-throughput screening approaches, which cannot efficiently explore the huge design space to develop materials with the targeted properties. Furthermore, measurements of material composition, structure, and properties often contain considerable errors due to technical limitations in materials synthesis and characterization, making this exploration even more challenging. This project aims to develop an effective in-silico experimental design platform to accelerate the discovery of novel materials. The platform is built on optimal Bayesian learning and experimental design methodologies that can translate scientific principles into predictive models, in a way that takes model and data uncertainty into account. The optimal Bayesian experimental design framework will enable the collection of smart data that can help explore the material design space efficiently, without relying on slow and costly trial-and-error and/or high-throughput screening approaches. |
Poster | Slides | |
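As a rough illustration of sequential experimental design under uncertainty, the toy sketch below runs a simple upper-confidence-bound loop over a discretized one-dimensional design space with noisy "measurements". The objective, noise model, and acquisition rule are all invented for illustration and are far simpler than the optimal Bayesian machinery the project develops.

```python
import math
import random

random.seed(0)

def measure(x):
    """Noisy stand-in for a real experiment; the true optimum is x = 0.7."""
    return -(x - 0.7) ** 2 + random.gauss(0.0, 0.01)

candidates = [i / 10 for i in range(11)]       # discretized design space
obs = {x: [] for x in candidates}              # measurements per design point

def ucb(x):
    """Upper-confidence-bound acquisition: mean plus an exploration bonus."""
    ys = obs[x]
    if not ys:
        return float("inf")                    # sample unseen designs first
    return sum(ys) / len(ys) + 0.05 / math.sqrt(len(ys))

for _ in range(60):                            # fixed experiment budget
    x = max(candidates, key=ucb)
    obs[x].append(measure(x))

best = max(candidates, key=lambda x: sum(obs[x]) / len(obs[x]))
print(best)
```

Even this crude rule concentrates the budget near the optimum instead of sampling the space uniformly, which is the core advantage smart data collection has over blind screening.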
Xiaogang Ma | University of Idaho | Elements: Software: HDR: A knowledge base of deep time to facilitate automated workflows in studying the co-evolution of the geosphere and biosphere Award #: 1835717 |
Abstract
Geologic time, though widely used as a fundamental framework in geoscience, is a system of concepts that faces the issue of semantic heterogeneity. Our work (deeptimekb.org) aims to build a machine-readable knowledge base of deep time. The planned objectives and activities are multi-fold. Semantic technologies will be used to model and encode the knowledge base for the collected standards. A service of the knowledge base will be set up for both humans and machines to access and query the precise meaning of each concept. Through the support of the knowledge base, a few existing geoscience data facilities will be used to carry out case studies in data integration and analysis. Workflow platforms such as Jupyter Notebook will be used in those case studies. The project outputs will be shared in community repositories, such as the ESIP community ontology repository, to support a national open knowledge network. |
Poster | Slides | |
Michael Shirts | University of Colorado | Collaborative Research: NSCI Framework: Software: SCALE-MS - Scalable Adaptive Large Ensembles of Molecular Simulations Award #: 1835720 |
Abstract
The goal of this project is to create a framework for expression and execution of adaptive ensemble molecular simulation algorithms, based on requirements elicited from the molecular dynamics community. The user-facing and developer-facing aspects of this framework are an adaptive ensemble API backed by a capable runtime layer that can execute on production-scale cyberinfrastructure. Our framework design is grounded in both the application science and methods development communities and is designed as a community code. This effort is linked to driving scientific applications with very long time scales in biophysics, materials science, chemistry, and chemical engineering from the PIs’ laboratories and others in the molecular simulation community. |
Poster | Slides | |
David Hudak | Ohio State University | Frameworks: Software NSCI-Open OnDemand 2.0: Advancing Accessibility and Scalability for Computational Science through Leveraged Software Cyberinfrastructure Award #: 1835725 |
Abstract
High performance computing (HPC) has led to remarkable advances in science and engineering and has become an indispensable tool for research. Unfortunately, HPC use and adoption by many researchers is often hindered by the complex way in which these resources are accessed. Indeed, while the web has become the dominant access mechanism for remote computing services in virtually every computing area, it has not for HPC. Open OnDemand is an open source project to provide web-based access to HPC resources. The primary goal of OnDemand is to lower the barrier of entry and ease access to HPC resources for both new and existing users. Through OnDemand, users can create, edit, and upload/download files; create, edit, submit, and monitor jobs; create and share apps; run GUI applications; and connect to a terminal, all via a web browser, with no client software to install and configure. |
Poster | Slides | |
Marouane Kessentini | University of Michigan-Dearborn | Collaborative Research: Elements: Software: Software Health Monitoring and Improvement Framework Award #: 1835747 |
Abstract
Keywords: software quality, computational intelligence, artificial assistant |
Poster | Slides | |
Xian-He Sun | Illinois Institute of Technology | Framework: Software: NSCI: Collaborative Research: Hermes: Extending the HDF Library to Support Intelligent I/O Buffering for Deep Memory and Storage Hierarchy Systems Award #: 1835764 |
Abstract
Modern HPC and distributed systems come equipped with deep memory and storage hierarchies (DMSH). The expert knowledge required to manage these multi-tier storage environments puts their benefits out of reach for most scientists and researchers. In this project, we propose the design and development of Hermes, a new, heterogeneous-aware, multi-tiered, dynamic, and distributed I/O buffering platform which provides: 1) Vertical and horizontal distributed buffering in the DMSH; 2) Selective buffering; 3) Adaptive and dynamic buffering via system and application profiling. We are developing new buffering algorithms and mechanisms that address the challenges of a DMSH ecosystem. This effort will eventually boost HDF5 core technology and facilitate an agile architecture that will allow the evolution of next generation I/O and will address the increasingly challenging scale and complexity of future systems. Hermes software is intended to support new scientific/engineering methodologies and will be carefully designed, implemented, and thoroughly tested. |
Poster | Slides | |
Yao Liang | Indiana University Purdue University Indianapolis |  CyberWater—An open and sustainable framework for diverse data and model integration with provenance and access to HPC Award #: 1835817 |
Abstract
To advance our fundamental understanding of the complex behaviors and interactions among the various Earth processes that are critical to the health, resilience and sustainability of water resources, scientists need to be able to use diverse data and integrate models outside their own disciplines with sufficient model accuracy and predictability. Currently, however, this is very difficult to accomplish: (1) a vast quantity of diverse data are not readily accessible to models; and (2) diverse models developed individually by different research groups are difficult to share and integrate across disciplines. To address these critical challenges, we propose to develop an open and sustainable cyberinfrastructure (CI) software framework that enables easy and incremental integration of diverse data and models for knowledge discovery and interdisciplinary teamwork, and also enables reproducible computing and seamless, on-demand access to the various HPC resources that communities need. Our proposed project addresses an urgent need in enabling new scientific advances for all water-related issues. |
Poster | Slides | |
Zhenming Liu | William & Mary | Elements: Software: NSCI: A high performance suite of SVD related solvers for machine learning Award #: 1835821 |
Abstract
We present our recent research progress on the joint optimization of ML algorithms and SVD solvers. Our major discovery is that the "stopping criteria" of an SVD algorithm can be directly optimized for downstream ML applications. We present a few examples in which changing the stopping criteria of the "inner" SVD algorithm yields significant performance gains (in running time) for the "outer" ML algorithm. |
Poster | Slides | |
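The "inner SVD stopping criterion" idea can be illustrated with a toy power iteration (an illustration of the general principle, not the project's solver suite): the tolerance below is the knob a downstream ML task could loosen to save iterations at a controlled loss of accuracy.

```python
import math
import random

# Toy power iteration on A^T A for the top singular value; 'tol' is the
# stopping criterion a downstream application could loosen.
def top_singular_value(A, tol):
    m, n = len(A), len(A[0])
    v = [1.0 / math.sqrt(n)] * n
    sigma, iters = 0.0, 0
    while True:
        iters += 1
        Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]
        w = [sum(A[i][j] * Av[i] for i in range(m)) for j in range(n)]  # A^T A v
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
        new_sigma = math.sqrt(norm)      # converges to the top singular value
        if abs(new_sigma - sigma) < tol * new_sigma:
            return new_sigma, iters
        sigma = new_sigma

random.seed(1)
A = [[random.gauss(0.0, 1.0) for _ in range(10)] for _ in range(20)]
s_loose, it_loose = top_singular_value(A, tol=1e-2)   # "good enough" stop
s_tight, it_tight = top_singular_value(A, tol=1e-10)  # conventional tight stop
print(it_loose, it_tight)
```

Because the tight criterion implies the loose one, the loose run never takes more iterations; when the outer ML objective tolerates the small spectral error, those saved iterations translate directly into end-to-end speedup.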
Carol Song | Purdue University | Framework: Data: HDR: Extensible Geospatial Data Framework towards FAIR (Findable, Accessible, Interoperable, Reusable) Science Award #: 1835822 |
Abstract
Multidisciplinary solutions to address the 21st century’s grand challenges in resource sustainability and resilience are increasingly geospatial data-driven, but researchers spend a significant amount of their time “wrangling data”, i.e., accessing and processing data to make them usable in their modeling and analysis tools. Our NSF CSSI project is developing GeoEDF, an extensible geospatial data framework that aims to reduce and possibly remove this barrier by creating seamless connections among platforms, data and tools, making large distributed scientific and social geospatial datasets directly usable in models and tools. Through an extensible set of modular and reusable data connectors and processors, GeoEDF is designed to abstract away the complexity of acquiring and utilizing data from remote sources. Researchers can string them together into a workflow that can be executed in various environments including HUBzero tools, HPC resources, or Jupyter Notebooks. By bringing data to the science, GeoEDF will help accelerate data-driven discovery, while ensuring that data is not siloed and improving the practice of FAIR science. |
Poster | Slides | |
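The connector/processor/workflow pattern described above can be sketched in a few lines; everything here (function names, the data, the unit conversion) is hypothetical and stands in for GeoEDF's actual plugin interfaces.

```python
# All names and data below are invented stand-ins for GeoEDF-style
# connector and processor plugins.
def csv_connector(url):
    """Connector: acquires remote data (faked here instead of fetching)."""
    return [{"station": "A", "precip_mm": 4.0},
            {"station": "B", "precip_mm": 0.5}]

def unit_processor(rows):
    """Processor: derive inches from millimetres."""
    return [{**r, "precip_in": r["precip_mm"] / 25.4} for r in rows]

def threshold_processor(rows):
    """Processor: keep stations with at least 0.1 in of precipitation."""
    return [r for r in rows if r["precip_in"] >= 0.1]

def run_workflow(data, *stages):
    """String a connector's output through a chain of processors."""
    for stage in stages:
        data = stage(data)
    return data

result = run_workflow(csv_connector("https://example.org/precip.csv"),
                      unit_processor, threshold_processor)
print(result)
```

The value of the abstraction is that the same workflow runs unchanged whether the connector pulls from a local file, a remote archive, or an HPC-staged dataset.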
Alexey Akimov | University at Buffalo, SUNY | Elements: Libra: The Modular Software for Nonadiabatic and Quantum Dynamics Award #: 1931366 |
Abstract
Sustained progress in scientific endeavors in solar energy, functional, and nanoscale material sciences requires advanced methods and software components that can be used to model the complex dynamics of excited states, including charge and energy transfer. Within the current project, a range of advanced nonadiabatic and quantum dynamics (NA/QD) techniques such as independent and collective trajectory surface hopping methods, non-equilibrium Fermi golden rule rate calculations, and new quantum-classical decoherence schemes will be implemented and made available for future reuse via the open-source Libra library. These “building blocks” will enable the testing of new ideas and theories of NA/QD. The infrastructure of Libra methods and the related database of model problems will enable systematic assessment of various NA/QD methods, in order to standardize and rank a “zoo” of the presently available methods. The first “Jacob’s ladder” of NA/QD methods will be built to serve as a roadmap to theorists and practitioners. |
Poster | Slides | |
Alexey Akimov | University at Buffalo, SUNY | CyberTraining: Pilot: Modeling Excited State Dynamics in Solar Energy Materials Award #: 1924256 |
Abstract
The design and discovery of new efficient and inexpensive solar energy materials can be accelerated via computational modeling of excited states dynamics in these systems. Nonetheless, training in this area remains relatively scarce; the community is often unaware of the available cyberinfrastructure, lacks the best practice guidelines, and may experience entry barriers to employing these advanced tools. This pilot project aims to fill the above gaps by providing targeted training to young scientists in the proficient use of advanced cyberinfrastructure for modeling excited states dynamics in solar energy materials. The project will leverage and combine the general-purpose Libra code library for modeling excited states dynamics and the Virtual Infrastructure for Data Intensive Analysis (VIDIA) platform for web-based data analysis and visualization, as well as the existing electronic structure packages. The resulting versatile gateway will enable advanced training in modeling excited states dynamics of solar energy materials. |
Poster | Slides | |
Shrideep Pallickara | Colorado State University | Frameworks: Collaborative Proposal: Software Infrastructure for Transformative Urban Sustainability Research Award #: 1931363 |
Abstract
The NSF has invested in several strategic research efforts in the area of urban sustainability, all of which generate, collect, and manage large volumes of spatiotemporal data. This project produces an enabling software infrastructure, Sustain, that facilitates and accelerates discovery by significantly alleviating data-induced inefficiencies. The effort innovatively leverages spatiotemporal sketching to decouple data and information. |
Poster | Slides | |
Hanna Terletska | Middle Tennessee State University | Collaborative Research: Element: Development of MuST, A Multiple Scattering Theory based Computational Software for First Principles Approach to Disordered Materials. Award #: 1931367 |
Abstract
Disorder is inevitably present in real materials. Understanding and harnessing the role of disorder is critical for controlling and utilizing the functional properties of quantum systems with disorder, and requires careful theoretical and numerical analysis. The product of this project is the open-source MuST software for ab initio study of disorder effects in real materials. We aim to accomplish the following goals: 1) provide an open-source ab initio numerical framework for systems with disorder; 2) create a truly scalable multiple-scattering theory approach for the first-principles study of quantum materials; 3) expand the existing capabilities of ab initio codes to study strong disorder effects, i.e., disorder-driven quantum phase transitions, transport and electron localization (currently available at the model-Hamiltonian level only); 4) perform the method development to enable exploration of disorder effects in a variety of materials: disordered metals, high entropy alloys, semiconductors, and topological insulators; 5) enable researchers to perform ab initio calculations for disordered systems that are presently out of reach for most researchers. |
Poster | Slides | |
In-Ho Cho | Iowa State University | Elements: Development of Assumption-Free Parallel Data Curing Service for Robust Machine Learning and Statistical Predictions Award #: 1931380 |
Abstract
A new era of big data and machine learning (ML) is arising: data- and ML-driven research is becoming a primary paradigm across science and engineering. However, the pervasive issue of missing data may hamper robust ML and statistical inference, and the negative impact of incomplete data on ML and statistical learning (SL) can depend strongly on the data types and ML/SL methods involved. Existing data curing (imputation) methods are difficult for non-specialist researchers to use and often unsuitable for large, complex data. To overcome these challenges, this project embarked upon developing a new community-level data-curing platform deployable on NSF cyberinfrastructure and local HPC facilities. The novelty of this project lies in the fact that it requires no (or minimal) expert-level assumptions and has no restrictions on the size, dimension, type, and complexity of data. This project’s service will pursue a novel combination of assumption-free, big-data-oriented imputation theories and parallel algorithms. |
Poster | Slides | |
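As one concrete example of assumption-light data curing, the sketch below implements a minimal k-nearest-neighbor imputation in pure Python. This is an illustrative stand-in, not the project's actual algorithms, which target much larger and more complex data on parallel hardware.

```python
import math

# Tiny dataset with one missing entry to "cure".
data = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, None],
    [5.0, 6.0, 7.0],
    [0.9, 1.9, 2.9],
]

def distance(a, b):
    """Euclidean distance over coordinates observed in both rows."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    return math.sqrt(sum((x - y) ** 2 for x, y in pairs) / len(pairs))

def impute(rows, k=2):
    """Fill each missing value with the mean of its k nearest donors."""
    out = [row[:] for row in rows]
    for i, row in enumerate(out):
        for j, v in enumerate(row):
            if v is None:
                donors = sorted(
                    (r for r in rows if r is not rows[i] and r[j] is not None),
                    key=lambda r: distance(rows[i], r))[:k]
                row[j] = sum(r[j] for r in donors) / len(donors)
    return out

cured = impute(data)
print(cured[1][2])
```

Distance-based imputation like this makes no parametric assumption about the data distribution, which is the spirit (though not the substance) of the assumption-free methods the project pursues.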
Robert Harrison | Stony Brook University | Collaborative Research: Frameworks: Production quality Ecosystem for Programming and Executing eXtreme-scale Applications (EPEXA) Award #: 1931387 |
Abstract
EPEXA is creating a production-quality, general-purpose, community-supported, open-source software ecosystem to attack the twin challenges of programmer productivity and portable performance for advanced scientific applications. Through application-driven codesign we focus on the needs of irregular and sparse applications that are poorly served by current programming and execution models on massively-parallel, hybrid systems. |
Poster | Slides | |
Nicholas Murphy | Center for Astrophysics | Harvard & Smithsonian | Collaborative Research: Frameworks: An open source software ecosystem for plasma physics Award #: 1931388 |
Abstract
PlasmaPy is an open source Python package for plasma physics that is currently under development. The ultimate goal of this project is to foster a community-wide open source software ecosystem for plasma research and education. Following Astropy's model, functionality needed across disciplines is being implemented in the PlasmaPy core package, while more specialized functionality will be developed in affiliated packages. We strive to use best practices from software engineering that have heretofore been uncommon in plasma physics, including continuous integration testing and code review. We will describe code development and community building activities from the first few months of our NSF CSSI award and plans for the next five years. |
Poster | Slides | |
Hasan Babaei | University of California-Berkeley | Enabling Accurate Thermal Transport Calculations in LAMMPS Award #: 1931436 |
Abstract
Molecular dynamics (MD) simulations are used extensively to study thermal transport in materials, and one of the most widely used MD software packages is LAMMPS. However, the most common MD technique to compute thermal conductivity, the Green-Kubo method, yields incorrect results in LAMMPS for many-body potentials. The primary aim of this NSF CSSI project is to create and carefully implement the correct heat flux computation in LAMMPS, a problem made challenging by the fact that this software has hundreds of thousands of users and the solution must be merged into the core LAMMPS code. The objectives of the project are: (1) to implement a corrected heat flux computation for all supported many-body potentials in LAMMPS, (2) to identify the types of molecular systems most affected by the changed heat flux computations, and (3) to apply and refine the methodology to predict thermal conductivity for several novel nanomaterials. |
Poster | Slides | |
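For context, the Green-Kubo method estimates thermal conductivity from the time integral of the heat-flux autocorrelation, kappa = V/(k_B T^2) * \int <J(0)J(t)> dt. The toy sketch below (prefactor set to 1, synthetic flux with a known analytic autocorrelation) shows the estimator itself; it says nothing about the many-body heat-flux correction that the project implements in LAMMPS.

```python
import random

# Synthetic AR(1) "heat flux": its autocorrelation is s2 * a**t analytically,
# so the Green-Kubo integral can be checked against a closed form.
random.seed(0)
a, n, lags = 0.9, 100_000, 60

J, x = [], 0.0
for _ in range(n):
    x = a * x + random.gauss(0.0, 1.0)
    J.append(x)

def acf(t):
    """Sample autocorrelation of the flux at lag t."""
    m = n - t
    return sum(J[i] * J[i + t] for i in range(m)) / m

kappa_est = sum(acf(t) for t in range(lags))   # discrete-time integral, dt = 1
s2 = 1.0 / (1.0 - a * a)                       # stationary variance of J
kappa_true = s2 / (1.0 - a)                    # analytic sum of s2 * a**t
print(kappa_est, kappa_true)
```

The sketch also makes the project's point tangible: if the flux series J fed into this estimator is computed incorrectly (as happens in LAMMPS for many-body potentials), every downstream conductivity value is wrong, no matter how carefully the integral is evaluated.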
Denis Zorin | NYU | Open-Source Robust Geometry Toolkit for Black-Box Finite Element Analysis Award #: 1835712 |
Abstract
The numerical solution of PDEs using finite elements and similar methods is ubiquitous in engineering and science applications. Ideally, a PDE solver should be a “black box”: the user provides as input the domain boundary and boundary conditions, and the code returns an evaluator that can compute the value of the solution at any point of the domain. This is surprisingly far from being the case for almost all existing open-source or commercial software. One important source of non-robustness in solvers is treating meshing and FEM basis construction as two disjoint problems. We present our work towards an integrated fully robust pipeline, considering meshing and basis construction as a single challenge. We will demonstrate that tackling the two problems jointly offers many advantages, based on testing with several PDEs on a large dataset. |
Poster | Slides | |
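A "black box" solver interface of the kind described can be shown in miniature for 1D: the user passes the right-hand side and boundary data, and gets back an evaluator for the piecewise-linear FEM solution of -u'' = f. This toy uses a uniform mesh and one-point load quadrature; the project's robust meshing-plus-basis pipeline for general 2D/3D geometry is of course far beyond this sketch.

```python
# Minimal 1D "black box": returns an evaluator for the FEM solution of
# -u'' = f on [a, b] with Dirichlet data, linear elements, uniform mesh.
def solve_poisson_1d(f, a=0.0, b=1.0, ua=0.0, ub=0.0, n=64):
    h = (b - a) / n
    # Tridiagonal stiffness system: (1/h) * tridiag(-1, 2, -1) u = load.
    rhs = [h * f(a + i * h) for i in range(1, n)]
    rhs[0] += ua / h
    rhs[-1] += ub / h
    diag = [2.0 / h] * (n - 1)
    off = -1.0 / h
    # Thomas algorithm: forward elimination, then back substitution.
    for i in range(1, n - 1):
        w = off / diag[i - 1]
        diag[i] -= w * off
        rhs[i] -= w * rhs[i - 1]
    u = [0.0] * (n - 1)
    u[-1] = rhs[-1] / diag[-1]
    for i in range(n - 3, -1, -1):
        u[i] = (rhs[i] - off * u[i + 1]) / diag[i]
    nodes = [ua] + u + [ub]

    def evaluate(x):
        """Piecewise-linear interpolation between nodal values."""
        t = min(max((x - a) / h, 0.0), n - 1e-12)
        i = int(t)
        return nodes[i] + (t - i) * (nodes[i + 1] - nodes[i])
    return evaluate

u = solve_poisson_1d(lambda x: 1.0)   # exact solution is x(1 - x)/2
print(u(0.5))
```

In 1D the black-box contract is easy to honor; the abstract's point is that delivering the same domain-in, evaluator-out contract robustly for arbitrary geometry requires treating meshing and basis construction as one problem.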
Elsa Olivetti | Massachusetts Institute of Technology | The Synthesis Genome: Data Mining for Synthesis of New Materials Award #: 1922311 |
Abstract
Successes in accelerated materials design, made possible in part through the Materials Genome Initiative, have shifted the bottleneck in materials development towards the synthesis of novel compounds. Existing databases do not contain information about the synthesis recipes necessary to produce compounds. As a result, much of the momentum and efficiency gained in the design process becomes gated by trial-and-error synthesis techniques. This delay in going from promising materials concept to validation, optimization, and scale-up is a significant burden to the commercialization of novel materials. This research is developing a framework to do for materials synthesis what modern computational methods have done for materials properties: Build predictive tools for synthesis so that targeted compounds can be synthesized in a matter of days, rather than months or years. This proposal extends efforts to include synthetic confirmation of hypotheses generated by predictive models. |
Poster | Slides | |
David Elbert | Johns Hopkins University | DMREF: Data-Driven Integration of Experiments and Multi-Scale Modeling for Accelerated Development of Aluminum Alloys Award #: 1921959 |
Abstract
This project seeks to establish a new paradigm for the materials design loop in which the flow of data, rather than individual modeling or experimental tasks, is viewed as central. The work centers on development of an open semantic infrastructure and streaming data platform to integrate the processing, experimental, and modeling components of materials design. Infrastructure development is embedded in a science program focused on creating aluminum alloys resistant to spall failure in high-energy environments. Such alloys have high value in aircraft and spacecraft while understanding the underlying mechanism of failure has broad scientific importance in understanding ultimate material strength. The tight linkage of infrastructure development and science in this project is central to creating infrastructure that works to close the design loop and encourage more meaningful collaboration between domain experts. Instantiation in a multi-scale modeling framework will provide open tools with broad applicability in the materials domain. |
Poster | Slides |
Chaowei Yang | George Mason University | Developing On-Demand Service Module for Mining Geophysical Properties of Sea Ice from High Spatial Resolution Imagery Award #: 1835507 |
Abstract
Sea ice acts as both an indicator and an amplifier of climate change. Multiple sources of sea ice observations are obtained from a variety of networks of sensors (in situ, airborne, and space-borne). To help the science community better extract important geophysical parameters for climate modeling, we are developing a smart cyberinfrastructure module for the analyses of high spatial resolution (HSR) remote sensing images of sea ice. The project contributes new domain knowledge to the sea ice community. It integrates HSR images that are spatiotemporally discrete to produce a rapid and reliable identification of ice types, and standardizes image processing so as to create compatible sea ice products. The cyberinfrastructure module is a value-added on-demand web service, e.g., reliable classification of sea ice, that can be easily integrated with existing infrastructure. The key objective is to develop a cyberinfrastructure to efficiently collect, search, explore, visualize, organize, analyze and share HSR Arctic sea ice imagery. |
Poster | Slides | |
Reed Maxwell | Colorado School of Mines | Collaborative Research: Framework: Software: NSCI : Computational and data innovation implementing a national community hydrologic modeling framework for scientific discovery Award #: 1835903 |
Abstract
Hydrologic science studies the movement of water in the Earth system. Continental-scale simulation of this flow of water through rivers, streams and groundwater is an identified grand challenge in hydrology. Decades of model development, combined with advances in solver technology and software engineering, have enabled large-scale, high-resolution simulations of the hydrologic cycle over the US, yet substantial technical and communication challenges remain. Our interdisciplinary team of computer scientists and hydrologists is developing a framework that leverages advances in computer science to transform simulation and data-driven discovery in the hydrologic sciences and beyond. This project is advancing the science behind these national-scale hydrologic models, accelerating their capabilities and building novel interfaces for user interaction. Our framework brings computational and domain science (hydrology) communities together in order to move more quickly from tools (models, big data, high-performance computing) to discoveries. Our framework facilitates decadal, national-scale simulations, which are an unprecedented resource for both the hydrologic community and the much broader community of people working in water-dependent systems (e.g., biological systems, energy and food production). These simulations will enable the community to address scientific questions about water availability and dynamics from the watershed to the national scale. Additionally, this framework is designed to facilitate multiple modes of interaction and engage a broad spectrum of users outside the hydrologic community. We will provide easy-to-access pre-processed datasets that can be visualized and plotted using built-in tools that require no computer science or hydrology background. 
Recognizing that hydrology training generally does not include high-performance computing, data analytics, or software engineering, this framework will provide a gateway for computationally enhanced hydrologic discovery. Additionally, for educators we will develop packaged videos and educational modules on different hydrologic systems geared towards K-12 classrooms. |
Poster | Slides | |
Ron Soltz | Wayne State University | Jet Energy-loss Tomography with a Statistically and Computationally Advanced Program Envelope Award #: 1550300 |
Abstract
The Jet Energy-loss Tomography with a Statistically and Computationally Advanced Program Envelope (JETSCAPE) collaboration is an NSF funded multi-institutional effort to design the next generation of event generators to study the physics of jets within the quark-gluon plasma created in ultra-relativistic heavy-ion collisions. Integrated advanced statistical analysis tools provide non-expert users with quantitative methods to validate novel theoretical descriptions of jet modification, by comparison with the complete set of current experimental data. To improve the efficiency of this computationally intensive task, the collaboration has developed trainable emulators that can accurately predict experimental observables by interpolation between full model runs, and employs accelerators such as Graphics Processing Units (GPUs) for both the fluid dynamical simulations and the modification of jets. This framework exists within a user-friendly envelope that allows for continuous modifications, updates and improvements of each of its components. |
Poster | Slides | |
Wolfgang Bangerth | Colorado State University | Collaborative Research: Frameworks: Software: Future Proofing the Finite Element Library Deal.II -- Development and Community Building Award #: 1835673 |
Abstract
Finite element methods (FEMs) are widely used for the solution of partial differential equations. |
Poster | Slides | |
Mike Pritchard | University of California, Irvine | Collaborative Research: HDR Elements: Software for a new machine learning based parameterization of moist convection for improved climate and weather prediction using deep learning Award #: 1835863 |
Abstract
Machine-learning-based representation of subgrid processes in climate simulation was proven in concept in 2018 by several research groups in the limit of an idealized aquaplanet. I will review some of the interesting findings from follow-on tests during the past two years that have clarified the potential for such an approach to work in more realistic settings, focusing on the engineering lessons learned and the challenges that remain. Along the way I will discuss emerging diagnostics for testing the physical credibility of prototype emulators, the importance of formal hyperparameter tuning, the strategy we developed to incorporate physical constraints into hybrid machine learning models, and our new CSSI software that makes prognostic testing simpler. I will also share our latest measurements of skill when emulating explicit convection in a modern version of the Community Earth System Model that includes real geography, seasons, and diurnal cycles. |
Poster | Slides | |
Theodore Kisner | Lawrence Berkeley National Laboratory | Collaborative Research: Elements: Software: NSCI: HDR: Building An HPC/HTC Infrastructure For The Synthesis And Analysis Of Current And Future Cosmic Microwave Background Datasets Award #: 1835865 |
Abstract
This project aims to develop new software infrastructure to bridge the gap between HPC (TOAST) and HTC (SPT3G) software frameworks currently used by Cosmic Microwave Background experiments. We are developing code to allow running simulation and analysis modules from one framework in the other, supporting data translation in memory between the models used by the two frameworks, and unifying the data representations and conventions in both frameworks. This project will immediately benefit current and future CMB experiments such as SPT, ACT, BICEP/Keck, Simons Array, Simons Observatory, and CMB-S4. |
Poster | Slides | |
Dmitry Pekurovsky | University of California San Diego | Elements: Software: Multidimensional Fast Fourier Transforms on the path to Exascale Award #: 1835885 |
Abstract
The Fast Fourier Transform (FFT) is a ubiquitous tool in scientific simulations, from CFD to plasma physics, astrophysics, ocean modeling, materials research, medical imaging, molecular dynamics, and many other areas. This CSSI project aims to create a highly efficient, scalable, and portable library for FFTs in multiple dimensions. The prototype library is available as open source from http://www.p3dfft.net. It is designed to be highly adaptable to a wide range of uses and platforms. This poster presents the early stages of this work, namely designing and testing the core of the package and the types of use cases it is designed to handle. |
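The core idea of a multidimensional FFT -- applying 1-D transforms along each axis in turn, which is also what parallel libraries such as P3DFFT distribute across nodes -- can be sketched in a few lines. This is a standard-library illustration, not the P3DFFT API:

```python
import cmath

def fft1d(x):
    # Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two.
    n = len(x)
    if n == 1:
        return list(x)
    even = fft1d(x[0::2])
    odd = fft1d(x[1::2])
    tw = [cmath.exp(-2j * cmath.pi * k / n) * odd[k] for k in range(n // 2)]
    return [even[k] + tw[k] for k in range(n // 2)] + \
           [even[k] - tw[k] for k in range(n // 2)]

def fft2d(grid):
    # A 2-D FFT is a sequence of 1-D FFTs: first along every row,
    # then along every column of the intermediate result.
    rows = [fft1d(row) for row in grid]
    cols = [fft1d([rows[i][j] for i in range(len(rows))])
            for j in range(len(rows[0]))]
    return [[cols[j][i] for j in range(len(cols))]
            for i in range(len(cols[0]))]
```

A production library replaces the recursive transform with optimized kernels and inserts a data transpose between the row and column passes so that each 1-D transform is local to a node.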
Poster | Slides | |
Yong Chen | Texas Tech University | Elements:Software:NSCI: Empowering Data-driven Discovery with a Provenance Collection, Management, and Analysis Software Infrastructure Award #: 1835892 |
Abstract
We aim to create a software infrastructure to collect, manage, and analyze provenance data for high-performance computing (HPC) systems. Provenance data describes entities, such as users, jobs, and files, and the relationships among them. Such a provenance software infrastructure can capture the history of a piece of data; for instance, a user runs a job that produces a dataset, later used by another user when running another job. This makes possible advanced data management functionalities such as identifying the data sources, parameters, or assumptions behind a given result, auditing data history and usage, and understanding in detail how different input data are transformed into outputs. We will present our progress so far and discuss remaining tasks. |
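The lineage query described above can be illustrated with a toy provenance graph; the entity names (`user:alice`, `job:sim-42`, and so on) and the flat edge list are purely hypothetical, not the project's data model:

```python
from collections import defaultdict

# Each edge records "entity A was derived from entity B".
derived_from = defaultdict(list)

def record(child, parent):
    derived_from[child].append(parent)

def lineage(entity):
    # Walk the graph backwards to recover the full history of an entity.
    history, stack, seen = [], [entity], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        history.append(node)
        stack.extend(derived_from[node])
    return history

# A user runs a job that produces a dataset, later used by another
# user when running another job.
record("job:sim-42", "user:alice")
record("file:raw.dat", "job:sim-42")
record("job:analyze-7", "file:raw.dat")
record("job:analyze-7", "user:bob")
record("file:result.csv", "job:analyze-7")
```

Querying `lineage("file:result.csv")` recovers every upstream job, file, and user, which is the kind of source-identification and auditing capability the abstract describes.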
Poster | Slides | |
Miriah Meyer | University of Utah | Reproducible Visual Analysis of Multivariate Networks with MultiNet Award #: 1835904 |
Abstract
Multivariate networks -- datasets that link together entities that are associated with multiple different variables -- are a critical data representation for a range of high-impact problems, from understanding how our bodies work to uncovering how social media influences society. These networks capture information about relationships between entities as well as attributes of the entities and the connections. Tools used in practice today provide very limited support for reasoning about such networks. This project aims to fill this critical gap in the existing cyberinfrastructure ecosystem by developing MultiNet, a robust, flexible, secure, and sustainable open-source visual analysis system. The web-based tool, along with an underlying plug-in-based framework, will support three core capabilities: 1) interactive, task-driven visualization of both the connectivity and attributes of networks, 2) reshaping the underlying network structure to bring the network into a shape that is well suited to address analysis questions, and 3) leveraging provenance data to support reproducibility, communication, and integration in computational workflows. These capabilities will allow scientists to ask new classes of questions about network datasets, and lead to insights about a wide range of pressing topics. To meet this goal, we will ground the design of MultiNet in four deeply collaborative case studies with domain scientists in biology, neuroscience, sociology, and geology. |
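One common instance of the "reshaping" operation described above can be sketched with plain dictionaries: projecting a hypothetical person-paper bipartite network into a weighted co-authorship network whose shape suits a different analysis question. All names are illustrative, and this is not the MultiNet API:

```python
from collections import defaultdict
from itertools import combinations

# Toy multivariate network: papers linked to their authors.
authorship = {
    "paper:P1": ["alice", "bob"],
    "paper:P2": ["bob", "carol"],
    "paper:P3": ["alice", "bob", "carol"],
}

def project_coauthors(bipartite):
    # Reshape the person-paper network into a person-person network;
    # the edge weight counts shared papers.
    weights = defaultdict(int)
    for authors in bipartite.values():
        for a, b in combinations(sorted(authors), 2):
            weights[(a, b)] += 1
    return dict(weights)
```

The projected network answers "who collaborates with whom, and how often?" directly, whereas the original bipartite shape does not.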
Poster | Slides | |
Luke Nambi Mohanam | University of California, Irvine | Elements: libkrylov, a Modular Open-Source Software Library for Extremely Large Eigenvalue and Linear Problems Award #: 1835909 |
Abstract
Dense linear systems and eigenvalue problems with extremely large dimensions, i.e., well over a million degrees of freedom or unknowns, underlie many grand challenges in science and engineering, from quantum molecular and materials sciences to fluid dynamics. This project develops, validates, and deploys the general-purpose open-source software library libkrylov for solving these linear systems and eigenvalue problems based solely on vector operations. We will give an overview of the already implemented and planned functionality of libkrylov including the recently developed non-orthonormal Krylov subspace methods, as well as design, data structures, and interfaces. The current implementation uses compile-time polymorphism and user-defined procedure encapsulation to enable high degrees of efficiency, generic coding, and ease of use. Examples of applications to X-ray absorption spectroscopy of single molecular magnets will be discussed. |
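The "based solely on vector operations" constraint can be illustrated with the simplest such method, power iteration, where the operator is supplied only as a matrix-vector product and no matrix is ever stored. This is a sketch under that assumption, not the libkrylov interface:

```python
import math

def power_iteration(apply_op, n, iters=200):
    # Estimate the dominant eigenvalue of a linear operator given only
    # as a function computing matrix-vector products.
    v = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        w = apply_op(v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
        lam = sum(x * y for x, y in zip(v, apply_op(v)))  # Rayleigh quotient
    return lam, v

# Hypothetical operator: diag(1, 2, ..., n) applied matrix-free.
def apply_diag(v):
    return [(i + 1) * x for i, x in enumerate(v)]
```

libkrylov implements far more sophisticated subspace methods, including the non-orthonormal Krylov variants mentioned above, but they share this structure: the problem matrix appears only through `apply_op`, which is what makes extremely large dimensions tractable.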
Poster | Slides | |
Krister Shalm | University of Colorado, Boulder | RAISE-TAQS: Randomness Expansion Using a Loophole-Free Bell Test Award #: 1839223 |
Abstract
Our team is building a certifiable random number generator using quantum entanglement. This is the only known method for producing random bits whose quality and true randomness can be directly certified. To achieve this, we are researching high-speed, low-loss (>98% transmission), bulk optical switches. Such switches are a critical piece of infrastructure in any quantum network based on photons. We are also working to incorporate our quantum-entangled random number generator into a public randomness beacon, and developing tools for the public to access and use these random bits. The first application being developed is an online app that will use our random bits to draw fair voting district maps that satisfy constitutional requirements to prevent gerrymandering. |
Poster | Slides | |
Saul Teukolsky | Cornell University | Elements:Collaborative Proposal: A task-based code for multiphysics problems in astrophysics at exascale Award #: 1931280 |
Abstract
We describe the development of SpECTRE, an open-source community code for multi-scale, multi-physics problems in astrophysics and gravitational physics. The code uses discontinuous Galerkin methods and task-based parallelism to run at exascale. SpECTRE will allow astrophysicists to explore the mechanisms driving core-collapse supernovae, to understand electromagnetic transients and gravitational-wave phenomena for black holes and neutron stars, and to reveal the dense matter equation of state. |
Poster | Slides | |
Jeff Horsburgh | Utah State University | Collaborative Research: Elements: Advancing Data Science and Analytics for Water (DSAW) Award #: 1931297 |
Abstract
Scientific and management challenges in the water domain are multi-disciplinary, requiring synthesis of data from multiple domains. Many data analysis tasks performed by water scientists are difficult because datasets are large and complex; standard formats for common data types are not always agreed upon nor mapped to an efficient structure for analysis; and water scientists generally lack training in scientific methods needed to efficiently tackle large and complex datasets. This project is advancing Data Science and Analytics for Water (DSAW) by developing: (1) an advanced object data model that maps common water-related data types to high performance Python data structures based on standard file, data, and content types established by the CUAHSI HydroShare system; and (2) new Python packages that enable scientists to automate retrieval of water data, loading it into high performance memory objects, and performing reproducible analyses that can be shared, collaborated around, and formally published for reuse. |
Poster | Slides | |
Dane Morgan | University of Wisconsin - Madison | Collaborative Research: Framework: Machine Learning Materials Innovation Infrastructure Award #: 1931298 |
Abstract
Our project seeks to support rapid development of machine learning applications in Materials Science and Engineering through (i) easy access to data, (ii) cloud-based tools for application of machine learning, and (iii) support for human and machine accessible and sustainable access to disseminated machine learning models. |
Poster | Slides | |
Yinzhi Wang | University of Texas at Austin | Elements: PASSPP: Provenance-Aware Scalable Seismic Data Processing with Portability Award #: 1931352 |
Abstract
Most of our understanding of the Earth’s interior comes from seismology. Over the past decade, the huge success of many large-scale projects, such as the USArray component of EarthScope, gave rise to a massive increase in the data volume available to the seismology community. This growth has revealed the limitations of the data processing infrastructure available to seismologists. As a step toward addressing the issue, we devised a new framework, the Massive Parallel Analysis System for Seismologists (MsPASS), for seismic data processing and management. MsPASS leverages existing big data technologies: (1) a scalable parallel processing framework based on a dataflow computation model (Spark), (2) a NoSQL database system centered on a document store (MongoDB), and (3) a container-based virtualization environment (Docker and Singularity). Preliminary development indicates the basic components can be easily deployed on anything from desktops to large modern high-performance computing systems. |
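The document-store-plus-parallel-map pattern that MsPASS builds on MongoDB and Spark can be sketched with the standard library alone; the field names and the demeaning step below are illustrative assumptions, not the MsPASS schema or API:

```python
from concurrent.futures import ThreadPoolExecutor

# Toy "document store": each waveform is a dict of samples plus metadata,
# mirroring a document model in which data and metadata travel together.
waveforms = [
    {"_id": 1, "station": "ANMO", "samples": [1.0, -2.0, 3.0]},
    {"_id": 2, "station": "HRV",  "samples": [0.5, 0.5, -1.5]},
]

def demean(doc):
    # A processing function maps one document to a new document; this
    # per-document independence is what lets a dataflow engine such as
    # Spark schedule the work in parallel across a cluster.
    mean = sum(doc["samples"]) / len(doc["samples"])
    return {**doc, "samples": [s - mean for s in doc["samples"]]}

with ThreadPoolExecutor() as pool:
    processed = list(pool.map(demean, waveforms))
```

Because `demean` returns a new document rather than mutating its input, the same function works unchanged whether the map runs serially on a laptop or in parallel on an HPC system.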
Poster | Slides | |
David Lange | Princeton University | C++ as a service - rapid software development and dynamic interoperability with Python and beyond Award #: 1931408 |
Abstract
A key enabler of innovation and discovery for many scientific researchers is the ability to explore data and express ideas quickly as software prototypes. Tools and techniques that reduce the "time to insight" are essential to the productivity of researchers. At the same time, massive increases in data volumes and computational needs require a continual focus on maximizing code performance to realize the potential science from novel scientific apparatus. Programming language usability and interoperability are omni-disciplinary issues affecting today's scientific research community. As a result, a common approach across many scientific fields is for scientists to program in Python while steering kernels written in C++. This C++ as a service (CaaS) project brings a novel interpretative technology to science researchers through a state-of-the-art C++ execution environment. CaaS will serve both beginners and experts in C++: it enables higher productivity in development and extends the interactive education and training platform for programming languages. CaaS will support existing technologies as well as truly new development and analysis approaches, and will directly support growing cyber-capabilities that advance scientific research across a broad range of pursuits. |
Poster | Slides | |
Ashok Srinivasan | University of West Florida | Cyberinfrastructure for Pedestrian Dynamics-Based Analysis of Infection Propagation Through Air Travel Award #: 1931511 |
Abstract
Pedestrian dynamics provides mathematical models that can accurately simulate the movement of individuals in a crowd. These models allow scientists to understand how different policies, such as boarding procedures on planes, can prevent, or worsen, the transmission of infections. This project seeks to develop novel software that will provide a variety of pedestrian dynamics models, infection spread models, and data so that scientists can analyze the effect of different mechanisms on the spread of directly transmitted diseases in crowded areas. The initial focus of this project is on air travel. However, the software can be extended to a broader scope of applications in movement analysis and epidemiology, such as in theme parks and sports venues. |
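A minimal stand-in for such an analysis: given trajectories along a 1-D aisle, count the time steps each susceptible person spends within a contact radius of an infected one. The positions, radius, and contact-count proxy are illustrative assumptions, far simpler than the project's actual pedestrian dynamics and infection models:

```python
def exposure_steps(trajectories, infected, radius=1.0):
    # trajectories: {person: [position at t0, t1, ...]} along a 1-D aisle.
    # Counts time steps each susceptible person spends within `radius`
    # of any infected person -- a crude proxy for transmission risk.
    susceptible = [p for p in trajectories if p not in infected]
    steps = len(next(iter(trajectories.values())))
    counts = {p: 0 for p in susceptible}
    for t in range(steps):
        for p in susceptible:
            if any(abs(trajectories[p][t] - trajectories[q][t]) <= radius
                   for q in infected):
                counts[p] += 1
    return counts

# Toy boarding trajectories for three passengers, one of them infected.
queue = {"a": [0.0, 1.0, 2.0], "b": [0.0, 0.0, 0.0], "c": [5.0, 5.0, 5.0]}
```

Comparing the exposure counts produced by two candidate boarding orders is the kind of policy question the full models are built to answer.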
Poster | Slides | |
Shantenu Jha | Rutgers University | RADICAL-Cybertools: Middleware Building Blocks for NSF's Cyberinfrastructure Ecosystem Award #: 1931512 |
Abstract
RADICAL-Cybertools embodies the building block approach to middleware. It builds upon a prior prototype investment, which developed a pilot system for leadership-class HPC machines, and a Python implementation of SAGA, a distributed computing standard. The current effort is organized around three activities: (i) Extending RCT functionality to reliably support a range of novel applications at scale; (ii) Enhancing RCT to be ready to support new NSF systems, such as the Frontera supercomputing system and other new systems; (iii) Prototyping a new component: a campaign manager for computational resource management. |
Poster | Slides | |
Shantenu Jha | Rutgers University | S2I2: Impl: The Molecular Sciences Software Institute Award #: 1547580 |
Abstract
The Molecular Sciences Software Institute serves as a nexus for science, education, and cooperation, serving the worldwide community of computational molecular scientists -- a broad field including biomolecular simulation, quantum chemistry, and materials science. |
Poster | Slides | |
Amneet Pal Singh Bhalla | San Diego State University | Collaborative Research: Frameworks: Multiphase Fluid-Structure Interaction Software Infrastructure to Enable Applications in Medicine, Biology, and Engineering Award #: 1931368 |
Abstract
This project aims to enhance the IBAMR computer modeling and simulation infrastructure that provides advanced implementations of the immersed boundary method and its extensions with support for adaptive mesh refinement. Most current IBAMR models assume that the properties of the fluid are uniform, but many physical systems involve multiphase fluid models with inhomogeneous properties, such as air-water interfaces or the complex fluid environments of biological systems. This project aims to extend recently developed support in IBAMR for treating multiphase flows and enhance the modeling capability to treat multiphase polymeric fluid flows, which are commonly encountered in biological systems, and to treat reacting flows with complex chemistry, which are relevant to models of combustion, astrophysics, and additive manufacturing using stereolithography. This project also aims to re-engineer IBAMR for massive parallelism, so that it may effectively use very large computational resources in service of applications that require very high fidelity. |
Poster | Slides | |
Neil Heffernan | Worcester Polytechnic Institute | Collaborative Research: Frameworks: Cyber Infrastructure for Shared Algorithmic and Experimental Research in Online Learning Award #: 1931523 |
Abstract
Research on Adaptive Intelligent Learning for K-12 and MOOCs (RAILKaM) cyber infrastructure will enable 20 researchers to run large-scale field experiments on basic principles in the educational contexts of K-12 mathematics learning and university Massive Online Open Courses (MOOCs). RAILKaM will integrate ASSISTments, an online learning platform used by more than 100,000 K-12 students, with MOOCs offered by the University of Pennsylvania and used by hundreds of thousands of learners each year, in order to enable broader populations, more robust student interactions, and more bountiful data collection than currently feasible in either environment alone. RAILKaM will also support 75 data scientists by supplying carefully redacted datasets that protect student privacy. In facilitating 1) high-power, replicable experiments with diverse student populations and 2) extensive measurement, RAILKaM will increase the efficiency and ease of conducting quality educational research in online learning environments, bringing research methods and long-term learning outcomes to 21st-century classrooms. |
Poster | Slides | |
Mahmut Kandemir | Penn State University | Frameworks: Re-Engineering Galaxy for Performance, Scalability and Energy Efficiency Award #: 1931531 |
Abstract
Galaxy is an open-source, web-based framework used by more than 20,000 researchers worldwide for conducting research in areas such as genomics, molecular dynamics, chemistry, drug discovery, and natural language processing. It provides a web-based environment in which scientists perform various computational analyses on their data, exchange results from these analyses, explore new research concepts, facilitate student training, and preserve their results for future use. Galaxy currently runs on a large variety of high-performance computing (HPC) platforms, including local clusters, supercomputers in national labs, public data centers, and clouds. Unfortunately, while most of these systems supplement conventional CPUs with significant accelerator capabilities (in the form of Graphics Processing Units (GPUs) and/or Field-Programmable Gate Arrays (FPGAs)), the current Galaxy implementation does not take advantage of these powerful accelerators. This is unfortunate because many Galaxy applications (e.g., sequence analysis, metabolomics, and metagenomics) are inherently parallelizable and can benefit from significant latency and throughput improvements when mapped to GPUs and FPGAs. |
Poster | Slides | |
Dhabaleswar K (DK) Panda | Ohio State University | Collaborative Research: Frameworks: Designing Next-Generation MPI Libraries for Emerging Dense GPU Systems Award #: 1931537 |
Abstract
Modern HPC platforms use multiple CPUs, GPUs, and high-performance interconnects per node. Unfortunately, state-of-the-art production-quality implementations of the popular Message Passing Interface (MPI) programming model do not have the appropriate support to deliver the best performance and scalability for applications (HPC and DL) on such dense GPU systems. The project involves a synergistic and comprehensive research plan, involving computer scientists from OSU and OSC and computational scientists from TACC, SDSC, and UCSD. The proposed innovations include: 1) designing high-performance and scalable communication operations that fully utilize multiple network adapters and advanced in-network computing features for GPUs and CPUs; 2) designing novel datatype processing and unified memory management; 3) designing CUDA-aware I/O; 4) designing support for containerized environments; and 5) carrying out integrated evaluation with a set of driving applications. Initial results from this project using the MVAPICH2 MPI library will be presented. |
Poster | Slides | |
Rafal Angryk | Georgia State University | Elements: Comprehensive Time Series Data Analytics for the Prediction of Solar Flares and Eruptions Award #: 1931555 |
Abstract
We report on progress made by our interdisciplinary Data Mining Lab at Georgia State University on this recently funded (October 1, 2019) project. We present a brief overview of our project and focus on the first two phases of our research: (1) Data & Metadata Acquisition, and (2) Generation of Data Sets for Benchmarking. |
Poster | Slides | |
Andreas Kloeckner | University of Illinois at Urbana-Champaign | Elements: Transformation-Based High-Performance Computing in Dynamic Languages (also: SHF-1911019: SHF: Small: Collaborative Research: Transform-to-perform: languages, algorithms, and solvers for nonlocal operators --- represented by Rob Kirby) Award #: 1931577 |
Abstract
|
Poster | Slides | |
Hendrik Heinz | University of Colorado at Boulder | Collaborative Research: Frameworks: Cyberloop for Accelerated Bionanomaterials Design Award #: 1931587 |
Abstract
This project aims at building a sustainable computational infrastructure for all-atom simulations of compounds and multiphase materials across the periodic table with high accuracy up to the 1000 nm scale. Cyberloop consolidates previously disconnected platforms for soft matter and solid-state simulations (IFF, OpenKIM, and CHARMM-GUI) into a single unified framework. The new integrated infrastructure will enable users to set up complex bionanomaterial configurations, select reliable force fields, generate input scripts for popular simulation platforms, and assess the uncertainty in the results. Innovations include automated charge assignment protocols and file conversions, expansion of the Interface force field (IFF) and surface model databases, extension of the Open Knowledgebase of Interatomic Models (OpenKIM) to bonded force fields and AI-based force field selection tools, and development of new Nanomaterial Builder and Bionano Builder modules in CHARMM-GUI. Cyberloop supports the discovery of the next generation of therapeutics, materials for energy conversion, and ultrastrong composites, and trains an interdisciplinary, diverse, and cyber-savvy workforce. |
Poster | Slides | |
Sameer Shende | University of Oregon | CSSI: Elements: First Workshop on NSF and DOE High Performance Computing Tools Award #: 1939486 |
Abstract
High Performance Computing (HPC) software has become increasingly complex to |
Poster | Slides | |
Sameer Shende | University of Oregon | SI2-SSI: Collaborative Research: A Software Infrastructure for MPI Performance Engineering: Integrating MVAPICH and TAU via the MPI Tools Interface Award #: 1450471 |
Abstract
This project creates an MPI programming infrastructure that can integrate performance analysis capabilities more directly through the MPI Tools Information Interface (MPI_T), monitor performance metrics during run time, and deliver greater optimization opportunities for scientific applications. It integrates MVAPICH2 and the TAU Performance System using the MPI_T interface. MVAPICH2 exports performance variables (PVARs) and exposes key control variables (CVARs) to TAU through MPI_T. MVAPICH2 has multiple optimized designs for collective operations; choosing the algorithm that delivers the best performance for a given application is complicated and depends on several factors, such as message size, job size, and the availability of advanced hardware features. TAU provides a plugin framework in which plugins interact with MVAPICH2, reading PVARs and setting CVARs to effect runtime adaptation based on performance data. |
Poster | Slides | |
Naveen Sharma | Rochester Institute of Technology | Citizenly: Empowering Communities by Democratizing Urban Data Science Award #: 1943002 |
Abstract
RIT and the City of Rochester are collaborating to address challenges common to midsize cities. Democratizing data science is the notion that anyone, with little to no technical expertise, can do data science if provided the right data and user-friendly tools. The Citizenly project aims to realize this broad vision in an urban context. The project will extract data from NY open data sets, readily available city data (e.g., building, crime, and transportation), and citizen-generated data to develop a hyper-relevant data set for the community. Citizens and city leaders, without requiring technical expertise, will be able to create, share, and take advantage of urban data and applications for their respective communities. As initial work for this project, two urban use cases will be developed using the Citizenly approach: optimal allocation of urban services resources and health impact assessment of socio-economic factors. |
Poster | Slides | |
Rajiv Ramnath | Ohio State University | EAGER: Bridging the last mile; Towards an assistive cyberinfrastructure for accelerating computationally driven science Award #: 1945347 |
Abstract
At the onset of any research project, most of the time is spent on data acquisition (published and verified data sets suitable for the study), pre-processing (noise reduction, visualizing and manipulating the data to fit the research), and tool exploration (state-of-the-art techniques). Some preliminary analysis, along with small-scale computations, is needed to compare the tools' results while adjusting relevant software and model parameters. During this entire process, one has to consult various resources (how-to guides, research papers and journals, textbooks, the internet) and/or seek ad-hoc advice from colleagues, collaborators, and advisors. Most of these suggestions and recommendations go undocumented unless they are implemented and thus recorded. Moreover, different researchers need guidance at different stages of their research and in various forms. This project proposes the use of artificial intelligence to build a cyberinfrastructure tool that assists researchers by drawing on past experiences and other resources to cater to each researcher's needs. |
Poster | Slides | |
Natalia Villanueva Rosales | University of Texas at El Paso | ELEMENTS: DATA: HDR: SWIM to a Sustainable Water Future Award #: 1835897 |
Abstract
Water sustainability is a key challenge worldwide and one of the United Nations’ seventeen Sustainable Development Goals, with 40% of the global population experiencing water scarcity. The American Southwest will be impacted by more intense droughts expected in the coming decades. The Sustainable Water through Integrated Modeling (SWIM) framework will advance water sustainability research capabilities by automating the integration and execution of decoupled water models, facilitating the interpretation of such models, and enabling participatory reasoning processes. Convergent research in SWIM is achieved through three synergistic subprojects: 1) SWIM-SEM, which focuses on formally described semantics to enhance the automated execution and understanding of data and models generated by SWIM; 2) SWIM-PM, which addresses the challenges of enabling participatory analysis of the socio-economic-environmental water system through research on data- and model-based reasoning with biophysical and social models; and 3) SWIM-IT, which focuses on cyberinfrastructure for engaging stakeholders, advancing research, and ensuring usability, reproducibility, and sustainability of products. |
Poster | Slides | |
Kenton McHenry | University of Illinois Urbana-Champaign | Collaborative Research: CSSI: Framework: Data: Clowder Open Source Customizable Research Data Management, Plus-Plus Award #: 1835834 |
Abstract
Clowder Open Source Customizable Research Data Management, Plus-Plus |
Poster | Slides | |
Ken Koedinger | Carnegie Mellon University | CIF21 DIBBs: Building a Scalable Infrastructure for Data-Driven Discovery and Innovation in Education Award #: 1443068 |
Abstract
We aim to transform scientific discovery and innovation in education through a scalable data infrastructure that bridges across the many disciplines now contributing to learning science, discipline-based education research, and educational technology innovation (e.g., intelligent tutoring, dialogue systems, MOOCs). The data infrastructure building blocks (DIBBs) we are developing and integrating are available online at LearnSphere.org. LearnSphere spans existing educational data silos through sharing learning analytic components, or DIBBs, that scientists can use with or without programming. The key is a web-based workflow authoring tool called Tigris. To develop the user community we have held ten workshops reaching hundreds of participants. LearnSphere has over 7,000 unique user logins, and 78 DIBBs components have been created in Tigris. With this critical mass of re-composable DIBBs, over 1,000 workflows have been created and are being shared and used for learning R&D in academia and industry. |
Poster | Slides | |
Amit Chourasia | University of California, San Diego | CIF21 DIBBs: Ubiquitous Access to Transient Data and Preliminary Results via the SeedMe Platform Award #: 1443083 |
Abstract
SeedMeLab is a powerful data management and data sharing software suite. It enables collaboration teams to manage, share, search, visualize, and present their data using an access-controlled, branded, and customizable website that they own and control. It supports storing and viewing data in a familiar tree hierarchy but also supports formatted annotations, lightweight visualizations, and threaded comments on any file/folder. The system can be easily extended and customized to support metadata, job parameters, and other domain- and project-specific contextual items. The software is open source and available as an extension to the popular Drupal content management system. |
Poster | Slides | |
Zachary Ives | University of Pennsylvania | mProv: Provenance-Based Data Analytics Cyberinfrastructure for High-frequency Mobile Sensor Data Award #: 1640813 |
Abstract
The mProv project develops (1) extensions to the PROV standard for capturing metadata vital to reproducibility for streaming data; (2) instrumentation to open-source components commonly used in streaming, mobile, and big data settings; (3) capabilities for reasoning about data history and quality based on provenance. |
Poster | Slides | |
Bill Tolone | University of North Carolina at Charlotte | Virtual Information-Fabric Infrastructure (VIFI) for Data-Driven Decisions from Distributed Data Award #: 1640818 |
Abstract
VIFI presents a novel infrastructure that empowers data users to discover, analyze, transform, and evaluate distributed, fragmented data without direct access to or movement of large amounts of data, enabling analyses that are otherwise impossible, infeasible, or impractical. |
Poster | Slides | |
Shawn McKee | University of Michigan | CC*DNI DIBBs: Multi-Institutional Open Storage Research InfraStructure (MI-OSiRIS) Award #: 1541335 |
Abstract
We will report on the status of the OSiRIS project (NSF Award #1541335; UM, IU, MSU, and WSU) during its fifth and final year. OSiRIS is delivering a distributed Ceph storage infrastructure coupled with software-defined networking to support multiple science domains across Michigan’s three largest research universities. The project’s goal is to provide a single scalable, distributed storage infrastructure that allows researchers at each campus to work collaboratively with other researchers across campus or across institutions. The NSF CC*DNI DIBBs program, which funded OSiRIS, is seeking solutions to the challenges of multi-institutional collaborations involving large amounts of data, and we are exploring the creative use of Ceph and networking to address those challenges. |
Poster | Slides | |
Margo Seltzer | University of British Columbia | SI2-SSI: Collaborative Research: Bringing End-to-End Provenance to Scientists Award #: 1450277 |
Abstract
The End-to-End Provenance project has produced a collection of tools that use data provenance. Our tools make life easier for data scientists programming in R and Python, help make science more reproducible, and improve system security. (1-slide presentation) |
Poster | Slides | |
Erkan Istanbulluoglu | University of Washington | Collaborative Research: SI2-SSI: Landlab: A Flexible, Open-Source Modeling Framework for Earth-Surface Dynamics Award #: 1450412 |
Abstract
This project catalyzes research in earth-surface dynamics by developing a software framework that enables rapid creation, refinement, and reuse of two-dimensional (2D) numerical models. The phrase earth-surface dynamics refers to a remarkably diverse group of science and engineering fields that deal with our planet's surface and near-surface environment: its processes, its management, and its responses to natural and human-made perturbations. Scientists who want to use an earth-surface model often build their own unique model from the ground up, re-coding the basic building blocks of their model rather than taking advantage of codes that have already been written. Whereas the end result may be novel software programs, many person-hours are lost rewriting existing code, and the resulting software is often idiosyncratic, poorly documented, and unable to interact with other software programs in the same scientific community and beyond, leading to lost opportunities for exploring an even wider array of scientific questions than those that can be addressed using a single model. The Landlab model framework seeks to eliminate these redundancies and lost opportunities, and simultaneously lower the bar for entry into numerical modeling, by creating a user- and developer-friendly software library that provides scientists with the fundamental building blocks needed for modeling earth-surface dynamics. The framework takes advantage of the fact that nearly all surface-dynamics models share a set of common software elements, despite the wide range of processes and scales that they encompass. Providing these elements in the context of a popular scientific programming environment, with strong user support and community engagement, contributes to accelerating progress in the diverse sciences of the earth's surface. |
Poster | Slides | |
Edward Valeev | Virginia Tech | Collaborative Research: SI2-SSI: Software Framework for Electronic Structure of Molecules and Solids Award #: 1550456 |
Abstract
The project focuses on the development of fast and accurate methods for simulation of molecules and solids, bringing under one umbrella three electronic structure codes (PySCF, BAGEL, and MPQC). Unique capabilities developed with the project's support include robust coupled-cluster methods for solids, high-end parallel coupled-cluster capabilities for molecules, and fast techniques for density functional theory in molecules and solids. |
Poster | Slides | |
Catherine Zucker | Harvard University | SI2-SSE: Collaborative Research: A Sustainable Future for the Glue Multi-Dimensional Linked Data Visualization Package Award #: 1739657 |
Abstract
glue is an open-source Python-based software package that enables scientists to explore relationships within and across related datasets. Without merging any data, glue makes it easy to create multi-dimensional linked visualizations of datasets, to select subsets of data interactively or programmatically in 1, 2, or 3 dimensions, and to see those selections propagate live across all open visualizations (e.g., histograms, 2-D and 3-D scatter plots, images, 3-D volume renderings). While originally designed as a desktop application, we have built a new prototype interface for glue in JupyterLab, a browser-based environment that supports multi-panel interactive visualizations alongside narrative text, code, and figures. We demonstrate the functionality of glue in the JupyterLab environment, and show how it can be a powerful multi-disciplinary tool for making discoveries across many scientific disciplines. |
Poster | Slides | |
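The linked-subset idea described above can be sketched without glue itself. The datasets, the shared `id` key, and the flux threshold below are invented for illustration; this sketch does not use glue's actual API:

```python
# Two datasets share an "id" key but are never merged. A subset
# defined on the catalog (flux > 1.0) propagates to the spectra
# dataset through the shared key, mimicking a linked selection.
catalog = {"id": [1, 2, 3, 4], "flux": [0.2, 1.5, 0.9, 2.1]}
spectra = {"id": [2, 3, 4, 1], "snr": [10, 3, 8, 5]}

selected_ids = {catalog["id"][i]
                for i, f in enumerate(catalog["flux"]) if f > 1.0}
linked_mask = [sid in selected_ids for sid in spectra["id"]]
# linked_mask marks which spectra rows fall inside the catalog-defined subset.
```

In glue proper, the link between datasets and the live propagation of selections are managed by the application; the point here is only that a selection plus a key link suffices to define corresponding subsets in other datasets without merging.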
Anton Van der Ven | University of California Santa Barbara | SI2-SSE: Automated statistical mechanics for the first-principles prediction of finite temperature properties in hybrid organic-inorganic crystals Award #: 1642433 |
Abstract
The CASM software package (a Clusters Approach to Statistical Mechanics) automates first-principles statistical mechanics simulations of crystalline solids. CASM is designed to algorithmically formulate effective Hamiltonians for a wide variety of chemical, electronic and vibrational degrees of freedom within arbitrarily complex crystal structures. The CASM software then automates the tasks to parameterize the generalized effective Hamiltonians to first-principles training data. Subsequently it generates highly optimized (kinetic) Monte Carlo codes tailored for each effective Hamiltonian to enable finite temperature statistical mechanics simulations. These tools are ideally suited to study the coupling between a wide range of chemical and electronic excitations within crystalline solids at finite temperature and have found applications in multi-component alloys, battery materials, magnetic materials that are alloyed and strained and quantum materials. |
Poster | Slides | |
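As a toy analogue of the finite-temperature sampling that CASM automates, the sketch below runs Metropolis Monte Carlo on a 1-D Ising-type effective Hamiltonian. The Hamiltonian, parameters, and code are illustrative assumptions and are unrelated to CASM's actual generated solvers:

```python
import math
import random

def metropolis(spins, J, T, steps, rng):
    """Metropolis Monte Carlo on a 1-D Ising-type effective
    Hamiltonian E = -J * sum_i s_i * s_(i+1) with periodic
    boundaries: propose single-spin flips and accept with the
    Boltzmann probability exp(-dE/T)."""
    n = len(spins)
    for _ in range(steps):
        i = rng.randrange(n)
        # Energy change from flipping spin i against its two neighbours.
        dE = 2 * J * spins[i] * (spins[(i - 1) % n] + spins[(i + 1) % n])
        if dE <= 0 or rng.random() < math.exp(-dE / T):
            spins[i] = -spins[i]
    return spins

rng = random.Random(42)
spins = [rng.choice([-1, 1]) for _ in range(32)]
final = metropolis(spins, J=1.0, T=0.5, steps=5000, rng=rng)
magnetization = abs(sum(final)) / len(final)
```

CASM's value lies in generating such samplers automatically for far richer effective Hamiltonians (chemical, electronic, and vibrational degrees of freedom) parameterized against first-principles data.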
David Mencin | UNAVCO Inc. and University of Colorado | Collaborative Research: Framework: Data: NSCI: HDR: GeoSCIFramework: Scalable Real-Time Streaming Analytics and Machine Learning for Geoscience and Hazards Research Award #: 1835791 |
Abstract
An update on the first year of GeoSCIFramework. |
Poster | Slides | |
Andreas Kloeckner | University of Illinois at Urbana-Champaign | SHF:Small:Collaborative Research: Transform-to-perform: languages, algorithms, and solvers for nonlocal operators Award #:1911019 |
Abstract
|
Poster | Slides |
# | Name | Organization | NSF Award | Abstract | Poster | Talk |
---|---|---|---|---|---|---|
Petr Sulc | Arizona State University | Elements: Models and tools for online design and simulations of DNA and RNA nanotechnology Award #: 1931487 |
Abstract
We develop a set of tools for interactive visualization and analysis of large DNA and RNA nanostructures simulated by a coarse-grained model of DNA and RNA. The tools allow users to interactively edit and visualize simulations consisting of up to 2 million nucleotides, the largest system that has ever been simulated at the nucleotide level. The tools can further quantify the properties of the simulated structures, allowing experimental groups to probe large numbers of different designs in silico. |
Poster | Slides | |
Rachana Ananthakrishnan | University of Chicago | Automate: A Distributed Research Automation Platform Award #: 1835890 |
Abstract
Exponential increases in data volumes and velocities are overwhelming finite human capabilities. Continued progress in science and engineering demands that we automate a broad spectrum of currently manual research data manipulation tasks, from transfer and sharing to acquisition, publication, indexing, analysis, and inference. To address this need, which arises across essentially all scientific disciplines, this project is working with scientists in astronomy, engineering, geosciences, materials science, and neurosciences to develop and apply Globus Automate, a distributed research automation platform. Its purpose is to increase productivity and research quality across many science disciplines by allowing scientists to offload the management of a broad range of data acquisition, manipulation, and analysis tasks to a cloud-hosted distributed research automation platform. |
Poster | Slides | |
Lucy Fortson | University of Minnesota | Collaborative Research: Framework: Software: HDR: Building the Twenty-First Century Citizen Science Framework to Enable Scientific Discovery Across Disciplines Award #: 1835530 |
Abstract
For citizen science as a research framework to fulfill its promise in supporting hundreds of researchers across many disciplines in harnessing the data revolution and in enabling new science not previously possible, this grant develops new Citizen Science Cyberinfrastructure (CSCI) for (1) Combining Modes of Citizen Science - linking field-based citizen science with online analysis citizen science; (2) Smart Task Assignment – combining human and machine intelligences; and (3) exploring new data models presenting Data as Subject to volunteers. Building on the demonstrated success of substantial CI investments in citizen science, namely Zooniverse.org and CitSci.org, while leveraging Science Gateways Community Institute (SGCI) CI resources, the development of the new CSCI Framework is being driven by three science use cases in biomedicine, ecology, and astronomy; specifically, 3-D reconstructions of bioimaged cell organelles, species monitoring through identifying individual animals via non-invasive imaging, and characterizing astronomical light curves in anticipation of large upcoming surveys. |
Poster | Slides | |
Carol Hall | North Carolina State University | Element: Software: Enabling Millisecond-Scale Biomolecular Dynamics Award #: 1835838 |
Abstract
The goal of this research is to develop an open software framework to enable multi-millisecond dynamic simulations of peptides and peptidomimetic polymers. This will be achieved by implementing a parallel discontinuous molecular dynamics (DMD) package, developing a suite of DMD interaction potentials, and providing tools for translating continuous atomistic models into DMD models. Although there are many coarse-grained potentials and codes available to simulate large biomolecular systems, the longest time scales that can typically be accessed are on the order of tens of microseconds, and most are unable to predict the formation of structures such as fibrils in a reasonable time frame. Our tools will allow the scientific/engineering community to study long-time-scale phenomena such as biopolymer folding, aggregation, and fibril formation. The code will be tested by volunteer users and validated both by comparison with literature results and by an experimental case study on peptoid-based inhibition of antibody aggregation. |
Poster | Slides | |
Carol Hall | North Carolina State University | Element: Computational Toolkit to Discover Peptides that Self-assemble into User-selected Structures Award #: 1931430 |
Abstract
Many peptides are known to adopt beta-strand conformations and assemble spontaneously into a variety of nanostructures with applications in many fields. The goal of this project is to develop an open software toolkit that enables the identification of peptide sequences capable of assembling into user-selected beta-sheet-based structures. Users will be able to screen potentially thousands of peptide sequences that assemble spontaneously into the structure of their choosing, and rank-order their stability. Discontinuous molecular dynamics (DMD) simulation software along with the PRIME20 force field will also be made available to enable analysis of the designed structures’ assembly kinetics. To establish efficacy and a basis for future improvement of computational tools, selected designs will be validated experimentally using biophysical characterization techniques and solid-state nuclear magnetic resonance (ssNMR) spectroscopy. Our software tool, “Peptide Assembly Designer” (PepAD), will be a “plugin” on the NSF-sponsored Molecular Simulation and Design Framework (MoSDeF). |
Poster | Slides | |
Gerard Lemson | Johns Hopkins University | Long Term Access to Large Scientific Data Sets: The SkyServer and Beyond Award #: 1261715 |
Abstract
SciServer is a science platform built and supported by the Institute for Data Intensive Engineering and Science at the Johns Hopkins University. SciServer extends the SkyServer system of server-side tools that introduced the astronomical community to SQL and has been serving the Sloan Digital Sky Survey catalog data to the public. SciServer uses a Docker based architecture to provide interactive and batch mode server-side analysis with scripting languages like Python and R in various environments including Jupyter (notebooks), RStudio and command-line. Users have access to private file storage as well as personal SQL database space. A flexible resource access control system allows users to share their resources with collaborators, a feature that has also been very useful in classroom environments. All these services, wrapped in a layer of REST APIs, constitute a scalable collaborative data-driven science platform that is attractive to science disciplines beyond astronomy. |
Poster | Slides | |
Geoffrey Charles Fox | Indiana University Bloomington | Middleware and High-Performance Analytics Libraries for Scalable Data Science Award #: 1443054 |
Abstract
NSF 1443054, “Middleware and High-Performance Analytics Libraries for Scalable Data Science,” is a collaboration among seven universities: Arizona State, Indiana (lead), Kansas, Rutgers, Stony Brook, Virginia, and Utah. It addresses the intersection of HPC and Big Data computing, with several different application areas or communities driving the requirements for software systems and algorithms. The base architecture includes HPC-ABDS, the High-Performance Computing Enhanced Apache Big Data Stack, and application use cases identifying key features that determine software and algorithm requirements. The middleware includes the Harp-DAAL collective communication layer, the Twister2 Big Data toolkit, and RADICAL pilot jobs for batch and streaming applications. The SPIDAL Scalable Parallel Interoperable Data Analytics Library includes core machine learning and image processing, along with libraries for the application communities: network science, polar science, biomolecular simulations, pathology, and spatial systems. Recent work focuses on the integration of ML with HPC (HPCafterML) in biomolecular simulations and on a broad study of HPCforML. |
Poster | Slides | |
Anand Padmanabhan | University of Illinois at Urbana Champaign | CIF21 DIBBs: Scalable Capabilities for Spatial Data Synthesis Award #: 1443080 |
Abstract
Spatial data, often embedded with geographic references, are important to numerous scientific domains (e.g., ecology, geography and spatial sciences, geosciences, and social sciences, to name just a few), and are also beneficial to solving many critical societal problems (e.g., environmental and urban sustainability). In recent years, this type of data has exploded to massive size and significant complexity as increasingly sophisticated location-based sensors and devices (e.g., social networks, smartphones, and environmental sensors) are widely deployed and used. However, the tools and computational platforms available for processing and synthesizing such data remain limited. Over the past couple of years, this project has helped establish CyberGIS-Jupyter as a platform for making geospatial data processing and analytics capabilities accessible. CyberGIS-Jupyter is an online geospatial computation platform for a large number of users to conduct and share scalable cyberGIS analytics via Jupyter Notebooks supported by advanced cyberinfrastructure resources such as those provisioned by the Extreme Science and Engineering Discovery Environment (XSEDE). This poster presents the CyberGIS-Jupyter platform in terms of both the technical progress made and the enabling role it plays in enhancing cyberGIS research and education. |
Poster | Slides | |
Rich Wolski | University of California, Santa Barbara | CC*DNI DIBBs: Data Analysis and Management Building Blocks for Multi-Campus Cyberinfrastructure through Cloud Federation Award #: 1541215 |
Abstract
The poster will outline the project's development and deployment of a cloud federation spanning several university campuses. Science users are able to draw cloud resources from multiple campus clouds using a single-sign-on credentialing capability. The poster will outline the structure and maintenance of the federation, discuss the technological approach, and highlight the science achievements that it has enabled. |
Poster | Slides | |
Thomas A DeFanti | UC San Diego | CC*DNI DIBBs: The Pacific Research Platform Award #: 1541349 |
Abstract
The goal of the Pacific Research Platform (PRP) Cooperative Agreement is to expand the campus Science DMZ network systems model developed by the Department of Energy's ESnet into a regional DMZ model supporting data-intensive science. The PRP is enabling researchers to quickly and easily move data between collaborator labs, supercomputer centers, instruments, and data repositories, creating a big-data freeway that allows the data to traverse multiple, heterogeneous networks with minimal performance degradation. The PRP’s data-sharing architecture, with end-to-end 10-100 Gb/s connections, is enabling examples of regionwide, nationwide, and worldwide virtual co-location of data with computing. |
Poster | Slides | |
Bertram Ludaescher | University of Illinois, Urbana-Champaign | CC*DNI DIBBS: Merging Science and Cyberinfrastructure Pathways: The Whole Tale Award #: 1541450 |
Abstract
Poster Title: Developing, Packaging and Sharing Reproducible Research Objects: The Whole Tale Approach. |
Poster | Slides | |
George Alter | University of Michigan | Continuous Capture of Metadata for Statistical Data Award #: 1640575 |
Abstract
The C2Metadata (“Continuous Capture of Metadata”) Project automates the documentation of data transformations performed by statistical software. Researchers in many fields use statistical software (SPSS, Stata, SAS, R, Python) for data transformation and data management as well as analysis. C2Metadata tools translate scripts used by statistical software into an independent Structured Data Transformation Language (SDTL), which serves as an intermediate language for describing data transformations. SDTL is incorporated into standard metadata formats (Data Documentation Initiative (DDI), Ecological Markup Language (EML), and JSON-LD), which are used for data discovery, codebooks, and auditing data management scripts. C2Metadata differs from most previous approaches to provenance by focusing on documenting transformations at the variable level. |
Poster | Slides | |
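The idea of an intermediate, software-independent description of a data transformation can be sketched as follows. The field names and structure below are hypothetical simplifications in the spirit of SDTL, not the actual SDTL schema, and a real C2Metadata tool would derive such a record automatically from a Stata/SPSS/SAS/R/Python script:

```python
import json

# Hypothetical, simplified transformation record: a recode of "age"
# into "age_group", described independently of any one statistical
# package. All field names here are illustrative inventions.
transform = {
    "sourceLanguage": "Stata",
    "command": "Recode",
    "sourceVariable": "age",
    "targetVariable": "age_group",
    "rules": [
        {"range": [0, 17], "value": "minor"},
        {"range": [18, 120], "value": "adult"},
    ],
}

# Serialized, such a record could be embedded in DDI or EML
# codebook metadata alongside the variable it documents.
sdtl_json = json.dumps(transform, indent=2)
```

Because the record is software-independent, the same description could document an equivalent recode written in any of the supported statistical languages, which is what makes variable-level provenance portable across codebook formats.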
Bonnie Hurwitz | University of Arizona | Ocean Cloud Commons Award #: 1640775 |
Abstract
Next-generation sequencing has led to the generation of massive genomic datasets for exploring the roles and functions of microorganisms in ecosystems. Comparative metagenomics aims to explore these datasets by comparing them to one another and measuring their similarity globally. We developed an algorithm called Libra that uses Hadoop to perform all-vs-all sequence analysis on hundreds of metagenomes to identify microbial and viral signatures linked to key biological processes. Libra performs with unparalleled accuracy compared to existing tools on both simulated and real datasets using billions of reads. Libra’s state-of-the-art algorithm and its implementation on Hadoop allow it to achieve remarkable compute times and accuracy without requiring a reduction in dataset size or simplified distance metrics. Our tool is integrated into iMicrobe (http://imicrobe.us), where users can run Libra with their CyVerse account on their own datasets or those that are integrated into the OCC. |
Poster | Slides | |
Juliana Freire | New York University | CIF21 DIBBs: EI: Vizier, Streamlined Data Curation Award #: 1640864 |
Abstract
Vizier (https://vizierdb.info) is an open-source tool for data debugging and exploration that combines the flexibility of notebooks with the easy-to-use data manipulation interface of spreadsheets. Combined with advanced provenance tracking for both data and computational steps, this enables reproducibility, versioning, and streamlined data exploration. |
Poster | Slides | |
Krishna Rajan | University at Buffalo | DIBBS: EI: Data Laboratory for Materials Engineering Award #: 1640867 |
Abstract
The primary outcomes of this project are: (i) creation of AI tools to make valuable experimental data, hitherto inaccessible for analysis, available to materials scientists - these tools are domain agnostic and facilitate extraction of data from information-rich sources such as scientific charts and diagrams in academic papers; examples include automatically extracting eutectic points from phase diagrams, which was used to identify potential metallic-glass-forming compounds; and (ii) creation of a machine learning framework for materials scientists to accelerate the discovery of advanced materials. This framework contains synergistic building blocks that enable scientists to gather, model, and visualize data. An easy-to-use graphical interface has also been developed to apply different state-of-the-art machine learning models to existing materials data. The interface supports performance comparison of different models, as well as of descriptors in terms of predicted properties, along with visualization of data and results to better understand physical phenomena. |
Poster | Slides | |
Shyam Dwaraknath | Lawrence Berkeley National Laboratory | The Local Spectroscopy Data Infrastructure (LSDI) Award #: 1640899 |
Abstract
The Local Spectroscopy Data Infrastructure (LSDI) Project is developing a completely integrated platform for first-principles calculations of the so-called “local” environment at atomic positions within a material, which can be revealed through NMR spectroscopy and local-probe X-ray absorption spectra. The infrastructure broadly addresses the needs of the growing community of chemists and materials scientists who rely on local-environment probe spectroscopy methods, which are relevant for defective, non-stoichiometric, and nano-crystalline materials and interfaces. These classes of materials are increasingly important across a range of applications, and no standardized spectral measurements are available or catalogued to accelerate characterization, understanding, and design. Our project has created robust, benchmarked workflows for calculating NMR, XAS, EELS, and other spectra, and has developed tools to use these massive data sets to better understand the local environments that these techniques probe. |
Poster | Slides | |
Ann Christine Catlin | Purdue University | Creating a Digital Environment for Enabling Data-Driven Science (DEEDS) Award #: 1724728 |
Abstract
The digital environment for enabling data-driven science (DEEDS) is a cyberinfrastructure for big data and high-performance computing that offers systematic, reliable, and secure support for scientific investigations end-to-end. DEEDS datasets provide a shared research environment for data acquisition, preservation, exploration and analysis, together with the integration and HPC execution of data science research tools, interactive analytics, and the capture of computing workflows for data provenance, results traceability, and reproducibility. User-friendly interfaces on the dataset dashboard are used to create and connect file repositories, multi-dimensional data tables, computational software, scientific workflows, outcomes, and analytics – offering interactive search, exploration, and visualization. Datasets are FAIR-compliant and can be published for discovery, exploration, reuse, and reinterpretation. DEEDS is effective across science domains and is being used to support collaborative research projects in electrical engineering, biological engineering, civil engineering, computational chemistry, agriculture, and health & human sciences. |
Poster | Slides | |
Haiying Shen | University of Virginia | CIF21 DIBBs: PD: Building High-Availability Data Capabilities in Data-Centric Cyberinfrastructure Award #: 1724845 |
Abstract
Both high performance computing (HPC) clusters and Hadoop clusters use file systems. A Hadoop cluster uses the Hadoop Distributed File System (HDFS) that resides on compute nodes, while an HPC cluster usually uses a remote storage system. Despite years of efforts on research and application development on HPC and Hadoop clusters, the file systems in both types of clusters still face a formidable challenge, that of achieving exascale computing capabilities. The centralized data indexing in HDFS and HPC storage architectures cannot provide high scalability and reliability, and both HDFS and HPC storage architectures have shortcomings such as single point of failure and insufficiently efficient data access. This project builds scalable high-availability data capabilities in data-centric cyberinfrastructure to overcome the shortcomings and create a highly scalable file system with new techniques for distributed load balancing, data replication and consistency maintenance. |
Poster | Slides | |
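One standard way to avoid a centralized index and its single point of failure is consistent hashing with replication: each node can compute a file's placement locally, with no index server to query or lose. The sketch below illustrates the general technique under assumed node names and paths; it is not the project's implementation:

```python
import hashlib
from bisect import bisect_right

class ConsistentHashRing:
    """Decentralized data placement sketch: a file's location is
    computed from its name, so no central index server is required.
    Virtual nodes (vnodes) smooth the load across servers."""

    def __init__(self, nodes, vnodes=64):
        # Each physical node appears `vnodes` times on the hash ring.
        self.ring = sorted(
            (self._hash(f"{n}#{i}"), n) for n in nodes for i in range(vnodes)
        )
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def locate(self, key, replicas=2):
        """Return `replicas` distinct nodes responsible for `key`,
        walking clockwise from the key's position on the ring."""
        i = bisect_right(self.keys, self._hash(key)) % len(self.ring)
        found = []
        while len(found) < replicas:
            node = self.ring[i % len(self.ring)][1]
            if node not in found:
                found.append(node)
            i += 1
        return found

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
placement = ring.locate("/data/genome/chunk-0017")
```

Adding or removing a node remaps only the keys adjacent to its ring positions, which is why this family of schemes scales where a centralized index does not.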
Kimberly Claffy | University of California San Diego, San Diego Supercomputer Center | CIF21 DIBBs: EI: Integrated Platform for Applied Network Data Analysis (PANDA) Award #: 1724853 |
Abstract
We are developing a new Platform for Applied Network Data Analysis (PANDA) that will offer researchers more accessible, calibrated, user-friendly tools for collecting, analyzing, querying, and interpreting measurements of the Internet ecosystem. |
Poster | Slides | |
Tevfik Kosar | University at Buffalo | CIF21 DIBBs: PD: OneDataShare: A Universal Data Sharing Building Block for Data-Intensive Applications Award #: 1724898 |
Abstract
As data has become more abundant, and data resources become more heterogeneous, accessing, sharing, and disseminating these data sets become a bigger challenge. Using simple tools to remotely logon to computers and manually transfer data sets between sites is no longer feasible. Managed file transfer (MFT) services have allowed users to do more, but these services still rely on the users providing specific details to control this process, and they suffer from shortcomings, including low transfer throughput, inflexibility, limited protocol support, and poor scalability. OneDataShare is a universal data sharing building block for data-intensive applications, with three primary goals: (1) optimization of end-to-end data transfers and reduction of the time to delivery of the data; (2) interoperation across heterogeneous data resources and on-the-fly inter-protocol translation; and (3) prediction of the data delivery time to decrease the uncertainty in real-time decision-making processes. These capabilities are being developed as a cloud-hosted service. |
Poster | Slides | |
Tevfik Kosar | University at Buffalo | EAGER: GreenDataFlow: Minimizing the Energy Footprint of Global Data Movement Award #: 1842054 |
Abstract
The annual electricity consumed by global data movement is estimated to be more than 200 terawatt-hours at the current rate, costing more than 40 billion U.S. dollars per year. The GreenDataFlow project aims to reduce the energy footprint of global data movement by (1) analyzing the energy vs. performance tradeoffs of end-system and protocol parameters during active data transfers; (2) investigating accurate prediction of network device power consumption due to increased data transfer rates on active links, and dynamic readjustment of the transfer rate to balance the energy-over-performance ratio; and (3) exploring service level agreement (SLA) based energy-efficient transfer algorithms, which help service providers minimize energy consumption during data transfers without compromising the performance level promised to the customer. |
Poster | Slides | |
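The SLA-constrained energy/performance tradeoff in goal (3) can be illustrated with a toy rate-selection policy: among the transfer rates that meet the deadline, pick the one with the lowest power draw. The rates, power numbers, and function below are hypothetical, not GreenDataFlow's algorithm:

```python
def pick_rate(file_size_gb, deadline_s, rate_options):
    """Choose the lowest-power transfer rate that still meets the
    SLA deadline. `rate_options` maps a rate in Gb/s to an assumed
    relative power draw; all numbers here are made up."""
    feasible = [
        (power, rate)
        for rate, power in rate_options.items()
        if file_size_gb * 8 / rate <= deadline_s  # transfer time in s
    ]
    if not feasible:
        return max(rate_options)  # no rate meets the SLA; go fastest
    return min(feasible)[1]      # lowest power among feasible rates

# Hypothetical rate -> relative power profile of an end system.
options = {1: 10.0, 5: 30.0, 10: 80.0}
rate = pick_rate(file_size_gb=50, deadline_s=120, rate_options=options)
```

For a 50 GB file and a 120 s deadline, 5 Gb/s suffices at far less assumed power than 10 Gb/s, which is the essence of trading unneeded speed for energy savings within an SLA.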
Saul Youssef | Boston University | CIF21 DIBBs: EI: North East Storage Exchange Award #: 1753840 |
Abstract
The Northeast Storage Exchange (NESE) is an NSF/DIBBs project to create shared storage facilities for research, engineering, and education projects in the Northeast. NESE is a collaboration among Boston University, Harvard University, MIT, MGHPCC, Northeastern University, and UMass. With our main deployment at the MGHPCC data center, we are uniquely situated to provide cost-effective, high-performance storage with economics that allow long-term growth. |
Poster | Slides | |
Jose Fortes | University of Florida | SI2-SSE: Human- and Machine-Intelligent Software Elements for Cost-Effective Scientific Data Digitization Award #: 1535086 |
Abstract
Biodiversity information extraction (IE) from imaged text in digitized museum specimen records is a challenging task due to both the large number of labels and the complexity of the characters and information to be extracted. Among other contributions, the project proposed the use of self-aware workflows to orchestrate machine and human tasks (the SELFIE model), Optical Character Recognition (OCR) ensembles and Natural Language Processing (NLP) methods to increase confidence in extracted text, named-entity recognition (NER) techniques for Darwin Core (DC) terms extraction, and a simulator for the study of these workflows with real-world data. The software has been tested and applied on large datasets from museums in the USA and Australia. |
Poster | Slides | |
Upulee Kanewala | Montana State University | CRII: SHF: Toward Sustainable Software for Science - Implementing and Assessing Systematic Testing Approaches for Scientific Software Award #: 1656877 |
Abstract
Custom scientific software is widely used in science and engineering. Often such |
Poster | Slides | |
Philip A. Wilsey | University of Cincinnati | III: Small: Partitioning Big Data for the High Performance Computation of Persistent Homology Award #: 1909096 |
Abstract
Persistent Homology (PH) is computationally expensive and cannot be directly applied to more than a few thousand data points. This project aims to develop mechanisms to allow the computation of PH on large, high-dimensional data sets. The proposed method will significantly reduce the run-time and memory requirements for the computation of PH without significantly compromising accuracy of the results. This project explores techniques to map a large point cloud P to another point cloud P' with fewer total points such that the topology space characterized by P and P' is nearly equivalent. The mapping from P to P' will potentially hide some of the smaller topological features during the PH computation on P'. Restoration of accurate PH results is achieved by (i) upscaling data for the identified large topological features, and (ii) partitioning the data to run concurrent PH computations that locate the smaller topological features. |
Poster | Slides | |
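The mapping from P to a smaller P' can be illustrated with a simple grid-based downsampling sketch - a generic reduction technique, not necessarily the mapping this project proposes. Snapping points to grid cells hides features smaller than a cell while preserving larger-scale topology:

```python
from collections import defaultdict

def downsample(points, cell_size):
    """Map a point cloud P to a smaller P' by snapping each point to
    a regular grid cell and keeping one representative (the cell
    centroid) per occupied cell. Features smaller than a cell are
    hidden; larger-scale topology survives."""
    cells = defaultdict(list)
    for p in points:
        cells[tuple(int(c // cell_size) for c in p)].append(p)
    reduced = []
    for pts in cells.values():
        dim = len(pts[0])
        reduced.append(
            tuple(sum(p[d] for p in pts) / len(pts) for d in range(dim))
        )
    return reduced

# Three nearby points collapse into one cell; two distant clusters remain.
P = [(0.1, 0.1), (0.2, 0.15), (0.9, 0.9), (5.0, 5.1), (5.2, 4.9)]
P_reduced = downsample(P, cell_size=1.0)
```

PH would then be computed on P', with the small features lost inside each cell recovered by the concurrent, partitioned computations the abstract describes.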
Tim Menzies | NC State University | Elements: Can Empirical SE be Adapted to Computational Science? Award #: 1931425 |
Abstract
|
Poster | Slides | |
Christopher Paciorek | University of California, Berkeley | SI2-SSI: Integrating the NIMBLE statistical algorithm platform with advanced computational tools and analysis workflows Award #: 1550488 |
Abstract
|
Poster | Slides | |
Jason Leigh | University of Hawaii at Manoa | SI2-SSI: SAGEnext: Next Generation Integrated Persistent Visualization and Collaboration Services for Global Cyberinfrastructure Award #: 1441963 |
Abstract
SAGE2 - the Scalable Amplified Group Environment - is the world’s most advanced software for cyberinfrastructure-enabled visualization, analysis, and distance collaboration on scalable display walls. SAGE2's ease of use and affordability make it an excellent platform on which to display a variety of related, high-resolution information in the form of visualizations, enabling collaborators to reach conclusions and make decisions with greater speed, accuracy, comprehensiveness, and confidence. The SAGE user community comprises ~4000 users located at ~800 sites in over 17 countries worldwide, ranging from high schools to universities to national research laboratories. Disciplines using SAGE2 include: Archaeology, Architecture, Art, Atmospheric Science, Biology, Chemistry, Civil Engineering, Communications, Computer Science, Education, Geoscience, Health, Library Science, Medical, Meteorology, Network Engineering, Neuroscience, Physics, Psychology, and Statistics. |
Poster | Slides | |
Volker Blum | Duke University | Collaborative Research: SI2-SSI: ELSI-Infrastructure for Scalable Electronic Structure Theory Award #: 1450280 |
Abstract
Routine applications of electronic structure theory to molecules and periodic systems need to compute the electron density from given Hamiltonian and overlap matrices. System sizes can range from few to thousands or (in some cases) millions of atoms. Different discretization schemes (basis sets) and different system geometries (finite non-periodic vs. infinite periodic boundary conditions) dictate matrices with different structure. The ELectronic Structure Infrastructure (ELSI) project provides an open-source software interface to facilitate the implementation and optimal use of high-performance solver libraries covering cubic scaling eigensolvers, linear scaling density-matrix-based algorithms, and other reduced scaling methods in between. We cover the ELSI interface software itself, solvers connected to the interface, as well as practical handling (e.g., routines for density matrix extrapolation in geometry optimization and molecular dynamics calculations and general utilities such as parallel matrix I/O and JSON output). Finally, we present benchmarks comparing different solvers, carried out using the ELSI infrastructure on massively parallel supercomputers. |
Poster | Slides | |
Volker Blum | Duke University | DMREF: Collaborative Research: HybriD3: Discovery, Design, Dissemination of Organic-Inorganic Hybrid Semiconductor Materials for Optoelectronic Applications Award #: 1729297 |
Abstract
This project, called "HybriD3", aims to accelerate the "Design, Discovery, and Dissemination" (D3) of new crystalline organic-inorganic hybrid semiconductors. This presentation focuses on the software- and data-related aspects of the project. We describe the web-facing database infrastructure "MatD3" (https://github.com/HybriD3-database/MatD3 and https://arxiv.org/abs/2001.02135), a database and online presentation package for research data supporting materials discovery, design, and dissemination, developed as a generic package allowing individual research groups or projects to share materials data of any kind in a reproducible, easily accessible way. The package can be connected to the "Qresp" (“Curation and Exploration of Reproducible Scientific Papers”) software (http://www.qresp.org/), which facilitates the organization, annotation, and exploration of data presented in scientific papers. We finally describe the use of this infrastructure and our broader scientific activities as reflected in the open, hybrid organic-inorganic materials database "HybriD3" (https://materials.hybrid3.duke.edu/). |
Poster | Slides | |
David Wells | University of North Carolina, Chapel Hill | SI2-SSI: Collaborative Research: Scalable Infrastructure for Enabling Multiscale and Multiphysics Applications in Fluid Dynamics, Solid Mechanics, and Fluid-Structure Interaction Award #: 1450327 |
Abstract
Many biological and biomedical systems involve the interaction of |
Poster | Slides | |
Anthony Danalis | University of Tennessee | SI2-SSI: Collaborative Proposal: Performance Application Programming Interface for Extreme-scale Environments (PAPI-EX) Award #: 1450429 |
Abstract
The PAPI team is developing PAPI support to stand up to the challenges posed by next-generation systems by (1) widening its applicability and providing robust support for newly released hardware resources; (2) extending PAPI’s support for monitoring power usage and setting power limits on GPUs; and (3) applying semantic analysis to hardware counters so that the application developer can better make sense of the ever-growing list of raw hardware performance events that can be measured during execution. The poster presents how the team is channeling the monitoring capabilities of hardware counters, power usage, and software-defined events into a robust PAPI software package. |
Poster | Slides | |
Yung-Hsiang Lu | Purdue University | SI2-SSE: Analyze Visual Data from Worldwide Network Cameras Award #: 1535108 |
Abstract
Many network cameras have been deployed for a wide range of purposes. The data from these cameras can provide rich information about the natural environment and human activities. To extract valuable information from this network of cameras, complex computer programs are needed to retrieve data from the geographically distributed cameras and to analyze the data. This project creates an open source software infrastructure by solving many problems common to different types of analysis programs. By using this infrastructure, researchers can focus on scientific discovery, not writing computer programs. This project can improve efficiency and thus reduce the cost for running programs analyzing large amounts of data. This infrastructure promotes education because students can obtain an instantaneous view of the network cameras and use the visual information to understand the world. Better understanding of the world may encourage innovative solutions for many pressing issues, such as better urban planning and lower air pollution. |
Poster | Slides | |
Michael Zentner | University of California, San Diego | S2I2: Impl: The Science Gateways Community Institute (SGCI) for the Democratization and Acceleration of Science Award #: 1547611 |
Abstract
The Science Gateways Community Institute is in its fourth year of operation and has demonstrated substantial success in terms of the volume and recognized value of the services it provides. This poster outlines those services, presents metrics on performance, and provides a view of future sustainability strategies of the SGCI. |
Poster | Slides | |
Kesong Yang | University of California San Diego | SI2-SSI: Collaborative Research: A Robust High-Throughput Ab Initio Computation and Analysis Software Framework for Interface Materials Science Award #: 1550404 |
Abstract
A three-year SI2-SSI project is proposed to develop a Python-based open-source software framework for data-driven interface materials science. This framework will be built on the existing pymatgen, custodian, and FireWorks software libraries, integrating them into a complete, user-friendly, and flexible system for high-throughput ab initio computations and analysis. This SSI will greatly expand the capabilities of this framework beyond ground-state bulk electronic structure calculations, targeting developmental efforts on three key focus areas of great interest to interface materials science: (i) ab initio thermodynamics of surfaces and interfaces; (ii) advanced methods for materials kinetics and diffusion at materials interfaces; and (iii) automated algorithms for the structural construction of grain boundaries. This project has yielded more than 18 peer-reviewed research articles, including a recent article in Energy and Environmental Science (impact factor 33.25) that has been widely reported in multiple media outlets and has yielded broad impacts. |
Poster | Slides | |
Mark Ghiorso | OFM Research | SI2-SSI: Collaborative Research: ENKI: Software infrastructure that ENables Knowledge Integration for modeling coupled geochemical and geodynamical processes Award #: 1550482 |
Abstract
ENKI is an open source software framework designed to facilitate the construction and maintenance of thermodynamic models of naturally occurring materials. It provides the capability of accessing these models with a standardized user interface built upon Jupyter notebooks that are hosted on a cloud-based server. The ENKI API is written in Python. The ENKI platform is designed to provide a uniform access to existing thermodynamic databases. The interface provides a straightforward way of calculating and comparing the thermodynamic properties of phases, the ability to construct phase diagrams, and the ability to perform generalized equilibrium calculations. A key aspect of ENKI is the ability to formulate thermodynamic models as symbolic expressions, and to automatically generate from these expressions compatible computer code. This capability supports calibration of thermochemical models from experimental data and encourages replicable and reproducible science. |
Poster | Slides | |
Dan Katz | University of Illinois at Urbana-Champaign | Collaborative Research: SI2-SSI: Swift/E: Integrating Parallel Scripted Workflow into the Scientific Software Ecosystem Award #: 1550588 |
Abstract
Parsl is an open source parallel programming library for Python, used by both small and large projects (e.g., LSST-DESC in astronomy, ArcticDEM and EarthDEM in geoscience). Parsl augments Python with simple, scalable, and flexible constructs for encoding parallelism. Developers annotate Python functions to create apps, which represent pure Python functions or calls to external applications, whether sequential, multicore, or multi-node MPI. Parsl further allows calls to these apps, called tasks, to be connected by shared input/output data (e.g., Python objects or files), via which Parsl can construct a dynamic dependency graph of tasks. Parsl scripts can be easily moved between different execution resources (local systems, clouds, clusters, supercomputers, and Kubernetes clusters); developers define a Python-based configuration that outlines where and how to execute tasks. Parsl scripts can scale from a single core to O(100k) nodes on one or more supercomputers. |
Poster | Slides | |
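The app/task model the Parsl abstract describes can be illustrated with the standard library alone. This is a conceptual sketch of the idea, not Parsl's actual interface (real Parsl scripts use decorators such as `@python_app` and return futures whose data dependencies form the task graph automatically):

```python
from concurrent.futures import ThreadPoolExecutor

# Plain-Python analogue of Parsl's model: functions act as "apps",
# submitted calls are "tasks", and tasks are linked by passing one
# task's output into another, forming a dependency graph.
def double(x):
    return 2 * x

def add(a, b):
    return a + b

with ThreadPoolExecutor(max_workers=4) as pool:
    f1 = pool.submit(double, 3)   # independent task
    f2 = pool.submit(double, 4)   # independent task, may run concurrently
    # dependent task: consumes the outputs of f1 and f2
    f3 = pool.submit(add, f1.result(), f2.result())
    result = f3.result()

print(result)  # prints 14
```

In Parsl itself the execution backend is swapped by configuration rather than code changes, which is the property the abstract refers to when it says scripts move between local systems, clusters, and clouds.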
Umberto Villa | Washington University in St Louis | Collaborative Research: SI2-SSI: Integrating Data with Complex Predictive Models under Uncertainty: An Extensible Software Framework for Large-Scale Bayesian Inversion Award #: 1550593 |
Abstract
Recent years have seen a massive explosion of datasets across all areas of science, engineering, technology, medicine, and the social sciences. The central questions are: How do we optimally learn from data through the lens of models? And how do we do so taking into account uncertainty in both data and models? These questions can be mathematically framed as Bayesian inverse problems. While powerful and sophisticated approaches have been developed to tackle these problems, such methods are often challenging to implement and typically require first and second order derivatives that are not always available in existing computational models. We present an extensible software framework that overcomes this hurdle by providing unprecedented access to state-of-the-art algorithms for deterministic and Bayesian inverse problems and the ability to compute derivatives using adjoint-based methods. Our goal is to make these advanced inversion capabilities available to a broader scientific community, to provide an environment that accelerates scientific discovery. |
Poster | Slides | |
Michael Dixon | National Center for Atmospheric Research | SI2-SSI: Lidar Radar Open Software Environment (LROSE) Award #: 1550597 |
Abstract
The LROSE project aims to make high quality open source software available to users in the Lidar and Radar atmospheric sciences research community. This NSF-funded project is a collaboration between Colorado State University in Fort Collins and the National Center for Atmospheric Research in Boulder, Colorado. |
Poster | Slides | |
Rafael Ferreira da Silva | University of Southern California | Collaborative Research: SI2-SSE: WRENCH: A Simulation Workbench for Scientific Workflow Users, Developers, and Researchers Award #: 1642335 |
Abstract
WRENCH enables novel avenues for scientific workflow use, research, development, and education in the context of large-scale scientific computations and data analyses. WRENCH is an open-source library for developing simulators; it exposes high-level simulation abstractions that serve as building blocks for custom simulators. WRENCH makes it possible to simulate large-scale hypothetical scenarios quickly and accurately on a single computer, obviating the need for expensive and time-consuming trial-and-error experiments. It enables scientists to make quick and informed choices when executing their workflows, software developers to implement more efficient software infrastructures to support workflows, and researchers to develop novel efficient algorithms to be embedded within these software infrastructures. |
Poster | Slides | |
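To give a flavor of what "developing simulators" means here, the sketch below is a minimal discrete-event loop of the kind a workflow simulator is built around. It is purely illustrative Python, not WRENCH itself (the real library is C++ with detailed platform, network, and service models); the task names and times are invented:

```python
import heapq

# Minimal discrete-event engine: pop the next released task, advance
# the simulated clock, record its completion on a single resource.
def simulate(tasks):
    """tasks: (release_time, duration, name) tuples, run on one
    resource in release order; returns (name, finish_time) pairs."""
    events = list(tasks)
    heapq.heapify(events)            # earliest release time first
    clock, log = 0.0, []
    while events:
        release, duration, name = heapq.heappop(events)
        clock = max(clock, release) + duration   # tasks serialize
        log.append((name, clock))
    return log

log = simulate([(0.0, 2.0, "stage_in"),
                (0.0, 3.0, "compute"),
                (5.0, 1.0, "stage_out")])
# log records each task's completion time on the simulated resource
```

Because the whole "execution" is just arithmetic on a clock, hypothetical platforms of any size can be explored in seconds, which is the core economy the abstract points to.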
Andreas Goetz | University of California, San Diego | SI2-SSE: Enabling Chemical Accuracy in Computer Simulations: An Integrated Software Platform for Many-Body Molecular Dynamics Award #: 1642336 |
Abstract
We present software elements that enable computer simulations of molecular systems with unprecedented accuracy based on the many-body molecular dynamics (MB-MD) methodology. MB-MD is built upon a rigorous many-body expansion of interaction energies resulting in a fully transferable representation of potential energy surfaces that are derived entirely from correlated electronic structure data without resorting to empirical parameters. Our software includes a Python based workflow system for machine learning of many-body potential energy functions (PEFs) that integrates numerical tools for generating molecular configurations, electronic structure calculations, training set generation, PEF code generation, PEF parameter training, and PEF export for simulation codes, facilitated via centralized database storage. We also present a high-performance, vectorized and OpenMP parallel C++ code for MB-MD simulations including periodic boundary conditions. It contains an API for easy integration with simulation codes and is coupled to the open source i-PI MD driver and free energy toolkit PLUMED. |
Poster | Slides | |
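The many-body expansion underlying MB-MD can be written down compactly: E_total = sum_i E(i) + sum_{i<j} dE(i,j) + sum_{i<j<k} dE(i,j,k) + ... . The sketch below evaluates that recursion for a toy pair potential standing in for the correlated electronic-structure calculations (fragment labels and energies are invented for illustration):

```python
from itertools import combinations

# Toy many-body expansion. fragment_energy stands in for an expensive
# correlated electronic-structure calculation on a sub-fragment.
def fragment_energy(frag):
    n = len(frag)
    return 1.0 * n - 0.1 * n * (n - 1) / 2   # monomers + pairwise attraction

def many_body_expansion(monomers, max_order):
    terms = {}    # n-body correction energy for each sub-fragment
    total = 0.0
    for order in range(1, max_order + 1):
        for frag in combinations(monomers, order):
            # correction = fragment energy minus all lower-order terms
            correction = fragment_energy(frag)
            for sub_order in range(1, order):
                for sub in combinations(frag, sub_order):
                    correction -= terms[sub]
            terms[frag] = correction
            total += correction
    return total

monomers = ("A", "B", "C")
e_mbe = many_body_expansion(monomers, max_order=3)
e_direct = fragment_energy(monomers)
# For this pairwise-additive toy potential the expansion is exact,
# so every 3-body correction vanishes and e_mbe matches e_direct.
```

The transferability claim in the abstract corresponds to the fact that the low-order terms, once fitted, can be reused for any assembly of the same monomers.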
Grey Ballard | Wake Forest University | High Performance Low Rank Approximation for Scalable Data Analytics Award #: 1642385 |
Abstract
With the advent of internet-scale data, the data mining and machine learning community has adopted Nonnegative Matrix Factorization (NMF) for performing numerous tasks such as topic modeling, background separation from video data, hyper-spectral imaging, web-scale clustering, and community detection. The goals of this project are to develop efficient parallel algorithms for computing nonnegative matrix and tensor factorizations (NMF and NTF) and their variants using a unified framework, and to produce a software package called Parallel Low-rank Approximation with Nonnegative Constraints (PLANC) that delivers the high performance, flexibility, and scalability necessary to tackle the ever-growing size of today's data sets. The algorithms have been generalized to NTF problems and extend the class of algorithms we can efficiently parallelize; our software framework allows end-users to use and extend our techniques. |
Poster | Slides | |
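For readers unfamiliar with NMF, the factorization PLANC computes at scale can be shown in miniature with the classic multiplicative-update rules. This is a didactic pure-Python sketch under invented data, not PLANC's algorithm or API (PLANC itself is a parallel C++ library):

```python
import random

# Multiplicative-update NMF on plain Python lists: factor a
# nonnegative matrix V into nonnegative factors W (m x k) and H (k x n).
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def nmf(V, k, iters=200, seed=0):
    rng = random.Random(seed)
    m, n, eps = len(V), len(V[0]), 1e-9
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        num, den = matmul(transpose(W), V), matmul(transpose(W), matmul(W, H))
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        num, den = matmul(V, transpose(H)), matmul(matmul(W, H), transpose(H))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(m)]
    return W, H

V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]    # tiny nonnegative data matrix
W, H = nmf(V, k=2)
approx = matmul(W, H)
err = sum((V[i][j] - approx[i][j]) ** 2 for i in range(3) for j in range(2))
# the updates keep W and H nonnegative while the residual err shrinks
```

The updates preserve nonnegativity by construction, which is why the factors remain interpretable as additive parts (topics, communities, spectral components) in the applications the abstract lists.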
Chad Hanna | Penn State University | Hearing the signal through the static: Real-time noise reduction in the hunt for binary black holes and other gravitational wave transients Award #: 1642391 |
Abstract
We show results of a real-time classifier that uses auxiliary sensor data to characterize gravitational wave detector noise. |
Poster | Slides | |
Ritu Arora | University of Texas at Austin, Texas Advanced Computing Center | SI2-SSE: An Interactive Parallelization Tool Award #: 1642396 |
Abstract
The Interactive Parallelization Tool (IPT) is a high-productivity tool that can semi-automatically parallelize certain types of serial C/C++ programs and is currently being used for teaching parallel programming to students and domain experts. It solicits the specifications for parallelization from the users, such as what to parallelize and where. On the basis of these specifications, IPT translates the serial programs into working parallel versions using one of three popular parallel programming paradigms: MPI, OpenMP, or CUDA. Hence, IPT can free users from the burden of learning the low-level syntax of the different parallel programming paradigms and from any manual reengineering required for parallelizing existing serial programs. The performance of the parallel versions generated using IPT is within 10% of the performance of the best hand-written parallel versions available to us. |
Poster | Slides | |
Cameron Smith | Rensselaer Polytechnic Institute | Fast Dynamic Load Balancing Tools for Extreme Scale Systems Award #: 1533581 |
Abstract
High performance simulations running on distributed memory, parallel systems require even work distributions with minimal communications. To efficiently maintain these distributions on systems with accelerators, the balancing and partitioning procedures must utilize the accelerator. This work presents algorithms and speedup results using OpenCL and Kokkos to accelerate critical portions of the EnGPar hypergraph-based diffusive load balancer. Focus is given to basic hypergraph traversal and selection procedures. |
Poster | Slides | |
Suresh Marru | Indiana University | Collaborative Research: SI2-SSI: Open Gateway Computing Environments Science Gateways Platform as a Service (OGCE SciGaP) Award #: 1339774 |
Abstract
TBD |
Poster | Slides | |
Shawn Douglas | UC San Francisco | SI2:SSE: Collaborative Research: Integrated Tools for DNA Nanostructure Design and Simulation Award #: 1740212 |
Abstract
DNA origami, a method for constructing nanoscale objects, relies on a long single strand of DNA to act as the 'scaffold' to template assembly of numerous short DNA oligonucleotide 'staples', which assemble into megadalton-sized nanostructures composed of tens of thousands of DNA bases. Designing and experimentally testing a DNA origami nanostructure can take more than two weeks of effort and cost over $1000 per design in materials and labor. Fast simulation tools can improve efficiency in this process. We present a GPU-powered simulation tool for DNA origami that can provide a robust 3D structure prediction in a matter of minutes. |
Poster | Slides |
# | Name | Organization | NSF Award | Abstract | Poster | Talk |
---|---|---|---|---|---|---|
Edgar Solomonik | University of Illinois at Urbana-Champaign | Collaborative Research: Frameworks: Scalable Modular Software and Methods for High-Accuracy Materials and Condensed Phase Chemistry Simulation Award #: 1931258 |
Abstract
The goal of our project is to bring high-accuracy methods to the state of practice for materials and condensed phase chemistry by equipping PySCF with robust periodic mean-field and wave-function methods, leveraging reduced-scaling approximations, and innovating in tensor abstractions. We describe preliminary results that include the introduction of QMC methods to PySCF, a new algorithmic technique to handle group symmetry in tensors, and innovations in tensor decomposition and automatic differentiation methods. |
Poster | Slides | |
Philip Harris | MIT | Collaborative Research: Frameworks: Machine learning and FPGA computing for real-time applications in big-data physics experiments Award #: 1931561 |
Abstract
Machine learning and FPGA computing for real-time applications in big-data physics experiments |
Poster | Slides | |
Gianfranco Ciardo | Iowa State University | SI2-SSE: A Next-Generation Decision Diagram Library Award #: 1642397 |
Abstract
The need to store and manipulate massive data is ubiquitous. Data arising from man-made artifacts such as hardware and software may exhibit a structure amenable to compact and efficient decision diagram techniques, a prime example being BDDs for symbolic model checking. |
Poster | Slides | |
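The compactness the abstract attributes to decision diagrams rests on node sharing: structurally identical subgraphs are stored exactly once. A miniature hash-consed BDD store makes the idea concrete; this is an illustrative sketch, not the project's actual library:

```python
# Hash-consed reduced decision-diagram store. Equal subgraphs are
# represented by a single shared node, which is what makes symbolic
# representations of huge state spaces compact.
TRUE, FALSE = ("T",), ("F",)
_unique = {}   # (var, id(lo), id(hi)) -> canonical node

def node(var, lo, hi):
    if lo == hi:                 # reduction rule: drop a redundant test
        return lo
    key = (var, id(lo), id(hi))
    if key not in _unique:       # share: one object per distinct triple
        _unique[key] = (var, lo, hi)
    return _unique[key]

def bdd_and(u, v):
    # terminal cases
    if u is FALSE or v is FALSE:
        return FALSE
    if u is TRUE:
        return v
    if v is TRUE:
        return u
    # recurse on the smaller variable index (fixed variable order)
    var = min(u[0], v[0])
    u_lo, u_hi = (u[1], u[2]) if u[0] == var else (u, u)
    v_lo, v_hi = (v[1], v[2]) if v[0] == var else (v, v)
    return node(var, bdd_and(u_lo, v_lo), bdd_and(u_hi, v_hi))

x1 = node(1, FALSE, TRUE)    # BDD for the variable x1
x2 = node(2, FALSE, TRUE)    # BDD for the variable x2
f = bdd_and(x1, x2)          # BDD for x1 AND x2; shares the x2 node
```

Because `node` returns the same object for the same triple, rebuilding a formula yields the identical node, so equivalence checks reduce to pointer comparison — the property model checkers exploit.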
Anthony Danalis | University of Tennessee | SI2-SSE: PAPI Unifying Layer for Software-Defined Events (PULSE) Award #: 1642440 |
Abstract
The PAPI Unifying Layer for Software-defined Events (PULSE) project focuses on enabling cross-layer and integrated monitoring of whole application performance by extending PAPI with the capability to expose performance metrics from key software components found in the HPC software stack. On one end, PULSE provides a standard, well-defined and well-documented API that high-level profiling software can utilize to acquire and present to application developers performance information about the libraries used by their application. On the other end, it provides standard APIs that library and runtime writers can utilize to communicate to higher software layers information about the behavior of their software. |
Poster | Slides | |
I-Te Lu | California Institute of Technology | SI2-SSE: PERTURBO: a software for accelerated discovery of microscopic electronic processes in materials Award #: 1642443 |
Abstract
PERTURBO: a software package for electron interactions, charge transport and ultrafast dynamics |
Poster | Slides | |
P. Bryan Heidorn | University of Arizona | SI2-SSE: Visualizing Astronomy Repository Data using WorldWide Telescope Software Systems Award #: 1642446 |
Abstract
There are two main outcomes of this project. The first is a port of the WorldWide Telescope (WWT) from a standalone Windows OS VR application to a Web-based portal. The second is the development of astronomy-focused data management and processing on the CyVerse computational infrastructure. The WorldWide Telescope provides a powerful data-visualization interface for data exploration and presentation. Through the open source WWT visualization software systems, this project enables the broader use of institutional and community-based, researcher-oriented astronomy data repositories and computational tools. The astronomy researcher workflow incorporates depositing data to make it discoverable through search and browsing, accessible through open access, actionable through connections to existing tools as well as community-developed tools running on CyVerse, and finally visualizing or citing data. We have added cloud-based access to Jupyter Notebooks, R, JS9, and other astronomy tools; a Virtual Observatory-compliant server; and a modified version of IPAC’s Firefly software for use with the James Webb Space Telescope NIRCam. |
Poster | Slides | |
Xiaosong Li | University of Washington | SI2-SSI: Sustainable Open-Source Quantum Dynamics and Spectroscopy Software Award #: 1663636 |
Abstract
The overarching goal of the project is to develop an innovative software platform, namely Chronus Quantum (ChronusQ), that is capable of modeling any type of time-resolved multidimensional spectroscopy using quantum electronic and nuclear dynamics. ChronusQ performs quantum dynamic simulations of the same light-matter interactions that occur in time-resolved multidimensional spectroscopies directly in the time-domain. The software is unique in that it seamlessly integrates time-dependent quantum mechanical theories, spectral analysis tools, and modular high-performance numerical libraries that are highly parallelized, extensible, reusable, community-driven, and open source. The ChronusQ software is well-designed and well-documented to promote reusability, composability, maintainability, and sustainability. ChronusQ will make predictions and interpretations of multi-dimensional spectral features as routine as they currently are for linear spectra, yielding a direct path to the discovery and design of molecules and materials that demonstrate new or enhanced high-order optical, magnetic, electronic, and plasmonic features. |
Poster | Slides | |
Edgar Gabriel | University of Houston | Collaborative Research: SI2-SSI: EVOLVE: Enhancing the Open MPI Software for Next Generation Architectures and Applications Award #: 1663887 |
Abstract
Open MPI is a widely used open source implementation of the Message Passing Interface specification. The goal of this project is to enhance the Open MPI software library, focusing on two aspects. First, extending Open MPI to support new features of the MPI specification, such as improved support for hybrid programming models and fault tolerance in MPI applications. Second, enhancing the Open MPI core to support new architectures and improve scalability; this includes rework of the startup environment to improve process-launch scalability, increased support for asynchronous progress of operations, support for accelerators, and reduced sensitivity to system noise. The project also enhances the support for File I/O operations as part of the Open MPI package by expanding our work on highly scalable collective I/O operations. |
Poster | Slides | |
David Tarboton | Utah State University | Collaborative Research: SI2-SSI: Cyberinfrastructure for Advancing Hydrologic Knowledge through Collaborative Integration of Data Science, Modeling and Analysis Award #: 1664061 |
Abstract
HydroShare is a domain specific data and model repository operated by the Consortium of Universities for the Advancement of Hydrologic Science Inc. (CUAHSI) to advance hydrologic science by enabling individual researchers to more easily and freely share data and models from their research. HydroShare supports Findable, Accessible, Interoperable and Reusable (FAIR) principles. It is comprised of two sets of functionalities: (1) a repository for users to share and publish data and models in a variety of formats, and (2) tools (web apps) that can act on content in HydroShare and support web-based access to compute capability. Together these move us towards a platform for collaboration and computation that integrates data storage, organization, discovery, and analysis through web applications (web apps) and that allows researchers to employ services beyond the desktop to make data storage and manipulation more reliable and scalable, while improving their ability to collaborate and reproduce results. |
Poster | Slides | |
B.S. Manjunath | University of California, Santa Barbara | SI2-SSI: LIMPID: Large-Scale IMage Processing Infrastructure Development Award #: 1664172 |
Abstract
The primary goal is to create a large-scale distributed image processing infrastructure, LIMPID, through a broad, interdisciplinary collaboration of researchers in databases, image analysis, and the sciences. In order to create a resource of broad appeal, the focus will be on three types of image processing: simple detection and labelling of objects, based on detection of significant features and leveraging recent advances in deep learning; semi-custom pipelines and workflows based on popular image processing tools; and fully customizable analysis routines. Popular image processing pipeline tools will be leveraged to allow users to create or customize existing pipeline workflows and easily test these on large-scale cloud infrastructure from their desktop or mobile devices. In addition, a core cloud-based platform will be created where custom image processing can be created, shared, modified, and executed on large-scale datasets, applying novel methods to minimize data movement. Usage test cases will be created for three specific user communities: materials science, marine science, and neuroscience. |
Poster | Slides | |
Andrew Schultz | University at Buffalo | SI2-SSE: Infrastructure Enabling Broad Adoption of New Methods That Yield Orders-of-Magnitude Speedup of Molecular Simulation Averaging Award #: 1739145 |
Abstract
Mapped averaging is a recently published scheme for the reformulation of ensemble averages. The framework uses approximate results from statistical mechanical theory to derive new ensemble averages (mapped averages) that represent exactly the error in the theory. Well-conceived mapped averages can be computed by molecular simulation with remarkable precision and efficiency, and in favorable cases the speedup factors are several orders of magnitude. |
Poster | Slides | |
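The spirit of mapped averaging can be conveyed with a control-variate analogy: instead of averaging a noisy quantity directly, average only its deviation from an approximate theory whose answer is known exactly. The snippet below is an illustrative analogy in plain Python with an invented observable; the actual mapped-averaging formalism reworks the ensemble averages themselves, as the abstract describes:

```python
import random
import statistics

# Control-variate sketch: the "theory" predicts the mean of u^2 on
# [0, 1] exactly (1/3), so only the small deviation from the theory
# needs to be estimated by sampling.
rng = random.Random(42)

def observable(u):
    # toy "exact" quantity: the theory's u^2 plus a small extra term
    return u * u + 0.05 * rng.random()

theory = 1.0 / 3.0   # exact average of u^2 for u uniform on [0, 1]

direct, deviation = [], []
for _ in range(2000):
    u = rng.random()
    x = observable(u)
    direct.append(x)             # conventional ensemble average
    deviation.append(x - u * u)  # average only the error in the theory

est_direct = statistics.mean(direct)
est_mapped = theory + statistics.mean(deviation)
# both estimates agree, but the deviation samples fluctuate far less,
# so est_mapped reaches a given precision with far fewer samples
```

The better the reference theory, the smaller the fluctuations that remain to be sampled, which is the source of the orders-of-magnitude speedups the abstract reports.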
Andrew Connolly | University of Washington | An Ecosystem of Reusable Image Analytics Pipelines Award #: 1739419 |
Abstract
The data volumes associated with image processing in astronomy can range from small sets of images taken by individual observers to large survey telescopes generating tens of petabytes of data per year. The tools used by researchers to analyze their images are often bespoke, tailored to specific tasks or science use cases. As part of an initiative to share analysis tools across astronomy (and broader communities) we are developing a cloud-aware analysis framework (the astronomy commons). We demonstrate here an image analysis system (built to process data from the Large Synoptic Survey Telescope; LSST) that can be deployed on the cloud using Amazon's S3, RDS, Lambda, and EBS services together with HTCondor and Pegasus to manage the overall workflow. We demonstrate the scaling of this system (and associated processing costs) to the size of nightly data volumes expected from the LSST. |
Poster | Slides | |
Christina Peterson | University of Central Florida | SI2-SSE: TLDS: Transactional Lock-Free Data Structures Award #: 1740095 |
Abstract
Traditionally, non-blocking data structures provide linearizable operations, but these operations are not composable. Transactional data structures can perform a sequence of operations that appears to execute atomically, which facilitates modular design and software reuse. TLDS encompasses: 1) a scalable methodology for transforming non-blocking data structures into transactional containers; 2) a library of transactional data structures; and 3) a tool to validate their correctness. |
Poster | Slides | |
Michael Sokoloff | University of Cincinnati | Collaborative Research: SI2:SSE: Extending the Physics Reach of LHCb in Run 3 Using Machine Learning in the Real-Time Data Ingestion and Reduction System Award #: 1740102 |
Abstract
This poster describes a hybrid machine learning algorithm for finding primary vertices in proton-proton collisions produced in the LHCb detector at CERN in Run 3. A proof-of-principle has been demonstrated using a kernel density estimator that transforms sparse 3D data into a rich 1D data set that is processed by a convolutional neural network. The algorithm learns target histograms that serve as proxies for the primary vertex positions. Basic concepts are illustrated, results to date are summarized, and plans for future work are presented. |
Poster | Slides | |
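The kernel-density step can be sketched in one dimension: sparse hit coordinates become a smooth density whose peaks mark candidate vertex positions. The numbers below are invented, and the real pipeline works on 3D track data and feeds the resulting histogram to a CNN, but the transformation is the same in miniature:

```python
import math

# 1D Gaussian kernel density estimate: each sparse hit contributes a
# Gaussian bump; the highest peak of the summed density marks the
# most likely vertex position.
def kde(points, grid, bandwidth=0.5):
    norm = 1.0 / (len(points) * bandwidth * math.sqrt(2.0 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((g - p) / bandwidth) ** 2)
                       for p in points)
            for g in grid]

hits = [-3.1, -2.9, -3.0, 2.0, 1.9, 2.1, 2.0]  # two clusters of hits
grid = [i * 0.1 for i in range(-60, 61)]
density = kde(hits, grid)
peak = grid[max(range(len(grid)), key=lambda i: density[i])]
# the peak lies near 2.0, the position of the larger cluster
```

Turning the sparse hits into a dense, fixed-size grid like this is what makes the data digestible by a convolutional network.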
Hyowon Park | University of Illinois at Chicago | SI2-SSE: Collaborative Research: Software Framework for Strongly Correlated Materials: from DFT to DMFT Award #: 1740112 |
Abstract
Dynamical Mean Field Theory (DMFT) has been successful in computing the electronic structure of strongly correlated materials, especially when it is combined with density functional theory (DFT). Here, we present an open-source computational package combining DMFT with various DFT codes interfaced to the Wannier90 package, adopting maximally localized Wannier functions as local orbitals to describe a correlated subspace. Our package provides a library mode for computing a DMFT density matrix such that it can be efficiently linked to various DFT codes and achieve charge self-consistency within DFT+DMFT loops. We used our package to study well-known correlated materials, namely LaNiO3, SrVO3, and NiO, computing the density of states, the band structure, the total energy, the atomic force, and the Fermi surface within DFT+DMFT. |
Poster | Slides | |
Brian Broll | Vanderbilt University | SI2-SSE: Deep Forge: a Machine Learning Gateway for Scientific Workflow Design Award #: 1740151 |
Abstract
DeepForge is a gateway to deep learning for the scientific community. It provides an easy-to-use, yet powerful visual interface to facilitate the rapid development of deep learning models. This includes a carefully designed hybrid textual-visual programming interface to support novices as well as experts. Utilizing an extensible cloud-based infrastructure, DeepForge is designed to integrate with external compute and storage APIs to enable reuse of existing HPC resources including the SciServer from Johns Hopkins. The driving design principles are promoting reproducibility, ease of access, and enabling remote execution of machine learning pipelines. The tool currently supports TensorFlow/Keras, but its extensible architecture enables integrating additional platforms easily. |
Poster | Slides | |
Yosuke Kanai | University of North Carolina at Chapel Hill | Collaborative Research: NSCI: SI2-SSE: Time Stepping and Exchange-Correlation Modules for Massively Parallel Real-Time Time-Dependent DFT Award #: 1740204 |
Abstract
Our goal is to build, test, and broadly disseminate new software modules for the real-time time-dependent density functional theory (RT-TDDFT) component of the massively parallel open-source Qb@ll code, to mitigate its two most pressing limitations: (i) the large computational cost due to the small time steps needed to control the numerical error of real-time integration, and (ii) the limited accuracy of the electronic structure computed by commonly used exchange-correlation approximations. We will address these by developing (1) new modules for improved numerical integration of the underlying non-linear partial differential equations via strong stability-preserving Runge-Kutta methods, and (2) new modules that compute the electronic structure through a modern implementation of advanced exchange-correlation functionals. Our second objective is to disseminate these developments by building an academic-research HPC community around the Qb@ll code. |
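To illustrate the class of integrators the abstract refers to, here is a minimal sketch of the classic third-order strong stability-preserving Runge-Kutta scheme (SSP-RK3, Shu-Osher form) applied to a scalar ODE. This is only a toy illustration of the stage structure; the Qb@ll modules apply such schemes to the time-dependent Kohn-Sham equations, not to the hypothetical test problem used here.

```python
import math

def ssp_rk3_step(f, t, y, dt):
    """One SSP-RK3 step: convex combinations of forward-Euler substeps."""
    y1 = y + dt * f(t, y)                                   # first Euler stage
    y2 = 0.75 * y + 0.25 * (y1 + dt * f(t + dt, y1))        # second stage
    return y / 3 + (2 / 3) * (y2 + dt * f(t + 0.5 * dt, y2))  # final combination

# Integrate y' = -y from y(0) = 1 to t = 1 and compare with exp(-1).
y, t, dt = 1.0, 0.0, 0.01
for _ in range(100):
    y = ssp_rk3_step(lambda t, y: -y, t, y, dt)
    t += dt
# y is now close to math.exp(-1) ≈ 0.3679
```

The convex-combination structure is what gives the scheme its strong stability preservation: each stage is a weighted average of forward-Euler updates, so stability properties of the Euler step carry over.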
Poster | Slides | |
Brian Demsky | University of California, Irvine | SI2-SSE: C11Tester: Scaling Testing of C/C++11 Atomics to Real-World Systems Award #: 1740210 |
Abstract
We have long relied on increased raw computing power to drive technological progress. However, processors are now reaching their limits in raw computing power, and continuing progress will require increased productivity in developing parallel software. Fully leveraging the performance of multi-core processors will in many cases require developers to make use of low-level "atomic" (indivisible) operations, such as those provided by the C11 and C++11 languages, so that they can make very fine-grained optimizations to their code and take full advantage of the computing power these processors offer. Writing code that uses C/C++11 atomics correctly is extremely difficult, and it is very easy to introduce subtle bugs with these constructs. Testing for concurrency bugs in such code can likewise be extremely difficult, as a bug can depend on the schedule, the state of the processor's memory subsystem, the specific processor, and the compiler. The C11Tester project will develop tools for testing concurrent code that makes use of C/C++11 atomics and make these tools available to both researchers and practitioners. |
Poster | Slides | |
Alex Pak | University of Chicago | Highly Efficient and Scalable Software for Coarse-Grained Molecular Dynamics Award #: 1740211 |
Abstract
The Voth Group has pioneered the rigorous design and application of systematic molecular coarse-graining (CG) to study biomolecular, condensed phase, and novel materials systems. For example, we have used simulations to study protein-protein self-assembly, membrane-protein interactions, biomolecular and liquid state charge transport, complex fluids, nanoparticle self-assembly, and charge-mediated energy storage. We are currently developing a software infrastructure to make the processes underlying systematic CG modeling accessible to other researchers and the public. These models are characterized by novel and unique challenges in their parameterization and simulation. By integrating our methods into standard simulation packages, workflow environments, and creating a portal and data depository for accurate models, we aim to make these scientific tools more widely used. |
Poster | Slides | |
Stanimire Tomov | University of Tennessee, Knoxville | SI2:SSE: MAtrix, TEnsor, and Deep-Learning Optimized Routines (MATEDOR) Award #: 1740250 |
Abstract
The MAtrix, TEnsor, and Deep-learning Optimized Routines (MATEDOR) project seeks to develop software technologies and standard APIs, along with a sustainable and portable library for large-scale computations, the individual parts of which are very small matrix or tensor computations. The main target is the acceleration of science and engineering applications that fit this profile, including deep learning, data mining, astrophysics, image and signal processing, hydrodynamics, and more. Working closely with affected application communities, we have defined modular, language agnostic APIs for batched computations. We incorporated the MATEDOR developments in a high-performance numerical library for batched linear algebra computations, autotuned for modern processor architectures and system designs. MATEDOR includes LAPACK routine equivalents for small dense problems, tensors, and application-specific operations, e.g., for deep learning. Routines are constructed as much as possible out of calls to batched BLAS routines and their look-alikes required in sparse computations. The software is released through the open source MAGMA library. |
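The core idea behind batched routines can be sketched in a few lines: one call applies the same small linear-algebra operation to a whole batch of independent tiny matrices, which lets an implementation amortize launch and dispatch overhead. The pure-Python sketch below only illustrates the interface concept; the function names are hypothetical and are not MATEDOR's or MAGMA's actual API.

```python
def matmul(a, b):
    """Multiply two small dense matrices given as lists of rows."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]

def batched_matmul(batch_a, batch_b):
    """Hypothetical batched interface: one call, many independent small GEMMs.
    A real batched BLAS would dispatch all of these to the accelerator at once."""
    return [matmul(a, b) for a, b in zip(batch_a, batch_b)]

identity = [[1, 0], [0, 1]]
twos = [[2, 0], [0, 2]]
out = batched_matmul([identity, twos], [twos, twos])
# out[0] == [[2, 0], [0, 2]], out[1] == [[4, 0], [0, 4]]
```

In a GPU library the batch loop is replaced by a single kernel launch over all matrices, which is where the performance gain over one-call-per-matrix comes from.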
Poster | Slides | |
Ryan May | University Corporation for Atmospheric Research | SI2-SSE: MetPy - A Python GEMPAK Replacement for Meteorological Data Analysis Award #: 1740315 |
Abstract
GEMPAK is a legacy scripted weather analysis package used extensively in education and research within the meteorology community. The goal of MetPy is to provide a modern framework that replicates this scripted analysis functionality by leveraging the extensive, community-driven scientific Python ecosystem. To serve as a viable GEMPAK replacement, MetPy has gained many features, including support for cross-sections and a variety of new calculations. MetPy's low-level infrastructure has also adopted xarray as its standard data model and leverages unit support to ensure the correctness of calculations. MetPy has also developed a simplified plotting syntax that mimics that of GEMPAK. This work discusses the details of these additions, including challenges encountered, as well as future plans for development as we wrap up the final year of this project. |
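The unit support mentioned above can be illustrated with a toy example: quantities carry units, and arithmetic on incompatible units fails loudly instead of silently producing a wrong number. The minimal `Quantity` class below is purely illustrative; it is not MetPy's or pint's actual API.

```python
class Quantity:
    """Toy unit-carrying value; real code would use the pint library."""

    def __init__(self, magnitude, unit):
        self.magnitude = magnitude
        self.unit = unit

    def __add__(self, other):
        # Refuse to add incompatible units rather than return nonsense.
        if self.unit != other.unit:
            raise ValueError(f"cannot add {other.unit} to {self.unit}")
        return Quantity(self.magnitude + other.magnitude, self.unit)

height = Quantity(1500.0, "meter") + Quantity(250.0, "meter")   # ok: 1750 m
# Quantity(1500.0, "meter") + Quantity(2.0, "kilometer") would raise ValueError
```

A full unit library also converts between compatible units (meters to kilometers, Celsius to Kelvin), which is what makes unit-aware meteorological calculations both safe and convenient.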
Poster | Slides | |
Serban Porumbescu | University of California, Davis | Gunrock: High-Performance GPU Graph Analytics Award #: 1740333 |
Abstract
Our goal with this award was to develop the "Gunrock" programmable, high-performance graph analytics library for graphics processing units (GPUs) from a working prototype into a robust, sustainable, open-source component of the GPU computing ecosystem. Our open-source initiatives are strong: we saw significant spikes in traffic (over 1400 clones) in the two weeks following our 1.0 release alone. DARPA has adopted Gunrock as the benchmark that its next-generation parallel processor must beat. MIT's GraphIt domain-specific language generates Gunrock code. NVIDIA is in the process of incorporating Gunrock into RAPIDS, its open GPU data science initiative. We believe this work is a real success story that is a direct result of this NSF award. |
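Gunrock's programming model is built around data-centric frontier operators: each step "advances" the current frontier of active vertices to their neighbors, then "filters" out vertices that have already been visited. The pure-Python BFS below only sketches that pattern on the CPU; Gunrock implements these operators as highly optimized GPU kernels.

```python
def bfs_levels(adj, source):
    """Return the BFS level of every reachable vertex in an adjacency-list graph,
    written in Gunrock's advance/filter style (CPU sketch only)."""
    levels = {source: 0}
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        # Advance: gather all neighbors of the current frontier.
        candidates = [v for u in frontier for v in adj[u]]
        # Filter: keep only vertices not yet assigned a level.
        frontier = list(dict.fromkeys(v for v in candidates if v not in levels))
        for v in frontier:
            levels[v] = depth
    return levels

graph = {0: [1, 2], 1: [2], 2: [3], 3: []}
# bfs_levels(graph, 0) -> {0: 0, 1: 1, 2: 1, 3: 2}
```

Expressing traversal as bulk frontier transformations, rather than per-vertex recursion, is what makes the computation map naturally onto thousands of GPU threads.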
Poster | Slides | |
Shaowen Wang | University of Illinois at Urbana-Champaign | SI2-S2I2 Conceptualization: Geospatial Software Institute Award #: 1743184 |
Abstract
Many scientific and societal grand challenges (e.g., emergency management, environmental sustainability, population growth, and rapid urbanization) are inherently geospatial, as articulated in a number of visionary reports such as the NSF's Ten Big Ideas and the United Nations Sustainable Development Goals. A variety of fields (e.g., environmental engineering and sciences, geosciences, and social sciences) are increasingly dependent on geospatial software to tackle these challenges. Critical and urgent efforts are also needed to prepare the next-generation workforce for computation- and/or data-intensive geospatial research and education, technological innovation, and real-world problem solving and decision making. In response, we have engaged diverse communities that develop and use geospatial concepts and software to conceptualize a national Geospatial Software Institute (GSI). The mission of the GSI should be to transform geospatial software, cyberinfrastructure (CI), and data science across many fields to revolutionize diverse discovery and innovation by enhancing computational transparency and reproducibility. Its vision is a sustainable social and technical ecosystem that enables geospatial-inspired innovation and discovery. Overall, the GSI is well positioned to revolutionize many science domains while nurturing a high-performance, open, and sustainable geospatial software ecosystem across academia, government, and industry. |
Poster | Slides | |
Karthik Ram | University of California, Berkeley | SI2-S2I2 Conceptualization: Conceptualizing a US Research Software Sustainability Institute (URSSI) Award #: 1743188 |
Abstract
Many advances in science have been possible thanks to the use of software. This software, also known as "research software", has become essential to progress in science and engineering. The scientists who develop the software are experts in their discipline but often do not have sufficient understanding of the practices that make software development easier and the software more robust, reliable, maintainable, and sustainable. This is an unfortunate state of affairs, as 90-95% of surveyed researchers in the UK and the US report that they rely on research software for their work, and 63-70% of these researchers believe that their work would not be possible if such software were to become unavailable. |
Poster | Slides | |
Ivan Rodero | Rutgers University | CIF21 DIBBs: EI: Virtual Data Collaboratory: A Regional Cyberinfrastructure for Collaborative Data Intensive Science Award #: 1640834 |
Abstract
The Virtual Data Collaboratory (VDC) is a federated data cyberinfrastructure that is designed to drive data-intensive, interdisciplinary and collaborative research, and enable data-driven science and engineering discoveries. VDC accomplishes this by providing seamless access to data and tools to researchers, educators, and entrepreneurs across a broad range of disciplines and scientific domains as well as institutional and geographic boundaries. In addition to enabling researchers to advance research frontiers across multiple disciplines, VDC also focuses on (1) training the next generation of scientists with deep disciplinary expertise and a high degree of competence in leveraging data, cyberinfrastructure, and tools to address research problems and (2) helping data scientists and engineers develop and apply advanced federated data management and analysis tools for high impact scientific applications. |
Poster | Slides | |
Daniel G Aliaga | Purdue University | Elements: Data: U-Cube: A Cyberinfrastructure for Unified and Ubiquitous Urban Canopy Parameterization Award #: 1835739 |
Abstract
As countries around the world rapidly urbanize and continue investing in infrastructure, their vulnerability to extreme weather continues to grow. Due to their large infrastructure and population base, cities are disproportionately affected by weather extremes, as was witnessed during recent storms. Because cities are complex entities, current computational models face a bottleneck in providing a robust means of generating the parameter statistics that define a city's morphology. The U-Cube project will use a novel inverse modeling approach that addresses this bottleneck by combining satellite imagery with population, elevation, road, and typology data about the various zones in a city to infer a 3D model of the city. From this digital city, a set of urban canopy parameters (UCPs) will be distilled for use in simulation models to predict meteorology and, in the long run, air quality, health, and the behavior of a city. |
Poster | Slides | |
Roland Haas | University of Illinois | SI2-SSI: Collaborative Research: Einstein Toolkit Community Integration and Data Exploration Award #: 1550514 |
Abstract
The Einstein Toolkit is a community-driven software platform of core computational tools to advance and support research in relativistic astrophysics and gravitational physics. We are developing and supporting open software for relativistic astrophysics. Our aim is to provide the core computational tools that can enable new science, broaden our community, facilitate interdisciplinary research and take advantage of emerging petascale computers and advanced cyberinfrastructure. I will report on the growth and activity in the Einstein Toolkit User community and scientific results obtained using the Toolkit software. |
Poster | Slides | |
Xiaozhu Meng | Rice University | SI2-SSI: Collaborative Research: A Sustainable Infrastructure for Performance, Security, and Correctness Tools Award #: 1450273 |
Abstract
Software has become indispensable to society. However, the properties of software systems cannot be understood without accounting for code transformations applied by optimizing compilers used to compose algorithm and data structure templates, and libraries available only in binary form. To address this need, we have been enhancing the Dyninst binary analysis and instrumentation toolkit to provide a foundation for performance, correctness, and security tools. We accelerate Dyninst to analyze large binaries using multiple threads to parse machine code and ingest symbol tables. Using Dyninst, we are building data race detection tools for OpenMP programs. In HPCToolkit performance tools, we use Dyninst to help map performance measurements back to source code, and to analyze execution traces to pinpoint, quantify and diagnose performance bottlenecks in parallel programs. |
Poster | Slides | |
Wenchang Lu | North Carolina State University | NSCI SI2-SSE: Multiscale Software for Quantum Simulations of Nanostructured Materials and Devices Award #: 1740309 |
Abstract
The development of robust, adaptive software and algorithms that can fully exploit exascale capabilities and future computing architectures is critical to designing advanced materials and devices with targeted properties. We have developed an open-source code that discretizes the DFT equations on real-space grids that are distributed over the nodes of a massively parallel system via domain decomposition. Multigrid techniques are used to dramatically accelerate convergence. |
Poster | Slides | |
Hari Subramoni | Ohio State University | SI2-SSI: FAMII: High Performance and Scalable Fabric Analysis, Monitoring and Introspection Infrastructure for HPC and Big Data Award #: 1664137 |
Abstract
As heterogeneous computing (CPUs, GPUs, etc.) and networking (NVLink, X-Bus, etc.) hardware continue to advance, it becomes increasingly essential and challenging to understand the interactions between High-Performance Computing (HPC) and Deep Learning applications/frameworks, the communication middleware they rely on, the underlying communication fabric these high-performance middlewares depend on, and the schedulers that manage HPC clusters. Such understanding will enable application developers/users, system administrators, and middleware developers to maximize the efficiency and performance of the individual components that comprise a modern HPC system and to solve different grand challenge problems. Moreover, determining the root cause of performance degradation is complex for the domain scientist, and the scale of emerging HPC clusters further exacerbates the problem. These issues lead to the following broad challenge: How can we design a tool that enables in-depth understanding of the communication traffic on the interconnect and GPU through tight integration with the MPI runtime at scale? |
Poster | Slides | |
Matthew Turk | University of Illinois at Urbana-Champaign | Collaborative Research: SI2-SSI: Inquiry-Focused Volumetric Data Analysis Across Scientific Domains: Sustaining and Expanding the yt Community Award #: 1663914 |
Abstract
We present recent progress on our project to develop a cross-domain analysis platform. |
Slides | ||
Clare McCabe | Vanderbilt University | Collaborative Research: NSCI Framework: Software for Building a Community-Based Molecular Modeling Capability Around the Molecular Simulation Design Framework (MoSDeF) Award #: 1835874 |
Abstract
Molecular simulation plays an important role in many sub-fields of science and engineering. |
Poster | Slides | |
Douglas Thain | University of Notre Dame | DataSwarm: A User-Level Framework for Data Intensive Scientific Applications Award #: 1931348 |
Abstract
The DataSwarm framework will support the construction of large, data-intensive scientific applications that must run on top of national cyberinfrastructure, such as large campus clusters, NSF extreme-scale computing facilities, the Open Science Grid, and commercial clouds. Building on a prior SI2 project, DataSwarm will bring several new techniques (molecular task composition, in-situ data management, and precision provenance) to lightweight task-execution environments. |
Poster | Slides | |
Joe Stubbs | University of Texas, Austin | Collaborative Proposal: Frameworks: Project Tapis: Next Generation Software for Distributed Research Award #: 1931439 |
Abstract
Tapis is a web-based API framework for securely managing computational workloads across institutions, so that experts can focus on their research instead of the technology needed to accomplish it. In addition to job execution and data management, Tapis is providing capabilities to enable distributed workflows, including a multi-site Security Kernel, Streaming Data APIs, and first-class support for containerized applications. |
Poster | Slides | |
Frank Timmes | Arizona State University | Collaborative Research: SI2-SSI: Modules for Experiments in Stellar Astrophysics Award #: 1663684 |
Abstract
Modules for Experiments in Stellar Astrophysics (MESA) |
Poster | Slides | |
Greg Newman, Stacy Lynn | Colorado State University | SI2-SSI: Advancing and Mobilizing Citizen Science Data through an Integrated Sustainable Cyber-Infrastructure Award #: 1550463 |
Abstract
Citizen science engages members of the public in science. It advances the progress of science by involving more people and embracing new ideas. Recent projects use software and apps to do science more efficiently. However, existing citizen science software and databases are ad hoc, non-interoperable, non-standardized, and isolated, resulting in data and software silos that hamper scientific advancement. This project will develop new software and integrate existing software, apps, and data for citizen science, allowing expanded discovery, appraisal, exploration, visualization, analysis, and reuse of software and data. Over three phases, the software of two platforms, CitSci.org and CyberTracker, will be integrated, and new software will be built to integrate and share additional software and data. The project will: (1) broaden the inclusivity, accessibility, and reach of citizen science; (2) elevate the value and rigor of citizen science data; (3) improve the interoperability, usability, scalability, and sustainability of citizen science software and data; and (4) mobilize data to allow cross-disciplinary research and meta-analyses. |
Poster | Slides | |
Rion Dooley | Chapman University | The Agave Platform: An Open Science-As-A-Service Cloud Platform for Reproducible Science Award #: 1450459 |
Abstract
In today's data-driven research environment, the ability to easily and reliably access compute, storage, and derived data sources is as much a necessity as the algorithms used to make the actual discoveries. The earth is not shrinking, it is digitizing, and the ability for US researchers to stay competitive in the global research community will increasingly be determined by their ability to reduce the time from theory to discovery. The Agave Platform addresses this need by providing a Science-as-a-Service cloud platform that allows researchers to run code, manage data, collaborate meaningfully, and integrate virtually anything. In doing so, it eases the process of conducting reproducible science in today's distributed, collaborative digital labs. Agave is available to use as a publicly available, cloud-hosted PaaS, as well as a self-hosted service for internal use. CLI, SDK, and web applications are available from the website, https://agaveplatform.org/. |
Poster | Slides | |
Joe Breen | University of Utah | CIF21 DIBBs: EI: SLATE and the Mobility of Capability Award #: 1724821 |
Abstract
Much of science today is propelled by multi-institutional research collaborations that require computing environments that connect instrumentation, data, and computational resources. These resources are distributed among university research computing centers, national-scale high performance computing facilities, and commercial cloud service providers. The scale of the data and complexity of the science drive this diversity, and the need to aggregate resources from many sources into scalable computing systems. Services Layer At The Edge (SLATE) provides technology that simplifies connecting university and laboratory data center capabilities to the national cyberinfrastructure ecosystem and thus expands the reach of domain-specific science gateways and multi-site research platforms. |
Poster | Slides | |
Andrew Lumsdaine | University of Washington | CSSI Element: GraphPack: Unified Graph Processing with Parallel Boost Graph Library, GraphBLAS and High-Level Generic Algorithm Award #: 1716828 |
Abstract
|
Poster | Slides |
Attendees
Name | Organization | NSF Award Title | Award # | Poster | Talk |
---|---|---|---|---|---|
TBD | TBD | TBD | CSSI | View | View |
Organizing Committee
Haiying Shen (Chair)
University of Virginia
Carol Song
Purdue University
Natalia Villanueva Rosales
University of Texas at El Paso
Ritu Arora
University of Texas, Austin
Sandra Gesing
University of Notre Dame
Upulee Kanewala
Montana State University
Contact the organizers via email at CSSI-PI-Meeting2020 at googlegroups dot com.
Code of Conduct
The 2020 NSF CSSI PI Meeting is an interactive environment for listening and considering new ideas from a diverse group, with respect for all participants without regard to gender, gender identity or expression, race, color, national or ethnic origin, religion or religious belief, age, marital status, sexual orientation, disabilities, veteran status, or any other aspect of how we identify ourselves. It is the policy of the NSF CSSI PI Meeting that all participants will enjoy an environment free from all forms of discrimination, harassment, and retaliation.
- Definition of Sexual Harassment:
- Sexual harassment refers to unwelcome sexual advances, requests for sexual favors, and other verbal or physical conduct of a sexual nature. Behavior and language that are welcome/acceptable to one person may be unwelcome/offensive to another. Consequently, individuals must use discretion to ensure that their words and actions communicate respect for others. This is especially important for those in positions of authority, since individuals with lower rank or status may be reluctant to express their objections or discomfort regarding unwelcome behavior. Sexual harassment does not refer to occasional compliments of a socially acceptable nature. It refers to behavior that is not welcome, is personally offensive, debilitates morale, and therefore interferes with work effectiveness. The following are examples of behavior that, when unwelcome, may constitute sexual harassment: sexual flirtations, advances, or propositions; verbal comments or physical actions of a sexual nature; sexually degrading words used to describe an individual; a display of sexually suggestive objects or pictures; sexually explicit jokes; unnecessary touching.
- Definition of Other Harassment:
- Harassment on the basis of any other protected characteristic is also strictly prohibited. This conduct includes, but is not limited to the following: epithets, slurs, or negative stereotyping; threatening, intimidating, or hostile acts; denigrating jokes and display or circulation of written or graphic material that denigrates or shows hostility or aversion toward an individual or group.
- Definition of Discrimination:
- Discrimination refers to bias or prejudice resulting in denial of opportunity, or unfair treatment regarding selection, promotion, or transfer. Discrimination is commonly practiced on the grounds of age, disability, ethnicity, origin, political belief, race, religion, sex, and other factors that are irrelevant to a person's competence or suitability.
- Definition of Retaliation:
- Retaliation refers to taking action to negatively impact another person because they reported an act of discrimination or harassment.
- Reporting an Incident:
- Violations of this code of conduct policy should be reported immediately to the Organizing Committee Members (email: CSSI-PI-Meeting2020 at googlegroups dot com). All complaints will be treated seriously and investigated promptly. Confidentiality will be honored to the extent permitted as long as the rights of others are not compromised. Sanctions may range from a verbal warning, to ejection from the 2020 NSF CSSI PI Meeting, to the notification of appropriate authorities. Retaliation for complaints of inappropriate conduct will not be tolerated.