Chris A. Mattmann1,2,4, Duane Waliser1,2, Jinwon Kim2, Paul Ramirez1, Cameron Goodale1,
Andrew F. Hart1, Paul Loikith1, Huikyo Lee1, Michael Joyce1, Maziyar Boustani1,
Shakeh Khudikyan1, Kim Whitehall3, Jesslyn Whittell5, Paul Zimdars1, Daniel Crichton1,
Yolanda Gil4, Luca Cinquini1
1Jet Propulsion Laboratory
California Institute of Technology
Pasadena, CA 91109 USA
2UCLA Joint Institute for Regional and Earth System Science and Engineering
Young Hall, Room 4242
Los Angeles, CA 90095-7228
3Department of Physics and Astronomy
Washington DC 20059
4Computer Science Department
University of Southern California
Los Angeles, CA 90089 USA
5Electrical Engineering and Computer Science
University of California, Berkeley
387 Soda Hall
Berkeley, CA 94720-1776
Future projections of the Earth’s climate derived from climate models suggest drastic changes in parameters such as temperature for a variety of reasons, including greenhouse gases and other human-influenced factors. Climate models are physically and heuristically based models of Earth system dynamics involving the ocean, atmosphere, land, and other domains. Climate models are traditionally developed by specific modeling groups whose expertise may reside in one or more of these domains. For instance, Institution X may specialize in the development of ocean models, whereas Institution Y excels at atmospheric land surface models. Climate models simulate one or more parameters that describe an Earth domain. For the atmosphere, these parameters may include temperature, solar radiation, heat flux, and so forth. For the ocean, parameters may include sea surface temperature, wind speed, or salinity.
Climate models are also characterized by their temporal span. They may cover a decadal time span such as 2000-2010. They may simulate a 50- or 100-year span, or they may simulate past observed environment parameters. Furthermore, climate models have a spatial extent and resolution. They may be global and cover the Earth on a uniformly spaced grid of NxM degree boxes. Or they may be regional and feature much finer, more precise resolutions.
The ability of climate models to simulate parameters like temperature or wind speed relies greatly on scientists’ ability to observe and measure those values. These values are essential to providing an accurate basis to derive the mathematics and physics needed to perform these simulations and thus to produce climate model outputs. Observations typically come from different ground sources, such as stations, towers, or handheld instruments; they come from airborne platforms (helicopters, jets, etc.); and they come from spaceborne missions such as those that are flown by NASA and the National Oceanic and Atmospheric Administration (NOAA). The data acquired through such sources is considered remotely sensed or remote sensing data.
The various dimensions of climate models and remote sensing data are highlighted in Figure 1.
2. Accessing, Analyzing and Making Use of Climate Information: Why So Difficult?
As previously stated, climate models and observations are generated from a variety of sources and by a variety of institutions, including governmental agencies like NASA, NOAA, and the U.S. Environmental Protection Agency. Because of the geographically distributed nature of these institutions and the heterogeneity of the sources of climate information, bringing together climate model outputs and remote sensing observations to compare measurements is quite difficult.
Consider a climate model simulating sea surface salinity. There is a NASA remote sensing mission, Aquarius, which deploys an instrument to observe this parameter from space. The remote sensing data may be stored in the Jet Propulsion Laboratory’s Physical Oceanography Distributed Active Archive Center and may be available via FTP as a Hierarchical Data Format (HDF) 5 file, with associated HDF-EOS metadata (or “data about the data”) describing where the data was captured (its temporal and spatial bounds) and information about the mission. The climate model output may be available through the HTTP/REST OPeNDAP protocol from the Earth System Grid Federation and one of its many replicated nodes throughout the U.S. and the world as a NetCDF formatted file with Climate and Forecast (CF) conventions metadata.
Note the variations in data file format (HDF versus NetCDF), metadata (HDF-EOS versus CF), and protocol (FTP versus OPeNDAP), not to mention other differences that would have to be mitigated to effectively compare the same measured parameter. For example, the climate model output for sea surface salinity may have a temporal range of 2000-2010, but the NASA remote sensing data may only begin in 2008. Also, the remote sensing data may only have discrete values for the times the instrument takes data (e.g., 1 a.m. and 1 p.m.). Spatially, the climate model may produce global estimates of sea surface salinity whereas the NASA instrument may only take data in a swath geometric pattern with non-uniform grid cells.
Various statistical means are available to mitigate these differences. For example, one could interpolate between, or average, the discrete time measurements to allow interrogation of the remote sensing data at any time (as the climate model output provides). One could also spatially interpolate the remote sensing data into uniform grid cells like those of the climate model output. Both strategies are computationally and data intensive, considering the large amount of information (decades’ worth) and the precise spatial resolution of the measured or simulated values of sea surface salinity.
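The two mitigation steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of any particular toolkit: the function names interpolate_in_time and bin_to_grid are hypothetical, times are plain numbers such as hours since a reference, and simple cell averaging stands in for a true spatial interpolation scheme.

```python
from bisect import bisect_left
from collections import defaultdict
import math


def interpolate_in_time(times, values, t):
    """Linearly interpolate samples at time t (hypothetical helper).

    `times` must be sorted ascending; `t` must lie within the sampled range.
    """
    if not times or len(times) != len(values):
        raise ValueError("times and values must be non-empty and of equal length")
    if t < times[0] or t > times[-1]:
        raise ValueError("t lies outside the observed time range")
    i = bisect_left(times, t)      # first index with times[i] >= t
    if times[i] == t:              # exact hit on a measurement time
        return values[i]
    t0, t1 = times[i - 1], times[i]
    w = (t - t0) / (t1 - t0)       # fractional position between neighbors
    return (1 - w) * values[i - 1] + w * values[i]


def bin_to_grid(points, cell_deg):
    """Average scattered (lat, lon, value) points into uniform grid cells.

    Returns {(lat_index, lon_index): mean value}, where cell (i, j) starts at
    latitude -90 + i * cell_deg and longitude -180 + j * cell_deg.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lat, lon, value in points:
        key = (math.floor((lat + 90.0) / cell_deg),
               math.floor((lon + 180.0) / cell_deg))
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

With two daily overpasses at hours 1 and 13, interpolate_in_time estimates the value at any hour in between, and bin_to_grid turns irregular swath samples into cell means that can be compared with a model grid of the same cell size. At realistic data volumes an array library and a proper resampling scheme would replace these loops.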
Once the values from the model output and remote sensing data are comparable, one may then compute a distance measurement or metric that allows their comparison, such as bias or root mean squared error (RMSE). Or one may compute a probability distribution function (PDF). These are also data and computationally intensive operations.
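As a minimal sketch of the two metrics named above, assuming the model and observation values have already been paired on a common grid and time axis, bias is the mean of the differences (model minus observation) and RMSE is the square root of the mean squared difference. The function names are illustrative and do not come from a specific package.

```python
import math


def _check_pairs(model, obs):
    # Both sequences must describe the same, non-empty set of grid points.
    if len(model) != len(obs) or len(model) == 0:
        raise ValueError("model and obs must be non-empty and of equal length")


def bias(model, obs):
    """Mean difference (model - obs); positive means the model runs high."""
    _check_pairs(model, obs)
    return sum(m - o for m, o in zip(model, obs)) / len(model)


def rmse(model, obs):
    """Root mean squared error between paired model and observed values."""
    _check_pairs(model, obs)
    return math.sqrt(sum((m - o) ** 2 for m, o in zip(model, obs)) / len(model))
```

Bias keeps the sign of the error and so can hide offsetting errors, whereas RMSE penalizes all deviations regardless of sign; the two are usually reported together.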
Finally, with a computed distance metric between the model output and remote sensing observation, one can visualize the difference using toolkits such as the NCAR Command Language (NCL) and its visualization package, Matplotlib for Python, Matlab, or R. These visualized comparisons are typically fed to decision-makers to inform climate policy.
3. The Regional Climate Model Evaluation System
The Regional Climate Model Evaluation System (RCMES), a collaboration between NASA JPL and the Joint Institute for Regional and Earth System Science and Engineering at the University of California, Los Angeles, provides the necessary software and architecture to easily and rapidly perform model evaluation activities. The RCMES architecture is highlighted in Figure 2.
RCMES provides two modular components that allow users to perform model evaluations using remote sensing data from NASA and other agencies. The first component, the Regional Climate Model Evaluation Database (RCMED), shown in Figure 2, reduces the heterogeneity of incoming remote sensing data, using the Apache Tika and Apache OODT frameworks. Extractors parse the remote sensing data files in HDF5 and other formats, such as GRIB, and can perform operations such as transforming and reprojecting grids and renaming variables. The extracted information (latitude, longitude, time, value, height) forms a tuple in which height is optional. The tuple is stored in cloud data stores including PostGIS, Hadoop/Hive, and MongoDB, as well as in traditional MySQL databases.
RCMED makes its data available via a spatio-temporal web service (labeled in Figure 2 as “URL”) to the Regional Climate Model Evaluation Toolkit (RCMET). RCMET is a distributable, detached analysis system that allows users to bring their own model output files or to obtain them from various provider networks like the Earth System Grid Federation, the international ExArch network, and the NASA Distributed Active Archive Centers. Remote sensing data from RCMED and the model output are regridded and made uniform on either the model output grid or the remote sensing data grid. The data are then temporally regridded in hourly, daily, or monthly fashion, and are then available for metrics computation. RCMET currently provides metrics to compute various statistics including bias, root mean squared error, and probability distribution functions, as well as user-defined metrics. After metrics computation, the metrics can be visualized and plotted as simple difference plots (model, obs, and then metric) or as more sophisticated plots like Taylor or portrait diagrams. The RCMET pipeline is shown in detail on the right of Figure 2.
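To make the tuple concrete, the following is a minimal sketch of how such a record might be represented and queried in Python. The Observation class and the select function are hypothetical names chosen for illustration; they do not reflect the actual RCMED schema or code.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Observation:
    """One extracted data point; height is optional, as in the tuple above."""
    latitude: float
    longitude: float
    time: datetime
    value: float
    height: Optional[float] = None


def select(observations, lat_range, lon_range, time_range):
    """Return the observations inside a bounding box and a time window.

    Each range is an inclusive (min, max) pair. This mimics the kind of
    space and time filter a spatial data store would run (assumption).
    """
    (lat_min, lat_max) = lat_range
    (lon_min, lon_max) = lon_range
    (start, end) = time_range
    return [
        o for o in observations
        if lat_min <= o.latitude <= lat_max
        and lon_min <= o.longitude <= lon_max
        and start <= o.time <= end
    ]
```

Reducing every source format to the same five fields is what lets a single query shape serve data that originally arrived in HDF5, GRIB, or other formats.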
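The temporal regridding step can be illustrated with a short Python sketch that aggregates dated samples into monthly means. It assumes a simple arithmetic mean per calendar month, and the function name monthly_means is hypothetical rather than part of RCMET.

```python
from collections import defaultdict
from datetime import date


def monthly_means(samples):
    """Aggregate (date, value) samples into {(year, month): mean value}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for day, value in samples:
        key = (day.year, day.month)   # bucket each sample by calendar month
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sorted(sums)}


# Example input: irregularly timed observations of one grid cell.
samples = [
    (date(2008, 1, 5), 34.0),
    (date(2008, 1, 20), 36.0),
    (date(2008, 2, 1), 35.0),
]
monthly = monthly_means(samples)
```

Applying the same aggregation to the model output and to the observations puts both on a shared monthly time axis, after which metrics can be computed month by month.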
4. Application to CORDEX and Future Work
RCMES is currently being applied and evaluated in a number of domains, including the international Coordinated Regional Climate Downscaling Experiment (CORDEX) and the U.S. National Climate Assessment (NCA) through the North American Regional Climate Change Assessment Program (NARCCAP). NARCCAP is the U.S.-based contribution to CORDEX, a broader international framework targeted at regional downscaling to provide precise, decision-ready model evaluations that focus on Africa, East and South Asia, Australia, and the Arctic. RCMES is an enabling tool in both NCA and CORDEX and can provide a framework for broader metrics, visualizations, regridding and data storage approaches to be developed.
While the early experience with RCMES has proven quite positive, we plan to continue to develop and explore future extensions to the system, including new dataset and study-specific metrics focused on regional downscaling (as opposed to traditional statistical techniques); the connection of the RCMET output visualizations into the Geographic Information System (GIS) domain, including watershed mapping and information overlay; and the inclusion of future cloud data warehouse technologies like Apache Spark and Shark SQL. In addition, we are now basing RCMES on the Apache Open Climate Workbench incubating technology to provide a framework for others to contribute to the core software for RCMES and to easily incorporate RCMES into their own applications. We expect these extensions and improvements will make RCMES a necessary tool and key capability in regional climate modeling and in incorporating remote sensing data in model evaluations for years to come.
Support was provided by NASA’s Earth Sciences Division, NASA NCA (ID: 11-NCA11-0028), AIST (ID: AIST-QRS-12-0002) and the Applied Sciences Program via the American Recovery and Reinvestment Act, and the National Science Foundation ExArch program (ID: 1125798), a component of the G8 initiative. Valuable contributions to the RCMES activity by way of collaboration come from the WCRP Coordinated Regional Climate Downscaling Experiment, the North American Regional Climate Change Assessment Program, the Climate and Development Knowledge Network, the University of Cape Town, and PCMDI/DOE through support of the obs4MIPs activity.
1. B. Fortner, “HDF: The Hierarchical Data Format,” Dr. Dobb’s J. Software Tools and Professional Programming, 1998.
2. R.K. Rew and G.P. Davis, “NetCDF: An Interface for Scientific Data Access,” IEEE Computer Graphics and Applications, vol. 10, no. 4, 1990, pp. 76–82.
3. R.K. Rew et al., “The CF Conventions: Governance and Community Issues in Establishing Standards for Representing Climate, Forecast, and Observational Data,” slide presentation, Am. Geophysical Union Fall Meeting, abstract #IN52A-07, 2007.
4. F. Giorgi, C. Jones, and G. Asrar, “Addressing Climate Information Needs at the Regional Level: The CORDEX Framework,” WMO Bulletin, vol. 58, no. 3, 2009, pp. 175–183.
5. C. Mattmann, A. Braverman, and D. Crichton, “Understanding Architectural Tradeoffs Necessary to Increase Climate Model Intercomparison Efficiency,” ACM SIGSOFT Software Eng. Notes, vol. 35, no. 3, 2010, pp. 1–6.
6. C. Mattmann et al., “A Software Architecture-Based Framework for Highly Distributed and Data Intensive Scientific Applications,” Proc. Int’l Conf. Software Eng. (ICSE 06), IEEE CS, 2006, pp. 721–730.
7. C. Mattmann, D. Waliser, J. Kim, C. Goodale, A. Hart, P. Ramirez, D. Crichton, P. Zimdars, M. Boustani, H. Lee, P. Loikith, K. Whitehall, C. Jack, B. Hewitson. Cloud Computing and Virtualization within the Regional Climate Model and Evaluation System. Earth Science Informatics, accepted, July 2013. http://link.springer.com/article/10.1007%2Fs12145-013-0126-2
8. J. Kim, D. Waliser, C. Mattmann, L. Mearns, C. Goodale, A. Hart, D. Crichton, S. McGinnis, M. Boustani, H. Lee, P. C. Loikith and M. Boustani. Evaluation of the surface climatology over the conterminous United States in the North American Regional Climate Change Assessment Program hindcast experiment using regional climate model evaluation system. Journal of Climate, Volume 26 Issue 15 (August 2013). http://dx.doi.org/10.1175/JCLI-D-12-00452.1
9. J. Kim, D. E. Waliser, C. Mattmann, C. Goodale, A. Hart, P. Zimdars, D. Crichton, C. Jones, G. Nikulin, B. Hewitson, C. Jack, C. Lennard and A. Favre. Evaluation of the CORDEX-Africa multi-RCM Hindcast: Systematic Model Errors. Climate Dynamics, pp. 1-14, April 2013.
10. L. Cinquini, D. Crichton, C. Mattmann, J. Harney, G. Shipman, F. Wang, R. Ananthakrishnan, N. Miller, S. Denvil, M. Morgan, Z. Pobre, G. M. Bell, C. Doutriaux, R. Drach, D. Williams, P. Kershaw, S. Pascoe, E. Gonzalez, S. Fiore, R. Schweitzer. The Earth System Grid Federation: An Open Infrastructure for Access to Distributed Geospatial Data. Future Generation Computer Systems – Special Issue on Best Papers of eScience 2012, available online 17 September 2013, ISSN 0167-739X. http://www.sciencedirect.com/science/article/pii/S0167739X13001477
11. K. Whitehall, C. Mattmann, D. Waliser, J. Kim, C. Goodale, A. Hart, P. Ramirez, P. Zimdars, D. Crichton, G. Jenkins, C. Jones, G. Asrar, B. Hewitson. Building Model Evaluation and Decision Support Capacity for CORDEX. WMO Bulletin, Vol. 61, No. 2, pp. 29-34, 2012. http://www.wmo.int/pages/publications/bulletin_en/61_2_cordex_en.html
12. C. Mattmann and J. Zitting. Tika in Action. Manning Publications, November 2011.