ESC Summer 2011 Notes

From Federation of Earth Science Information Partners

Session 4.3: ESIP Earth Science Collaboratory

  • Winter Meeting focused on collaboration
  • Quick intro to Earth Science Collaboratory
    • What is an Earth Science Collaboratory?
      • rich environment, including data, software, models, development environments
    • Situation today
      • make all of the connections between Earth Science projects
    • Architecture
      • Tools for sharing
      • Data Library
      • Workflow Library
      • Workflow Results (Lab Notebook)
      • CI Required: Social Collaboration, Security, Cloud, CMS, etc.
      • Also a Mediator component to manage between components
    • ESC beyond current capabilities
      • More tools support more datasets (identify and fix mismatches)
      • More tools to more users
      • Transparency of provenance
      • "wikihow" for data processing (watch how experts move data through the pipeline)
  • Kuo - "My Own Experience"
    • Mismatch between technologist tools and scientist needs
    • Goals: Understand processes in Earth system, advance predictive power
    • Based mostly on Remote Sensing
    • Problem of "Multiple Solutions"
      • Attempted Solution: Get more precise data
      • Non-spherical particle example
        • Needed to use both data and models to describe real world
      • "Data Fusion"
      • Parallax Correction
      • "Data Deluge" problem
        • too many data and too many kinds
        • backup, storage, organization
        • complexity of algorithms is also hard to manage
      • Massive volume gives...
        • Questionable correctness (bugs, etc.)
      • Environment will...
        • Guarantee correctness
        • Guarantee reproducibility
        • Boost productivity
  • Brian Wi - NEON - Scientific Assessments
    • Assessments done at agency level for specific issues
      • EPA water quality on the nation's lakes
      • water use/quality related to agriculture
    • ESC relevance?
      • Generating data alone is not enough
      • managing data and publishing is not enough
      • knowledge use in decision-making is the real measure
    • Seeks direct connections between data and ARS/USDA policies (put tools in place)
    • Flow of science to policy
      • Congress gives money to science
      • Cooperative research
      • Issue Identification
      • Policy Development
      • Program implementation
    • Expose policy makers to the workflows
      • a "sandbox" copy to play with assumptions and parameters
      • increases trust of assessment
    • "Farm bill" example
    • Bottom line - get the data and models directly to the policy makers and resource managers
  • Chris Lynnes - "Novice researcher perspective"
    • Stu needs to write a Master's thesis
    • Use Case: "Find out why MODIS Aqua and Terra aerosols are anticorrelated over Tibet"
    • Stu has to bootstrap his own learning
    • Earth Science Collaboratory has Google results for use case
      • From ESC, can find related journal articles, look at research notebook, and rerun analysis
    • Decides he needs to look at level 2 data rather than level 3 data
      • How should he get started with complicated level 2 data?
    • Looks for MODIS level 2 articles and gets a view of how others are using it
    • Stu can clone the workflow
    • Looks for a tool that computes coincidence
    • Finds a scatter plot tool and an HDF -> NetCDF converter
    • Circumvents a lot of tedious data tasks (leverages a lot of previous tools for the same data)
    • Stu shares level 2 correlation with the community
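The coincidence and correlation steps in Stu's use case might look roughly like the sketch below. It is only an illustration under stated assumptions: the `Retrieval` record, its field names, and the 0.5 degree / 3 hour matching window are hypothetical and were not part of the session, and real MODIS level 2 granules would first have to be read from HDF.

```python
import math
from dataclasses import dataclass


@dataclass
class Retrieval:
    """One level 2 aerosol retrieval (hypothetical, simplified record)."""
    lat: float   # degrees north
    lon: float   # degrees east
    hour: float  # hours since an arbitrary reference time
    aod: float   # aerosol optical depth


def coincident_pairs(a, b, max_deg=0.5, max_hours=3.0):
    """Pair each retrieval in `a` with the nearest retrieval in `b`
    that lies inside the space/time window.

    Distance is measured in plain degrees and ignores longitude
    wrap-around at +/-180, which is acceptable for a small region.
    """
    pairs = []
    for ra in a:
        best, best_dist = None, None
        for rb in b:
            dlat = abs(ra.lat - rb.lat)
            dlon = abs(ra.lon - rb.lon)
            if dlat > max_deg or dlon > max_deg:
                continue
            if abs(ra.hour - rb.hour) > max_hours:
                continue
            dist = math.hypot(dlat, dlon)
            if best_dist is None or dist < best_dist:
                best, best_dist = rb, dist
        if best is not None:
            pairs.append((ra.aod, best.aod))
    return pairs


def pearson(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two coincident pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Made-up values near Tibet, for illustration only.
terra = [Retrieval(30.0, 90.0, 5.0, 0.1),
         Retrieval(31.0, 91.0, 5.0, 0.2),
         Retrieval(32.0, 92.0, 5.0, 0.3)]
aqua = [Retrieval(30.1, 90.1, 7.5, 0.3),
        Retrieval(31.1, 91.1, 7.5, 0.2),
        Retrieval(32.1, 92.1, 7.5, 0.1),
        Retrieval(50.0, 50.0, 7.5, 0.9)]  # outside the window, never matched

pairs = coincident_pairs(terra, aqua)
r = pearson(pairs)  # negative for this made-up, anticorrelated sample
```

In the scenario described above, Stu would not write this himself; he would find an existing coincidence tool in the ESC and clone the workflow around it.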
  • Input from group:
    • Implementation Strategies
      • Direct Funding
        • There may not be a single funding source for this
        • Recommend ESC as a program to agencies
          • NASA, NOAA, EPA, USGS are candidates
          • Individually funded, maybe not, but together...?
          • NSF Earth Cube looks a lot like ESC (also Tim Killeen)
          • Agency members can't do as much with NSF as academics
          • How to enable interactions between academics and feds, contractors, etc.
  • Frank: 5-7 year time frame is too long.
    • Chris: Incremental process, progress will be seen each year
    • Chris: Other estimates?
  • Eric (NGDC): Parts are already pretty far along, including virtual observatories, already being done with space data. Do we need to look at who's got what already?
    • Chris: Distinguish between program that pulls things together and programs that develop.
    • Chris: collaboratory needs to be open to integration with other collaborative frameworks
  • Kevin: primary qualities - develop trust, how do you share trust?
  • Chris M.: How do you incentivize participation? Lots of portals being developed, scope the collaboratory within its funding.
  • DataNet: how does data extend beyond the grant? Now: how does the software extend beyond the grant? Open source, etc.?
  • Paulo: Earth Science Domain Scenario for collaboratory. Does the infrastructure have to be Earth Science specific? Is funding more available to a more general approach (such as OCI)?
    • Chris: The stakeholders are domain specific, which is directly tied to the funding.
  • Andy Maffei: Clarify more about Earth Cube. Looking to the community to define what will be developed. Effort is to combine CI components for geosciences, build the existing components into something bigger. Upcoming webinar and website for Earth Cube. ESIP should look at tying into Earth Cube and grabbing some of its funding.
    • Chris: If Earth Cube needs direction for what's being built, ESIP can provide that direction. Need to seize the Earth Cube opportunity now.
  • Mark Parsons: Go back to trust; "build it and they will come" is risky. What are some success stories of collaboratories? Carole Goble: "create a workflow, get a t-shirt". How do we ensure that a critical mass is reached?
    • Chris: "Reputation scoring", points/badges as workflows and tools are committed.
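A minimal sketch of what such reputation scoring could look like follows. The point values, badge thresholds, and event names are all hypothetical; nothing this specific was proposed in the session.

```python
from collections import Counter

# Hypothetical point values per kind of contribution.
POINTS = {"workflow": 10, "tool": 5}

# Hypothetical badge thresholds, highest first.
BADGES = [(50, "gold"), (20, "silver"), (5, "bronze")]


def score(events):
    """Sum points per user from (user, kind) contribution events.

    Unknown kinds earn zero points rather than raising an error.
    """
    totals = Counter()
    for user, kind in events:
        totals[user] += POINTS.get(kind, 0)
    return dict(totals)


def badge(points):
    """Return the highest badge whose threshold is met, or None."""
    for threshold, name in BADGES:
        if points >= threshold:
            return name
    return None


events = [("stu", "workflow"), ("stu", "tool"),
          ("stu", "workflow"), ("ann", "tool")]
scores = score(events)  # {'stu': 25, 'ann': 5}
badges = {user: badge(pts) for user, pts in scores.items()}
```

Tying points to committed workflows and tools, rather than to logins or page views, would also fit Kuo's later point that scientists share when they get credit.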
  • Sky Bristol: USGS looking for same solutions. Co-chairing a ten year strategy. Bringing in more specific projects to a broader data integration framework. Collaboratory is on the same track as USGS.
    • Chris: How can we get agencies at agency level to contribute? How can we get 5 groups together - a future ESIP meeting?
  • Kuo: how to incentivise - scientists don't mind sharing as long as they get the credit. In this regard provenance will be critical.
  • Chris: contributing to ESC could be something equivalent to publications. Conference proceedings could include links to ESC.
  • NASA Ames has set up NEX, which is something similar to ESC
    • Chris: these examples could be used to exemplify needs for ESC. ResearchGate is similar but stops at the citation level.
  • Hook: should we build a plan from all the discussion today? Technologies that could build an ESC exist, but not in harmony. Should we build a roadmap? This would likely include four areas:
    1. science community (cyberinfrastructure)
    2. user stories (use cases)
    3. social (trust, incentive, social science issues)
    4. programmatic issues

    • Chris: Cluster should focus on these four areas and identify a plan and possible speakers for winter meeting
  • Greg: it would be nice to create an inventory of existing modules and technologies - gap analysis
    • Chris: it's not only the components, but also how to properly connect components
    • Frew: the what is more important than the how. we need more discussion of how this will be used. He read this as an educational tool. ESC could also be a way to bring students up to date with large bodies of existing knowledge.
    • Chris: we need volunteers for these stories
  • Paulo: the question here is not really related to technology. Is there evidence that scientists would be willing to use such a system? Trust and incentives are very important issues. Forget about technology and think about mode of operation.
    • Chris: there is a lot of variation amongst the various domains regarding sharing. Chris believes that Earth science leans toward sharing and that some calls require sharing. There need to be incentives beyond what is available now, but there may be enough to get started now.
  • Mark Parsons: is this a U.S. based and ESIP focused project?
    • Chris: it should be started within ESIP and then expanded
  • Kuo: success depends on modeling how we collaborate now. A number of scientists share first within a closed circle and then within a larger community. ESC needs to model this in order to be successful.
    • Chris: we need a user story on how a project is shared. ESC needs to make the process easy in order to survive.
  • cyberinfrastructure - if ESC involves running algorithms and models then there is an expectation of quality of service and availability of resources
    • Chris: this is expected to be done via cloud computing but details still need to be worked out. Cost analysis is still an outstanding issue. Ideally, part of the process could be done on desktop and parts on the cloud. This should be transparent to the user.
  • Richard: this process appears inevitable regardless of what ESIP/ESC does
    • Chris: these ideas have been discussed for some time and there is some acceptance, but it still won't happen on its own. We need to make it happen. Current funding opportunities focus on development of components, but there is little incentive to develop integration of components.
  • Hook: there are issues related to single sign-on across multiple agencies. Incentives must be integrated and coordinated across agencies.
  • Chris: existing social tools (e.g. Yammer, MyExperiment.org, Google+) should be explored to see how far they can be taken. In order to proceed, ESC must become a more formal cluster with monthly telecons. Visit the wiki and join the mailing list. A poll will follow to set up telecon possibilities.

      • Indirect or Minimalist Funding
        • More User Stories