Data.gov Concept of Operations

To lay out the overall strategic intent, operational overview (“as is”), future conceptual architecture (“to be”), and agency next steps [for Data.gov].

Focus on Access
Data.gov is designed to increase access to government data as close to the authoritative source as possible. The goal is to strengthen our democratic institutions through a transparent, collaborative and participatory platform while fostering development of innovative applications (e.g., visualizations, mash‐ups) and analysis by third parties. Policy analysts, researchers, application developers, non‐profit organizations, entrepreneurs and the general public should have numerous resources for accessing, understanding and using the vast array of government datasets.

Open Platform
Data.gov will use a modular architecture with application programming interfaces (APIs) to facilitate shared services for agencies and enable the development of third‐party tools. The architecture, APIs, and services will evolve based on public and agency input.

Disaggregation of Data
Data should be disaggregated from agency reports, tools or visualizations to enable direct access to the underlying data.

Grow and Improve through User Feedback
Feedback should be used to identify and characterize high‐value data sets, set priorities for integration of new and existing data sets and agency‐provided applications, and drive priorities and plans to improve the usability of disseminated data and applications.

Program Responsibility
Agency program executives and data stewards are responsible for ensuring information quality, providing context and meaning for data, protecting privacy and assuring information security. Agencies are also responsible for establishing effective data and information management, dissemination, and sharing policies, processes and activities consistent with Federal policies and guidelines.
Rapid Integration
Agencies should rapidly integrate current and new data into Data.gov with sufficient documentation to allow the public to determine fitness for use in the targeted context.

Embrace, Scale and Drive Best Practices
Data.gov will implement, enhance and propagate best practices for data and information management, sharing and dissemination across agencies, with our state, local and tribal partners as well as internationally.

1 Federal Data
Allow members of the public to leverage Federal data for robust discovery of information, knowledge and innovation.

General Public
The general public can use the platform to download datasets. The general public can also discover and access Federal data via third‐party visualizations, applications, tools or data infrastructure.

Application Developers
Application developers can develop and deliver applications by leveraging the raw data, APIs or other methods of data delivery.

Government Mission Owners
Mission owners can expand access to and leverage data from their public sector partners to enhance service delivery, drive performance outcomes and effectively manage government resources.

Data Infrastructure Developers
Data infrastructure developers can increase the utility of Data.gov by enhancing its search capability, metadata catalog processes, data interoperability and ongoing evolution.

Research Community
The research community can help unlock the value of multiple datasets by providing insight on a plethora of research topics.

Data Infrastructure Innovators
Existing entities and new ventures developing innovative data and application offerings that combine public sector data with their own data.

1.1 Raw Data Catalog
Incorporate a "raw" data catalog that features instant view/download of platform‐independent, machine‐readable data in a variety of formats.
Note: The term “raw data” is used within the Data.gov context to mean data that are in a format that allows manipulation and are disaggregated to the lowest level consistent with maintaining privacy, confidentiality, and national security.

1.2 Tool Catalog
Incorporate a tool catalog that provides the public with simple, application‐driven access to Federal data with hyperlinks. This catalog features widgets and data‐mining and extraction tools, applications, and other services.

1.3 Geodata Catalog
Incorporate a geodata catalog that includes trusted, authoritative, Federal geospatial data. This catalog includes datasets and tools and employs a separate search mechanism.

Making Federal data more transparent has many benefits, including the potential to maximize the return on investments in collecting and managing the data themselves by transcending agency stovepipes, encouraging data to be disseminated in reusable and interoperable formats, and facilitating enhanced search abilities. As was the case for the Human Genome Project, releasing datasets beyond the walls of government allows for expanded public access, facilitating creativity and ingenuity. Understanding the potential value of Data.gov rests with considering the nature and quantity of the Federal data themselves. For example, Performance and Accountability Reports (PAR) are currently published by agencies in a document‐centric report. While PARs are of value to students of government performance, the reports are not standardized, and for the most part the underlying data are programmatically inaccessible – making it difficult and effort intensive to do additional analysis on the provided information, much less look at cross‐agency trends and performance.
In the future, standard reports such as the PAR could separate and publish via Data.gov the underlying data. This vision of unbundling the finished report from the underlying data, and potentially augmenting or replacing the traditional document‐centric report with data visualizations or web applications, can be extended across many other classes of Government reports. Further, many opportunities exist for adding value, such as exploring more timely release of in‐process data assets rather than accumulating, processing, and disseminating data on longer, agency‐centric timelines. In particular, more timely release of data would support more timely third‐party analysis and have the potential to empower more proactive public‐initiated dialog.

2 Delivery Channel
[Provide] a delivery channel to enable agencies to make their data more accessible, discoverable, comprehensible, and usable.

Agency Administrators
Agency administrators, in support of enterprise transparency, should direct their program offices and CIO to jointly coordinate and support Data.gov requirements.

Data Stewards
Agencies are encouraged to vet Data.gov requirements through a Data Stewards’ Advisory Group or equivalent internal organization, whose participants would represent each of the agency’s program areas. This will have the effect of empowering individuals agency‐wide who are most familiar with potential datasets that could be made ready for public dissemination.
* The data steward has the responsibility for documenting the agency’s data using the Data.gov metadata template, and the POC should help coordinate the exposure of this metadata to Data.gov in one of the several ways detailed in this Concept of Operations document.
* The data steward is responsible for ensuring that the data are compliant with information quality guidelines and other applicable information dissemination requirements, that the corresponding metadata are compliant with the Data.gov requirements and are complete, and that the data are available online through the agency’s website.

Agency Program Offices
Agency program offices are the source of the data that are posted to Data.gov. The term ‘data steward’ is used to refer to the agency staff who are directly responsible for managing a particular dataset; this role tends to be carried out by program offices within the context of their particular missions.
* Program offices are responsible for determining which data and tools are suitable to be posted on Data.gov, being mindful of the significance of exposing data through Data.gov in terms of the authoritative and high quality nature of data being included in a high profile Presidential Initiative.
* Program offices retain the right and responsibility for managing their own data and providing adequate technical documentation.
* Agency program offices are responsible for ensuring that the data stewards for a particular data asset complete the required metadata for each dataset or tool to be publicized via Data.gov.
* Agency program offices should facilitate Data.gov POCs’ and CIOs’ efforts to understand and catalog data assets, as indicated below.
* Agency program offices, in conjunction with Data.gov POCs and CIOs, are responsible for ensuring that their data assets are consistent with their statutory responsibilities within the context of information dissemination, including those related to information quality, security, accessibility, privacy, and confidentiality.

Agency CIOs
Agency CIOs, in conjunction with program partners, are responsible for cataloging and understanding their data assets, establishing authoritative sources, and ensuring the high quality of data.
Agencies are encouraged to engage their enterprise architecture programs to formally catalog their data assets, determine which sources are authoritative, and evaluate adherence to information quality guidelines. Agencies are encouraged to leverage the Federal Data Reference Model, which provides agencies with assistance to:
* Identify how information and data are created, maintained, accessed, and used;
* Define agency data and describe relationships among mission and program performance and information resources to improve the efficiency of mission performance;
* Define data and describe relationships among data elements used in the agency’s information systems and related information systems of other agencies, state and local governments, and the private sector.

* Agency CIOs and program offices have the responsibility of ensuring that authoritative data sources are made available in formats that are platform independent and machine readable. Agency enterprise architecture programs should promote the publication of web services, linked open data, and general machine‐readable formats such as XML.
* Agency CIOs have the responsibility for assigning an overall Data.gov point of contact (POC) for their agency. The agency’s Data.gov POC is responsible for ensuring that the requested documentation accompanies all datasets posted to Data.gov.

Agency POCs
* The POC is responsible for training data stewards as to the importance of the metadata template that accompanies a Data.gov submission as well as how to complete this template.
* The POC is responsible for understanding Data.gov processes, Data.gov metadata requirements, and compliance requirements for coordinating data submissions for the agency.
The Data.gov POC role is expected to evolve as each agency’s dissemination processes mature and the Data Management System improves (as discussed in the next section).
Initially, the role and responsibilities of these POCs are as follows:
* POCs are responsible for coordinating an internal (agency) process for working with the program offices to ensure the identification and evaluation of data for inclusion in Data.gov. Such a process must include: a) screening for security, privacy, accessibility, confidentiality, and other risks and sensitivities; b) adherence to the agency’s Information Quality Guidelines; c) appropriate certification and accreditation (C&A); and d) signoff by the program office responsible for the data.
* The POC is also responsible for facilitating feedback to Data.gov from the data stewards regarding improvements to the metadata requirements, including recommendations that generate taxonomies to facilitate interoperability.

Senior Advisory Group
The Executive branch hosts many interagency efforts that focus on missions associated with data policy, information management, and enhancing the dissemination of digital data. Representatives of many of these formal communities of interest have been recruited to serve on a council of senior advisors to Data.gov, including:
* The Chief Information Officers’ (CIO) Council, which includes the CIOs from all Chief Financial Officer‐Act departments and agencies
* The Interagency Council on Statistical Policy (ICSP), which includes representation from 14 principal Federal statistical agencies
* The Federal Geographic Data Committee (FGDC), an interagency committee that promotes the coordinated development, use, sharing, and dissemination of geospatial data on a national basis
* The Commerce, Energy, NASA, and Defense Information Managers Group (CENDI), an interagency working group of senior scientific and technical information (STI) managers from 13 Federal agencies
* The Interagency Working Group on Digital Data (IWGDD), coordinated by the Office of Science and Technology Policy (OSTP)
* The Networking and Information Technology Research and Development (NITRD) Program,
the Nation's primary source of federally funded revolutionary breakthroughs in advanced information technologies such as computing, networking, and software.

Their input serves both to ensure alignment across the Executive branch with respect to modernizing information dissemination and to motivate their colleagues in their home agencies to become more active in making their data assets available to Data.gov. Specifics on the Advisory Group include:
* The Advisory Group is designed to be a vehicle for cross‐government fertilization with respect to information policy and transparency opportunities raised by the Data.gov initiative, as well as a mechanism for mobilizing support for and implementing the transformational goals of Data.gov. Furthermore, the Senior Advisory Group helps the Data.gov team understand and establish models, frameworks, and technology support for performance measurement and management around information quality, dissemination, and transparency.
* The Advisory Group is one of several vehicles that the Office of Management and Budget (OMB) uses to obtain advice on the direction of Data.gov. OMB also receives advice from the Project Management Office at GSA, the CIO‐appointed agency POCs, representatives of related websites described in Section 4, as well as from technology and information policy leaders outside the government, and Data.gov users.
* The Advisory Group is composed wholly of senior‐level Executive Branch (government) employees. Participants are representatives of existing, formally chartered Executive Branch communities of interest around information policy and transparency. Additional communities of interest may be nominated for potential representation on the Council; OMB will determine the goodness of fit and offer invitations as appropriate.
* In addition to representing the perspective of their Federal community of interest, Advisory Group participants will be asked to speak from the perspective of agency data stewards.
* The Advisory Group is not a decision‐making or voting body, and consensus will not be sought.
* The Advisory Group is co‐led by OMB’s Office of Information and Regulatory Affairs and OMB’s Office of E‐government and Information Technology.

The Advisory Group:
* Encourages the development and implementation of a unified vision for achieving data interoperability and other efforts to modernize Executive Branch data dissemination and sharing.
* Provides OMB with a forum for working interactively with senior program executives, agency data stewards, and others responsible for the generation and dissemination of data accessible through Data.gov.
* Provides feedback to OMB on the potential impact of Data.gov proposals on agency and interagency data generation and dissemination programs.

Subject Matter Expert Technical Working Groups
Data.gov will harness the interests and expertise of staff across the Government when it stands up technical working groups designed to develop approaches to modernizing and streamlining data formats and structures to allow linking, tagging, and crawling. For instance, the Data.gov team will draw on expertise from across the Government to provide advice regarding the best approaches to publishing metadata that facilitate encoding meaning into datasets in such a way that they are directly interpretable by computers and that strengthen the interoperability of Federal datasets.

Federal Communities of Interest/Information Portals
Other interagency efforts, such as Science.gov and Fedstats.gov, are focused on serving distinct user communities and making information easier to find and more useful for those communities. These Federal communities of interest often disseminate their data through information portals. Increasingly, these sites will have the opportunity to mimic the design patterns of Data.gov, including the metadata template, catalog capabilities, and end‐user search and feedback capabilities.
Federal communities of interest that offer these information portals to the public are encouraged to first standardize a metadata taxonomy or syntax to be shared with Data.gov, and then communicate any changes to it as the community evolves the standard. These communities of interest are encouraged to expose corresponding data either as downloadable data, query points, or tools. In this way, the information portals provided by these Federal communities of interest will become more standardized in how their data are maintained and shared, and these information portals will become networked to Data.gov to allow for maximum visibility, discoverability, understanding and usefulness of the data.

States, Localities, and Tribes
Although Data.gov does not catalog state, local, or tribal datasets, there is a shared benefit in cross‐promoting efforts to catalog and make non‐Federal data assets more transparent. State, local, and tribal governments are encouraged to leverage the thoughts, ideas, and patterns used by Data.gov to develop their own Data.gov‐style solutions. They are also encouraged to inform the Data.gov team of their own implementations so that Data.gov can link to those specific sites. Furthermore, state, local, and tribal governments are encouraged to innovate new and interesting ways of cataloging, presenting, searching, and visualizing their data, and to find more interactive and elegant ways of interacting directly with the public. As innovations are implemented, non‐Federal governments are encouraged to share these breakthroughs with the Data.gov team for potential use on Data.gov.
International Standardization Organizations
Key stakeholders in the development and continual improvement of Data.gov will include the relevant bodies that set international standards related to Data.gov’s processes – including the World Wide Web Consortium, the International Organization for Standardization, and the National Institute of Standards and Technology – or that maintain the standards themselves, such as Dublin Core. As Data.gov evolves, it may be necessary to build upon or add to these standards, and even make recommendations to these organizations to update their standards. Data.gov will look to the expert domain communities within the Federal government to work with these organizations and recommend adjustments to Data.gov’s processes and metadata requirements.

2.1 Data Availability
Evaluate agency participation based upon the quantity of data that agencies make available through Data.gov.
Agency participation will be evaluated based upon the quantity of data that agencies make available through Data.gov, relative to the total amount of data that each agency has eligible for such access (i.e., data whose release does not compromise privacy, confidentiality, security or other policy concerns). Agencies should prioritize information dissemination efforts to accelerate dissemination of high‐value data sets in accord with mission imperatives, agency strategy, and public demand.

2.2 Data Usage
Evaluate usage based upon the number of Data.gov page views, downloads of data, API calls, success in facilitating innovation as demonstrated through the number and scope of third‐party applications, feedback from users, and intra‐governmental collaboration on Data.gov‐related content.
Usage metrics include the number of Data.gov page views, downloads of data, API calls, success in facilitating innovation as demonstrated through the number and scope of third‐party applications, feedback from users, and intra‐governmental collaboration on Data.gov‐related content. Success in facilitating innovation could be measured through proxies including the diversity of use of Data.gov’s content and Data.gov’s propagation. Diversity of use will be ranked by the average number of datasets used by externally produced applications or tools, and the number of registered and active third‐party applications. User feedback metrics could measure the volume and sentiment of feedback, both overall and on a per‐dataset basis.

2.3 Data Usability
Evaluate usability based upon how clearly and completely the strengths and weaknesses of agency data are conveyed through identification and descriptive information, and technical documentation.
Usability will be measured by how clearly and completely the strengths and weaknesses of agency data are conveyed through identification and descriptive information, and technical documentation. This will be measured via the completeness of the structured data provided and maintained by agencies that represent integrated datasets in the Data.gov platform. Additional metrics include: how detailed the key words that feed the search are; the degree to which semantic web approaches as described in Chapter 3 are used; and user dataset scoring.

Since most agencies have information dissemination as part of their mission, Data.gov is a key component for improved mission delivery. It is a delivery channel to enable agencies to make their data more accessible, discoverable, comprehensible, and usable.
As such, agencies may choose to use Data.gov as their primary means of information dissemination to the public and forego the need to maintain their current processes for publishing their data. Specifically, agencies can use Data.gov not only to store their metadata via the Data.gov metadata storage shared service, but also to forego management of their own data storage infrastructure by leveraging what will become the data storage shared service described in Chapter 3.

As additional data and tools are made available through Data.gov, and as improvements are made to metadata and data quality and to search, discovery, and access tools, it will become an important resource to user groups, leading in turn to greater visibility and use of data. By this logic, the benefit to participating agencies increases as additional agencies begin to participate more actively. In this manner, agencies have a vested interest in not only their own active participation, but also the active participation of their peer agencies.

From an individual agency perspective, one value proposition of Data.gov is that it gives high visibility to data that the agency wants to share with the public. As illustrated by an example in Figure 4, Data.gov includes revolving panels of “featured tools and datasets” that provide even more visibility and showcase high‐quality data and tools provided by the Federal government. These featured tools and datasets are rotated on a regular basis to keep the content fresh and representative of the many topical areas within the purview of the Federal government.

Data.gov assists agencies with their information dissemination requirements. It also provides agencies with a new and important public feedback mechanism. As Data.gov continues to evolve, agencies will be provided with new and more robust ways to obtain feedback directly from the end users of their authoritative data.
For instance, the public can provide specific narrative feedback on published data and tools. Agencies that actively participate in Data.gov not only share their data more widely, but also increase the public’s awareness of their work in key mission areas. Active participation in Data.gov increases overall visibility and can engender greater trust and appreciation for agency missions, their roles, and their overall performance in the service of the country. Transparency of agency data provides the public with the ability ‐‐ either through government tools, third‐party web applications, or other means ‐‐ to understand their government and its impact on their lives, and to hold it accountable. This transparency can also translate into the discovery and implementation of collaborative initiatives with other Federal organizations.

Measuring Success
Data.gov will measure its success based upon primary and secondary metrics. The three primary metrics are: (1) cross‐agency participation, (2) use of disseminated data, and (3) the usability of the data available through Data.gov. All primary metrics will be recorded over time and displayed on the performance dashboard tool discussed in Chapter 3, Future Conceptual Solution Architecture. The initial secondary performance metrics are many and are included in “Appendix A – Detailed Metrics for Measuring Success”.

3 Future Conceptual Solution Architecture

3.1 Security, Privacy and Personally Identifiable Information
Ensure that Data.gov does not compromise privacy or confidentiality.
The Administration is committed to ensuring that Data.gov does not compromise privacy or confidentiality. Specifically, agencies that make their data available online must ensure that the data does not include personally identifiable information or in any way compromise law and policy.
In addition, the Data.gov team will enhance and extend working groups under the Senior Advisory Group to learn from third‐party use of Federal data, with an eye towards better understanding the effect of new mash‐ups and applications on intentionally or unintentionally unmasking sensitive personally identifiable information and/or creating security‐related issues. These working groups will recommend enhancements to policy and guidance as needed.

Privacy considerations extend beyond the data itself to include the way that Data.gov measures performance and gathers feedback from the site. All feedback provided through Data.gov is anonymous, with no tracking or identifier information captured. Furthermore, performance statistics will be gathered as specified within this document. Performance measures will be specifically targeted at macro use statistics without any identification of specific uses of specific data by individuals or groups.

3.2 Core Modules
Include six core modules: (1) the website, (2) the DMS, (3) the metadata catalog, (4) a performance tracking and analysis engine, (5) an audit tool, and (6) a hosting service.
As depicted in the following visual, Data.gov’s future architecture will include six core modules: (1) the website, (2) the DMS, (3) the metadata catalog, (4) a performance tracking and analysis engine, (5) an audit tool, and (6) a hosting service. The architecture will also utilize at least four data infrastructure tools: collaboration, feedback, agency and site performance dashboards, and search‐related tools. The modules and tools will be made more accessible through a collection of application programming interfaces (APIs) that expose metadata and data. Together, these modules, tools, and APIs will allow Data.gov to adapt to its customer base as needed. Note that many of the capabilities outlined in Section 2, such as the Dataset Management System, are currently in use.
Where this is the case, they will be enhanced and extended. In other cases, for example the data infrastructure tools, the Data.gov team will partner with others to deliver the capability.

Module 1 – The Site
All citizens, technically inclined or not, can access the Data.gov website to discover structured data, otherwise known as datasets, published by the Federal government and download them to their local computers. To serve up these datasets, the Data.gov website accesses a catalog of records, with one record representing each dataset published to it. Data.gov visualization services could be delivered through the site and could include analytics, graphics, charting, and other ways of using the data. In many cases enhanced visualizations will be delivered by the Data.gov team or others as data infrastructure tools built on top of published APIs. These enhanced visualizations or other uses will in some cases be accessed via the Data.gov site and in others via external websites.

Another enhanced feature of Data.gov could allow customers to receive alerts on the availability of new datasets in a subject area in which they are interested. A variation of this would be alerts to developers related to changes or updates to datasets they use to power their applications. Alerting and notification could be implemented via a data infrastructure tool, via specific features added into core modules, or both. This seems to be an area where Data.gov should implement a basic capability and invite experimentation and innovation to identify opportunities for greater added value – data‐domain specific, in general, or in some unforeseen manner.

Module 2 – The Dataset Management System
The Dataset Management System (DMS) was recently unveiled to facilitate agencies’ efforts to organize and maintain their Data.gov submissions via a web‐based user interface. The Data.gov DMS provides agencies a self‐service process for publishing datasets into the Data.gov catalog.
The DMS is the approach of choice if an agency does not have its own metadata repository and does not have the resources to leverage the Data.gov metadata API or harvesting approaches. The DMS allows the originators to submit new datasets and review the status of previously submitted datasets. New datasets can be submitted either one at a time or several at a time. Once a dataset suggestion has been added to the DMS, its status can be tracked through the submission lifecycle. Agency POCs can access the DMS to view the entire published catalog, all published datasets and tools submitted by their agency, and a dashboard of all pending submissions. The DMS could, in the future, also disclose to the POCs compliance requirements that are not being met by the agency and its data stewards.

Module 3 – The Metadata Catalog
The Data.gov metadata catalog will evolve into a shared metadata storage service that allows agencies to utilize a metadata repository that is centralized in a Data.gov‐controlled host and use it for their own needs. Agencies that do not have metadata repositories of their own will be able to leverage Data.gov’s shared metadata repository as a service. So that agencies can leverage the shared metadata repository as an enterprise service, agencies will be able to flag which of their metadata they choose to share with the public via Data.gov versus those stored in the service but not exposed via Data.gov. Additionally, agencies will be able to designate whether their data contain personally identifiable information and whether the data adhere to information quality requirements. Figure 10 depicts the key components of a catalog record. It is important to understand that while these various components are drawn in separate boxes, they are actually all part of a single catalog record.
The four parts of a robust catalog record are:

* Catalog record header – this part holds the administrative book‐keeping elements of the overall record and all data needed to manage the target data resource, including ratings, comments and metrics about the resource.
* Data resource part – a data resource is the target data referred to by the catalog record. A data resource could be a dataset, result set or any new type of structured data pointed to by a catalog record.
* Data resource domain part – a data resource belongs to a domain or area of knowledge. The domain of a data resource has two basic parts: resource coverage and resource context. Resource coverage describes what the resource “covers”. Resource context is metadata about the environment that produced the data, including the production process.
* Related resources part – a structured data resource may have one or more resources related to it. For example, structured data may have images, web pages or other unstructured data (like policy documents) related to it. Additionally, as evidenced on the current site, a dataset may have related tools or tools that help visualize or manipulate the data.

Module 4 – Performance Tracking and Analysis Engine

Data.gov will include a performance tracking and analysis engine that will store Data.gov and wider Federal information on data dissemination performance. Data.gov related measures will be combined with Federal‐wide data dissemination measures to gain a better understanding of overall Federal data dissemination. Agencies will supply measures to Data.gov, and the total set of performance and measurement data will be made available to the public. A discussion of performance measures appears in the sections “Measuring Success” and “Appendix A – Detailed Metrics for Measuring Success”.
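The four‐part catalog record described under Module 3 can be sketched as a single structure. The following Python sketch is illustrative only; the field names and sample values are assumptions, not the actual Data.gov metadata schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class CatalogRecord:
    """Illustrative sketch of the four-part catalog record (field names are notional)."""
    # 1. Catalog record header: administrative book-keeping plus management data
    record_id: str
    ratings: List[int] = field(default_factory=list)
    comments: List[str] = field(default_factory=list)
    # 2. Data resource part: pointer to the target data resource
    resource_url: str = ""
    resource_type: str = "dataset"      # dataset, result set, ...
    # 3. Data resource domain part: coverage and context
    resource_coverage: str = ""         # what the resource "covers"
    resource_context: str = ""          # environment/process that produced the data
    # 4. Related resources part: documents, tools, visualizations
    related_resources: List[str] = field(default_factory=list)

record = CatalogRecord(
    record_id="epa-brownfields-2009",
    resource_url="https://example.gov/data/brownfields.csv",
    resource_coverage="Brownfield sites in the United States",
    resource_context="Compiled by EPA regional offices",
    related_resources=["https://example.gov/tools/brownfields-map"],
)
print(record.resource_type)
```

The key point the sketch captures is that ratings, coverage, context and related tools all live in one record, not in separate catalogs.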
Module 5 – Audit Tool

Over time, any organization can find that data have been published and exist in the public domain without active management or visibility inside of the organization. The Data.gov team may provide the expertise to assist agencies with identifying previously published data to support those agencies’ own processes for data management and potential publication to Data.gov. The Data.gov team is considering deploying a search agent to scan Federal government domains in order to provide data that will assist agencies in evaluating their data management practices and accelerate integration of already public data resources into Data.gov. The audit tool will prioritize delivery of a basic capability focused on identifying and characterizing already public data assets in a manner useful to agency POCs. It would scan through Federal domains, formulate an index of potential datasets and build reports to deliver to agencies. Associated reporting would provide some basis for the total population of data, provide intelligence to agencies on their potential data assets, and assist the data steward community with an assessment of what is currently exposed to the public. This is not intended to automatically populate Data.gov, but rather to assist agencies with their own data inventory, management, and publication processes. The result should be better, more granular agency plans to integrate their already public data sets into Data.gov; more efficient and lower cost data management and dissemination activities through leveraging reported data to jump‐start and validate data inventories; and an enhanced ability to develop a proactive understanding of agency compliance with information dissemination and related policy. Most importantly, through continuous measurement the audit tool provides timely and actionable management data to agencies and makes their progress with integration into Data.gov transparent.
Module 6 – Shared Hosting Services

Data.gov will implement a shared data storage service for use by agencies. This service will be accessible via APIs and will provide agencies with a cost effective mechanism for storing data that will be made available to the public. The data stored within the service will be made available via feeds and APIs so that the application development community can receive direct enablement from Data.gov. Providing data in the right format is as critical as providing the data themselves. For instance, the shared hosting service could be used to provide data using query points such as RESTful web services, web queries, application programming interfaces, or bulk downloads. Data can be made more useful through these services and by extending the metadata template to include data‐type specific or domain‐specific elements in addition to the core ‘fitness for use’ metadata currently in the Data.gov metadata template. Agency use of query points drives value in some instances. For example, agencies using query points would be able to directly measure “run‐time” use of their data as opposed to just recording instances of data downloads. Also, given agency control over the query point, agencies would be able to better support access to the most current and correct versions of data resources as well as more clearly understand downstream use and value creation resulting from their data resources. Data storage and publishing (end user access) would be subject to metering of some sort, to be determined. Given the operational aspect of this module and the need to scale based on volume and end user usage of data, the Data.gov team will look to align fully with the Federal Cloud Computing Initiative and leverage its managed service focus for this module.
The core value proposition to agencies for using the shared hosting service is integration with the other modules, as well as alignment with the cloud initiative, which should reduce total costs and enable more efficient and effective realization of the full Data.gov value proposition.

3.3 The Data.gov APIs

[Enable developers to] interact with Data.gov through multiple Application Programming Interfaces (APIs).

Application Developers

Developers will interact with Data.gov through multiple Application Programming Interfaces (APIs), as shown below in Figure 11. The APIs will give programmatic access to the Data.gov catalog entries and the data within the shared data storage service. These APIs are a near‐term objective and are expected to be developed over the next six months. Specifically, the APIs will be both inbound and outbound. Inbound APIs will allow developers from within the Federal government to submit data or tools to the metadata catalog and submit actual data to the shared data storage service. These inbound APIs will be the most automated way, in the near term, to submit data to Data.gov. Outbound APIs will allow developers to leverage the data from the shared data service and the Data.gov metadata catalog to develop their solutions. Developers can build their own websites that leverage the Data.gov metadata catalog or can develop their applications using data from the shared data storage services. In these instances of developers leveraging the outbound APIs, the developers will be provided a means to submit usage statistics for their own applications and websites, as well as appropriate feedback, to help the government understand the overall usage and opportunities to improve the data being accessed via Data.gov. Additionally, the usage of the APIs will be metered and developers accessing those APIs will be required to register their use.
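As a sketch of how a developer might use an outbound (read) API, the following builds a hypothetical catalog query. The base URL, path and parameter names are invented for illustration, since the concrete endpoints are still to be defined; the required API key stands in for the registration and metering requirement described above.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical base URL -- the actual Data.gov endpoints are not yet published.
CATALOG_API = "https://api.data.gov/catalog"

def catalog_query_url(keyword: str, agency: Optional[str] = None,
                      api_key: str = "DEMO_KEY") -> str:
    """Build an outbound read query against the metadata catalog.

    The api_key parameter represents the registration/metering requirement;
    'q' and 'agency' are assumed query parameters, not a documented contract.
    """
    params = {"q": keyword, "api_key": api_key}
    if agency:
        params["agency"] = agency
    return CATALOG_API + "/records?" + urlencode(params)

print(catalog_query_url("brownfields", agency="EPA"))
```

A developer-facing client would layer retrieval and pagination on top of such URLs; the point here is only that outbound access is a keyed, metered, parameterized read path.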
Indeed, the metering will open opportunities for third‐party innovation and business models around un‐metered, fee‐for‐service, third‐party hosting and publication. As shown in the diagram above, there are three APIs developers use to access specific parts of the Data.gov architecture. The main API, called the DEV or developer API, provides read access to all records in the Data.gov catalog. This includes all components of a catalog record as discussed in the content architecture section below. To search for datasets and other related data (like unstructured data), the developer can use the search API, which will be developed to search across both Data.gov and USAsearch.gov. To access datasets stored in the Data.gov storage service, the developer will be able to use the data storage, or DS, API to retrieve them.

3.4 Data Infrastructure Tools

Include data infrastructure tools that will enhance the Data.gov experience.

The architecture for Data.gov includes data infrastructure tools that will enhance the Data.gov experience. Many of these data infrastructure tools will be developed not by Data.gov but by the expert communities that are most appropriate. For instance, the search data infrastructure tool will come from the work related to USAsearch.gov. The Data.gov architecture includes four data infrastructure tools as detailed below.

Data Infrastructure Tool 1 – Collaboration

Collaboration related tools will initiate and enable inter‐agency communication as mission owners explore and find other mission owners with similar goals and areas of responsibility. These tools will be built by agencies and to their own functional and business specifications. They will, however, have to adhere to Data.gov’s technical specifications so they may be properly hosted on (or accessible from) and utilized by the site.
Data Infrastructure Tool 2 – Feedback

Data.gov will be able to accommodate a variety of feedback tools as they are developed either internal to the government or by third parties. The feedback tools will allow the general public to engage more efficiently with the Federal government around Federal data sets, and the tools can be re‐used across Federal websites.

Data Infrastructure Tool 3 – Search

Since structured data, like data sets and query results (from query points, discussed previously), can be related to unstructured documents (like web pages indexed by USAsearch.gov), the Data.gov and USAsearch.gov teams are collaborating on an integrated search API and integrated search box widget, as depicted in Figure 12, that can federate search across both sites and return both structured and unstructured results. As depicted in Figure 12, a single integrated search widget can be shared across both the Data.gov and USAsearch.gov websites. This single search widget will use the same integrated search API that will search across both the Data.gov and USAsearch indexes (it could eventually be federated across other sites). The utilization of USAsearch will provide an economy of scale that would not otherwise be achieved had the project team gone about developing its own search capability internally. The user interface (UI) of the search page could look similar to the following figures: As indicated by the notional screenshots, Data.gov’s search page will display to each user the top queries by volume, the top queries by trend (rate of change), and a keyword or tag cloud on the right. Additionally, users will have the ability to browse through each community of interest’s specific taxonomy. In the near term, browsing Data.gov catalog holdings will be improved by the Data.gov technical working groups crafting a taxonomy (a hierarchical structure of topics) that allows users to drill down by topic area.
Data.gov’s search capability will be improved by adding an advanced search feature, end‐user tagging (known as a folksonomy) of datasets and the ability to “search inside” datasets for keywords. The advanced search feature will expand the number of data types that can be selected for search to include XML, RDF and all other formats contained in the catalog. Other advanced discovery mechanisms, including geographic searching, are being targeted for future releases. The Data.gov team will work with the FGDC to support their development of this capability for integration into the Data.gov solution. Geospatial search would allow an end‐user to draw a bounding box on a map, constrain the results by time or topic areas and then query the Data.gov catalog and visually return the hits that fall into that area. An example of this would be to display icons for any datasets on brownfields in a specific geographic area. In addition to Data.gov implementing this functionality directly, the team will explore expanding the APIs as appropriate to enable geospatial search by external websites. Figure 14 below is a notional illustration of the concept.

Data Infrastructure Tool 4 – Agency and Site Performance Dashboards

The agency and site performance dashboards will display the relevant metrics that are collected by the performance and analysis engine. As previously discussed, each agency will collect and share performance metric information with Data.gov through an automated process. This process will standardize the incoming performance data, and then load the data into a viewable dashboard environment that will be displayed to the public, Data.gov personnel, and agency personnel. The public‐facing performance dashboard will expose a limited subset of the performance metrics. The performance data will be re‐usable across Federal websites as well as by the public.
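Returning to the geospatial search capability described above, the bounding‐box query amounts to a spatial filter over the catalog. A minimal sketch, assuming each record carries a single representative coverage point (real geospatial records would carry a full footprint):

```python
from typing import List, NamedTuple

class BBox(NamedTuple):
    west: float
    south: float
    east: float
    north: float

class DatasetEntry(NamedTuple):
    name: str
    lon: float   # representative coverage point, invented for illustration
    lat: float

def geospatial_search(catalog: List[DatasetEntry], box: BBox) -> List[str]:
    """Return names of catalog entries whose coverage point falls inside the box."""
    return [
        entry.name
        for entry in catalog
        if box.west <= entry.lon <= box.east and box.south <= entry.lat <= box.north
    ]

catalog = [
    DatasetEntry("Brownfields - Northern Virginia", -77.4, 38.8),
    DatasetEntry("Air quality - Denver", -104.9, 39.7),
]
virginia_box = BBox(west=-78.5, south=38.0, east=-76.9, north=39.3)
print(geospatial_search(catalog, virginia_box))   # only the Virginia dataset matches
```

Time and topic constraints would simply add further predicates to the filter; the brownfields example in the text is this query with a topic constraint applied.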
3.5 Agency Publishing Mechanisms

An agency will have three mechanisms to publish metadata records to Data.gov. These three mechanisms are:

1. The Dataset Management System – this is a protected website only accessible by authorized users as described previously. This website will enable agencies to publish metadata records to the Data.gov catalog in accordance with the agency’s dissemination process.
2. A Publisher API – an application programming interface that will allow an agency to programmatically submit one or more records into the Data.gov catalog.
3. A Metadata Feed – if an agency desires to control publishing of metadata records on its own websites, the Data.gov PMO harvesting service will read the metadata feed and publish the records to the Data.gov catalog. The metadata feed will be a file in a standard feed format like Really Simple Syndication (RSS) or the Atom Syndication Format.

3.6 Semantic Web

Take an evolutionary approach to implementing semantic web techniques.

OMB Memorandum M‐06‐02, released on December 16, 2005, stated “when interchanging data among specific identifiable groups or disseminating significant information dissemination products, advance preparation, such as using formal information models, may be necessary to ensure effective interchange or dissemination”. OMB Memorandum M‐06‐02 further noted that “formal information models” would “unambiguously describe information or data for the purpose of enabling precise exchange between systems”. A good example of this is OMB’s Office of Information and Regulatory Affairs’ development, support, and use of formal statistical policy standards like the standards for data on Race and Ethnicity, Metropolitan Statistical Areas (MSA), and the North American Industry Classification System (NAICS).
Agencies can enable cross‐domain correlation between datasets by tagging datasets or fields in datasets as belonging to standard categories of such data standards. For example, let’s say a web‐savvy developer wants to create a mashup that visualizes and ranks various industries on revenue per employee. If one agency has published data on a designated industry’s revenue and another agency has published data on its employment, then these records could be correlated if both datasets are categorized via the standard NAICS codes to produce revenue per employee for the given industry. Through reuse of these semantically harmonized and uniquely identified categories across domains, the data from multiple sources can be appropriately merged and new insights achieved. The government has also produced several cross‐domain data models that can be leveraged to improve both semantic understanding and discoverability of government data sets. The National Information Exchange Model (NIEM) and the Universal Core (UCore) are two robust data models that are gaining traction, incorporating new domains and increasing information sharing across federal agencies, the Department of Defense and the Intelligence Community. The NIEM data model is designed in accordance with Resource Description Framework (RDF) principles and can generate an OWL representation. NIEM has extensive use across levels and domains of government. In particular, it has been endorsed by the National Association of State Chief Information Officers. The US Army has created the UCore‐Semantic Layer (SL) which is an OWL representation of the basic interrogative concepts (who, what, when, and where). These efforts are prime examples of the government’s ability and commitment to providing robust tagging and modeling mechanisms to improve discovery of, sharing of and eventually reasoning about federal data. 
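The revenue‐per‐employee mashup described above reduces to a join on the shared NAICS key. A minimal sketch, with invented figures standing in for the two agencies’ published datasets:

```python
# Two hypothetical agency datasets keyed by NAICS code (all figures invented).
revenue_by_naics = {"511210": 250_000_000, "311811": 12_000_000}          # annual revenue, USD
employment_by_naics = {"511210": 1_000, "311811": 400, "484110": 9_000}   # employees

def revenue_per_employee(revenue, employment):
    """Correlate the two datasets on the NAICS codes they share."""
    shared = revenue.keys() & employment.keys()
    return {code: revenue[code] / employment[code] for code in shared}

result = revenue_per_employee(revenue_by_naics, employment_by_naics)
print(result["511210"])   # 250000.0
```

Codes present in only one dataset (here 484110) drop out of the join, which is exactly why tagging both datasets with the same standard categories is the enabling step.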
Today’s “industry best practices” are more frequently grounded in semantic techniques that enable the semantic web and query points that the public can directly access (like Amazon Web Services). Under this model, it is the (formally coded) data concepts themselves that are cross‐linked, as opposed to just cross‐linked web pages. There is a push among some search engine companies to create standards for indicating certain kinds of metadata directly within web pages. Rich Snippets from Google and SearchMonkey from Yahoo are competing attempts (but with similar goals) to allow content developers to associate structured data with information shown on their websites. They currently support a variety of formats, including microformats and the Resource Description Framework (RDF). In accordance with the philosophy of OMB Memorandum M‐06‐02, and leveraging today’s mainstream “formal information model” capabilities, the evolution of Data.gov will include the incorporation of semantically enabled techniques within the sites and within the datasets themselves.

Semantic Web Techniques -- The semantic web has a simple value proposition: create a web of data instead of a web of documents. The “web of data” will be designed to be both human and machine readable. The core insight is that the same data can have distinct or overlapping meanings in different contexts. This is a core information technology problem and is manifest in applications such as cross‐boundary, cross‐domain information sharing, natural language processing, and enterprise data integration and business intelligence (i.e., mash‐ups, dashboards). An example of this ambiguity, drawn from WordNet, is depicted in Figure 17. Figure 17 shows how the word tank can have quite a few different meanings as both a verb and a noun. In some applications the context is implicitly understood and this is not an issue.
But as soon as two distinct data sets use the same label with distinct meanings, or the meanings overlap only partially, or the meanings are the same but that is hidden by distinct coding or syntactical issues, we introduce ambiguity and most likely defeat the purpose of combining the data sets in the first place. In order to create this web of data, the W3C and other standards groups have designed specific data modeling techniques to provide such machine readable precision via identification, relationships, advanced modeling and rules. Let’s briefly describe each technique and then demonstrate examples of this “curated” data approach. Unique and persistent identification of a unique concept is important to ensure unambiguous linking and the accrual of facts on a specific topic. For example, Sir Tim Berners‐Lee uses the identifier, http://www.w3.org/People/Berners‐Lee/, to identify himself and the people he knows using a Resource Description Framework (RDF) formatted data model called FOAF, for “Friend of a Friend”, as depicted in Figure 18. Unambiguously identifying all things in a domain is the key first step to enabling machine readable correlation and reasoning about those things. Additionally, by identifying something with a unique Uniform Resource Locator (a URL is a form of URI), one can retrieve a document that provides additional information about the topic and possibly equate it with other things that have been previously identified and are the “same as” this one. Once things are identified, formal relationships between things (and unique identifiers for those relationships) can be asserted. For example, also shown in Figure 18 is the FOAF relationship labeled “knows”, which is uniquely identified with the URI: http://xmlns.com/foaf/0.1/knows.
Semantic web modeling expands the traditional modeling techniques of Entity‐Relationship Diagrams (ERDs) and class modeling (as in the Unified Modeling Language or UML) to add powerful logical primitives like relationship characteristics and set theory. Among the most useful relationship characteristics are relationships that are “transitive” or “symmetric”. A transitive relationship is something like the genealogical relationship “has ancestor”, which is very important in deductive reasoning, as depicted in Figure 19. Additionally, as you can see in the figure, since Matthew “has an ancestor” named Peter and Peter “has an ancestor” named William, it holds that Matthew “has an ancestor” named William. A geographic example of a transitive relationship would be “encompasses”, as in “Virginia encompasses Prince William County and Prince William County encompasses Manassas”. A symmetric relationship is one that holds in both directions. For example, if Mary is “married to” Bill then Bill is “married to” Mary. One final advanced modeling technique is the ability to model types or classes of things using set theory primitives like disjointness, intersection and union. This is a very powerful technique for mathematically determining when a logical anomaly has occurred. For example, if a user has an alerting application that is scanning message traffic for the location of a violent criminal on the loose, he or she needs a precise model of a violent criminal as opposed to non‐violent criminals (as depicted in Figure 20), and a person cannot be both (or there is an anomaly). Additionally, to create these advanced domain models there are free tools, like Protégé at http://protege.stanford.edu, and many tutorials on the web to educate agencies on these topics. In conclusion, curation is the process of selecting, organizing and presenting the right items in a collection that best deliver a desired outcome.
Curation of data is preparing data so that they are more usable and more exploitable by more applications. In that light, the semantic web techniques previously discussed are the next logical step in the widespread curation of data. In particular, they represent a leading edge, potential best practice in Federal data management. A good example of the benefits of such curation is the Wolfram Alpha website (http://www.wolframalpha.com). Wolfram Alpha exclusively uses curated data in order to calculate meaningful results to queries. For example, returning to our crime scenario, a user could input to Wolfram Alpha “violent crime in Virginia/violent crime in the US” and it computes the information in Figure 21. Other benefits of using semantic web techniques include cross‐domain correlation, rule‐based alerting and robust anomaly detection. While out of scope for this document, it should be clear that increasing the fidelity of data increases its applicability to solving problems and increases its value to the Data.gov developer and end user.

The Semantic Web Roadmap -- Semantic web techniques are not yet widespread in the Federal government. Given our principle of program control, Data.gov takes an evolutionary approach to implementing these techniques. Such an evolution involves pilots, a piecemeal transition and a lot of education. The result will be to demonstrate the value proposition, establish end user demand, and empower data stewards to adopt semantic web techniques. In order to accelerate this evolution, an experimental semantic‐web‐driven site will be established as depicted in Figure 22. In addition to agency pilots, the semantic.Data.gov site will leverage lessons learned from the United Kingdom’s version of Data.gov (soon to be released), which will be built entirely on semantic web technologies.
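The identification and transitive‐relationship techniques discussed in this section can be sketched with plain (subject, predicate, object) triples. The URIs below are invented for illustration; a production system would use an RDF toolkit rather than hand‐rolled sets.

```python
# Facts as (subject, predicate, object) triples; URIs give unique identification.
HAS_ANCESTOR = "http://example.org/rel/hasAncestor"   # hypothetical relationship URI

triples = {
    ("http://example.org/people/Matthew", HAS_ANCESTOR, "http://example.org/people/Peter"),
    ("http://example.org/people/Peter", HAS_ANCESTOR, "http://example.org/people/William"),
}

def transitive_closure(facts, predicate):
    """Derive every fact entailed by treating the predicate as transitive."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for (a, p1, b) in list(derived):
            for (c, p2, d) in list(derived):
                if p1 == p2 == predicate and b == c and (a, predicate, d) not in derived:
                    derived.add((a, predicate, d))
                    changed = True
    return derived

inferred = transitive_closure(triples, HAS_ANCESTOR)
# Matthew "has ancestor" William is now entailed, as in the Figure 19 example.
print(("http://example.org/people/Matthew", HAS_ANCESTOR,
       "http://example.org/people/William") in inferred)   # True
```

The deduction requires nothing beyond the asserted facts plus the declared transitivity of the relationship, which is exactly the value a semantic model adds over an untyped link.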
An ancillary benefit of piloting techniques like unique identification and explicit relationships is that the lessons learned will assist the more traditional implementations of these techniques on Data.gov. It is envisioned that as the benefits and applications based on semantic Data.gov datasets increase, a migration and transition plan will be developed to merge the efforts.

3.7 Other Government Websites

Working with Other Government Websites

In general, Data.gov will be the source location to access structured data behind some of the government’s most significant websites. Existing and newer websites such as USA.gov, the Federal IT Dashboard, USASearch.gov, FBO.gov, USAspending.gov, Geospatial One Stop, FedStats, and Grants.gov all have major presentations of data using search and presentation technologies. The structured data behind these websites will be part of the inventorying and metadata harvesting process as previously described. These other initiatives are expected, like the agencies, to register their data and tools with Data.gov so that Data.gov includes the most appropriate inventory of data and tools available to the public. Data stewards who previously published to these sites may continue to do so, as these sites, once they register their data and tools with Data.gov, will be integrated with the Data.gov solution. Additionally, any of these sites that require reports from agencies should also move to require reports in machine‐readable formats. Agencies that have geospatial data are in many cases publishing that data to Geospatial One Stop (GOS) today. The harvesting process used by GOS is mirrored in the conceptual solution architecture described above and points to a roadmap for further integration. The Data.gov team will work with the GOS team to pursue further integration of GOS into Data.gov.
In addition to working with other Federal agencies and initiatives, the Data.gov team is working with the National Association of State CIOs (NASCIO) to share standards and arrive at compatible concepts of operation. The Data.gov PMO will look to expand similar relationships in the US and internationally. These relationships may be modeled on the formal structure that OMB and the Data.gov team are using to engage and establish a long‐term collaborative relationship with other federal entities.

The current state physical architecture for Data.gov consists of a website and a relational database that serves as the metadata catalog containing the site’s content. The evolution of the physical architecture will be based on the conceptual architecture that is depicted in this document. The conceptual architecture has been developed based on feedback from the user community, feedback from the data producing Federal agencies, and overall alignment with Data.gov’s strategic intent and core design principles outlined in section 1 of this document. The feedback on the current Data.gov architecture has been fairly uniform:

* Federal agencies that produce data want an easier way to make their data available on Data.gov
* End users of Data.gov want easier ways to use the metadata from Data.gov and the actual agency data represented on Data.gov

The conceptual architecture for Data.gov has evolved in response to this feedback. Like any architecture effort, there are many ways to architect the solution and still satisfy most areas of feedback.
Several architectural alternatives were established and reviewed in the context of the core design principles outlined in Chapter 1 of this document: (1) providing a solution that focuses on access to data and facilitates third‐party application and website developers, (2) leaving control of data dissemination to the programs that produce the data, (3) providing mechanisms to rapidly disseminate data, (4) providing mechanisms to receive and act on feedback, (5) leveraging common solutions, (6) implementing a component or module based solution, and (7) implementing a solution that aids in developing and extending best practices. This vision can be fulfilled through the delivery of several technical components, including six core modules and four data infrastructure tools that are described in further detail in section 3. The core modules will include (1) the website, (2) the Dataset Management System (DMS), (3) the metadata catalog, (4) a performance tracking and analysis engine, (5) an audit tool, and (6) a hosting service. The initial data infrastructure tools include collaboration, feedback, agency and site performance dashboards, and search related tools. It is expected that the Data.gov team will focus on building or acquiring the core modules and partner with others on the development and operations of the data infrastructure tools. The core modules and data infrastructure tools are to be based on open standards and aligned with web trends and patterns. Further, the conceptual architecture supports innovation by the Data.gov team and others to increase the number, scope, and operating or business models for data infrastructure tools. The modular architecture, and the ability to leverage other governmental and other entities with respect to data infrastructure tools, enables the Data.gov team to iteratively build out the solution in line with allocated budgetary resources while still accelerating realization of the end‐to‐end vision.
4 Agency Next Steps

Agencies will be expected to include the following basic steps [objectives] as part of their work plans.

4.1 Additional Datasets

Expose Additional Datasets via Data.gov

Each agency should identify and publish online in an open format new high value data sets and register those data sets via Data.gov. These should be data sets not previously available online or in a downloadable format. Where agencies already have high value data sets online, the focus should be on registering them with Data.gov. High value datasets are those that can be used to increase agency accountability and responsiveness; improve public knowledge of the agency and its operations; further the core mission of the agency; create economic opportunity; or respond to need and demand as identified through public consultation. In addition, agencies are expected to document their inventory of data and tools that are currently available to the public and ensure that these data and tools are registered in Data.gov as appropriate. Agencies should continue to work diligently towards the goal of having their currently public data and tools exposed via Data.gov to ensure that the investments in making these data and tools public are met with maximized discoverability by the public. As discussed in Chapter 3, there are three mechanisms by which an agency can present data and tools on Data.gov. In the near term, agencies should leverage the DMS core module to expose their data or tools on Data.gov. Once the other metadata input mechanisms are developed, agency data stewards will have the option to use the API or metadata feed mechanisms as well as the already established DMS mechanism. Agencies are encouraged to communicate the existence of the DMS and promote its use. Once the other two metadata entry mechanisms are developed, agencies will be notified and encouraged to promote and use any or all of the three metadata input mechanisms.
4.2 Compliance Ensure Compliance with Existing Requirements Many statutory responsibilities are applicable to Data.gov and should be considered when agencies are formulating their strategies and tactics for data dissemination. The Dataset Management System includes the ability for agencies to self‐certify the compliance of their data and tools. The following existing legislative requirements are highly applicable to Data.gov. Information Quality -- In accordance with Section 515 of the Treasury and General Government Appropriations Act for Fiscal Year 2001 (Public Law 106‐554), OMB has published guidelines to help agencies ensure and maximize the quality, utility, objectivity, and integrity of the information that they disseminate. In addition, all Federal agencies are required to issue their own implementing guidelines, including administrative mechanisms that allow affected persons to seek and obtain correction of information maintained and disseminated by the agency that does not comply with the OMB guidelines. Data.gov requires that data disseminated via Data.gov be consistent with the disseminating agency's information quality guidelines. Security and Privacy -- Data.gov maintains the President's commitment to protecting the privacy of the American people. Therefore, all applicable privacy protections are enforced when the public interacts with the Data.gov environment. Data.gov requires that data shared by agencies conform to all applicable security and privacy requirements, including the Privacy Act of 1974, the E‐Government Act of 2002, applicable Federal security standards including NIST SP 800‐39, and other guidance as issued by OMB (see: Office of Information and Regulatory Affairs (OIRA) Information Policy). 
Paperwork Reduction Act -- The Paperwork Reduction Act (PRA) applies to Data.gov, which provides functionality through which the public is invited to rate datasets. Prior to Data.gov releasing this functionality, OMB review and approval were obtained based upon an evaluation of the need to collect this information, the practical utility of the information, and the minimal burden imposed on the public in responding to the requested ratings. Future PRA‐oriented approvals could be sought for asking the public additional questions, engaging the public in new collaborative ways, and allowing the public to self‐register for email and other notices. Further, the PRA lays out statutory requirements for agencies with respect to information dissemination and, among other things, requires, to the extent feasible, that each agency, when disseminating information in electronic format, provide timely and equitable access to the underlying data. E‐Government Act -- The E‐Government Act of 2002 requires OMB to issue policies, identified previously in this document, requiring “the adoption of standards, which are open to the maximum extent feasible, to enable the organization and categorization of Government information in a way that is searchable electronically, including by searchable identifiers”. Accessibility -- Section 508 of the Rehabilitation Act requires that Federal agencies provide individuals with disabilities, whether Federal employees or members of the public seeking information or services, with access to and use of information and data comparable to the access and use available to those who are not individuals with disabilities. The Data.gov website is designed and tested to ensure conformance with the requirements of Section 508. 
4.3 Feedback and Evolution Evolve Agency Efforts Based on Public Feedback Although agencies are receiving feedback through Data.gov, agencies are expected to provide the public with additional opportunities to provide feedback via their own agency web pages. These pages should incorporate a mechanism for the public to give feedback on and assessments of the quality of published information and to provide input about which information to prioritize for publication. In addition to public feedback, agencies will receive feedback from the Data.gov PMO as previously discussed in this document. Agencies will receive performance results based on pre‐established metrics and results from scans of previously released data in the dot gov domain. 4.4 Training Sign Up for and Complete Data.gov Training The Data.gov team currently offers training on the use of the DMS application. In the near future, training will be ramped up and offered to POCs, data stewards, program executives, and other agency stakeholders. OMB and the Data.gov team will leverage and extend the model used during the recent launch of the IT Dashboard, whereby focused, hands‐on training sessions were held on a continual basis in dedicated training facilities. The training will start with practical aspects of integrating existing and new data sets into Data.gov and will extend into a forum for sharing data management best practices to help accelerate agency activities. Over time, there will be a family of classes with an established curriculum and online resources that can be reused within agencies and by others. 
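The public ratings and quality feedback described in Sections 4.2 and 4.3 could be aggregated to prioritize improvements to a given data set. A minimal sketch, assuming a hypothetical feedback record shape and a 1-5 rating scale; neither is specified by the document:

```python
from statistics import mean

# Hypothetical feedback entries collected via an agency feedback page;
# the record shape and the 1-5 rating scale are illustrative assumptions.
feedback = [
    {"dataset": "widgets-fy2009", "rating": 4, "comment": "Useful."},
    {"dataset": "widgets-fy2009", "rating": 5, "comment": "Well documented."},
    {"dataset": "widgets-fy2009", "rating": 3, "comment": "Needs metadata."},
]

def average_rating(entries, dataset):
    """Mean public rating for one dataset, usable as one input when
    setting priorities for data quality and usability improvements."""
    ratings = [e["rating"] for e in entries if e["dataset"] == dataset]
    return mean(ratings) if ratings else None

print(average_rating(feedback, "widgets-fy2009"))
```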
4.5 Working Groups Participate in Data.gov Working Groups The Data.gov team will leverage working groups to continue the evolution of the Data.gov related core modules, data infrastructure tools, and standards. Existing and new teams will have the responsibilities of evolving the metadata standards, sharing best practices, coordinating cross‐domain relationships, and developing requirements for the Data.gov shared services. Agencies are encouraged to participate actively in these new and existing working groups. The Data.gov team will provide open and transparent ways for the public to collaborate with the Government as these working groups proceed, including a focus on participating in the activities of relevant standards organizations. 4.6 Policies and Procedures Evaluate and Enhance Policies and Procedures Agencies have numerous management policies, procedures, and activities in place – agency wide, specific to bureaus, and embedded in programs. Agencies should open these up to review and evolve them based on the need to institutionalize the activities, processes, and responsibilities described in this Concept of Operations. This includes looking at agency overall and Information Resource Management Strategic Plans; priorities for resource allocation; and efficiencies possible through rationalizing and improving the federation of existing activities around the information dissemination and sharing value proposition described earlier. One specific area to highlight is agency, bureau, and program data management and architecture activities. 
Refocusing these to support dissemination and sharing of high value agency information assets, either with the public in general (via Data.gov) or with specific mission partners (e.g., other agencies; State, local, or tribal governments; international partners; the private sector; or individuals through delivery of services), could provide an opportunity to more tightly align currently disparate and/or loosely coupled activities. Leveraging the shared solution components and explicit participation in Data.gov working groups could offer economies of knowledge and accelerate agency integration of best practices. Finally, the explicit, measured connection to the public, including feedback, offers quantifiable and attributable measures of value creation. 4.7 Semantic.Data.gov Pilots Initiate Pilots for Semantic.Data.gov As previously discussed in Section 3.6, the evolution of Data.gov will include a progression towards the semantic web, a fast‐moving space that is expected to fundamentally transform the web. It is expected that the UK version of Data.gov will use a semantic web approach. The U.S. Library of Congress is a best‐practice example of a Federal organization that is already moving towards the semantic web with its “Authorities and Vocabularies” service. Agencies can study approaches like that used by the Library of Congress in anticipation of semantic.data.gov. An agency that owns or defines authoritative domain data will eventually be asked to put the domain specifications (metadata) and the corresponding instance data on the web using semantic techniques. Working groups under the Senior Advisory Council will focus their energies on establishing the relationships (the links) between these authoritative datasets. In some instances, relationships may already exist and simply need to be adopted by the data stewards. 
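The "semantic techniques" mentioned above amount to publishing data as subject-predicate-object triples with resolvable URIs, so that links between authoritative datasets become explicit and machine-readable. A minimal sketch using only the standard library; the example.gov identifiers are hypothetical placeholders, while the Dublin Core and SKOS predicate URIs are real, widely used vocabularies:

```python
# Illustrative "linked data" triples in the spirit of the Library of
# Congress "Authorities and Vocabularies" service. The example.gov URIs
# are placeholders; dcterms and skos predicates are real vocabularies.
triples = [
    ("http://example.gov/id/dataset/widgets",
     "http://purl.org/dc/terms/title",
     '"Widget Production Statistics"'),          # literal object, pre-quoted
    ("http://example.gov/id/dataset/widgets",
     "http://purl.org/dc/terms/subject",
     "http://example.gov/id/concept/manufacturing"),
    ("http://example.gov/id/concept/manufacturing",
     "http://www.w3.org/2004/02/skos/core#broader",
     "http://example.gov/id/concept/industry"),  # link between concepts
]

def to_ntriples(triples):
    """Serialize (s, p, o) tuples as N-Triples lines; objects that are
    already quoted strings are treated as literals, others as URIs."""
    lines = []
    for s, p, o in triples:
        obj = o if o.startswith('"') else "<%s>" % o
        lines.append("<%s> <%s> %s ." % (s, p, obj))
    return "\n".join(lines)

print(to_ntriples(triples))
```

The third triple is the kind of cross-dataset relationship ("the link") that the working groups under the Senior Advisory Council would establish between authoritative datasets.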
2009-12-03 2009-12-08 http://www.ideascale.com/userimages/sub-1/736312/ConOpsFinal.pdf Owen Ambur Owen.Ambur@verizon.net