Opening Up New York City’s Public Data
The New York City Transparency Working Group seeks to make New York City’s digital data more easily available and usable by the public, so as to promote more open government, economic development, and technological innovation.
Advances in technology have made it affordable for New York City to proactively disclose information by posting it online. Additionally, New York’s Freedom of Information Law (FOIL) states that government information must be disclosed to the public unless there is a specific reason it cannot be.
Accordingly, we urge New York City to post online all non-personal, government data subject to disclosure under FOIL, and to create laws and policies which accomplish this. Our groups will support legislation which moves the City towards this goal. We have created a short list of recommendations we believe successful legislation should include, and a more detailed explanation below.
The member groups of the New York City Transparency Working Group applaud the Mayor, City Council, City Council Technology Committee, and the Department of Information Technology and Telecommunications (DoITT) for working to promote new open data standards and practices. We appreciate your consideration of our recommendations and look forward to working with you.
* * *
Summary: Ten Steps NYC Open Data Legislation Should Include
- Strong statement of why open data is good for New Yorkers.
- A public data directory or inventory.
- Open data officers as internal advocates within City government.
- A public process to identify valuable data and create priority data release.
- An Open FOIL tool to track FOIL requests and compliance.
- A public feedback forum extending beyond developers.
- A public data release schedule and compliance plan.
- A public Open Data Dashboard, to track the progress of data release.
- An Open Government Commission.
- Broad and universal definition of terms.
The New York City Transparency Working Group recommends that NYC open data legislation include:
1. A statement of value and purpose
For instance, “It is the goal of the City of New York to post all non-personal, government data subject to the Freedom of Information Law on the internet, in a standard, non-proprietary format, easily usable by the public.” This legislation is a step towards achieving that goal.
Opening New York City government data is consistent with the Freedom of Information Law. It will foster more open government, economic development, and technological innovation, and will improve government efficiency while reducing the cost of government.
2. A public data inventory or directory
DoITT should create a Public Data Directory, or its functional equivalent, which provides a complete listing of public data sets subject to disclosure under FOIL. The public data directory is a prerequisite for opening NYC data. Without it, neither government nor the public knows what data the City has to release.
The public data directory should include both easily publishable datasets and datasets that contain personally identifiable information. Each agency should publicly identify which of its datasets contain personally identifiable information that cannot be easily separated from otherwise public data. (Note: as prescribed by NYS FOIL, all databases should be designed so that personally identifying data is easily segregated.)
The public data directory or its equivalent should include specific descriptions of the contents, format and methods for accessing public datasets subject to FOIL, and include the name, title, office address, and the office telephone number and email address of the official in each agency responsible for receiving inquiries about such information. All City agencies should be required to provide DoITT with the information it needs to create the Public Data Directory.
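As an illustration only, one directory listing could be modeled as a small record carrying the items named above: contents, format, access method, and the responsible official's contact details. The sketch below is a minimal Python example under stated assumptions. The class names, field names, the contains_pii flag, and all sample values are hypothetical and are not drawn from any City system.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Contact:
    """Agency official responsible for receiving inquiries about a dataset."""
    name: str
    title: str
    office_address: str
    telephone: str
    email: str


@dataclass
class DirectoryEntry:
    """One listing in the Public Data Directory (field names are illustrative)."""
    agency: str
    dataset_name: str
    description: str     # what the dataset contains
    data_format: str     # e.g. "CSV"
    access_method: str   # how the public can obtain the data
    contains_pii: bool   # True if personal information must be segregated first
    contact: Contact


def to_json(entry: DirectoryEntry) -> str:
    """Serialize an entry to JSON so the directory itself is machine readable."""
    return json.dumps(asdict(entry), indent=2, sort_keys=True)


# Hypothetical example data, not a real City dataset or official.
example = DirectoryEntry(
    agency="Example Agency",
    dataset_name="Example Permit Records",
    description="One row per permit issued, with issue date and permit type.",
    data_format="CSV",
    access_method="Bulk download",
    contains_pii=False,
    contact=Contact(
        name="Jane Doe",
        title="Open Data Officer",
        office_address="1 Example Plaza, New York, NY",
        telephone="555-0100",
        email="opendata@example.org",
    ),
)

print(to_json(example))
```

Publishing the directory in a structured form like this would let the public search and filter listings without waiting for an agency to build a custom interface.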
3. NYC open data officer within DoITT and open data officers at each agency.
The City needs a publicly identifiable internal advocate to promote open data. The NYC Open Data Officer manages and promotes the City’s open data efforts, oversees the data catalog, helps identify public priorities, encourages and tracks agency data disclosures, and solves problems with data sets. Additionally, each agency should designate an open data officer, who works with the NYC Open Data Officer and DoITT.
4. A public process to identify valuable data and create priority data release.
DoITT should be charged with establishing a clear, public process for identifying which data is most valuable to the public, and should be released first. That data should be used to help create a data release schedule. Our groups believe it is both possible and necessary to “prioritize” which data is released first. We think City agencies will have a very difficult time finding the resources, technical capacity, and political will to release all of their public data in a data openness “big bang.” We would note that there is extremely high public interest in policing and education data, yet those agencies have been among the most resistant to opening up their data. Thus, we seek a process that focuses public attention on a data release schedule that is attainable, incremental, and contains information of high public interest.
Sources for determining the order in which data is released:
- Information already posted on agency websites, but not downloadable in a usable format.
- 311 information requests. (An API for 311 data will make this even more powerful.)
- FOIL disclosure requests of a public nature.
- Online requests via “Feedback” sites.
- In-person surveys of “super user” groups like businesses, advocates, and journalists. (This has been done successfully by the Comptroller’s staff as they develop their transparency site.)
5. An Open FOIL web tool to help identify priority data sets.
FOIL disclosure requests are potentially a powerful tool for identifying “high priority data sets.” The City should create Open FOIL, a DoITT-administered web tool which includes:
- A centralized online process for submitting FOIL requests.
- A dashboard or visualization tool similar to the federal IT Dashboard which makes it obvious what City databases are getting the most requests for public data disclosure.
- A site which posts agency responses to non-personal FOIL requests.
Open FOIL will improve the City’s administration of FOIL, help reduce compliance costs, track the types of FOIL requests and the timeliness of responses, and reveal to the public and our elected officials the reasons agencies give for refusing to disclose information.
In any event, all agencies should be required to create and keep a public, online log of FOIL disclosure requests. This log is intended to show what information is of greatest interest to the public, and how well agencies are complying with FOIL. The online log should not include anything identifying the person or organization making the FOIL request, and should identify only the information they seek.
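To make the log requirement concrete, the following Python sketch shows one possible way to strip requester identity from a request record before publication, and to count requests per dataset so the most requested data stands out. The field names, the list of identifying fields, and the sample records are all assumptions made for illustration. They do not describe any existing FOIL system.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Fields that could identify the requester; these names are assumptions.
IDENTIFYING_FIELDS = {"requester_name", "organization", "email", "phone", "address"}


def to_public_log_entry(request: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of a FOIL request record with requester identity removed."""
    return {key: value for key, value in request.items()
            if key not in IDENTIFYING_FIELDS}


def requests_per_dataset(log: List[Dict[str, str]]) -> List[Tuple[str, int]]:
    """Count requests per dataset, most requested first."""
    return Counter(entry["dataset"] for entry in log).most_common()


# Hypothetical request records, as an agency might hold them internally.
raw_requests = [
    {"agency": "Example Agency A", "dataset": "inspection results",
     "received": "2011-01-05", "requester_name": "A. Person",
     "email": "a@example.org"},
    {"agency": "Example Agency B", "dataset": "budget line items",
     "received": "2011-01-09", "requester_name": "B. Person",
     "organization": "Example Org"},
    {"agency": "Example Agency A", "dataset": "inspection results",
     "received": "2011-02-01", "requester_name": "C. Person",
     "phone": "555-0101"},
]

# Only the redacted entries would be posted online.
public_log = [to_public_log_entry(r) for r in raw_requests]
for dataset, count in requests_per_dataset(public_log):
    print(dataset, count)
```

Removing named fields is only a first step. Free-text request descriptions may still contain identifying details and would likely need review before posting.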
6. Public feedback forum extending beyond developers
The interested public and developer community can help provide valuable feedback on technical issues and practical problems to City agencies releasing large volumes of data. New York City should create a site similar to the City of Portland’s CivicApps.org, but intended to encourage participation from interested New Yorkers beyond the developer community. The forum should provide an easy way for New Yorkers to request new data sets and comment on current and planned data releases. Agencies should use this forum to report on planned data releases, data issues, and to request apps they would like built or data visualized.
7. A public data release schedule and compliance plan
We recommend DoITT create an online schedule for data release which is prioritized, achievable, incremental, and timely. The schedule is best updated on an ongoing basis, and should be mandated to be updated quarterly. The schedule should list all City data sets in order of future release dates, identify the data sets of highest public interest and highest governmental interest, and describe the policy, legal or technical issues that may be delaying the release of certain data sets. The schedule should be accompanied by a quarterly compliance plan which describes how data release efforts are progressing, and identifies problems and solutions. The compliance plan should include quantifiable targets, and should serve in part as a more detailed report card accompanying the open data dashboard and data release schedule.
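One way to picture a schedule with quantifiable targets is as a list of planned releases that can be sorted by date and checked against actual release dates. The Python sketch below is a rough illustration under stated assumptions. The record fields, the on-time measure, and the sample datasets and dates are hypothetical, and a real compliance plan might well choose different measures.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class ScheduledRelease:
    """One line of the release schedule (field names are illustrative)."""
    dataset: str
    agency: str
    planned: date                     # planned release date
    released: Optional[date] = None   # actual release date, if released
    high_public_interest: bool = False
    delay_reason: str = ""            # policy, legal or technical issue, if any


def ordered_schedule(items: List[ScheduledRelease]) -> List[ScheduledRelease]:
    """List data sets in order of planned release date."""
    return sorted(items, key=lambda r: r.planned)


def behind_schedule(items: List[ScheduledRelease], today: date) -> List[ScheduledRelease]:
    """Data sets whose planned date has passed without a release."""
    return [r for r in items if r.released is None and r.planned < today]


def on_time_rate(items: List[ScheduledRelease], today: date) -> float:
    """Share of releases due by `today` that were published by their planned date."""
    due = [r for r in items if r.planned <= today]
    if not due:
        return 1.0  # nothing was due yet, so nothing is late
    on_time = [r for r in due if r.released is not None and r.released <= r.planned]
    return len(on_time) / len(due)


# Hypothetical schedule entries.
schedule = [
    ScheduledRelease("Example Dataset C", "Example Agency B", date(2011, 9, 30)),
    ScheduledRelease("Example Dataset A", "Example Agency A", date(2011, 3, 31),
                     released=date(2011, 3, 15), high_public_interest=True),
    ScheduledRelease("Example Dataset B", "Example Agency A", date(2011, 6, 30),
                     delay_reason="Personal data must be segregated first"),
]

today = date(2011, 7, 1)
print([r.dataset for r in ordered_schedule(schedule)])
print([r.dataset for r in behind_schedule(schedule, today)])
print(on_time_rate(schedule, today))
```

The same records could feed both the quarterly compliance plan and a public dashboard, since each is a different view of planned versus actual release dates.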
8. A public Open Data Dashboard, to track the progress of data release.
To promote public and internal accountability, the City needs a public, online visualization tool dedicated to open government. This site should track which data has been put online in a downloadable format or made accessible through direct public access to databases, and how the public is using this data in applications. The site should make clear which agencies are meeting data release goals, using measures such as the inventory of data sets released, scheduled data releases by agency, and data releases behind schedule. Examples of such sites include the White House Open Government Dashboard.
9. An Open Government Commission
Within a short time after open data legislation is passed, the Mayor and Council should jointly appoint an Open Government Commission similar to the New York City Lobbying Commission. This small advisory body, consisting of experts on open data and open government from within and outside of government, would make recommendations for improvements to the City’s open government laws and procedures, and report on the progress of open data efforts.
10. Broad definitions of terms, such as:
“Data” is digital records owned, created or used by the City of New York.
“Data Set” is related digital information stored together, or identifiable as a retrievable unit or file.
“Public Data” is digital information subject to public disclosure under the Freedom of Information Law and related case law, plus all information found in public “records” and as further defined by privacy and security acts such as FERPA and HIPAA.
“Metadata” is data providing information about one or more aspects of other data, including: means of creation of the data; purpose of the data; time and date of creation; creator or author of data; placement on a computer network where the data was created; data standards used. City legislation should seek to conform to international metadata standards such as those implemented by other data catalogs and the 15 properties of Dublin Core.
Metadata should also be used to describe documentation for the data model used in the dataset including definitions for the column headings in tabular data, foreign keys and linked tables in relational data, and the spatial projection used for geospatial data.
When possible, metadata defining the different pieces of data should be structured to allow semantic markup and linked-data best practices.
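As a rough illustration of what such metadata might look like in practice, the Python sketch below describes a dataset using the 15 elements of the Dublin Core Metadata Element Set, and keeps data-model documentation (column definitions, a foreign key, a spatial reference) alongside it. The dataset, its values, the data_model layout, and the helper function are hypothetical. Only the 15 element names come from Dublin Core.

```python
# The 15 elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def unknown_elements(record):
    """Return the keys in a record that are not Dublin Core elements."""
    return sorted(set(record) - DUBLIN_CORE_ELEMENTS)


# Hypothetical dataset description; all values are placeholders.
record = {
    "title": "Example Street Tree Census",
    "creator": "Example Agency",
    "description": "One row per street tree, with species and location.",
    "date": "2011-01-15",
    "format": "text/csv",
    "language": "en",
    "rights": "CC-Zero",
}

# Data-model documentation kept alongside the descriptive metadata.
# The layout of this dictionary is an assumption, not a standard.
data_model = {
    "columns": {
        "tree_id": "Unique identifier for the tree",
        "species": "Common species name",
        "block_id": "Foreign key to a (hypothetical) blocks table",
    },
    # Spatial reference for the coordinates; WGS 84 is used here as an example.
    "spatial_reference": "EPSG:4326",
}

print(unknown_elements(record))
print(sorted(data_model["columns"]))
```

A check like unknown_elements could help a catalog flag records that drift from the agreed element names before they are published.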
“Record” means any information including that found in reports, statements, examinations, memoranda, opinions, folders, files, books, manuals, pamphlets, forms, papers, designs, drawings, maps, photos, letters, microfilms, computer drives or media, rules, regulations or codes, machine readable materials, or other documentary materials, regardless of physical form or characteristics made or received by an agency of the City of New York under City law or in connection with the transaction of public business and preserved or appropriate for preservation by that agency or its legitimate successor as evidence of the organization, functions, policies, decisions, procedures, operations, or other activities of the City or because of the informational value of data in them.
“License”: any records created by the City of New York shall be made available under a CC-Attribution or CC-Zero license, while any records not created by the City shall retain their original copyright and license.
“Voluntary consensus standards” should be redefined as “Open Standards” using the Perens definition adopted by Vermont and New Hampshire law. The definition reads:
“Open standards” means specifications for the encoding and transfer of computer data that:
- Is free for all to implement and use in perpetuity, with no royalty or fee;
- Has no restrictions on the use of data stored in the format;
- Has no restrictions on the creation of software that stores, transmits, receives, or accesses data codified in such way;
- Has a specification available for all to read, in a human-readable format, written in commonly accepted technical language;
- Is documented, so that anyone can write software that can read and interpret the complete semantics of any data file stored in the data format;
- If it allows extensions, ensures that all extensions of the data format are themselves documented and have the other characteristics of an open data format;
- Allows any file written in that format to be identified as adhering or not adhering to the format; and
- If it includes any use of encryption, provides that the encryption algorithm is usable in a royalty-free, nondiscriminatory manner in perpetuity, and is documented so that anyone in possession of the appropriate encryption key or keys is able to write software to unencrypt the data.
“Voluntary Consensus Standards Bodies” should be clarified to reflect that they are bodies which establish “open standards.” The term can be changed to reflect this relationship and read: “Open standards bodies” means domestic or international organizations which plan, develop, establish, or coordinate open standards using agreed-upon procedures.
“Web application programming interface” (web API or web service) should read: a set of rules and specifications that software programs (including websites and mobile applications) can follow to communicate with each other over the Hypertext Transfer Protocol (HTTP).
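To show what this definition means in practice, the following Python sketch sets out a very small web API using only the standard library: a program sends an HTTP GET request to a path and receives structured data (JSON) in response. The paths, the in-memory catalog, and the function names are assumptions made for illustration, and do not describe any existing City service.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory catalog standing in for real City data.
DATASETS = {
    "example-permits": {"title": "Example Permit Records", "format": "CSV"},
}


def route(path):
    """Map a request path to an (HTTP status, JSON-serializable body) pair."""
    if path == "/datasets":
        return 200, sorted(DATASETS)
    prefix = "/datasets/"
    if path.startswith(prefix) and path[len(prefix):] in DATASETS:
        return 200, DATASETS[path[len(prefix):]]
    return 404, {"error": "not found"}


class ApiHandler(BaseHTTPRequestHandler):
    """Answer HTTP GET requests with JSON, following the rules in route()."""

    def do_GET(self):
        status, body = route(self.path)
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8000):
    """Serve the API on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), ApiHandler).serve_forever()


# The routing rules can be exercised without starting a server.
print(route("/datasets"))
print(route("/datasets/example-permits"))
```

Calling serve() would start the server locally, after which any website or mobile application could request /datasets over HTTP. The agreed paths and response formats are the "rules and specifications" in the definition above.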