The State collects, manages, and disseminates a wide range of data. As organizations classify data and catalog their publishable state data, they should be mindful of legal and policy restrictions on publication of certain kinds of data. Following are general guidelines regarding disclosure to consider as partners begin to identify and review data tables.
The California Health and Human Services Agency (CHHS) Data Subcommittee commissioned the development of Agency-wide guidelines to assist departments in assessing data for public release. The CHHS Data De-Identification Guidelines are focused on de-identification of aggregate or summary data. Aggregate data means collective data that relates to a group or category of services or individuals. The aggregate data may be shown in table form as counts, percentages, rates, averages, or other statistical groupings. Refer to the CHHS Data De-Identification Guidelines for the specific procedures to be used by departments and offices.
Security, Privacy, Regulatory & Aggregate Data
The public release of some State data might violate laws, rules, or regulations. Some data may not be appropriate to release because they can compromise internal departmental processes, such as procurement. Other data may contain personally identifiable information. Finally, even if detailed data appear innocuous, it may be possible to combine them with other public information, in any medium and from any source, to reveal sensitive details (commonly known as the mosaic effect).
Common kinds of data with personal information include: real estate records, individual licensing databases (MD, RN, contractors, lawyers, etc.), marriage records, news (and other) media reports, commercially available databases (data brokers, marketing), court documents, etc. See the ‘Publicly Available Data’ section in the CHHS Data De-Identification Guidelines for more information.
Even if there are no legal impediments to publishing the data, releasing it may have unintended or undesirable effects. For example, posting anonymized arrest records on a weekly basis might inadvertently reveal where police are concentrating enforcement efforts.
Various statutes and regulations, such as HIPAA and California’s health information privacy laws, have very exacting requirements for determining whether data have been sufficiently de-identified so as not to compromise individual privacy. For example, the presence of medical conditions by geographic location might constitute high value, useful, and sought-after data; however, exposing it might identify individuals and their medical conditions.
Another example is the Family Educational Rights and Privacy Act of 1974 (FERPA). Under FERPA, the Federal Government has established guidelines for data privacy to prevent individuals from being identified indirectly from aggregation of data. Partners that deal with student educational data should be aware of guidelines that restrict publication of some data.
Even in the absence of specific legal prohibitions, government entities should beware of outlier conditions or rare events that could lead to identification of individuals. For example, reporting a single arrestee who is a minor of a certain age in a certain county, without providing any other information, might nonetheless serve to identify that particular individual.
All data need to be assessed for the risk of identifying individuals represented in the data whose privacy is protected by law, including both federal and state laws. To assist partnering organizations in this process, CHHS has created the Data De-Identification Guidelines. These Guidelines discuss various methods for assessing the potential risk associated with data sets proposed for release, and various statistical methods that can be used to mask data and protect individuals from being inappropriately identified in the data tables. For example, if the count in a particular cell of a data table falls below a certain number of individuals, the value in that cell may be hidden. It is important to balance the desire to publish accurate, complete, and valuable tabulations against the need to guard against unwarranted invasions of personal privacy. Refer to the CHHS Data De-Identification Guidelines for the specific procedures to be used by partnering organizations to assess data for public release.
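The cell-hiding idea described above can be sketched in a few lines of Python. This is a minimal illustration only, not the procedure in the CHHS Guidelines: the threshold value `MIN_CELL_SIZE`, the function name `suppress_small_cells`, and the sample counts are all hypothetical, and the actual threshold and method should be taken from the applicable guidelines.

```python
# Hypothetical threshold for illustration; use the value required by
# the guidelines that apply to your data.
MIN_CELL_SIZE = 11


def suppress_small_cells(table, min_cell_size=MIN_CELL_SIZE):
    """Return a copy of `table` with counts below `min_cell_size` hidden.

    `table` maps a cell key (for example, a (county, age group) pair)
    to a count of individuals. Hidden cells are represented as None.
    """
    return {
        key: (count if count >= min_cell_size else None)
        for key, count in table.items()
    }


# Made-up example counts, not real data.
counts = {
    ("County A", "0-17"): 3,
    ("County A", "18-64"): 120,
    ("County B", "0-17"): 45,
}

masked = suppress_small_cells(counts)
for cell, value in masked.items():
    print(cell, "suppressed" if value is None else value)
```

Note that this sketch hides only the small cells themselves. If row or column totals are also published, a hidden value can sometimes be recovered by subtraction, so a real assessment generally needs to consider the table as a whole.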
Public Records Act (PRA) Applicability
Under the Public Records Act, the presumption is that government records shall be open to the public unless they are excludable under a narrow set of specific exemptions, including such concerns as invasion of personal privacy, impairment of contractual or collective bargaining negotiations, exposure of protected trade secrets, interference with law enforcement or judicial proceedings, endangering life or safety, and others. Organizations should confer with their PRA officers for advice as to whether data might cause the harms described in the PRA and therefore would not constitute “publishable state data” for an open data portal.
In some circumstances, an organization may not possess all the necessary rights to publish a specific data table. For example, if the data were collected or compiled (completely or partially) by a third party, there may be a contractual or intellectual property limitation that prevents the data from being made public. If the third party approves publication of the data, appropriate documentation of the permission must be secured, and additional disclaimers may be required. When vetting data through the approval process, organizations should ensure that their legal counsel is aware of any potential ownership issue and of the fact that the data were compiled or collected by a third party.
The Federal Open Data Policy states: “Agencies must apply open licenses, in consultation with the best practices found in Project Open Data, to information as it is collected or created so that if data are made public there are no restrictions on copying, publishing, distributing, transmitting, adapting, or otherwise using the information for non-commercial or for commercial purposes.”
Works produced by outside parties which are created or obtained for use by the State may need open licenses applied to them: “When information is acquired or accessed by an agency through performance of a contract, appropriate existing clauses shall be utilized to meet these objectives.”
The Project Open Data Metadata Schema provides a license field which is defined as “the license or non-license (i.e. Public Domain) status with which the dataset or API has been published” and must be provided as a URL. Guidance and example URLs can be found below for properly documenting the license or non-license of your agency’s data in accordance with the open data policy.
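As a rough illustration of the requirement that the license be provided as a URL, the following Python sketch checks a dataset metadata entry. The function name `has_valid_license_url` and the sample entries are hypothetical, the check covers only the URL form of the `license` value (not whether the URL points to an actual open license), and the Creative Commons Zero URL is used purely as an example value.

```python
from urllib.parse import urlparse


def has_valid_license_url(dataset):
    """Return True if the entry's `license` value looks like an http(s) URL.

    This is a format check only; it does not verify that the URL
    refers to a license satisfying the open license conditions.
    """
    value = dataset.get("license")
    if not isinstance(value, str):
        return False
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


# Hypothetical metadata entries for illustration.
good_entry = {
    "title": "Example dataset",
    "license": "https://creativecommons.org/publicdomain/zero/1.0/",
}
bad_entry = {
    "title": "Example dataset without a license URL",
    "license": "Public Domain",
}

print(has_valid_license_url(good_entry))
print(has_valid_license_url(bad_entry))
```

A check like this could run before a catalog entry is published, so that entries describing the license in free text are flagged and corrected to a URL.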
Open License Definitions
For the purposes of Project Open Data, the term “Open License” is used to refer to any legally binding instrument that grants permission to access, re-use, and redistribute a work with few or no restrictions. While technically not a “license,” worldwide public domain dedications such as Creative Commons Zero also satisfy this definition. An “Open License” must meet the following conditions:
- Reuse. The license must allow for reproductions, modifications and derivative works and permit their distribution under the terms of the original work. The rights attached to the work must not depend on the work being part of a particular package. If the work is extracted from that package and used or distributed within the terms of the work’s license, all parties to whom the work is redistributed should have the same rights as those that are granted in conjunction with the original package.
- Redistribution. The license shall not restrict any party from selling or giving away the work either on its own or as part of a package made from works from many different sources. The license shall not require a royalty or other fee for such sale or distribution. The license may require as a condition for the work being distributed in modified form that the resulting work carry a different name or version number from the original work. The rights attached to the work must apply to all to whom it is redistributed without the need for execution of an additional license by those parties. The license must not place restrictions on other works that are distributed along with the licensed work. For example, the license must not insist that all other works distributed on the same medium are open. If adaptations of the work are made publicly available, these must be under the same license terms as the original work.
- No Discrimination against Persons, Groups, or Fields of Endeavor. The license must not discriminate against any person or group of persons. The license must not restrict anyone from making use of the work in a specific field of endeavor. For example, it may not restrict the work from being used in a business, or from being used for research.
Examples of Open Licenses can be found here: https://project-open-data.cio.gov/open-licenses/
Inaccurate Data: Despite an organization’s best efforts, some data may be inaccurate, and analyses may turn up issues that the public was unaware of and that the press then covers. When concerns about inaccurate data are brought to an organization’s attention, it should look into the matter and make corrections as appropriate.