Privacy, Secrecy, and Data

With the growing importance of AI, we need to change our current restraints of privacy and secrecy to improve data quality and access.

In January of 2020, I made a post to Facebook. It did not get the level of feedback I had hoped for.

I had hoped to get feedback on the idea to inform and clarify my thoughts. Without that feedback, I have tried to expand the idea on my own, covering not only its positives but also many of its negatives.

This is a draft document. I have placed it on the internet for review. It is still largely incomplete.

Let’s first clarify some terms.

Data and information
I will use these two terms interchangeably even though there are some subtle differences. Data and information refer to things such as:
  • point measurements such as a time series of heart beats which may be averaged to a heart rate,
  • the arrangement of pixels in an image which includes color information,
  • a person’s name or date of birth,
  • reviews given to a movie, and
  • submarine plans stolen by a spy.

These are just a few examples of data and information.

Record
Data and information are often in record sets. For example: a medical record might contain a person’s name, their date of birth, smoking history, family history of heart disease, and their death from lung cancer.
Knowledge
Knowledge is a conclusion inferred or deduced from records. We often use knowledge as a shorthand for connecting data to decision-making. An example of knowledge is the statement that cigarette smoking causes cancer.
Quality
When discussing quality data, I mean data that is accurate, timely, and complete. Low quality data may be caused by low resolution in measurement, intentional skewing of data, poor communication of data, random errors in recording data, and delays in data reporting.
Privacy
This is a complicated term and one that will be discussed at length later. Generally, privacy is something that individuals but not organizations have. Privacy typically concerns the information of that individual. Often, privacy is considered a good thing and stated in terms of the right to be left alone.
Secrecy
A secret may be held by individuals and organizations. Secrecy is often considered a bad thing, but there are times when it is necessary and good. Coca-Cola’s formula is a trade secret, which is not only allowable under the law but recognized as a proper way to conduct business. The invasion plans for D-Day were a secret, and it should be clear that this was a legitimate use of secrecy.

Why data?

Data informs much of our decision making. If I know store A sells a gallon of milk for less than store B, I can save money. We also regularly use knowledge drawn from data. Knowing that smoking is often a cause of lung cancer, a person can choose to improve their health. If we know that a college degree leads to lower crime rates, as a society, we may choose to invest in better educational opportunities.

In the stock market, the price of a share of stock is determined by what a buyer is willing to pay and what a seller is willing to accept. Buyers and sellers act upon how they expect the company to perform in the future. So the price is based upon expectations of earnings, valuation, rates of return, market conditions, and emotions. Having accurate information about those items is necessary to properly price a share of stock.

With the introduction of deep learning methods in machine learning, the need for data has increased dramatically. These algorithms require large volumes of data to get good results. In many cases, more high quality data is the difference between an algorithm that barely performs as well as the average person and one that out-performs even the experts.

All of these are examples of ways in which having data may lead to better decisions for individuals and societies. Having data is not sufficient for making quality decisions, but it is necessary. Those having more high quality data have the opportunity to improve their decision-making. Data is power, especially when one party holds information that is not commonly known.

We have unequal access to data

Using the stock market example, a person directly involved with the company will often know of dramatic changes in future earnings or big problems before the average investor. An insider acting on their information can make large profits. Insider trading on private information reduces the trust investors have in the company. If this reduction in trust becomes widespread, the markets no longer operate efficiently and profits accrue only to those with access to private information. Inefficient markets and lack of trust in the information provided by companies harms all. This has clearly been recognized by regulators and is why insider trading is a crime.

There are many other situations where data disparity is normal. Sometimes that disparity is related to having the skills to use the information, such as in the professions of medicine, law, and engineering. We rely on the skills but also on the ethical obligations of such professionals. Each of the mentioned professionals is expected to abide by standards and codes of ethics. Each of these professions also requires a license to practice, and those licenses may be revoked for not following the ethics of providing fair and honest service. Again, we recognize that having an advantage in information is powerful and may be abused.

Holding information others do not have gives the holder easy-to-abuse power over others. Blackmail is an obvious example. Militaries seek advantage by having more complete information about their opponent than the opponent has on them. Many seek advantage by holding secret information. The problem is that secret information, especially when abused, leads to mistrust, questioning of motives for actions, and guessing at the missing explanations.

Let’s talk about quality

Quality data is accurate, timely, and complete. We rely on data to inform our decisions. Having quality data allows us to make better decisions faster and with a better understanding of all the things that might go wrong. Quality information is the new currency for opportunity.

Anything that degrades the quality of the data might be considered harmful. Sometimes data corruption is unavoidable. A heart rate monitor may be worn incorrectly or have a power failure. The clocks used to record separate but related data might not be synchronized.
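To make the three quality dimensions a little more concrete, here is a minimal sketch that scores a batch of heart rate records. The `Reading` fields, the one-minute delay threshold, and the 30 to 220 bpm plausibility range are all my own assumptions for illustration. True accuracy needs a trusted reference measurement, so the range check is only a rough proxy for it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Reading:
    """One hypothetical heart rate record; the field names are illustrative."""
    measured_at: datetime
    reported_at: datetime
    heart_rate_bpm: Optional[float]  # None when the monitor failed to record


def completeness(readings):
    """Fraction of records that actually carry a value."""
    if not readings:
        return 0.0
    return sum(r.heart_rate_bpm is not None for r in readings) / len(readings)


def timeliness(readings, max_delay):
    """Fraction of records reported within max_delay of being measured."""
    if not readings:
        return 0.0
    on_time = sum((r.reported_at - r.measured_at) <= max_delay for r in readings)
    return on_time / len(readings)


def plausibility(readings, low=30.0, high=220.0):
    """Fraction of recorded values inside an assumed plausible range."""
    values = [r.heart_rate_bpm for r in readings if r.heart_rate_bpm is not None]
    if not values:
        return 0.0
    return sum(low <= v <= high for v in values) / len(values)
```

A monitor worn incorrectly would show up as a low plausibility score, a power failure as low completeness, and slow reporting as low timeliness.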

It may be difficult to even know a problem exists if we do not record certain types of information. In 2016, I read “The Vanishing of Canada’s First Nations Women.” This article highlighted the problem of lack of data: “Pearce enrolled in a doctoral program in law to research missing and murdered women but soon found that ‘there was nothing available to the public in terms of data’ because police had never published national statistics.”

Then, from the Urban Indian Health Institute report from 2018, “As demonstrated by the findings of this study, reasons for the lack of quality data include under reporting, racial misclassification, poor relationships between law enforcement and American Indian and Alaska Native communities, poor record-keeping protocols, institutional racism in the media, and a lack of substantive relationships between journalists and American Indian and Alaska Native communities.”

These articles highlight the need for quality data to determine if problems even exist. For data to have high quality, it must also be collected uniformly. Uneven data collection is a real problem, especially when there are strong incentives to suppress correct reporting. Crime data is the obvious example of reporting discrepancies. Different jurisdictions report data differently, and there are often incentives to under-report or reclassify certain types of crime. See “Measurement Problems in Criminal Justice Research” and the Journal Sentinel investigation, which found that the Milwaukee Police Department underreported thousands of violent assaults, rapes, robberies and burglaries and failed to correct the problem while presenting flawed statistics to the public.

These articles also highlight the intersection of data power and data disparity.

Intentionally corrupt data

Certainly any type of intentional corruption of data should be unacceptable. Modifying data seriously harms the usefulness of the information derived. Furthermore, any decisions made based upon that data are likely to be wrong.

This might seem an unlikely problem, but really it is everywhere. People regularly lie when filling out survey forms. In fact, many surveys have validation questions to correct or eliminate dishonest responses.

The Global Positioning System (GPS) was originally a Department of Defense project. When it was opened to civilian use, the signal was intentionally degraded to prevent high accuracy. That degradation, known as Selective Availability, was turned off in 2000, and competition has since pushed GPS toward more accurate positioning.

Why would data be intentionally corrupted? There are many reasons, including to maintain an information advantage, to cause bad decision-making, and to maintain privacy.

Privacy often limits the amount of data collected, the timeliness of the data, and the accuracy of the data. Certainly, any anonymization technique reduces the completeness of the data. Definitively, we can state that privacy reduces the quality of data, and intentionally low quality data can cause harm.
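As a small illustration of how anonymization trades completeness for privacy, here is a sketch using the medical record example from earlier. The field names and the choice to coarsen the date of birth to a decade are my own assumptions, not a recommended anonymization scheme.

```python
from datetime import date


def anonymize(record):
    """Return a copy with the name dropped and the birth date coarsened to a decade."""
    decade = (record["date_of_birth"].year // 10) * 10
    return {
        "birth_decade": decade,
        "smoker": record["smoker"],
    }


# A hypothetical record; the values are made up for illustration.
original = {"name": "Jane Doe", "date_of_birth": date(1957, 3, 14), "smoker": True}
released = anonymize(original)
```

After this step, nobody using the released record can compute an exact age or link it to other records by name. That is the intent of the anonymization, and it is also exactly the loss of completeness described above.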

The dark side of secrecy

In government

The following from Schoenfeld very eloquently states my general thoughts on secrecy in government.

“A basic principle of our political order, enshrined in the First Amendment guarantee of freedom of speech and of the press, is that openness is an essential prerequisite of self-governance. Indeed, at the very core of our democratic experiment lies the question of transparency. Secrecy was one of the cornerstones of monarchy, a building block of an unaccountable political system constructed in no small part on what King James the First had called the ‘mysteries of state.’ Secrecy was not merely functional, a requirement of an effective monarchy, but intrinsic to the mental scaffolding of autocratic rule.

Standing in diametrical opposition to that mental scaffolding was an elementary proposition of democratic theory: Legitimate power could rest only on the informed consent of the governed. Along with individuals at liberty to give or to withhold approval to their government, informed consent requires, above all else, information, freely available and freely exchanged. Official secrecy is anathema to this conception. No one has put this proposition more forcefully than James Madison, who tells us that ‘A popular government, without popular information, or the means of acquiring it, is but a Prologue to a Farce or a Tragedy, or, perhaps both. Knowledge will forever govern ignorance: And a people who mean to be their own Governors must arm themselves with the power which knowledge gives.’”

There are situations when secrecy is needed, most notably in cases of national security. Secrecy should be the exception and not the rule. It should require a clear statement of why something should be secret and then it should be made public as soon as the requirement for secrecy has passed.

Secrecy hides the decision-making process and consolidates power to those holding the secrets. It keeps the people uninformed, limits participation, and allows for corruption to take root and grow unchecked. We must, in order to remain in a free and functioning democracy, vigilantly limit secrecy in government at all times.

Governments have a preference for secrecy and the ability to act without the people’s oversight. Thus secrecy is a slowly encroaching tendency of government and must be constantly guarded against. It is increasing even now. As Aftergood’s article from March 2020 states, “The Department of Defense is quietly asking Congress to rescind the requirement to produce an unclassified version of the Future Years Defense Program (FYDP) database.”

Limiting secrecy applies not only to the government but also to the great influencers of government and to the tools used by governments. Among influencers of government, I include lobbyists, donations, political action committees, and those groups or individuals that gain influence using money or shared secrets. Finally, increasing citizen participation in local governance is an excellent way both to stay aware of encroaching secrecy and to reduce it. See more at Global Answers for Local Problems, Lessons from Civically Engaged Cities.

In the business world

Honestly, this section needs more thought and development but here goes. Some ideas may be rather controversial because they feed into other not-fully developed thoughts I have on taxation policies. Another article in the future might address that issue but that is not as much in my core competencies as data is.

There are different types of businesses, ranging from sole proprietorships to publicly traded corporations, and the rules applying to them often differ greatly. I will limit discussion here to publicly traded corporations. Generally, there is already a lot of transparency in these businesses due to the required reporting to shareholders and government, but businesses can be complex, which gives opportunity for secrecy. Even with that transparency, there still exist many areas for improvement. These mainly cover actions to influence government and the collection and handling of individuals’ data.

Lobbying should be fully disclosed. Sometimes, a business participates in lobbying to push forward legislation in an area where the business is an acknowledged expert. This is reasonable, but its participation should be checked by participation from citizens or groups which might oppose the legislation. Lobbying that is of a political nature only should be prohibited. There are gray areas between purely expert and purely political, and this is why lobbying activities should be fully disclosed and scrutinized.

Charitable activities should be curtailed entirely as these are typically either marketing or lobbying activities in disguise. If executives of a business wish to be charitable, they should use their own funds to purchase the services of the business for donation and not impose their charitable preferences on their diverse shareholders.

Let’s now discuss the handling of individuals’ data. For some businesses, this is just a byproduct of interacting with customers, but for others, this data is their lifeblood and source of revenue. A company earning revenue based upon its database of individual users is not really paying for its access to raw material. It is also building barriers to entry for other companies based not upon its prowess or technical advantage but upon its access to the raw material.

The raw material is individual people’s data, often data a person would consider private. The business treats this data as its property, with the rights to sell, use, or keep it secret within lawful limits.

Privacy’s offsetting benefits

A lot has been written in support of privacy and the right to privacy. In fact, until my interest in machine learning and my understanding of the harm caused by low quality data changed my mind, I was a strong supporter of the right to privacy. I put both time and money into supporting privacy rights. So, let’s examine the reasons for privacy.

I’ll base this on Solove’s “Conceptualizing Privacy” and on Magi’s “Fourteen Reasons Privacy Matters: A Multidisciplinary Review of Scholarly Literature,” both listed in the References section.

Solove identifies six general types of definitions of privacy:

  1. the right to be let alone,
  2. the ability to limit access to the self by others,
  3. secrecy or concealment of certain matters,
  4. the ability to control information about oneself,
  5. the protection of one’s personhood, individuality and dignity, and
  6. control over one’s intimate relationships or aspects of life.

The problem of corrupted data is mainly about information generated by a person or information about a person and not imposing upon or controlling the person.

Let’s look at this a bit deeper. Magi lists fourteen reasons. I’ll list them here and discuss a few of them in more depth for better understanding. These fourteen reasons will be addressed further in a later section.

  1. Privacy protects from overreach of social interactions and provides opportunity for relaxation and concentration.
  2. Privacy affirms self-ownership and the ability to be a moral agent.
  3. Privacy prevents intrinsic loss of freedom of choice.

These three reasons point to impositions on our private space to affect or direct our thoughts and ability to act.

  4. Privacy allows freedom from self-censorship and anticipatory conformity and allows people to explore their “rough draft” ideas.
  5. Privacy helps prevent sorting of people into categories that can lead to lost opportunities and deeper inequalities.
  6. Privacy prevents being misjudged out of context.
  7. Privacy provides a physical space in which an individual can control the artifacts that support the narrative of her/his life.
  8. Privacy preserves the chance to make a fresh start.
  9. Privacy allows individuals to be authentic and to play appropriate roles in various contexts.
  10. Privacy supports intimacy and the building of relationships.
  11. Privacy supports the common good.
  12. Privacy protects from power imbalance between individuals and government/organizations.
  13. Privacy supports democracy, political activity, and service.
  14. Privacy provides space in society for disagreement.

How much privacy do we have?

Over the past few years, many articles have lamented the erosion of personal privacy. Our every click may be monitored by our favorite website or social media company. With technological innovations, governments are able to track and monitor individuals at an unprecedented level. To prevent money flows to terrorist organizations, we have instituted rules and regulations to make financial transactions more traceable. The Health Insurance Portability and Accountability Act of 1996 (HIPAA) was enacted to protect the privacy of our health records. Cameras record our actions at intersections, walking down the street, and in both public and private spaces. Our current location is readily surrendered by the smartphone we all carry. Every email, photo, and online interaction we engage in is recorded and saved for posterity.

So yes, your data, much of which you may consider private, is held by some faceless government, business, or organization. Do the faceless have your best interests in mind? I think not and that is why I fought against this intrusion for many years.

It’s not really a matter of what information is out there but of how consolidated and cohesive it is. The government has or can gain access to all of your data, and it may legally require that you never be informed. As this issues section from the Electronic Frontier Foundation points out, “The USA PATRIOT Act broadly expands law enforcement’s surveillance and investigative powers and represents one of the most significant threats to civil liberties, privacy, and democratic traditions in US history.”

The proposal

But the algorithm has overcome all.

Some years ago, when cameras were first being installed in many public spaces and were being monitored by public officials, or more likely by algorithms, I saw a piece suggesting that the only way to achieve detente was to give everyone equal access to the camera feeds.

I propose the creation of a public data lake to hold this information. There could be different sub-module lakes that include financial or health data. Read access to the lake would be credentialed with some sort of credit/debit scheme. Changes and updates to the data would be through a pull request method. Initiation and oversight of this data lake would be done by some sort of government, citizen, and business consortium.
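To show roughly what I mean by credentialed, credit-based reads and pull request updates, here is a minimal in-memory sketch. Everything in it is hypothetical: the `DataLake` and `PullRequest` names, the cost of one credit per read, and the rule that a change must be approved by someone other than its author are assumptions I made for illustration, not a worked-out design.

```python
from dataclasses import dataclass, field


@dataclass
class PullRequest:
    """A proposed change to one record, waiting for review."""
    key: str
    new_value: object
    author: str
    approved: bool = False


@dataclass
class DataLake:
    records: dict = field(default_factory=dict)
    credits: dict = field(default_factory=dict)  # credential -> remaining read credits
    pending: list = field(default_factory=list)  # pull requests not yet merged

    def read(self, credential, key):
        """Debit one credit for each successful read."""
        if self.credits.get(credential, 0) < 1:
            raise PermissionError("no read credits left for this credential")
        value = self.records[key]  # raises KeyError before any credit is debited
        self.credits[credential] -= 1
        return value

    def propose(self, author, key, new_value):
        """Queue a change instead of writing it directly."""
        pr = PullRequest(key, new_value, author)
        self.pending.append(pr)
        return pr

    def merge(self, pr, reviewer):
        """Apply a proposed change once someone other than its author approves it."""
        if reviewer == pr.author:
            raise PermissionError("authors cannot approve their own changes")
        pr.approved = True
        self.records[pr.key] = pr.new_value
        self.pending.remove(pr)
```

The point of the pull request step is that no single party can silently rewrite a record. How credits are earned, and who may act as a reviewer, are questions the oversight consortium would have to answer.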

How is this possibly a good idea?

There is the problem of theft or use of information for nefarious purposes.

Share the database such that everyone has access to the data.

Problems with localism and fragility?

What to do here?

Impacts on privacy

Magi lists fourteen reasons why privacy matters. Let’s address each in turn.

  1. Privacy protects from overreach of social interactions and provides opportunity for relaxation and concentration.
  2. Privacy affirms self-ownership and the ability to be a moral agent.
  3. Privacy prevents intrinsic loss of freedom of choice.

Quality data collection should not affect these three reasons. If we are speaking of the intrusion of unwanted people into social interactions, this may be a problem. Generally though, this is a problem that can be addressed by other legal means that might be supported by data. Stalking is an example of this. A stalker might try to insert themselves into someone’s life using available information, but the stalker’s own location data could lawfully be used to prohibit and prosecute those actions.

  4. Privacy allows freedom from self-censorship and anticipatory conformity and allows people to explore their “rough draft” ideas.

Without absolute privacy, people often engage in self-censorship and anticipatory conformity. Some self-censorship is beneficial, but too much is harmful to society. Decreasing overall privacy will increase self-censorship; therefore, we will need mechanisms to correct this imbalance. This imbalance may be somewhat offset by clear and strong laws to protect against official or societal curtailing of thoughts and ideas. We might also engage in positive reinforcement of diversity. Finally, the creation of strong anonymous channels may allow for the appropriate expression of ideas without the oppression of anonymous pile-ons.

A question to consider is: are we able to measure how much is lost to self-censorship and conformity? If the loss is great and we are not able to mitigate that loss, that is a point upon which to reinstate strong data privacy.

  5. Privacy helps prevent sorting of people into categories that can lead to lost opportunities and deeper inequalities.

There may be some sorting of people into categories but at the same time opportunities will likely remain the same and inequalities should be lessened. In fact, the reduction of disparity and inequalities is one of the benefits of good data.

  6. Privacy prevents being misjudged out of context.

Initially, a person’s data will be judged out of context. Data is generally more useful with context, so over time context should tend to be attached to it. Still, people with access to the data may not all exercise the same discretion about including that context. Perhaps this is an aspect that will take a little time to find equilibrium.

  7. Privacy provides a physical space in which an individual can control the artifacts that support the narrative of her/his life.

An individual will not be able to control the digital artifacts in their life. False narratives will be very difficult to support. At the same time, true narratives will be easier to support and recall, as the data is readily available to the person. If we talk only about physical spaces, then improved data should have little impact.

  8. Privacy preserves the chance to make a fresh start.

Higher quality data will likely make it more difficult to make a fresh start. We have already seen that just based on the longevity of static data. What was once forgotten is now stored. There may be some solutions for this, which include legislation that rolls certain types of data into archival storage. Recently, some AI algorithms have taken steps to forget certain aged information in order to improve predictions.
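One way such an archival rule might look is sketched below. The `recorded_on` field and the idea of a single age limit in days are assumptions for illustration. Real legislation would presumably set different limits for different types of data.

```python
from datetime import date, timedelta


def roll_to_archive(records, today, max_age_days):
    """Split records into those still active and those old enough to archive."""
    cutoff = today - timedelta(days=max_age_days)
    active = [r for r in records if r["recorded_on"] >= cutoff]
    archived = [r for r in records if r["recorded_on"] < cutoff]
    return active, archived
```

Archived records would still exist but would drop out of everyday queries, which restores some of the forgetting that used to happen naturally.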

  9. Privacy allows individuals to be authentic and to play appropriate roles in various contexts.

This basically states that how I behave and who I am depends on the context in which I am acting. There will be changes in this mutability since the context, often other people’s image of you, will be better informed of your overall role. Today, when we enter a new context, we often assume other people have little to no knowledge of us, and this allows us to develop our relationships unimpeded. This may or may not be true.

I think that we currently enter new contexts with uncertainty of what others know about us. With more readily available information, we could enter a new context with less anxiety, presuming that they already know some things about us but are willing to judge us based on our new context. In other words, I do not think expanding data access degrades this privacy.

  10. Privacy supports intimacy and the building of relationships.

There may be a small effect upon this reason. It will be easier to find information about a person but the information is not imposed into the relationship.

  11. Privacy supports the common good.
  12. Privacy protects from power imbalance between individuals and government/organizations.
  13. Privacy supports democracy, political activity, and service.

Quality data collection should improve these social ends. In fact the expansion of quality data is intended to improve these social ends.

  14. Privacy provides space in society for disagreement.

This is closely related to reason 4, on self-censorship, and I think it may be treated similarly.

Work in progress.

References