Until Data Do Us Part: Controversy and Progress in U.S. Big Data Policy

Written by Katia Grütter, Embassy of Switzerland in the United States

Big Data is controversial – and that may be one of its few constants amid its ever-changing vastness. Only recently, the European Court of Justice ruled the “Safe Harbor” data-sharing pact between the European Union and the United States invalid – the framework that had previously allowed more than 4,000 companies to transfer data from the EU to their servers in the United States. The court’s main reason was that the agreement did not adequately safeguard EU citizens’ fundamental right to privacy against access by U.S. law enforcement agencies.

The argument over the transfer, storage, and protection of Big Data is an ongoing one in the United States – an indicator of the subject’s sensitivity as well as its political importance. After whistleblower Edward Snowden revealed in June 2013 that the U.S. National Security Agency (NSA) had secretly built a worldwide espionage network targeting electronic communication from ordinary citizens to high-ranking government officials around the world, the connotations of Big Data took a decidedly negative turn. The European public in particular heavily criticized the U.S. government for allowing such an infringement on private data in the name of national security, and the fact that President Barack Obama defended the program further reinforced mistrust.

Illustration by Sarah A. King

However, even though the Washington Post had helped publish some of the classified information, it nonetheless named Obama the “Big Data President” a few weeks later, owing to his pre-controversy engagement with questions surrounding Big Data. Indeed, the Obama Administration launched data.gov in 2009, an extensive catalog of public data from the U.S. government meant to make governmental information more transparent and accessible. In addition, the Administration launched a set of “My Data” initiatives aimed at giving American citizens secure access to their health, tax, energy usage, and student loan information. Obama announced a further Big Data Initiative in 2012, which dedicated $200 million to new Research & Development investments and promoted collaboration between the National Science Foundation (NSF), the National Institutes of Health (NIH), the Department of Defense (DoD), the Department of Energy (DoE), and the U.S. Geological Survey (USGS). On the one hand, the initiative sought to advance and deploy technologies to better manage and analyze Big Data; on the other, it proposed to expand the workforce needed to carry out these tasks. Moreover, in May 2013, the President issued an executive order requiring that all future government information be open and machine-readable by default.

The immensity of digital data available today becomes evident when it is compared to the range of “traditional” data that the U.S. Census Bureau collects in the form of censuses, surveys, or administrative records. As Cavan Capps, Big Data Lead at the U.S. Census Bureau, points out, Big Data differs decisively from census data in five respects: firstly, it needs galaxy-scale prefixes to describe its size (measured in anything from tera-, 10¹², up to yotta-, 10²⁴); secondly, Big Data can be retrieved almost every second, and the set will be different each time; thirdly, Big Data is often a byproduct that was never explicitly requested – very unlike official statistics; fourthly, it comes in a digital, “cheap” form; and lastly, Big Data is far from a careful data collection, being riddled instead with many unknowns. Given these properties, the possibilities and chances of Big Data seem manifold, but equally delicate and potentially dangerous.
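To give a sense of the scale Capps describes, here is a minimal illustrative sketch (not from the article) of those prefixes expressed as powers of ten, from tera up to yotta:

```python
# Standard SI magnitude prefixes, from tera (10^12) up to yotta (10^24).
PREFIXES = {
    "tera": 10**12,
    "peta": 10**15,
    "exa": 10**18,
    "zetta": 10**21,
    "yotta": 10**24,
}

# A yottabyte exceeds a terabyte by twelve orders of magnitude.
print(PREFIXES["yotta"] // PREFIXES["tera"])  # prints 1000000000000
```

In other words, the largest prefix in Capps’s range describes quantities a trillion times larger than the smallest.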

A map of the internet, circa 2003. Image: The Opte Project

President Obama pursued the topic further by calling for a 90-day review of Big Data and privacy in early 2014 and by establishing a working group under John Podesta, his Counselor at the time. As a consequence, two comprehensive reports were published in May of the same year, one discussing the practical implications of Big Data and the other providing a technological perspective. The former held:

“While big data unquestionably increases the potential of government power to accrue unchecked, it also holds within it solutions that can enhance accountability, privacy, and the rights of citizens. Properly implemented, big data will become an historic driver of progress, helping our nation perpetuate the civic and economic dynamism that has long been its hallmark.”

More specifically, the report emphasized the many benefits of Big Data – for example, how it can help develop predictive medicine by analyzing large-scale health data sets, how it fosters understanding of the way students move through a learning trajectory at university, and how it can be of use in the private sector by driving the development of products and services for citizens’ – consumers’ – daily lives. At the same time, the report also found that “big data analytics have the potential to eclipse longstanding civil rights protections in how personal information is used in housing, credit, employment, health, education, and the marketplace”, pointing towards the dangers of ever-growing data sets. The technological review by the President’s Council of Advisors on Science and Technology (PCAST) provided more concrete insight on this matter and explained the shortcomings with regard to the volume, variety, and velocity – the three Vs – of Big Data:

“The challenges to privacy arise because technologies collect so much data (e.g., from sensors in everything from phones to parking lots) and analyze them so efficiently (e.g., through data mining and other kinds of analytics) that it is possible to learn far more than most people had anticipated or can anticipate given continuing progress. These challenges are compounded by limitations on traditional technologies used to protect privacy (such as de-identification). PCAST concludes that technology alone cannot protect privacy, and policy intended to protect privacy needs to reflect what is (and is not) technologically feasible.”

The discernible message, then, was that where technological protection reaches its limits in the face of the immensity of available data, government policy must take over the duty of guarding privacy – and that the U.S. government, in this context, needs to assume the role of data protector.

It should be understood, however, that when the U.S. government talks about the dangers of Big Data and privacy, it is concerned with protecting extensive sets of private data from outside intrusion, while generally excluding potential intrusions from within its own surveillance agencies. This is made clear by the statement that “Big data will naturally – and appropriately – be used differently in national security”. Even though President Obama installed an independent Review Group on Intelligence and Communications Technologies to analyze NSA practices after the 2013 leak, and its report produced 46 recommendations on how to change the situation, almost no political or legal consequences followed. Eventually, however, the USA Freedom Act came into effect in June 2015, regulating the storage and handling of data: among other things, it requires the U.S. government to file applications with the Foreign Intelligence Surveillance Court (FISC) in order to analyze and verify data, and it bars the NSA from holding acquired information for more than 180 days.

On the business side, the NSA controversy prompted an increase in security standards: companies like Google, Yahoo, and Facebook responded by investing in data encryption technologies. In North America, encrypted web traffic rose from 2.29 percent of all peak-hour traffic to 3.8 percent within a year – an increase of more than 50 percent – as research by the network equipment company Sandvine showed. Overall, while Big Data remains tinged with unease in the context of government policy, it is viewed in a generally positive light with regard to business and innovation. U.S. companies invest heavily in Big Data technologies: IBM, for instance, announced in 2015 that it would dedicate $3 billion to establishing a new Internet of Things unit (mobile and virtual machine-to-machine communication), aimed at building a cloud-based open platform designed to help clients and ecosystem partners build their own solutions.
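The “more than 50 percent” figure follows directly from the two shares Sandvine reported; a quick sketch of the arithmetic:

```python
# Sandvine's reported shares of North American peak-hour traffic
# that was encrypted, roughly one year apart.
before, after = 2.29, 3.8

# Relative increase in percent: (new - old) / old * 100.
relative_increase = (after - before) / before * 100
print(round(relative_increase, 1))  # prints 65.9
```

So the jump from 2.29 to 3.8 percent is in fact a relative increase of about 66 percent, comfortably above the “more than 50 percent” the article cites.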

Big Data remains a hot topic in the United States, both economically and politically. While Forbes estimates that the use of social media and social data might determine who the next president will be, Obama continues to devote attention to Big Data – and seems to have already proven that very theory. In February 2015, he appointed the U.S. government’s first Chief Data Scientist, Dr. DJ Patil, pointing out that technical talent needs to be brought into the federal government “to harness the power of technology and innovation to help government better serve the American people.”

