Understanding Data ideologies, extraction and surveillance: The trap of Big-Data Dystopia

Data behaves in different ways: the context, platform, economy, techno-social build, algorithmic structuration and archival potential determine how data is used and to what outcome. In the broad spectrum of media anthropology, data is not only complex to fathom but also tricky to study. This deliberative note raises questions about how data is increasingly used to represent or describe specific groups or communities, about its unapologetically capitalist ends, and about how big data is being precariously generalized, stripping away a person’s context and sensitivities, to cluster assorted individuals who have no idea how their digital footprints are being surveilled, recorded and invested to create a big-data-driven technological infrastructure.

Data and Media Anthropology: A pertinent exploration

Nightingale (1993) once asked, “What’s ethnographic about media ethnography?”, mainly with reference to the study of media audiences. Such questions have been raised, and classical ethnography’s boundaries broadened, from time to time. It has been a while since the shift to multi-sited ethnography through television and audience studies redefined the intimate connection to a particular site. That shift has been further compounded, especially over the last decade, within the digitally networked world, as cyberspace is both the place and the medium through which research is undertaken (Murphy, 2011).

Traditional anthropology’s epistemological attachment to fieldwork, thick descriptions, participant observation and elaborate field notes resonates with the various studies undertaken within the space of media anthropology. Disciplinary boundaries do shift, and interdisciplinary immersions may make digital media ethnography perplexing. However, if there is an allegiance to observe and immerse oneself within a specified digital platform for a considerable amount of time, maintain a reflexive journal, understand the cultural patterns of digital embodiment and communicate an understanding of the space the researcher negotiates with, why move away from embracing such efforts as ethnography? The space has a digital marker, just as a site has a physical marker in the material world. Contextualizing digital domains, spending sufficient time to decipher the site and providing thick descriptions of the multi-sited nature of the webbed world only enhance research rigor.

Ethnographers present ethnographic data and, ironically, the online world draws researchers into a space where data is text and where data can be produced, manipulated, crowd-sourced or algorithmically generated. Media ethnographers then present their own ethnographic data, which builds an understanding of datafication within the digital world.

Data is mediated through technology, and yet it foregrounds culture in its own way. It produces digitally mediated communities. Understanding datafication involves understanding ideology, culture, power, identity, privilege, representation, caste, gender, rituals, agency, politics and rights; these are still relevant themes and will remain so. Cyberspace is our lived reality too, not an imagined fantasy. Studying the digital space, especially to understand data, does have its challenges, and yet the spirit of ethnography, particularly media ethnography, remains intact.

Digital sites loop back into the material world; data, for instance, is intricately connected to policy formulations and governance mechanisms. Big data especially is increasingly used in bureaucratic decision-making and thus needs to be studied. Its relevance in contemporary times cannot be overlooked, and scholarship needs to prioritize the big-data spectrum.

Media anthropology scholarship gives specific emphasis to immigrants, diasporic groups and persons in exile, and there is a lot of data on them that could be considered empirical, descriptive or factual. Just as ‘immigrants’ or ‘persons in exile’ is a powerful categorization, there are other such thematic groupings that big data could pave the way to; arguably, it already has!

Ideological Masking of Data and Communities

In the world of research, specifically in positivist or scientific research, data is treated as deterministic: it becomes the backbone of a research effort, and there is considerable acceptance of such data. In the social sciences and humanities, data may be construed with contextual applicability. The understanding of data varies across disciplines. In traditional ethnography, field immersion or fieldwork produces rich insights and data. Similarly, the digital platform one navigates is itself essentially data, which can be mined to produce insights or trends. In addition, one’s preferences, attitudes, ideology, opinions or musings that get recorded on social media platforms are also data; a thematic categorization on Twitter, familiar to us as a hashtag (#), is one instance. The advent of data-driven arguments in social media spaces has also led to a trend of actively interpreting, re-interpreting and even misinterpreting data. Data is imagined as empirical, factual or qualitative evidence, and whether social media users refute a piece of data or use it to further an argument, they do so in support of their own set of ideologies. Data can also have ideologies and be ideological, and cultural and institutional forces can shape a community’s data attributes (Poirier, Fortun, Costelloe-Kuehn, & Fortun, 2020).

“Data ideologies are thus a complex set of assumptions and understandings, both tacit and explicit, that form a meta-discourse about data, how it functions, what needs to be done to and with it, who should handle it and how, and why it is valued — and might be rendered still more valuable” (Poirier, Fortun, Costelloe-Kuehn, & Fortun, 2020, p. 214).

Data can be shaped, reshaped and sometimes programmed to build an understanding or perpetuate a belief. Mapping the bodily self and the digitally embodied self isn’t a new thing. However, the ideological masking of individuals is based on a few tags or keywords that one may have knowingly or unknowingly left as a digital imprint, and that get mapped back to the individual. Those who are part of an individual’s network then get mapped as a virtual networked community (within the larger network) and are also exposed to such tags, which are often cultural or political and also inherently ideological. The pattern mapping and algorithmic generation can be tweaked to suit political, religious or cultural groups. These communities have participants who have no idea that their ‘like’ or ‘share’ in such spaces could lead to larger interpretations and become a micro-unit within a big-data cluster.

Digital Footprint as Raw Data: The impenetrable construction of Big Data and the generalization scare

Digital space and social media platforms are an interesting yet complex space to explore. The complexity intensifies when we consider data and data surveillance, in which big data is the ‘raw material’ that yields revenue through ‘mechanisms of extraction’ (Zuboff, 2015). The most interesting part about big data and related models is their generalizability. One is essentially separating a person, perhaps with an authentic digital identity or marker, from that person’s context, then adding the person to a cluster of such individuals.

Collectively, patterns are then deciphered, trends are predicted, and the resulting big-data analytics is in turn bundled as data. This data can not only generate revenue but also control opinions, institutions or markets. At this juncture, considering issues of immigration and citizenship could help decipher the mirage of big data. The magnitude of how sensitive information can get compounded as big data can be comprehended when immigrants or migrants are considered as a community. Zuboff (2015, p. 1) also highlights how such mechanisms map and control ‘persons in exile’, migrants and immigrants, challenging democratic norms. Immigrants are digital beings too, often tagged as digital immigrants in big-data clusters, and they potentially leave digital footprints. These digital footprints can be understood as critical data markers that help explicate their movement, consumption patterns, preferences, physiological parameters, media usage and even political or ideological leanings. Countries then use this data to navigate and manage people as clusters, labelled as immigrants, asylum seekers, immigrant communities classified by religion or other parameters, illegal migrants and so on.

An immigrant’s connection with the home country, participation in diasporic events and imagined digital belongingness can all be generalized too, with big data generated through data surveillance. Likewise, any community can be categorized to create specific data sets, producing an assemblage that necessarily generalizes a community or a geo-cultural group, which then gets spawned as algorithmic sets (like templates) that can be coded into various digital spaces. The source code, as it is often known in programming circles, becomes the base on which related data infrastructure could be built. The racist or discriminatory results that search aggregators produce could also be a result of such algorithmic generalizations.

Techno-Optimism and Data Extraction: Deliberative and participative inputs crucial before policy formulations

While technology has a role to play in effective governance, it is equally pertinent to assess how uninformed data dependency could do more harm than good. Big data perpetuates a culture of collecting data, just as it has normalized shoppers giving their mobile number, residential address, date of birth or anniversary details to salespersons or promotional agents in a supermarket. What happens to the data afterwards is a question often raised, yet this culture persists.

Techno-optimism is often imagined and retailed to build technological infrastructure that can help the State in governance. When technology is connected to a nation’s sovereignty, highlighting any foreseeable peril brands the person undesirably (as anti-national or a nihilist) and makes it difficult to address the anomalies. State apparatuses often control or access data citing border control and protection, or the need to guard sovereignty; in such scenarios, any perspective put forth, even by a good-willed citizen, may not stand a chance of being heard.

Data is extracted continually, and it will communicate certain trends that could be inherently biased; yet when it concerns something sensitive, terrorism for instance, it becomes an impermeable zone. Data pertaining to “terrorism” is collated and built on an initial frame of logic, and it eventually gets codified as a program or code. The logic could have been fragile, or based on something transient, yet it becomes a norm. It creates a data deluge in which human intervention has no scope to fathom such voluminous data sets or detect biases. Coding normal actions or movements as abnormal, or vice versa, and using such inferences to fuel decision-making then traps the State in a delusional mode of governance, often mistakenly perceived as data-driven and impartial. While the ‘data deluge’ could aid specific detections or predictions based on data sets, it also exponentially ‘normalizes surveillance’; while it may seem that it could help control terrorist movements, its “prediction and detection rates are extremely low and the rate of false positives is enormous, wrongly accusing and breaching the human rights of many innocent civilians” (Leurs & Ponzanesi, 2018). If a country does not want to factor in the ‘false positives’, it sets off a humanitarian recession instead of reformist humanitarianism. Human rights cannot and should not be brushed under the carpet.
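The arithmetic behind the ‘false positives’ concern can be sketched in a few lines of Python. This is only an illustration of the general base-rate effect, not a model of any real system: the function name and every figure below (population size, number of genuine cases, detection rate, false-positive rate) are hypothetical assumptions chosen for the example, not numbers from Leurs and Ponzanesi (2018).

```python
def screening_outcomes(population, genuine_cases, detection_rate, false_positive_rate):
    """Return (true_positives, false_positives, precision) for a screening system.

    All inputs are hypothetical; the point is the proportion, not the values.
    """
    innocent = population - genuine_cases
    true_positives = genuine_cases * detection_rate      # genuine cases that get flagged
    false_positives = innocent * false_positive_rate     # innocent people that get flagged
    flagged = true_positives + false_positives
    precision = true_positives / flagged if flagged else 0.0
    return true_positives, false_positives, precision


# Hypothetical scenario: 10 million people screened, 100 genuine cases,
# a system that catches 99% of them and wrongly flags only 1% of everyone else.
tp, fp, precision = screening_outcomes(
    population=10_000_000,
    genuine_cases=100,
    detection_rate=0.99,
    false_positive_rate=0.01,
)

print(f"Correctly flagged: {tp:.0f}")
print(f"Wrongly flagged:   {fp:.0f}")
print(f"Share of flagged people who are genuine cases: {precision:.2%}")
```

Under these assumed figures, even a seemingly accurate system flags roughly a thousand innocent people for every genuine case, because the group being searched for is so small relative to the population under surveillance.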

Big-data-driven algorithms are hungry for data. This kind of meta-data generation results in hegemonic data ideologies: certain sets are auto-generated and certain sets crowd-sourced, while others are consciously produced. In the South Asian context, Arora (2016) highlights how databased legitimizations can lead to stereotyped identities built through ‘reenacting, reproducing, and reinforcing prevailing cultural codes’, and how it then becomes impossible to separate “online and offline values, social practices, and power relations as they mutually reconstitute each another” (Arora, 2016). While stereotypes, cultural codes and ideologies permeate digital spaces, big data captures and strengthens such identities. The space also abounds with consent, access and privacy issues. Even if an individual produces content online, understands this space, wishes to address such anxieties and writes about them extensively, that individual will not be noticed or heard without some operative privilege.

Influential narratives in the digital space are produced only by those who have a certain set of privileges, so it is important to view this through the prism of algorithmic oppression. The potential of tags and keywords to be exploited by those who hold power, especially bureaucratic, political and capitalistic power, paves the way for algorithmic manipulation and assembly. The role of agency, the intertextual flow of text and algorithmic misrepresentation also have to be considered when countries rely on big data for decision-making.

Policy formulations, especially the Personal Data Protection Bill, 2019 in the Indian context, need to factor in how supervised and unsupervised machine learning differ: the former can often be fathomed with effort, while the latter tends to be far more of a black box. Developing a deliberative and participative environment in policy formulation could tone down the intensity of data-masking, the ideological misrepresentation of big data and the stereotypical structuration of algorithmic perpetuation. Bureaucrats, technical and data science experts, analysts and analytics practitioners, coders, academicians and other citizens who want to share what they know can, along with policy formulators, craft and draft effective data governance mechanisms. In the world of neural nets, the result may still not be fool-proof, but it could have enhanced pragmatic value and effectiveness. It could be a stepping stone to data justice.

Big data, technological surveillance and algorithmic mediations that perpetuate specific data ideologies definitely form a bewildering zone; however, if we endure its initial dystopian interface, we will hopefully make better sense of it. This standpoint, I admit, is expectant, yet it remains entrapped within the bigger trap of big-data dystopia.


Arora, P. (2016). Bottom of the data pyramid: Big data and the global south. International Journal of Communication, 10, 1681–1699.

Leurs, K., & Ponzanesi, S. (2018). Connected migrants: Encapsulation and cosmopolitanization. Popular Communication, 16(1), 4–20. https://doi.org/10.1080/15405702.2017.1418359

Murphy, P. D. (2011). Locating media ethnography. In V. Nightingale (Ed.), The handbook of media audiences (pp. 380–401).

Nightingale, V. (1993). What’s “ethnographic” about ethnographic audience research. In G. Turner (Ed.), Nation, culture, text: Australian cultural and media studies (pp. 164–177). London.

Poirier, L., Fortun, K., Costelloe-Kuehn, B., & Fortun, M. (2020). Metadata, Digital Infrastructure, and the Data Ideologies of Cultural Anthropology. In J. W. Crowder, M. Fortun, R. Besara, & L. Poirier (Eds.), Anthropological Data in the Digital Age: New Possibilities — New Challenges (pp. 209–238). Switzerland: Palgrave Macmillan.

Zuboff, S. (2015). Big other: Surveillance capitalism and the prospects of an information civilization. Journal of Information Technology, 30(1), 75–89.

Vasupradha Srikrishna teaches at the Department of Communication, Madras Christian College. She’s also the founder of Research Culture™ (researchculture.co), a research consultancy firm that caters to both industry and academia. She can be reached at srikrishnavasupradha@gmail.com

Blog of the Media Anthropology Research Collective — South Asia