Christine Borgman, Presidential Chair in UCLA Information Studies, will speak at a book signing for “Big Data, Little Data, No Data: Scholarship in the Networked World” at UCLA on Feb. 25.
In her new book, “Big Data, Little Data, No Data: Scholarship in the Networked World” (Cambridge: MIT Press, 2015), Christine Borgman seeks to demystify the differences between “big” and “little” data. More importantly, the professor and Presidential Chair in UCLA’s Department of Information Studies illustrates, through a theoretical framework of people, practices, technologies, institutions, material objects, and relationships, that the value of data, whether big or little, or even nonexistent or inaccessible, lies in the capability to share them between people and over time. Yet data sharing is difficult, incentives to share are minimal, and data practices vary widely across disciplines.
“What inspired this book is my recognition that research data were being treated largely as technological objects, particularly in information science policy, whereas in the social sciences, we understand how much miscommunication occurs in how people transfer ideas and data,” says Borgman.
Through case studies of data practices in the sciences, the social sciences, and the humanities, “Big Data, Little Data, No Data” delineates the implications of Professor Borgman’s findings for scholarly practice and research policy. She says she aims to bring the technological, social, and organizational aspects of data management policy together in one place.
“Many stakeholders are involved in research data: not only scholars and students, but also funding agencies, publishers, university administrators, librarians, archivists, business people, government policy makers, and other players who would wish to use the data that researchers produce,” says Professor Borgman. “Each of these stakeholders has different concerns. For example, the library, information, and archival worlds are concerned with managing scholarly objects, which now include data. Information managers from these fields tend to think, ‘We know how to manage publications, so we’ll treat [data] as publications,’ which leads to misunderstanding across that entire arc.”
Borgman questions whether “librarians, archivists, and publishers comprehend the many ways in which data are different from publications.”
“Similarly, technologists tend not to understand how socially embedded data really are,” she says. “Those who study the social sciences also need to understand how data are treated like technological objects in a policy domain. People trying to manage data need to realize the array of perspectives in these other arenas.”
Borgman, who studies a number of research teams in astronomy, earth sciences, and other fields, says that the concept of “big data” is largely hyperbole, much of it driven by the perspective of business and industry.
“Big data is not a matter of numerical size; it’s something that’s big relative to your ability to handle it,” she notes. “Just as you can drown from two tablespoons of water in your lungs, you can drown in small amounts of data if you don’t have the tools and the skills to handle them.”
Borgman says that having the right data is usually better than having more data, that little data can be just as valuable as big data, and that both pose unique challenges.
“In the book, the concepts of ‘big data’ and ‘little data’ are set up as points of departure,” she says. “It is, in some sense, a qualitative distinction between highly numerical and computational approaches. When you’ve got terabytes of data coming off an astronomy instrument, you can’t look at every little observation. You’ve got to rely on computational models. But when you are out in the field doing an ecology study [from] samples of dirt and water, you have to inspect every little thing. There’s a fair amount of qualitative judgment that’s involved in evaluating little data.
“You also can have small amounts of quantitative data. There’s a well-known characterization from the business world, saying data can be big in terms of volume, velocity, or variety. If you think about astronomy, those data are big in volume and velocity, but they’re low in variety because a standard set of observations is captured. You can also have large amounts of qualitative data. For example, when you look at educational research data [from] interactions between teachers and students or between pairs of students, those would be considered little data. They are low in volume and velocity, but high in variety.”
Borgman says that there is also the dilemma of “no data,” when certain kinds of data are assumed to exist but are not available for use, either because they are withheld for proprietary reasons or because they are accessible only with obsolete or unavailable software.
“I have a good example of ‘no data’ from someone who works in transportation engineering,” says Borgman. “These engineering researchers are trying to plan cities and traffic, a particular problem in Los Angeles. To get the full picture, they need data on the movement of cars, trucks, taxis, and all forms of public transport. Government agencies such as cities are required to release most of these kinds of data. That’s why you can get apps that tell you when the buses are coming, because cities release bus data and what they know about cars on the road.
“Taxis are regulated by the government, so you can get data on how many taxis are on the road and their traffic patterns. However, you cannot get data from Uber or Lyft, or from the trucking industry, because those are treated as proprietary data under current regulations. As a result, transportation engineers are more limited in the models they can build of city traffic flow. The more popular Uber and Lyft become, the less reliable models built on public data may become.”
“Another good example of the difficulty of exchanging data between groups comes from a large project we are beginning to study, in which researchers are trying to map the faces of zebrafish, to learn how they form in ways similar to the faces of primates and humans, and to understand where facial deformities such as cleft palate, and far worse, come from,” says Borgman. “But the three research communities involved don’t agree on how a face is divided, so it is very hard to map how a jaw, nose, or forehead forms. Each of the three communities has its own standards and practices. To merge data across these groups, they would need descriptions that are incompatible with their respective communities. These researchers have been working on this project for years, and the whole point of it is data sharing, but merging models across classes of species may be an intractable problem for them.”
Professor Borgman says that “Big Data, Little Data, No Data” seeks to explain what aspects of data use are technical, social, conceptual, and political.
“I want people to understand that data handling is not just a problem for the sciences, but one that also spans the social sciences and humanities,” she says. “The point I’m trying to make in the book is that we’ve got a really mixed landscape of what data are required or recommended for release and which aren’t, who has access to them, and what kinds of skills and tools they need to have access to them.”
A panel discussion, book signing, and celebration for “Big Data, Little Data, No Data” will be held on Wednesday, Feb. 25, 4-6 p.m., in the Presentation Room (11348) at UCLA’s Charles E. Young Research Library. Speakers will include Professor Borgman; Jonathan Furner, Chair of UCLA’s Department of Information Studies; Virginia Steel, University Librarian; Kent Wada, Strategic IT Policy and Chief Privacy Officer; and Janice Reiff, Professor of History and Chair, Academic Senate, 2013-14.
This event is co-sponsored by the UCLA Library and UCLA Ed & IS, and is free and open to the public; light refreshments will be served. For more information, please contact firstname.lastname@example.org or call (310) 206-0375.
Photo by Todd Cheney