Since 2009, when smartphones entered our lives, numerous mobile applications for qualitative research have come onto the market. These apps are revolutionizing the way we conduct research, changing how we ask questions, how we select participants for a study, and how we produce and store data. This phenomenon can be called smartphone ethnography.
These platforms offer the usual functions of sending notes, photos, videos or audio recordings. But they are also specifically designed for qualitative research: they allow comfortable communication between researchers and participants, make it easier to compare and categorize data, and present all of it in intuitive interfaces. Forget the times when you had to walk around with your hands full of notes, documents and folders! Now you can have all your research material in your pocket.
Some mobile applications for qualitative research
Here are some mobile applications for qualitative research. These are just a few examples; there are many more. If you decide to try them, you will have to explore this particular world yourself to find the one that best meets the needs of your study. Keep in mind that each of these apps has its own particularities. Some are designed for market research, to study users’ experiences with a product. Others are quite expensive, so they are probably used less by private individuals than by people who work in university departments, private companies, etc.
In 2009 a group of ethnographers, film makers and agency planners quietly created the world’s first mobile ethnographic research platform. They present their product as: “An ethnographer in your pocket. It’s intimate, in-the-emotion and in-the-moment. But only if users capture when you need them to and instinctively. Whether a research participant or a client side R&D manager, our app allows you to send content without fuss and quickly. Our app will never get in the way of an event being captured”.
The platform allows you to send content of all types and sizes quickly. It also simplifies the task of organizing and categorizing data. Pricing is adjusted to your needs, from monthly payments to day rates. And there are free trials for students!
After 414 projects in 36 countries using their Smartphones to conduct qualitative research, a group of anthropologists decided to create a company that makes digital research tools available to others.
In 2011 OvertheshoulderCompany came to light. “The future does not fit into the containers that held the past. What we mean is that there is a real change, great techniques, and we challenge ourselves and our customers to do new things. It’s not about making collages with the images that people send us and that’s it. Now we are in their pockets! They have a camera for making instant videos, etc. Explore new ways to do more powerful research”.
In 2008 a group of friends started a research project called myResearchFellow. A few years later, in 2014, this project became ExperienceFellow.
You just have to create a project and invite your participants (who can join through a QR code). Participants can be grouped into segments by age, gender, and other classifications of your choice. Your participants can then start documenting their experiences through the mobile app, and you can compare their data in real time from any mobile device.
This is your choice if you prefer your platform to be in French, German or Dutch. Along the same lines as the previous one, this company offers qualitative research applications mainly focused on market studies. (They are also available in English.)
Robbie Blinkoff, a professor at Goucher College, has created a mobile application to help his students capture their study experiences abroad from an anthropological perspective. This application offers students a way to structure and document their experiences through four categories: Daily life, Aha Moments, Cultural walks and Reflections. Students post photos and stories of culture shock, confusion, destroyed assumptions and many other fascinating findings. Unfortunately, from what I’ve found, this brilliant application is not available to the general public, but I thought it was such a fantastic initiative by this Goucher College professor that I wanted to share it with you.
He talks about the project in a TED Talk.
Certainly, anthropology cannot remain outside the smartphone revolution. Smartphones are a human reality. Shaped by humans, these technologies are shaking and shaping the ways we do things in our lives and communities.
You don’t need an app like the ones listed here. Simply by incorporating your own smartphone and tablet tools into your research, you are already practicing smartphone ethnography in a way. Undoubtedly, new formats, new techniques and new methodologies will generate new epistemologies. Time will tell. What we must not forget is the importance of the investigator. Remember, these apps can help us with our research, but it is still our task to seek insights and develop answers and solutions. The data is not only the entries that the participants send. The decision to send an entry is itself also “data”. So understanding individuals is just as important as understanding their input.
These are just a few examples. Please share with us if you know any others or have experience using them!
As more and more aspects of everyday life are turned into machine-readable data, researchers are provided with rich resources for researching society. The novel methods and innovative tools to work with this data not only require new knowledge and skills, but also raise issues concerning the practices of investigation and publication. This book critically reflects on the role of data in academia and society and challenges overly optimistic expectations considering data practices as means for understanding social reality. It introduces its readers to the practices and methods for data analysis and visualization and raises questions not only about the politics of data tools, but also about the ethics in collecting, sifting through data, and presenting data research.
AUP S17 Catalogue text
As machine-readable data comes to play an increasingly important role in everyday life, researchers find themselves with rich resources for studying society. The novel methods and tools needed to work with such data require not only new knowledge and skills, but also a new way of thinking about best research practices. This book critically reflects on the role and usefulness of big data, challenging overly optimistic expectations about what such information can reveal, introducing practices and methods for its analysis and visualization, and raising important political and ethical questions regarding its collection, handling, and presentation.
Rees explores the opportunities and risks that cutting-edge science presents.
02/21/2017 10:24 am ET
Lord Martin Rees is an astrophysicist and the former master of Trinity College, Cambridge. He sat down with The WorldPost for a wide-ranging interview, which has been edited for clarity and brevity.
Alexander Görlach: Out of all great transformations we are going through, from climate change to artificial intelligence to gene editing, what are the most consequential we are about to witness?
Martin Rees: It depends on what time scale we are thinking about. In the next 10 or 20 years, I would say it’s the rapid development in biotechnology. We are already seeing that it’s becoming easier to modify the genome, and we heard about experiments on the influenza virus to make it more virulent and transmissible. These techniques are developing very fast and have huge potential benefits but unfortunately also downsides.
They are easily accessible and handled. It’s the kind of equipment that’s available at many university labs and many companies. And so the risk of error or terror in these areas is quite substantial, while regulation is very hard. It’s not like regulating nuclear activity, which requires huge special purpose facilities. Biohacking is almost a student-competitive sport.
I am somewhat pessimistic, because even if we do have regulations and protocols for safety, how would we enforce them globally? Obviously we should try and minimize the risk of misuse by error or by design of these technologies and also be concerned about the ethical dilemmas they pose. So my pessimism stems from feelings that what can be done, will be done ― somewhere by someone ― whatever the regulations say.
Görlach: Do you fear that this could happen not only in the realm of crime ― if we think of so-called “dirty bombs,” for example ― but could also be used by governments? Do we need a charter designed to prevent misuse?
Rees: I don’t think governments would use biotech in dangerous ways. They haven’t used biological weapons much, and the reason for that is that the effects are unpredictable.
Görlach: That brings recent Hollywood blockbusters like “Inferno” to mind, where one lunatic tries to sterilize half of mankind through a virus.
Rees: Several movies have been made about global bio-disasters. Nevertheless, I think it is a realistic scenario, and I think it could lead to huge casualties. Disasters such as the one from “Inferno,” as well as other natural pandemics, could spread globally. The consequences of such a catastrophe could be really serious for society. We have had natural pandemics in historic times ― the “black death,” for example. The reason that governments put pandemics ― natural or artificially produced ― high on their risk register is the danger of societal breakdown. That is what worries me most about the possible impact of pandemics. This is a natural threat, of course. The threat is aggravated by the growing possibility that individuals or small groups could manufacture a more lethal virus artificially.
Görlach: So when speaking of the age of transformation, aspects of security seem paramount to you. Why is that?
Rees: We are moving into an age when small groups can have a huge and even global impact. In fact, I highlighted this theme in my book Our Final Century, which I wrote 13 years ago. These new technologies of bio and cyber ― as we know ― can cause massive disruption. We have had traditional dissidents and terrorists, but there were certain limits to how much devastation they could cause. And that limit has risen hugely with these new bio and cyber-technologies. I think this is a new threat, and it is going to increase the tension between freedom, security and privacy.
Görlach: Let’s look at another huge topic: artificial intelligence. Is this a field where more uplifting thoughts occur to you?
Rees: If we stay within our time frame of 10-20 years, I think the prime concerns about A.I. are going to be in the realm of biological issues. And everyone agrees that we should try and regulate these. My concern is that it will be hard to make effective regulations. Outside biological consequences, in the long term, of course we need to worry about A.I. and machines learning too much.
In the short term, we have the issue of the disruption of the labor market due to robotics taking over ― not just factory work but also many skilled occupations. I mean routine legal work, medical diagnostics and possibly surgery. Indeed, some of the hardest jobs to mechanize are jobs like gardening and plumbing.
We will have to accept a big redistribution in the way the labor market is deployed. And in order to ensure we don’t develop even more inequality, there has got to be a massive redistribution. The money earned by robots can’t only go to a small elite ― Silicon Valley people, for instance. In my opinion, it should rather be used for the funding of dignified, secure jobs. Preferably in the public sector ― young and old, teaching assistants, gardeners in public parks, custodians and things like that. There is unlimited demand for jobs of that kind.
Görlach: But robots also potentially could take on the work of a nurse, for that matter.
Rees: True, they could do some routine nursing. But I think people prefer real human beings, just as we’ve already seen that the wealthiest people want personal servants rather than automation. I think everyone would like that if they could afford it, and everyone in old age would like to be cared for by a real person.
Görlach: In your opinion, what mental capacities will robots have in the near future?
Rees: I think it will be a long time before they will have the all-round ability of humans. Maybe that will never happen. We don’t know. But what is called generalized machine learning, having been made possible by the ever-increasing number-crunching power of computers, is a genuine big breakthrough. These structures of machine learning are a big leap, and they open up the possibility that machines can really learn a lot about the world. It does raise dangers though, which people may worry about. If these computers were to get out of their box one day, they might pose a considerable threat.
Görlach: In your opinion, what sparks new innovation and ideas? Will A.I. and machines foster these processes?
Rees: Moments of insights are quite rare, sadly. But they do happen, as documented cases suggest (laughs). There is a great saying: “Fortune favors the prepared mind.” You have got to ruminate a lot before you are in a state to have one of these important insights. If you ask when the big advances in scientific understanding happen, they are often triggered by some new observation that in turn was enabled by some new technological advancement. Sometimes that happens just by a combination of people crossing disciplines and bringing new ideas together; sometimes just through luck; sometimes through a special motivation that caused people to focus on some problem; sometimes by people focusing on a new problem that was deemed too difficult previously and therefore didn’t attract attention.
Görlach: Would you say a collective can have an idea or that only individuals have ideas?
Rees: Many ideas may have depended on the collective to even emerge. In soccer, one person may score the key goal. That doesn’t mean the other 10 people on the team are irrelevant. I think a lot of science is very much like that: the strength of a team is crucial to enable one person to score the goal.
Görlach: Do natural sciences and humanities have the capability to tackle the challenges occurring from these transformations?
Rees: The kinds of issues we are addressing in Cambridge involve social sciences as well as natural sciences. As I said before, because of the societal effect, the consequences of a pandemic now could be worse than they were in the past, despite our more advanced medicine. Also, if we are thinking of ecological problems like food shortages, the issue of food distribution is an economic question, as well as a question of what people are ready to eat. All these things involve fully understanding people’s social attitudes. Are we going to be satisfied eating insects for protein?
Görlach: With the rising amount of aggregated data, it becomes increasingly difficult for the humanities to keep up with natural sciences. How can we synchronize the languages of different academic fields in this era of big data?
Rees: Great question! There are impediments caused by disciplinary boundaries, and we have to encourage people to bridge these. I am gratified that we have some young people who are of this kind: philosophers who are into computer science or biologists who are interested in system analysis. All these things are very important. I think here in Cambridge, we are quite well-advantaged because we traditionally have the college system whereby we have small academic groups in each college. Each of these colleges is a microcosm, so all disciplines cross somewhat. It is therefore particularly propitious as a location for the development of cross-disciplinary work.
Görlach: The blessings of modern innovation seem to be ignored by many policymakers; we see a retreat from globalization and a retreat from digitalization. Is it a disconnect between science and the rest of society?
Rees: The misapplication of science is a problem, of course. As well as the fact that science’s benefits are irregularly distributed. There are some people that don’t benefit, such as traditional factory workers. If you look at the welfare of the average blue-collar worker and their income in real terms ― in the U.S. and in Europe ― it has not risen in the last 20 years; in many respects, their welfare has declined. Their jobs are less secure, and there is more unemployment. But there is one aspect in which they are better off: information technologies. IT spreads far quicker than expected and led to advantages for workers in Europe, the U.S. and Africa.
Görlach: But surely globalization made many poor people less poor and a few rich people even richer.
Rees: Sure, I guess this statement can be made after 25 years of globalization. But it should also be addressed that we now witness a significant backlash in many places in terms of Brexit or the presidential election in the U.S.
Görlach: How drastically do you think these developments will affect science, the attitude toward it and its funding?
Rees: Many of the people who use modern information technology, such as cellphones, aren’t aware of the immense technological achievements. Back in the day, developments could be traced back to scientific innovations decades ago, which were mainly funded by either the military or the public. They may not be aware of it, but they appreciate it. So it’s unfair to say people are anti-science. They are worried about science because indeed there is a risk that some of these technologies will run ahead faster than we can control and cope with them. So there is a reasonable ground for some people to be concerned ― for example, about biotech and A.I.
But we also have to bear in mind that for technology to be developed, it’s necessary ― but not sufficient ― for a certain amount of science to be known. We can take areas of technology in which we could have forged ahead faster but haven’t because there was no demand. Take one example: it took only 12 years from the first Sputnik to Neil Armstrong’s small step on the moon ― a huge development in 12 years. The motivation for the Apollo program was a political one and has led to huge expenses. Or take commercial flying ― today, we fly in the same way we did 50 years ago, even though in principle we could all fly in supersonics.
These are two examples where the technology exists but there hasn’t been a motive ― neither political nor economic ― to advance these technologies as fast as possible. In the case of IT, there was the obvious demand, which exploded globally in an amazing way.
Görlach: Living in a so-called post-factual era, what are “facts” to you as a scientist?
Rees: In the United Kingdom, those who voted for Brexit voted that way for a variety of reasons. Some who voted for it wanted to give the government a bloody nose; others voted blatantly against their own interest. The workers in South Wales, for example, benefited hugely from the European Union. There is a wide variety of different motives but I don’t think people would say that they voted against technology.
Görlach: Still, there is this ongoing narrative about the fear of globalization and digitalization, and that would also imply the fear of technology.
Rees: Sure, but that is oversimplified. We can have advanced technology on a smaller scale. I don’t think you can say that technology is always correlated with larger-scale globalization. It allows for robotic manufacturing, and it allows for more customization to individual demand. The internet has allowed a lot of small businesses to flourish.
Görlach: But there seems to be an increasing disconnect in many societies regarding the consensus on which facts matter and how facts are perceived.
Rees: To understand this attitude you are expressing, we have to realize that there aren’t many facts that are clear and relevant in their own right. In most cases, I think people have reason to doubt. Most economic predictions, for example, have pretty poor records, so you can’t call them facts.
In the Brexit debate, there were a lot of valid arguments on both sides, and you can’t blame the public for being skeptical. This is also true for the climate debate. It is true that some people deny what is clear. But the details on climate change are very uncertain. Even those who agree on all of them will differ in their attitudes toward the appropriate policy. That depends on other things, including ethics. In a lot of recent debates, people agreed about the science. They disagreed about the appropriate policies deriving from those facts. For instance: how much constraint are we willing to exercise, in order to facilitate the life of generations to come? Opinions differ hugely.
Görlach: But how then do you judge the developments we now see in many Western societies?
Rees: I think these developments are partly caused by new technologies that have led to new inequalities. Another point is: even if it hasn’t increased, people are now more aware of inequality. In sub-Saharan Africa, people see the kind of life that we live, and they wonder why they can’t live that kind of life. Twenty-five years ago, they were quite unaware of it. This understandably produces more discontent and embitterment. There is a segment of society, a less-educated one, that feels left behind and unappreciated. That is why I think a huge benefit to society will arise if we have enough redistribution to recreate dignified jobs.
Görlach: What political framework do you think of as an ideal environment for science?
Rees: In the Soviet Union, they had some of the best mathematicians and physicists, partly because the study of those subjects was fostered for military reasons. People in those areas also felt that they had more intellectual freedom, which is why a bigger fraction of the top intellectuals went into math and physics in Soviet Russia than probably anywhere else ever since. That shows you can have really outstanding scientists surviving in that sort of society.
Görlach: So the ethical implication is not paramount to having “good” science after all?
Rees: I think scientists have a special responsibility to be concerned about the implications of their work. Often an academic scientist can’t predict the implications of his work. The inventors of the laser, for instance, had no idea that this technology could be used for eye surgery and DVD discs but also for weaponry. Among the most impressive scientists I have known are the people who returned to academic pursuits after the end of World War II with relief but remained committed to doing what they could to control the powers they had helped to unleash.
In all cases, the scientists supported the making of the bomb in the context of the time. But they were also concerned about proliferation and arms control. It would have been wrong for them to not be concerned.
To make an analogy: if you have a teenage son, you may not be able to control what he does, but you sure are a poor parent if you don’t care about what he does. Likewise, if you are a scientist and you created your own ideas, they’re your offspring, as it were. Though you can’t necessarily control how they will be applied, you nonetheless should care, and you should do all you can to ensure that the ideas you have helped to create are used for the benefit of mankind and not in a damaging manner. This is something that should be instilled in all students. There should be ethics courses as part of all science courses in university.
Görlach: What, then, is your motivation as a scientist?
Rees: I feel I am very privileged to have consistently, over a career of nearly 40 years now, played a part in debates on topics that I think are writing the history of science in this period. As we make great, collective, scientific progress, we are able to confront new mysteries, which we couldn’t even have addressed in the past. Many of the questions that were being addressed when I was young have now been solved. Today’s pressing questions couldn’t even have been posed back then.
Of course the science I do is very remote from any application, but it’s of great fascination and a very wide audience is interested in these questions. It certainly adds to my satisfaction that I can actually convey some of these exciting ideas to a wider public. I would get less satisfaction if I could only talk about my work to a few fellow specialists, so I am glad that these ideas can become part of a broader culture.
Görlach: What is the best idea you ever had?
Rees: I don’t have any sort of singular idea, but I think I have played a role in some of the ideas that have gradually formed over the last 20 or 30 years about how our universe has evolved from a simple beginning to the complex cosmos we see around us that we are a part of. For me, the social part of science is very important ― many ideas emerge out of discussion and cooperation and, of course, out of experiments and observations.
The symbiosis between science and technology ― the old idea is that science eventually leads to an application ― is far too naïve! It goes two ways, because advancements made in academics are facilitated by technology. We only made advancements beyond Aristotle by having much more sensitive detectors and being able to explore space in many ways. If we didn’t have computers or ways of detecting radiation, etc., we would have made no progress because we are no wiser than Aristotle was.
Görlach: Lord Rees, thank you very much for your time.
The SAGE Handbook of Social Media Research Methods offers a step-by-step guide to overcoming the challenges inherent in research projects that deal with ‘big and broad data’, from the formulation of research questions through to the interpretation of findings. The handbook includes chapters on specific social media platforms such as Twitter, Sina Weibo and Instagram, as well as a series of critical chapters.
The holistic approach is organised into the following sections:
Conceptualising & Designing Social Media Research
Collection & Storage
Qualitative Approaches to Social Media Data
Quantitative Approaches to Social Media Data
Diverse Approaches to Social Media Data
Social Media Platforms
This handbook is the single most comprehensive resource for any scholar or graduate student embarking on a social media project.
The Google Books N-gram corpus contains an enormous volume of digitized data, which, to the best of our knowledge, sociologists have yet to fully utilize. In this paper, we mine this data to shed light on the discipline itself by conducting the first empirical study to map the disciplinary advancement of sociology from the mid-nineteenth century to 2008. We analyse the usage frequency of the most common terms in five major sociology categories: disciplinary advancement, scholars of sociology, theoretical dimensions, fields of sociology, and research methodologies. We also construct an overall index deriving from all sociology-related key words using the principal component method to demonstrate the overall influence of sociology as a discipline. Charting the historical evolution of the examined terms provides rich insights regarding the emergence and development of sociological norms, practices, and boundaries over the past two centuries. This novel application of massive content analysis using data of unprecedented size helps unpack the transformation of sociocultural dynamics over a long-term temporal scale.
The emergence of big data has opened many research opportunities and topics for the field of social science. As a lens on human culture (Aiden and Michel, 2013), big data offer enormous possibilities to detect historical trajectories, human interactions, social transformations and political practices with rich spatial and temporal dynamics. Forecasting the next five decades of social science research, King (2009: 91) has predicted a ‘historic change’ in which the profusion of gigantic databases and their investigation will promote ‘our knowledge of and practical solutions for problems of government and politics to grow at an enormous rate’.
One particularly promising new tool for massive content analysis is the Google N-gram corpus, a digitized books repository containing enormous volumes of digitized data. Michel et al. (2011) have described the construction of the first edition of the Google N-gram Corpus with approximately 5 million books and examined the usage frequency of words in order to quantitatively analyse human culture trends in ways unimaginable even a decade ago. Following this seminal study, the Google N-gram corpus has been used to explore the politics of disaster (Guggenheim, 2014), the language of contention (Tarrow, 2013), the transformation of economic life (Bentley et al., 2014; Roth, 2014), patterns of poverty and anti-poverty policy (Ravallion, 2011), linguistic and written language development (Twenge et al., 2012), and the psychology of culture (Greenfield, 2013; Zeng and Greenfield, 2015).
Notwithstanding this recent profusion of academic texts employing digitized texts, sociologists have yet to fully explore the possibilities offered by this new dataset. Whereas almost a decade ago the ‘coming crisis of empirical sociology’ related to sociologists’ failure to engage with the vast proliferation of social data (Savage and Burrows, 2007), sociologists need to think seriously about the challenges and opportunities posed by big data. As Burrows and Savage recently point out (2014: 2):
Sociologists generally used and refined rather familiar methods, talked mainly to each other about esoteric theoretical pre-occupations, and had not caught up with the fact that sociology was no longer an avant-garde discipline which had attracted legions of critical students and scholars in the 1960s and 1970s but had become fully part of the academic machine.
This absence is particularly striking given that the establishment, expansion, and influence of sociology is particularly reliant on words and phrases, rather than figures, functions, equations or other mathematical expressions, as compared to any natural science. Books serve as one of the most telling embodiments of a society’s knowledge over time, and the majority of sociology’s most canonical achievements have seen publication in book form. It seems only appropriate, then, to seize upon the opportunity provided by the Google N-gram corpus to identify and examine the long-term trends and themes that have characterized the field of sociology itself.
Sociology, as one of the core disciplines of the social sciences, is ‘like a caravansary on the Silk Road, filled with all sorts and types of people and beset by bandit gangs of positivists, feminists, interactionists, and Marxists, and even by some larger, far-off states like Economics and the Humanities, all of whom are bent on reducing the place to vassalage’ (Abbott, 2001: 6). Yet, notwithstanding this statement on the complexities of the disciplinary advancement of sociology, there is virtually no empirical sociological research that can attest to the development of different ‘sorts and types’ of sociological norms, practices and boundaries. In the current study, we conduct the first empirical analysis, to our knowledge, in the field of sociology to use the corpus of digitized books. We analyse the evolution of the usage of the most common words and phrases in terms of disciplinary advancement, sociology scholars, sociology theories, sociology fields and sociology research methodologies between the 1850s and 2008. We also employ the data extracted from the corpus to quantitatively test theories of the development of sociology. Our results show that the annual usage frequency count of a particular term based on a big-data strategy not only gives clues as to the historical emergence and progress of sociology – indicating, for example, the longevity or popularity of a particular sociology field or method – but also sheds light on the linkage between the development of sociology and broader sociocultural dynamics over centuries.
Data and method
Since 2004, Google has been engaged in digitizing books printed as early as 1473 and representing 478 languages from 40 top universities worldwide (Michel et al., 2011). The first edition of the Google corpus for analysis consists of about 5 million volumes of books between 1550 and 2008, excluding journals and serial publications (around 40 per cent of all scanned publications), which represent a different aspect of culture than do books. To avoid data duplication, the Google corpus team converted billions of book records from over 100 sources of metadata information provided by libraries, retailers, and publishers in order to generate a single non-redundant database of book editions (Michel et al., 2011, Supplementary Online Material).
Following exactly the same procedure described in Michel et al. (2011), the second edition of the Google corpus (2012) consists of about 8 million books, representing 6 per cent of all the books printed from the 1500s onward (Lin et al., 2012). Compared to the first edition, the 2012 Google corpus has a larger underlying book collection and higher quality digitization (Lin et al., 2012). The English corpus alone comprises 4.5 million volumes of books and around half a trillion words (Table 1).
The Google Books corpus provides information about how many times per year an ‘n-gram’ appears in all the books included in the corpus. A 1-gram is a string of characters uninterrupted by a space; it could be a single word, for example ‘sociology’, or a number such as ‘1.234’. An n-gram is a contiguous sequence of n 1-grams, such as the phrases ‘sociology theory’ (a 2-gram) and ‘field of sociology’ (a 3-gram). Punctuation and capitalization are preserved in the data set. By searching the Google corpus for a key word or phrase, one can obtain information about the annual occurrence of that key word or phrase for a given time period. Although the share of any individual word is, of necessity, small, the traces of such words, their rise and fall, can help index the most robust sociocultural trends over a long-term timeline.
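To make the n-gram definition concrete, the following is a minimal Python sketch of how annual counts of a phrase could be tallied from raw text. It is an illustration only: the function names and the `books_by_year` mapping are hypothetical, and splitting on whitespace is a simplifying assumption rather than the tokenization actually used to build the Google corpus.

```python
from collections import Counter


def ngrams(text, n):
    """Return the n-grams of `text`, treating whitespace-separated strings as 1-grams."""
    tokens = text.split()  # capitalization and punctuation are left untouched
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def yearly_counts(books_by_year, phrase):
    """Count occurrences of `phrase` per year in a {year: [text, ...]} mapping."""
    n = len(phrase.split())  # a 3-word phrase is looked up among 3-grams
    return {
        year: sum(Counter(ngrams(text, n))[phrase] for text in texts)
        for year, texts in books_by_year.items()
    }
```

For example, `ngrams("the field of sociology", 3)` yields the two 3-grams ‘the field of’ and ‘field of sociology’.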
In the present analysis, we focus on the English-language books corpus. We also analyse some specific terms in both American English and British English books to make a further comparison across different social contexts.1 In terms of time frame, we restrict our research to the period between the mid-1850s and 2008 (inclusive) for two reasons. First, the profession of sociology emerged as a scholarly discipline in the early part of the nineteenth century and only really started to flourish in the mid-1850s,2 with Karl Marx, Herbert Spencer, and other early-generation scholars publishing their works in the field of sociology (Boudon, 1989). Second, digitization of written texts is a cumulative process. Contemporary holdings of books published in the early 1800s are often incomplete and scant, meaning that information extracted from books before the 1850s could come from a biased sample. At the other end of the timeline, books published after 2008 are still being digitized and included in the Google Books corpus. Thus far, no data are available beyond the year 2008 (Lin et al., 2012).
This language and year restriction can substantially alleviate the potential problem of data accuracy because more than 98 per cent of words are correctly digitized for modern English books (Michel et al. 2011, Supplementary Online Material). Still, two concerns may be raised regarding the representativeness of the Google corpus analysed in the present paper.
First, the corpus was constructed using OCR (optical character recognition) technology. As Michel et al. (2011) mention, books with poor OCR quality (due to size, paper quality, or physical condition) were filtered out. This could lead to a potential sampling problem. Second, the corpus is most likely to be biased towards recent books, since more books are published in more recent years, leading to skewed results of word usage. Regarding the first issue, however, books filtered out due to poor OCR quality only accounted for around 4 per cent of all scanned volumes (Michel et al., 2011, Supplementary Online Material) – a relatively small fraction. As for the second concern, we normalized the total number of appearances of a key word using the frequency of ‘the’ in the same year rather than the total number of all words.3 Thus, we obtained the normalized annual frequency of the word usage of our search terms as:
$$R_{it} = \frac{C_{it}}{C_{t}}$$
where $R_{it}$ denotes the word usage of the key word i in year t, $C_{it}$ represents the total number of appearances of the word i in year t, and $C_{t}$ is the total number of times ‘the’ appeared in all books published in year t. Conceptually, a higher $R_{it}$ indicates a higher frequency of word usage and thus higher cultural and social influence for the time period in question.
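The normalization just described, dividing a key word's annual count by the annual count of ‘the’, can be sketched in a few lines of Python. The function name and the dictionary inputs are hypothetical, and skipping years with a zero denominator is an assumption made only to keep the sketch safe to run.

```python
def normalized_frequency(term_counts, the_counts):
    """Return the ratio of a term's annual count to the annual count of 'the'.

    term_counts: {year: occurrences of the key word in that year}
    the_counts:  {year: occurrences of 'the' in that year}
    """
    ratios = {}
    for year, term_count in term_counts.items():
        the_count = the_counts.get(year, 0)
        if the_count > 0:  # assumption: drop years with no usable denominator
            ratios[year] = term_count / the_count
    return ratios
```

Because ‘the’ is used at a fairly stable rate, the ratio tracks changes in a term's usage rather than changes in the number of books published each year.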
Drawing on various sociology textbooks, including A Dictionary of Sociology (Scott and Marshall, 2009) and Sociology (Giddens and Sutton, 2013), we conducted a panoramic search of the disciplinary advancement of sociology in five major categories: academic significance, masters of sociology, theoretical dimensions, fields of sociology, and analytical methodologies. ‘Academic significance’ refers to the historical position of sociology in human knowledge as a subject related and compared to other subjects; the key word for this is ‘sociology’ or ‘sociological’. For ‘masters of sociology’, sociologists’ full names serve as the search terms and the goal is to chart key figures’ rise to fame and their academic reputations. The key words for ‘theoretical dimensions’ are the names of relevant sociological theories and schools; ‘fields of sociology’ focuses on the sub-branches of sociology and popular research topics; and ‘analytical methodologies’ focuses mainly on the comparison of qualitative and quantitative research methodologies in sociology. Finally, we constructed an overall index deriving from all sociology-related key words using the principal component method to demonstrate the overall sociocultural influence of sociology in two centuries’ books.
Academic significance of sociology
We first counted the appearance of the key word ‘Sociology’ in the corpus since 1850. As a control group we also ran a similar search on the four subjects of ‘Philosophy’, ‘Economics’, ‘Anthropology’ and ‘Psychology’. It is worth noting that we did not run a test on ‘Political Science’ due to the fact that ‘Political’ or ‘Politics’ could be interpreted in numerous ways and thus would likely include non-academic related materials in the results.
The x-axis of Figure 1 shows the year, from 1850 to 2008, while the y-axis shows the word frequency statistics of the different subjects. From Figure 1, one can observe that the word ‘Philosophy’ accounts for approximately 0.007 per cent of the total word count. Compared to other subjects, phrases associated with ‘Philosophy’ appeared earlier and more frequently. However, around the turn of the nineteenth to the twentieth century, the curve for ‘Philosophy’ plunged drastically and did not rise again until the early twentieth century. This finding corresponds with the collapse of classic German philosophy, especially the Hegelian school of philosophy in history (Solomon, 1988). It is noteworthy that from 1890 to 1920, as the word frequency statistics curve for ‘Philosophy’ dropped, the respective curves for the other subjects rose.
In fact, the word frequency statistics for ‘Sociology’, ‘Economics’ and ‘Anthropology’ rose steadily between mid-late nineteenth century and the 1930s, especially in the case of ‘Economics’, which saw the most drastic uptick in frequency, developing a wide lead over ‘Sociology’, ‘Psychology’ and ‘Anthropology’.
Our analysis yields interesting insights regarding the impact of major world events. For example, during World War I (1914–1918), the statistics for ‘Sociology’, ‘Psychology’ and ‘Economics’ did not drop, but in World War II (1939–1945) the statistics dropped dramatically and only began to increase again with the end of the war. This seems to indicate that WWII had a much greater impact on these disciplines than did WWI. The effect of WWII was reversed, however, in the case of ‘Anthropology’, which saw no decline during WWII; indeed, if anything, it saw a slight rise in its statistics. We believe this can be linked to the expansion of conflict beyond Europe to include Asia, Africa and Oceania, thus increasing states’ demand for strategic knowledge about non-Western countries. On the one hand, a broader war secured government funding for anthropology, granted for the strategic purpose of studying nationalism, internationalism, racial supremacy and anti-totalitarianism; on the other hand, anthropologists themselves were able to shift their research horizon from traditional subjects such as African and Indian tribes to Eastern Europe and Southeast Asia (Price, 2002). Anthropologist Ruth Benedict’s 1946 study of Japan, The Chrysanthemum and the Sword, stands as arguably one of the best-known examples of such state-driven academic research.
The curves for ‘Sociology’, ‘Economics’, ‘Psychology’ and ‘Anthropology’ all peaked during the 1970s and 1980s, then began another round of slow descent in the 1990s. The descent for each subject might simply represent the dilution of knowledge in a constantly expanding corpus: with the total amount of knowledge possessed by human beings constantly on the rise, the percentage increase year to year for each subject or field might understandably be decreasing. However, for ‘sociology’, the decreasing word frequency does not necessarily mean the decline of the importance of sociology as a discipline. We will analyse this further in a later section.
We conducted searches for the full English names of 30 major Western sociologists in the Google N-gram corpus. Figure 2 illustrates the top 12 sociologists in word frequency statistics.4 They are (chronologically): Karl Marx, Herbert Spencer, Max Weber, Emile Durkheim, Georg Simmel, Herbert Marcuse, Talcott Parsons, Erving Goffman, Zygmunt Bauman, Jürgen Habermas, Pierre Bourdieu and Anthony Giddens. From Figure 2, we draw three major findings.
Dilution effect: From Karl Marx to Anthony Giddens, it seems that each new sociologist is destined never to surpass his predecessors’ academic significance. This phenomenon does not necessarily suggest that the influence of one sociologist cannot surpass that of his predecessor. For instance, the influence of Pierre Bourdieu after the 1980s exceeded that of his predecessors Georg Simmel and Emile Durkheim and reached 0.00005 per cent around 2003, next only to Karl Marx and Max Weber. However, if we categorize sociologists into different generational groups, we can see that the later generations peaked at 0.00008 per cent in the 1970s, represented by Talcott Parsons, and none of their successors has passed that point, let alone reached the statistics of earlier sociologists like Herbert Spencer and Karl Marx. Thus conceived, it is almost impossible for later-generation sociologists to surpass the fame of the earlier generation.
This phenomenon is due to the explosive growth in the total amount and categories of human knowledge. In other words, sociology constituted a bigger share of given knowledge during the nineteenth century, as that body of knowledge was still being amassed. When it comes to the twentieth and twenty-first centuries, in contrast, though sociology itself has continued to develop and more and more people have become professional sociologists, the discipline’s relative influence in human knowledge has decreased – not unlike the dilution of a substance mixed with ever larger quantities of water. To the extent that Talcott Parsons appears to be the last sociologist with the same level of influence as the generations that came before him, this may well have as much to do with the changing size of the ‘reservoir’ of all human knowledge as it does with Parsons’ work itself.
Exogenous effect: Compared to other sociologists, the word frequency curves with the highest average upward slope were those of Herbert Spencer and Karl Marx. In other words, Spencer and Marx enjoyed the most rapid ascent to positions of authority within the field in terms of influence. The speed of their rise, however, was supported by strong exogenous forces other than academic factors. Herbert Spencer was a generalist – a combination of philosopher, biologist, anthropologist, sociologist, political theorist, and a classic man of letters. He interacted with social elites throughout his life and was connected to many important ideologists and dignitaries. Spencer utilized his high-status social network to gain authority and audience as a generalist, enabling him to become extremely influential in the late nineteenth century, when the total amount of knowledge was still limited. Karl Marx, in comparison, did not enjoy such success in his lifetime; instead, his influence peaked between the 1920s and 1940s, and then again in the 1960s to the 1970s – precisely when Marxism and Communism were becoming influential beyond the academic world and actually changing the course of twentieth-century history.
Acceleration effect: Whereas most of the first generation of sociologists had to enjoy their fame posthumously, twentieth-century sociologists have become influential much earlier in their careers. With the exception of Herbert Spencer, all of the great names of sociology born in the nineteenth century became most renowned after their death. Karl Marx became most famous some 20 years after his death; Max Weber’s name began to rise immediately after his death in 1920; and, likewise, none of Emile Durkheim, Georg Simmel or Herbert Marcuse lived to see the years in which their numbers truly blossomed. In contrast, sociologists born in the twentieth century were much luckier. For instance, when Talcott Parsons began to gain fame in the 1940s, he was no more than 40 years old. Anthony Giddens became famous at the same age. Jürgen Habermas and Pierre Bourdieu became highly influential slightly later, but both began their ascent when they were in their fifties, around the 1980s–1990s, and Habermas is still alive today.
This acceleration effect can be ascribed to the development and standardization of sociology as a subject. In the late nineteenth century, as the discipline was still being established, there were fewer scholars and academic standards were, if not lower per se, at the very least less formalized, with greater room for flexibility. Sociology, too, was still in the process of legitimating its claim as a science. All these factors contributed to a longer ‘wait time’, so to speak, for a sociology scholar to reach notable fame. Today, both the discipline and the academic field in general are well established, enabling sociologists to make use of better disciplinary infrastructure and pre-existing channels to increase their influence.
The contribution of sociology towards human knowledge lies in a series of inspiring and explanatory concepts and theories. As such, we conducted key word searches for classic theories of sociology in order to explore their relative impact. Because most nineteenth-century sociological works are more general in nature – concerned as they were with establishing the basic parameters and goals of the discipline – we focused on the most famous, more specific sociological theories of the twentieth century. As Figure 3 illustrates, we concentrated on the ten most famous sociological theories: Conflict Theory, Social Exchange Theory, Structural Functionalism, Structuration Theory, Symbolic Interactionism, Rational Choice Theory, Ethnomethodology, Neo Functionalism, Strength of Weak Ties, and Structural Holes.
Lifetime trajectory of a theory: We noticed that each theory, from its birth to maturity, from its peak popularity to its point of diminishing returns, has its own life trajectory. In the mid-late twentieth century, the majority of the theories reached a peak in their growth rate and usage about 30–40 years after their introduction. After that point, their influence begins to diminish. Interestingly, even though the sample of theories is relatively small, this life-cycle pattern fits that found for words more generally by researchers in linguistics. For example, Petersen et al. (2012) have identified universal growth-rate fluctuations in the birth and death rates of words: new words reach a pronounced peak about 30–50 years after they originate, after which point they either enter the long-term lexicon or fall into disuse.
The metabolism of a theory: We also noticed that the influence of earlier theories was superseded by that of newer theories. For instance, the growth rate of Structural Functionalism began to decrease in the mid-1990s while the usage of Structural Holes, a theory 20 years younger, superseded the former. Ethnomethodology and Symbolic Interactionism also appear to be on their way out. Meanwhile, Rational Choice Theory is still increasing in frequency, but now at a slower rate. Furthermore, when we grouped Strength of Weak Ties and Structural Holes together, we found that their total influence had already surpassed that of Structuration Theory and Social Exchange Theory around 2008. In other words, the cultural influence and academic significance of newly developed social capital and social network approaches has already gone beyond that of ‘classical’ sociological theories. Whether they will continue this growth, however, remains to be seen.
Explanatory scale of a theory: Generally speaking, a grand theory possesses stronger generalization ability and a larger scale of utilization. Yet, we found that since at least the mid-twentieth century, the theoretical world is no longer dominated by grand theories. For instance, Anthony Giddens’ Structuration Theory and Talcott Parsons’ Structural Functionalism have fallen significantly below Ethnomethodology, Symbolic Interactionism and Rational Choice Theory, all of which focus on micro-level interactions in society rather than large-scale macro functions of societal structures and institutions. Moreover, as time progresses, there seems to be less and less room reserved for grand theories: theories that thrived after the 1970s, such as Strength of Weak Ties and Structural Holes, all adopt micro or meso perspectives in order to understand human behaviour. While the relative pros and cons of ‘micro’ versus ‘macro’ theories are still the subject of much debate today, we speculate that the ambitious nature of grand theories may have, over time, become a disadvantage, actually limiting their appeal for contemporary theorists. Indeed, it may well be as many postmodern theorists have already declared, that sociology has entered a ‘post grand theories’ era.
Fields of sociology
Sociology is subdivided into many specialized fields and these fields are constantly changing over time. For this analysis, we looked at the shifting pattern of these fields in sociology in order to capture the larger discipline’s related social change. We conducted a key word search for eight of the most prominent fields, namely: Educational Sociology (Sociology of Education), Rural Sociology, Urban Sociology, Political Sociology, Economic Sociology, Sociology of Law, Sociology of Religion and Historical Sociology.
A few interesting findings can be observed in Figure 4. First, Educational Sociology emerged early as the most prominent field, but was replaced by Sociology of Education in the late 1960s. The shift was not merely semantic. Educational Sociology focused primarily on the social and cultural factors affecting relatively smaller social groups, thus neglecting larger societal influences on education in the post-industrial period. The Sociology of Education, on the other hand, turns its interest to the social function of education and thus investigates the role of education as a social institution (Shimbori, 1972). Second, after the 1990s, both the Sociology of Religion and Historical Sociology progressed at a relatively aggressive pace, particularly when compared with all the other fields, which showed signs of decline. Third, Rural Sociology emerged as a sub-field of the discipline in the early twentieth century and exhibited a very high growth rate from the 1950s to the 1980s. This reflects the fact that Rural Sociology is the earliest and most prominent sub-discipline of American sociology, an outgrowth of the response to the pronounced differentials in rural and urban social organization of the late nineteenth century, with its development peaking around the 1950s to 1960s (Brunner, 1957; Nelson, 1969).
In addition to the various fields within sociology, we were also interested to see shifts in terms of substantive research topics, which subjects were deemed ‘hot’ and when. In Figure 5 we compare eight representative terminologies within the social stratification and mobility, and social capital and network areas: Social Identity, Social Movement, Social Mobility, Social Stratification, Social Capital, Social Network, Social Class and Social Strata.
From Figure 5, we can observe that the growth-rate fluctuation of Social Mobility and Social Stratification peaked around 1975 and then started to decline. The popularity of Social Network rose rapidly from the late 1980s and surpassed Social Mobility around 1997. As Freeman (2004) argues, with the development of desktop computers and computer programs to manage network data, social network research finally took off from the mid-1980s onwards, shifting from ‘network as metaphor’ to ‘network as a mathematical expression’. Around the same time, research on Social Capital exceeded Social Mobility and finally surpassed Social Class around 2003. In other words, research on each of Social Capital and Social Networks is currently on the rise, while research on each of Social Mobility and Social Stratification is declining. Meanwhile, research on Social Movements started proliferating around the mid-1960s when waves of new movements organized around race and gender emerged in both America and Western Europe (Kriesi et al., 1995; Lovenduski, 1986).
Research methodologies of sociology
Which methods are used most by sociologists – quantitative or qualitative methods? To answer this question, we focus on shifts in the relative balance between the two major research methodologies in sociology over the past century.
We first calculated the average score of annual frequencies of each method in both quantitative and qualitative approaches from 1950 to 1980. Then we normalized the two groups of average scores into Z values and used ZQN – ZQL to obtain an index of quantitative analysis for each year. Figure 6 shows a plot of this index.
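The index construction can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and input layout are hypothetical, each group is averaged across its methods year by year, and the population standard deviation is used for the Z transformation, a choice the text does not specify.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize a series to mean 0 and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(value - mu) / sigma for value in values]


def quantitative_index(quant_by_method, qual_by_method):
    """Return ZQN - ZQL per year.

    Each argument maps a method name to its list of annual frequencies;
    all lists are assumed to cover the same years in the same order.
    """
    quant_avg = [mean(year_values) for year_values in zip(*quant_by_method.values())]
    qual_avg = [mean(year_values) for year_values in zip(*qual_by_method.values())]
    return [zqn - zql for zqn, zql in zip(zscores(quant_avg), zscores(qual_avg))]
```

A positive value in a given year means quantitative terms were relatively more prominent than qualitative terms in that year, and a negative value means the reverse.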
From Figure 6, we can see that both methods took turns ‘in the lead’ across different time periods. From 1950 to 1980, qualitative methods were more prominent, while the usage of quantitative methods surpassed that of qualitative approaches in the 1980s and 1990s, except for a short period around 1995–1997. After 2000, quantitative methods dominate in the majority of scholarship. It is noteworthy that scholars who utilize qualitative methods are also more likely to publish their research in book format, in contrast to quantitative researchers who are more likely to publish in journals and other formats; therefore, if anything, it is likely that our calculation underestimates the ‘lead’ of quantitative over qualitative methods.
An overall index: influence of sociology
In this section we use the word usage of relevant sociology-related key words in the above categories (except for methodology) to generate an overall measure for the sociocultural influence of sociology in millions of books. We carry out a Principal Components Analysis (PCA) to extract as much information as possible from the corpus while preserving degrees of freedom. We prefer the PCA method to applying the average score of normalized annual frequencies because PCA can ‘concentrate’ much of the sociological signals into the first few factors by ‘screening’ the later factors that are dominated by noise. This is important given that we generate the list of sociology-related words without establishing any theory about how closely the selected signals capture the meaning of ‘sociology’. The factor-predicted score S is calculated by:
where m denotes the number of factors with eigenvalues larger than 1, and is the cumulative proportion of explained variances larger than 90 per cent.
We report the factor loadings, variances, as well as correlation of signals in Table 2. The KMO measure of sampling adequacy, and the SMC between each signal and all other signals strongly suggest that these signals pick up sociology-related dynamics in the corpus. The first three principal components account for around 91 per cent of the variance. Using the first three factors and their respective proportion of variance, we can predict the index for influence of sociology.
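As an illustration of how retained factors and their proportions of variance might be combined into a single index, the sketch below weights each factor's annual score by its share of explained variance. This is an assumption made for illustration, not the paper's formula: the function name is hypothetical, the factor scores are taken as already computed, and renormalizing the shares so they sum to one is a choice of this sketch.

```python
def composite_score(factor_scores, variance_shares):
    """Combine factor scores into one index per year.

    factor_scores:   one list of annual scores per retained factor
    variance_shares: proportion of variance explained by each retained factor
    """
    total = sum(variance_shares)
    # Assumption: rescale the shares of the retained factors to sum to 1.
    weights = [share / total for share in variance_shares]
    return [
        sum(weight * score for weight, score in zip(weights, year_scores))
        for year_scores in zip(*factor_scores)
    ]
```

With three retained factors, `factor_scores` would hold three equally long lists, one value per year, and `variance_shares` the three corresponding proportions of variance.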
Table 2. Factor loadings on and correlations of sociology signals (a)
[Table body not recovered. Surviving row labels: Social Exchange Theory, Rational Choice Theory, Strength of Weak Ties.]
Notes: KMO reports the Kaiser-Meyer-Olkin measure of sampling adequacy, and SMC reports the squared multiple correlations between each signal and all other signals.
Factors with an eigenvalue less than 1 are not presented.
(a) In a robustness check we added more sociology-related words to the list (e.g., middle class, working class, social status, etc.) and obtained almost identical PCA results.
In Figure 7, we further present the time series of the z-score equivalents of the overall index for sociology, as well as the time series of the word usage of ‘sociology/sociological’. As the figure shows, the influence of sociology as a discipline took off in the 1970s. Although the word usage of ‘sociology/sociological’ began to decline in the 1980s, the overall usage of sociological terms, including sociological theories and topics, began to skyrocket in all other respects. This reflects the extent to which sociology has come to penetrate and influence other domains and disciplines. For example, theories of weak ties and structural holes have been widely applied in the study of business management, while social capital has become a popular topic in research on economic development, political participation and public health. Further, we believe that the impact of sociology will continue to expand in the foreseeable future.
A research case beyond description
With the help of the Google corpus, we are able to conduct more substantial research into the development of sociology beyond simply describing the rise and fall of the usage of sociology-related words. We use the case of the early development of sociology in the USA as an example to illustrate how the data extracted from the Google corpus can be used to conduct a quantitative study.
Upon the creation of American sociology as a professional discipline circa the 1890s (Cortese, 1995; Young, 2009), the tenets of the social gospel movement made sociology an acceptable course of study in many American denominational colleges. This has led to considerable debate among students of history of sociology regarding the nature of the connection between sociology and social gospelism (Henking, 1993; Morgan, 1969; Williams and MacLean, 2012). As Morgan (1969: 42) has indicated, ‘the Social Gospel and early sociology were often indistinguishable in terms of both ideas and leading personnel. This close parallelism is seen as a major factor in the early acceptance of sociology as an academic discipline in the nineteenth century universities.’ Research on this question, however, has only looked at individual case studies and thus lacks the support of hard data.
Digitized written texts provide a statistical solution to this dilemma. We searched using the key words ‘Sociology’, ‘Social Gospel’ and ‘Hull House’,5 with ‘Anthropology’ as a control group, and compared the results from the American English corpus and the British English corpus. As demonstrated in Figure 8, ‘Social Gospel’ and ‘Sociology’ both show signs of growth from 1890 to 1930 in America, with their respective growth rates close to each other; meanwhile, ‘Anthropology’ shows no visible signs of growth. By contrast, the correlation between the growth of ‘Sociology’ and ‘Social Gospel’ was far less obvious in England.
The above findings based on visual inspection of the data provide only preliminary evidence of the effects of the social gospel movement on the development of sociology in America. We thus proceed to use the time series (1890–1930) of ‘Sociology’, ‘Social Gospel’, ‘Hull House’ and ‘Anthropology’ to perform a Granger causality test to formally test the proposed connection between sociology and social gospelism. In the language of time series analysis, X is the Granger-cause of Y in the sense that Y can be better predicted using the histories of both X and Y than it can be predicted using the history of Y alone.
Using time series that display the persistence of a unit root process in a standard ordinary least squares regression can lead to spurious correlations. Therefore, we first performed stationarity tests for all four time series using the Dickey–Fuller Generalized Least Squares (DF-GLS) method and the Phillips–Perron (P-P) method. We found that all of them are integrated of order one. We therefore used their first differences to fit a vector autoregressive (VAR) model to examine the relationships among them. The results from the American English corpus in Table 3 clearly show that ‘Social Gospel’ is the Granger-cause of ‘Sociology’ at a 0.05 alpha level, and ‘Hull House’ is also the Granger-cause of ‘Sociology’ at a 0.09 alpha level. In addition, the identified time lag suggests that the social gospel movement within the past 4 years can effectively affect the development of sociology at any given time. However, neither of the two terms is the Granger-cause of ‘Anthropology’ even at a 0.1 alpha level. Furthermore, results from the British English corpus demonstrate that there is no Granger-relationship among the time series of ‘Social Gospel’, ‘Hull House’, ‘Sociology’, and ‘Anthropology’ at all. In general, our findings based on time series analyses lend support to the argument that there was a close relationship between the early development of sociology and the social gospel movement in the USA.
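The logic of the test can be illustrated with a small pure-Python sketch: take first differences, then compare a regression of Y on its own lags with one that also includes lags of X. This is a simplified two-variable illustration under stated assumptions, not the four-variable VAR estimated in the paper; all function names are hypothetical, the synthetic data are invented for demonstration, and the unit root tests, lag selection and p-values are omitted.

```python
import random


def first_difference(series):
    """Return the first differences of a series."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def rss(rows, y):
    """Residual sum of squares from an OLS fit of y on the design rows."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))


def granger_f(x, y, lags):
    """F statistic for the null that lags of x do not help predict y."""
    times = range(lags, len(y))
    target = [y[t] for t in times]
    own = [[1.0] + [y[t - k] for k in range(1, lags + 1)] for t in times]
    full = [row + [x[t - k] for k in range(1, lags + 1)] for row, t in zip(own, times)]
    rss_restricted, rss_unrestricted = rss(own, target), rss(full, target)
    dof = len(target) - (2 * lags + 1)
    return ((rss_restricted - rss_unrestricted) / lags) / (rss_unrestricted / dof)


def synthetic_pair(n=200, seed=0):
    """Toy data in which x leads y by one step (for illustration only)."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [0.0] + [0.8 * x[t - 1] + 0.1 * rng.gauss(0, 1) for t in range(1, n)]
    return x, y
```

The resulting statistic would be compared with an F distribution with `lags` and `dof` degrees of freedom. For series that are integrated of order one, `first_difference` would be applied to each series before calling `granger_f`.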
Table 3. Granger causality tests for the potential connections between sociology and the social gospel movement using two different corpora
Notes: The lag length was chosen according to the Schwarz Bayesian information criterion (SBIC), the Hannan and Quinn information criterion (HQIC), and the Akaike information criterion (AIC).
Sg does not Granger cause Soci
Hull does not Granger cause Soci
Anthr does not Granger cause Soci
Sg does not Granger cause Anthr
Hull does not Granger cause Anthr
Soci does not Granger cause Anthr
This paper is the first of its kind to use the Google Books N-gram corpus, perhaps the largest electronic corpus yet constructed, to map out the disciplinary advancement of sociology in terms of the discipline in general and its major scholars, theories and research fields from the mid-nineteenth century to 2008. The intention of this research is in no way to suggest an evaluative ranking of the theories, scholars, schools, or methodologies that make up sociology. Instead, our goal has been to respond to Back and Puwar’s (2012) call for a ‘live sociology’ to deal with ‘lively data’, or the challenge posed by big data, the knowledge economy and the digitization of everyday life. As such, the aim and, it is hoped, the contribution of this study has been to show that massive content analysis from digitized books can provide rich insights regarding the historical evolution of professional disciplines and long-term sociocultural changes at a macro level.
Conceptually, examining the high-frequency use of a specific term in a representative sample of written texts is particularly important because it helps ‘identify the dynamics of historical emergence, decline, and comparative significance of a political concept’ (Hassanpour, 2013: 299). This gives corpus methodology significant advantages over traditional survey methods in which the sheer quantity of data and the availability of data are limited (Beer and Burrows, 2013; Lin et al., 2012). The use of newspaper data from one or more localities also tends to produce validity and reliability problems and there is no standard solution to correct for potential description and selection bias (Earl et al., 2004; Oliver and Myers, 1999). So far, however, the use of corpus data analysis has barely started among sociologists. With the exploding scale of digitization, more and more materials will be included in the historical corpus in the years to come. This will fundamentally change our scope of research and open avenues for sociologists to employ new and creative approaches to social research.
Of course, there is still room for improvement in the present research. First, the full dataset analysed here only accounts for around 6 per cent of all books ever published from 1500 onwards. This means that it may be biased relative to the ensemble of all surviving books. The scanned and digitized books in particular were mainly borrowed from university or public libraries, retailers and publishers, and thus the composition of the corpus reflects the acquisition practices of the participating institutions. Although the assembled collections of books from various participating institutions could still be argued to be representative, the results here are tentative and should be treated with some caution.
Second, there are a great many searchable sociological terms, and we cover only a small proportion of the sociologists, theories and research fields. Therefore, the phenomena and the patterns observed might not represent the most universal versions. For example, we have only addressed some classic, traditional and established research fields of sociology such as social class, social movements or social capital; other important new fields such as globalization, migration, gerontology, gender, and race or ethnicity are increasingly popular among contemporary sociologists but may be under-represented here. The goal of this study has been to use novel data and visualization methods to shed light on the history of sociology itself, not by any means to summarize over a hundred years of sociological research.
Third, the advanced search function of the database was still limited and, therefore, the accuracy of the search results was far from perfect. For instance, different names may be attached to the same sociological terminology, and words are sometimes used in ways that do not convey the same single sociological concept as the one intended in the analysis. Even though we have used Google search engines as a control group and chosen the version with the highest level of representation, the accuracy of the results may still be lacking.
Despite its drawbacks, our research strategy is sufficient to show that written literary data in human history can help reinvigorate a sociological imagination able to extrapolate the historical trajectory of a sociological practice. Michel et al. (2011) proposed the concept of ‘culturomics’ to refer to the use of high-throughput digitized resources to study sociocultural trends and the human cultural genome. Similarly, we propose opening up a new field – ‘socialomics’ – to study the current state of a dynamic, fluid social world with massive digitized data collection and analysis. The value of establishing such an energetic and forward-thinking approach lies in the fact that the amount of human knowledge accessible to sociologists via physical reading is, in fact, very limited. This glass ceiling of academic research could result in a form of myopia, blinding us to the development of social science within and across media and forums not limited to the book format. With ‘genetic’ analysis of word frequency usage in a digitized era, we are likely to achieve theoretical inspirations and academic knowledge that the early generation of sociologists could not even have imagined.
1. We also examined whether the pattern we find in the main analysis can be applied to the narrative-of-event corpora of newspapers. We searched the same key words in the field of sociology in the corpus of the New York Times and the results show similar general trends. Results of the relevant tests are available from the authors upon request.
2. Although sociology’s exact timeline as a field/profession/discipline remains contested, this general time period works for the purposes of the current paper.
3. Here we follow Bentley et al. (2014) and Acerbi et al. (2013), both of which use this strategy. According to Acerbi et al. (2013), the word ‘the’ stably accounts for around 6 per cent of all words per year, and is thus a good representative of real writing and real sentences.
4. The curves for the other 18 sociologists were all beneath the statistics curve of Jürgen Habermas. They are: Herbert Blumer, Charles Cooley, Alfred Schutz, George Mead, Harold Garfinkel, Max Horkheimer, Niklas Luhmann, György Lukács, C. Wright Mills, Robert Merton, Ralf Dahrendorf, Gerhard Lenski, Peter Blau, Randall Collins, Jeffrey Alexander, James Coleman, Immanuel Wallerstein and Norbert Elias.
5. Hull House was the most famous ‘good-neighbor’ centre in the social gospel movement. Its founder, Jane Addams, later won a Nobel Peace Prize.
Please quote the article DOI when citing SR content, including monographs. Article DOIs and “How to Cite” information can be found alongside the online version of each article within Wiley Online Library. All articles published within the SR (including monograph content) are included within the ISI Journal Citation Reports® Social Science Citation Index.
Reorienting sociology is a phrase that builds on the endless ‘turns’ that prefigured the current phase of defining the discipline. It is a navigational frame and needs to be treated with caution, lest it be applied with unreflexive haste. The recent explosion of interest in the emerging contours of digital society and new forms of data has been cited as a signal moment for sociology and the social sciences more generally.
Some commentators have argued that a step change in sociology (and aligned social sciences) is warranted in the face of big data and the opportunities that it offers for the generation of significant insights and a move towards ‘Big Science’ – where discrete analytic practices and work are broken down into networked, distributed and digitally organized collaborative tasks amongst groups and interdisciplinary ‘data labs’.
This form of knowledge organisation is also predicated on the emergence of new occupational groups (e.g. data scientists) and the harvesting of crowd intelligence through large scale annotation exercises that make use of distributed digital labour through the exploitation of crowdsourcing data coding platforms, often without demographic selection or sampling, in order to generate classifiers for machine learning and algorithm development.
The theoretical, methodological and empirical opportunities for conducting social science and understanding the emerging contours of digital society represent exciting topics of inquiry and offer new ways of conducting research. Examples include the possibility of using new forms of data and organizing research in different ways in digitally networked times, and the claim that rapid socio-technical change is acting as a catalyst for social theorization of advanced technologies-in-action as they move out of the research and development lab and into people’s hands (literally) in an accelerated fashion.
Perhaps reorientation to new problems and phenomena is best understood as a constant feature of sociological inquiry; social change is, after all, a key question and concern for the disciplinary enterprise. The perception that we are living through intense social, technical and environmental change is both widely shared and empirically documented. The social is being reassembled but key questions and concerns remain the same. How do things associate and organize? Why do these assemblages change over time? How do agents relate and integrate within a given social form? New forms of association between both human and non-human ‘societal’ integrators, mediators and disruptors remain key concerns.
To this extent, we might step back from claims surrounding a ‘digital turn’ in sociology. The turns will keep coming, thick and fast, due to the inexorable pace of social change, social (re-)organisation and our (and possibly other actors’) position within the warp and weft of social relations. One course of action would therefore avoid privileging the ‘digital’ and instead consider it as an opportunity to track, trace and interpret social life as a process and ‘structuring’ matrix of relations. In this short piece I will identify three interrelated, well-established analytic frames that have been operationalized in order to examine the digital as such an opportunity and in ways that might tell us something about social life in general as well as in particular. These are the study of breaches in social and normative order, the mapping of scientific and social controversies and, finally, transformation and world making.
Digital Technology: Disruption as Breaching
Ethnomethodology and the early work of Garfinkel noted how ‘breaching experiments’ revealed the import of background expectancies and the normative foundations of situated interaction order. For example, behaving as a lodger in the family home would generate humour and, eventually, upset; the role expectancies of being a parent, partner or family member being disrupted by requests to use the bathroom or offers of weekly rental for sleeping space. At a more granular level, and within the context of Goffman’s studies of face work and Sacks’ fine-grained documentation of repair and alignment within talk, the study of breaches and repair has been central to the empirical enterprise of understanding how social (interaction) order is accomplished, moment by moment, in real time. Breaches in social and moral order provide a window through which the analyst can peer in and document how it is that social actors themselves keep the social on the move.
The arrival of mobile telephony, social media and other body-proxemic, digitally networked communicative devices presents social actors with a number of problems and opportunities. Being networked on the move has to be managed with the co-presence of others; being hailed from afar or augmenting co-presence with networked information requires modification, management and routine repair of interaction order. Digital devices can disrupt shared expectations within routine interactional flows, but they are also realigned and incorporated at the same time by people in ways which render visible the character of these new mobile socio-technical affordances and the organizational and mundane features of everyday social life. In other words, the study of how new technologies breach established social relations tells us much about the socio-technicality of both.
Digital Technology: Disruption and Controversy
The arrival of data-generating digital technologies, such as social media, provides new avenues for exploring controversies at scale and in near real time. Science and Technology Studies have routinely explored scientific controversies as a naturally occurring opportunity through which to observe the social organizational characteristics of science as practice. In a way that is distinct from, but similar to, normative breaches of interaction order, controversies bring into view a range of occluded practices that do not always form part of the official account of how science might be understood to actually work and operate. Social media as data can inform the development of new ways of mapping controversies within networks and in ways that can document how different tropes, narratives, claims and groups are mobilized in relation to things like climate change, often in ways that aid powerful forms of visualization in relation to actor networks.
However, ‘disruptive’ digital technologies (inclusive of social media) can also be understood to be controversial objects in their own right, due to the way in which they are designed and deployed in order to disrupt markets, cultures and social relations. The ‘digitally disruptive’ are inherently controversial due to the way in which these technologies are designed to cross boundaries and established turf with ease, at relatively low cost and in ways that can potentially generate significant value above and around traditional circuits of capital, governance and regulation. As a consequence, the examination of the digital as controversial renders visible established (and changing) forms of social organization and ordering in moments of repair, realignment and reaction.
Digital Technology: Disruption and Transformation
In addition to looking at the ways in which digital technology can be followed as a socio-technical associational frame that generates analytic insight through the routine breaching of the interaction order and the generation of controversy amongst groups, networks and structures, it might also be understood to be transformative. For example, treating social media as data can augment traditional social research methods; the use of networked mobile telephony can enhance the ways in which we interpret people, places and spaces whilst on the move; and algorithms can generate unintended consequences in the ways they automatically categorise social media communications at scale in terms of race and class in relation to real or perceived social problems, and so on.
A central issue here is the way in which digital technologies disrupt social categories and categorization practices by augmenting and re-orienting shared understandings and what it is that we take for granted. In this sense they have the potential to make new ‘social worlds’ by constituting transformative social relations and connections. Again, these are avenues for further study in the first instance but include inquiry into the way in which, for example, we communicate and interact via Web 2.0, the transformative effects of mobile telephony, the arrival of networked, distributed, digital ‘butlers’ and ‘personal assistants’ with voice recognition capabilities and so on.
To conclude, perhaps it is the reorientation and reassembly of the social that is important here – the topic of inquiry, if you like. How social explanations and associations are being built around new technologies and other mundane and exotic objects is not new, but is perhaps now more clearly discernible in the face of commercially driven technological disruption and change – a rolling assemblage of breaching experiments and a generator of multiple socio-technical controversies – that allows us to render visible the social-in-action in all its mundane and sometimes exceptional ontological and organizational force.
William Housley is a sociologist based at the Cardiff University School of Social Sciences who works across a number of research areas that include language and interaction, social media, the social aspects of disruptive technologies and the emerging contours of digital society, economy and culture. Professor Housley was a co-founder of COSMOS and is currently working on a number of ESRC-funded projects that relate to digital society and research; he co-convenes the Digital Sociology Research Group at Cardiff University, is co-editor of Qualitative Research (SAGE) and serves on the editorial board of Big Data and Society (SAGE).
Social scientists have, overall, been slower to tap into the ever-increasing flow of “big data” than their peers in the physical and medical sciences. That lethargy is a tad ironic given that so much of the big data available, whether it be government administrative data or social media feeds like Twitter, doesn’t have to be imagined and created, but exists essentially ripe for the picking.
As Martha Sedgwick, SAGE’s head of project innovation, notes, “In the past two years over 90 percent of the world’s data has been created. The digital trails produced by us all as we go about our daily life (via smartphones, transportation, payment interactions) contain huge potential for social research. These vast data sets offer new ways to understand our world and look to solve societal problems and it looks like we are at the cusp of a major turning point in the social sciences as researchers work with these data to answer new research questions.”
Perhaps the earliest questions to address, however, are more meta: How will social science research and teaching evolve to meet the challenges and opportunities big data creates? How can we bring down barriers to make this new computational social science accessible for all social researchers? That was the subject of a panel discussion SAGE Publishing held in conjunction with the Campaign for Social Sciences as part of the recent ESRC Festival of Social Sciences 2016. The November 9 panel, titled Big Data, Social Media Research and Innovations in Research Methods, was chaired by Sedgwick and featured guests Sharon Witherspoon, head of policy, Academy of Social Sciences and Campaign for Social Science; Luke Sloan, senior lecturer at Cardiff University and deputy director of Cardiff Q-Step; and Mark Kennedy, director of the KPMG Centre for Business Analytics.
The full video of the panel appears below, followed by summaries of their remarks provided by each of the panel’s participants.
This promise of big data is not without its challenges, and the social sciences have been slower than other fields, like biology, astronomy and physics, in working with big data. Social researchers face a number of hurdles as they look to develop the capacity to collect and analyse these vast and varied datasets, potentially produced in real time. New tools are needed to collect and process these data, including volumes of unstructured text requiring new ways to bring together qualitative and quantitative research skills. New statistical and programming skills are needed, and are emerging both within the social sciences and through new interdisciplinary collaborations (universities like Cardiff and Imperial are fostering these collaborations through new interdisciplinary research labs bringing together academics from across the social sciences with computer scientists). Secondary data available through social media channels like Twitter raise questions of representation and bias as well as questions of privacy and informed consent that require us to develop new ethical frameworks.
‘Big data’ present exciting opportunities for social scientists to further our understanding of the world – and this understanding is a means to making it better. In the case of administrative data – data collected by government in the course of, for instance, administering benefits or tracking exam results – it can illuminate causes and linkages that are otherwise invisible. But to realize this vision the social science community needs to increase its numbers and data skills, to negotiate access in the face of government reluctance, and most important, to ensure that we have strong and thoughtful safeguards and ethical principles in place. This means we need to take seriously the need for ‘social consent’ and full transparency, as exemplified in the Administrative Data Research Network arrangements. If we take this seriously, it can allow us to examine big and meaty questions: about how social relations between people and environments affect individuals, or about how culture – patterns of meaning – and social institutions interact.
Twitter presents us with a rich vein of data on opinions, attitudes and behaviors that allows us to ask new questions about the social world, but it is not without its problems. All methods and approaches have their drawbacks, and for the traditional tools at the disposal of a social researcher, such as surveys, focus groups and interviews, these are well explored, documented and understood. Yet Twitter is new and we need to begin by asking the basic questions around representativeness, research design and what can (and cannot) be measured. In this sense, there is a body of work to be done around establishing what Twitter data is, what it means, how people use the platform and understanding the relationship between the individual and their online identity. Through providing case studies of how real-world events manifest online, through taking what we already know about networks and applying it to retweets and mentions, and through maintaining an open and honest dialogue about what works and what doesn’t, we can start to make the strange familiar.
With smartphones, social media, and new markets for buying and selling the digital traces of our social and economic lifeways, we social scientists are at the threshold of a period of dramatic change, change that comes with both opportunity and challenge. With new technologies for monitoring and measuring all things social, we are only starting to assemble new high-res datasets of patterned human interaction, and these datasets hold the promise of explaining rare but significant sentinel events such as those we have seen in the news this year. Quite simply, we talk about these events using words like ‘jolt’, ‘upset’, ‘eruption’, ‘shock’ and ‘crash’ because we lack theories to anticipate them. In the coming years, innovators will gradually start to better explain these events, identify their antecedents, and even predict them. Both individually and collectively, the question for social scientists is, will we be among these innovators?