
WORDS AS DATA: CONTENT ANALYSIS IN LEGISLATIVE STUDIES


JONATHAN B. SLAPIN AND SVEN-OLIVER PROKSCH

6.1 INTRODUCTION

 

EVERY year, democratic legislatures produce reams upon reams of documents. The stack of paper includes draft bills, amendments to bills, committee reports, transcripts of floor debates, parliamentary questions, reports of special investigations, adopted legislation, and press releases. Until recently most of this material went unexplored, largely due to the daunting task of compiling it into a useable format and the lack of appropriate content analytic techniques. Sifting through the entire legislative record without the appropriate tools would be a fool's errand. Instead, legislative studies have been largely confined to the analysis of roll-call votes-a small, but relatively easily defined and collected portion of the legislative record. As parliamentary records have gone electronic, with parliaments storing documents in easily searchable on-line databases, their content has become more accessible to researchers and available for large-scale data analysis. Advances in computing power, machine learning techniques, and statistical methods have allowed researchers to develop tools to extract systematic meaning from the large corpus of text that legislatures produce.

 

Taken together, these advances have enabled political scientists to take advantage of this treasure trove of data to study representation, party politics, policy-making, and legislative behaviour. Over the last decade, advances in content analysis techniques have led to an improved understanding of party politics and legislative behaviour, as researchers have been able to explore previously unanswerable questions. We begin by describing content analysis, in particular recent approaches that treat "words as data," describe some important applications to comparative legislative studies, and then move on to discuss the variety of challenges scholars face moving forward. Lastly, we summarize the continuing trends in the field.

 

6.2 WORDS AS DATA: CLASSICAL CONTENT ANALYSIS

 

Kimberly Neuendorf describes content analysis as "the systematic, objective, quantitative analysis of message characteristics" (Neuendorf 2002, 1). The goal of content analysis is to extract meaningful content from an entire corpus of text in a systematic way. To do so effectively, scholars must analyse texts relevant to the research questions they wish to answer, consider the actors and processes that produced these texts, consider the techniques to analyse the content of texts, and consider what can be learned from the analysis.

 

Content analysis has a long history in the social science and communications literature. It was originally used to explore trends of foreign policy coverage in newspapers, and during the Second World War, it was used as an intelligence tool to gain information about changes in German policy, especially toward the Soviet Union. Following the Second World War, political scientists, sociologists, and communications scholars took up content analysis to study electoral campaigns and media, among other topics (for a detailed history, see Krippendorff 2004). Today, legislative scholars in political science, in particular, use content analytic techniques to learn about the nature of party ideology (e.g. Budge et al. 2001; Laver, Benoit, and Garry 2003; Slapin and Proksch 2008; Schonhardt-Bailey 2008), representation (e.g. Martin 2011; Grimmer 2011), issue salience (e.g. Quinn et al. 2010), or the ideological leaning of news media (e.g. Groseclose and Milyo 2005), to name just a few of the many possible applications. Most of the work in these areas can be classified as quantitative content analysis, in which researchers turn words into numbers-they may look at the overall length of texts, or classify or scale documents based on word counts or sentences. This is in contrast with qualitative approaches, which typically involve reading all documents and offering a synthesis or interpretation of their content.

 

Before the advent of computer-aided techniques, quantitative content analysis often involved physically measuring the length of text (i.e. taking a ruler to the front page of a newspaper), counting headlines, or manually classifying large quantities of text using human coders, a very labour- and time-intensive process (see Krippendorff 2004, 5-9). In recent years, quantitative content analysis has been greatly aided by advances in computer science, statistics, and machine learning techniques. It has become much easier to parse documents into words and sentences, count these units, and extract meaning from their relative frequencies across documents. Of course, computer-based content analysis is not without its problems (and, naturally, its critics). We explore the various advantages and disadvantages of using computers to machine-read and extract meaning from texts below, specifically with regard to legislative and party documents.

 

6.3 CONTENT ANALYSIS AND THE COMPARATIVE STUDY OF PARTIES AND LEGISLATURES

 

Legislative scholars use content analysis to examine at least seven features of legislatures: party ideology and polarization, government positioning, parliamentary scrutiny, constituency-based representation, policy agenda, quality of debate, and the role of media. Table 6.1 lists these themes together with their main approaches, core assumptions, requirements, typical data sources, and examples.

6.3.1 Ideology

 

In the vast majority of multiparty democracies, political parties publish manifestos during parliamentary election campaigns. Typically written by the party leadership, manifestos outline the party's program for government should it get elected, and structure the party's election campaign. Perhaps the most ambitious content analysis research program in political science is the hand-coding of these party manifestos by the Manifesto Project (Budge et al. 1987; 2001; Klingemann et al. 2006). Formerly known as the Manifesto Research Group and later as the Comparative Manifestos Project (CMP), the project examines party manifestos and government declarations, and the resulting estimates of party ideology and government positions produced by the project have been used in countless studies of legislative behaviour, coalition formation, policy-making, and representation (e.g. Martin and Stevenson 2001; Adams et al. 2006, 2011; Bawn and Rosenbluth 2006; Walgrave and Nuytemans 2009; Lipsmeyer and Pierce 2011; McDonald and Budge 2005; Warwick 2011). The project attempts to uncover the salient dimensions of party competition in a comparative manner across advanced, industrialized democracies. The fundamental notion of party competition guiding the project "was that parties argued with each other by emphasizing different policy priorities rather than by directly confronting each other on the same issues" (Budge et al. 2001, 6-7). The assumption that policy priorities reflect ideological differences has been referred to as a salience theory of party competition. To capture voters, parties tend to arrive at similar positions on the various issues (e.g. everyone has to agree that environmental protection is good-no one wants to compete on a platform of environmental degradation). However, some parties are perceived as being "better" on some issues than others-Green parties are more credible on issues of the environment than liberal parties, while liberal parties are more credible on business issues. During election campaigns, parties emphasize the issues they are perceived to "own" and ignore others.

 

The CMP coding scheme seeks to capture this facet of party competition. The project sorts each sentence of each manifesto into one of 56 categories. Most of the categories are positive in nature (e.g. positive mentions of the environment, on the theory that parties do not make negative mentions of the environment). Thus, the coding scheme captures the relative emphasis a party gives to any one category over the other categories. The goal is to create a single coding scheme valid across countries and time. The most recent iteration of the project coded 1,314 manifestos written by 651 parties over 185 elections in 51 countries (Budge et al. 2006). Each manifesto is coded by hand: each sentence (or quasi-sentence) is read by a researcher and placed into one of the 56 categories (or remains uncoded if it is deemed to fit none of the categories). The end result of the coding process is the percentage of each manifesto falling into each category. These percentages can then be aggregated to produce general left-right party position scales or scales on other policy dimensions (Laver and Budge 1992; Lowe et al. 2011). In recent years, the CMP has been subject to trenchant critiques. The reading and coding of these manifestos is very labour-intensive and prone to inconsistencies, which the CMP does not sufficiently account for (Mikhaylov et al. 2012), and the manual coding protocol does not account for uncertainty in the text generation process (Benoit et al. 2009). Moreover, it is not clear that the rigid coding scheme devised in the 1980s can be applied in the same way across time and across countries, as the manifesto project assumes (Benoit and Laver 2007).
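
To make the aggregation step concrete, the sketch below computes category percentages from sentence codes and collapses them into a simple additive left-right score (right-category shares minus left-category shares), in the spirit of the scales discussed by Laver and Budge (1992) and Lowe et al. (2011). The category labels loosely follow the published coding scheme, but the sentence counts and the assignment of categories to a "left" and a "right" bloc are invented here purely for illustration.

```python
from collections import Counter

# Hypothetical CMP-style codes assigned to the (quasi-)sentences of one manifesto.
# The real scheme has 56 substantive categories; three are shown for illustration.
sentence_codes = (
    ["per401"] * 30    # e.g. positive mentions of the free market (treated as "right")
    + ["per504"] * 50  # e.g. welfare state expansion (treated as "left")
    + ["per501"] * 20  # e.g. environmental protection (outside this toy scale)
)

# Step 1: percentage of coded sentences falling into each category.
counts = Counter(sentence_codes)
total = sum(counts.values())
percentages = {code: 100 * n / total for code, n in counts.items()}
print(percentages)

# Step 2: an additive left-right scale, i.e. the sum of "right" category
# percentages minus the sum of "left" category percentages. Which codes count
# as left or right is a substantive decision made by the researcher.
RIGHT_CODES = {"per401"}
LEFT_CODES = {"per504"}
left_right = (sum(percentages.get(c, 0) for c in RIGHT_CODES)
              - sum(percentages.get(c, 0) for c in LEFT_CODES))
print(left_right)  # negative values indicate a left-leaning manifesto on this toy scale
```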

 

To alleviate some of these problems, researchers have attempted to harness the power of computers to extract positions from text. The first attempts to use computers to categorize political texts (and party manifestos in particular) generated dictionaries of words representing a particular ideology, generally culled from manifestos themselves, and then used computers to count the number of times the words in the dictionary appeared in any other given document of interest (Laver and Garry 2000). Laver, Benoit, and Garry (2003) built on this framework to create an automated technique for estimating ideology from manifestos, or any other political text. In creating Wordscores, they developed a technique to compare word counts in reference documents that are carefully chosen by the researcher to word counts in documents of interest (virgin documents in the parlance of Laver, Benoit, and Garry). Based on the relation between the relative frequency of words in the reference documents (and the reference scores assigned to them to anchor the political space) and the relative frequency of words in the documents of interest, the documents of interest are placed in a one-dimensional policy space. The assumption is that the relative frequency with which parties use words tells researchers something about their latent positions in a policy space. This technique has been applied successfully to examine legislative behaviour and party positions using as data both party manifestos (e.g. Proksch and Slapin 2006; Debus 2009) and parliamentary speeches (e.g. Giannetti and Laver 2005; Hakhverdian 2009; Klemmensen et al. 2007; Bernauer and Bräuninger 2009).
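
A minimal sketch of the core Wordscores computation, as we read it from Laver, Benoit, and Garry (2003), may help fix ideas: word scores are estimated from reference documents with known positions and then averaged, weighted by word frequency, to score a new document. The word counts, vocabulary, and reference scores below are invented, and real applications involve many further decisions (preprocessing, and the rescaling step discussed below).

```python
import numpy as np

# Toy document-term matrix: rows are documents, columns are word types.
vocab = ["tax", "cut", "welfare", "state", "market"]
ref_counts = np.array([
    [20, 15,  2,  3, 10],   # reference text A (e.g. a right-wing manifesto)
    [ 3,  2, 25, 20,  1],   # reference text B (e.g. a left-wing manifesto)
], dtype=float)
ref_scores = np.array([1.0, -1.0])  # positions assigned by the researcher

virgin_counts = np.array([10, 5, 12, 8, 4], dtype=float)  # document of interest

# Relative frequency of each word within each reference text.
F = ref_counts / ref_counts.sum(axis=1, keepdims=True)

# P[w, r]: probability that we are reading reference text r given word w,
# assuming equal priors over the reference texts.
P = F.T / F.T.sum(axis=1, keepdims=True)

# Word scores: the expected reference position given the word.
word_scores = P @ ref_scores

# Virgin text score: frequency-weighted mean of the word scores, using only
# words that also occur in the reference texts (all of them in this toy case).
virgin_freq = virgin_counts / virgin_counts.sum()
virgin_score = virgin_freq @ word_scores
print(dict(zip(vocab, np.round(word_scores, 2))), round(float(virgin_score), 2))
```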

However, its use has also raised some issues and concerns. First, scholars are tempted to use it as a method for estimating long time-series, but it is not clear how one should choose reference texts to create comparisons over time, and the choices one makes with regard to reference texts and positions can greatly affect the analysis (Budge and Pennings 2007). Second, the original transformation of word scores into document scores yields estimates of virgin text positions on a scale different from, and not directly comparable with, that of the reference texts. As a result, there has been some controversy over the best way to transform the virgin text scores to make them comparable with the reference texts (Lowe 2008; Martin and Vanberg 2008a; Laver and Benoit 2008). Lastly, the choice of reference texts is particularly important when attempting to estimate positions on different policy dimensions. Laver, Benoit, and Garry proposed to use all text in the manifestos in both the virgin and reference documents and to estimate new word scores simply by changing the reference values assigned to the reference documents, without changing the actual text input. This means, however, that one poorly chosen reference text or reference value can drastically change the results. In the extreme, the estimated dimension might not be related at all to the dimension of interest if the reference texts do not contain language that discriminates on this dimension.

 

More recent work has attempted to estimate the ideology of parties and individual legislators without relying on dictionaries or reference documents (e.g. Monroe and Maeda 2004; Slapin and Proksch 2008; Proksch and Slapin 2010). Slapin and Proksch (2008) present and estimate a parametric scaling model called Wordfish to place documents in a one-dimensional space based solely on relative word frequencies. Rather than taking advantage of information contained in reference documents, as Wordscores does, Wordfish assumes word frequencies within documents are driven by a statistical model (namely that they are Poisson distributed), and are the result of underlying latent policy positions that can be estimated. The technique is thus applicable in instances where finding reference texts is difficult. To estimate positions on policy dimensions, the authors have proposed to apply the scaling model to relevant sections of the manifesto. Identifying those sections can be done manually, as the manifestos are often structured into policy-specific paragraphs. However, this only works for broad policy dimensions (Proksch and Slapin 2008). Alternatively, scholars can rely on keywords from legislative databases that clearly identify policy dimensions to tag relevant sentences in the manifesto (König et al. 2010). Independent of the approach chosen, the important point is that policy dimensions are defined a priori and on substantive grounds.
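
As a reminder of the assumed data-generating process, the published model (in its simplest form) treats the count y_ij of word j in document i as Poisson distributed with rate λ_ij = exp(α_i + ψ_j + β_j ω_i), where α_i is a document fixed effect capturing length, ψ_j a word fixed effect capturing overall frequency, β_j a word-specific discrimination weight, and ω_i the latent position of interest. The short simulation below generates a document-term matrix from exactly this process; all parameter values are invented for illustration and estimation is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

n_docs, n_words = 6, 200
omega = np.linspace(-1.5, 1.5, n_docs)   # latent document positions
alpha = rng.normal(0.0, 0.3, n_docs)     # document fixed effects (length)
psi = rng.normal(0.0, 1.0, n_words)      # word fixed effects (overall frequency)
beta = rng.normal(0.0, 0.5, n_words)     # word discrimination parameters

# Wordfish's core assumption: word counts are Poisson with a log-linear rate.
lam = np.exp(alpha[:, None] + psi[None, :] + beta[None, :] * omega[:, None])
counts = rng.poisson(lam)                # simulated document-term matrix

print(counts.shape)  # (6, 200): 6 documents over a 200-word vocabulary
```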

 

As the approach is unsupervised and relies on the available data without requiring reference documents or a dictionary, the demands for data quality are quite high. If one wishes to estimate ideology from a set of documents, the principal purpose of the authors in writing the documents must have been to express a particular ideology. In particular, the Wordfish technique (and Wordscores, too, for that matter) best picks up ideological differences between documents when the data-generating process allows authors to emphasize or de-emphasize any issue they want. Researchers need to keep this in mind when applying the methods off-the-shelf. In particular, agenda effects may impact the estimation. Agenda effects may exist over time if politicians or parties use a different vocabulary to express ideology in different time periods, something that seems natural over long time periods but may not be as problematic over shorter ones. For example, parties may change how they express their positions on environmental regulation as new environmental problems arise. Whereas clean air or water standards may be the dominant theme in one decade, issues concerning global climate change may prevail in another. Thus, changes in the policy agenda over a long time period may alter the use of language for this policy domain in a way that distinguishes all parties in one decade quite well from those in the previous decade. Such differences, however, could be simply a function of the changing topics within the policy domain, and not the result of an underlying change in party positions on environmental regulation. Long time-series analyses of political text therefore pose particular challenges that must be taken into consideration.

 

Agenda effects can also occur when politicians must use the same vocabulary to talk about very specific issues. In such an instance, the agenda is pre-determined and parties or their members are not free to emphasize or de-emphasize issues as they can in a manifesto. For example, this may be problematic in parliamentary debates, where the topic of the debate is predetermined (and procedural language is more common than in manifestos). Finally, agenda effects may occur when researchers mix different types of documents that do not follow the same data-generating process. In this instance, the principal dimension extracted from the text data may simply reflect the different data-generating processes rather than different ideologies. In short, techniques such as Wordfish or Wordscores have the potential to estimate positions for large collections of political text documents, but researchers should be aware of the assumptions behind these techniques in applied work.

 

6.3.2 Policy Agenda

 

Certainly not all content analysis in legislatures is aimed at estimating party or legislator positions, or relies upon party manifestos as data. An emerging field of research examines the topics of the policy agenda as expressed in legislative speeches or other text data such as blogs and press releases. Often, these techniques make use of so-called topic models to identify latent themes in the text corpus (e.g. Quinn et al. 2010; Grimmer 2010). Quinn et al., for example, explore the policy agenda of the US Senate. They uncover 42 latent topics and examine how attention to these topics varies over time, demonstrating that their measure tracks with the Senate's voting agenda, but unveils more nuanced information about the Senate's attention to various issues. Moreover, they present models to explain why Senators participate in debate on the latent topics they estimate. They consistently find that Senators speak more often on topics when they serve on a relevant committee; however, committee chairs do not tend to speak more than other committee members. In addition, ideological extremists tend to speak more, specifically on controversial topics such as judicial affairs, labour policy, taxes, and the budget. Topic models can therefore provide interesting insights into the operations of legislatures and representation. The models are unsupervised, and like text scaling models, require the researcher to interpret the estimates ex post. Another requirement for topic models is that researchers need to set the number of latent topics. This may not always be a straightforward task, given that important topics may stay undetected if the number is too small, while artificial topics may appear if the number is set too high.
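
Quinn et al. use a dynamic topic model developed specifically for Senate speech. As a rough stand-in, the sketch below shows how an off-the-shelf topic model is typically fit and how the number of topics enters as a researcher-set parameter; the speech snippets are invented placeholders and real corpora contain thousands of documents.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

speeches = [
    "the budget resolution and appropriations for defense",
    "judicial nominations and the confirmation of judges",
    "tax relief for working families and small business",
    "funding for veterans health care and military readiness",
]  # placeholder speeches

# Document-term matrix of word counts.
vectorizer = CountVectorizer(stop_words="english")
dtm = vectorizer.fit_transform(speeches)

# The number of latent topics must be chosen by the researcher: too few topics
# merge distinct themes, too many produce artificial splits.
n_topics = 2
lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
doc_topics = lda.fit_transform(dtm)  # per-document topic shares

# Most characteristic words for each estimated topic (interpretation is ex post).
words = vectorizer.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [words[i] for i in topic.argsort()[-5:][::-1]]
    print(f"topic {k}:", ", ".join(top))
```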

 

6.3.3 Beyond Policy and Ideology: Oversight, Representation, and the Media

 

The use of content analysis in legislative scholarship has expanded beyond ideology and political attention to fields as varied as representation, oversight, and the media. Some of this work, for example, has examined bills to study oversight or the stability of laws. Huber and Shipan (2002) examine legislative oversight of the bureaucracy by looking at the length of bills, assuming that longer bills place greater limits on what bureaucrats can do. To achieve cross-national comparison, they standardize the page formatting of bills and account for linguistic differences by applying a verbosity multiplier capturing the efficiency of different languages. Their underlying theoretical notion is that greater policy conflict in the legislature raises the risk that bureaucrats will use their discretion against legislators' wishes; legislators therefore write more detailed (i.e. longer) bills to limit discretion and prevent agency loss when delegating powers to the bureaucracy. Maltzman and Shipan (2008), in contrast, use page length of legislation as a measure of law-specific complexity. They use this measure to test the hypothesis that complex laws are more likely to be amended in the future. Thus, in these instances bill length serves as a proxy for both detail and complexity, which may or may not be related. One could have a very complex, but vague law covering many different areas, or a bill of similar length that is narrowly focused, but highly detailed. More research on the full text of legislation seems desirable, but it is ultimately very difficult. Bills are of a highly technical nature, their legal language does not compare well to position statements by politicians, and the significance of bills is likely independent of their length.
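
The standardization step can be illustrated with a hedged sketch: the idea is simply to adjust raw page counts for formatting and for how verbose a language is relative to a baseline. The specific function and multipliers below are invented for illustration and are not Huber and Shipan's actual procedure.

```python
def standardized_bill_length(pages: float, words_per_page: float,
                             verbosity_multiplier: float) -> float:
    """Convert raw page counts into roughly comparable units of policy detail.

    pages: raw length of the bill.
    words_per_page: corrects for formatting differences across parliaments.
    verbosity_multiplier: corrects for how many words a language needs to express
        the same content relative to a baseline language (values above 1 mean a
        more verbose language, so measured length is deflated).
    """
    baseline_words_per_page = 400.0  # invented reference format
    format_adjusted = pages * words_per_page / baseline_words_per_page
    return format_adjusted / verbosity_multiplier

# Invented example: a 30-page bill, densely formatted, in a relatively verbose language.
print(round(standardized_bill_length(30, 550, 1.2), 1))
```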

 

Other recent work on parliamentary oversight has focused on how members of parliament scrutinize government behaviour by examining parliamentary questions (Proksch and Slapin 2011) and speech (Martin and Vanberg 2008b) rather than bill length. Proksch and Slapin find that members of the European Parliament from national opposition parties use their institutional position at the European level to gather information and scrutinize their home government's behaviour. Martin and Vanberg focus on legislative speech to examine how parties in coalition governments make use of backbenchers to keep tabs on their coalition partners and highlight differences between the partners.

 

Scholars have also turned to content analysis to examine the connection between MPs and their constituents. Martin (2011) uses parliamentary questions to measure constituency service, while Grimmer (2010) presents a topic model to understand how US senators use press releases to connect with voters. Martin, together with a team of research assistants, hand-coded approximately 124,000 questions asked in the Irish Dáil between 1997 and 2002 to determine whether they addressed local issues. He finds that about 44 percent of all questions asked in the Dáil have a local, constituency service focus. Grimmer (2010), in contrast, uses a Bayesian hierarchical topic model to examine how senators explain their work in Washington to their constituents. Grimmer's modified topic model takes into account the hierarchical structure of press releases issued by representatives (thus, political statements constitute the lower level and authors the higher level). He applies this model to estimate the issues senators emphasize in communication with their constituents. On the basis of the estimates of the expressed agenda, he shows that committee leaders in the US Senate pay more attention to issues in their policy jurisdiction than the average senator and that the senators from the same state tend to have a more similar expressed agenda than senators from other states.

 

Other work by Groseclose and Milyo (2005) examines US Congress speeches and media reports for references to think tanks. They use these think-tank citations as bridging observations to locate the media in the same ideological space as members of Congress to assess the presence of media bias. The underlying notion is that members of Congress receive higher utility when citing think tanks that hold a similar ideology to them, as do journalists. Their evidence suggests that there is a left-wing bias in the mainstream American press compared with the median member of Congress. Lastly, other content analytic work approaches legislative speeches from a deliberative politics perspective. Rather than estimating ideology or representation, this work aims to examine the quality of debate as captured by levels of persuasion, justification, or respect, and relies on hand-coding of a number of selected speeches (Steiner et al. 2004; Bächtiger, this volume). The argument for this procedure is that higher quality debates lead to better democracy.

 

6.4 ISSUES IN CONTENT ANALYSIS

 

While advances in access to legislative documents and content analytic techniques have certainly pushed legislative studies forward, there are issues of research design to which anyone using content analysis must attend. First, and foremost, researchers typically employ content analysis to measure latent variables-variables that are not directly observed and can only be indirectly measured through indicators. Party ideology, for example, is a latent variable. The researcher can never directly observe a party's ideology. Instead, the researcher may observe indicators, such as the content of speeches made by party leaders in parliament, the direction of votes cast by party MPs, answers by party supporters to issue questions in surveys, and so on. From these indicators, the researcher infers the ideological stance of the party. Even this is not so simple, as it involves decisions regarding the dimensionality of the policy space, which itself is latent (Benoit and Laver 2006). When examining the indicators of party ideology (typically extracted from documents via some form of content analysis) how much information should be maintained? Is it desirable to boil down all the information to a single, general dimension? Or is it more desirable to estimate positions separately for economic, social, and foreign policy dimensions? Researchers must address these important questions before even starting to measure the positions of parties or MPs.

 

Any attempt to measure such latent concepts must take into account the validity and reliability of the proposed measure. Validity refers to the degree to which the given measure captures the nature of the underlying latent concept, and is often thought of as the (lack of) bias. Because the concept we wish to measure is latent, there is no direct way to assess validity. Instead, researchers typically assess validity by comparing two or more different measures of the same latent concept, in the hope that they correlate, or even better they agree (see Mikhaylov, Benoit, and Laver 2012 for a discussion of the difference between correlation and agreement). Alternatively, researchers examine construct validity by asking whether the indicators they are using are theoretically related to the underlying concept they wish to measure. When choosing documents to content analyse, researchers must be concerned with construct validity. Were the documents written in such a way as to provide information about the underlying concept of interest? If one is interested in measuring ideology, for example, it probably makes more sense to content analyse MPs' floor speeches, in which researchers can reasonably assume MPs express their positions with regard to legislation, than to analyse MPs' Twitter statements, in which they may issue brief comments on current events, but not necessarily on current legislation.
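
The distinction between correlation and agreement that Mikhaylov, Benoit, and Laver draw can be seen in a small invented example: two measures of the same latent positions can correlate perfectly while never agreeing, because one is systematically offset from the other.

```python
import numpy as np

# Invented position estimates for five parties from two different measures.
measure_a = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
measure_b = measure_a + 1.5  # same ordering, but systematically shifted

correlation = np.corrcoef(measure_a, measure_b)[0, 1]
mean_abs_disagreement = np.mean(np.abs(measure_a - measure_b))

print(round(correlation, 2))             # 1.0: perfect correlation
print(round(mean_abs_disagreement, 2))   # 1.5: the measures never actually agree
```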

 

These questions of validity suggest that researchers must pay careful attention to the strategic environment that produced the texts they wish to analyse. They must carefully examine the data generating process behind the texts. If a researcher wishes to use legislative speeches to examine party ideology, he or she must carefully consider why it is that legislators give speeches, and what pressures they face when they do so. Are they attempting to communicate with their constituents, their fellow party members, or members of other parties? To what extent does their party leadership control what they say on the floor, or even control their access to the floor? The audience and degree of party pressure will determine the content of the speech, and the extent to which the speech captures the legislator's own ideological position versus the position of some other group (e.g. the party or the constituency). Alternatively, speeches may not be about ideology at all, but may instead provide an opportunity to talk about constituency service and the government funds the legislator has brought back to the district or simply to criticize the government's policy decisions. In this instance, analysing speeches may provide information about the importance of pork-barrel politics and about the difference between government and opposition MPs, but not about a particular ideological stance. In other words, it is impossible to extract information from a document that the document was not meant to convey. Proksch and Slapin (2012) argue, for example, that institutional constraints and electoral considerations greatly affect who may give a speech on the floor of parliament and the content of their message. Unless researchers account for the strategic nature of speech when performing content analysis, they will not have a valid measure of the latent concept they wish to capture. This is why party manifestos have proved so useful when trying to capture party ideology. They are written with the aim to present the party's primary ideological program in the midst of electoral competition. The data-generating process is clearly related to the latent variable researchers wish to measure.

 

Relatedly, researchers must be concerned with issues of reliability. A reliable method is one that produces the same result each time the analysis is conducted, and is related to the statistical concept of precision (the inverse of the variance). A reliable measure may or may not be valid, but if it is not valid, it is not valid in the same way each time, regardless of who is doing the measuring. These issues, validity and reliability, get to the heart of the debate over whether human coding or machine coding is better when it comes to content analysis. Proponents of machine coding argue that their techniques are 100 percent reliable (even if they may not be perfectly valid). When using Wordscores or Wordfish (or indeed any computer algorithm) to estimate party positions from policy documents or manifestos, researchers making the same research design decisions (e.g. which documents to include in the analysis, whether to stem words to their roots or not, and whether to exclude stopwords, i.e. words that have no ideological content such as prepositions and conjunctions, from the analysis) will obtain the same results every time they conduct the analysis. This is not necessarily true for projects using human coding. Two human coders may read a document in a different way and categorize it differently. Perhaps, on average, they will get the same result as machine coding would, but there is likely to be more noise in the process. Mikhaylov, Benoit, and Laver (2012) find that there is a great deal of noise in the process of human-coding manifestos in the manner of the CMP, for example. This is not to say that machine coding is superior to human coding. Those who argue in favour of human coding often suggest that validity is higher with human coding. Computers may always produce the same answers, but those answers are not necessarily right. Humans are better able to draw on a wide variety of knowledge to "correctly" classify texts, whereas
computers can only draw on the information human researchers give them. Of course, one means of assessing reliability among human coders is to examine inter-coder reliability. If two or more people are confronted with the same coding task, to what extent do they arrive at the same answer? If multiple coders are unable to arrive at the same answer, perhaps the coding assignment is too difficult, or the instructions given to coders are not sufficiently clear. This has been a criticism of the CMP, for example. Typically, the CMP has only employed one coder per document, making it impossible to assess inter-coder reliability. Recent work has examined the extent to which multiple human coders are actually able to classify sentences on the basis of the CMP scheme (Mikhaylov, Benoit, and Laver 2012). The results were not encouraging. Common measures of inter-coder reliability (namely Fleiss's kappa; a minimal computation is sketched at the end of this section) show that coders were not able to classify sentences in a reliable manner. In its latest installment, the manifesto project
now uses two coders to analyse each manifesto, but it retains the original coding scheme.

Lastly, when analysing a corpus of text, researchers must ensure that all texts within that corpus are comparable. For example, it would not make sense to analyse speeches, bills, and manifestos all at the same time as part of a single corpus. As discussed above, these texts were written for different purposes, have different intended audiences, use different language, and are subject to different constraints. In short, they were produced by different data-generating processes and are, therefore, not directly comparable. While it may seem obvious that it does not make sense to compare speeches and manifestos at the same time, comparisons are often made that are less obviously problematic, but nonetheless raise important issues. For example, one may question whether it is appropriate to compare the content of manifestos across countries and time. The underlying dimensions of political competition are not the same everywhere, and also vary across elections: the most salient dimensions of partisan conflict in the legislature may differ in Germany and Japan, and the most salient dimensions of conflict in the 1970s may or may not be the most salient dimensions of conflict today. Likewise, while parliamentary questions may primarily serve as a means of oversight in one parliament, elsewhere they may serve primarily as a means to address constituency service issues. The differences may be due to institutions governing electoral incentives, party competition, or other considerations. Regardless, even if the type of document is nominally the same, if the underlying strategic environments that produce the documents are different, the content of the documents may not be directly comparable.
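
Fleiss's kappa, mentioned above as a common inter-coder reliability statistic, can be computed directly from a table recording how many coders placed each sentence in each category. A minimal sketch, with invented counts for five sentences rated by three coders across three categories:

```python
import numpy as np

# ratings[i, j]: number of coders assigning sentence i to category j.
# Invented data: 5 sentences, 3 coders, 3 coding categories.
ratings = np.array([
    [3, 0, 0],
    [0, 3, 0],
    [2, 1, 0],
    [1, 1, 1],
    [0, 2, 1],
], dtype=float)

n = ratings.sum(axis=1)[0]                               # coders per sentence
p_j = ratings.sum(axis=0) / ratings.sum()                # share of ratings per category
P_i = ((ratings ** 2).sum(axis=1) - n) / (n * (n - 1))   # per-sentence agreement
P_bar = P_i.mean()                                       # observed agreement
P_e = (p_j ** 2).sum()                                   # agreement expected by chance

kappa = (P_bar - P_e) / (1 - P_e)
print(round(float(kappa), 2))
```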

 

6.5 WAYS FORWARD

 

Recent trends in content analysis and the study of legislatures involve both applying new technological techniques and examining new sources of data to answer new substantive questions. Methodologically, there has been a rapid move toward importing new methods of machine learning from the fields of computational linguistics and computer science (e.g. Hopkins and King 2010; Quinn et al. 2010; Diermeier et al. 2012; Spirling 2012). Other recent advances involve taking advantage of the new legislative databases to examine forms of parliamentary behaviour that were largely ignored previously-e.g. parliamentary questions. The recent studies discussed earlier all use the newly available data to study questions regarding constituency service, representation, and much more in ways not previously possible. These studies have led to new insights regarding party competition, and have also allowed researchers to peer into the black box of intraparty dynamics to explore how backbenchers interact with party leaders (e.g. Martin and Vanberg 2008b; Proksch and Slapin 2012). To use these data in the most effective manner, researchers must pay careful attention to the (often unobserved) processes generating the text they use as data, as well as the assumptions of the methods they employ. Next, we outline a series of points that we feel content analytic studies of legislative politics must address.

 

6.5.1 Strategic Data-Generating Process

 

Politicians are strategic communicators. This means that all content analysis needs to consider the data-generating process of the specific textual data being used. Not all political text in legislatures is created equal. When delivering speeches or asking parliamentary questions, politicians may have a specific audience in mind. Draft legislation has its own technical features that make it look very different from positional statements by parties and their MPs. Text as data approaches ask a lot from the data. For instance, when applying Wordscores or Wordfish, the assumption that word counts are generated according to one-dimensional ideological preferences needs to be satisfied if the extracted dimension is to be interpreted as ideology. If the agenda is prestructured and actors are not free to emphasize or de-emphasize issues (e.g. in the aggregate in parliamentary speeches), this may be more problematic than in a context where actors are free to choose issues (e.g. during election campaigns or in parliamentary debates on specific topics such as the budget). Another implication is that it might not be advisable to apply the same estimation technique across different types of documents, in particular if they are generated differently. For instance, comparing positions based on speeches and manifestos may be particularly difficult because the audience and the context for the message are quite different. Finally, it is important for scholars to consider the institutional context. In some political systems, for instance, it is easier for backbenchers to participate in parliamentary debates than in others (Proksch and Slapin 2012). Thus, there is the possibility that even though the distribution of preferences is identical across political systems, the institutional hurdles in one system prevent the expression of preferences that another system allows. Failure to account for such differences is likely to lead to biased inferences.

 

6.5.2 Level of Analysis

 

Some research questions involve measurement at a very detailed level. For instance, when the goal is to find out if MPs ask parliamentary questions regarding the oversight of legislation in force, then the measurement process must identify questions asked by individual members and perhaps hand-code features of those questions. In other instances, the goal may be simply to find out the ideological dispersion of the party system in parliamentary debates or the topics on the legislative agenda. Here, aggregating all speeches by members of the same party and applying unsupervised scaling or classification techniques may be sufficient. The measurement approach used needs to fit the question at hand. In some instances, a more labour-intensive and costly hand-coding approach, ideally involving multiple coders, may be more advisable, whereas in others computer-assisted techniques seem more promising when dealing with large collections of text. Thus, content analysis techniques should be chosen according to the level of analysis, which in turn should be motivated by the substantive research question of the project.
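
When the question only requires party-level estimates, a simple aggregation step such as the following is often all that is needed before applying a scaling or classification model; the speech records here are invented placeholders for data that would normally come from a parliamentary archive.

```python
import pandas as pd

# Invented speech records.
speeches = pd.DataFrame({
    "party":   ["A", "A", "B", "B", "C"],
    "speaker": ["Adams", "Ali", "Berg", "Braun", "Chen"],
    "text":    ["lower taxes now", "cut the deficit", "expand child care",
                "invest in schools", "protect the climate"],
})

# Concatenate all speeches given by members of the same party into one document,
# which can then be passed to an unsupervised scaling model such as Wordfish.
party_documents = speeches.groupby("party")["text"].apply(" ".join)
print(party_documents)
```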

 

6.5.3 Latent Variables

 

Ideology and the salient dimensions of partisan conflict are extremely difficult concepts to measure. These variables, and other related concepts (e.g. intra-party and party system cohesion), are fundamentally unobservable, but they are indirectly expressed in various forms by members in parliament. MPs express their latent positions through political speech, voting in parliament, sponsoring legislation, or through statements in media and during election campaigns. This means that the appropriate way to approach the study of ideology is to treat it as a latent variable and apply a relevant measurement model. But developing appropriate measures of these latent concepts also means accounting for the strategic nature of the legislative environment that produces the data, e.g. what pressures are exerted by parties on members when speaking, voting, participating in committees, etc. The latent variable approach stands in contrast to other approaches, for example the CMP, that assume there is a correct "gold standard" that coders can attain given sufficient training.

 

6.5.4 Uncertainty

 

Both human-based and computer-assisted content analysis involve measurement uncertainty. Multiple human coders will never perfectly agree on how to code text. Existing hand-coded approaches typically provide inter-coder reliability measures based on selected codings, but rarely incorporate uncertainty into the coding framework. Computer-assisted analysis needs to work with available text. Classical dictionary-based analysis makes it difficult to gauge measurement error, whereas more recent latent variable models offer ways to take uncertainty into account, either in a frequentist or a Bayesian framework. Ultimately, uncertainty in political text analysis has multiple components. First, the text generation process itself is stochastic, meaning that MPs or parties can use different text to express the exact same position (Benoit et al. 2009). Second, there is measurement uncertainty associated with the chosen statistical model. Capturing measurement error is, of course, not an end in itself. Ideally, it should be carried over into secondary analyses that use quantities of interest extracted from text as independent or dependent variables (e.g. Benoit et al. 2009; Proksch and Slapin 2010).
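
One common way to capture the first source of uncertainty is to bootstrap the text itself, in the spirit of Benoit et al. (2009): resample the sentences of a document with replacement, re-estimate the quantity of interest on each replicate, and use the spread of the replicates as a measure of uncertainty. A minimal sketch follows, with a deliberately simple placeholder scoring function (toy_position_score) and invented sentences; a real application would plug in Wordscores, Wordfish, or a dictionary instead.

```python
import random

def toy_position_score(sentences):
    """Placeholder position measure: share of sentences containing a 'right' keyword."""
    right_words = {"market", "tax"}
    hits = sum(any(w in s.split() for w in right_words) for s in sentences)
    return hits / len(sentences)

document = [
    "cut the tax burden on families",
    "let the market decide",
    "expand public child care",
    "invest in renewable energy",
    "reduce red tape for small business",
]

random.seed(0)
replicates = []
for _ in range(1000):
    resampled = random.choices(document, k=len(document))  # sentence-level bootstrap
    replicates.append(toy_position_score(resampled))

replicates.sort()
low, high = replicates[25], replicates[975]  # rough 95 percent interval
print(round(low, 2), round(high, 2))
```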

 

6.5.5 Consider What Legislators Do Not Say

 

All content analysis techniques rely on available political text to make inferential statements, but techniques differ in the way they include information about what is not said. A strategic model of political language may predict that certain parliamentary actors receive less floor time for electoral and institutional reasons (e.g. Proksch and Slapin 2012). Party leaders, for example, may wish to prevent dissidents from taking the floor of parliament to present views that run contrary to the party's primary platform. When leaders systematically prevent certain party factions from giving speeches, parliamentary speeches may not offer an accurate picture of the true views (and diversity of views) within the party. If this is true, then models that fail to account for selection effects will produce results that are invalid or biased (with the direction and extent of bias not necessarily being clear a priori). Therefore, the analysis of missing values may allow researchers to examine under what conditions actors choose to remain silent and what this means for inferences regarding political conflict in parliaments and governments.

 

6.6 CONCLUSION

 

Content analysis is a powerful tool in studies of parties and legislatures. New advances in data and methods make it a particularly interesting and dynamic technique for examining latent variables such as ideology. More and more data are becoming available, and new statistical techniques and text-as-data approaches are being developed to analyse them. To effectively apply content analysis to the study of legislatures, though, researchers must carefully consider the assumptions of the estimation techniques they are applying, and the nature of the data they are using. These considerations include, for example, whether hand-coding or computer-coding is best, whether a supervised or unsupervised learning technique is most appropriate, and whether, given the data-generating process, extracting the desired information from the text is even possible. Further advances are most likely to occur at the intersection of a variety of fields. Scholars of legislative politics who apply content analysis will likely benefit greatly from advances in the fields of computational linguistics, statistics, and computer science.

 

In the absence of a general theory of political speech (or text), scholars must continue to adapt content analysis strategies to the problem at hand. New data mean that existing measurement models for text may be of limited use when applied in a context other than the one for which they were originally intended. Furthermore, new types of text data are always becoming available. Comprehensive on-line archives of parliaments have led to better access to text data for content analysis. However, parliaments can change the structure of their databases, meaning that access to the data, and the form in which they are accessible, may change without notice. There is a need for political scientists to capture the information provided by parliaments and store it in a more permanent and stable format through the construction of databases to ensure the replicability of studies using data from parliamentary archives. While text-as-data techniques hold great promise in the study of legislatures, careful application is required. The five points outlined in the previous section may serve as a guideline for researchers when applying quantitative content analysis techniques. Finally, valid comparisons across different types of documents may be a desirable research goal, but one that is difficult to achieve. We see the conceptual and empirical integration of these different data sources, for example draft legislation and policy statements of parties and legislators, as one of the future challenges in the text-as-data literature.

 
