Ecology and Society
The following is the established format for referencing this article:
Suan, A., K. M. Leong, and K. L. L. Oleson. 2021. Automated content analysis of the Hawaiʻi small boat fishery survey reveals nuanced, evolving conflicts. Ecology and Society 26(4):9.

Automated content analysis of the Hawaiʻi small boat fishery survey reveals nuanced, evolving conflicts

1Department of Natural Resources and Environmental Management, University of Hawaiʻi at Mānoa, 2Pacific Islands Fisheries Science Center, National Marine Fisheries Service, National Oceanic and Atmospheric Administration (NOAA)


Manual content analysis provides a systematic and reliable method to analyze patterns within a narrative text, but for larger datasets, where human coding is not feasible, automated content analysis methods offer enticing, time-efficient solutions for classifying patterns of text automatically. However, the massive datasets needed, and the complexity of analyzing them, have hindered the use of these methods in fishery science. Fishery scientists typically deal with intermediately sized datasets that are not large enough to warrant the complexity of sophisticated automated techniques, but that are also not small enough to cost-effectively analyze by hand. For these cases, a dictionary-based automated content analysis technique can potentially simplify the automation process without losing contextual sensitivity. Here, we built and tested a fisheries-specific data dictionary to conduct an automated content analysis of open-ended responses in a survey of the Hawaiʻi small boat fishery to examine the nature of conflict in the fishery. In this paper we describe the creation and application of the dictionary to fishery data, the overall performance of the methodology, and the advantages and limitations of the method. The results indicate that the dictionary approach is capable of quickly and accurately converting unstructured fisheries data into structured data, and that it was useful in revealing deeply rooted conflicts that are often ambiguous and overlooked in fisheries management. In addition to providing a proof of concept for the approach, the dictionary can be reused on subsequent waves of the survey to continue monitoring the evolution of these conflicts. Further, this approach can be applied within the field of fishery and natural resource conservation science more broadly, offering a valuable addition to the methodological toolbox.
Key words: automated content analysis; codebook; conservation conflicts; fisheries; Hawaiʻi; text mining; qualitative coding


As with many natural resource issues, managing fishery resources means managing human interactions with fish, so the social context is just as important as biological and ecological factors (Fulton et al. 2010). To gain a better understanding of social issues, fishery management is informed by many sources of qualitative information. Content analysis is a popular research technique to analyze qualitative data and make valid inferences. Such analysis historically has been done manually, where researchers read and interpret textual data and assign thematic codes. Although qualitative data analysis software such as MAXQDA, NVivo, or Atlas.ti are often used to assist this process and for data organization, the initial coding process is still conducted by hand using the software interface. Our current digital age brings substantial changes to content analysis and new possibilities for understanding large volumes of textual data. In the era of “big data,” automated content analysis methods have emerged that can complement manual methods, although they are not yet frequently applied in fisheries or other natural resource conservation contexts. Each method has different strengths and weaknesses, and which method is most appropriate for a research question can be determined, in part, by dataset size. Automated methods are well suited for big data where the size exceeds the capabilities of traditional manual methods and where the dataset is large enough for reliable machine learning. However, evaluating whether a dataset is “big” can vary by subject domain, variable type, and research questions (Connolly-Ahern et al. 2009, Luke et al. 2011). For example, 1000 tweets may contain far less information than 10 one-hour interviews.
In a fishery context, the size of the dataset is often in a middle ground: not “big” enough for reliable use of automated content analysis alone, yet not “small” enough for traditional manual content analysis to be cost-effective. This article provides guidance for a modified automated approach that can be applied to moderately sized datasets such as those typical in fishery science and other conservation fields.

Examining social issues in fisheries science routinely relies on manual content analysis (MCA) to systematically organize patterns emerging from the narrative text (Kamhawi and Weaver 2003). MCA allows researchers to analyze data and interpret its meaning by manually coding text units and then constructing concepts from the occurrence of coded units. As a research method, MCA represents a systematic approach to describe and quantify salient concepts (Elo and Kyngäs 2008). By distilling long text into fewer concept-related categories, researchers can test theory and enhance understanding of the topic at hand. Categories serve as a conceptual system to reveal the underlying meaning of the content and to develop valid inferences from the text. However, the method has limitations, namely, it is costly and time intensive for larger datasets. The cost of manual content analysis is roughly proportional to the amount of data being analyzed.

Automated content analysis (ACA) can improve efficiency by using the power of machine learning to analyze text. ACA has a high initial cost but once set up, increasing the amount of data requires little additional analytical effort (Trilling and Jonkman 2018). There has been impressive progress of automated tools for digital text analysis to handle ever-growing datasets. Some popular ACA approaches are supervised and unsupervised machine learning algorithms that have expanded the ability to scale up manual classification when dealing with large textual data (Maier et al. 2018). Although supervised and unsupervised machine learning techniques have the capacity to classify textual content in a consistent and scalable fashion, both require a dataset large enough to build accurate and reliable coding schemes. In fields such as fishery science, researchers typically deal with intermediately sized datasets that are not large enough to warrant the development of complex machine learning algorithms.

A dictionary approach is one of the simplest and most straightforward ACA approaches and can be used to analyze moderately sized datasets. It applies a custom thesaurus-like dictionary containing keywords and phrases that represent a category or thematic construct. The software then reads the corpus; each time it encounters a dictionary word, the word is counted and binned into the corresponding construct (Deng et al. 2017). When using the dictionary approach for moderately sized data, success therefore relies on consistent technical vernacular within the dataset. The dictionary technique has been a useful tool for social scientists in fields such as tourism research to analyze open-ended responses (Stepchenkova et al. 2009), agriculture to analyze policy statutes (Robson and Davis 2014), political science to analyze floor speeches (Grimmer and Stewart 2013), and medical science to analyze drug reviews on forums (Asghar et al. 2013). However, the technique poses some limitations. Developing the dictionary and its associated categories still requires several subjective steps. Further, words have different meanings in different contexts, and determining how many different ways a word can be categorized can be challenging. Despite these limitations, the dictionary ACA approach may be well suited to the type of qualitative data we typically see in a fishery setting for two reasons. First, social issues within a fishery are relatively predictable and limited in scope. The limited breadth of conflict topics that typically arise in fishery datasets means that key terms are used consistently throughout the narrative text to denote a specific topic. A fishery-specific dictionary can include these industry-specific terms while remaining sensitive to semantics that may vary from common usage in the English language. Second, the moderate dataset size lends itself to a dictionary approach.
To manually create the dictionary, researchers only need to read a subset of text and discover the most covered topics for analysis.
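To illustrate the mechanics, the following sketch shows how a keyword dictionary can bin free-text responses into categories by counting keyword occurrences. The category names and keywords here are hypothetical examples for illustration only, not entries from the dictionary developed in this study.

```python
# Hypothetical categories and keywords; the study's actual dictionary
# was built in WordStat from manually derived codes.
DICTIONARY = {
    "economic": ["cost", "expensive", "price", "make a living"],
    "overfishing": ["overfish", "depleted", "too many boats"],
}

def classify(response: str) -> dict:
    """Count dictionary keyword hits per category in one response."""
    text = response.lower()
    counts = {}
    for category, keywords in DICTIONARY.items():
        hits = sum(text.count(kw) for kw in keywords)
        if hits:
            counts[category] = hits
    return counts

print(classify("The price of fuel makes it hard to make a living."))
# → {'economic': 2}
```

Every response can then be reduced to a vector of category counts, which is what makes the subsequent frequency comparisons across years and stakeholder groups possible.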

To extend ACA applications within a fishery context, we build and test a fisheries-specific dictionary to conduct an automated content analysis of conflicts identified in open-ended responses in surveys of the Hawaiʻi small boat fishery. Specifically, our dataset includes two socioeconomic surveys conducted in 2007 and 2014 by the Pacific Islands Fisheries Science Center (PIFSC), a branch of the National Oceanic and Atmospheric Administration (NOAA). The main purpose of the surveys was to assess the economic, social, and cultural characteristics of the fishery. Survey respondents were fishers holding a State of Hawaiʻi Commercial Marine License who fished using small boats (typically under 40 feet) and sold at least one fish in the survey’s respective year (Hospital et al. 2011, Chan and Pan 2017). Responses were recorded from fishers across all islands of Hawaiʻi and included demographic information, vessel characteristics, fishing activity, fishing motivation, and more. Like many fisheries, the Hawaiʻi small boat fishery experiences conflicts over management options. One open-ended survey question asked, “Do you have any suggestions for how Hawaiʻi’s fisheries should be managed or topics that you feel need further study?” Responses previously were examined for broad resource topics, but indications of conflict between commercial fishermen, non-commercial fishermen, and managers suggested that a more robust analysis of responses could improve understanding of the drivers of conflict. This data set is well-suited to evaluate the utility of a dictionary-based ACA approach as an effective method to analyze moderately sized datasets because it is both large enough to include a range of themes and topics, yet small enough to effectively compare results to manual coding.


To assist in both the MCA and ACA research process, we used Provalis Research as our data analysis platform. Provalis Research is one of the few qualitative data analysis platforms that includes the ability to create a custom data dictionary that interacts with manual content analysis. We used two programs from Provalis Research to assist in coding, annotating, and dictionary building: QDA Miner 5.0.23 allowed us to conduct MCA, whereas WordStat 8.0.16 was used to create a custom fishery conflict dictionary for ACA. Because the two programs are from the same company, they communicate with each other seamlessly to facilitate analysis. Metadata for this project is available through NOAA’s InPort enterprise management system (PIFSC 2021).

Conceptual framework for fisheries conflict

When analyzing any corpus of text, a strong conceptual framework facilitates the content analysis process, providing theoretical underpinning for the research question and analysis, and potentially insightful data classifications, themes, and codes (Green 2014). We use Madden and McQuinn’s conflict model, which classifies conflict into three levels and drivers into three dimensions. Figure 1 illustrates a modified version of the Madden and McQuinn (2014) model classifying conflict into three levels: disputes, underlying conflicts, and identity-based conflicts. The dispute level is straightforward and represents the tangible issue or disagreement. Underlying conflicts add another layer of complexity in which conflicts carry over meaning from previous unresolved disputes that add significance to the present situation. Identity-based conflict involves values, beliefs, and objectives defining an individual’s identity. In resource conflicts, where the use of the resource is deeply intertwined with identity, people will vehemently resist if they feel their social identity or access to a resource is being threatened. Conflict can persist, or even worsen, if these underlying and identity-based conflicts go unaddressed (Madden and McQuinn 2014). However, underlying and identity-based conflicts are not easily articulated, nor accurately identified. People primarily voice the tangible dispute level. This often leads to conflict resolution approaches designed to focus on substantive disputes, which can perversely increase social tension among stakeholder groups, erode trust and common understanding, and ultimately contribute to a decline in fish stocks (Pomeroy et al. 2007, Murshed-e-Jahan et al. 2014, Spijkers et al. 2018). Madden and McQuinn complement their levels with three dimensions of conflict that may be driving environmental conflicts to deeper and often invisible levels and thus must be addressed by management (Madden and McQuinn 2014; Fig. 1): “substance,” directly addressing the dispute level, “relationships,” the personal conflicts and the quality of the relationships (trust or level of respect) between stakeholders, and “process” used to enhance decision-making design, equity, and implementation. We overlaid the dimensions of conflict on the levels of conflict to illustrate that successful management responses need to address substance, relationships, and process dimensions together at each of the dispute, underlying, and identity-based conflict levels. Although it is difficult to accurately decipher the deeper levels of conflict through fishers’ responses, dimensions can be easier to interpret. Therefore, the following methods use the substance, relationships, and process dimensions to guide the manual content analysis processes and indirectly tackle deeper levels of conflict.

Manual content analysis

We first applied manual content analysis to a single dataset, the PIFSC’s 2014 small boat fishery survey (373 responses), through an abstraction process of open coding and creating higher order themes. We coded the open-ended comments using grounded theory (Charmaz 2006, Glaser and Strauss 1967), which develops theory from observations. Following Erlingsson and Brysiewicz (2017), we manually attached notes, headings, and descriptive labels to the content relevant to our conflict framework as they emerged from the data. As we began to see similar ideas repeated, we refined the labels into consistent codes, revisiting the data to ensure that the labels were applied consistently. The final codes represent specific concepts that describe conflict through management suggestions and assist in organizing the underlying meaning of the content.

The next step was to organize the codes into higher order themes by grouping codes that are related to each other in relation to the conflict framework discussed above. Themes related to the direct dispute or conflicts within the fishery were binned under the substance dimension. Themes under the relationship dimension involved the interaction of different actors within a fishery, and process themes related to the decision-making design, as well as perception of equity. Each increasing level of abstraction, from codes to themes to the dimensions of the overarching framework elucidate different concepts from the text and generate either broad or specific knowledge from the data set (Bengtsson 2016, Cavanagh 1997). The final coding scheme includes 20 codes and 7 themes related to the dimensions of conflict framework (Table 1).

The comments from one respondent could include more than one code. For example, the following comment would be coded as access issues, overfishing, and blaming netters, which fall under the substance and relationship dimensions within the conflict framework:

Why do we have deep bottom restricted areas if we have a quota. Personally, I think there should be no restricted areas but a quota. I think Opelu and Akule netting needs to be banned. The amount of fish we see while going night time to catch these two species have severely declined over the years. I have seen the netters scoop entire schools of these fish. Yes there will be about 10 people unemployed if you ban these netting but overall there will be thousands of people that will get to enjoy catching and eating these fish in the future. It is about sustainment and netting is depleting the species.

Multiple codes attached to the same comment are known as co-occurrence. Examining coding co-occurrence can serve as an indication that there may be implicit communication patterns in the text (Armborst 2017) and can provide insight into the relationships between the codes, themes, and broader dimensions.

The development of codes and the relationships between them was discussed with the research team throughout the coding process. All coding and analysis were conducted by the first author for internal consistency.

Building and testing the automated content analysis

To construct a custom dictionary, researchers need to identify the right words and phrases within the text of interest and assign them to respective codes. We have identified codes from the narrative text; here we develop a custom dictionary from the manual coding, apply the dictionary to the unclassified 2007 dataset to automate coding classification, and evaluate how well the MCA and dictionary-assisted ACA perform.

Dictionary construction

The first step to build the dictionary involves pre-processing the dataset to prepare the corpus for further analysis. We applied two pre-processing techniques: spelling correction and stop word removal. First, spelling mistakes were corrected across the entire dataset. Second, stop words with little semantic meaning (“a,” “the,” “and,” etc.) were placed into an exclusion list that instructs the computer to overlook them (Deng et al. 2017). A processed dataset allows researchers to examine the most frequent keywords and assign them to a particular code based on associated meanings.
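These two pre-processing steps can be sketched as follows. The correction map and exclusion list are tiny hypothetical examples; in the study, this processing was handled within the Provalis software.

```python
# Hypothetical typo-correction map and stop word exclusion list.
CORRECTIONS = {"fisherys": "fisheries"}
STOP_WORDS = {"a", "the", "and", "of", "to"}

def preprocess(response: str) -> list:
    """Lowercase, fix known misspellings, and drop stop words."""
    tokens = response.lower().split()
    tokens = [CORRECTIONS.get(t, t) for t in tokens]
    return [t for t in tokens if t not in STOP_WORDS]

print(preprocess("The fisherys of Hawaii and the netters"))
# → ['fisheries', 'hawaii', 'netters']
```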

Data dictionaries consist of three primary elements: “the entry (words and phrases), the categories, and the association between entries and categories” (Deng et al. 2017:952). The manual coding in the previous section yielded specific codes guided by the dimensions of conflict conceptual framework. In this step, we identified core words and phrases associated with those codes and binned them into the corresponding dictionary codes. Each entry into the dictionary acts as an indicator for the code category. A dictionary entry is known as a “keyword” in Provalis Research and can consist of words or phrases. The dictionary assumes that the meaning of a particular unit of text is dependent only on the occurrence of that specific keyword. For example, if the fishery survey revealed an “economic” category, the keyword entry list for the economic category may have included: “cost,” “expensive,” “price,” “make a living,” etc.

Although we found that keywords and phrases generally encapsulated the correct meaning of the category, automatic categorization using the dictionary can trigger Type I errors (false positives). Custom proximity rules can help reduce the number of Type I errors by increasing the overall precision of concepts. One code in our dataset, “overfishing,” was likely to trigger a false positive when used in the negative. If a fisher’s direct response is “fish are overfished,” the software will detect the word “overfished” and correctly place it into the predetermined “overfishing” category. However, if the response is “fish are not overfished,” the software will incorrectly categorize the response into the overfishing category. To account for this effect, we created a rule whereby keywords under the “overfishing” code were counted only if they were not preceded by a negation (no, not, never, etc.) within five words of the same sentence.
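The proximity rule can be expressed as in the sketch below. This is an illustrative re-implementation of the negation-window logic, not the WordStat rule syntax itself, and the negation list is a hypothetical sample.

```python
import re

# Hypothetical negation list; the study's rule used terms such as
# "no", "not", "never", etc.
NEGATIONS = {"no", "not", "never", "don't", "isn't", "aren't"}

def overfishing_hit(sentence: str, keyword: str = "overfish") -> bool:
    """True if the keyword occurs and no negation appears within the
    five words preceding it in the same sentence."""
    words = re.findall(r"[\w']+", sentence.lower())
    for i, w in enumerate(words):
        if w.startswith(keyword):
            window = words[max(0, i - 5):i]
            if not NEGATIONS & set(window):
                return True
    return False

print(overfishing_hit("Fish are overfished"))      # → True
print(overfishing_hit("Fish are not overfished"))  # → False
```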

Finally, to extend the data dictionary’s capabilities, we refined it with synonym and antonym extension. This step is relatively straightforward: a feature in WordStat 8.0.16 can suggest synonyms and antonyms of particular words for the data dictionary. We then assessed each recommendation to decide whether the entry should be added to or excluded from the final dictionary, ultimately increasing the dictionary’s applicability to other unclassified data sets.

Validation and accuracy assessment

Before directly applying the dictionary to an unclassified dataset, it should be validated. For the first round of validation, we used the keyword in context (KWIC) tool to assess the performance of the dictionary on the 2014 dataset. The KWIC tool displays each automatically classified keyword in its original context. The researcher can then assess whether each specific word was accurately classified into the appropriate code. The similarity between the automated and human coding results is the primary indicator of dictionary validity (Deng et al. 2019). For each automated classification, we compared the automatically applied code to our manual codes. We then calculated the percent of correct categorizations for each code. Unsurprisingly, the automated coding was accurate when compared with manual coding because the data dictionary was built from the manual codes that emerged in the 2014 dataset. However, this is a necessary step to ensure the dictionary will not trigger false positives because keywords may have a different meaning in a different context.

Ultimately, the purpose of our dictionary is to facilitate future analysis. Once we confirmed the dictionary would perform well on the dataset from which it was derived, we applied it to a similar NOAA small boat fishery survey from 2007 to evaluate how accurately it performed on a separate dataset. Because we did not conduct MCA on the 2007 dataset, we relied on the KWIC tool to display each automated code in relation to the entire response. Instead of comparing manual codes to determine the accuracy, we determined whether each automated classification fit the definition of the codebook from Table 1. We determined the accuracy of each code by calculating the proportion of times it was correctly classified over the total number of times it was detected in the dataset. To confidently apply the data dictionary to a different unclassified text, automated coding needs to successfully classify the entries into their correct category at least 80% of the time (Bengston and Xu 1995, Young and Soroka 2012, Deng et al. 2017).
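The per-code accuracy calculation and the 80% validity threshold are simple to express; the counts in the example are hypothetical, not figures from the study.

```python
def code_accuracy(correct: int, detected: int) -> float:
    """Proportion of automated classifications judged correct:
    correct detections / total detections for one code."""
    return correct / detected

# Hypothetical example: a code detected 50 times, 46 judged correct.
acc = code_accuracy(46, 50)
print(f"{acc:.0%}", "valid" if acc >= 0.80 else "below threshold")
# → 92% valid
```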

Analysis of conflict

Once both datasets were classified, the last step of our project was to use the results of our analysis to examine some fisheries management-relevant questions. We first investigated whether the proportion of codes changed over time and/or if codes differed by motivation within the same year. Each survey response was assigned a unique identifier number and categorized by motivation into commercial and non-commercial fishers. Even though all respondents held a State of Hawaiʻi Commercial Marine License, previous work has shown that this does not necessarily mean that they identify as commercial fishers (Leong et al. 2020). Because identity is one of the deeper levels of conflict, we classified fishers based on responses to a question that asked how they defined themselves as fishermen. Those who responded full-time commercial or part-time commercial were classified as commercial fishers. Those who responded recreational expense, purely recreational, subsistence, or cultural were classified as non-commercial fishers. We included recreational expense fishers in the non-commercial category because even though they sell fish to offset expenses such as gas, ice, and bait, their primary motivation is for recreation, not profit. Three respondents did not respond to this survey question and were not included in this analysis. Because the distribution of fishers’ motivations differed between the two data sets, we used a multinomial logit model to account for the variation in nested data.

The second question draws from the conflict framework theory, which suggests that deeper levels of conflict are often hidden in easier-to-articulate, surface-level disputes. Understanding how the dimensions of conflict are related to each other can offer insight to help accurately decipher intangible drivers of conflict from the tangible substance dimension. To examine the relationship between each dimension, we measured instances of co-occurrence between codes. We used the built-in code co-occurrence tool to examine the proximity of codes within survey responses, as related to the substance, relationship, and process dimensions. We used Sørensen’s similarity coefficient, a statistic that compares the similarity between different samples using the following formula:

QS = 2a / (2a + b + c)      (1)

where a represents cases where both items occur, and b and c represent cases where one item is present but the other one is absent, to determine how often and consistently two codes co-occur or overlap throughout the entire text sample. The coefficient can take values between 0 (indicating no coding overlap) and 1 (indicating perfect overlap).
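Equation 1 translates directly into code; the co-occurrence counts below are hypothetical values chosen for illustration.

```python
def sorensen(a: int, b: int, c: int) -> float:
    """Sørensen similarity coefficient (Equation 1).
    a = cases where both codes occur;
    b, c = cases where only one of the two codes is present."""
    return 2 * a / (2 * a + b + c)

# Hypothetical counts for two codes: co-occur 12 times, each appears
# alone 4 and 8 times, respectively.
print(sorensen(a=12, b=4, c=8))  # 24 / 36 ≈ 0.667
```

A value of 0 indicates the two codes never overlap, and 1 indicates they always occur together.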


The results demonstrate a “proof of concept” for the dictionary approach in the field of fishery science and are structured as follows: (1) we present the fully developed Hawaiʻi small boat fishery dictionary and assess the accuracy of the applied dictionary on an unclassified survey, (2) we illustrate the specific content analysis results for the two datasets, and (3) we answer fisheries management-related questions.

Data dictionary

We identified 274 keywords in the fully developed data dictionary (Table 2): 146 keywords were in the substance category, 73 in relationships, and 55 in process. Table 2 shows the assignment of keywords to each pre-specified coding category. Words containing the * symbol match any sequence of letters following the root word. For example, “enforc*” will match “enforce,” “enforces,” “enforced,” “enforcing,” and “enforcement.” The # symbol stands for a single numerical digit, and two ## symbols represent any two-digit combination. For example, “under ##” encompasses phrases such as “under 10” [pounds], “under 15” [pounds], “under 20” [pounds], etc. The only rule embedded in the dictionary accounted for negations in survey responses, specifically for words under the “overfishing” code. The rule improved accuracy on the 2007 data set from 79% without the rule to 96%, and from 84% to 92% on the 2014 data set.
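For illustration, the two wildcard conventions can be approximated with regular expressions. This sketch is only an analogous implementation, not the WordStat matching engine.

```python
import re

def pattern_to_regex(entry: str) -> re.Pattern:
    """Translate the dictionary wildcards into a regex:
    '*' extends a root word, '#' matches one digit."""
    escaped = re.escape(entry)
    escaped = escaped.replace(r"\*", r"\w*").replace(r"\#", r"\d")
    return re.compile(r"\b" + escaped + r"\b", re.IGNORECASE)

# "enforc*" matches any continuation of the root word.
print(bool(pattern_to_regex("enforc*").search("More enforcement is needed")))   # → True
# "under ##" matches any two-digit number after "under".
print(bool(pattern_to_regex("under ##").search("no ahi under 15 pounds")))      # → True
```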

Assessing the accuracy of ACA

Table 3 shows the accuracy of the automated technique at each abstraction level. The total average accuracy was 92% for the 2014 dataset and 89% for the 2007 dataset. Breaking this down, the substance, relationship, and process dimensions had accuracies of 92%, 93%, and 88%, respectively, in 2014, and 92%, 88%, and 83% in 2007. The higher performance for 2014 is explained by the fact that the dictionary was developed from that dataset. Narrowing down the abstraction level, we found certain categories were more accurate than others. The “lack of traditional knowledge,” “distrust,” and “Hawaiʻi small boat fishery is not the problem” categories were accurate only 76%, 78%, and 79% of the time, respectively, below our bar for validity. All other categories in the 2007 and 2014 datasets met the criteria for successful validation.

Thematic analysis

All 373 responses to the 2014 survey were manually coded into seven prominent themes under the substance, relationship, and process dimensions. Under substance were policy and regulatory issues, financial hardships, and resource sustainability. Relationships yielded external conflicts and internal conflicts. Process involved ineffective decision-making design and locals feeling marginalized (Table 3). These themes also laid the foundation for the automated content analysis of 281 responses from the 2007 survey. Themes emerging from the automated analysis remained the same between years, but the frequency of cited themes varied across the survey years.


The substance dimension included three themes: policy and regulatory issues, financial hardships, and resource sustainability. In 2014, the most common code under the theme of policy and regulatory issues was the lack of maintenance toward infrastructure (n = 95; 25% of total statements; Table 4):

The infrastructure for trailer boating is minimal at best. Other states have better boating facilities than our island state. The use of fish aggregate devices (FADs) is critical to providing a destination for fishermen. Suddenly many of these devices are not being redeployed. This causes financial hardships by forcing the fishermen to travel further and burn more fuel to find fish.

However, in 2007 the most common statements were access issues regarding closed fishing areas (n = 72; 26%), e.g., “Bottom Fish closures [are a] bad idea, really unfair, that’s [our] way of life.”

The second most common theme in the substance category in 2014 was to increase the size and weight limits of various fish (n = 73; 20%). Statements under this theme expressed concern that small juvenile fish were being caught. Many felt this was not environmentally sustainable and was promoting overfishing, e.g., “Hawaiʻi needs to get a size limit for all ahi that is sold. No smaller than reproductive size. Far too many baby fish being killed for profit. What about tomorrow?” However, there was a considerable decrease in the policy suggestion to increase the size and weight limit for specific fish in the 2007 data set, with an n = 20 accounting for only 7% of the total responses.

Financial hardships emerged as an economic substance issue where fishers were concerned about their livelihood (2014: n = 64, 17%; 2007: n = 44, 16%):

Nets can wipe out entire schools of fish and leave nothing behind. People say that we would not be able to keep up with the demand for fish and it would drive prices higher. As far as I can remember, we’re still getting the same prices from back in the 80s and expenses have gone up by 400%.

Fishers expressed the gravity of overfishing generally and the need to protect marine resources for the future, which led to the theme of resource sustainability (2014: n = 61, 16%; 2007: n = 37, 13%), e.g., “If we want to have fish for future generations then maybe there should be more restrictions on longline fishing.”


The relationship dimension was divided into two themes of external and internal conflicts. External conflicts were defined as a dispute between fishers and management, where fishers expressed distrust toward management. Internal conflicts were defined as a dispute among fishers based on gear type. The distrust code was the most prominent code in external conflicts, generating social tension between fishers and management (2014: n = 34, 9%; 2007: n = 51, 18%), e.g., “Shame on the scientists who should be held accountable for KNOWINGLY skewing the data and having government pay for studies that never required any,” and,

Federal regulations should be updated. Stop fooling the public and show the correct numbers. If they keep this up, who knows what else they might protect and start banning other species. That is the problem when the administration who don’t go in the ocean and make all the rules.

Under the internal conflicts theme, the most prominent statements were by fishers blaming longliners (2014: n = 37, 10%; 2007: n = 64, 23%) and fishers blaming netters (2014: n = 39, 10%; 2007: n = 37, 13%). Statements under these codes blamed longliners and netters for overfishing, for increased regulatory actions, and for the subsequent effects on their livelihoods:

Longliners are limiting the number of fish making it to the islands. Maybe the longliners can be capped on the number of pounds being brought in on a daily basis. When I call the automated fish auction recording and hear numbers like 38,000 pounds being hauled in to the auction by one boat, I have to believe that that has to have an effect on the amount of fish making it to the islands. If it’s not the longliners, then some other regulation needs to be in effect to ensure that recreational fishermen have a brighter future.
Stop net fishing in Kaneohe bay! Netters are depleting inshore fisheries by leaving their nets all night long and picking it up in the morning.


The process dimension showed the theme of ineffective decision-making design (2014: n = 43, 11%; 2007: n = 32, 11%) at both the state and federal level, citing equity issues, fishers lacking a voice, and the need for more research before implementing policy. Another main theme under the process dimension was locals feeling marginalized (2014: n = 27, 7%; 2007: n = 39, 14%) because of the displacement of locals in fishing and the lack of traditional knowledge in management, e.g., “More public meetings and interaction so that the fisheries department can get input directly from the fishermen.”

Analysis of conflict

Although all respondents held a State of Hawaiʻi Commercial Marine License, their motivation toward fishing activities varied. In 2014, 211 (57%) self-identified as commercial fishers and 158 (43%) as non-commercial fishers (purely recreational, recreational expense, subsistence, and cultural). In 2007, 118 (42%) fishers identified as commercial and 163 (58%) as non-commercial. Given the variation in motivation and the unbalanced samples across survey years, we first tested for the significance of interactions using a multinomial logit model. Finding none, we proceeded with binomial logit models to determine whether any of the 27 total classifications differed by motivation (commercial vs. non-commercial) and across years. Of the 27 classifications evaluated (see Appendix 1), only four differed significantly by motivation in 2007, and four in 2014 (Table 4). When comparing all stakeholders in 2007 vs. 2014, 14 codes were statistically significant, spanning all dimensions (Table 4).
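For a single code and a single binary predictor such as motivation, the binomial logit test reduces to asking whether the code's hit rate differs between two groups. A minimal sketch of the equivalent likelihood-ratio test follows; the counts are hypothetical, not the actual survey tallies:

```python
import math

def lr_test_2x2(hits_a, n_a, hits_b, n_b):
    """Likelihood-ratio test of whether a code's probability of appearing
    differs between two groups (e.g., commercial vs. non-commercial).
    Equivalent to testing a binomial logit with one binary predictor
    against the intercept-only model (1 degree of freedom)."""
    def loglik(hits, n, p):
        # Binomial log-likelihood, using the 0*log(0) = 0 convention.
        if p <= 0.0 or p >= 1.0:
            ok = (p <= 0.0 and hits == 0) or (p >= 1.0 and hits == n)
            return 0.0 if ok else float("-inf")
        return hits * math.log(p) + (n - hits) * math.log(1.0 - p)

    p_pooled = (hits_a + hits_b) / (n_a + n_b)   # null: one shared rate
    p_a, p_b = hits_a / n_a, hits_b / n_b        # alternative: group rates
    g2 = 2.0 * (loglik(hits_a, n_a, p_a) + loglik(hits_b, n_b, p_b)
                - loglik(hits_a, n_a, p_pooled) - loglik(hits_b, n_b, p_pooled))
    # p-value from the chi-square distribution with 1 df: erfc(sqrt(G2/2))
    p_value = math.erfc(math.sqrt(max(g2, 0.0) / 2.0))
    return g2, p_value

# Hypothetical counts: a code appearing in 30 of 100 commercial
# responses vs. 10 of 100 non-commercial responses.
g2, p = lr_test_2x2(30, 100, 10, 100)
```

Repeating such a test per code across the 27 classifications, with a multiple-comparison correction if desired, reproduces the spirit of the per-code comparisons reported in Table 4.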

Coding co-occurrence coefficients were calculated to determine how often each dimension co-occurred with every other dimension across the entire text sample, and to explore the degree to which these associations changed from 2007 to 2014. All codes were grouped by dimension, then the co-occurrence of dimensions was calculated. Coefficients varied from a low of 0.228 between process and substance in 2014 to a high of 0.510 between relationship and substance in 2007 (Table 5). A higher coefficient indicates a higher degree of overlap between the two dimensions, i.e., in 2007, 51% of the time an individual brought up a substance issue, they also brought up a relationship issue. The relationship-substance coefficient was higher than either coefficient involving process in both years, but lower in 2014 than in 2007. The process-substance and process-relationship coefficients were lowest in 2014.
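The calculation can be sketched as follows. The Jaccard form shown (responses mentioning both dimensions over responses mentioning either) is an assumption, as analysis software typically offers several similarity indices, and the coded responses are toy data:

```python
def cooccurrence(responses, dim_a, dim_b):
    """Co-occurrence coefficient between two conflict dimensions.
    Each response is represented as the set of dimensions it was
    coded with. Jaccard form: |A and B| / |A or B|."""
    both = sum(1 for dims in responses if dim_a in dims and dim_b in dims)
    either = sum(1 for dims in responses if dim_a in dims or dim_b in dims)
    return both / either if either else 0.0

# Toy coded responses (hypothetical, not survey data)
coded = [
    {"substance", "relationship"},
    {"substance"},
    {"relationship", "process"},
    {"substance", "relationship"},
]
rel_sub = cooccurrence(coded, "substance", "relationship")  # 2/4 = 0.5
```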


Manual text analysis, which is typically used in natural resource contexts, is time-intensive and ill-suited to larger datasets. Automated content analysis offers an alternative, leveraging computing power to analyze large datasets quickly and efficiently. Unfortunately, datasets from many natural resource management contexts are not large enough to benefit from the efficiencies of most ACA techniques. In this study, we demonstrated the benefits of a dictionary-based ACA as a complement to MCA for natural resource issues. The dictionary approach is meant to enrich, not replace, the work of human coders, enabling researchers to tackle a larger body of data while remaining theory driven. In the case under investigation here, the method preserved the strengths of MCA while maximizing the efficiencies of ACA, ultimately yielding insights into the nature of conflict in the Hawaiʻi small boat fishery.

The total average accuracy of the dictionary approach was 92% for the 2014 dataset and 89% for the 2007 dataset, suggesting that this method works well for automating subsequent surveys. Various aspects of the sector factored into the utility of the dictionary approach. First, the conflict domain within a fishery setting is predictable, consists of repeated key technical terms, and its topics remain relatively stable over time, all of which facilitate the efficacy of a dictionary approach. Typically, with word count methods, small entries within a dictionary generate too few hits to yield a meaningful outcome. Here, however, a relatively small set of key fishery-conflict terms was enough to move beyond literal word counts to a conceptual framework more relevant to management. Second, the manner in which the dictionary is constructed is important to increasing its insightfulness (Deng et al. 2019). In our case, we used themes and codes from an initial round of MCA, inspired by a conceptual framework, to build a dictionary for the Hawaiʻi small boat fishery. This is an inversion of the typical dictionary process, which we argue delivers more deductive power compared with other forms of ACA (Boumans and Trilling 2016). Typically, a dictionary approach first analyzes the frequency of words; then, based on the definitions of the most frequent words, the researcher decides on a category structure for each keyword contained in the dictionary (Deng et al. 2017). Instead, we first manually classified codes to deductively gain specificity from the context of the text, thereby ensuring that the words chosen for the dictionary maintained sensitivity to contextual nuance while still preserving the strengths of ACA. This process not only validated the dictionary for automating future Hawaiʻi small boat surveys but also provided salient insights into the nature of conflict within the fishery.
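To illustrate the mechanics, a keyword-matching classifier of the kind described might look like the following. The dictionary entries shown are hypothetical stand-ins for the full fishery-conflict dictionary (Appendix 1):

```python
import re

# Illustrative fragment only; the actual dictionary contains many more
# codes, keywords, and phrases derived from the initial MCA.
FISHERY_DICTIONARY = {
    "distrust": ["skewing the data", "fooling the public"],
    "blame_longliners": ["longline", "longliners"],
    "financial_hardship": ["expenses", "livelihood"],
}

def classify(response, dictionary=FISHERY_DICTIONARY):
    """Return the set of codes whose keywords/phrases occur in the
    response (case-insensitive, whole-word/phrase matching)."""
    text = response.lower()
    codes = set()
    for code, terms in dictionary.items():
        for term in terms:
            if re.search(r"\b" + re.escape(term) + r"\b", text):
                codes.add(code)
                break  # one hit is enough to assign the code
    return codes
```

Because the keyword lists are grounded in manually derived codes rather than raw word frequencies, each match carries the contextual meaning established during the MCA round.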

Our results revealed that the substantive issues fishers identified, i.e., statements directly related to the tangible issue or conflict itself, largely concerned policy and regulatory disputes. These results were expected given the survey question’s specific aim of gathering suggestions for management. Perhaps most interesting, given the intent and wording of the survey question, is that in 2014 we see a statistically significant decrease in relationship conflicts (48% to 36% of total responses) and a dramatic increase in substance conflicts (53% in 2007 to 74% in 2014). At first glance, it may appear that the roots of conflict in the Hawaiʻi small boat fishery are largely substantive. However, when we explore the nature of the conflict using a coding co-occurrence matrix (Table 5), we find that much of the conflict is not solely substantive but representative of frayed relationships (underlying and/or identity-based) that make people feel less trusting and more competitive toward other stakeholders. As noted by others, conflicts over substance are often outward expressions of unresolved deeper levels of conflict (Madden and McQuinn 2014, Crowley et al. 2017). The coding co-occurrence matrix gave a sense of which dimensions co-occurred: the highest association was between relationship and substance, with coefficients of 0.510 for 2007 and 0.421 for 2014 (Table 5). These findings suggest that substance issues arising in the Hawaiʻi small boat fishery may indeed be deep-seated in underlying and identity-based conflicts, where substance issues surface as symbolic manifestations of unmet social needs.

Treating relationship conflicts as surface-level substance problems may yield short-term biological gains, but the conflicts persist and resurface over time, leading to continuous patterns of debate (Christie 2004, Pollnac et al. 2010). For example, heeding fishers’ suggestions on regulating specific gear (substance) without addressing perceptions of distrust and disrespect within the relationship dimension may prompt compensatory behavior that deepens aggression, deteriorates trust, and further cements conflicts at the underlying and identity levels (Reed 2008, Madden and McQuinn 2014).

However, relationships are not only a cause of conflict but also a solution. Whether improving communication, building rapport, or building trust, strengthening relationships both externally and internally within the Hawaiʻi small boat fishery is essential to making issues more tractable. The value of employing the conflict framework in tandem with an automated approach is that it represents the drivers and treatments of conflict more completely, quickly, and reliably. By accurately differentiating genuine substance conflicts from conflicts in which substantive issues are symbolic manifestations of deeper relationship conflict, we can improve the chances of long-term success. Identifying and addressing the hidden roots of social conflict shifts the focus away from deeper levels of conflict and back to the dispute level, where there is a greater ability to identify areas of agreement that can serve as a social base for negotiating disagreements.

Fewer codes representing the process dimension were identified, and co-occurrences between process and the other conflict dimensions were low. However, this does not mean that process is unimportant, only that fishery participants are less likely to include it as a management suggestion or topic needing further study. Effective decision-making processes can improve the acceptability of management decisions as well as strengthen relationships and build trust (Madden and McQuinn 2014). Attention to process design may be an additional means to improve the frayed relationships that were linked more clearly to substantive disputes.

It is important to note that the dictionary method presented here is not without limitations. First, the dictionary-based ACA still requires several initial subjective steps, and any bias introduced in the first round of manual coding will carry over into subsequent automated analysis. MCA still needs to be performed to develop a predetermined list of categories, as well as word lists to assign text to the respective categories. In our case, all coding was conducted by one researcher. Although multiple coders can improve confidence, we did not have two people code the entire dataset, relying instead on discussion among the research team for efficiency’s sake. Multiple coders can help ensure consistency of categorization by measuring the degree of agreement among coders, but employing a second coder is resource intensive (MacPhail et al. 2016). The use of multiple coders requires tests of inter-coder reliability to evaluate the extent to which coders make similar coding decisions (Lombard et al. 2010), and for smaller datasets, the amount of data needed to train coders to reliability could exhaust the dataset.

Another limitation is that the high accuracy of the fishery dictionary only accounts for Type I errors (false positives) and not Type II errors (false negatives). Responses that do not contain the established keywords/phrases will not be detected by the software, so recall may suffer. Type I errors are limited by the iterative application of the 80% validity rule, which produces reliable results (Bengston and Xu 1995). Type II errors, or responses that are falsely omitted, are more difficult to control. If a respondent illustrates a concept that is not matched by a keyword/phrase within the dictionary, the response will be overlooked in the analytical process. WordStat 8.0.16 contains tools to manage Type II errors, but they are rudimentary. One tool enables the researcher to compute statistics on the dictionary’s coverage of a dataset, such as the percentage of sentences, paragraphs, and cases that were automatically coded by the dictionary. A decrease in dictionary coverage may indicate that new themes have arisen, that vernacular has changed, or that it is time to adapt the dictionary to the new data.
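Coverage monitoring of the kind described can be approximated with a simple check. This sketch uses naive substring matching and a hypothetical two-code dictionary, not the software's actual implementation:

```python
def coverage_report(responses, dictionary):
    """Fraction of responses matched by at least one dictionary term,
    plus the unmatched responses. Falling coverage flags potential
    Type II errors: new themes, shifting vernacular, or a dictionary
    due for revision. Matching here is crude substring search."""
    def matched(text):
        lowered = text.lower()
        return any(term in lowered
                   for terms in dictionary.values()
                   for term in terms)
    unmatched = [r for r in responses if not matched(r)]
    coverage = 1.0 - len(unmatched) / len(responses)
    return coverage, unmatched
```

Responses returned as unmatched are natural candidates for manual review, and any recurring concepts found among them can seed new dictionary entries before the next survey wave.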


NOAA’s 2007 and 2014 Hawaiʻi small boat fishery surveys provided useful datasets to build and test the dictionary-based ACA method in a fishery-conflict context. We found that the dictionary approach performed well, with accuracy for most terms rivaling that of MCA. The size and nature of the datasets were key to the success of the dictionary approach, as was the method for constructing the dictionary. The datasets were small enough to manually code and develop concepts, yet large enough to gather repeated keywords and phrases to build a dictionary for ACA. The data described predictable, common, and consistent resource issues. Informing the dictionary with MCA, rather than automatically generating one, ensured the dictionary’s fidelity to the meanings of the words and relevance to the conflict context. Although the initial upfront time investment for developing the dictionary was substantial, it can now be used to efficiently analyze future results of the Hawaiʻi small boat fishery surveys. This research demonstrated how a dictionary-based approach can be applied within the field of fishery and natural resource conservation science, offering a valuable addition to the methodological toolbox.




We thank all participants surveyed and the National Oceanic Atmospheric Administration for data access and logistical support. We thank Dr. Thomas Oliver for statistical assistance and the Pacific Island Fisheries Science Center (PI: Dr. Minling Pan) for financial support (grant no. NA11NMF4320128).


The data/code that support the findings of this study are openly available in the NMFS Enterprise Data Management Program, reference number [62698].


Armborst, A. 2017. Thematic proximity in content analysis. SAGE Open 7(2).

Asghar, M. Z., A. Khan, S. Ahmad, and B. Ahmad. 2013. Subjectivity lexicon construction for mining drug reviews. Science International 26:145-149.

Bengston, D. N., and Z. Xu. 1995. Changing national forest values: a content analysis. Research Paper NC-323. U.S. Forest Service, North Central Forest Experiment Station, St. Paul, Minnesota, USA.

Bengtsson, M. 2016. How to plan and perform a qualitative study using content analysis. NursingPlus Open 2:8-14.

Boumans, J. W., and D. Trilling. 2016. Taking stock of the toolkit: an overview of relevant automated content analysis approaches and techniques for digital journalism scholars. Digital Journalism 4:8-23.

Cavanagh, S. 1997. Content analysis: concepts, methods and applications. Nurse Researcher 4:5-16.

Chan, H. L., and M. Pan. 2017. Economic and social characteristics of the Hawaii small boat fishery 2014. U.S. Department of Commerce, National Oceanic and Atmospheric Administration, National Marine Fisheries Service, Pacific Islands Fisheries Science Center, Honolulu, Hawaii, USA.

Charmaz, K. 2006. Constructing grounded theory: a practical guide through qualitative analysis. SAGE, Los Angeles, California, USA.

Christie, P. 2004. Marine protected areas as biological successes and social failures in Southeast Asia. American Fisheries Society Symposium 42:155-164.

Connolly-Ahern, C., L. A. Ahern, and D. S. Bortree. 2009. The effectiveness of stratified constructed week sampling for content analysis of electronic news source archives: AP Newswire, Business Wire, and PR Newswire. Journalism & Mass Communication Quarterly 86:862-883.

Crowley, S. L., S. Hinchliffe, and R. A. McDonald. 2017. Conflict in invasive species management. Frontiers in Ecology and the Environment 15:133-141.

Deng, Q., M. Hine, S. Ji, and S. Sur. 2017. Building an environmental sustainability dictionary for the IT industry. Proceedings of the 50th Hawaii International Conference on System Sciences 950-959.

Deng, Q., M. J. Hine, S. Ji, and S. Sur. 2019. Inside the black box of dictionary building for text analytics: a design science approach. Journal of International Technology and Information Management 27:119-159.

Elo, S., and H. Kyngäs. 2008. The qualitative content analysis process. Journal of Advanced Nursing 62:107-115.

Erlingsson, C., and P. Brysiewicz. 2017. A hands-on guide to doing content analysis. African Journal of Emergency Medicine 7:93-99.

Fulton, E. A., A. D. M. Smith, D. C. Smith, and I. E. van Putten. 2010. Human behaviour: the key source of uncertainty in fisheries management. Fish and Fisheries 12:2-17.

Glaser, B. G., and A. L. Strauss. 1967. The discovery of grounded theory: strategies for qualitative research. Aldine, Chicago, Illinois, USA.

Green, H. 2014. Use of theoretical and conceptual frameworks in qualitative research. Nurse Researcher 21:34-38.

Grimmer, J., and B. M. Stewart. 2013. Text as data: the promise and pitfalls of automatic content analysis methods for political texts. Political Analysis 21:267-297.

Hospital, J., S. S. Bruce, and M. Pan. 2011. Economic and social characteristics of the Hawaii small boat pelagic fishery. Pacific Islands Fisheries Science Center, National Marine Fisheries Service, NOAA, Honolulu, Hawaii, USA.

Kamhawi, R., and D. Weaver. 2003. Mass communication research trends from 1980 to 1999. Journalism & Mass Communication Quarterly 80:7-27.

Leong, K. M., A. Torres, and S. Wise. 2020. Beyond recreation: when fishing motivations are more than sport or pleasure. National Oceanic and Atmospheric Administration, Washington, D.C., USA.

Lombard, M., J. Snyder-Duch, and C. C. Bracken. 2010. Practical resources for assessing and reporting intercoder reliability in content analysis research projects. [online] URL:

Luke, D. A., C. A. Caburnay, and E. L. Cohen. 2011. How much is enough? New recommendations for using constructed week sampling in newspaper content analysis of health stories. Communication Methods and Measures 5:76-91.

MacPhail, C., N. Khoza, L. Abler, and M. Ranganathan. 2016. Process guidelines for establishing intercoder reliability in qualitative studies. Qualitative Research 16:198-212.

Madden, F., and B. McQuinn. 2014. Conservation’s blind spot: the case for conflict transformation in wildlife conservation. Biological Conservation 178:97-106.

Maier, D., A. Waldherr, P. Miltner, G. Wiedemann, A. Niekler, A. Keinert, B. Pfetsch, G. Heyer, U. Reber, T. Häussler, H. Schmid-Petri, and S. Adam. 2018. Applying LDA topic modeling in communication research: toward a valid and reliable methodology. Communication Methods and Measures 12:93-118.

Murshed-e-Jahan, K., B. Belton, and K. K. Viswanathan. 2014. Communication strategies for managing coastal fisheries conflicts in Bangladesh. Ocean & Coastal Management 92:65-73.

Pacific Islands Fisheries Science Center (PIFSC). 2021. Automated content analysis of the Hawaiʻi small boat fishery. Pacific Islands Fisheries Science Center, Honolulu, Hawaii, USA. [online] URL:

Pollnac, R., P. Christie, J. E. Cinner, T. Dalton, T. M. Daw, G. E. Forrester, N. A. Graham, and T. R. McClanahan. 2010. Marine reserves as linked social-ecological systems. Proceedings of the National Academy of Sciences 107:18262-18265.

Pomeroy, R., J. Parks, R. Pollnac, T. Campson, E. Genio, C. Marlessy, E. Holle, M. Pido, A. Nissapa, S. Boromthanarat, and N. T. Hue. 2007. Fish wars: conflict and collaboration in fisheries management in Southeast Asia. Marine Policy 31:645-656.

Reed, M. S. 2008. Stakeholder participation for environmental management: a literature review. Biological Conservation 141:2417-2431.

Robson, M., and T. Davis. 2014. Evaluating the transition to sustainable forest management in Ontario’s Crown Forest Sustainability Act and forest management planning manuals from 1994 to 2009. Canadian Journal of Forest Research 45:436-443.

Spijkers, J., T. H. Morrison, R. Blasiak, G. S. Cumming, M. Osborne, J. Watson, and H. Österblom. 2018. Marine fisheries and future ocean conflict. Fish and Fisheries 19:798-806.

Stepchenkova, S., A. P. Kirilenko, and A. M. Morrison. 2009. Facilitating content analysis in tourism research. Journal of Travel Research 47:454-469.

Trilling, D., and J. G. Jonkman. 2018. Scaling up content analysis. Communication Methods and Measures 12:158-174.

Young, L., and S. Soroka. 2012. Affective news: the automated coding of sentiment in political texts. Political Communication 29:205-231.

Corresponding author:
Aviv Suan