One of the main issues to emerge from prior research is that the debate around climate change solutions struggles with one particular obstacle: the lack of a shared language.

Moreover, the technical nature of climate science communication and the generic alarmism of mainstream media make it difficult for the issue to take hold in public opinion. This dehumanization of the issue, together with the technical language used to share information about it, contributes to the general public's fatigue in dealing with the problem. But there seem to be other ways for people to discuss climate change and take personal action.

Our team looked at the online communication of young climate movements and how they drive public engagement on environmental topics. We explored the language of these new movements (Extinction Rebellion, Fridays for Future, Sunrise Movement, Zero Hour) in relation to climate emergencies using digital research methods (tools that allow data from social media platforms to be repurposed for investigating social and political issues). We also broke down the structure of their communication by analyzing the vocabulary they used.

To do this, we used a text processing technique known as Latent Dirichlet Allocation (LDA). LDA is a topic modelling method: through various mathematical procedures, it tries to identify a set number of topics that best describes the corpus, or collection of documents (Ganegedara, 2018). Through LDA, we were able to outline 10 main topics that encompass the textual communication of climate movements. Using word frequency and relevance measures for the vocabulary (word frequency is the number of times a term appears in the corpus, and relevance is how much it contributes to a given topic), we identified named entities (terms that represent real-world objects such as people, places, and organizations) and, last but not least, the topics of communication.
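To make the idea of "identifying a set number of topics" more concrete, here is a minimal sketch of LDA fitted with a toy collapsed Gibbs sampler in plain Python. This is an illustration only, not necessarily the inference procedure or library used in our analysis; the function name `fit_lda`, the hyperparameters `alpha` and `beta`, and the example captions are all assumptions made up for this sketch.

```python
import random
from collections import Counter


def fit_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA (illustrative, hypothetical helper).

    docs is a list of tokenized documents (lists of words).
    Returns (topic_word, doc_topic) count tables.
    """
    rng = random.Random(seed)
    vocab_size = len({word for doc in docs for word in doc})

    doc_topic = [[0] * num_topics for _ in docs]         # tokens per topic, per document
    topic_word = [Counter() for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics                       # total tokens per topic
    assignments = []                                     # topic of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        doc_assignments = []
        for word in doc:
            z = rng.randrange(num_topics)
            doc_assignments.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word] += 1
            topic_total[z] += 1
        assignments.append(doc_assignments)

    # Repeatedly resample each token's topic given all other assignments.
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][word] -= 1
                topic_total[z] -= 1

                # Weight of topic k: how much the document uses k,
                # times how much k uses this word.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][word] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]

                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][word] += 1
                topic_total[z] += 1

    return topic_word, doc_topic


# Invented toy captions, already tokenized and cleaned.
captions = [
    ["climate", "strike", "school", "friday"],
    ["school", "strike", "students", "friday"],
    ["rebellion", "london", "protest", "bridge"],
    ["protest", "london", "rebellion", "arrest"],
    ["green", "new", "deal", "congress"],
    ["congress", "vote", "green", "deal"],
]

topic_word, doc_topic = fit_lda(captions, num_topics=3)
for k, counts in enumerate(topic_word):
    print(k, [word for word, _ in counts.most_common(4)])
```

With a real corpus, one would more likely rely on an established library than on a hand-written sampler; the sketch is only meant to show that the number of topics is chosen up front and that each topic ends up as a ranked list of words.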

The textual data we collected for this task consisted of 9.5 thousand captions from the official accounts of young climate movements on Instagram and Twitter, as well as from individual accounts that used the following hashtags: #ExtinctionRebellion, #FridaysforFuture, #SunriseMovement, #ThisisZeroHour.

1. http://vis.stanford.edu/files/2012-Termite-AVI.pdf
2. http://nlp.stanford.edu/events/illvi2014/papers/sievert-illvi2014.pdf

Description of the plot

LDAvis has two core features. First, it allows one to select a topic to reveal the most relevant terms for that topic; the areas of the circles are proportional to the relative prevalence of each topic in the corpus. Second, it allows one to select a term (by hovering over it) to reveal its distribution over topics.

The widths of the gray bars represent the corpus-wide frequency of each term, and the widths of the red bars represent the topic-specific frequency of each term. By comparing the widths of the red and gray bars for a given term, users can quickly understand whether a term is highly relevant to the selected topic because of its lift (a high ratio of red to gray) or its probability (absolute width of red). A slider allows users to change the value of λ, which alters the ranking of terms to aid topic interpretation. If λ = 1, terms are ranked purely by their probability within the topic, so the red bars are sorted from widest (at the top) to narrowest (at the bottom).
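The effect of the slider can be sketched in a few lines of Python. The relevance score used here follows the LDAvis definition, relevance = λ · log p(term | topic) + (1 − λ) · log(p(term | topic) / p(term)), where the second term is the log of the lift. The function name `rank_terms` and all counts below are invented for illustration and are not taken from our dataset.

```python
import math


def rank_terms(topic_counts, corpus_counts, corpus_total, lam):
    """Rank a topic's terms by relevance for a given lambda (hypothetical helper)."""
    topic_total = sum(topic_counts.values())
    scores = {}
    for term, count in topic_counts.items():
        p_term_given_topic = count / topic_total      # width of the red bar, normalized
        p_term = corpus_counts[term] / corpus_total   # width of the gray bar, normalized
        lift = p_term_given_topic / p_term
        scores[term] = lam * math.log(p_term_given_topic) + (1 - lam) * math.log(lift)
    return sorted(scores, key=scores.get, reverse=True)


# Invented counts: "climate" is frequent everywhere, "rebellion" is rare
# in the corpus but concentrated in this topic.
topic_counts = {"climate": 50, "strike": 30, "rebellion": 20}
corpus_counts = {"climate": 500, "strike": 60, "rebellion": 25}
corpus_total = 1000

print(rank_terms(topic_counts, corpus_counts, corpus_total, lam=1.0))
print(rank_terms(topic_counts, corpus_counts, corpus_total, lam=0.0))
```

With λ = 1 the ranking follows the within-topic probability, so the common word "climate" comes first; with λ = 0 it follows lift alone, so the topic-specific word "rebellion" rises to the top. Intermediate values blend the two views.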

Conclusion

All in all, the LDA procedure helped us outline some of the key topics of discussion that appear in the communication of climate movements on social media. In a wider sense, LDA is an interesting way to use machine learning to identify key vocabulary and issues that come up in human discussion. One limitation of LDA is that certain topics may be hard to interpret; words that seem unrelated are often grouped together. Furthermore, the size and quality of the dataset, as well as the way the text is cleaned (emojis, hashtags, and any other unfamiliar symbols should be removed), have an impact on the topics that are created and on how well the model captures them. To learn more about LDA and how to implement it with your own data (using Python), check the following resource: https://towardsdatascience.com/topic-modeling-and-latent-dirichlet-allocation-in-python-9bf156893c24.
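As a rough illustration of the cleaning step mentioned above, the following Python sketch strips links, hashtags, mentions, emojis and other non-letter symbols from a caption before tokenizing it. The function name `clean_caption`, the specific rules, and the sample caption are assumptions for this example, not the exact pipeline we used.

```python
import re


def clean_caption(text):
    """Lowercase a caption and keep only plain words (hypothetical helper)."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop links
    text = re.sub(r"[#@]\w+", " ", text)       # drop hashtags and mentions
    text = re.sub(r"[^a-z\s]", " ", text)      # drop emojis, digits, punctuation
    return text.split()


print(clean_caption("Join the #ClimateStrike 🌍 now! https://example.org @fff"))
# prints ['join', 'the', 'now']
```

Whether hashtags should be dropped entirely or kept as words without the "#" sign is a judgment call that depends on the research question; this sketch simply removes them.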