Group Photo of All Participants (On-site + Online)
The Text Mining and Digital Transformation Services Industry-Academia Alliance held its third workshop of the year to teach alliance members how to use a text analysis workflow platform for public opinion analysis. The event supported both in-person and online participation, so partners who would otherwise have had to travel long distances could attend synchronously. Remote teaching allowed all participants to follow the course without interruption.
The workshop opened with a video address by Professor Huang Sanyi, who thanked the industry professionals, academic faculty, and students for their enthusiastic participation. Recognizing that some attendees were newcomers, Professor Huang assured them that the course would start from the basics and guide everyone through the process step by step, concluding with a hands-on exercise in public opinion analysis.
Professor Huang then explained that, in addition to industry partners, academic faculty members had also been invited to the workshop. The main reason was that text mining techniques can also support text analysis and research on social media comments, so the alliance hopes that more scholars will gain access to the platform and use it for their own academic text analysis needs. To this end, the alliance plans to recruit academic members next year to promote and apply its technologies more widely.
Professor Huang also shared the design philosophy behind the text analysis workflow platform. Taking into account the limitations of existing text analysis systems on the market, the goal was to build a system that is user-friendly and easy for the general public to operate. The platform lowers the barrier to entry by eliminating the need for programming skills: users simply drag and drop components and connect them following an intuitive workflow concept, saving time on coding while still producing effective analyses.
A key feature of the platform is that it records the analysis workflow. A recorded workflow can be reused for later analyses simply by modifying certain parameter settings; users then execute the modified workflow and obtain results immediately, which streamlines the analysis process.
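The record-and-replay idea can be illustrated with a minimal Python sketch. It assumes a recorded workflow is simply an ordered list of steps, each pairing a component function with its parameter settings; the names Workflow, Step, with_params, lowercase, and keep_containing are hypothetical and do not describe the platform's actual internals.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Callable


# Hypothetical stand-ins for drag-and-drop components.
def lowercase(texts: list[str]) -> list[str]:
    return [t.lower() for t in texts]


def keep_containing(texts: list[str], keyword: str) -> list[str]:
    return [t for t in texts if keyword in t]


@dataclass(frozen=True)
class Step:
    name: str
    func: Callable[..., list[str]]
    params: dict[str, Any] = field(default_factory=dict)


@dataclass(frozen=True)
class Workflow:
    steps: tuple[Step, ...]

    def run(self, data: list[str]) -> list[str]:
        # Replay the recorded steps in order, feeding each output to the next step.
        for step in self.steps:
            data = step.func(data, **step.params)
        return data

    def with_params(self, step_name: str, **changes: Any) -> "Workflow":
        # Copy the workflow, overriding the parameters of the named step only.
        return Workflow(tuple(
            replace(s, params={**s.params, **changes}) if s.name == step_name else s
            for s in self.steps
        ))


docs = ["Beef Noodles in Taipei", "Bubble tea review", "Best beef noodles"]
wf = Workflow((Step("lower", lowercase),
               Step("filter", keep_containing, {"keyword": "beef"})))
print(wf.run(docs))                                       # ['beef noodles in taipei', 'best beef noodles']
print(wf.with_params("filter", keyword="tea").run(docs))  # ['bubble tea review']
```

Because with_params returns a new workflow instead of mutating the recorded one, the original analysis stays reproducible alongside the modified variant.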
The hands-on training at this third workshop was led by Tsai Yihang, a doctoral student on Professor Huang Sanyi’s team. The workshop’s theme was “Culinary Delights”: following the instructor’s guidance, participants worked through practical exercises on this theme using the text analysis workflow platform.
The first session focused on the technical side of text analysis, namely the processing of text. It covered an introduction to the system, data extraction, data preprocessing, and word relationship analysis. Before the hands-on exercises, the basic concepts of text analysis were explained to participants. The system used web crawling techniques to retrieve data. The data was then preprocessed, which involved standardizing formats and removing or replacing symbols. The next step was text analysis, including sentence segmentation and word segmentation. N-gram analysis was used to identify important phrases, and stop words were removed to filter out unimportant terms. Toward the end of the session, participants explored word relationships: the frequency of each word in the text was calculated, along with the co-occurrence frequency of word pairs, where a higher co-occurrence frequency indicates a stronger association between the two words. The results were visualized as keyword network graphs.
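A compact Python sketch of this processing chain follows. It is illustrative only: the stop-word list, the regular expressions, and the function names are assumptions; whitespace tokenization stands in for the word segmentation that Chinese text would require; n-gram analysis is limited to bigrams; and co-occurrence is counted within sentences. The returned edge list is the kind of input a keyword network graph could be drawn from.

```python
import re
import unicodedata
from collections import Counter
from itertools import combinations

# Hypothetical stop-word list; a real list would be much larger.
STOP_WORDS = {"the", "a", "and", "at", "were", "was", "is"}


def preprocess(text: str) -> str:
    # Standardize formats (NFKC folds full-width characters) and lowercase.
    text = unicodedata.normalize("NFKC", text).lower()
    # Replace symbols other than sentence-ending punctuation with spaces.
    return re.sub(r"[^\w\s.!?]", " ", text)


def split_sentences(text: str) -> list[str]:
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def tokenize(sentence: str) -> list[str]:
    # Assumption: space-delimited words; Chinese text needs a dedicated segmenter.
    return re.findall(r"\w+", sentence)


def bigrams(tokens: list[str]) -> list[tuple[str, str]]:
    return list(zip(tokens, tokens[1:]))


def analyze(text: str, min_cooccur: int = 2):
    word_freq: Counter = Counter()
    phrase_freq: Counter = Counter()
    cooccur: Counter = Counter()
    for sentence in split_sentences(preprocess(text)):
        tokens = tokenize(sentence)
        # Candidate phrases: bigrams that contain no stop word.
        phrase_freq.update(g for g in bigrams(tokens) if not STOP_WORDS & set(g))
        words = [t for t in tokens if t not in STOP_WORDS]
        word_freq.update(words)
        # Count each unordered pair of distinct words once per sentence.
        cooccur.update(combinations(sorted(set(words)), 2))
    # Edges for a keyword network: word pairs that co-occur often enough.
    edges = [(a, b, n) for (a, b), n in cooccur.items() if n >= min_cooccur]
    return word_freq, phrase_freq, edges


text = ("The beef noodles were great! Beef noodles and dumplings at the "
        "night market. Dumplings were cold.")
words, phrases, edges = analyze(text)
print(phrases.most_common(1))  # [(('beef', 'noodles'), 2)]
print(edges)                   # [('beef', 'noodles', 2)]
```

Raising min_cooccur prunes weaker links, which keeps a keyword network readable as the corpus grows.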
The second session introduced sentiment analysis and opinion mining. It covered sentiment word analysis (counting occurrences of positive and negative words, which must be predefined), topic analysis (counting occurrences of topic-related words), and how to define topics manually. To define topics, participants built a topic dictionary in which each topic consists of multiple words; a document’s topic was then determined by matching its words against this dictionary. The session also covered topic clustering and summarization, in which data records are grouped into one or more clusters and aggregate functions are applied to each cluster, as well as date clustering and summarization, which groups records according to a date format. The session concluded with the visualization dashboard, which presents the clustered and summarized results in various charts and graphs, making the analysis results easier to interpret.
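The counting and grouping steps just described can be sketched in Python. This is a minimal illustration under stated assumptions: the lexicons, the topic dictionary, the Post structure, and the rule that a post takes every topic whose words it contains are all hypothetical, and "clustering" is modeled here as a plain group-by on the assigned topic (or on topic plus year-month) followed by aggregation, not as an unsupervised clustering algorithm.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical predefined lexicons and topic dictionary.
POSITIVE = {"delicious", "great", "fresh"}
NEGATIVE = {"cold", "salty", "expensive"}
TOPICS = {
    "noodles": {"noodles", "ramen", "broth"},
    "dessert": {"cake", "pudding", "tea"},
}


@dataclass
class Post:
    date: str          # ISO date, e.g. "2023-05-02"
    tokens: list[str]  # already segmented words


def sentiment_counts(tokens: list[str]) -> tuple[int, int]:
    # Count occurrences of predefined positive and negative words.
    return (sum(t in POSITIVE for t in tokens),
            sum(t in NEGATIVE for t in tokens))


def assign_topics(tokens: list[str]) -> list[str]:
    # Assumption: a post gets every topic sharing at least one word with it.
    words = set(tokens)
    return sorted(topic for topic, vocab in TOPICS.items() if words & vocab)


def summarize(posts: list[Post], key) -> dict:
    # Group (post, topic) pairs by key and aggregate counts per group.
    groups = defaultdict(lambda: {"posts": 0, "positive": 0, "negative": 0})
    for post in posts:
        pos, neg = sentiment_counts(post.tokens)
        for topic in assign_topics(post.tokens):
            g = groups[key(post, topic)]
            g["posts"] += 1
            g["positive"] += pos
            g["negative"] += neg
    return dict(groups)


def by_topic(post: Post, topic: str):
    return topic


def by_topic_and_month(post: Post, topic: str):
    return (topic, post.date[:7])  # "YYYY-MM" taken from the ISO date


# Hypothetical sample posts for illustration only.
posts = [
    Post("2023-05-02", ["ramen", "broth", "delicious"]),
    Post("2023-05-20", ["cake", "tea", "expensive"]),
    Post("2023-06-01", ["noodles", "cold", "salty"]),
]
print(summarize(posts, by_topic)["noodles"])
# {'posts': 2, 'positive': 1, 'negative': 2}
```

The per-group post counts give the discussion volume for each topic, and the monthly key shows how that volume changes over time; a dashboard would chart these aggregates.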
In summary, this workshop focused on sentiment analysis, with two key objectives: (1) identifying the topics discussed in articles, and (2) measuring the volume of discussion on those topics. Using data from social media forums, the workshop demonstrated how results that previously required programming skills can be achieved easily with the components of the text analysis workflow platform. The alliance holds these workshops to facilitate knowledge exchange and help partners better understand its technical capabilities, and this workshop successfully fulfilled that communication goal. Thank you to all participants for joining us today; we look forward to seeing you again at next year’s workshop.