Exploring key participants in election issues with neural networks (以類神經網路探查選舉議題的關鍵參與人)

In this neural network test, we pooled 1.32 million recent articles, consisting of uncategorized news and uncategorized online public-opinion posts, and ran large-scale unsupervised machine learning on them. With the resulting word vector model, which represents the news and public opinion of the past several months, we tried to get the neural network to answer one kind of question: who has recently taken part in election-related events? We initially considered several logical inference structures for this question and finally settled on the following:
在這次的類神經網路測試中,我們將近期的不分類的新聞與不分類的網路輿情共 132萬篇文章,一併彙整進行龐大的非監督式機器學習。在獲得這份可代表最近數個月的新聞及輿情的詞向量模型之後,我們嘗試讓類神經網路回答一種問題,也就是:最近有誰參與了選舉相關的事件?在這樣的問題中,我們初步思考了幾種邏輯推論的結構,最後選擇設定如下:

"Taipei City" + "Ding Shouzhong" + "Yao Wenzhi" = "County & City" + "Who participated in election-related events?"
「台北市」+「丁守中」+「姚文智」=「縣市」+「誰參與了選舉相關的事件?」
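
Read as vector arithmetic, the structure above asks for the words closest to "Taipei City" + "Ding Shouzhong" + "Yao Wenzhi" - "County & City". The following is a minimal sketch of such a query, not the pipeline actually used in this test: the function names are hypothetical, the source does not say which toolkit was used, and the three-dimensional toy vectors are invented purely for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def analogy_query(vectors, positive, negative, topn=5):
    """Rank words by similarity to sum(positive) - sum(negative).

    Hypothetical helper: `vectors` maps each word to its vector.
    The input words themselves are excluded from the ranking.
    """
    dim = len(next(iter(vectors.values())))
    query = [0.0] * dim
    for word in positive:
        query = [q + x for q, x in zip(query, vectors[word])]
    for word in negative:
        query = [q - x for q, x in zip(query, vectors[word])]
    excluded = set(positive) | set(negative)
    scored = [(w, cosine(query, v)) for w, v in vectors.items()
              if w not in excluded]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]

# Toy vectors, invented for illustration only (not from the trained model).
# Axes, loosely: [place, election politics, everything else].
TOY_VECTORS = {
    "台北市": [1.0, 0.3, 0.2],
    "縣市":   [1.0, 0.0, 0.1],
    "丁守中": [0.2, 1.0, 0.0],
    "姚文智": [0.2, 0.9, 0.1],
    "侯友宜": [0.1, 0.9, 0.0],
    "天氣":   [0.1, 0.0, 1.0],
}

results = analogy_query(
    TOY_VECTORS,
    positive=["台北市", "丁守中", "姚文智"],
    negative=["縣市"],
    topn=2,
)
for word, score in results:
    print(word, round(score, 3))
```

With these toy values, subtracting "縣市" cancels most of the place component, so the query vector points mainly along the election axis and the politician's name ranks above the unrelated word.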

This inference structure naturally raises a question: why not use the Mayor of Taipei, "Ke Wenzhe"? The reason is not that choosing "Ke Wenzhe" would be illogical. Rather, when "Taipei City" and "Ke Wenzhe" were actually added together, the results were highly diverse no matter which word vector was subtracted. We therefore tentatively concluded that both "Taipei City" and "Ke Wenzhe" are words whose vectors are strong along many dimensions, which makes them unsuitable for an inference that is meant to lean toward politics.
在上述的邏輯推論結構中,很容易就會引發一個問題:為什麼沒有選擇台北市的「柯文哲」市長?然而,其原因並不是市長「柯文哲」的詞彙選擇不合思考邏輯。而是當實際運算「台北市」與「柯文哲」這兩個詞彙相加之後,無論扣除了任何詞彙向量,得到的結果都是非常多元化。因此,我們初步判斷「台北市」與「柯文哲」這兩個詞彙都屬於多維度向量強度較高的詞彙,因此反而不適合用於期盼「偏向於政治」的邏輯推演。

By contrast, "Ding Shouzhong" and "Yao Wenzhi" are names associated almost purely with electoral politics. After the vector calculation and similarity query, we manually reviewed the top 200 output words. More than 90% of them were correctly segmented, and about 95% of those were correct names of politicians. We sorted these names by similarity and classified the top 99, with roughly the following result:
然而「丁守中」與「姚文智」是非常單純與選舉政治議題相關的人名。因此在向量計算與比對查詢後,將挑選出來的前 200個輸出詞彙進行人為判斷,可以發現超過 90%的詞彙被正確斷詞,且其中約 95%為正確的政治人物名字。在這些人名中,我們依相似度排序,挑選前 99位人名進行分類,可獲得大致的分類如下:

Election-related event
參與的選舉相關事件
Names of people involved in election-related events
參與選舉相關事件的人名
Taipei City mayoral election
台北市長選舉
柯文哲、張顯耀、蔣萬安、呂秀蓮、李錫錕、孫大千、姚文智、鄭麗文、邱文祥、蘇煥智、姚姚、丁丁、葉匡時
New Taipei City mayoral election
新北市長選舉
侯友宜、吳秉叡、蘇貞昌、金介壽、李乾龍
Statements on elections in the Greater Taipei area
發表大台北地區選舉相關言論
連勝文、吳思瑤、姚立明、民進黨、郝龍斌、梁文傑、國民黨、王世堅、藍世聰、賴清德、蔡英文、王鴻薇、朱立倫、蔡正元、吳敦義、洪耀福、簡余晏、莊瑞雄、洪智坤、洪健益、黃國昌、阮昭雄、郁慕明、李慶鋒、鄭麗君、周玉蔻、陳菊、時代力量、林昶佐、周錫瑋、蘇巧慧、辜寬敏、劉奕霆、李鴻鈞、時力、趙少康、周柏雅、秦慧珠、吳音寧、陳水扁
Taipei City Council election or related statements
台北市議員選舉或發表相關言論
羅智強、游淑慧、顏聖冠、王威中、李明賢、顏若芳、高嘉瑜、江志銘、吳崢、徐弘庭、王閔生、親民黨、劉耀仁、簡舒培、應曉薇、馬英九、游藝、許毓仁
Statements on elections in other regions
發表其他地區選舉的相關言論
趙天麟、王定宇、劉世芳、盧秀燕、高思博、鄭運鵬、林俊憲、費鴻泰、楊麗環、管碧玲、沈富雄、許淑華、柯建銘、林為洲、馮光遠、楊偉中、韓國瑜、柯志恩、許智傑、廖國棟、吳宜臻、邱毅
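
The manual review that precedes this classification reports two rates: the share of output words that were correctly segmented, and the share of those that are politicians' names. A minimal sketch of how such a tally could be computed is shown here. The function name and the record format are hypothetical, the sample tally is invented and is not the actual review data, and the second rate is assumed to be measured against the correctly segmented words only.

```python
def review_rates(reviews):
    """Compute two rates from manually reviewed output words.

    `reviews` is a list of (segmented_ok, is_politician_name) booleans,
    one pair per output word (hypothetical record format).
    Returns (segmentation rate, name rate among correctly segmented words).
    """
    total = len(reviews)
    segmented = [r for r in reviews if r[0]]
    names = [r for r in segmented if r[1]]
    seg_rate = len(segmented) / total if total else 0.0
    name_rate = len(names) / len(segmented) if segmented else 0.0
    return seg_rate, name_rate

# Invented sample of 10 reviewed words, for illustration only.
sample = [(True, True)] * 8 + [(True, False)] * 1 + [(False, False)] * 1
seg_rate, name_rate = review_rates(sample)
print(f"correctly segmented: {seg_rate:.0%}, politician names: {name_rate:.0%}")
```

Keeping the two rates separate makes it clear whether an error comes from word segmentation or from the similarity query itself.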


These results show that even though the dictionary used to assist word segmentation contained no special entries for politicians' names, the neural network still extracted names with an accuracy of nearly 90% (every extraction error was a personal name that had been segmented incorrectly). The returned names even include nicknames that netizens have given to politicians, as well as the political groups that politicians belong to (by full name or abbreviation). Judging by the quality of the output, the semantic analysis accuracy of the neural network under these test conditions is close to 100%.
在這樣的分析結果中,可以發現即便輔助斷詞的辭典沒有特別設定政治人物的名字,類神經網路依舊能以接近 90%的正確率將名字抽取出來(抽取錯誤的詞彙皆為人名的斷詞錯誤)。回應出來的名字,甚至包含了網友給予政治人物的暱稱、或政治人物所參與政治團體(全名或簡稱)。從輸出結果的品質來看,此測試條件下,類神經網路的語意分析精準度已接近 100%。

However, a close look at how the names are distributed across topics shows that the issues are still heavily concentrated in Taipei City, New Taipei City, or the Greater Taipei area. Tracing the cause back to bias in the input data, we found that election news about Greater Taipei gives the related word vectors noticeably greater strength than those of other electoral districts. The overlap between the term "County & City" (縣市) and the widely used phrase "Taipei County and City" (台北縣市) may also be a key factor.
然而,若仔細閱覽人名與參與議題的分類,可以發現議題仍強烈集中於台北市、新北市或大台北地區。從原始輸入的資料偏向來查詢原因,可以發現大台北地區的選舉相關新聞使詞彙向量強度明顯高過其他選舉區;「縣市」兩字與「台北縣市」這個廣泛用詞的重疊,也可能是關鍵因素。

Even so, the neural network still found a large number of correct names among the 1.32 million articles. Applying this technique makes it easier to find key participants or key opinion leaders across a wide range of news events. Finally, if you have needs in this field or in any kind of large-scale data analysis, you are welcome to contact us.
但即便如此,類神經網路依舊在 132萬篇的茫茫新聞網海中順利找出大量的正確人名。透過此技術的應用,從廣泛新聞事件找尋關鍵參與人員或關鍵意見領袖,就能更加便捷。最後,如果您有領域相關或任何大規模資料分析的需求,歡迎與我們聯繫。