Exploring causality of target events with neural networks

Continuing from the neural network analysis described earlier, the next question is: can artificial intelligence algorithms reproduce the human ability of logical deduction?

Because word vector algorithms and neural networks respond with "associations", they can clearly identify potentially homogeneous combinations. For the earlier combination of "lung cancer" + "epidermal growth factor receptor (EGFR)", the answer is clearly a very large associated set of cancers and cancer-related genes. (Please note that the + symbol above represents vector addition.) Such a large set quickly leads to the next level of thinking: which cancer and which cancer-related gene in it are mentioned together?

This kind of question needs only a little mathematics. For example:

"Lung cancer" + "EGFR" = "Breast cancer" + "Which gene is mentioned together with breast cancer?"

Therefore, "Which gene is mentioned together with breast cancer?" = "Lung cancer" + "EGFR" - "Breast cancer"
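The rearrangement above can be sketched in a few lines of Python. The three-dimensional vectors are made-up toy values, not embeddings from the article's corpus, and the helper names `add` and `subtract` are hypothetical. The sketch only illustrates that solving for the unknown term is ordinary vector subtraction.

```python
def add(u, v):
    """Element-wise sum of two equal-length vectors."""
    return [a + b for a, b in zip(u, v)]


def subtract(u, v):
    """Element-wise difference of two equal-length vectors."""
    return [a - b for a, b in zip(u, v)]


# Toy vectors (hypothetical values for illustration only; real word
# vectors are learned from text and usually have far more dimensions).
lung_cancer = [0.9, 0.1, 0.3]
egfr = [0.2, 0.8, 0.4]
breast_cancer = [0.8, 0.2, 0.1]

# "Lung cancer" + "EGFR" = "Breast cancer" + unknown
# => unknown = "Lung cancer" + "EGFR" - "Breast cancer"
unknown = subtract(add(lung_cancer, egfr), breast_cancer)

# Check that the solved vector balances the original equation
# (compared with a small tolerance because of floating-point rounding).
left = add(lung_cancer, egfr)
right = add(breast_cancer, unknown)
balanced = all(abs(l - r) < 1e-9 for l, r in zip(left, right))
print(balanced)
```

The resulting vector is not itself a word. It is a point in the vector space, and the answer to the question is whichever vocabulary terms lie closest to that point.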

Besides EGFR, the tumor suppressor protein p53 is another key gene in lung cancer, so the expression can be adjusted as follows:

"Which gene is mentioned together with breast cancer?" = "Lung cancer" + "EGFR" + "p53" - "Breast cancer"

After a program extracted the vocabulary vectors from the conclusions of about 470,000 recent documents and carried out the vector addition and subtraction, the 47 most relevant results were as follows (lung cancer, EGFR, and p53 in the picture represent the initial input values):

PTEN, KRAS, MDM2, ALK, TP53, MYC, EZH2, BRAF, FGFR1, NOTCH1, ERBB2, β-catenin, p21, FLT3, oncogenes, NPM1, c-Myc, STAT3, AKT, ATM, p27, BRAFV600E, NSCLC, c-Met, HER2, TERT, HMGA2, p16, DNMT1, MITF, KIT, MET, cyclin D1, c-MYC, survivin, mutant p53, YAP, PD-L1, SMAD4, RUNX1, ARID1A, AXL, RET, ZEB1, KRAS mutations, JAK2, E2F1, FGFR3, RTKs, BCL2
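The ranking step behind a list like this can be sketched as follows. This is a minimal illustration, not the program used for the article. It assumes that "most relevant" means highest cosine similarity to the combined query vector, which is a common convention for word vectors but is not stated in the text. It also assumes the input terms are left out of the ranking. The function names and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_u * norm_v)


def most_similar(vectors, positive, negative, topn):
    """Rank vocabulary terms by similarity to sum(positive) - sum(negative)."""
    dim = len(next(iter(vectors.values())))
    query = [0.0] * dim
    for term in positive:
        query = [q + x for q, x in zip(query, vectors[term])]
    for term in negative:
        query = [q - x for q, x in zip(query, vectors[term])]

    # Assumption: the input terms themselves are excluded from the ranking.
    inputs = set(positive) | set(negative)
    scored = [
        (term, cosine(query, vec))
        for term, vec in vectors.items()
        if term not in inputs
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Toy vocabulary with made-up 3-dimensional vectors (illustration only).
toy_vectors = {
    "lung cancer": [0.9, 0.1, 0.1],
    "breast cancer": [0.8, 0.2, 0.0],
    "EGFR": [0.2, 0.9, 0.1],
    "p53": [0.1, 0.8, 0.3],
    "KRAS": [0.2, 0.9, 0.2],
    "aspirin": [0.1, 0.0, 0.9],
}

results = most_similar(
    toy_vectors,
    positive=["lung cancer", "EGFR", "p53"],
    negative=["breast cancer"],
    topn=1,
)
print(results[0][0])
```

With a real vocabulary, `topn` would be set to the number of results wanted, for example 47. Cosine similarity compares direction rather than length, so the magnitude of the summed query vector does not affect the ranking.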

After further manual inspection, we found that although a few of the 47 output terms run counter to intuition, all of them are correct. NSCLC (non-small cell lung cancer) is the abbreviation of a cancer name; because it is often mentioned together with breast cancer, it also appears in the results. Two terms describe gene states: mutant p53 and KRAS mutations. One term is a general description of genes: oncogenes. The remaining 43 gene terms are all related to cancer and are mentioned in numerous breast cancer research reports.

These results show clearly how closely a neural network can imitate the logical deduction of human thinking. The volume and accuracy of the data it returns already approach, or even surpass, those of professionals in the research field. However, the neural network does not understand the detailed relationships between these genes and breast cancer; exploring them still requires reading and consideration by professional researchers. The neural network can therefore serve as a powerful data exploration tool that greatly improves researchers' efficiency. Finally, if you have any needs in large-scale data analysis, you are welcome to contact us.