Querying cancer-related genes with a two-layer artificial neural network: an example

In this analysis we used the same two-layer neural network algorithm and the same scale of paper data as in the previous report. We tried to find out which cancer types and cancer-related genes are mentioned in the conclusions of about 470,000 medical journal articles from the last month. The inputs were two terms that have been quite popular recently: "lung cancer" and "EGFR" (epidermal growth factor receptor). A few seconds after the two terms are submitted, a series of output terms is returned, and a two-dimensional projection is generated from the similarity of their word vectors, as shown in the figure above.
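The projection step can be pictured with a minimal sketch. It assumes PCA (principal component analysis) as the projection method, which the source does not specify, and it uses hypothetical four-dimensional toy vectors rather than vectors learned from the papers. The names `TOY_VECTORS`, `top_component`, and `project_2d` are illustrative only.

```python
import math

# Hypothetical toy vectors for illustration only; a trained model would
# supply real word vectors with many more dimensions.
TOY_VECTORS = {
    "lung cancer": [0.90, 0.10, 0.00, 0.20],
    "NSCLC": [0.85, 0.15, 0.05, 0.25],
    "breast cancer": [0.70, 0.30, 0.10, 0.10],
    "EGFR": [0.10, 0.90, 0.20, 0.00],
    "KRAS": [0.15, 0.85, 0.25, 0.05],
    "EGFR mutations": [0.12, 0.88, 0.18, 0.02],
}


def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def top_component(matrix, iterations=200):
    """Approximate the dominant eigenvector of a symmetric matrix by power iteration."""
    dim = len(matrix)
    # Deterministic, non-uniform start vector so it is unlikely to be
    # orthogonal to the dominant eigenvector.
    vec = [float(i + 1) for i in range(dim)]
    norm = math.sqrt(dot(vec, vec))
    vec = [x / norm for x in vec]
    for _ in range(iterations):
        nxt = [dot(row, vec) for row in matrix]
        norm = math.sqrt(dot(nxt, nxt))
        if norm == 0.0:  # zero matrix: no direction to find
            break
        vec = [x / norm for x in nxt]
    return vec


def project_2d(vectors):
    """Project equal-length vectors onto their first two principal components."""
    n, dim = len(vectors), len(vectors[0])
    means = [sum(v[j] for v in vectors) / n for j in range(dim)]
    centered = [[v[j] - means[j] for j in range(dim)] for v in vectors]
    cov = [[sum(r[a] * r[b] for r in centered) / n for b in range(dim)]
           for a in range(dim)]
    first = top_component(cov)
    # Remove the first component from the covariance (deflation), then repeat.
    eigenvalue = dot(first, [dot(row, first) for row in cov])
    deflated = [[cov[a][b] - eigenvalue * first[a] * first[b] for b in range(dim)]
                for a in range(dim)]
    second = top_component(deflated)
    return [(dot(r, first), dot(r, second)) for r in centered]


terms = list(TOY_VECTORS)
points = dict(zip(terms, project_2d([TOY_VECTORS[t] for t in terms])))
for term, (x, y) in points.items():
    print(f"{term:15s} x={x:+.3f} y={y:+.3f}")
```

Because a projection to two dimensions discards the remaining components, distances in such a plot only approximate similarity in the original vector space.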

Because the data cover only the conclusions of the last month's medical research, the positions and distances of terms in the two-dimensional projection do not necessarily reflect their true correlation or similarity. Even so, the terms the neural network extracted from the research papers are still quite accurate and meaningful. They are listed below, grouped by cancer name and gene name:

Names of cancers

Input value: lung cancer

Output values: NSCLC, HNSCC, lung adenocarcinoma, ovarian cancer, TNBC, SCLC, RCC, pancreatic cancer, lung cancers, melanoma, CRC, bladder cancer, glioblastoma, gastric cancer, PDAC, ESCC, colorectal cancer, colon cancer, GBM, osteosarcoma, breast cancer, PCa, ccRCC, thyroid cancer, EOC, PTC, prostate cancer, breast cancers, AML, ovarian carcinoma, OSCC, glioma, esophageal cancer, GIST, CRPC, NPC, cervical cancer, MPM, colorectal carcinoma, non-small cell, NSCLCs

Names of cancer-related genes

Input value: EGFR

Output values: HER2, KRAS, ALK, BRAF, PD-L1, MET, p53, EGFR mutations, FGFR1


In terms of how the terms are written, the cancer-related names were retrieved quite accurately; the only flaw is one truncated term ("non-small cell" should be "non-small cell lung cancer"). For cancer-related genes, the query correctly returned eight other important gene names. Among them, HER2 is highly associated with breast cancer; KRAS, PD-L1, MET, p53, FGFR1, and BRAF are highly associated with a variety of cancers; and EGFR and ALK are associated with certain types of lung cancer or other cancers. In addition, the neural network treats the phrase "EGFR mutations" as similar to the term "EGFR" itself, which suggests that the two carry similarly important meaning in medical conclusions.
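How a term such as "EGFR mutations" comes to be listed next to "EGFR" can be illustrated with a similarity ranking. The sketch below assumes cosine similarity as the measure, which the source does not state, and it uses hypothetical toy vectors rather than the vectors learned from the papers. `most_similar` is an illustrative helper, not the authors' code.

```python
import math

# Hypothetical toy vectors for illustration only; real vectors would come
# from a model trained on the paper conclusions.
TOY_VECTORS = {
    "lung cancer": [0.90, 0.10, 0.00, 0.20],
    "NSCLC": [0.85, 0.15, 0.05, 0.25],
    "breast cancer": [0.70, 0.30, 0.10, 0.10],
    "EGFR": [0.10, 0.90, 0.20, 0.00],
    "KRAS": [0.15, 0.85, 0.25, 0.05],
    "EGFR mutations": [0.12, 0.88, 0.18, 0.02],
}


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(term, vectors, topn=3):
    """Return the topn other terms ranked by cosine similarity to `term`."""
    query = vectors[term]
    scored = [(other, cosine(query, vec))
              for other, vec in vectors.items() if other != term]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


for query_term in ("lung cancer", "EGFR"):
    neighbours = most_similar(query_term, TOY_VECTORS, topn=2)
    print(query_term, "->", [name for name, _ in neighbours])
```

With these toy values, a phrase and a single word rank as neighbours simply because their vectors point in nearly the same direction; the ranking does not distinguish a gene name from a phrase containing it.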

Taken together, these results show that, under these experimental conditions, the neural network can successfully extract important research terms from papers. If the input and output ranges of the data are extended, the desired analysis results can be obtained more comprehensively; if the query is a less common term from a non-cancer field, the neural network can also return correct results. This is also where artificial intelligence most clearly outperforms human thinking. Finally, if you have domain-specific needs or any need for large-scale data analysis, you are welcome to contact us.