Web Data Crawler

Using our cloud-based crawler system and a proprietary API, we have deployed an architecture in which crawler programs connect to back-end database servers in a many-to-many fashion, so that multiple crawlers can feed multiple back-end databases. This design keeps the overall system stable over long periods of continuous operation while preserving its flexibility and data-processing performance. Beyond extracting specific text fields from web pages, we also offer image capture, recording of streaming video, and full-page capture of web pages.
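The many-to-many arrangement can be pictured as a routing hub that sits between crawlers and databases. The following is a minimal sketch in Python, not the system's actual Ruby on Rails code. It assumes that each crawler submits records tagged with a content kind and that each database subscribes to the kinds it stores. The names `CrawlRecord`, `InMemorySink`, and `CrawlerHub`, and the kind labels, are all hypothetical, and in-memory lists stand in for real database tables. A database that fails is logged and skipped so it does not block delivery to the others. This is one common way such a design protects long-term stability.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class CrawlRecord:
    crawler_id: str  # which crawler produced the record
    kind: str        # hypothetical content kind, e.g. "text", "image", "video"
    payload: dict    # extracted fields


@dataclass
class InMemorySink:
    """Hypothetical stand-in for one back-end database; stores rows in a list."""
    name: str
    rows: list = field(default_factory=list)

    def write(self, record):
        self.rows.append(record)


class CrawlerHub:
    """Routes records from any crawler to every sink subscribed to their kind."""

    def __init__(self):
        self._routes = defaultdict(list)  # kind -> list of subscribed sinks
        self.failures = []                # (sink name, error) pairs for later retry

    def subscribe(self, kind, sink):
        self._routes[kind].append(sink)

    def submit(self, record):
        """Deliver a record to all matching sinks; return how many succeeded."""
        delivered = 0
        for sink in self._routes.get(record.kind, []):
            try:
                sink.write(record)
                delivered += 1
            except Exception as exc:  # one failing database must not block the others
                self.failures.append((sink.name, repr(exc)))
        return delivered


if __name__ == "__main__":
    hub = CrawlerHub()
    main_db, archive_db = InMemorySink("main"), InMemorySink("archive")
    hub.subscribe("text", main_db)
    hub.subscribe("text", archive_db)
    hub.subscribe("image", archive_db)

    hub.submit(CrawlRecord("crawler-a", "text", {"title": "Example"}))
    hub.submit(CrawlRecord("crawler-b", "image", {"url": "https://example.com/a.png"}))
    print(len(main_db.rows), len(archive_db.rows))  # prints "1 2"
```

Because routing is driven by subscriptions, adding another crawler or another database only requires a new `submit` caller or a new `subscribe` call. Neither side needs to know about the other.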
The system runs on CentOS Linux cloud servers, uses MySQL/MariaDB as a temporary (staging) database, and is built primarily on Ruby on Rails. The API mostly exchanges data in JSON or XML format to ensure compatibility across platforms.
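The following minimal sketch illustrates the two exchange formats. It uses only Python's standard `json` and `xml.etree.ElementTree` modules and assumes a flat record whose keys are valid XML tag names. The helper names `to_json`, `to_xml`, and `from_xml`, the `record` root element, and the sample fields are hypothetical and do not describe the actual API schema.

```python
import json
import xml.etree.ElementTree as ET


def to_json(record: dict) -> str:
    """Serialize a flat record as JSON with sorted keys; keeps non-ASCII text as-is."""
    return json.dumps(record, ensure_ascii=False, sort_keys=True)


def to_xml(record: dict, root_tag: str = "record") -> str:
    """Serialize a flat record as XML, one child element per field (sorted by key)."""
    root = ET.Element(root_tag)
    for key in sorted(record):
        child = ET.SubElement(root, key)  # assumes key is a valid XML tag name
        child.text = str(record[key])
    return ET.tostring(root, encoding="unicode")


def from_xml(text: str) -> dict:
    """Parse XML produced by to_xml back into a dict; every value comes back as str."""
    root = ET.fromstring(text)
    return {child.tag: child.text or "" for child in root}


if __name__ == "__main__":
    record = {"url": "https://example.com/", "title": "Example Domain", "status": 200}
    print(to_json(record))
    # {"status": 200, "title": "Example Domain", "url": "https://example.com/"}
    print(to_xml(record))
    # <record><status>200</status><title>Example Domain</title><url>https://example.com/</url></record>
```

The round trip exposes a design trade-off. JSON preserves the integer type of `status`, while this plain XML encoding returns every value as a string. A client that consumes XML therefore needs its own schema or conversion rules.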