In this case study, we used specific keywords to select articles related to oncology medicine from online news and community discussions. Based on the characteristics of the media sources, the data is divided into four categories, as shown above. The black line, which represents oncology medicine related news, has the higher values; these articles come mainly from more formal media reports or blog posts. The X-axis of the chart shows the dates of the last 30 days, and the Y-axis shows the number of articles matched by the keywords on each day. (The total number of articles in the last 30 days is 235,235.)
From the charts above, three peaks in news reporting are easy to see: two in April and one in May. Based on the shape of these peaks, we use a 72-hour time window to examine the content of each one.
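The 72-hour window can be sketched as a sliding sum over daily article counts. This is a minimal illustration, not the system's actual implementation: the function name `window_totals`, the sample counts, and the placeholder year are all assumptions made for the example.

```python
from datetime import date, timedelta

def window_totals(daily_counts, window_days=3):
    """Return (start_date, total) for each run of `window_days` consecutive days.

    `daily_counts` maps a date to the number of matching articles on that day.
    Days missing from the mapping are treated as zero.
    """
    days = sorted(daily_counts)
    totals = []
    for start in days:
        span = [start + timedelta(days=i) for i in range(window_days)]
        if span[-1] > days[-1]:
            break  # the window would run past the last observed day
        totals.append((start, sum(daily_counts.get(d, 0) for d in span)))
    return totals

# Hypothetical counts; the year 2000 is an arbitrary placeholder.
counts = {
    date(2000, 4, 9): 40,
    date(2000, 4, 10): 45,
    date(2000, 4, 11): 120,
    date(2000, 4, 12): 150,
    date(2000, 4, 13): 110,
    date(2000, 4, 14): 50,
    date(2000, 4, 15): 42,
}

totals = window_totals(counts)
# The window with the largest total marks the reporting peak.
best = max(totals, key=lambda item: item[1])
print(best)
```

With these made-up numbers, the busiest 72-hour window starts on April 11, which is the kind of interval the content analysis would then focus on.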
The word-frequency results for the first time window cover the 72 hours from April 11 to April 13. Reports on specific cancers are the main feature of the content, followed by food and environmental factors. Looking further into what drives the high-frequency words, we find news about cancer and related organ removal surgery, the treatment of psoriasis and cutaneous carcinoma, analgesics and gastric cancer, constipation, diarrhea and colorectal cancer, diet and cancer initiation/prevention, traditional Chinese and novel Western medicine for cancer treatment, air pollution and lung cancer, and anti-cancer drugs and import tariffs.
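A word-frequency count over the articles in one time window might look like the following sketch. The headlines, the stopword list, and the simple regular-expression tokenizer are illustrative assumptions; real articles, especially non-English ones, would need a proper word segmenter.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not the system's list).
STOPWORDS = {"a", "and", "for", "in", "of", "on", "the", "to"}

def term_frequencies(texts, stopwords=STOPWORDS):
    """Count how often each non-stopword term appears across `texts`."""
    counter = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        counter.update(t for t in tokens if t not in stopwords)
    return counter

# Hypothetical headlines loosely modeled on the topics listed above.
headlines = [
    "Air pollution and lung cancer risk",
    "Import tariffs on anti-cancer drugs",
    "Diet and cancer prevention",
    "Colorectal cancer and constipation",
]

freq = term_frequencies(headlines)
top = freq.most_common(1)
print(top)
```

Sorting the resulting counter shows which terms dominate the window, and tracing a frequent term back to its source articles reveals the stories behind it.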
In the second time window, news reports focused on surgical treatment methods and the development of new anti-cancer drugs. The reports mentioned a large number of company names, suggesting a strong link to the companies' first-quarter financial reports and annual shareholder meetings.
In the third and most recent time window, a series of reports on the death of a celebrity performer appeared. As a result, precise disease terms such as lung adenocarcinoma and chronic obstructive pulmonary disease (COPD) emerged. In the resulting social atmosphere, there were also a large number of related media reports on cancer prevention and cancer treatment.
Summarizing the statistics and raw-data observations above, we find that a large volume of homogeneous news can appear within a very short period, producing a statistical peak that can serve as a threshold detection condition in an early warning system. Word frequency analysis can distinguish the general vocabulary from the precise vocabulary used in different types of reports, giving early warning systems the opportunity to analyze content more accurately. Combined with monitoring of Internet social media, we can tell whether news reports have drawn widespread public attention and response. In this example analysis, the oncology medicine news in the three reporting peaks mentioned above did not trigger a public response (the orange lines in the line graph represent Internet social media). However, oncology medicine news is primarily aimed at public education. Even though no corresponding rise appears in online social media, a high concentration of news reports is enough to reach a large number of readers. If such a reporting peak carries a commercially meaningful message, or is related to a specific organization, we believe this early warning system can demonstrate its practical value.
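The threshold condition and the social media cross-check could be combined roughly as follows. This is a sketch under stated assumptions: the mean-plus-k-standard-deviations rule, the parameter values `k` and `ratio`, the function names, and the sample series are all hypothetical choices for illustration, not the thresholds used in the actual system.

```python
from statistics import mean, stdev

def flag_peaks(news_counts, k=2.0):
    """Return indices of days whose count exceeds mean + k * stdev, plus the threshold."""
    threshold = mean(news_counts) + k * stdev(news_counts)
    peaks = [i for i, c in enumerate(news_counts) if c > threshold]
    return peaks, threshold

def social_response(social_counts, peak_days, ratio=1.5):
    """For each peak day, report whether social media volume rose above ratio * baseline."""
    baseline = mean(social_counts)
    return {day: social_counts[day] > ratio * baseline for day in peak_days}

# Hypothetical daily counts for one news category and one social media category.
news = [40, 42, 38, 41, 150, 39, 43, 40]
social = [10, 11, 9, 10, 10, 11, 9, 10]

peaks, threshold = flag_peaks(news)
response = social_response(social, peaks)
print(peaks, response)
```

In this made-up series, the news peak is flagged while the social media series stays flat on the same day, mirroring the pattern described above: a reporting peak without a matching public response.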
If you have any needs for analysis of news reporting status, you are welcome to contact us.