Public health surveillance (also epidemiological surveillance, clinical surveillance or syndromic surveillance) is, according to the World Health Organization (WHO), "the continuous, systematic collection, analysis and interpretation of health-related data needed for the planning, implementation, and evaluation of public health practice."[1] Public health surveillance may be used to track emerging health-related issues at an early stage and find active solutions in a timely manner.[1] Surveillance systems are generally called upon to provide information regarding when and where health problems are occurring and who is affected.[2]
Public health surveillance systems can be passive or active. A passive surveillance system consists of the regular, ongoing reporting of diseases and conditions by all health facilities in a given territory. An active surveillance system is one in which health facilities are visited, health care providers are contacted, and medical records are reviewed in order to identify a specific disease or condition. Passive surveillance systems are less time-consuming and less expensive to run but risk under-reporting some diseases. Active surveillance systems are most appropriate for epidemics or where a disease has been targeted for elimination.[3]
Techniques of public health surveillance have been used in particular to study infectious diseases. Many large institutions, such as the WHO and the Centers for Disease Control and Prevention (CDC), have created databases and modern computer systems (public health informatics) that can track and monitor emerging outbreaks of illnesses such as influenza, SARS, and HIV, as well as bioterrorism events such as the 2001 anthrax attacks in the United States.
Many regions and countries have their own cancer registries, which monitor the incidence of cancers to help determine the prevalence and possible causes of these illnesses.[4]
Other conditions, from one-time events such as stroke to chronic conditions such as diabetes, as well as social problems such as domestic violence, are increasingly being integrated into epidemiologic databases called disease registries. Cost-benefit analyses are conducted on these registries to determine government funding for research and prevention.
Systems that automate the identification of adverse drug events are currently in use and are being compared with traditional written reports of such events.[5] These systems intersect with the field of medical informatics and are rapidly being adopted by hospitals and endorsed by institutions that oversee healthcare providers (such as JCAHO in the United States). Efforts to improve healthcare increasingly center on the surveillance of medication errors within institutions.[6]
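The cited comparison does not describe how such systems work internally. One widely described approach is to screen electronic records for "triggers", for example orders for antidote drugs that are typically given to reverse an adverse effect. The minimal sketch below assumes that approach; the trigger list, the `MedicationOrder` record and the `flag_possible_ades` function are hypothetical and are not taken from any of the systems cited here.

```python
from dataclasses import dataclass

# Hypothetical trigger list (illustration only): each drug is commonly used to
# reverse an adverse effect, so an order for it may point to an earlier event.
TRIGGERS = {
    "naloxone": "possible opioid oversedation",
    "flumazenil": "possible benzodiazepine oversedation",
    "vitamin k": "possible excessive anticoagulation",
}


@dataclass
class MedicationOrder:
    patient_id: str
    drug: str


def flag_possible_ades(orders):
    """Return (patient_id, drug, reason) for each order that matches a trigger.

    Flags are candidates for clinician or pharmacist review, not confirmed
    adverse drug events.
    """
    flags = []
    for order in orders:
        # Normalize case and surrounding whitespace before the lookup.
        reason = TRIGGERS.get(order.drug.strip().lower())
        if reason is not None:
            flags.append((order.patient_id, order.drug, reason))
    return flags
```

Because every flag still needs human review, a trigger screen of this kind trades precision for coverage: it can scan every order automatically, which written voluntary reports cannot.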
Syndromic surveillance is the analysis of medical data to detect or anticipate disease outbreaks. According to a CDC definition, "the term 'syndromic surveillance' applies to surveillance using health-related data that precede diagnosis and signal a sufficient probability of a case or an outbreak to warrant further public health response. Though historically syndromic surveillance has been utilized to target investigation of potential cases, its utility for detecting outbreaks associated with bioterrorism is increasingly being explored by public health officials."[7]
The first indications of a disease outbreak or bioterrorist attack may not come from the definitive diagnosis of a physician or a laboratory.[8]
Take a typical influenza outbreak as an example. Once the outbreak begins to affect the population, some people may call in sick to work or school, others may visit a drugstore and buy over-the-counter medicine, others will visit their doctor's office, and still others may have symptoms severe enough that they call the emergency telephone number or go to an emergency department.
Syndromic surveillance systems monitor data from school absenteeism logs, emergency call systems, hospitals, over-the-counter drug sales records, Internet searches, and other data sources to detect unusual patterns. When a spike in activity is seen in any of the monitored systems, disease epidemiologists and public health professionals are alerted that there may be an issue.
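Spike detection of this kind can be sketched as a moving-baseline rule: compare each day's count with the mean and standard deviation of the preceding days and flag days that are unusually high. This is similar in spirit to simple aberration-detection methods such as those in CDC's Early Aberration Reporting System (EARS), but the 7-day window, the 3-standard-deviation threshold, the standard-deviation floor and the function name `detect_spikes` are assumptions for illustration, not parameters of any system described here.

```python
from statistics import mean, stdev


def detect_spikes(counts, baseline_days=7, threshold=3.0, min_sd=1.0):
    """Flag days whose count is unusually high relative to the preceding days.

    counts: daily counts from one data source (e.g. school absences), oldest first.
    Hypothetical defaults: 7-day baseline, 3 standard deviations, SD floor of 1.
    Returns the indices of flagged days.
    """
    alerts = []
    for t in range(baseline_days, len(counts)):
        baseline = counts[t - baseline_days:t]
        mu = mean(baseline)
        # Floor the standard deviation so that a perfectly flat baseline does
        # not turn every small change into an alert (or divide by zero).
        sd = max(stdev(baseline), min_sd)
        if (counts[t] - mu) / sd > threshold:
            alerts.append(t)
    return alerts
```

For example, with daily counts `[10, 12, 11, 9, 10, 11, 12, 30, 11]` only day 7 (the jump to 30) is flagged. Running one such detector per data source lets any single stream raise an alert, matching the behavior described above; real systems typically also adjust for day-of-week and seasonal effects, which this sketch ignores.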
Early awareness of and response to a bioterrorist attack could save many lives and potentially stop or slow the spread of the outbreak. The most effective syndromic surveillance systems automatically monitor these data sources in real time, do not require individuals to enter separate information (secondary data entry), include advanced analytical tools, aggregate data from multiple systems and across geopolitical boundaries, and include an automated alerting process.[9]
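Two of the properties listed above, aggregating data from multiple systems and regions without re-entry and alerting automatically, can be illustrated with a small pipeline. In this minimal sketch the record layout `(source, region, day, count)`, the source names and the "latest day exceeds twice the earlier mean" alert rule are all hypothetical choices, not features of any particular system.

```python
from collections import defaultdict
from statistics import mean


def build_series(records):
    """Sum reports into one daily series per (source, region).

    records: iterable of (source, region, day, count) tuples; several reporting
    sites may contribute to the same source, region and day. Days with no
    reports are simply absent from the resulting series.
    """
    series = defaultdict(lambda: defaultdict(int))
    for source, region, day, count in records:
        series[(source, region)][day] += count
    return {key: dict(days) for key, days in series.items()}


def automated_alerts(series, ratio=2.0):
    """Return (source, region, day) for every series whose latest day's count
    exceeds `ratio` times the mean of its earlier days (hypothetical rule)."""
    alerts = []
    for (source, region), days in sorted(series.items()):
        ordered = sorted(days)
        if len(ordered) < 2:
            continue  # not enough history to compare against
        history = [days[d] for d in ordered[:-1]]
        latest = ordered[-1]
        if days[latest] > ratio * mean(history):
            alerts.append((source, region, latest))
    return alerts
```

Keeping each source and region as its own series, rather than adding emergency calls to school absences, avoids mixing counts that measure different things while still letting one automated pass cover every feed and jurisdiction.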
A syndromic surveillance system based on search queries was first proposed by Gunther Eysenbach, who began work on such a system in 2004.[10] Inspired by these early, encouraging experiences, Google launched Google Flu Trends[11] in 2008. The premise is that more flu-related searches indicate higher flu activity. The results, which were published in Nature, closely matched CDC data and ran 1–2 weeks ahead of it.[12] However, it has since been shown that the original approach behind Google Flu Trends had various modelling deficiencies that led to significant errors in its estimates.[13] More recently, a series of more advanced linear and nonlinear approaches to influenza modeling from Google search queries have been proposed.[14] Extending Google's work, researchers from the Intelligent Systems Laboratory (University of Bristol, UK) created Flu Detector,[15] an online tool that applies Information Retrieval and Statistical Analysis methods to the content of Twitter to nowcast flu rates in the UK.[16]
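The original Google Flu Trends model is commonly described as a single linear regression on the log-odds scale, logit(P) = β0 + β1 · logit(Q), where P is the weekly share of physician visits for influenza-like illness (ILI) and Q is the share of searches that are flu-related. The sketch below fits that relationship by ordinary least squares and uses it to nowcast; the function names are hypothetical, and it omits the query-selection step and everything else the production system did.

```python
import math


def logit(p):
    """Log-odds of a proportion p, with 0 < p < 1."""
    return math.log(p / (1 - p))


def inv_logit(x):
    """Inverse of logit: maps log-odds back to a proportion."""
    return 1 / (1 + math.exp(-x))


def fit_logit_linear(query_fractions, ili_rates):
    """Ordinary least-squares fit of logit(ili) = b0 + b1 * logit(query).

    query_fractions: weekly share of searches that are flu-related.
    ili_rates: matching weekly share of doctor visits for ILI.
    """
    xs = [logit(q) for q in query_fractions]
    ys = [logit(p) for p in ili_rates]
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def nowcast(b0, b1, query_fraction):
    """Estimate this week's ILI rate from this week's query fraction."""
    return inv_logit(b0 + b1 * logit(query_fraction))
```

One limitation of a single fixed mapping like this is that it cannot adapt when search behavior changes for reasons unrelated to illness, which is one motivation for the richer models mentioned above.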
Digital public health surveillance relies largely on a few methods, the most important being search trends on sites like Google and Wikipedia, social media posts on platforms like Facebook and Twitter, and participatory surveillance websites such as Flu Near You and Influenzanet. More broadly, the range of potential data sources suitable for disease surveillance has grown as more areas have become digitized; today, school attendance records, hospital emergency admissions data, and even sales data can be used for syndromic surveillance purposes. Search trends provide indirect data on public health, while social media posts and participatory surveillance provide direct data.[17]
Search aggregates have most frequently been used to track and model influenza. A popular example is Google Flu Trends,[18] which was first released in 2008.[17] Wikipedia has also been used, though it is potentially prone to "noise", as it is a popular source of health information whether a user is ill or not. During the COVID-19 pandemic, a new methodology was developed to model COVID-19 prevalence based on web search activity.[19] This methodology was also used by Public Health England in the United Kingdom as one of its syndromic surveillance endpoints.
One example of social media public health surveillance is HealthTweets, which gathers data from Twitter.[20] Twitter data is considered highly useful for public health research because the platform's data policies allow public access to 1% samples of raw tweets. Tweets can also be geolocated, which makes it possible to model the spread of contagious disease. Twitter is the most widely used social media platform for public health surveillance.[17] During the COVID-19 pandemic, Facebook used aggregated, anonymized data collected from its platforms to provide human movement information to disease models. It also offered users a chance to participate in a disease symptom survey through Carnegie Mellon University.[21]
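The simplest form of geolocated social media surveillance counts posts that mention illness-related terms in each region. In this minimal sketch, tweets are plain dictionaries with hypothetical `text` and `region` keys rather than any platform's actual API format, and the keyword pattern is a stand-in for the statistical methods used by tools such as Flu Detector.

```python
import re
from collections import Counter

# Hypothetical keyword pattern; real systems use curated or learned features.
FLU_TERMS = re.compile(r"\b(flu|influenza|fever|sore throat)\b", re.IGNORECASE)


def flu_mentions_by_region(tweets):
    """Count geolocated tweets that mention a flu-related term, per region.

    tweets: iterable of dicts with a 'text' key and an optional 'region' key
    (missing or None when the tweet carries no location).
    """
    counts = Counter()
    for tweet in tweets:
        region = tweet.get("region")
        # Skip tweets without a location; they cannot be mapped.
        if region and FLU_TERMS.search(tweet["text"]):
            counts[region] += 1
    return counts
```

Keyword counts are noisy: a geolocated post about getting a flu shot would be counted even though its author is not ill, which is the same kind of "noise" noted above for Wikipedia page views.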
Flu Near You and Influenzanet are two examples of crowd-sourced digital surveillance systems. Both sites recruit users to take part in surveys about influenza symptoms. Influenzanet was established in 2009 and operates in ten countries in Europe. Its predecessor was Grote Griepmeting, a Dutch/Belgian platform launched in 2003 and 2004. Flu Near You is used in the US. Another example of a surveillance site is Dengue na Web, used to monitor dengue fever in Bahia, Brazil.[17]
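Crowd-sourced systems of this kind typically turn weekly symptom reports into the share of active participants who meet a case definition for influenza-like illness (ILI). The sketch below assumes a simplified rule (sudden onset plus at least one systemic and one respiratory symptom), loosely modeled on common ILI case definitions; the symptom lists, the `SurveyResponse` record and the function names are hypothetical, not the actual definitions used by Flu Near You or Influenzanet.

```python
from dataclasses import dataclass

# Hypothetical symptom groups for a simplified ILI case rule.
SYSTEMIC = {"fever", "headache", "muscle pain", "malaise"}
RESPIRATORY = {"cough", "sore throat", "shortness of breath"}


@dataclass
class SurveyResponse:
    participant_id: str
    week: int
    symptoms: set
    sudden_onset: bool


def has_ili(response):
    """Assumed rule: sudden onset plus >= 1 systemic and >= 1 respiratory symptom."""
    return (
        response.sudden_onset
        and bool(response.symptoms & SYSTEMIC)
        and bool(response.symptoms & RESPIRATORY)
    )


def weekly_ili_share(responses):
    """Fraction of each week's active participants whose report meets the rule.

    A participant who reports several times in one week is counted once, and
    counts as a case if any of that week's reports meets the rule.
    """
    active, cases = {}, {}
    for r in responses:
        active.setdefault(r.week, set()).add(r.participant_id)
        if has_ili(r):
            cases.setdefault(r.week, set()).add(r.participant_id)
    return {week: len(cases.get(week, set())) / len(ids) for week, ids in active.items()}
```

Dividing by the participants who actually reported that week, rather than by everyone ever registered, is one way to account for volunteers who do not fill in the survey every week.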
Some conditions, especially chronic diseases such as diabetes mellitus, are meant to be routinely managed with frequent laboratory measurements. Since many laboratory results, at least in Europe and the US, are automatically processed by computerized laboratory information systems, the results are relatively easy and inexpensive to collate in special-purpose databases or disease registries. Unlike most syndromic surveillance systems, in which each record is assumed to be independent of the others, laboratory data in chronic conditions can in principle be linked together at the individual patient level. If patient identifiers can be matched, a chronological record of each patient's laboratory results can be analyzed as well as aggregated to the population level.
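The patient-level linking described above can be sketched as grouping results by identifier, sorting each patient's results by date, and then summarizing at both the patient and population levels. This minimal sketch assumes identifiers have already been matched exactly (real record linkage across laboratories is usually harder) and uses a hypothetical tuple layout with ISO 8601 date strings; the function names are illustrative.

```python
from collections import defaultdict
from statistics import mean


def link_by_patient(results):
    """Group lab results by patient identifier into chronological histories.

    results: iterable of (patient_id, test_date, test_name, value) tuples, with
    test_date as an ISO 8601 string ("YYYY-MM-DD") so that it sorts
    chronologically. Assumes identifiers have already been matched exactly.
    """
    histories = defaultdict(list)
    for patient_id, test_date, test_name, value in results:
        histories[patient_id].append((test_date, test_name, value))
    for history in histories.values():
        history.sort()  # oldest result first
    return dict(histories)


def latest_values(histories, test_name):
    """Patient level: each patient's most recent value for one test."""
    latest = {}
    for patient_id, history in histories.items():
        values = [value for _, name, value in history if name == test_name]
        if values:
            latest[patient_id] = values[-1]
    return latest


def population_summary(latest):
    """Population level: number of patients and their mean latest value."""
    return {"patients": len(latest), "mean": mean(latest.values())}
```

Keeping per-patient histories, rather than only population counts, is what lets the same data support both a trend for each patient and an aggregate for the whole registry.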
Laboratory registries allow for the analysis of the incidence and prevalence of the target condition as well as trends in the level of control. For instance, an NIH-funded program called the Vermont Diabetes Information System[22] maintained a registry of laboratory values of diabetic adults in Vermont and northern New York State in the US, with several years of laboratory results on thousands of patients.[23] The data included measures of blood sugar control (glycated hemoglobin A1c), cholesterol, and kidney function (serum creatinine and urine protein), and were used to monitor the quality of care at the patient, practice, and population levels. Since the data contained each patient's name and address, the system was also used to communicate directly with patients when the laboratory data indicated the need for attention. Out-of-control test results generated a letter to the patient suggesting they take action with their medical provider. Tests that were overdue generated reminders to have testing performed. The system also generated reminders and alerts with guideline-based advice for the practice, as well as a periodic roster of each provider's patients and a report card summarizing the health status of the population. Clinical and economic evaluations of the system, including a large randomized clinical trial, demonstrated improvements in adherence to practice guidelines and reductions in the need for emergency department and hospital services, as well as in total costs per patient.[24][25][26] The system has been commercialized and distributed to physicians, insurers, employers, and others responsible for the care of chronically ill patients. It is now being expanded to other conditions such as chronic kidney disease.
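The patient letters and overdue-test reminders described above amount to a few rules applied to each patient's most recent result. In this minimal sketch, the 9% A1C action threshold, the 180-day testing interval, the notice wording and the function name `patient_notices` are all hypothetical; the source does not state the rules the registry actually used.

```python
from datetime import date, timedelta

# Hypothetical rule parameters for illustration; not the cited system's rules.
A1C_LETTER_THRESHOLD = 9.0           # percent; above this, send a letter
TEST_INTERVAL = timedelta(days=180)  # older than this, the test is overdue


def patient_notices(last_a1c: dict, today: date) -> list:
    """Generate patient notices from each patient's most recent A1C result.

    last_a1c: mapping patient_id -> (test_date, value_percent).
    Returns (patient_id, notice) tuples; a patient can receive both notices.
    """
    notices = []
    for patient_id, (test_date, value) in sorted(last_a1c.items()):
        if today - test_date > TEST_INTERVAL:
            notices.append((patient_id, "reminder: A1C test overdue"))
        if value > A1C_LETTER_THRESHOLD:
            notices.append((patient_id, "letter: A1C out of control"))
    return notices
```

The two checks are independent, so a patient whose last result was both old and high receives both a reminder to retest and a letter about the result.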
A similar system, the New York City A1C Registry,[27] is used to monitor the estimated 600,000 diabetic patients in New York City, although, unlike the Vermont Diabetes Information System, it has no provisions for patients to have their data excluded from the NYC database. The NYC Department of Health and Mental Hygiene has linked additional patient services to the registry, such as health information and improved access to health care services. As of early 2012, the registry contained over 10 million test results on 3.6 million individuals. Although it is intended to improve health outcomes and reduce the incidence of the complications of diabetes,[28] a formal evaluation has not yet been done.
In May 2008, the City Council of San Antonio, Texas, approved the deployment of an A1C registry for Bexar County. Authorized by the Texas legislature and the state Health Department, the San Antonio Metropolitan Health District[29] implemented the registry, which drew results from all the major clinical laboratories in San Antonio. The program was discontinued in 2010 due to lack of funds.
Laboratory surveillance differs from population-wide surveillance because it can monitor only patients who are already receiving medical care and therefore having laboratory tests done. It cannot identify patients who have never been tested, so it is more suitable for quality management and care improvement than for epidemiological monitoring of an entire population or catchment area.