Bibliomining combines data mining, data warehousing, and bibliometrics to analyze library services.[1][2] The term was coined in 2003 by Scott Nicholson, an assistant professor at the Syracuse University School of Information Studies, to distinguish data mining in a library setting from other types of data mining.[3]
The first step is to build a data warehouse. This is done by compiling information about the resources, such as titles, authors, subject headings, and descriptions of the collections. Next, the demographic surrogate information (anonymized attributes that stand in for individual patrons) is organized. Finally, information about the library itself is added, such as the librarian involved, whether the data came from the reference desk or the circulation desk, and the library's location.
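As an illustration, the three kinds of information described above can be modeled as a small fact table whose rows point to resource, patron-surrogate, and library-context records, in the style of a star schema. This is a minimal Python sketch under stated assumptions: the class names, field names, and raw transaction format are all hypothetical, and a production warehouse would normally live in a database rather than in memory.

```python
from dataclasses import dataclass

# Hypothetical dimension: descriptive data about a library resource.
@dataclass(frozen=True)
class Resource:
    title: str
    author: str
    subject_heading: str
    collection: str

# Hypothetical demographic surrogate: non-identifying patron attributes only.
@dataclass(frozen=True)
class DemographicSurrogate:
    age_group: str
    patron_type: str  # e.g. "undergraduate", "faculty"

# Hypothetical library context: where and how the transaction happened.
@dataclass(frozen=True)
class LibraryContext:
    librarian: str
    service_point: str  # e.g. "reference desk" or "circulation desk"
    branch: str

# One fact row links the three dimensions to a single recorded use.
@dataclass(frozen=True)
class UsageFact:
    resource: Resource
    patron: DemographicSurrogate
    context: LibraryContext
    date: str  # ISO date of the transaction

def load_warehouse(transactions):
    """Turn raw transaction dicts (hypothetical format) into fact rows.

    Identifying fields such as 'patron_name' or 'card_number' may be
    present in the input, but they are never copied into the facts.
    """
    warehouse = []
    for t in transactions:
        warehouse.append(UsageFact(
            resource=Resource(t["title"], t["author"], t["subject"], t["collection"]),
            patron=DemographicSurrogate(t["age_group"], t["patron_type"]),
            context=LibraryContext(t["librarian"], t["service_point"], t["branch"]),
            date=t["date"],
        ))
    return warehouse
```

Because the loader copies only the fields it names, identifying data in the raw input cannot reach later analysis steps by accident.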
Once the warehouse is populated, the data can be processed and analyzed with methods such as online analytical processing (OLAP), dedicated data mining software, or data visualization.
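OLAP tools commonly summarize a warehouse by rolling up (aggregating away some dimensions) and slicing (fixing one dimension to a single value). The Python sketch below shows both operations on a handful of made-up circulation records; the dimension names and data are assumptions for illustration and do not come from any real library system or OLAP product.

```python
from collections import Counter

# Hypothetical flattened fact rows: (subject heading, patron type, branch).
facts = [
    ("Chemistry", "undergraduate", "Main"),
    ("Chemistry", "faculty", "Science"),
    ("History", "undergraduate", "Main"),
    ("Chemistry", "undergraduate", "Science"),
    ("History", "graduate", "Main"),
]

DIMENSIONS = ("subject", "patron_type", "branch")

def roll_up(rows, keep):
    """Count uses grouped by the dimensions in `keep`,
    summing over every other dimension (an OLAP roll-up)."""
    idx = [DIMENSIONS.index(d) for d in keep]
    return Counter(tuple(row[i] for i in idx) for row in rows)

def slice_rows(rows, dimension, value):
    """Keep only rows where one dimension equals `value` (an OLAP slice)."""
    i = DIMENSIONS.index(dimension)
    return [row for row in rows if row[i] == value]

# Uses per subject across all branches and patron types.
by_subject = roll_up(facts, ["subject"])
# Uses per patron type, restricted to the Main branch.
main_by_type = roll_up(slice_rows(facts, "branch", "Main"), ["patron_type"])
```

With this sample data, `by_subject` counts three Chemistry uses and two History uses, and the Main-branch slice shows two undergraduate uses and one graduate use.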
Bibliomining is used to discover patterns in what people are reading and researching, which helps librarians serve their community more precisely. It can also help library directors focus their budgets on resources that are likely to be used. Another use is identifying the times when the library is busiest, so that staffing can be matched to demand. Combining bibliomining with other research techniques, such as focus groups, surveys, and cost-benefit analysis, gives librarians a fuller picture of their patrons and their needs.
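A staffing analysis of this kind can start with something as simple as counting transactions by hour of the day. This Python sketch assumes that each warehouse record carries an ISO 8601 timestamp; the sample timestamps and the function names are invented for illustration.

```python
from collections import Counter
from datetime import datetime

# Hypothetical transaction timestamps pulled from the warehouse.
timestamps = [
    "2024-03-04T09:15:00", "2024-03-04T10:05:00", "2024-03-04T10:40:00",
    "2024-03-04T14:20:00", "2024-03-05T10:10:00", "2024-03-05T14:55:00",
]

def transactions_per_hour(stamps):
    """Count transactions by hour of day (0-23), pooled across all days."""
    return Counter(datetime.fromisoformat(s).hour for s in stamps)

def peak_hours(stamps, n=2):
    """Return the n busiest hours of the day, busiest first."""
    return [hour for hour, _ in transactions_per_hour(stamps).most_common(n)]
```

With the sample data, `peak_hours(timestamps)` returns `[10, 14]`, pointing to the 10:00 and 14:00 hours as candidates for extra staff. A fuller analysis would probably also group by day of the week or by branch.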
Some worry that data mining violates patron privacy. However, when the data are extracted into the warehouse, all personally identifiable information is removed, so the warehouse itself holds no identifying data. The original patron records can then be deleted entirely, leaving no way to link the new data to a particular patron. This can be done in several ways. One, used for database access records, is to take the user's IP address and replace it with a substitute code that still distinguishes one user from another without revealing who they are. Another is to record each item as it is returned to the library together with a "demographic surrogate" of the patron. The demographic surrogate contains no identifying information such as names, library card numbers, or addresses.
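One way to produce the substitute code for an IP address, offered here as an assumption rather than as the method the bibliomining literature prescribes, is a keyed hash (HMAC-SHA256) whose key is generated for a single extraction run and then discarded. Within that run the same address always maps to the same code, so repeated use can still be grouped, but once the key is gone the code cannot be traced back. A plain unkeyed hash would be weaker, because the IPv4 address space is small enough to hash exhaustively. The surrogate function keeps only an age band and patron type; those field names and the 10-year band width are hypothetical.

```python
import hashlib
import hmac
import secrets

def make_pseudonymizer():
    """Return a function mapping IP addresses to stable, opaque codes.

    The secret key exists only inside this closure; discarding the
    returned function after the extraction run discards the key too.
    """
    key = secrets.token_bytes(32)

    def pseudonymize(ip: str) -> str:
        digest = hmac.new(key, ip.encode("utf-8"), hashlib.sha256).hexdigest()
        return digest[:16]  # shortened code; collisions are very unlikely

    return pseudonymize

def demographic_surrogate(patron: dict) -> dict:
    """Keep only coarse, non-identifying attributes of a patron record.

    Name, card number, and address are dropped; the exact age is
    generalized to a hypothetical 10-year band.
    """
    decade = (patron["age"] // 10) * 10
    return {
        "age_band": f"{decade}-{decade + 9}",
        "patron_type": patron["patron_type"],
    }
```

For example, a record for a 34-year-old graduate student becomes `{"age_band": "30-39", "patron_type": "graduate"}`, with nothing that points back to the individual.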
The other concern is that bibliomining provides only indirect, detached data. It shows how patrons use library resources but cannot show whether those resources fully met their needs. Someone might borrow a book on a topic and still not find the information they were seeking. Bibliomining therefore reveals which items are used, not how useful they were or how well a collection serves its patrons. To counteract this, bibliomining should be used in conjunction with other research techniques.