Web browsing history refers to the list of web pages a user has visited, as well as associated metadata such as page title and time of visit. It is usually stored locally by web browsers[1] [2] in order to provide the user with a history list to go back to previously visited pages. It can reflect the user's interests, needs, and browsing habits.[3]
All major browsers have a private browsing mode in which browsing history is not recorded. This is to protect against browsing history being collected by third parties for targeted advertising or other purposes.
Locally stored browsing history can help users rediscover previously visited pages that they only vaguely remember, or pages that are difficult to find because they are located within the deep web. Browsers also use it to power autocompletion in the address bar, allowing quicker and more convenient navigation to frequently visited pages.[4]
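As a rough illustration of history-based autocompletion, the following Python sketch ranks matching history entries by visit count. The `HistoryEntry` record, the `suggest` function, and the sample URLs are hypothetical; actual browsers use more elaborate ranking schemes.

```python
from dataclasses import dataclass


@dataclass
class HistoryEntry:
    url: str
    title: str
    visit_count: int


def suggest(history, typed, limit=5):
    """Return up to `limit` URLs matching the typed text, most visited first."""
    needle = typed.lower()
    if not needle:
        return []
    # A page matches if the typed text appears in its URL or its title.
    matches = [
        e for e in history
        if needle in e.url.lower() or needle in e.title.lower()
    ]
    # Rank by visit count (descending); ties are broken alphabetically by URL.
    matches.sort(key=lambda e: (-e.visit_count, e.url))
    return [e.url for e in matches[:limit]]


# Hypothetical sample history.
history = [
    HistoryEntry("https://example.org/news", "Example News", 42),
    HistoryEntry("https://example.org/weather", "Weather", 7),
    HistoryEntry("https://docs.example.com/", "Example Docs", 15),
]
print(suggest(history, "exam"))
```

Ranking by raw visit count is the simplest possible choice; a scheme that also weighs how recently a page was visited would favor current habits over old ones.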
The retention span of browsing history varies by browser. Mozilla Firefox (desktop version) records history indefinitely by default in a file named places.sqlite, but automatically erases the earliest history when disk space is exhausted, while Google Chrome (desktop version) stores history for ten weeks by default, automatically pruning earlier entries. Chrome once kept an indefinite history file named Archived History, but it was removed, and existing copies were automatically deleted, in version 37, released in September 2014.[5] [6]
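Time-based pruning of the kind described for Chrome can be sketched in a few lines of Python. This is only an illustration: the `prune` function and the in-memory list of `(url, timestamp)` pairs are assumptions, not how any browser actually stores or expires its history.

```python
from datetime import datetime, timedelta

# Assumption: retention window taken from the ten-week figure cited above.
RETENTION = timedelta(weeks=10)


def prune(visits, now, retention=RETENTION):
    """Keep only visits that are no older than `retention` relative to `now`."""
    cutoff = now - retention
    return [(url, ts) for url, ts in visits if ts >= cutoff]


# Hypothetical sample data.
now = datetime(2024, 6, 1)
visits = [
    ("https://example.org/old", datetime(2024, 1, 15)),
    ("https://example.org/recent", datetime(2024, 5, 20)),
]
print(prune(visits, now))  # only the visit from May remains
```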
Browser extensions such as History Trends Unlimited for Google Chrome (desktop version) allow the indefinite local storage of browsing history, exporting into a portable file, and self-analysis of browsing habits and statistics.[7]
Browsing history is not recorded when using the private browsing mode provided by most browsers.
See main article: Targeted advertising. Targeted advertising means presenting users with advertisements made more relevant to them on the basis of their browsing history.[8] A typical example is a user who searches for shoes on shopping websites and then receives shoe advertisements while browsing other websites. One study found that targeted advertising doubles the conversion rate of classical online advertising.[9]
Real-time bidding (RTB) is the mechanism behind targeted advertising. It is a system that automatically bids up the price for presenting advertisements on certain websites. Advertisers decide how much they are willing to pay based on the target audience of the websites. Therefore, more information about users could encourage advertisers to pay higher prices. User information, such as browsing history, is provided to all firms involved in the bidding.[10] Because it is a real-time process, information is usually collected without the consent of the user and transferred in unencrypted form.[11] Users have very limited knowledge of how their information is collected, stored, and used.[12] [13]
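The effect of user information on bids can be illustrated with a toy auction in Python. Everything here is a simplifying assumption: the `Advertiser` record, the bid amounts, and the highest-bid-wins rule are hypothetical and do not describe any real exchange or protocol.

```python
from dataclasses import dataclass


@dataclass
class Advertiser:
    name: str
    base_bid: float     # bid when nothing is known about the user
    target_topic: str   # topic the advertiser wants to reach
    match_bonus: float  # extra amount bid when the user matches the topic


def run_auction(advertisers, user_topics):
    """Return (winner_name, winning_bid) for a single ad impression."""
    bids = []
    for adv in advertisers:
        bid = adv.base_bid
        # Topics inferred from browsing history raise the bid on a match.
        if adv.target_topic in user_topics:
            bid += adv.match_bonus
        bids.append((bid, adv.name))
    winning_bid, winner = max(bids)  # simplification: highest bid wins
    return winner, winning_bid


# Hypothetical advertisers.
advertisers = [
    Advertiser("ShoeShop", base_bid=0.5, target_topic="shoes", match_bonus=1.5),
    Advertiser("TravelCo", base_bid=1.0, target_topic="travel", match_bonus=1.0),
]
print(run_auction(advertisers, {"shoes"}))  # user recently searched for shoes
print(run_auction(advertisers, set()))      # nothing known about the user
```

In this sketch the same impression sells for more when the user's interests are known, which is the incentive for sharing browsing data with bidders.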
Users' responses to targeted advertising depend on whether they know that their information is being collected. If users know ahead of time that information is being collected, targeted advertisements can have a positive effect, leading to a higher intention to click through the link. However, if users have not been informed about the collection, they are more concerned about privacy, which decreases their intention to click through. Meanwhile, when users consider the website reliable, they are more likely to click through the link and accept the personalization service.[14]
To resolve the conflict between privacy and profit, one newly proposed system is pay-per-tracking, in which a broker sits between users and advertisers. Users decide whether to provide their personal information to the broker, which then passes the information on to advertisers. In return, users receive monetary rewards for sharing their personal information. This could help preserve both privacy and tracking efficiency, but would incur extra cost.[15]
See main article: Personalized pricing. Personalized pricing is based on the idea that if a user purchases a certain product frequently or pays a higher price for it, the user can be charged a higher price for that product. Web browsing history can give reliable predictions of users' purchasing behavior. With personalized pricing, firms' profits could increase by 12.99% compared with the status quo.[16]
Web browsing history can be used to facilitate research, for example by revealing people's browsing behavior. When a user browses extensively on one site, the probability of requesting an additional page increases. When a user visits more sites, the likelihood of requesting additional pages decreases.[17]
Web browsing history could also be used to create personal web libraries. A personal web library is created by collecting and analyzing the web browsing history of the user. It could help the user to notice browsing trends, time distribution, and the most frequently used websites. Some users regard this function as helpful.
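The kind of analysis behind such a library can be sketched in Python as follows. The `summarize` function and the sample visits are hypothetical; the sketch assumes the history is available as a list of `(url, timestamp)` pairs.

```python
from collections import Counter
from datetime import datetime
from urllib.parse import urlsplit


def summarize(visits):
    """Return (visits per domain, visits per hour of day) as Counters."""
    by_domain = Counter(urlsplit(url).hostname for url, _ in visits)
    by_hour = Counter(ts.hour for _, ts in visits)
    return by_domain, by_hour


# Hypothetical sample history.
visits = [
    ("https://example.org/a", datetime(2024, 5, 1, 9, 15)),
    ("https://example.org/b", datetime(2024, 5, 1, 9, 40)),
    ("https://news.example.com/", datetime(2024, 5, 1, 21, 5)),
]
by_domain, by_hour = summarize(visits)
print(by_domain.most_common(1))  # most frequently visited domain
print(by_hour.most_common(1))    # busiest hour of the day
```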
Web browsing history stored locally is not published anywhere by default. However, almost all websites are tracked by adware and potentially unwanted programs (PUPs), which collect users' information without their consent.[18] These tracking methods are usually allowed by platforms by default. Web browsing history is also collected by cookies on websites, which can be divided into two kinds: first-party cookies and third-party cookies. Third-party cookies are usually embedded on first-party websites and collect information from them.[19] Third-party cookies are more efficient and better at aggregating data than first-party cookies. While first-party cookies only have access to users' data on one website, third-party cookies can combine data collected from different websites to build a more complete picture of the user. Several third-party cookies can also exist on the same website.
With enough information available, users could be identified without logging into their accounts.[20]
When third-party cookies collect users' web browsing history from multiple websites, the additional information raises additional privacy concerns. For example, a user browses news on one website and searches for medical information on another. When the browsing history from these two websites is combined, the user may be considered interested in news related to medical topics. Browsing history combined from different websites can thus reflect a more complete picture of the person.
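The aggregation step can be illustrated with a short Python sketch. It assumes, hypothetically, that a third party embedded on several sites logs each request as a `(cookie_id, site, topic)` record; the `build_profiles` function and the sample values are illustrative only.

```python
from collections import defaultdict


def build_profiles(requests):
    """Group (cookie_id, first_party_site, topic) records by cookie ID."""
    profiles = defaultdict(lambda: {"sites": set(), "topics": set()})
    for cookie_id, site, topic in requests:
        # The same cookie ID seen on different sites links the visits together.
        profiles[cookie_id]["sites"].add(site)
        profiles[cookie_id]["topics"].add(topic)
    return dict(profiles)


# Hypothetical request log of one third party embedded on two sites.
requests = [
    ("abc123", "news.example", "news"),
    ("abc123", "health.example", "medical"),
    ("zzz999", "news.example", "news"),
]
profiles = build_profiles(requests)
print(sorted(profiles["abc123"]["topics"]))  # topics from both sites, combined
```

Neither site alone sees both topics for the first cookie ID; only the party present on both can join them.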
In 2006, AOL released a large amount of data about its users, including search history. Although no user IDs or names were included, users could be identified based on the browsing history released.[21] For example, user No. 4417749 was identified from her search history over three months.[22]
In 2020, Avast, the maker of a popular antivirus program, was accused of selling browsing history to third parties, and officials of the Czech Republic opened a preliminary investigation into the accusation. The report showed that Avast sold users' data through Jumpshot, a marketing analytics tool. Avast claimed that users' personal information was not included in the leak. However, browsing history can be used to identify users. Avast shut down Jumpshot in response to the issue.[23]
When users perceive a risk to their privacy, their intention to disclose personal information decreases, but their actual behavior is not affected.[24] However, some studies find no significant difference between the intention and the act of disclosing private information, meaning that users who are concerned about privacy do share less personal information and take more protective measures.[25] Users with privacy concerns make less use of online services. They also take more protective measures, such as refusing to provide their information, providing false information, removing their information from the internet, and complaining to people around them or to relevant organizations.[26]
However, it is hard for users to protect their privacy, for several reasons. First, users do not have enough privacy awareness. They are not concerned about being tracked unless it has a substantial impact on them. They are also unaware of the commercial value of their data. It is generally difficult for users to notice privacy policy links on websites, with female users and older users being more likely to ignore these notices. Even when users notice privacy links, their information disclosure may not be affected.[27] In addition, users lack the technical knowledge to protect themselves even when they notice a privacy leak. They are left in a passive position with little room to change the situation.
Most users use ad blockers, delete cookies, and avoid websites that collect personal information in an attempt to protect their web browsing history from being collected.[28] However, most ad blockers do not offer users enough guidance to improve their privacy awareness. More importantly, they rely on standard blacklists and whitelists.[29] These lists do not usually include the websites that are tracking users. Ad blockers can only be effective if these tracking domains are blocked.[30]
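The limitation of list-based blocking can be seen in a minimal Python sketch. The `BLOCKLIST` entries and the `is_blocked` function are hypothetical, and real ad blockers use richer filter rules than plain domain matching.

```python
from urllib.parse import urlsplit

# Hypothetical blocklist entries.
BLOCKLIST = {"tracker.example", "ads.example.net"}


def is_blocked(url, blocklist=BLOCKLIST):
    """Return True if the URL's host, or any parent domain, is on the list."""
    host = urlsplit(url).hostname or ""
    parts = host.split(".")
    # Check the full host first, then each parent domain in turn.
    return any(".".join(parts[i:]) in blocklist for i in range(len(parts)))


print(is_blocked("https://cdn.tracker.example/pixel.gif"))  # listed parent domain
print(is_blocked("https://unlisted-tracker.example/p"))     # not on the list
```

The second request passes through even though it may come from a tracker, because the decision depends entirely on whether the domain appears on the list.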
A number of open-source projects try to protect users' privacy by storing their browsing history on the hard drive instead of in the browser.[31] This addresses issues such as users being unable to view their browsing history once they have deleted it from the browser.