Click tracking explained

Click tracking is the collection of users' click behavior or navigational behavior in order to derive insights and fingerprint users.[1] [2] Click behavior is commonly tracked using server logs, which record click paths and clicked URLs (Uniform Resource Locators).[3] These logs are often kept in a standard format that includes information such as the hostname, date, and username. As technology has developed, newer software has made in-depth analysis of user click behavior possible using hypervideo tools. Because the internet can be considered a risky environment, research also seeks to understand why users click certain links and not others.[4] Other research has explored the user experience of privacy, for example by anonymizing users' personally identifiable information on an individual basis and by improving how data collection consent forms are written and structured.[5] [6]
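As an illustration of such a standard log format, the sketch below parses one line in the NCSA Common Log Format, which records the client host, the identd identity, the authenticated username, a timestamp, the request line, the status code, and the response size. The sample line and the field names in the returned dictionary are illustrative assumptions, not taken from the cited sources.

```python
import re
from datetime import datetime

# NCSA Common Log Format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_clf_line(line):
    """Parse one Common Log Format line into a dict, or return None."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Timestamps look like 10/Oct/2000:13:55:36 -0700
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    # The request line is "METHOD PATH PROTOCOL"; keep the clicked path
    parts = fields["request"].split()
    fields["path"] = parts[1] if len(parts) >= 2 else None
    fields["status"] = int(fields["status"])
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields

sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /apache_pb.gif HTTP/1.0" 200 2326')
record = parse_clf_line(sample)
print(record["host"], record["user"], record["path"], record["status"])
```

Records parsed this way can then be grouped by user and time to reconstruct click paths.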

Click tracking is relevant to several fields, including Human-Computer Interaction (HCI), software engineering, and advertising.[7] Email tracking, link tracking, web analytics, and user research are related concepts and applications of click tracking.[8] A common use of click data is to reorder search engine results so that they better match users' needs.[9] Click tracking draws on modern techniques such as machine learning and data mining.

Tracking and recording technology

See also: Search engine privacy.

Tracking and recording technologies (TRTs) can be split into two categories: institutional TRTs and end-user TRTs.[10] The two differ in who collects and stores the data: institutions in the first case and individual users in the second. Examples of TRTs include radio frequency identification (RFID), credit cards, and store video cameras. Research suggests that although individuals are concerned about privacy, they are less concerned about how TRTs are used in daily life. This discrepancy has been attributed to the public not understanding how information about them is collected.

Eye tracking, or gaze tracking, is another means of obtaining user input. Gaze-tracking technology is especially beneficial for people with motor disabilities.[11] Systems that use gaze tracking often try to mimic cursor and keyboard behavior. In such systems, gaze control is typically separated into its own panel of the interface, which degrades the user experience because users have to switch between that panel and the rest of the interface. Interaction is also difficult because users must first work out how they would complete a task with a keyboard and cursor and then translate those steps into gaze input, which makes tasks take longer. To address this, researchers created a web browser called GazeTheWeb (GTW) that focuses on user experience and integrates gaze input more closely into the interface.

Eye-movement tracking is also applied in usability testing of web applications.[12] However, tracking eye movements usually requires a lab setting with appropriate equipment. Mouse and keyboard activity, by contrast, can be measured remotely, and this can be capitalized on for usability testing: algorithms can use mouse movements to predict and trace where users are looking. Tracking of this kind in a remote environment is known as a remote logging technique.

Browser fingerprinting is another means of identifying and tracking users.[13] In this process, information collected from a user's web browser is combined into a browser fingerprint, which describes the device, its operating system, the browser, and its configuration. HTTP headers, JavaScript, and browser plugins can all be used to build a fingerprint. Browser fingerprints can change over time because of automatic software updates or changes to the user's browser preferences. Measures that increase privacy in this area can reduce functionality by blocking features.
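A minimal sketch of how collected attributes might be combined into a single fingerprint identifier, assuming the attributes have already been gathered (for example from HTTP headers or JavaScript). The attribute names and values, and the use of a SHA-256 hash, are illustrative assumptions rather than a description of any particular tracker. The example also shows why fingerprints drift: changing a single attribute, such as the browser version after an automatic update, produces a different identifier.

```python
import hashlib
import json

def browser_fingerprint(attributes):
    """Hash a dict of browser attributes into a hex identifier."""
    # Serialize with sorted keys so the same attributes always hash the same way
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Hypothetical attributes of the kind exposed by headers and JavaScript
before_update = {
    "user_agent": "ExampleBrowser/120.0 (ExampleOS 14)",
    "accept_language": "en-US,en;q=0.9",
    "screen": "1920x1080x24",
    "timezone": "Europe/Berlin",
    "plugins": ["PDF Viewer"],
}
# The same device after an automatic browser update
after_update = dict(before_update, user_agent="ExampleBrowser/121.0 (ExampleOS 14)")

fp1 = browser_fingerprint(before_update)
fp2 = browser_fingerprint(after_update)
print(fp1[:16], fp2[:16], fp1 == fp2)
```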

Methods of click tracking

User browsing behavior is often tracked using server access logs, which record patterns of clicked URLs, queries, and paths. More modern tracking software, however, uses JavaScript to track cursor behavior. The collected mouse data can be used to create videos that allow user behavior to be replayed and easily analyzed. Hypermedia is used to create visualizations in which behavior such as highlighting, hesitating, and selecting can be monitored. Technology used to record such behavior can also be used to predict it. One of these monitoring tools, smt2ε, collects fifteen cursor features and uses fourteen of them to predict the fifteenth. The software also generates a log analysis that summarizes user cursor activity.
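The sketch below illustrates the kind of cursor features such tools derive from raw mouse samples. It assumes the tracking script has already logged samples as (time in milliseconds, x, y) tuples; the feature set (distance travelled, duration, and number of hesitations) and the 200 ms hesitation threshold are hypothetical choices for illustration, not the actual feature list of the tool described above.

```python
import math

def cursor_features(samples, hesitation_ms=200):
    """Summarize (t_ms, x, y) mouse samples into simple features.

    A 'hesitation' is counted when the cursor stays still for at least
    hesitation_ms between consecutive samples (hypothetical definition).
    """
    distance = 0.0
    hesitations = 0
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        step = math.hypot(x1 - x0, y1 - y0)
        distance += step
        if step == 0 and t1 - t0 >= hesitation_ms:
            hesitations += 1
    duration = samples[-1][0] - samples[0][0] if samples else 0
    return {
        "distance_px": distance,
        "duration_ms": duration,
        "hesitations": hesitations,
    }

trace = [(0, 10, 10), (100, 13, 14), (400, 13, 14), (500, 13, 24)]
print(cursor_features(trace))
```

Feature vectors like this, computed per page visit, are what a replay or prediction tool would summarize or feed to a model.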

In a search session, users can be identified using cookies, the identd protocol, or their IP address. This information can then be stored in a database, and every time a user visits a web page again, their click behavior is appended to the database. DoubleClick Inc. is an example of a company with such a database, which partnered with other companies to aid their web mining. Cookies are an extension to HTTP (Hypertext Transfer Protocol). When a user clicks on a link, their browser sends a request to the associated web server, and the server's response can include a cookie: a small piece of state information that the browser stores and sends back with later requests to that server. Cookies thus act as a “bookmark” for a user's session on a website, storing information such as login details and the pages the user has visited, which helps preserve the state of the session. If a website is served by more than one server, this information must be kept consistent across all of them, so it is transferred between servers. Data collected via cookies can be used to improve websites for all users, and it also aids user profiling for advertising.
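A minimal server-side sketch of cookie-based click tracking using Python's standard `http.cookies` module. It assumes a hypothetical cookie name, `visitor_id`, and an in-memory dictionary standing in for the click database; a real system would persist this data and set attributes such as expiry and security flags. On the first request the server issues a new identifier in a `Set-Cookie` header; on later requests it reads the identifier back from the `Cookie` header and appends each clicked URL to that visitor's history.

```python
import secrets
from collections import defaultdict
from http.cookies import SimpleCookie

click_db = defaultdict(list)  # visitor_id -> clicked URLs (stand-in for a database)

def handle_click(cookie_header, url):
    """Record a click and return (visitor_id, Set-Cookie header or None)."""
    cookies = SimpleCookie(cookie_header or "")
    if "visitor_id" in cookies:
        visitor_id = cookies["visitor_id"].value
        set_cookie = None  # the browser already holds the identifier
    else:
        visitor_id = secrets.token_hex(8)
        new_cookie = SimpleCookie()
        new_cookie["visitor_id"] = visitor_id
        new_cookie["visitor_id"]["path"] = "/"
        set_cookie = new_cookie.output(header="Set-Cookie:")
    click_db[visitor_id].append(url)
    return visitor_id, set_cookie

# First visit: no Cookie header, so the server issues an identifier
vid, header = handle_click(None, "/products")
# Later visit: the browser sends the cookie back
same_vid, _ = handle_click(f"visitor_id={vid}", "/products/42")
print(header)
print(click_db[vid])
```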

When data mining techniques and statistical procedures are applied to web log data, the process is known as log analysis or web usage mining. It helps determine patterns in users' navigational behavior. Observable features include how long users view pages, the length of click paths, and the number of clicks. Web usage mining has three phases. First, the log data is "preprocessed" to identify the users and search sessions it contains. Then, techniques such as association rule mining and clustering are applied to look for patterns, and lastly, these patterns are saved for further analysis. Association rule mining helps find “patterns, associations, and correlations” among the pages users visit in a search session. Sequential pattern discovery extends association rule mining by also accounting for time, such as the order of page views within a given time period. Classification assigns pages to predefined groups that share similar qualities.
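The first two phases can be sketched in a few lines. This sketch assumes the log has already been reduced to (user, timestamp in seconds, page) records. It splits each user's clicks into sessions whenever 30 minutes pass without activity (a common heuristic, assumed here rather than taken from the cited sources) and then computes the support and confidence of a simple association rule between two pages across sessions.

```python
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # seconds of inactivity that end a session (assumed)

def sessionize(records):
    """Preprocessing: group (user, t, page) records into per-user sessions."""
    by_user = defaultdict(list)
    for user, t, page in sorted(records, key=lambda r: (r[0], r[1])):
        by_user[user].append((t, page))
    sessions = []
    for clicks in by_user.values():
        current = [clicks[0][1]]
        for (prev_t, _), (t, page) in zip(clicks, clicks[1:]):
            if t - prev_t > SESSION_TIMEOUT:
                sessions.append(current)
                current = []
            current.append(page)
        sessions.append(current)
    return sessions

def rule_metrics(sessions, antecedent, consequent):
    """Pattern discovery: support and confidence of antecedent -> consequent."""
    with_a = [s for s in sessions if antecedent in s]
    with_both = [s for s in with_a if consequent in s]
    support = len(with_both) / len(sessions) if sessions else 0.0
    confidence = len(with_both) / len(with_a) if with_a else 0.0
    return support, confidence

log = [
    ("u1", 0, "/home"), ("u1", 60, "/laptops"), ("u1", 120, "/cart"),
    ("u1", 7200, "/home"),  # new session after a two-hour gap
    ("u2", 30, "/laptops"), ("u2", 90, "/cart"),
    ("u3", 10, "/laptops"), ("u3", 50, "/help"),
]
sessions = sessionize(log)
print(sessions)
print(rule_metrics(sessions, "/laptops", "/cart"))
```

Support measures how common the page pair is across all sessions; confidence measures how often sessions containing the first page also contain the second.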

Examples of tools that can be used for click analytics include In-Page Analytics (part of Google Analytics), ClickHeat, and Crazy Egg.[14] These tools create a visualization of user click data on a webpage. ClickHeat and Crazy Egg show the density of user clicks using colors, and all of these tools allow webpage visitors to be segmented into groups by qualities such as being a mobile user or using a particular browser. Each group's data can then be analyzed for further insight.
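Heatmap tools of this kind essentially count clicks per region of the page and map the counts to colors. A minimal sketch of the counting step, assuming click coordinates in pixels and a hypothetical 100-pixel grid cell; segmenting visitors (for example, mobile versus desktop) amounts to running the same function on a filtered subset of clicks.

```python
from collections import Counter

def click_density(clicks, cell_px=100):
    """Count clicks per grid cell; keys are (column, row) indices."""
    return Counter((x // cell_px, y // cell_px) for x, y, _ in clicks)

# (x, y, device) click records; the device label allows segmentation
clicks = [
    (120, 40, "mobile"), (150, 80, "desktop"), (130, 60, "desktop"),
    (620, 410, "desktop"), (110, 20, "mobile"),
]

all_density = click_density(clicks)
mobile_density = click_density([c for c in clicks if c[2] == "mobile"])
hottest_cell, count = all_density.most_common(1)[0]
print(hottest_cell, count, mobile_density)
```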

Click behavior

One of the main factors users consider when clicking links is a link's position in a list of results: the closer a link is to the top, the more likely users are to select it.[15] On a news aggregator website, users who had a personal connection to a subject tended to click on articles about it more frequently. Pictures, position, and the specific individuals featured in the news content also influenced users' decisions more heavily than the source of the news, which was deemed less important.

Click attitude and click intention play a large role in user click behavior. In one study, research participants were shown positive and negative insurance advertisement photographs, and emotion was found to be positively associated with both click intention and click attitude. The researchers also observed that click attitude affects click intention, and that positive emotion has a greater effect on click attitude than negative emotion does.

The internet can be considered a risky environment because of the abundance of possible cybersecurity attacks and the prevalence of malware, so whenever individuals use the internet, they must decide whether or not to click on the links they encounter. A 2018 study found that users tend to click on more URLs on websites they are familiar with; cybercriminals exploit this tendency, which can lead to personal information being compromised. Trust was accordingly found to increase click-through intention. When shown Google Chrome warnings, people click through them 70% of the time, and in the process they also tend to adjust default computer settings. Users were also found to better recognize malware risks when there is a greater potential for their personal information to be revealed.

Relevance of search results

The pages that users view during a particular search session constitute click data. Such data can be used to improve search results through two kinds of feedback: explicit and implicit. Explicit feedback is when users indicate which pages are relevant to their search query, while implicit feedback is when user behavior is interpreted to determine the relevance of results. User actions on a webpage that can be interpreted this way include bookmarking, saving, or printing the page. By collecting click data from a few individuals, the relevance of results for given queries can be improved for all users. In a search session, users indicate with their clicks which documents they are more interested in, which signals what is relevant to the search. The click data most useful for determining relevance is often the last page viewed in a session rather than all of the pages clicked. Click data from outside search sessions can also be used to improve the accuracy of results.
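A minimal sketch of using the last viewed page as implicit relevance feedback, assuming each search session has already been reduced to a query and the ordered list of result URLs the user clicked. Aggregating last clicks over many users yields a per-query relevance score; the data and the simple counting scheme are illustrative assumptions rather than the method of the cited study.

```python
from collections import Counter, defaultdict

def last_click_votes(sessions):
    """Credit the last clicked result in each session as the relevant one."""
    votes = defaultdict(Counter)  # query -> Counter of URLs
    for query, clicked_urls in sessions:
        if clicked_urls:
            votes[query][clicked_urls[-1]] += 1
    return votes

sessions = [
    ("python tutorial", ["a.example/intro", "b.example/guide"]),
    ("python tutorial", ["b.example/guide"]),
    ("python tutorial", ["c.example/video", "a.example/intro"]),
    ("python tutorial", []),  # no clicks: contributes no feedback
]
votes = last_click_votes(sessions)
print(votes["python tutorial"].most_common())
```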

The results for a given query are usually subject to positional bias.[16] This is because users tend to select links at the top of result lists. However, a top position does not mean that a result is the most relevant, especially since relevance can change over time. In one machine learning approach to improving result order, human editors first supply an initial rank for each result to the algorithm. Live user click feedback, in the form of click-through rates (CTR) tracked during search sessions, is then used to rerank the results, so that their order reflects the relevance indicated by users in real time.
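One simple way to combine an editorial prior with live click feedback is to smooth each result's observed CTR toward a prior CTR derived from its editorial rank. The sketch below assumes a hypothetical prior of 1/(rank + 1) and a smoothing strength of 20 pseudo-impressions; it illustrates the general idea of CTR-based reranking and is not the online learning framework of the cited study. Raw CTR is itself affected by positional bias, which this sketch does not correct for.

```python
def rerank(results, prior_strength=20):
    """Rerank results by CTR smoothed toward an editorial prior.

    results: list of (url, editorial_rank, clicks, impressions), rank 1 = best.
    The prior CTR of 1 / (rank + 1) is a hypothetical choice.
    """
    def score(item):
        _, rank, clicks, impressions = item
        prior_ctr = 1 / (rank + 1)
        # Behaves as if prior_strength impressions were observed at the prior CTR
        return (clicks + prior_strength * prior_ctr) / (impressions + prior_strength)
    return [url for url, *_ in sorted(results, key=score, reverse=True)]

results = [
    ("old-news.example", 1, 5, 200),     # ranked first by editors, rarely clicked now
    ("fresh-news.example", 3, 90, 200),  # ranked lower, but users click it often
    ("other.example", 2, 20, 200),
]
print(rerank(results))
```

With no impressions yet, the score equals the prior, so the editorial order is kept until click data accumulates.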

Click dwell time and click sequence information can also be used to improve the relevance of search results.[17] Click dwell time is how long a user takes to return to the search engine results page (SERP) after clicking on a particular result, and it can indicate how satisfied the user is with that result. Eye-tracking research indicates that users do a great deal of non-sequential viewing when looking at search results, so click models that assume users scan results strictly from the top down cannot account for users revisiting pages.
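Dwell time can be computed from the timestamps of a click and of the user's next return to the SERP. This sketch assumes such (result, click time, return time) events are available and labels a click as satisfied when dwell time reaches a hypothetical 30-second threshold; a click with no return to the SERP is treated as satisfied. Both rules are illustrative assumptions, not the model of the cited work. Note that the example clicks are not in top-down result order.

```python
SATISFIED_DWELL_S = 30  # hypothetical threshold in seconds

def label_clicks(events):
    """events: list of (result_url, click_time_s, return_time_s or None)."""
    labels = []
    for url, clicked_at, returned_at in events:
        if returned_at is None:
            dwell = None  # the user never came back to the SERP
            satisfied = True
        else:
            dwell = returned_at - clicked_at
            satisfied = dwell >= SATISFIED_DWELL_S
        labels.append((url, dwell, satisfied))
    return labels

events = [
    ("r1.example", 0, 5),      # quick bounce back to the SERP
    ("r3.example", 8, 70),     # long read, then returned
    ("r2.example", 75, None),  # last click, no return
]
for row in label_clicks(events):
    print(row)
```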

Extensions

Advertising

Click tracking can reduce the costs of mismatches between supply and demand.[18] Huang et al. define strategic customers as “forward looking” individuals who know that their clicks are being tracked and expect companies to respond with appropriate business decisions. In their study, the researchers used customers' clickstream data to observe their preferences and desired product quantities. Noisy clicks are clicks by customers who do not actually go on to buy the product; they make the resulting advance demand information (ADI) imperfect.

Click tracking can be used in advertising, but it also has the potential to be misused. Publishers display advertisements on their websites and are paid according to the amount of traffic, measured as a number of clicks, that they send to advertisers' websites. Click fraud is when publishers fake clicks to generate revenue for themselves. In the 2012 Fraud Detection in Mobile Advertising (FDMA) competition, teams were tasked with using data mining and machine learning techniques to identify “fraudulent publishers” in a given dataset. A successful algorithm was able to detect and use the regular morning and night patterns in click traffic: a high density of clicks in the periods between these usual patterns was often an indicator of a fraudulent publisher.
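The hour-of-day idea can be sketched as a single feature: the share of a publisher's clicks that fall outside the usual traffic peaks. The peak hours, the 0.5 threshold, and the click data below are hypothetical; a real detector would learn from many features rather than apply a fixed rule.

```python
PEAK_HOURS = set(range(7, 10)) | set(range(19, 23))  # hypothetical morning and night peaks

def off_peak_ratio(click_hours):
    """Fraction of clicks whose hour of day lies outside the peak hours."""
    if not click_hours:
        return 0.0
    off_peak = sum(1 for h in click_hours if h not in PEAK_HOURS)
    return off_peak / len(click_hours)

def flag_publishers(clicks_by_publisher, threshold=0.5):
    """Return publishers whose off-peak click share exceeds the threshold."""
    return sorted(
        pub for pub, hours in clicks_by_publisher.items()
        if off_peak_ratio(hours) > threshold
    )

clicks_by_publisher = {
    "pub_a": [8, 8, 9, 20, 21, 21, 13],    # mostly peak-hour traffic
    "pub_b": [2, 3, 3, 4, 14, 15, 15, 8],  # dense clicks between the peaks
}
print(flag_publishers(clicks_by_publisher))
```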

Web personalization is the process of tailoring website content to individual users based on their “user navigational behavior” and interests. It is useful in e-commerce. Web personalization proceeds in distinct steps, the first of which is “user profiling”: the user is modeled through their click behavior, preferences, and qualities. User profiling is followed by “log analysis and web usage mining.”
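A minimal sketch of the user profiling step, assuming each click has already been mapped to a content category. The profile is simply the normalized share of clicks per category, and the personalization rule (show the user's top categories first) is a hypothetical illustration of how later steps might use it.

```python
from collections import Counter

def build_profile(clicked_categories):
    """Turn a list of clicked categories into normalized interest weights."""
    counts = Counter(clicked_categories)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()} if total else {}

def personalize(sections, profile, top_n=2):
    """Order page sections by the user's interest weights (0 if unseen)."""
    ranked = sorted(sections, key=lambda s: profile.get(s, 0.0), reverse=True)
    return ranked[:top_n]

profile = build_profile(["shoes", "shoes", "jackets", "shoes", "bags"])
print(profile)
print(personalize(["bags", "jackets", "shoes", "hats"], profile))
```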

Email

Phishing is usually carried out through email, and when a user clicks on a link in a phishing email, their information can be leaked to particular websites.[19] Spear-phishing is a more “targeted” form of phishing in which information about the user is used to personalize emails and entice them to click. Some phishing emails also contain other links and attachments, and once these are clicked or downloaded, users' privacy can be compromised. Lin et al. studied which psychological “weapons of influence” and “life domains” affect users most in phishing attempts, and found that scarcity was the most influential weapon of influence and the legal domain was the most influential life domain. Age is also an important factor in determining who is more susceptible to clicking on phishing attempts.

When an email virus infects a computer, it finds email addresses on it and sends copies of itself to them. These emails usually contain an attachment and are sent to many recipients.[20] This behavior differs from that of a normal user account, because users tend to communicate regularly with a particular network of contacts. Researchers studied how the Email Mining Toolkit (EMT) could detect viruses by modeling such email account behavior, and found that fast, broad viral propagation was easier to detect than slow, gradual propagation.
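The contrast between a user's usual contact network and a virus's mass mailing can be sketched as a simple anomaly check. This sketch assumes a history of (hour, recipients) records for one account and flags any hour in which the account emailed more than a hypothetical number of recipients it had never contacted before. It is a simplified illustration of behavior-based detection, not EMT's actual models; consistent with the finding above, it catches fast, broad bursts far more easily than a slow virus that contacts only a few new addresses per hour.

```python
def flag_bursts(history, max_new_per_hour=5):
    """Flag hours in which an account mails many never-before-seen recipients.

    history: list of (hour, [recipient, ...]) in chronological order.
    """
    known = set()
    flagged = []
    for hour, recipients in history:
        new = set(recipients) - known
        if len(new) > max_new_per_hour:
            flagged.append(hour)
        known.update(recipients)
    return flagged

usual = ["alice@example.com", "bob@example.com", "carol@example.com"]
history = [
    (0, usual[:2]),
    (1, usual[1:]),
    # Virus-like burst: one message fanned out to many harvested addresses
    (2, [f"contact{i}@example.org" for i in range(40)]),
    # Slow spread: two new addresses per hour stays under the threshold
    (3, ["x1@example.org", "x2@example.org"]),
]
print(flag_bursts(history))
```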

Email senders use email tracking to learn which emails users have opened.[21] Merely opening an email can leak a user's email address to third parties, and clicking on links within the email can leak it to an even larger number of third parties. Each time a user opens such an email, their information can be sent again to third parties, including ones that have already received their address. Many third-party email trackers are also involved in web tracking, which enables further user profiling.
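Open tracking is commonly implemented with a tiny remote image, often called a tracking pixel, whose URL carries an identifier for the recipient; when the mail client loads the image, the sender's server learns that the email was opened. The sketch below assumes a hypothetical tracking endpoint (`track.example.com`) and a hypothetical sender key, and uses an HMAC of the lowercased address as the recipient identifier. Because the token is deterministic, the same recipient is recognizable across every open and every campaign.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

SECRET_KEY = b"hypothetical-sender-key"
TRACKING_BASE = "https://track.example.com/open.gif"  # hypothetical endpoint

def recipient_token(email):
    """Derive a stable per-recipient identifier from the address."""
    return hmac.new(SECRET_KEY, email.lower().encode(), hashlib.sha256).hexdigest()[:16]

def pixel_html(email, campaign):
    """Return the <img> tag embedded in the outgoing message."""
    query = urlencode({"r": recipient_token(email), "c": campaign})
    return f'<img src="{TRACKING_BASE}?{query}" width="1" height="1" alt="">'

def record_open(request_url, opens):
    """Server side: log an open when the pixel URL is requested."""
    params = parse_qs(urlparse(request_url).query)
    opens.append((params["r"][0], params["c"][0]))

html = pixel_html("Jane.Doe@example.net", "spring-sale")
pixel_url = html.split('"')[1]
opens = []
record_open(pixel_url, opens)  # simulates the mail client loading the image
print(html)
print(opens)
```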

Privacy

See also: Information privacy.

Many privacy-protection models anonymize data only after it has been sent to a server and stored in a database. Users' personally identifiable information is therefore still collected, and the collection process relies on users trusting those servers. Researchers have studied giving users control over what information is sent from their mobile devices, and over how that information is represented in databases, in the context of trajectory data (sequences of locations over time); they created a system that implements this approach, which gives users the potential to increase their privacy.
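A minimal sketch of the user-centric idea that data can be protected before it leaves the device: the user chooses a precision level, and location points are coarsened on the device before upload. The rounding scheme and the precision setting are hypothetical illustrations, not the method of the cited system.

```python
def coarsen_trajectory(points, decimals):
    """Round (timestamp, lat, lon) points on-device before they are sent.

    decimals is chosen by the user: fewer decimals means coarser locations.
    Roughly, 2 decimals of latitude is about 1 km; 4 is about 10 m.
    """
    coarse = []
    for t, lat, lon in points:
        point = (t, round(lat, decimals), round(lon, decimals))
        # Drop consecutive points that rounding makes indistinguishable
        if not coarse or point[1:] != coarse[-1][1:]:
            coarse.append(point)
    return coarse

trip = [
    (0, 48.858370, 2.294481),
    (60, 48.858912, 2.295013),
    (120, 48.861003, 2.297620),
]
print(coarsen_trajectory(trip, decimals=2))
```

Only the coarsened list would be transmitted, so the server never stores the precise trajectory.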

When user privacy may be affected, users are often presented with consent forms. The type of interaction a form requires can affect how much information users retain from it. Karegar et al. compared the simple agree/disagree format with forms that incorporate checkboxes, drag and drop (DAD), and swipe features. When testing what information users would agree to disclose under each consent form format, the researchers observed that users presented with DAD forms had a greater number of eye fixations on the consent form.

When a third party is embedded in a first-party website or mobile application, the user's information is sent to the third party whenever they visit that website or application.[22] Third-party tracking raises more privacy concerns than first-party tracking because it allows records of a particular user from many websites or applications to be combined, yielding more detailed user profiles. Binns et al. found that among 5000 popular websites, the top two websites alone had 2000 trackers. Of these 2000 embedded trackers, 253 were used in 25 other websites. The researchers evaluated the reach of third-party trackers by their contact with users rather than with websites, so the most "popular" trackers were those that received information about the largest number of people rather than those embedded in the most first parties. Google and Facebook were found to be the largest and second-largest web trackers, and Google and Twitter the largest and second-largest mobile trackers.
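The difference between measuring a tracker's reach by websites and by users can be shown with a toy example. This sketch assumes hypothetical data mapping each website to its embedded trackers and each user to the sites they visit; a tracker's user reach is the number of distinct users who visited at least one site embedding it. Here the tracker found on more sites reaches fewer people.

```python
def reach_by_sites(site_trackers):
    """Number of websites embedding each tracker."""
    reach = {}
    for trackers in site_trackers.values():
        for tracker in trackers:
            reach[tracker] = reach.get(tracker, 0) + 1
    return reach

def reach_by_users(site_trackers, user_visits):
    """Number of distinct users whose data each tracker could receive."""
    exposed = {}
    for user, sites in user_visits.items():
        for site in sites:
            for tracker in site_trackers.get(site, ()):
                exposed.setdefault(tracker, set()).add(user)
    return {tracker: len(users) for tracker, users in exposed.items()}

site_trackers = {
    "popular.example": ["tracker_a"],
    "niche1.example": ["tracker_b"],
    "niche2.example": ["tracker_b"],
    "niche3.example": ["tracker_b"],
}
user_visits = {
    "u1": ["popular.example"],
    "u2": ["popular.example"],
    "u3": ["popular.example", "niche1.example"],
    "u4": ["popular.example"],
}
print(reach_by_sites(site_trackers))
print(reach_by_users(site_trackers, user_visits))
```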

Notes and References

  1. Leiva, Luis (November 2013). "Web browsing behavior analysis and interactive hypervideo". ACM Transactions on the Web. 7 (4): 1–28. doi:10.1145/2529995.2529996. hdl:10251/39081. S2CID 14720910.
  2. Eirinaki, Magdalini (2003). "Web mining for web personalization". ACM Transactions on Internet Technology. 3: 1–27. doi:10.1145/643477.643478. S2CID 2880491.
  3. Kristol, David (2001). "HTTP Cookies: Standards, privacy, and politics". ACM Transactions on Internet Technology. 1: 151–198. doi:10.1145/502152.502153. arXiv:cs/0105018. Bibcode:2001cs........5018K. S2CID 1848140.
  4. Ogbanufe, Obi (2018). "'Just how risky is it anyway?' The role of risk perception and trust on click-through intention". Information Systems Management. 35 (3): 182–200. doi:10.1080/10580530.2018.1477292. S2CID 49411483.
  5. Karegar, Farzaneh (2020). "The Dilemma of User Engagement in Privacy Notices". ACM Transactions on Privacy and Security. 23: 1–38. doi:10.1145/3372296. S2CID 211263964.
  6. Romero-Tris, Cristina (2018). "Protecting Privacy in Trajectories with a User-Centric Approach". ACM Transactions on Knowledge Discovery from Data. 12 (6): 1–27. doi:10.1145/3233185. S2CID 52182075.
  7. Oentaryo, Richard (2014). "Detecting click fraud in online advertising: A data mining approach". The Journal of Machine Learning Research. 15: 99–140. ACM.
  8. Wu, ChienHsing (2018). "Emotion Induction in Click Intention of Picture Advertisement: A Field Examination". Journal of Internet Commerce. 17 (4): 356–382. doi:10.1080/15332861.2018.1463803. S2CID 158798317.
  9. Jung, Seikyung (2007). "Click data as implicit relevance feedback in web search". Information Processing & Management. 43 (3): 791–807. doi:10.1016/j.ipm.2006.07.021.
  10. Nguyen, David (2009). "Information privacy in institutional and end-user tracking and recording technologies". Personal and Ubiquitous Computing. 14: 53–72. doi:10.1007/s00779-009-0229-4. S2CID 8546306.
  11. Menges, Raphael (2019). "Improving User Experience of Eye Tracking-Based Interaction". ACM Transactions on Computer-Human Interaction. 26 (6): 1–46. doi:10.1145/3338844. S2CID 207834246.
  12. Boi, Paolo (2016). "Reconstructing User's Attention on the Web through Mouse Movements and Perception-Based Content Identification". ACM Transactions on Applied Perception. 13 (3): 1–21. doi:10.1145/2912124. S2CID 15346882.
  13. Laperdrix, Pierre (2020). "Browser Fingerprinting: A Survey". ACM Transactions on the Web. 14: 1–33. doi:10.1145/3386040. S2CID 145051810.
  14. Farney, Tabatha (2011). "Click Analytics: Visualizing Website Use Data". Information Technology and Libraries. 30 (3): 141–148. doi:10.6017/ital.v30i3.1771.
  15. Kessler, Sabrina Heike (2019). "Why do we click? Investigating reasons for user selection on a news aggregator website". European Journal of Communication Research. 44: 225–247.
  16. Moon, Taesup (2012). "An Online Learning Framework for Refining Recency Search Results with User Click Feedback". ACM Transactions on Information Systems. 30 (4): 1–28. doi:10.1145/2382438.2382439. S2CID 15825473.
  17. Liu, Yiqun (2016). "Time-Aware Click Model". ACM Transactions on Information Systems. 35 (3): 1–24. doi:10.1145/2988230. S2CID 207243041.
  18. Huang, Tingliang (2011). "The Promise of Strategic Customer Behavior: On the Value of Click Tracking". Production and Operations Management. 22 (3): 489–502. doi:10.1111/j.1937-5956.2012.01386.x.
  19. Lin, Tian (2019). "Susceptibility to Spear-Phishing Emails: Effects of Internet User Demographics and Email Content". ACM Transactions on Computer-Human Interaction. 26 (5): 1–28. doi:10.1145/3336141. PMID 32508486. PMC 7274040.
  20. Stolfo, Salvatore (2006). "Behavior-based modeling and its application to Email analysis". ACM Transactions on Internet Technology. 6 (2): 187–221. doi:10.1145/1149121.1149125. S2CID 13438014.
  21. Englehardt, Steven (2018). "I never signed up for this! Privacy implications of email tracking". Proceedings on Privacy Enhancing Technologies. 1: 109–126. doi:10.1515/popets-2018-0006. S2CID 41532115.
  22. Binns, Reuben (2018). "Measuring Third-party Tracker Power across Web and Mobile". ACM Transactions on Internet Technology. 18 (4): 1–22. doi:10.1145/3176246. arXiv:1802.02507. S2CID 3603118.