Latent semantic structure indexing explained
Latent semantic structure indexing (LaSSI) is a technique for calculating chemical similarity derived from latent semantic analysis (LSA).
LaSSI was developed at Merck & Co. and patented in 2007[1] by Richard Hull, Eugene Fluder, Suresh Singh, Robert Sheridan, Robert Nachbar and Simon Kearsley.
Overview
LaSSI is similar to LSA in that it involves the construction of an occurrence matrix from a corpus of items and the application of singular value decomposition to that matrix to derive latent features. What differs is that the occurrence matrix represents the frequency of two- and three-dimensional chemical descriptors (rather than natural language terms) found within a chemical database of chemical structures. This process derives latent chemical structure concepts that can be used to calculate chemical similarities and structure–activity relationships for drug discovery.
References
- Hull, R.D., Fluder, E.M., Singh, S.B., Nachbar, R.B., Sheridan, R.P. and Kearsley, S.K. (2001) "Latent semantic structure indexing (LaSSI) for defining chemical similarity." J Med Chem, 2001 Apr 12;44(8):1177–84.
- Hull, R.D., Singh, S.B., Nachbar, R.B., Sheridan, R.P., Kearsley, S.K. and Fluder, E.M. (2001) "Chemical similarity searches using latent semantic structure indexing (LaSSI) and comparison to TOPOSIM." J Med Chem, 2001 Apr 12;44(8):1185–91.
- Singh, S.B., Sheridan, R.P., Fluder, E.M. and Hull, R.D. (2001) "Mining the chemical quarry with joint chemical probes: an application of latent semantic structure indexing (LaSSI) and TOPOSIM (Dice) to chemical database mining." J Med Chem, 2001 May 10;44(10):1564–75.
Notes and References
- http://patft.uspto.gov/netacgi/nph-Parser?patentnumber=7219020 United States Patent: 7219020