Latent semantic structure indexing

Latent semantic structure indexing (LaSSI) is a technique, derived from latent semantic analysis (LSA), for calculating chemical similarity.

LaSSI was developed at Merck & Co. and patented in 2007[1] by Richard Hull, Eugene Fluder, Suresh Singh, Robert Sheridan, Robert Nachbar and Simon Kearsley.

Overview

LaSSI resembles LSA in that both construct an occurrence matrix from a corpus of items and apply singular value decomposition to that matrix to derive latent features. The difference is that in LaSSI the occurrence matrix records the frequency of two- and three-dimensional chemical descriptors (rather than natural-language terms) found in a database of chemical structures. The decomposition yields latent chemical structure concepts that can be used to calculate chemical similarities and structure–activity relationships for drug discovery.
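
The general idea can be illustrated with a minimal Python sketch using NumPy. It is not the patented procedure: the toy counts, the layout (descriptors as rows, compounds as columns), the choice of k, the scaling by singular values, and the use of cosine similarity are all assumptions made for illustration, and the function names are hypothetical.

```python
import numpy as np

def latent_embeddings(counts, k):
    """Map each compound (a column of `counts`) to a k-dimensional latent vector.

    Assumption: plain truncated SVD on raw counts, with no weighting step.
    """
    U, s, Vt = np.linalg.svd(counts, full_matrices=False)
    # Keep the k largest singular values; scale compound coordinates by them.
    return (Vt[:k, :] * s[:k, None]).T  # shape: (n_compounds, k)

def cosine_similarity(a, b):
    """Cosine of the angle between two latent vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical occurrence matrix: rows are chemical descriptors,
# columns are compounds, entries are descriptor frequencies.
counts = np.array([
    [2, 2, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 3, 3],
    [0, 0, 1, 2],
], dtype=float)

emb = latent_embeddings(counts, k=2)
sim_01 = cosine_similarity(emb[0], emb[1])  # compounds with identical descriptors
sim_02 = cosine_similarity(emb[0], emb[2])  # compounds sharing no descriptors
```

In this toy data, compounds 0 and 1 have identical descriptor counts and compounds 0 and 2 share no descriptors, so the first pair comes out as maximally similar in the latent space and the second as unrelated. A real descriptor matrix would be far larger and sparser, and k would be chosen to suit the data.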

References

  1. United States Patent 7219020. http://patft.uspto.gov/netacgi/nph-Parser?patentnumber=7219020