The hierarchical editing language for macromolecules (HELM) is a method of describing complex biological molecules. It is a machine-readable notation for representing the composition and structure of peptides, proteins, oligonucleotides, and related small-molecule linkers.[1]
HELM was developed by a consortium of pharmaceutical companies known as the Pistoia Alliance. Development began in 2008, and in 2012 the notation was published openly and free of charge.[2]
The HELM open source project can be found on GitHub.[2]
The need for HELM became apparent as researchers began modeling and computational projects involving molecules and engineered biomolecules of this type. No existing language could accurately describe both the composition of these entities and the complex branching and structure common to them.[1] Protein sequence formats can describe large proteins, and chemical file formats such as Molfiles can describe simple peptides. However, new research biomolecules are so large and complex that chemical formats describe them only with difficulty, while peptide formats are not flexible enough to describe non-natural amino acids and other chemistries.[3]
In HELM, molecules are represented at four levels of a hierarchy: complex polymers, simple polymers, monomers, and atoms.[4]
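The hierarchy of complex polymers, simple polymers, monomers, and atoms can be pictured as nested containers. The following is a minimal, hypothetical data model for illustration, not the API of any HELM toolkit: the class names, the one-letter monomer identifiers, the polymer labels, and the 1-based connection tuples are all assumptions. A complex polymer holds simple polymers and the links between them, each simple polymer is an ordered list of monomers, and each monomer can optionally be expanded to its atoms.

```python
from dataclasses import dataclass, field

# Hypothetical classes illustrating the four HELM levels; not part of any HELM library.

@dataclass
class Atom:
    element: str  # element symbol, e.g. "C"

@dataclass
class Monomer:
    identifier: str  # short unique ID looked up in a monomer library (IDs here are assumed)
    atoms: list[Atom] = field(default_factory=list)  # left empty until atom-level detail is needed

@dataclass
class SimplePolymer:
    polymer_id: str  # label for one chain, e.g. "PEPTIDE1" (assumed naming)
    monomers: list[Monomer] = field(default_factory=list)

    def identifiers(self) -> list[str]:
        return [m.identifier for m in self.monomers]

@dataclass
class ComplexPolymer:
    polymers: list[SimplePolymer] = field(default_factory=list)
    # Each link: (polymer_id, 1-based monomer position, polymer_id, 1-based monomer position)
    connections: list[tuple[str, int, str, int]] = field(default_factory=list)

    def monomer_count(self) -> int:
        return sum(len(p.monomers) for p in self.polymers)

    def linked_monomers(self) -> list[tuple[str, str]]:
        """Resolve each connection to the pair of monomer identifiers it joins."""
        chains = {p.polymer_id: p for p in self.polymers}
        return [
            (chains[a].monomers[i - 1].identifier, chains[b].monomers[j - 1].identifier)
            for a, i, b, j in self.connections
        ]

# Usage: two short peptide chains joined at their cysteine (C) monomers.
chain_a = SimplePolymer("PEPTIDE1", [Monomer("A"), Monomer("C"), Monomer("G")])
chain_b = SimplePolymer("PEPTIDE2", [Monomer("C"), Monomer("K")])
molecule = ComplexPolymer([chain_a, chain_b], connections=[("PEPTIDE1", 2, "PEPTIDE2", 1)])

print(chain_a.identifiers())       # ['A', 'C', 'G']
print(molecule.monomer_count())    # 5
print(molecule.linked_monomers())  # [('C', 'C')]
```

Keeping atoms optional mirrors the idea that most work happens at the monomer level, with atomic detail filled in only where it matters, such as a conjugation site.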
Each monomer is assigned a short unique identifier in an internal HELM monomer database, and HELM strings refer to monomers by these identifiers. The approach is similar to that used in the Simplified Molecular-Input Line-Entry System (SMILES). An exchangeable file format allows data to be shared between companies that have assigned different identifiers to the same monomers.[5]
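One way to picture how a shared format lets two parties reconcile different local identifiers is to key each monomer by its structure rather than by its name. This is a minimal sketch under stated assumptions, not the HELM exchange format itself: the library contents, the identifiers "Aib" and "X12", and the functions `build_id_map` and `translate` are hypothetical, and each monomer record is assumed to carry a SMILES string already written in a common canonical form (in practice the strings would need to be canonicalized with a cheminformatics toolkit before comparison).

```python
# Hypothetical monomer libraries from two companies: local identifier -> structure string.
# Stereochemistry is omitted, and both sides are assumed to use the same canonical SMILES form.
company_a = {"Ala": "CC(N)C(=O)O", "Aib": "CC(C)(N)C(=O)O"}
company_b = {"A": "CC(N)C(=O)O", "X12": "CC(C)(N)C(=O)O", "G": "NCC(=O)O"}

def build_id_map(source: dict[str, str], target: dict[str, str]) -> dict[str, str]:
    """Map each source identifier to the target identifier sharing its structure string."""
    by_structure = {structure: ident for ident, structure in target.items()}
    return {
        ident: by_structure[structure]
        for ident, structure in source.items()
        if structure in by_structure  # monomers unknown to the target are left unmapped
    }

def translate(sequence: list[str], id_map: dict[str, str]) -> list[str]:
    """Rewrite a monomer sequence in the target's identifiers (KeyError if one is unmapped)."""
    return [id_map[ident] for ident in sequence]

id_map = build_id_map(company_a, company_b)
print(id_map)                                    # {'Ala': 'A', 'Aib': 'X12'}
print(translate(["Ala", "Aib", "Ala"], id_map))  # ['A', 'X12', 'A']
```

Keying on structure keeps the mapping independent of either party's naming scheme; a monomer defined by only one side simply stays out of the map.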
ChEMBL announced plans to adopt HELM by 2014.[6] The informatics company BIOVIA developed a modified Molfile format called the Self-Contained Sequence Representation (SCSR). A goal of HELM is to provide a standard that can incorporate such individual attempts to solve the problem, be used universally, and avoid a proliferation of competing standards.[5]
An editor tool is needed to visualize and work with biomolecules at the appropriate level of detail. Such an editor must be able to "zoom out" to show a large molecule at the amino-acid sequence level, then "zoom in" to the atomic level at a particular site of conjugation or derivatization.[7]
The HELM Editor and HAbE (HELM Antibody Editor) are two client tools which may in the future be released as web-based applications.[8]
At a conference in Pistoia, Italy, researchers from Pfizer, AstraZeneca, GlaxoSmithKline, and Novartis formed what came to be known as the Pistoia Alliance. All parties shared an interest in solving problems of data aggregation, data sharing, and analytics in pharmaceutical research. The alliance was incorporated in 2008 and is now composed of informatics experts and researchers from industry, academia, and life-science service organizations.[9]