Office Open XML file formats explained

See main article: Office Open XML.

Office Open XML Document
Extension: .docx, .docm
MIME type: application/vnd.openxmlformats-officedocument.wordprocessingml.document[1]
Owner: Microsoft, Ecma, ISO/IEC
Genre: Document file format
Extended from: XML, DOC, WordProcessingML
Standard: ECMA-376, ISO/IEC 29500
URL: ECMA-376, ISO/IEC 29500:2008

Office Open XML Presentation
Extension: .pptx, .pptm
MIME type: application/vnd.openxmlformats-officedocument.presentationml.presentation[2]
Owner: Microsoft, Ecma, ISO/IEC
Genre: Presentation
Extended from: XML, PPT
Standard: ECMA-376, ISO/IEC 29500
URL: ECMA-376, ISO/IEC 29500:2008

Office Open XML Workbook
Extension: .xlsx, .xlsm
MIME type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet[3]
Owner: Microsoft, Ecma, ISO/IEC
Genre: Spreadsheet
Extended from: XML, XLS, SpreadsheetML
Standard: ECMA-376, ISO/IEC 29500
URL: ECMA-376, ISO/IEC 29500:2008

The Office Open XML file formats are a set of file formats that can be used to represent electronic office documents. There are formats for word processing documents, spreadsheets, and presentations, as well as specific formats for material such as mathematical formulas, graphics, and bibliographies.

The formats were developed by Microsoft and first appeared in Microsoft Office 2007. They were standardized between December 2006 and November 2008, first by the Ecma International consortium, where they became ECMA-376, and subsequently, after a contentious standardization process, by the ISO/IEC's Joint Technical Committee 1, where they became ISO/IEC 29500:2008.

Container

See main article: Open Packaging Conventions.

Office Open XML documents are stored in Open Packaging Conventions (OPC) packages, which are ZIP files containing XML and other data files, along with a specification of the relationships between them.[4] Depending on the type of the document, the packages have different internal directory structures and names. An application will use the relationships files to locate individual sections (files), with each having accompanying metadata, in particular MIME metadata.

A basic package contains an XML file called [Content_Types].xml at the root, along with three directories: _rels, docProps, and a directory specific to the document type (for example, a .docx word processing package has a word directory). The word directory contains the document.xml file, which holds the core content of the document.

  • [Content_Types].xml: This file provides MIME type information for parts of the package, using defaults for certain file extensions and overrides for parts specified by IRI.
  • _rels: This directory contains relationships for the files within the package. To find the relationships for a specific file, look for the _rels directory that is a sibling of the file, and then for a file that has the original file name with a .rels appended to it. For example, if the content types file had any relationships, there would be a file called [Content_Types].xml.rels inside the _rels directory.
  • _rels/.rels: This file holds the package-level relationships, and applications look here first. Viewed in a text editor, it lists each relationship defined for the package. In a minimal document containing only the basic document.xml file, the relationships listed are those for the metadata and for document.xml.
  • docProps/core.xml: This file contains the core properties for any Office Open XML document.
  • word/document.xml: This file is the main part for any Word document.
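The layout and the .rels naming rule described above can be sketched in Python with the standard zipfile module. This is only an illustration: the helper names rels_part_for and build_minimal_package are hypothetical, and the parts are written with placeholder content, so the result is not a document that a word processor would open.

```python
import io
import posixpath
import zipfile


def rels_part_for(part_name: str) -> str:
    """Return the relationships part that belongs to a part.

    The rule follows the text: a sibling "_rels" directory holding a file
    named after the original file with ".rels" appended. An empty part
    name stands for the package itself and yields "_rels/.rels".
    """
    directory, filename = posixpath.split(part_name)
    return posixpath.join(directory, "_rels", filename + ".rels")


def build_minimal_package() -> bytes:
    """Write the basic parts into an in-memory ZIP (contents are stubs)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        for name in (
            "[Content_Types].xml",
            "_rels/.rels",
            "docProps/core.xml",
            "word/document.xml",
        ):
            package.writestr(name, "<placeholder/>")
    return buffer.getvalue()


package_bytes = build_minimal_package()
with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
    part_names = sorted(package.namelist())

for name in part_names:
    print(name)
print(rels_part_for("word/document.xml"))
```

Because an OPC package is an ordinary ZIP file, the same namelist() call can be pointed at an existing .docx, .xlsx, or .pptx file to inspect its parts.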

    Relationships

    An example relationship file (word/_rels/document.xml.rels) is:

    As such, images referenced in the document can be found in the relationship file by looking for all relationships that are of type http://schemas.microsoft.com/office/2006/relationships/image. To change the image used, edit the relationship.
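As a rough sketch of that lookup, the following Python reads a relationships file and keeps the entries whose type ends in "image". The sample XML, the helper name relationships_of_kind, and the target file names are assumptions made for illustration. The sample uses the schemas.openxmlformats.org URIs that the published standard is understood to use; because the function compares only the last path segment of the type, the URI quoted above would match as well.

```python
import xml.etree.ElementTree as ET

# Namespace of the relationships vocabulary (assumed: published OPC form).
RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"

# Hypothetical content of word/_rels/document.xml.rels.
SAMPLE_RELS = """
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1"
      Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"
      Target="media/image1.png"/>
  <Relationship Id="rId2"
      Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink"
      Target="http://www.example.com/" TargetMode="External"/>
</Relationships>
""".strip()


def relationships_of_kind(rels_xml: str, kind: str) -> list[dict]:
    """Return relationships whose Type URI ends with the given segment."""
    root = ET.fromstring(rels_xml)
    found = []
    for rel in root.iter(f"{{{RELS_NS}}}Relationship"):
        # Compare only the final path segment of the type URI.
        if rel.get("Type", "").rsplit("/", 1)[-1] == kind:
            found.append({
                "id": rel.get("Id"),
                "target": rel.get("Target"),
                "external": rel.get("TargetMode") == "External",
            })
    return found


images = relationships_of_kind(SAMPLE_RELS, "image")
print(images)
```

Changing the image then amounts to rewriting the Target attribute of the matching Relationship element and saving the file back into the package.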

    The following code shows an example of inline markup for a hyperlink:

    In this example, the Uniform Resource Locator (URL) is in the Target attribute of the Relationship referenced through the relationship Id, "rId2" in this case. Linked images, templates, and other items are referenced in the same way.
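The indirection can be illustrated with a short Python sketch that pairs each hyperlink in a WordprocessingML fragment with the Target of the relationship it references. The fragment, the relationships file, the example URL, and the function name hyperlink_targets are all hypothetical; the namespace URIs are the ones the published standard is understood to use.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG = "http://schemas.openxmlformats.org/package/2006/relationships"

# Hypothetical paragraph containing one hyperlink that points at rId2.
DOCUMENT_FRAGMENT = f"""
<w:p xmlns:w="{W}" xmlns:r="{R}">
  <w:hyperlink r:id="rId2">
    <w:r><w:t>Example site</w:t></w:r>
  </w:hyperlink>
</w:p>
""".strip()

# Hypothetical relationships file for the same document part.
RELS = f"""
<Relationships xmlns="{PKG}">
  <Relationship Id="rId2" Type="{R}/hyperlink"
      Target="http://www.example.com/" TargetMode="External"/>
</Relationships>
""".strip()


def hyperlink_targets(document_xml: str, rels_xml: str) -> list[tuple]:
    """Return (link text, target URL) pairs resolved via relationship ids."""
    targets = {
        rel.get("Id"): rel.get("Target")
        for rel in ET.fromstring(rels_xml).iter(f"{{{PKG}}}Relationship")
    }
    links = []
    for link in ET.fromstring(document_xml).iter(f"{{{W}}}hyperlink"):
        text = "".join(t.text or "" for t in link.iter(f"{{{W}}}t"))
        # The markup stores only the id; the URL lives in the .rels file.
        links.append((text, targets.get(link.get(f"{{{R}}}id"))))
    return links


links = hyperlink_targets(DOCUMENT_FRAGMENT, RELS)
print(links)
```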

    Pictures can be embedded or linked using a tag:

    This is the reference to the image file. All references are managed via relationships. For example, document.xml has a relationship to the image. There is a _rels directory in the same directory as document.xml, and inside _rels is a file called document.xml.rels. This file contains a relationship definition with a type, an ID, and a location. The ID is the identifier referenced in the XML document. The type is a URI that identifies the kind of target (here, an image), and the location is either an internal location within the ZIP package or an external location given as a URL.
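How a relationship's location maps to a concrete place can be sketched as follows. The function resolve_target is a hypothetical helper, and the rule it applies (internal targets are taken relative to the directory of the part that owns the relationship, while targets marked External are used unchanged) is stated here as an assumption about the packaging conventions rather than quoted from the standard.

```python
import posixpath


def resolve_target(source_part: str, target: str,
                   target_mode: str = "Internal") -> str:
    """Resolve a relationship Target to a ZIP member name or external URL.

    source_part is the part owning the relationship, for example
    "word/document.xml"; an empty string stands for the package itself.
    """
    if target_mode == "External":
        # External locations are URLs and are returned unchanged.
        return target
    if target.startswith("/"):
        # A leading slash is treated as relative to the package root.
        return posixpath.normpath(target.lstrip("/"))
    base = posixpath.dirname(source_part)
    return posixpath.normpath(posixpath.join(base, target))


print(resolve_target("word/document.xml", "media/image1.png"))
print(resolve_target("word/document.xml", "../customXml/item1.xml"))
print(resolve_target("", "word/document.xml"))
print(resolve_target("word/document.xml", "http://www.example.com/logo.png",
                     "External"))
```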

    Document properties

    Office Open XML uses the Dublin Core Metadata Element Set and DCMI Metadata Terms to store document properties. Dublin Core is a standard for cross-domain information resource description and is defined in ISO 15836:2003.

    An example document properties file (docProps/core.xml) that uses Dublin Core metadata is:

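    Title: Office Open XML
    Subject: File format and structure
    Creator: Wikipedia
    Keywords: Office Open XML, Metadata, Dublin Core
    Description: Office Open XML uses ISO 15836:2003
    Last modified by: Wikipedia
    Revision: 1
    Created: 2008-06-19T20:00:00Z
    Modified: 2008-06-19T20:42:00Z
    Category: Document file format
    Content status: Final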
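A minimal sketch of reading such a properties file in Python is shown below. The values are taken from the example above, but the element names and namespace URIs in CORE_XML are a reconstruction based on the Dublin Core and core-properties vocabularies, not a copy of the original markup, and read_core_properties is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Reconstructed sample; element names and namespaces are assumptions.
CORE_XML = """
<cp:coreProperties
    xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:dcterms="http://purl.org/dc/terms/">
  <dc:title>Office Open XML</dc:title>
  <dc:subject>File format and structure</dc:subject>
  <dc:creator>Wikipedia</dc:creator>
  <dcterms:created>2008-06-19T20:00:00Z</dcterms:created>
  <dcterms:modified>2008-06-19T20:42:00Z</dcterms:modified>
</cp:coreProperties>
""".strip()


def read_core_properties(xml_text: str) -> dict:
    """Map each property's local element name to its text content."""
    root = ET.fromstring(xml_text)
    properties = {}
    for child in root:
        local_name = child.tag.rsplit("}", 1)[-1]  # drop "{namespace}"
        properties[local_name] = (child.text or "").strip()
    return properties


def parse_timestamp(value: str) -> datetime:
    """Parse a UTC timestamp of the form used in the sample."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


props = read_core_properties(CORE_XML)
editing_time = parse_timestamp(props["modified"]) - parse_timestamp(props["created"])
print(props["title"], props["creator"], editing_time)
```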

    Document markup languages

    An Office Open XML file may contain several documents encoded in specialized markup languages corresponding to applications within the Microsoft Office product line. Office Open XML defines multiple vocabularies using 27 namespaces and 89 schema modules.

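    The primary markup languages are WordprocessingML for word processing documents, SpreadsheetML for spreadsheets, and PresentationML for presentations.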

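    Shared markup language materials include vocabularies for mathematical formulas (Office Math Markup Language), graphics (DrawingML), and bibliographies.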

    In addition to the above markup languages, custom XML schemas can be used to extend Office Open XML.

    Design approach

    Patrick Durusau, the editor of ODF, has viewed the markup style of OOXML and ODF as representing two sides of a debate: the "element side" and the "attribute side". He notes that OOXML represents "the element side of this approach" and singles out the KeepNext element as an example:

    In contrast, he notes that ODF would use the single attribute fo:keep-next, rather than an element, to express the same semantics.[5]

    The XML Schema of Office Open XML emphasizes reducing load time and improving parsing speed.[6] In a test with applications current in April 2007, XML-based office documents were slower to load than binary formats.[7] To enhance performance, Office Open XML uses very short element names for common elements and spreadsheets save dates as index numbers (starting from 1900 or from 1904).[8] In order to be systematic and generic, Office Open XML typically uses separate child elements for data and metadata (element names ending in Pr for properties) rather than using multiple attributes, which allows structured properties. Office Open XML does not use mixed content but uses elements to put a series of text runs (element name r) into paragraphs (element name p). The result is terse and highly nested in contrast to HTML, for example, which is fairly flat, designed for humans to write in text editors and is more congenial for humans to read.
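The date index numbers mentioned above can be converted to calendar dates with a short routine. The sketch below assumes whole-day serial numbers, that serial 1 is 1 January 1900 in the 1900 system, that serial 0 is 1 January 1904 in the 1904 system, and that the 1900 system counts a nonexistent 29 February 1900 as serial 60 (a well-known Excel compatibility quirk). The function name serial_to_date is hypothetical.

```python
from datetime import date, timedelta


def serial_to_date(serial: int, date1904: bool = False) -> date:
    """Convert a spreadsheet day index to a calendar date."""
    if date1904:
        if serial < 0:
            raise ValueError("serial must be non-negative in the 1904 system")
        return date(1904, 1, 1) + timedelta(days=serial)
    if serial < 1:
        raise ValueError("serial must be at least 1 in the 1900 system")
    if serial == 60:
        raise ValueError("serial 60 is the nonexistent date 1900-02-29")
    # Before the phantom leap day the count starts at 1899-12-31; after it,
    # every serial is one day "ahead", so the epoch shifts to 1899-12-30.
    epoch = date(1899, 12, 31) if serial < 60 else date(1899, 12, 30)
    return epoch + timedelta(days=serial)


print(serial_to_date(1))
print(serial_to_date(39448))
print(serial_to_date(37986, date1904=True))
```

Under these assumptions the two systems differ by a constant 1462 days, so the same calendar date is stored as 39448 in one system and 37986 in the other.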

    The naming of elements and attributes within the text has attracted some criticism. There are three different syntaxes in OOXML (ECMA-376) for specifying the color and alignment of text, depending on whether the document is a text document, a spreadsheet, or a presentation. Rob Weir (an IBM employee and co-chair of the OASIS OpenDocument Format TC) asks "What is the engineering justification for this horror?". He contrasts this with OpenDocument: "ODF uses the W3C's XSL-FO vocabulary for text styling, and uses this vocabulary consistently".[9]

    Some have argued that the design is based too closely on Microsoft applications. In August 2007, the Linux Foundation published a blog post calling upon ISO National Bodies to vote "No, with comments" during the international standardization of OOXML. It said, "OOXML is a direct port of a single vendor's binary document formats. It avoids the re-use of relevant existing international standards (e.g. several cryptographic algorithms, VML, etc.). There are literally hundreds of technical flaws that should be addressed before standardizing OOXML including continued use of binary code tied to platform specific features, propagating bugs in MS-Office into the standard, proprietary units, references to proprietary/confidential tags, unclear IP and patent rights, and much more".[10]

    The version of the standard submitted to JTC 1 was 6546 pages long. The need for and appropriateness of such length have been questioned.[11][12] Google stated that "the ODF standard, which achieves the same goal, is only 867 pages".[11]

    WordprocessingML (WML)

    Word processing documents use the XML vocabulary known as WordprocessingML, normatively defined by the schema wml.xsd, which accompanies the standard. This vocabulary is defined in clause 11 of Part 1.[13]
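The paragraph, run, and properties pattern of WordprocessingML can be illustrated by reading a small hand-written fragment. Everything in DOCUMENT_XML is a hypothetical sample: the p, r, pPr, rPr, and keepNext element names follow the conventions discussed in this article, while the t element for text and the namespace URI are assumptions drawn from general knowledge of the format. The helper name paragraphs is likewise illustrative.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical document part with two paragraphs.
DOCUMENT_XML = f"""
<w:document xmlns:w="{W}">
  <w:body>
    <w:p>
      <w:pPr><w:keepNext/></w:pPr>
      <w:r><w:t>Heading </w:t></w:r>
      <w:r><w:rPr><w:b/></w:rPr><w:t>text</w:t></w:r>
    </w:p>
    <w:p>
      <w:r><w:t>Body paragraph.</w:t></w:r>
    </w:p>
  </w:body>
</w:document>
""".strip()


def paragraphs(document_xml: str) -> list[tuple]:
    """Return (text, keep_next) for each paragraph in the document."""
    root = ET.fromstring(document_xml)
    result = []
    for p in root.iter(f"{{{W}}}p"):
        # Text lives in runs; concatenate the text elements in order.
        text = "".join(t.text or "" for t in p.iter(f"{{{W}}}t"))
        # Properties are child elements: keepNext sits inside pPr.
        keep_next = p.find(f"{{{W}}}pPr/{{{W}}}keepNext") is not None
        result.append((text, keep_next))
    return result


parsed = paragraphs(DOCUMENT_XML)
print(parsed)
```

The sample shows why the markup is described as highly nested: a single bold word requires a run, a run-properties element, and a text element.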

    SpreadsheetML (SML)

    Spreadsheet documents use the XML vocabulary known as SpreadsheetML, normatively defined by the schema sml.xsd, which accompanies the standard. This vocabulary is described in clause 12 of Part 1.[13]

    Each worksheet in a spreadsheet is represented by an XML document with a root element named in the

    Notes and References

    1. Register file extensions on third party servers. Microsoft. 2009-09-04. microsoft.com.
    2. Register file extensions on third party servers. Microsoft. 2009-09-04. microsoft.com.
    3. Register file extensions on third party servers. Microsoft. 2009-09-04. microsoft.com.
    4. Office Open XML Overview. Tom Ngo. 6. Ecma International. December 11, 2006. 2007-01-23.
    5. Patrick Durusau. Old Wine In New Skins. 21 October 2008.
    6. Software Developer uses Office Open XML to Minimize File Space, Increase Interoperability. Intellisafe Technologies.
    7. MS Office 2007 versus Open Office 2.2 shootout. George Ou. 2007-04-27. 2007-04-27. ZDnet.com. Archived 2009-03-26 at https://web.archive.org/web/20090326081043/http://blogs.zdnet.com/Ou/?p=480 (original link dead).
    8. Differences between the 1900 and the 1904 date system in Excel. 2013-03-05. 2016-08-23. Microsoft.
    9. Disharmony of OOXML. Rob Weir. 14 March 2008.
    10. OOXML — vote "No, with comments". John Cherry. 14 March 2008.
    11. Google's Position on OOXML as a Proposed ISO Standard. February 2008. Quote: "If ISO were to give OOXML with its 6546 pages the same level of review that other standards have seen, it would take 18 years (6576 days for 6546 pages) to achieve comparable levels of review to the existing ODF standard (871 days for 867 pages) which achieves the same purpose and is thus a good comparison. Considering that OOXML has only received about 5.5% of the review that comparable standards have undergone, reports about inconsistencies, contradictions and missing information are hardly surprising." Archived 2010-08-18 at https://web.archive.org/web/20100818112807/http://www.odfalliance.org/resources/Google%20OOXML%20Q%20%20A.pdf (original link dead).
    12. OOXML: What's the big deal? 2008-02-19. Archived 2009-10-03 at https://web.archive.org/web/20091003044227/http://www.ibm.com/developerworks/library/x-ooxmlstandard.html (original link dead).
    13. ISO/IEC 29500-1:2016. 2016-11-01. ISO and IEC.