Jaql Explained

Paradigm:	Functional
Designer:	Vuk Ercegovac (Google)
Latest Release Version:	0.5.1
Programming Language:	Java
Operating System:	Cross-platform
License:	Apache License 2.0
Implementations:	IBM BigInsights

Jaql (pronounced "jackal") is a functional data processing and query language most commonly used for JSON query processing on big data.

It began as an open source project at Google^[1]; its latest release dates from 2010-07-12. IBM^[2] adopted it as the primary data processing language for its Hadoop software package, BigInsights.

Although it was developed for JSON, it also supports other data sources such as CSV, TSV, and XML.

A comparison^[3] with other big data query languages, such as Pig Latin and HiveQL, illustrates the performance and usability aspects of these technologies.

Jaql supports^[4] lazy evaluation, so expressions are only materialized when needed.
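
The general idea of lazy evaluation can be illustrated with a short Python sketch. This is not Jaql code and does not describe Jaql's internals; the names read_records and evaluated are hypothetical and exist only for the illustration. A generator pipeline reads only as many records as the consumer asks for:

```python
from itertools import islice

evaluated = []  # records which source items were actually read


def read_records():
    """Hypothetical source that yields records one at a time."""
    for i in range(1_000_000):
        evaluated.append(i)
        yield {"id": i, "income": 1000 * i}


# Building the pipeline does no work yet: nothing has been read.
pipeline = (r for r in read_records() if r["income"] >= 3000)

# Records are pulled through only now, and only as many as needed.
first_two = list(islice(pipeline, 2))

print([r["id"] for r in first_two])  # [3, 4]
print(len(evaluated))                # 5
```

Only five of the one million source records are touched, because the consumer stops after two matches.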

Syntax

The basic concept of Jaql is

source -> operator(parameter) -> sink ;

where a sink can in turn be a source for a downstream operator. A Jaql program therefore typically has the following structure, which expresses a data processing graph:

source -> operator1(parameter) -> operator2(parameter) -> operator2(parameter) -> operator3(parameter) -> operator4(parameter) -> sink ;

For readability, Jaql programs are usually broken onto a new line after each arrow, an idiom that is also common in Twitter Scalding:

source -> operator1(parameter)->
  operator2(parameter)->
  operator2(parameter)->
  operator3(parameter)->
  operator4(parameter)->
  sink ;
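
The source, operator, sink chain can be modeled as plain function composition. The following Python sketch is only an analogy, not Jaql syntax; pipe, keep_even, square, and the list used as a sink are hypothetical names introduced for the illustration:

```python
from functools import reduce


def pipe(source, *operators):
    """Feed `source` through each operator in order, like chained arrows."""
    return reduce(lambda data, op: op(data), operators, source)


# Hypothetical operators: each takes an array and returns an array.
def keep_even(xs):
    return [x for x in xs if x % 2 == 0]


def square(xs):
    return [x * x for x in xs]


sink = []  # stands in for an output target such as a file

# Roughly: source -> keep_even -> square -> sink
pipe([1, 2, 3, 4], keep_even, square, sink.extend)
print(sink)  # [4, 16]
```

Each stage consumes the previous stage's output, which is why the output of one step can serve as the source for the next.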

Core operators

Source:^[5]

Expand

Use the EXPAND expression to flatten nested arrays. This expression takes as input an array of nested arrays [[T]] and produces an output array [T], by promoting the elements of each nested array to the top-level output array.
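
The flattening behavior described above can be sketched in Python. This models the semantics only and is not Jaql syntax; the function name expand is chosen here to mirror the operator:

```python
from itertools import chain


def expand(nested):
    """Promote the elements of each inner array to one top-level array."""
    return list(chain.from_iterable(nested))


print(expand([[3, 65, 8, 72], [5, 98, 2, 65]]))
# [3, 65, 8, 72, 5, 98, 2, 65]
```

Only one level of nesting is removed, matching the [[T]] to [T] signature.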

Filter

Use the FILTER operator to remove elements from the specified input array. This operator takes as input an array of elements of type T and outputs an array of the same type, retaining those elements for which a predicate evaluates to true. It is the Jaql equivalent of the SQL WHERE clause. Example:

data = [
  {name: "Jon Doe", income: 20000, manager: false},
  {name: "Vince Wayne", income: 32500, manager: false},
  {name: "Jane Dean", income: 72000, manager: true},
  {name: "Alex Smith", income: 25000, manager: false}
];

data -> filter $.manager;

[{ "income": 72000, "manager": true, "name": "Jane Dean" } ]

data -> filter $.income < 30000;

[{ "income": 20000, "manager": false, "name": "Jon Doe" }, { "income": 25000, "manager": false, "name": "Alex Smith" } ]

Group

Use the GROUP expression to group one or more input arrays on a grouping key and apply an aggregate function to each group.
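
As a rough model of grouping with per-group aggregation, the following Python sketch handles the single-input case only (grouping several arrays at once is not covered). It is not Jaql syntax; group_by and its parameters are hypothetical names, and the records reuse the sample data from the Filter section:

```python
from collections import defaultdict


def group_by(records, key, aggregate):
    """Bucket records by key(record), then reduce each bucket to one value."""
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record)
    return [aggregate(k, members) for k, members in groups.items()]


data = [
    {"name": "Jon Doe", "income": 20000, "manager": False},
    {"name": "Vince Wayne", "income": 32500, "manager": False},
    {"name": "Jane Dean", "income": 72000, "manager": True},
    {"name": "Alex Smith", "income": 25000, "manager": False},
]

# Total income per value of the grouping key "manager".
result = group_by(
    data,
    key=lambda r: r["manager"],
    aggregate=lambda k, rs: {"manager": k, "total": sum(r["income"] for r in rs)},
)
print(result)
# [{'manager': False, 'total': 77500}, {'manager': True, 'total': 72000}]
```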

Join

Use the JOIN operator to express a join between two or more input arrays. This operator supports multiple types of joins, including natural, left-outer, right-outer, and outer joins.
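
The difference between an inner and a left-outer join can be sketched in Python. This is an illustration of the semantics, not Jaql syntax; the join function, its preserve_left flag, and the users and pages records are all hypothetical:

```python
def join(left, right, left_key, right_key, preserve_left=False):
    """Equi-join two arrays of records; optionally keep unmatched left rows."""
    # Index the right side by its join key for constant-time lookups.
    index = {}
    for r in right:
        index.setdefault(r[right_key], []).append(r)

    out = []
    for l in left:
        matches = index.get(l[left_key], [])
        if matches:
            out.extend({**l, **m} for m in matches)
        elif preserve_left:
            out.append(dict(l))  # left-outer: keep the row without a partner
    return out


users = [{"id": 1, "name": "Jon Doe"}, {"id": 2, "name": "Jane Dean"}]
pages = [{"userid": 1, "url": "/a"}, {"userid": 1, "url": "/b"}]

inner = join(users, pages, "id", "userid")
outer = join(users, pages, "id", "userid", preserve_left=True)

print(len(inner))  # 2
print(len(outer))  # 3
```

The left-outer result contains the two matched rows plus the unmatched user, which the inner join drops.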

Sort

Use the SORT operator to sort an input by one or more fields.

Top

The TOP expression selects the first k elements of its input. If a comparator is provided, the output is semantically equivalent to sorting the input, then selecting the first k elements.
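
The stated equivalence between TOP with a comparator and sorting followed by taking the first k elements can be checked with a small Python sketch. It is not Jaql syntax; the sample records reuse the data from the Filter section, and ordering by income in descending order is an assumption made for the example:

```python
import heapq

data = [
    {"name": "Jon Doe", "income": 20000},
    {"name": "Vince Wayne", "income": 32500},
    {"name": "Jane Dean", "income": 72000},
    {"name": "Alex Smith", "income": 25000},
]


def income(record):
    return record["income"]


# SORT: order the whole input by one field, highest income first.
by_income_desc = sorted(data, key=income, reverse=True)

# TOP with a comparator: pick the k best without fully sorting the input.
k = 2
top_k = heapq.nlargest(k, data, key=income)

print([r["name"] for r in top_k])     # ['Jane Dean', 'Vince Wayne']
print(top_k == by_income_desc[:k])    # True
```

A heap-based selection gives the same answer as sort-then-slice while avoiding a full sort, which matters when k is small relative to the input.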

Transform

Use the TRANSFORM operator to realize a projection or to apply a function to every item of its input array.
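
Both uses, projection and per-item function application, can be modeled in Python as a map over the array. This is a sketch of the semantics rather than Jaql syntax; transform and the derived field income_k are hypothetical names:

```python
def transform(records, fn):
    """Apply fn to every item and collect the results in order."""
    return [fn(record) for record in records]


data = [
    {"name": "Jon Doe", "income": 20000, "manager": False},
    {"name": "Jane Dean", "income": 72000, "manager": True},
]

# Projection: keep only the "name" field of each record.
names = transform(data, lambda r: {"name": r["name"]})

# Function application: derive a new field from an existing one.
in_thousands = transform(
    data, lambda r: {"name": r["name"], "income_k": r["income"] // 1000}
)

print(names)         # [{'name': 'Jon Doe'}, {'name': 'Jane Dean'}]
print(in_thousands)  # [{'name': 'Jon Doe', 'income_k': 20}, {'name': 'Jane Dean', 'income_k': 72}]
```

The output always has exactly one item per input item, which distinguishes this operation from filtering or expanding.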

Notes and References

1. Original Jaql project. https://code.google.com/p/jaql/
2. Initial publication. http://www.vldb.org/pvldb/vol4/p1272-beyer.pdf
3. Stewart, Robert J.; Trinder, Phil W.; Loidl, Hans-Wolfgang (2011). "Comparing High Level MapReduce Query Languages". Advanced Parallel Processing Technologies. Lecture Notes in Computer Science, vol. 6965, pp. 58–72. doi:10.1007/978-3-642-24151-2_5. ISBN 978-3-642-24150-5. https://link.springer.com/chapter/10.1007/978-3-642-24151-2_5
4. JAQL in Hadoop, a brief introduction. http://www.havlena.net/en/tag/jaql/
5. IBM BigInsights documentation. http://pic.dhe.ibm.com/infocenter/bigins/v2r0/index.jsp?topic=%2Fcom.ibm.swg.im.infosphere.biginsights.jaql.doc%2Fdoc%2Fc0057482.html