JDPF Overview

Core Concepts

JDPF is built around the pipeline concept. A pipeline (Fig. 1) is a set of data processing elements, which we call blocks or modules, connected in series so that the output of one element is the input of the next.

Pipeline
Figure 1 - The generic pipeline model.

Each block represents an operator that relates an element x of a set X (the domain) to a unique element y of a set Y (the codomain) through a parameterization p of the embedded algorithm (Fig. 2). The parameterization decouples the pipeline from the context, as explained under Parameters/Context below. In most cases, X and Y can be given the mathematical structure of metric spaces.

Generic Pipeline Block
Figure 2 - The generic block which implements an operator A from a domain X to a codomain Y through a parameterization in P.
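
As an illustration of the operator view above, the following minimal sketch models a block as a function of an input x and a parameterization p. It is not JDPF's actual API: the class name Block, the method apply and the "factor" parameter are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

X = TypeVar("X")  # domain
Y = TypeVar("Y")  # codomain
P = TypeVar("P")  # parameter space


@dataclass(frozen=True)
class Block(Generic[X, Y, P]):
    """Hypothetical block: wraps an algorithm A(x, p) -> y."""

    algorithm: Callable[[X, P], Y]

    def apply(self, x: X, p: P) -> Y:
        # For a fixed parameterization p, each x maps to exactly one y.
        return self.algorithm(x, p)


# Illustrative algorithm: scale a number by a factor taken from p.
scale = Block(lambda x, p: x * p["factor"])

y = scale.apply(3.0, {"factor": 2.0})
```

The same block gives a different mapping when it receives a different p, which is what lets the algorithm stay independent of where its parameters come from.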

Pipelines

A JDPF pipeline is composed by arranging two kinds of blocks in sequence:

Filter and transformer
Figure 3 - Symbols for the filter and the transformer.

Each block must be provided with a descriptor that specifies the types of input and output data the block can handle, as well as the parameters it accepts or requires. A pipeline is built by assembling blocks according to the constraints defined by their descriptors.

Pipeline JDPF
Figure 4 - An example of pipeline, a sequence of blocks.
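
The role of descriptors in assembling a pipeline can be sketched as follows. This is a minimal illustration under assumptions, not the JDPF descriptor format: the Descriptor fields and the helper functions check_pipeline and missing_params are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptor:
    """Hypothetical descriptor; the field names are assumptions."""

    name: str
    input_type: type
    output_type: type
    required_params: frozenset = frozenset()


def check_pipeline(descriptors):
    """Raise TypeError if two adjacent blocks do not fit together."""
    for prev, nxt in zip(descriptors, descriptors[1:]):
        if not issubclass(prev.output_type, nxt.input_type):
            raise TypeError(
                f"{prev.name} outputs {prev.output_type.__name__}, "
                f"but {nxt.name} expects {nxt.input_type.__name__}"
            )


def missing_params(descriptor, params):
    """Return the required parameter names that params does not provide."""
    return descriptor.required_params - set(params)


threshold = Descriptor("threshold", list, list, frozenset({"limit"}))
total = Descriptor("total", list, float)

# Compatible chain: list -> list -> float, so no exception is raised.
check_pipeline([threshold, total])
```

Reversing the two blocks would fail the check, because the output of total (a float) is not an accepted input for threshold (a list).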

Nets

To process multiple data streams, JDPF also provides two further blocks, the split and the aggregator:

Split
Figure 5 - Symbol for the split.

Aggregator
Figure 6 - Symbol for the aggregator.

A net is a collection of blocks organized not as a simple sequence but as a network that uses splits and/or aggregators (joins).

Net jdpf
Figure 7 - Example of net.
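
A net can be sketched in a few lines, under two assumptions that the text above does not spell out: that a split feeds the same item to every branch, and that an aggregator merges the branch outputs into a single item. The function names split, aggregate and run_net are hypothetical.

```python
def split(x, branches):
    """Assumed semantics: feed the same item to every branch."""
    return [branch(x) for branch in branches]


def aggregate(results, combine):
    """Assumed semantics: merge the branch outputs into one item."""
    return combine(results)


def run_net(x):
    # Two parallel branches, each a tiny stand-in for a pipeline.
    branches = [min, max]
    low, high = split(x, branches)
    return aggregate([low, high], lambda r: r[1] - r[0])


data_range = run_net([4, 9, 1, 7])
```

Here one branch computes the minimum, the other the maximum, and the aggregator combines them into the range of the input.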

Components

Once a pipeline or net has been defined, it must be turned into a JDPF component, which represents a complete data processing element. A component, when deployed in the JDPF engine, is able to fetch the data that feeds the pipeline and to store its results. This is achieved by adding two kinds of blocks to the pipeline:

Generator and Serializer
Figure 8 - Symbols for the generator and the serializer.


JDPF Component Example
Figure 9 - A simple example of pipeline component defined by the sequence: generator, filter, transformer, filter and serializer.

JDPF Net Component
Figure 10 - A simple example of net component.
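
The idea of a component as a pipeline enclosed between a generator and a serializer can be sketched as below. This is an illustrative assumption, not the JDPF engine: every function name is hypothetical, the filter and transformer are trivial stand-ins, and in-memory text buffers play the role of the data source and the result store.

```python
import io


def generator(source):
    """Hypothetical generator: fetches numbers from a text source."""
    for line in source:
        line = line.strip()
        if line:
            yield float(line)


def keep_positive(values):
    """Stand-in filter."""
    return (v for v in values if v > 0)


def double(values):
    """Stand-in transformer."""
    return (2 * v for v in values)


def serializer(values, sink):
    """Hypothetical serializer: stores one result per line."""
    for v in values:
        sink.write(f"{v}\n")


def run_component(source, sink):
    # generator -> filter -> transformer -> serializer
    serializer(double(keep_positive(generator(source))), sink)


source = io.StringIO("1.5\n-2\n\n4\n")
sink = io.StringIO()
run_component(source, sink)
```

Only the generator and the serializer touch the outside world, so the blocks between them can be reused with a different source or sink.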

Inspection

To make it possible to inspect the data flowing through a component, we defined a further kind of block, the inspector:

Pipeline with Inspectors
Figure 11 - A simple example of component (defined by the sequence: generator, filter, transformer, filter and serializer) with two different inspectors.
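
One way to picture an inspector is as a pass-through block that observes the data without altering it. The sketch below assumes that behaviour, which the text does not state explicitly; the function name inspect, its label argument and the list used as a log are hypothetical.

```python
def inspect(values, label, log):
    """Hypothetical inspector: records each item, passes it on unchanged."""
    for v in values:
        log.append((label, v))
        yield v


log = []
raw = inspect([3, -1, 2], "after generator", log)
kept = (v for v in raw if v > 0)  # stand-in filter
seen = inspect(kept, "after filter", log)
result = list(seen)
```

Placing two inspectors at different points, as in Figure 11, makes it possible to compare what entered a block with what left it.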

Parameters/Context

Since JDPF is conceived as a general-purpose environment, the contextualization of the data processing components has been intentionally kept outside the component scope. Our idea is that an external context manager should determine the set of parameters used to configure the blocks. This solution guarantees, on the one hand, that the algorithms are independent of the complexity of the context and, on the other, that the same pipeline and/or component can be reused in different application contexts. Each time a component is run, it must receive the parameters for each of its blocks.
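
The separation between a component and its context can be sketched as follows. This is a minimal illustration under assumptions, not JDPF's context manager: the block functions, the COMPONENT list, the run function and the two example contexts are all hypothetical, and a plain dictionary stands in for whatever the external context manager would supply.

```python
def threshold(values, params):
    return [v for v in values if v >= params["limit"]]


def scale(values, params):
    return [v * params["factor"] for v in values]


# The component only names its blocks; it holds no parameter values.
COMPONENT = [("threshold", threshold), ("scale", scale)]


def run(component, data, context):
    """Run every block with the parameters the context supplies for it."""
    for name, block in component:
        if name not in context:
            raise KeyError(f"no parameters supplied for block {name!r}")
        data = block(data, context[name])
    return data


# Two hypothetical contexts reuse the same component unchanged.
lab = {"threshold": {"limit": 0}, "scale": {"factor": 10}}
field = {"threshold": {"limit": 5}, "scale": {"factor": 0.5}}

lab_result = run(COMPONENT, [-1, 4, 6], lab)
field_result = run(COMPONENT, [-1, 4, 6], field)
```

Because the parameters arrive at run time, the same component produces different results in the two contexts without any change to its blocks.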