Apache NiFi processors are the basic building blocks of a data flow. Each processor provides a different function and contributes to producing the output flowfile. The data flow shown in the image below fetches a file from one directory using the GetFile processor and stores it in another directory using the PutFile processor.

putfile_processor.jpg

GetFile

The GetFile processor is used to fetch files of a specific format from a specific directory. It also provides options that give the user more control over fetching; these are discussed in the GetFile Properties section below.

getfile.jpg

GetFile Settings

The GetFile processor has the following settings −

Name

In the Name setting, a user can give the processor any name, for example one that follows the project's conventions or otherwise makes the processor's purpose clear.

Enable

A user can enable or disable the processor using this setting.

Penalty Duration

This setting lets a user specify the penalty duration, which is the length of time a flowfile is held back from further processing when the processor penalizes it, typically after a failure.

Yield Duration

This setting specifies the yield duration for the processor. When the processor yields, it is not scheduled again until this duration has elapsed.

Bulletin Level

This setting specifies the minimum log level at which the processor generates bulletins.

Automatically Terminate Relationships

This setting shows a list of checkboxes, one for each relationship available on the processor. By checking a box, a user configures the processor to terminate flowfiles routed to that relationship instead of sending them further in the flow.

automatically_terminate_relationships.jpg
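The effect of auto-terminating a relationship can be pictured with a small Python sketch. It is a conceptual model only, not NiFi code: the function `route`, the `connections` mapping and the `auto_terminated` set are hypothetical names, and the sketch assumes each relationship is either connected to a downstream queue or auto-terminated.

```python
def route(flowfile, relationship, connections, auto_terminated):
    """Model what happens to a flowfile routed to a relationship.

    connections: dict mapping a relationship name to a list that stands in
                 for the downstream queue (hypothetical structure)
    auto_terminated: set of relationship names whose boxes are checked
    """
    if relationship in auto_terminated:
        # The flowfile ends here and is not sent further in the flow.
        return "terminated"
    if relationship in connections:
        # The flowfile is handed to the next component in the flow.
        connections[relationship].append(flowfile)
        return "queued"
    # A relationship must be connected or auto-terminated to be usable.
    raise ValueError(f"relationship {relationship!r} is neither connected nor auto-terminated")


if __name__ == "__main__":
    downstream = {"success": []}
    print(route("a.txt", "success", downstream, {"failure"}))
    print(route("b.txt", "failure", downstream, {"failure"}))
```

In this model, a flowfile routed to a checked relationship simply stops, while one routed to a connected relationship joins the downstream queue.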

GetFile Scheduling

The GetFile processor offers the following scheduling options −

Scheduling Strategy

You can schedule the processor either at a fixed time interval, by selecting the Timer driven option, or according to a CRON expression, by selecting the CRON driven option.

Concurrent Tasks

This option defines how many tasks the processor may run concurrently.

Execution

Using this option, a user can define whether the processor runs on all nodes or only on the primary node.

Run Schedule

This option defines the time interval for the timer driven strategy or the CRON expression for the CRON driven strategy.

run_schedule.jpg
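To make the two kinds of Run Schedule value concrete, here is a minimal Python sketch that interprets them. It is illustrative only and not part of NiFi: the helper names are hypothetical, the list of time units is a deliberately small assumed subset, and the CRON check only counts fields, on the assumption that NiFi expects Quartz-style expressions with six fields plus an optional seventh for the year (for example `0 0 13 * * ?` for 1:00 PM every day).

```python
import re

# Assumed subset of time units; NiFi accepts more spellings than are listed here.
_UNIT_SECONDS = {
    "ms": 0.001,
    "sec": 1, "secs": 1,
    "min": 60, "mins": 60,
    "hr": 3600, "hrs": 3600,
}


def timer_interval_seconds(run_schedule):
    """Convert a timer driven value such as '5 sec' into seconds."""
    match = re.fullmatch(r"\s*(\d+)\s*([a-z]+)\s*", run_schedule.lower())
    if not match or match.group(2) not in _UNIT_SECONDS:
        raise ValueError(f"unrecognised run schedule: {run_schedule!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def looks_like_quartz_cron(run_schedule):
    """Rough shape check: six fields, or seven when a year field is given."""
    return len(run_schedule.split()) in (6, 7)


if __name__ == "__main__":
    print(timer_interval_seconds("5 sec"))
    print(looks_like_quartz_cron("0 0 13 * * ?"))
```

The field count is only a sanity check. A classic five-field Unix crontab line such as `*/5 * * * *` fails it because it has no seconds field.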

GetFile Properties

GetFile offers multiple properties, as shown in the image below, ranging from compulsory properties like Input Directory and File Filter to optional properties like Path Filter and Maximum File Size. A user can manage the file fetching process using these properties.

getfile_properties.jpg
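The way these properties narrow down which files are picked up can be sketched in Python. This is a rough analogy, not GetFile's implementation: `select_files` is a hypothetical function, File Filter is treated as a regular expression matched against the whole file name, the default pattern shown is an assumed default, and Maximum File Size is simplified to a byte count.

```python
import os
import re


def select_files(input_directory, file_filter=r"[^\.].*", max_file_size=None):
    """Return the names of files in input_directory that pass the filters.

    file_filter: regular expression the whole file name must match
    max_file_size: optional upper bound in bytes (simplified for this sketch)
    """
    pattern = re.compile(file_filter)
    selected = []
    for name in sorted(os.listdir(input_directory)):
        path = os.path.join(input_directory, name)
        if not os.path.isfile(path):
            continue  # this sketch does not recurse into subdirectories
        if not pattern.fullmatch(name):
            continue  # rejected by the File Filter
        if max_file_size is not None and os.path.getsize(path) > max_file_size:
            continue  # rejected by the Maximum File Size
        selected.append(name)
    return selected
```

For example, `select_files("/data/in", r".*\.csv", 1024)` would list only the CSV files of at most 1024 bytes in the hypothetical directory `/data/in`.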

GetFile Comments

This section is used to record any notes about the processor.

getfile_comments.jpg

PutFile

The PutFile processor is used to write a file from the data flow to a specific location.

putfile.jpg

PutFile Settings

The PutFile processor has the following settings −

Name

In the Name setting, a user can give the processor any name, for example one that follows the project's conventions or otherwise makes the processor's purpose clear.

Enable

A user can enable or disable the processor using this setting.

Penalty Duration

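This setting lets a user specify the penalty duration, which is the length of time a flowfile is held back from further processing when the processor penalizes it, typically after a failure.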

Yield Duration

This setting specifies the yield duration for the processor. When the processor yields, it is not scheduled again until this duration has elapsed.

Bulletin Level

This setting specifies the minimum log level at which the processor generates bulletins.

Automatically Terminate Relationships

This setting shows a list of checkboxes, one for each relationship available on the processor. By checking a box, a user configures the processor to terminate flowfiles routed to that relationship instead of sending them further in the flow.

automatically_terminate.jpg

PutFile Scheduling

The PutFile processor offers the following scheduling options −

Scheduling Strategy

You can schedule the processor either at a fixed time interval, by selecting the Timer driven option, or according to a CRON expression, by selecting the CRON driven option. There is also an experimental Event driven strategy, which triggers the processor on a specific event.

Concurrent Tasks

This option defines how many tasks the processor may run concurrently.

Execution

Using this option, a user can define whether the processor runs on all nodes or only on the primary node.

Run Schedule

This option defines the time interval for the timer driven strategy or the CRON expression for the CRON driven strategy.

putfile_run_schedule.jpg

PutFile Properties

The PutFile processor provides properties such as Directory, which specifies the output directory for the file transfer, along with others that manage the transfer, as shown in the image below.

putfile_properties.jpg
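As a rough illustration of how such properties shape the write step, the Python sketch below models a Directory together with two further options that, to the best of our understanding, PutFile also exposes but that are not named in the text above: a conflict resolution strategy (`fail`, `ignore` or `replace`) and a switch for creating missing directories. The function `put_file` and its parameters are hypothetical, and the returned strings stand in for the relationship the flowfile would be routed to.

```python
import os


def put_file(directory, filename, content,
             conflict_resolution="fail", create_missing_directories=True):
    """Write content to directory/filename and return 'success' or 'failure'."""
    if conflict_resolution not in ("fail", "ignore", "replace"):
        raise ValueError(f"unknown conflict resolution: {conflict_resolution!r}")
    if not os.path.isdir(directory):
        if not create_missing_directories:
            return "failure"  # the output directory does not exist
        os.makedirs(directory)
    target = os.path.join(directory, filename)
    if os.path.exists(target):
        if conflict_resolution == "fail":
            return "failure"  # keep the existing file and report the conflict
        if conflict_resolution == "ignore":
            return "success"  # keep the existing file and write nothing
        # "replace" falls through and overwrites the existing file
    with open(target, "wb") as handle:
        handle.write(content)
    return "success"
```

Calling `put_file("/data/out", "a.txt", b"hello")` twice with the default strategy would succeed the first time and report a failure the second time, because the hypothetical target file already exists.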

PutFile Comments

This section is used to record any notes about the processor.

putfile_comments.jpg