Data Factory – integration and data pipelines in Fabric

MS Fabric Data Factory

Data Factory is the first experience within Fabric. It lets you pull data from a wide range of sources into OneLake, build your own Lakehouse for a project, and automate and control the flow of fresh data into your environment.

Data Factory contains two main artefacts:

  1. Data Pipeline – This orchestrates and automates the steps needed to get data into your Lakehouse. You can also monitor how well each run went, and create alerts for when your data does not arrive correctly.
  2. Data Flow – This allows for more complex transformations and cleaning of your data. It can be used as one step in a data pipeline.

Between the two, your data should arrive in your Lakehouse automatically, up to date, partially cleaned and prepared, and ready for its next stage. Because this is a low-code environment, most of these data transfer processes are surprisingly easy to create and maintain.
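
To make the relationship between the two artefacts concrete, here is a minimal Python sketch of the idea. It is not the Fabric API, and in Fabric itself you would build this visually rather than in code. Every name in it (Pipeline, copy_from_source, clean_dataflow, require_rows, load_to_lakehouse, lakehouse_table) is hypothetical and exists only for illustration. It shows a pipeline running steps in order, a cleaning step standing in for a Data Flow, a run log for monitoring, and an alert raised when data does not arrive.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, Optional[str]]
Step = Callable[[List[Row]], List[Row]]


@dataclass
class Pipeline:
    """Hypothetical stand-in for a data pipeline: runs named steps in order."""
    name: str
    steps: List[Tuple[str, Step]] = field(default_factory=list)
    run_log: List[str] = field(default_factory=list)   # what monitoring would show
    alerts: List[str] = field(default_factory=list)    # raised when a step fails

    def add_step(self, step_name: str, func: Step) -> "Pipeline":
        self.steps.append((step_name, func))
        return self

    def run(self, rows: Optional[List[Row]] = None) -> List[Row]:
        rows = rows or []
        for step_name, func in self.steps:
            try:
                rows = func(rows)
                self.run_log.append(f"{step_name}: ok ({len(rows)} rows)")
            except Exception as exc:
                # Record the failure, raise an alert, and stop the run.
                self.run_log.append(f"{step_name}: failed")
                self.alerts.append(f"{self.name}/{step_name}: {exc}")
                break
        return rows


lakehouse_table: List[Row] = []  # assumption: a list standing in for a Lakehouse table


def copy_from_source(_: List[Row]) -> List[Row]:
    # Assumption: hard-coded sample rows in place of a real connector.
    return [
        {"id": "1", "customer": "  Contoso "},
        {"id": None, "customer": "Fabrikam"},
        {"id": "2", "customer": "Northwind"},
    ]


def clean_dataflow(rows: List[Row]) -> List[Row]:
    # Plays the role of a Data Flow: drop rows without an id, trim whitespace.
    return [
        {"id": r["id"], "customer": (r["customer"] or "").strip()}
        for r in rows
        if r["id"] is not None
    ]


def require_rows(rows: List[Row]) -> List[Row]:
    # Fails the run (and so triggers an alert) when no data arrived.
    if not rows:
        raise ValueError("no rows arrived")
    return rows


def load_to_lakehouse(rows: List[Row]) -> List[Row]:
    lakehouse_table.extend(rows)
    return rows


pipeline = (
    Pipeline("daily_customers")
    .add_step("copy", copy_from_source)
    .add_step("dataflow", clean_dataflow)
    .add_step("check", require_rows)
    .add_step("load", load_to_lakehouse)
)
pipeline.run()
print(pipeline.run_log)
print(pipeline.alerts)
```

The point of the sketch is the division of labour: the pipeline only knows the order of the steps and whether each one succeeded, while the cleaning logic lives in a single step that could be swapped out without touching the rest.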

In the past this type of integration has usually been done using:

  • Coded integrations, some of which can get very complex and have been maintained for many years.
  • Old-style scheduled FTP file drops, which then need a routine to consume the new file into the data destination.
  • iPaaS products (Integration Platform as a Service), which aim to bring all your integrations into a single service to reduce complexity, add reporting and transparency on how they work, and rationalize what can be a complex pattern of integrations.  Popular iPaaS products include SnapLogic, MuleSoft, Node-RED, and several dozen others.
  • Azure-based products like Microsoft’s Power Automate, or other Azure integration tools.

The aim with Data Factory is to massively reduce the complexity of the above: use the 150 or so connectors it provides, set up rules to move and clean the data, and as far as possible standardize and automate the approach to all your data movements.

Most often an organization has a combination of all the older integration approaches listed above, and the result still resembles a rather complex plate of spaghetti.