Synapse Data Engineering in Microsoft Fabric

Once your data has arrived in the “Lakehouse”, Synapse Data Engineering gives data engineers a set of tools to clean, enrich, and validate it so that it is ready for live use.

Its main tools are open-source Apache Spark jobs, which run many processes in parallel for big data workloads and let you further refine and enrich your data for final use.

The “Medallion” architecture is a useful way to think about this stage. It sorts data into three layers:

  • Bronze – Raw data, not yet cleaned or validated.
  • Silver – Data that is being cleaned and prepared, and is partly ready for use.
  • Gold – Data that is fully cleaned, enriched, and validated, and ready for output, BI, dashboards, or machine learning algorithms.
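
The three layers can be sketched in a few lines of plain Python. This is only an illustration of the idea, not how Fabric implements it: in practice each layer would usually be a Lakehouse table processed by a Spark job. The sample rows, the field names (order_id, amount, country), and the helper functions to_silver and to_gold are all hypothetical.

```python
# Bronze: hypothetical raw rows, kept exactly as they arrived (all strings).
bronze = [
    {"order_id": "1001", "amount": " 25.50", "country": "gb"},
    {"order_id": "1002", "amount": "n/a", "country": "GB"},     # unparseable amount
    {"order_id": "1001", "amount": " 25.50", "country": "gb"},  # duplicate order
    {"order_id": "1003", "amount": "10.00", "country": "fr"},
]


def to_silver(rows):
    """Clean bronze rows: parse types, normalise country codes, drop bad or duplicate rows."""
    seen = set()
    silver = []
    for row in rows:
        try:
            amount = float(row["amount"].strip())
        except ValueError:
            continue  # amount could not be parsed, so the row stays in bronze only
        order_id = int(row["order_id"])
        if order_id in seen:
            continue  # keep only the first copy of each order
        seen.add(order_id)
        silver.append(
            {"order_id": order_id, "amount": amount, "country": row["country"].upper()}
        )
    return silver


def to_gold(rows):
    """Aggregate silver rows into a report-ready total per country."""
    totals = {}
    for row in rows:
        totals[row["country"]] = totals.get(row["country"], 0.0) + row["amount"]
    return totals


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

Each function reads only from the layer before it, so the raw bronze rows are never modified and any later layer can be rebuilt from them.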

Data engineering is largely about productionizing your data feeds: data arrives on a regular schedule and is imported, transformed, validated, cleaned, and prepared for reporting with, all being well, little or no manual intervention.
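
One way to picture “little or no manual intervention” is a batch run that validates its own output and refuses to publish bad data. The sketch below is plain Python under stated assumptions: the validation rules, the function names validate and run_batch, and the publish callback are hypothetical, and a real feed would be scheduled and monitored by the platform rather than called by hand.

```python
def validate(rows):
    """Return a list of problems; an empty list means the batch is fit to publish."""
    problems = []
    if not rows:
        problems.append("batch is empty")
    for i, row in enumerate(rows):
        if row.get("amount") is None or row["amount"] < 0:
            problems.append(f"row {i}: invalid amount")
    return problems


def run_batch(rows, publish):
    """Validate a batch and publish it only if every check passes."""
    problems = validate(rows)
    if problems:
        # Nothing is published; the problems are returned so they can be logged or alerted on.
        return {"status": "rejected", "problems": problems}
    publish(rows)
    return {"status": "published", "rows": len(rows)}


# Hypothetical reporting target: here just an in-memory list.
published = []
good = run_batch([{"amount": 10.0}, {"amount": 2.5}], published.extend)
bad = run_batch([{"amount": -1.0}], published.extend)
print(good)
print(bad)
```

Because the checks run on every batch, a person only needs to step in when a batch is rejected.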