Azure Data Factory is Azure’s ETL cloud service for serverless data integration and scale-out data transformation. The service provides a code-free user interface for intuitive creation, monitoring, and management from a single console. You can also lift and shift existing SSIS packages to Azure and run them in ADF with full compatibility.
In times of big data, disorganized raw data is often stored in relational, non-relational, and other storage systems. On its own, however, the raw data lacks context or the necessary significance to be used meaningfully by analysts, data specialists or decision-makers in companies.
Big data requires a process orchestration and operationalization service that transforms these vast amounts of raw data into actionable business insights.
Azure Data Factory is a managed cloud service built for these complex hybrid ETL (extract, transform, and load), ELT (extract, load, and transform), and data integration projects.
The Azure Data Factory platform is the cloud-based ETL and data integration service that enables users to create data-driven workflows to orchestrate data movements and transformations on demand. With Azure Data Factory, users can create and schedule data-driven workflows (called pipelines) that collect data from different data stores. Users can create complex ETL processes that visually transform data using data flows or compute services such as Azure HDInsight Hadoop, Azure Databricks, and Azure SQL Database.
Users can also publish their transformed data to data stores, such as Azure Synapse Analytics, for use by business intelligence (BI) applications. Through Azure Data Factory, raw data can ultimately be organized into meaningful data stores and data lakes and used to make better business decisions.
Data Factory provides a code-free data integration and transformation layer that supports all digital transformation initiatives.
Azure Data Factory helps organizations modernize SSIS.
Collecting data from several different sources can be costly and time-consuming, and sometimes requires multiple solutions. Azure Data Factory provides a single pay-as-you-go service with the following capabilities:
With Azure Data Factory, collect data from on-premises, hybrid, and multicloud sources, then transform it in Azure Synapse Analytics.
With Azure Data Factory, users can rehost SQL Server Integration Services (SSIS) with a few clicks, and create code-free ETL/ELT pipelines with built-in Git and CI/CD support.
With Azure Data Factory, use a fully managed, serverless cloud service that scales on demand and bills on a pay-as-you-go basis.
Azure Data Factory provides more than 90 built-in connectors to capture all on-premises and software-as-a-service (SaaS) data. Leverage on-demand orchestration and monitoring.
Azure Data Factory provides autonomous ETL to increase operational efficiency and support integrators without programming experience.
The reference architecture developed by areto offers many advantages.
The use of areto’s reference architecture provides customers with architectural best practices for the development and operation of reliable, secure, efficient and cost-effective systems in the cloud. Areto’s architectural solutions are consistently measured against Microsoft best practices in order to deliver the highest benefit to customers.
The areto reference architecture is based on five pillars: operational excellence, security, reliability, performance efficiency, and cost optimization.
Operational Excellence
Optimal design of operation and monitoring of the systems as well as continuous improvement of supporting processes and procedures
Security
Protection of information, systems, assets, risk assessments and risk mitigation strategies
Cost optimization
Maximizing ROI through the continuous process of improving the system throughout its lifecycle.
Reliability
Ensure security, disaster recovery, and business continuity, as data is mirrored in multiple redundant locations.
Performance efficiency
Efficient use of computing resources, scalability to meet short-term demand peaks, sustainability
Gartner, Magic Quadrant for Cloud Infrastructure & Platform Services, Raj Bala, Bob Gill, Dennis Smith, Kevin Ji, David Wright, 27 July 2021. Gartner and Magic Quadrant are registered trademarks of Gartner, Inc. and/or its affiliates in the U.S. and internationally and is used herein with permission. All rights reserved. Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.
This graphic was published by Gartner, Inc. as part of a larger research document and should be evaluated in the context of the entire document. The Gartner document is available upon request from AWS.
Become a data-driven company with the Microsoft expert team from areto!
Find out where your company currently stands on the way to becoming a data-driven company.
We analyze the status quo and show you what potential exists.
How do you want to get started?
Free consultation & demo appointments
Do you already have a strategy for your future Microsoft Data Analytics solution? Are you already taking advantage of modern cloud platforms and automation? We would be happy to show you examples of how our customers are already using areto’s agile and scalable Microsoft solutions.
Workshops / Coachings
In our Microsoft workshops and coaching sessions, you will gain the necessary know-how, e.g. for setting up a modern cloud strategy or IBCS-compliant reporting with Power BI. The areto Microsoft TrainingCenter offers a wide range of learning content.
Proofs of Concept
Establishing a connection / collecting data
Companies store different types of data in different sources (on-premises, in the cloud, structured, unstructured, and semi-structured), and this data usually arrives at different intervals and at different speeds.
The first step in building an information production system is to connect to all the required sources of data and processing, e.g. SaaS (Software-as-a-Service) services, databases, file shares, and FTP web services. The next step is to move the data to a central location for further processing. Without Data Factory, organizations must build custom data movement components or write custom services to integrate these data sources and processing. Integrating and maintaining such systems is expensive and time-consuming, and they often lack the monitoring, alerting, and control capabilities that a fully managed service offers.
With Data Factory, you can leverage copy activity in a data pipeline to move data from both on-premises and cloud-based source data stores to a central data store in the cloud for further analysis. For example, you can collect data in Azure Data Lake Storage and later transform it using an Azure Data Lake Analytics compute service. You can also collect data in Azure Blob Storage and transform it later using Azure HDInsight Hadoop clusters.
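As a rough illustration of the copy step described above, the following sketch assembles a pipeline definition with a single Copy activity in the JSON shape that Data Factory uses for pipelines. The pipeline, activity, and dataset names are hypothetical, and the blob source and sink types are only an assumption; the actual types depend on the connectors involved.

```python
import json


def dataset_ref(name: str) -> dict:
    """Reference to a dataset that is defined elsewhere in the factory."""
    return {"referenceName": name, "type": "DatasetReference"}


def copy_pipeline(name: str, source_dataset: str, sink_dataset: str) -> dict:
    """Build a pipeline definition containing a single Copy activity."""
    copy_activity = {
        "name": "CopyRawData",  # hypothetical activity name
        "type": "Copy",
        "inputs": [dataset_ref(source_dataset)],
        "outputs": [dataset_ref(sink_dataset)],
        "typeProperties": {
            # Assumption: blob-to-blob copy; other connectors use other types.
            "source": {"type": "BlobSource"},
            "sink": {"type": "BlobSink"},
        },
    }
    return {"name": name, "properties": {"activities": [copy_activity]}}


# Hypothetical dataset names, for illustration only.
pipeline = copy_pipeline("IngestSalesData", "OnPremSalesFiles", "RawZoneSalesFiles")
print(json.dumps(pipeline, indent=2))
```

The resulting dictionary only describes the pipeline; deploying it to a factory would happen through the user interface, the REST API, or one of the SDKs.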
Transform / Extend
Once data is in a centralized data store in the cloud, you can process or transform it with ADF mapping data flows. With data flows, data engineers can build and maintain data transformation graphs that run on Spark without having to understand Spark clusters or Spark programming.
If you prefer to hand-code your transformations, ADF supports external activities that run them on compute services such as HDInsight Hadoop, Spark, Data Lake Analytics, and Machine Learning.
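An external activity can be sketched in the same JSON shape. The example below assumes a hand-written notebook running on Azure Databricks and chains it after an earlier copy step via `dependsOn`. The activity, notebook, and linked service names are hypothetical; other compute services use different activity types and properties.

```python
import json


def notebook_activity(name, notebook_path, linked_service, depends_on=()):
    """Build a Databricks notebook activity that runs after the given activities."""
    activity = {
        "name": name,
        "type": "DatabricksNotebook",
        "linkedServiceName": {
            "referenceName": linked_service,
            "type": "LinkedServiceReference",
        },
        "typeProperties": {"notebookPath": notebook_path},
    }
    if depends_on:
        # Run only after every listed activity has succeeded.
        activity["dependsOn"] = [
            {"activity": dep, "dependencyConditions": ["Succeeded"]}
            for dep in depends_on
        ]
    return activity


# Hypothetical names, for illustration only.
transform = notebook_activity(
    name="CleanSalesData",
    notebook_path="/etl/clean_sales",
    linked_service="DatabricksWorkspace",
    depends_on=["CopyRawData"],
)
print(json.dumps(transform, indent=2))
```

Keeping the dependency explicit lets the service decide when the transformation may start, so the notebook itself does not need to poll for the copied data.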
CI/CD and publishing
Data Factory has full CI/CD support for your data pipelines via Azure DevOps and GitHub. This allows you to incrementally develop and deploy your ETL processes before publishing the finished product. After the raw data is in an enterprise-usable format, load it into Azure Synapse Analytics (formerly Azure SQL Data Warehouse), Azure SQL Database, Azure Cosmos DB, or another analytics engine that your users can reference in their business intelligence tools.
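One common way to promote a factory between environments is to deploy the same Resource Manager template with a different parameter file per stage. The sketch below generates such parameter files. It assumes the template exposes a `factoryName` parameter, and the environment and factory names are hypothetical; a real exported template usually has further parameters, for example for connection strings.

```python
import json

# Schema identifier for Azure Resource Manager deployment parameter files.
PARAMETERS_SCHEMA = (
    "https://schema.management.azure.com/schemas/2019-04-01/"
    "deploymentParameters.json#"
)


def parameter_file(overrides: dict) -> dict:
    """Wrap plain key/value overrides in the parameter file structure."""
    return {
        "$schema": PARAMETERS_SCHEMA,
        "contentVersion": "1.0.0.0",
        "parameters": {key: {"value": value} for key, value in overrides.items()},
    }


# Hypothetical environments and factory names, for illustration only.
ENVIRONMENTS = {
    "test": {"factoryName": "adf-sales-test"},
    "prod": {"factoryName": "adf-sales-prod"},
}

parameter_files = {env: parameter_file(values) for env, values in ENVIRONMENTS.items()}
print(json.dumps(parameter_files["prod"], indent=2))
```

A release pipeline in Azure DevOps or GitHub could then pick the file that matches the target stage.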
Monitor
After you have successfully built and deployed your data integration pipeline and are getting business value from the refined data, you can monitor the scheduled activities and pipelines for success and failure rates. Azure Data Factory provides built-in support for pipeline monitoring via Azure Monitor, API, PowerShell, Azure Monitor logs, and health panels in the Azure portal.
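To make the notion of success and failure rates concrete, the following sketch summarizes a list of pipeline run records. The records are hard-coded sample data with hypothetical pipeline names; in practice they would come from one of the monitoring interfaces mentioned above, and the field names used here are an assumption about how such records might be shaped.

```python
from collections import Counter


def run_summary(runs):
    """Compute success and failure rates over runs that have finished."""
    finished = [run for run in runs if run["status"] in ("Succeeded", "Failed")]
    counts = Counter(run["status"] for run in finished)
    total = len(finished)
    return {
        "finished": total,
        "success_rate": counts["Succeeded"] / total if total else 0.0,
        "failure_rate": counts["Failed"] / total if total else 0.0,
    }


# Hypothetical sample records; runs still in progress are ignored.
sample_runs = [
    {"pipelineName": "IngestSalesData", "status": "Succeeded"},
    {"pipelineName": "IngestSalesData", "status": "Failed"},
    {"pipelineName": "IngestSalesData", "status": "Succeeded"},
    {"pipelineName": "IngestSalesData", "status": "InProgress"},
    {"pipelineName": "TransformSales", "status": "Succeeded"},
]

summary = run_summary(sample_runs)
print(summary)
```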
General concepts
An Azure subscription can have one or more Azure Data Factory instances (or data factories). Azure Data Factory consists of the following main components: pipelines, activities, datasets, linked services, data flows, and integration runtimes.
Together, they provide the platform on which you can assemble data-driven workflows with steps for moving as well as transforming data.
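As a minimal sketch of how such components reference one another, the example below defines a linked service (the connection to a store) and datasets (the data inside that store) in Data Factory's JSON shape, then checks that every dataset points at a known linked service. All names are hypothetical, the connection string is a placeholder, and Azure Blob Storage is only an assumed example store.

```python
def linked_service(name: str) -> dict:
    """Connection definition for a store; Azure Blob Storage is assumed here."""
    return {
        "name": name,
        "properties": {
            "type": "AzureBlobStorage",
            # Placeholder only; real secrets belong in a secret store.
            "typeProperties": {"connectionString": "<connection-string>"},
        },
    }


def dataset(name: str, linked_service_name: str, folder_path: str) -> dict:
    """Named view on data inside the store behind a linked service."""
    return {
        "name": name,
        "properties": {
            "type": "AzureBlob",
            "linkedServiceName": {
                "referenceName": linked_service_name,
                "type": "LinkedServiceReference",
            },
            "typeProperties": {"folderPath": folder_path},
        },
    }


def unresolved_datasets(linked_services, datasets):
    """Return names of datasets whose linked service is not defined."""
    known = {service["name"] for service in linked_services}
    return [
        ds["name"]
        for ds in datasets
        if ds["properties"]["linkedServiceName"]["referenceName"] not in known
    ]


# Hypothetical definitions, for illustration only.
services = [linked_service("RawZoneStorage")]
datasets = [
    dataset("RawZoneSalesFiles", "RawZoneStorage", "raw/sales"),
    dataset("ArchiveSalesFiles", "ArchiveStorage", "archive/sales"),
]

missing = unresolved_datasets(services, datasets)
print(missing)
```

Activities in a pipeline then refer to datasets by name, so a broken reference at this level would surface before any data is moved.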