How to Simplify Creating Azure Synapse Pipelines


Understanding the anatomy of a pipeline

Photo by JJ Ying on Unsplash

TLDR

  • What makes up an Azure Synapse Pipeline
  • Make the Azure Synapse pipeline process easier by creating your datasets ahead of time
  • Here is a video on how to create Azure Synapse Datasets

https://medium.com/media/1b1da2021a9d9a0bba21a55a967a5f9b/href

  • Leveraging templates to create pipelines

In my previous article, I mentioned pipelines as one of the top features in Azure Synapse. Now let's dive a bit deeper into what they are and how to use them.

What are Pipelines in Azure Synapse

Pipelines in Azure Synapse are the same as Azure Data Factory pipelines; they have simply been embedded into Azure Synapse Studio. A pipeline can be defined as a way to group activities together to create a job. It's also a low-code tool that lets you visually follow each step of a job's journey.

To understand pipelines, you need to know how they fit together with other Azure Synapse terminology like ‘Datasets’ and ‘Activities’. Take a look at the image below. It shows that an activity works with a dataset, and that a pipeline is a logical grouping of activities.

Anatomy of an Azure Synapse Pipeline — Image by Author
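The hierarchy in the diagram can be sketched in a few lines of Python. This is only a conceptual model: the `Dataset`, `Activity`, and `Pipeline` classes are hypothetical stand-ins, not part of any Azure SDK, and the dataset and pipeline names are placeholders. It captures the idea that activities reference datasets and a pipeline groups activities.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Dataset:
    # Hypothetical model: a named pointer to data, such as a CSV file or a SQL table.
    name: str
    location: str


@dataclass
class Activity:
    # One step of work that reads from and/or writes to datasets.
    name: str
    kind: str
    inputs: List[Dataset] = field(default_factory=list)
    outputs: List[Dataset] = field(default_factory=list)


@dataclass
class Pipeline:
    # A pipeline is a logical grouping of activities.
    name: str
    activities: List[Activity] = field(default_factory=list)

    def datasets_used(self) -> List[str]:
        # Collect every dataset name referenced by any activity, without duplicates,
        # in the order they first appear.
        seen: List[str] = []
        for activity in self.activities:
            for ds in activity.inputs + activity.outputs:
                if ds.name not in seen:
                    seen.append(ds.name)
        return seen


raw_sales = Dataset("RawSalesCsv", "raw/sales.csv")
sales_table = Dataset("SalesTable", "dbo.Sales")

pipeline = Pipeline(
    "LoadSales",
    activities=[Activity("CopySales", "Copy", inputs=[raw_sales], outputs=[sales_table])],
)

print(pipeline.datasets_used())  # ['RawSalesCsv', 'SalesTable']
```

Reading it from the bottom up mirrors the diagram: datasets exist first, an activity points at them, and the pipeline wraps the activity.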

How to create a pipeline

To create a pipeline in Synapse, go to the Integrate section, where you'll see the option to create a new pipeline.

Image by Author

On the left is the list of activities, and the open canvas in the middle is where you drag them in to begin building your pipeline. Examples of activities are copy jobs, data flows, lookups, iterations, and conditionals such as filters. Your dataset is included inside an activity.
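Behind the canvas, a Synapse pipeline is stored as a JSON definition, in the same general shape Azure Data Factory uses. The Python sketch below builds a pipeline containing one Copy activity as a plain dictionary, to show how the activity refers to its datasets by name. The pipeline, activity, and dataset names are hypothetical placeholders, and the source/sink types assume a CSV-to-Azure-SQL copy; check the exact properties for your connectors against the official documentation.

```python
import json


def dataset_ref(name: str) -> dict:
    # Activities point at datasets by name rather than embedding them.
    return {"referenceName": name, "type": "DatasetReference"}


def copy_activity(name: str, source_ds: str, sink_ds: str) -> dict:
    # A Copy activity reads from one dataset and writes to another.
    return {
        "name": name,
        "type": "Copy",
        "inputs": [dataset_ref(source_ds)],
        "outputs": [dataset_ref(sink_ds)],
        "typeProperties": {
            # Placeholder source/sink types for a CSV-to-Azure-SQL copy.
            "source": {"type": "DelimitedTextSource"},
            "sink": {"type": "AzureSqlSink"},
        },
    }


pipeline_definition = {
    "name": "LoadSales",
    "properties": {
        "activities": [copy_activity("CopySales", "RawSalesCsv", "SalesTable")]
    },
}

print(json.dumps(pipeline_definition, indent=2))
```

Dragging more activities onto the canvas corresponds to adding more entries to the `activities` list.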

Once you're done creating your pipeline, you will need to publish it. Then you can run it to validate that it works without errors. You can take it further by running your pipeline as a scheduled job.
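Scheduled runs are typically set up by attaching a schedule trigger to the published pipeline. The Python sketch below builds a trigger definition in the JSON shape used by Data Factory and Synapse schedule triggers, plus a small helper that previews the first few fire times for a simple fixed hourly interval. The trigger name, pipeline name, and start time are hypothetical placeholders, and the preview helper is my own illustration, not something Synapse provides; it ignores advanced schedules such as specific weekdays.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import List


def schedule_trigger(name: str, pipeline_name: str, start: datetime, hours: int) -> dict:
    # Recurrence every `hours` hours, starting at `start` (expected in UTC).
    return {
        "name": name,
        "properties": {
            "type": "ScheduleTrigger",
            "typeProperties": {
                "recurrence": {
                    "frequency": "Hour",
                    "interval": hours,
                    "startTime": start.strftime("%Y-%m-%dT%H:%M:%SZ"),
                    "timeZone": "UTC",
                }
            },
            "pipelines": [
                {"pipelineReference": {"referenceName": pipeline_name, "type": "PipelineReference"}}
            ],
        },
    }


def next_runs(start: datetime, hours: int, count: int) -> List[datetime]:
    # Preview of fire times for a fixed hourly interval.
    return [start + timedelta(hours=hours * i) for i in range(count)]


start = datetime(2021, 1, 1, 6, 0, tzinfo=timezone.utc)
trigger = schedule_trigger("Every6Hours", "LoadSales", start, hours=6)
print(json.dumps(trigger, indent=2))
for run in next_runs(start, 6, 3):
    print(run.isoformat())
```

Keeping the trigger separate from the pipeline means the same pipeline can still be run manually for validation before any schedule is attached.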

How to make creating Pipelines easier

Creating a pipeline can be challenging if you do not understand all the building blocks. One thing I've found helpful is to make sure my datasets are created ahead of time. In the video at the beginning of this blog, I showed how to create a dataset and how much easier it is to create a pipeline once the dataset already exists.
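One way to see why this helps: activities refer to datasets only by name, so every referenced dataset has to exist before the pipeline can work. The Python sketch below shows a dataset definition in the general shape Synapse uses (a linked service for the connection, plus a file location in an ADLS Gen2 file system) and a hypothetical `missing_datasets` helper that lists references with no matching definition. The helper is my own illustration, not Synapse's built-in validation, and all names are placeholders.

```python
from typing import Dict, List

# Dataset definition: a linked service (the connection) plus a location.
raw_sales_dataset = {
    "name": "RawSalesCsv",
    "properties": {
        "type": "DelimitedText",
        "linkedServiceName": {
            "referenceName": "MyDataLake",
            "type": "LinkedServiceReference",
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobFSLocation",
                "fileSystem": "raw",
                "fileName": "sales.csv",
            },
            "columnDelimiter": ",",
            "firstRowAsHeader": True,
        },
    },
}


def missing_datasets(pipeline_def: Dict, datasets: List[Dict]) -> List[str]:
    # Return dataset names that activities reference but that are not defined yet.
    defined = {ds["name"] for ds in datasets}
    missing: List[str] = []
    for activity in pipeline_def["properties"]["activities"]:
        for ref in activity.get("inputs", []) + activity.get("outputs", []):
            name = ref["referenceName"]
            if name not in defined and name not in missing:
                missing.append(name)
    return missing


pipeline_def = {
    "name": "LoadSales",
    "properties": {
        "activities": [
            {
                "name": "CopySales",
                "type": "Copy",
                "inputs": [{"referenceName": "RawSalesCsv", "type": "DatasetReference"}],
                "outputs": [{"referenceName": "SalesTable", "type": "DatasetReference"}],
            }
        ]
    },
}

print(missing_datasets(pipeline_def, [raw_sales_dataset]))  # ['SalesTable']
```

Here the sink dataset has not been created yet, so it shows up as missing. Creating both datasets first means the activity can simply pick them from a list.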

Creating pipelines in Azure Synapse is a great way to run an efficient ETL or ELT process. Preparing the groundwork before building the pipeline is one of the biggest, yet simplest, things you can do to make it easier to create and publish. If you're stuck and need some inspiration for creating pipelines, check out the Gallery section under pipelines (as shown in the red box below).

Image by Author

You'll see different types of pipeline templates you can use to get started quickly. If you're ever in doubt about what a pipeline should look like, this is the place to go.

Image by Author

Conclusion

Creating pipelines in a cloud-based solution like Synapse makes it easier to perform ETL/ELT jobs. Creating your datasets ahead of time and using templates lets you build your pipelines with less hassle. Stay tuned for more data-related content.


How to Simplify Creating Azure Synapse Pipelines was originally published in Towards Data Science on Medium.
