This is the first video in Tridant’s Data Science series. It demonstrates how to integrate Snowflake as a data source for feature engineering and machine learning modelling using Amazon SageMaker. The Snowflake tutorial used in this video is available at: https://quickstarts.snowflake.com/guide/vhol_snowflake_data_wrangler/index.html?index=..%2F..index#5
We walk through the step-by-step configuration of both Snowflake and Amazon SageMaker to predict loan default on an example customer database.
1. Snowflake setup includes creating a new loan database schema, loading a sample dataset, and configuring the integration with an AWS SageMaker role and access to an S3 bucket.
2. Instead of using the pre-configured AWS template from the original tutorial, we discuss how to manually configure the individual AWS roles, policies and trust relationships that enable access from SageMaker to Snowflake external data sources.
3. After successfully connecting to Snowflake and importing the loan dataset, we use SageMaker Data Wrangler to perform feature engineering (e.g. removing unwanted columns, changing column formats, categorical encoding).
4. Once the data is processed and ready, we run additional analyses on the dataset, including histogram plots and bias and leakage reports.
5. Finally, we train a loan default classification model using the Quick Model function provided by SageMaker, with a detailed description of the model and attribute ranking.
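As a rough illustration of the manual role configuration mentioned in step 2, the sketch below builds the two JSON documents that such a setup typically needs: a trust relationship that lets Snowflake assume an AWS role, and a permissions policy for the S3 bucket. It is a sketch under stated assumptions, not the exact configuration used in the video. The function names, the bucket name, the ARN and the external ID are all hypothetical placeholders. In a real setup the ARN and external ID would be taken from the STORAGE_AWS_IAM_USER_ARN and STORAGE_AWS_EXTERNAL_ID values that Snowflake reports when you describe the storage integration, and the list of S3 actions may need to be narrower or wider.

```python
import json


def build_snowflake_trust_policy(snowflake_iam_user_arn: str, external_id: str) -> dict:
    """Return a trust policy that lets the given Snowflake IAM user assume the role.

    Both arguments are assumed to come from the Snowflake storage integration
    (STORAGE_AWS_IAM_USER_ARN and STORAGE_AWS_EXTERNAL_ID).
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": snowflake_iam_user_arn},
                "Action": "sts:AssumeRole",
                # The external ID restricts role assumption to this integration.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


def build_s3_access_policy(bucket_name: str) -> dict:
    """Return a permissions policy for reading and writing objects in one bucket.

    The action list is an assumption; adjust it to what your workflow needs.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Object-level permissions apply to keys inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
            {
                # Bucket-level permissions apply to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": f"arn:aws:s3:::{bucket_name}",
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder values only; replace them with the values from your own accounts.
    trust = build_snowflake_trust_policy(
        "arn:aws:iam::123456789012:user/example-snowflake-user",
        "EXAMPLE_EXTERNAL_ID",
    )
    access = build_s3_access_policy("example-loan-bucket")
    print(json.dumps(trust, indent=2))
    print(json.dumps(access, indent=2))
```

The printed JSON can be pasted into the role's trust relationship and attached policy in the IAM console. Building the documents in code rather than by hand keeps the bucket name and external ID in one place, which helps avoid mismatches between the Snowflake and AWS sides.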
Hoang Nguyen, Ph.D., Data Science Lead, Tridant