

In the realm of data manipulation and analytics, one common challenge is dealing with data stored as a delimited string within a single column. This format is hard to query and analyze because the individual values are not directly accessible. Snowflake, a popular cloud-based data warehousing platform, addresses this problem through its SQL capabilities. In this article, we will explore how Snowflake can split a delimited string column into rows, which makes the data easier to process and analyze.
Delimited string columns contain multiple values separated by a specific delimiter, such as commas, pipes, or tabs. For example, consider a table employee_skills with a column skills storing skills possessed by each employee in a comma-separated format:

While this structure is convenient for storage, it complicates querying and analysis. Snowflake offers a solution to this problem through its SPLIT_TO_TABLE function, which allows us to split delimited strings into individual rows.

Let's create a table called EMPLOYEE_SKILLS and insert the above data into the table.
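As a minimal sketch of this setup step, the statements below create the table and load a few rows. The column types and the sample rows are illustrative assumptions, not the article's original data; the database and schema names follow the SELECT statement used later in the article.

```sql
-- Assumed column types; adjust to match your own data.
CREATE OR REPLACE TABLE rt_snowflake_dev.stage.employee_skills (
    employee_id NUMBER,
    name        VARCHAR,
    skills      VARCHAR   -- comma-separated list of skills
);

-- Hypothetical sample rows, for illustration only.
INSERT INTO rt_snowflake_dev.stage.employee_skills (employee_id, name, skills)
VALUES
    (1, 'Alice', 'SQL, Python, Snowflake'),
    (2, 'Bob',   'Java, SQL'),
    (3, 'Carol', 'Tableau');
```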

Here, the Employee_ID and Name values are inserted as is, while each skills value is inserted as a single string delimited by commas (,). Let's see how the data is loaded into the table:

select * from rt_snowflake_dev.stage.EMPLOYEE_SKILLS;

Snowflake's SPLIT_TO_TABLE function is specifically designed to split delimited strings into rows. It takes two arguments: the string to split and the delimiter used for splitting. Let's demonstrate how to use SPLIT_TO_TABLE to transform the skills column into individual rows.
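A query along the following lines would perform this split. It is a sketch rather than the article's original statement: the aliases e and s and the output column name skill are assumptions. SPLIT_TO_TABLE returns the columns SEQ, INDEX, and VALUE, where VALUE holds each extracted element.

```sql
SELECT
    e.employee_id,
    e.name,
    TRIM(s.value)       AS skill,           -- strip whitespace around each element
    CURRENT_TIMESTAMP() AS dw_insert_date   -- optional audit column
FROM rt_snowflake_dev.stage.employee_skills AS e,
     LATERAL SPLIT_TO_TABLE(e.skills, ',') AS s;
```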

This query splits the skills column by commas and returns each skill as a separate row, along with the corresponding employee_id and name. The TRIM(VALUE) function removes any leading or trailing whitespace from the extracted values. We have also added an audit column called DW_insert_date to track changes; this column is optional.
In the above SQL query, the LATERAL keyword is used before the SPLIT_TO_TABLE function. This signifies a LATERAL join, a powerful feature in SQL that allows correlated subqueries to reference columns from preceding tables in the FROM clause.

In our scenario, the LATERAL join executes the SPLIT_TO_TABLE function for each row of the employee_skills table. It splits the delimited string in the skills column into individual rows, providing a seamless way to expand the data.

Let's demonstrate how to use SPLIT_TO_TABLE to transform the skills column into individual rows and assign a priority to each skill:
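One possible form of such a query is sketched below. Ordering the priority by the INDEX column that SPLIT_TO_TABLE returns (the element's position in the original string) is an assumption; the original ordering rule may differ. The alias skill_priority is likewise hypothetical.

```sql
SELECT
    e.employee_id,
    e.name        AS employee_name,
    TRIM(s.value) AS skill,
    -- Number the skills per employee in the order they appear in the string.
    ROW_NUMBER() OVER (
        PARTITION BY e.employee_id
        ORDER BY s.index
    ) AS skill_priority
FROM rt_snowflake_dev.stage.employee_skills AS e,
     LATERAL SPLIT_TO_TABLE(e.skills, ',') AS s;
```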

This query splits the skills column by commas and returns each skill as a separate row, along with the corresponding EMPLOYEE_ID and EMPLOYEE_NAME. Additionally, it assigns a priority to each skill, numbered separately for each employee, using the ROW_NUMBER() window function.
To persist the transformed data, we can create a new table S_EMPLOYEE_SKILLS:
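A CREATE TABLE ... AS SELECT statement is one way to do this. The sketch below assumes the new table lives in the same database and schema as the source table and reuses the hypothetical column names skill and skill_priority.

```sql
CREATE OR REPLACE TABLE rt_snowflake_dev.stage.s_employee_skills AS
SELECT
    e.employee_id,
    e.name              AS employee_name,
    TRIM(s.value)       AS skill,
    ROW_NUMBER() OVER (
        PARTITION BY e.employee_id
        ORDER BY s.index
    )                   AS skill_priority,
    CURRENT_TIMESTAMP() AS dw_insert_date   -- optional audit column
FROM rt_snowflake_dev.stage.employee_skills AS e,
     LATERAL SPLIT_TO_TABLE(e.skills, ',') AS s;
```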

This CREATE TABLE statement selects the transformed data and inserts it into the new table S_EMPLOYEE_SKILLS. Now, the data is structured in a more accessible format, allowing for easier querying and analysis.

To consolidate the individual skills back into a comma-separated list for each employee, we can use the LISTAGG() function in Snowflake:
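A sketch of such an aggregation follows. It assumes the S_EMPLOYEE_SKILLS table has columns named employee_id, employee_name, skill, and skill_priority; those names are illustrative and may not match the original table.

```sql
SELECT
    employee_id,
    employee_name,
    -- Rebuild the comma-separated list, keeping the skills in priority order.
    LISTAGG(skill, ', ') WITHIN GROUP (ORDER BY skill_priority) AS skillset
FROM rt_snowflake_dev.stage.s_employee_skills
GROUP BY employee_id, employee_name;
```

The WITHIN GROUP clause is optional; without it, the order of the concatenated values is not guaranteed.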

This query groups the rows by employee and uses LISTAGG() to combine each employee's skills into a single skillset, providing a concise summary of their abilities.
The ability to split delimited string columns into rows opens up numerous possibilities for data analysis and manipulation. Practical applications include analyzing employee skills, parsing addresses, and categorizing data.
When using SPLIT_TO_TABLE in Snowflake, consider the following best practices:
Snowflake's SPLIT_TO_TABLE function provides a powerful mechanism for splitting delimited string columns into individual rows, facilitating more flexible and efficient data analysis. By leveraging this functionality, data engineers and analysts can unlock valuable insights from delimited data, enabling better decision-making and deeper understanding of their datasets. Whether it's analyzing employee skills, parsing addresses, or categorizing data, Snowflake empowers users to conquer the challenges posed by delimited string columns with ease and efficiency.