Skip to main content

Snowflake data cleaning with RAW

· 2 min read
Jeremy Posner
Benjamin Gaidioz

Many companies are moving their in-scope data to Snowflake for advanced analytics, but they may encounter several scenarios where their data isn't readily available in the platform.

  1. The required data may simply not be in Snowflake yet.
  2. Or that data is needed on a one-off basis, and loading it into Snowflake would be time-consuming.
  3. Or the necessary data belongs to external sources.

The same of course applies to any other Cloud database, including RedShift, BigQuery, etc. Whichever you choose, their aim is to grow a captive audience who keep paying for more compute and storage, thus building up their Data Gravity (see my earlier blog on this topic).

RAW can help you get the job done without the hassle of importing data. We query all data in place. No moving, No downloading. It’s faster to get to results and iterate. We expose data as APIs. Data products your users can access without needing to know where the source lies.

We've prepared a demo use case that shows you step by step how a Snowflake table of addresses to be corrected, can be augmented with a column produced from an external system. The underlying logic is as straightforward as the code below.

We have created a demo use case that provides a step-by-step guide on how to validate and standardize addresses stored in a Snowflake table, using an external system. The logic underlying this process is straightforward, as illustrated by the code segment below.

let
addresses = Snowflake.InferAndRead("DB01", "CRM", "ADDRESS")
in
Collection.Transform(
addresses,
(a) ->
Record.AddField(
a,
// custom call to an external web service!
fixed = validate(a.country, a.city, a.street_address)
)
)

Curious to see it all? Clone and customize this example now!