Using Snowflake’s Result Cache for Pre-Aggregation and Retrieval Optimization.

 In AI, ML and Data

virtual warehouse

In this blog, we will look at Snowflake’s ‘Result Cache’ feature and its usage in terms of cost, performance and pre-aggregation. Before we dive into it, here’s a look at where the Result Cache is positioned within the Snowflake Architecture:

Snowflake Architecture

Snowflake Architecture

Snowflake stores the results of each query that it executes in the Result Cache, which is part of the Cloud Services layer. There is no cost associated with this Cache. With these details in mind, we will cover the following:

  • The reuse of the Result Cache and its limitations
  • A mechanism to reuse this cache and to further perform data analysis on this data (post-processing of the Result Cache).
  • Performance optimization by breaking complex queries into small steps using the Result Cache.

Let us go over a practical example to demonstrate the capability of a cached result.

Sample datasets

For more details on Sample datasets, please refer to this link.
We will be interested in generating the following aggregate results for the year 2001 (Q1):

  • Net profit by City, County, State, Country.
  • Net profit by State, Country.
  • Customer count by City, State, Country.
  • Customer count by Gender, County, State, Country.

The following options exist to store the results for post-processing:

Performance Benefits Security Benefits Simplifies Query Logic Supports Clustering Uses Storage Uses Credits for Maintenance Notes
Regular table
Transient table There are no fail-safe costs.
Temporary table Session specific; the scope of data retention is a minimum of one day. No fail-safe costs.
Regular view
Cached query result Used only if data has not changed, and if the query only uses deterministic functions (e.g., not CURRENT_DATE).
Materialized view Storage and maintenance requirements typically result in increased costs.

The above table is referenced from this link: “https://docs.snowflake.net/manuals/user-guide/views-materialized.html”.

However, for the purposes of this discussion, we will narrow our focus to the Result Cache (persisted results). The option for triggering a raw query is not considered, since it will always be with a different criterion and, therefore, would add to the compute cost.

The aggregate data for building the aggregate results (mentioned earlier) would be as follows:

aggregate data

Below is the base query to generate the above result:

query1

The execution of the above query took around 7 mins on a Virtual Warehouse of medium size.
Snowflake stores the result of each query that it executes internally at the Cloud Services layer and, by default, persists the result of each query for 24 hours (maximum 31 days if reused; please refer to the “limitations” section for more details).

Triggering

Triggering the same query again may use the Result Cache. The re-execution of the above generates the result within 5 seconds, and the query profile uses the Result Cache instead of scanning the table. The reuse of the result for the same query is not guaranteed, even if it meets all the criteria mentioned here: https://docs.snowflake.net/manuals/user-guide/querying-persisted-results.html#retrieval-optimization.

RESULT_SCAN

Snowflake does not guarantee that the Result Cache will always be used for the same query. How, then, can we make use of the result that the previous query processed? We can achieve that by using a function called “RESULT_SCAN”. If we pass the query_id to this function, it displays the results of that query_id.

SELECT *
FROM TABLE(RESULT_SCAN(018c15c9-02ab-c92d-0000-0389003cd5f2′))

The following command describes the column of the results.
DESCRIBE RESULT ‘018c0fcb-02c6-4eca-0000-0389003c105e’;

These results can be embedded as part of a view, and can be further used for data analysis.
CREATE VIEW VW_AGG_2001_Q1_DATA
AS
SELECT *
FROM TABLE(RESULT_SCAN(018c15c9-02ab-c92d-0000-0389003cd5f2′));

The Result Cache comparison with materialized view and regular view:

We will create the following objects for a performance comparison:

  • Regular view “VW_RG_AGG_2001_Q1_DATA”.

regular view

  • Materialized view “MV_AGG_2001_Q1_DATA”.
    The materialized view is a feature that is still evolving and at present does not support “INNER JOIN”. That is why, for our example, we will be using a hybrid method.

    • Create materialized view on the base table

    • Create regular view using materialized view.

regular view using materialized

    • We will refer to the previously created view “VW_AGG_2001_Q1_DATA“ for the Result Cache pre-aggregated query.
    • CREATE TABLE AS SELECT (CTAS) using the following query:

CREATE TABLE AS SELECT

Scenario Normal View for Pre-aggregate Materialized View for Pre-aggregate The Result Cache View for Pre-aggregate CTAS
Execution time (M Warehouse) Execution time (M Warehouse) Execution time (S Warehouse) Execution time (S Warehouse)
Total Net Profit by City, County, State, Country. 3m 29s 1m 17s 946ms 825ms
Total Net Profit by State, Country. 2m 36s 1m 2.05s 360ms
Customer Count by City, State, Country. 4m 31s 1m 14s 804ms 350ms
Customer Count by Gender, County, State, Country. 4m 28s 1m 18s 610ms 152ms

Note: For queries please refer the below table.

Scenario Result Cache View for Pre-aggregate CTAS (Table) Normal View for Pre-aggregate Normal View for Pre-aggregate
Total Net Profit by City, County, State, Country. SELECT SUM(NET_PROFIT) NET_PROFIT, S_CITY,S_COUNTY,S_STATE,S_COUNTRY FROM VW_RS_AGG_2001_Q1_DATA, GROUP BY S_CITY,S_COUNTY,S_STATE,S_COUNTRY; SELECT SUM(NET_PROFIT) NET_PROFIT, S_CITY,S_COUNTY,S_STATE,S_COUNTRY FROM TB_AGG_2001_Q1_DATA, GROUP BY S_CITY,S_COUNTY,S_STATE,S_COUNTRY; SELECT SUM(NET_PROFIT) NET_PROFIT, S_CITY,S_COUNTY,S_STATE,S_COUNTRY FROM VW_MV_AGG_2001_Q1_DATA, GROUP BY S_CITY,S_COUNTY,S_STATE,S_COUNTRY; SELECT SUM(NET_PROFIT) NET_PROFIT, S_CITY,S_COUNTY,S_STATE,S_COUNTRY FROM VW_RG_AGG_2001_Q1_DATA, GROUP BY S_CITY,S_COUNTY,S_STATE,S_COUNTRY;
Total Net Profit by State, Country. SELECT SUM(NET_PROFIT) NET_PROFIT, S_STATE,S_COUNTRY FROM VW_RS_AGG_2001_Q1_DATA, GROUP BY S_STATE,S_COUNTRY, ORDER BY 1 DESC; SELECT SUM(NET_PROFIT) NET_PROFIT, S_STATE,S_COUNTRY, FROM TB_AGG_2001_Q1_DATA, GROUP BY, S_STATE,S_COUNTRY, ORDER BY 1 DESC; SELECT SUM(NET_PROFIT) NET_PROFIT, S_STATE,S_COUNTRY, FROM VW_MV_AGG_2001_Q1_DATA, GROUP BY, S_STATE,S_COUNTRY, ORDER BY 1 DESC; SELECT SUM(NET_PROFIT) NET_PROFIT, S_STATE,S_COUNTRY, FROM VW_RG_AGG_2001_Q1_DATA, GROUP BY, S_STATE,S_COUNTRY, ORDER BY 1 DESC;
Customer Count by City, State, Country. SELECT SUM(CUSTOMER_COUNT) CUSTOMER_COUNT, S_CITY, S_STATE,S_COUNTRY, FROM VW_RS_AGG_2001_Q1_DATA
GROUP BY S_CITY, S_STATE,S_COUNTRY, ORDER BY 1 DESC;
SELECT SUM(CUSTOMER_COUNT) CUSTOMER_COUNT, S_CITY, S_STATE,S_COUNTRY, FROM TB_AGG_2001_Q1_DATA. GROUP BY S_CITY, S_STATE,S_COUNTRY, ORDER BY 1 DESC; SELECT SUM(CUSTOMER_COUNT) CUSTOMER_COUNT, S_CITY, S_STATE,S_COUNTRY, FROM VW_MV_AGG_2001_Q1_DATA, GROUP BY S_CITY, S_STATE,S_COUNTRY, ORDER BY 1 DESC; SELECT SUM(CUSTOMER_COUNT) CUSTOMER_COUNT, S_CITY, S_STATE,S_COUNTRY, FROM VW_RG_AGG_2001_Q1_DATA, GROUP BY S_CITY, S_STATE,S_COUNTRY, ORDER BY 1 DESC;
Customer Count by Gender, County, State, Country. SELECT SUM(CUSTOMER_COUNT) CUSTOMER_COUNT, CD_GENDER,S_CITY, S_STATE,S_COUNTRY, FROM, VW_RS_AGG_2001_Q1_DATA, GROUP BY CD_GENDER,S_CITY, S_STATE,S_COUNTRY, ORDER BY 1 DESC; SELECT SUM(CUSTOMER_COUNT) CUSTOMER_COUNT, CD_GENDER,S_CITY, S_STATE,S_COUNTRY, FROM TB_AGG_2001_Q1_DATA, GROUP BY CD_GENDER,S_CITY, S_STATE,S_COUNTRY, ORDER BY 1 DESC; SELECT SUM(CUSTOMER_COUNT) CUSTOMER_COUNT, CD_GENDER,S_CITY, S_STATE,S_COUNTRY, FROM, VW_MV_AGG_2001_Q1_DATA, GROUP BY CD_GENDER,S_CITY, S_STATE,S_COUNTRY, ORDER BY 1 DESC; SELECT SUM(CUSTOMER_COUNT) CUSTOMER_COUNT, CD_GENDER,S_CITY, S_STATE,S_COUNTRY, FROM VW_RG_AGG_2001_Q1_DATA, GROUP BY CD_GENDER,S_CITY, S_STATE,S_COUNTRY, ORDER BY 1 DESC;

Result comparison

Result comparison:

  • CTAS performed much better. There is a cost associated with storage.
  • The Result Cache view for pre-aggregate performs equally well with the CTAS option, and has no cost associated with it.
  • The materialized view is still evolving, and at present only works well with a single table.
  • A query on view/base table will always be heavy on compute.

Limitations of the materialized view and the Result Cache

Materialized view
While it does provide flexibility, it has these limitations:

  • SQL “INNER JOIN” are not supported.
  • The materialized view is typically slower than cached results. This is because the materialized view always provides current data, so if the materialized view is not up to date, it reads the up to date portion from the materialized view and rest of the data from the base tables.
  • The materialized view has storage and compute resource costs associated with it. The compute cost is due to the automatic internal maintenance of the materialized view by Snowflake.
  • The following link has more details on the limitations of the materialized view: https://docs.snowflake.net/manuals/user-guide/views-materialized.html#limitations-on-creating-materialized-views.

The Result Cache:
Here are the limitations of the Result Cache:

  • There is a trade-off in terms of query performance and query result rows and size. Ideally, query result output should be small with regard to rows and size.
  • Snowflake does not keep any metadata for the Result Cache, and hence the result output in terms of rows and size does matter.
  • The query results (result scan function) cannot be accessed by other users. It is only accessible to the user who executed the query.
  • There is no metadata associated with the results, which is why a small data size and rows are preferred.
  • The default purge time of 24 hours for persisted results can be reset to 31 days only if we reuse the persisted results. The end date is calculated from the date and time that the query was first executed.
  • In case the result has expired, the query displays the following error:

select * from table(result_scan(‘018bd8c9-025e-6245-0000-03890039b60e’));
Result for query 018bd8c9-025e-6245-0000-03890039b60e has expired

How can the Result Cache scan help?
The Results Cache has its limitations, as mentioned above. However, these factors make it a desirable option for pre-aggregation and retrieval optimization.

  • Results are stored irrespective of data changes in the base table. The query evaluation does not work there. If the base table has new rows which do not impact the earlier query result, the query won’t use the cached results. Instead, it will be re-executed.
  • It can help in the case of a complex query (i.e. queries with multiple joins) not performing as expected. These queries can be further broken down into multiple smaller components using the Result Cache’s features.

Example:

The following sql script breaks the query into multiple chunks and uses the function “last_query_id”.

  • The Result Cache helps in building a cost-efficient cache layer in terms of compute and storage.
Recent Posts

Start typing and press Enter to search

Contact Us
close slider
Contact Us

Yes, I would like Persistent to contact me on the information provided above. Click Here to read our full Privacy Notice.

snowflake