JSON vs Structs vs Columns in BigQuery - Observations
Shout out to Josh Peng and Claus Herther who inspired me to write this post! A common way of logging and delivering data from production systems is via the JSON format. JSON allows for a flexible schema that supports nested value pairs and arrays. In BigQuery we have a few options to consider when choosing how to store this data for use in BigQuery. JSON string column with BigQuery JSON functions Pros: Easiest to use directly from the source system Flexible schema as source data changes Cons: We lose BigQuery’s columnar data storage benefits. Every query will pay the cost of hitting the entire JSON object. JSON functions are somewhat limited Transform the JSON into a Structs Pros: Depending on your use case this may better fit your data needs, especially if you have nested arrays in your JSON Efficient use of columnar data storage Cons: Lack of schema flexibility May be new territory for SQL users who are not used to using nested fields Transform the JSON into a standard table with