
I would like some advice on the following task. Assume a collection of BigQuery (BQ) tables named name_YYYYMM, each containing a DATETIME column called date_time whose values all fall within the month YYYYMM given by the table suffix (in other words, a form of pseudo-sharding). The objective is to shard this collection properly by creating a new collection of derived tables named name_YYYYMMDD.

In principle, I envision iterating an INSERT statement over every YYYYMMDD date covered by the month suffixes of the original tables. Within this iteration, each individual statement would look as follows:

INSERT INTO `name_(#YYYYMMDD format date#)`
SELECT ...
FROM `name_YYYYMM`
WHERE FORMAT_DATETIME("%Y%m%d", date_time) = "#YYYYMMDD format date#"

What I don't know is a concrete and efficient way to pass the iterated string #YYYYMMDD format date# as the suffix of each new, correctly sharded table's name. If anyone could point me in the right direction, I would be very grateful.
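A minimal sketch of the dynamic-SQL approach, assuming the statements are generated client-side in Python and then submitted to BigQuery separately (for example through a client library, or pasted into a script). The `dataset`, `prefix` and `months` parameters are hypothetical, and `SELECT *` stands in for the elided column list. The sketch swaps the INSERT for `CREATE OR REPLACE TABLE ... AS SELECT`, because INSERT requires each daily table to exist already, and it filters on `DATE(date_time)` instead of comparing formatted strings.

```python
import calendar
from typing import List


def days_in_month(yyyymm: str) -> List[str]:
    """Return every YYYYMMDD string for the month given as 'YYYYMM'."""
    year, month = int(yyyymm[:4]), int(yyyymm[4:])
    # monthrange returns (weekday of day 1, number of days in the month)
    last_day = calendar.monthrange(year, month)[1]
    return [f"{year:04d}{month:02d}{day:02d}" for day in range(1, last_day + 1)]


def daily_shard_statements(dataset: str, prefix: str, months: List[str]) -> List[str]:
    """Build one CREATE TABLE ... AS SELECT per day of each monthly shard."""
    statements = []
    for yyyymm in months:
        source = f"`{dataset}.{prefix}_{yyyymm}`"
        for yyyymmdd in days_in_month(yyyymm):
            # The iterated date string becomes the suffix of the new table name.
            target = f"`{dataset}.{prefix}_{yyyymmdd}`"
            statements.append(
                f"CREATE OR REPLACE TABLE {target} AS\n"
                "SELECT *\n"  # hypothetical: replace with the real column list
                f"FROM {source}\n"
                f"WHERE DATE(date_time) = PARSE_DATE('%Y%m%d', '{yyyymmdd}')"
            )
    return statements


# Print the first two generated statements for a hypothetical dataset.
for stmt in daily_shard_statements("my_dataset", "name", ["202402"])[:2]:
    print(stmt, end=";\n\n")
```

Inside BigQuery itself, the same loop could instead be written as a script that builds each statement with FORMAT() and runs it with EXECUTE IMMEDIATE; the month suffixes could also be read from the dataset's INFORMATION_SCHEMA.TABLES view rather than listed by hand.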

  • I haven't tried it, but could you use partitioning (instead of sharding) within each monthly shard? I don't see anything in the documentation saying a sharded collection can't contain partitioned tables, and it may be easier to implement. As for your question, I believe you will need to build the SQL dynamically and execute it, since plain SQL can't reference object names dynamically.
    – JNevill
    Commented Jun 4 at 19:56
  • I really think you should read about partitioned tables: cloud.google.com/bigquery/docs/partitioned-tables. They would spare you much of the over-engineering your question is heading toward. All you need in the end is one table partitioned on your date field; you can also manage automatic expiry of data (if needed), etc. That will streamline everything for you. Commented Jun 4 at 20:33
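Following the partitioning suggestion, here is a minimal sketch that builds a single DDL statement instead of many daily tables. It assumes all monthly tables live in one dataset and share a schema; the `dataset`, `prefix`, `target` and `expiration_days` parameters are hypothetical. The wildcard `name_*` combined with `_TABLE_SUFFIX` (the BigQuery pseudo-column holding the part of the table name matched by `*`) restricts the source to six-character YYYYMM suffixes, so any name_YYYYMMDD tables are skipped. The optional `partition_expiration_days` table option corresponds to the automatic expiry mentioned in the comment.

```python
from typing import Optional


def partitioned_table_ddl(
    dataset: str, prefix: str, target: str, expiration_days: Optional[int] = None
) -> str:
    """Build one CREATE TABLE ... AS SELECT that merges the monthly shards
    into a single table partitioned by day on date_time."""
    # In BigQuery DDL, OPTIONS comes after PARTITION BY and before AS.
    options = (
        f"OPTIONS (partition_expiration_days = {expiration_days})\n"
        if expiration_days is not None
        else ""
    )
    return (
        f"CREATE TABLE `{dataset}.{target}`\n"
        "PARTITION BY DATETIME_TRUNC(date_time, DAY)\n"
        f"{options}"
        "AS\n"
        "SELECT *\n"
        f"FROM `{dataset}.{prefix}_*`\n"
        # Keep only six-character YYYYMM suffixes (skips any name_YYYYMMDD tables).
        "WHERE LENGTH(_TABLE_SUFFIX) = 6"
    )


print(partitioned_table_ddl("my_dataset", "name", "name_partitioned", expiration_days=90))
```

Queries that filter on date_time against the resulting table should then scan only the matching daily partitions, which is the effect the daily shards were meant to achieve.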
