The kinds of queries dbplyr can create when translating from R to BigQuery (or whatever database language you are using) depend on the translations that have been defined between R and BigQuery. I cannot find any example that suggests a translation is defined for UNNEST in the existing dbplyr package. Reference 1, Reference 2
One workaround is to define a custom function that does not do the translation within dbplyr, but acts as a translator alongside dbplyr. I have used this approach with success before, when I needed PIVOT in SQL but could not find a translation for tidyr::spread.
The approach works because remote tables in dbplyr are defined by two things: (1) the connection to the remote database, and (2) the code/query that returns the current view of the table. Hence, whenever dbplyr translates R to BigQuery or SQL, it is updating the second half of the definition.
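To make the two halves concrete, here is a minimal sketch that pulls them apart and puts them back together. It assumes the RSQLite package is installed and uses an in-memory SQLite table from dbplyr::memdb_frame() as a stand-in for a BigQuery table; the column x and the doubled column are made up for illustration.

```r
library(dplyr)
library(dbplyr)

# stand-in remote table (assumption: in-memory SQLite instead of BigQuery)
remote_table <- memdb_frame(x = 1:3)

# half 1: the connection to the database
db_connection <- remote_con(remote_table)
# half 2: the query that defines the current view of the table
current_query <- sql_render(remote_table)

# wrap the existing query in hand-written SQL and rebuild a remote table from it
new_query <- paste0("SELECT x * 2 AS doubled FROM (", current_query, ") AS src")
new_table <- tbl(db_connection, sql(new_query))

# run the query and bring the rows into R
result <- collect(new_table)
```

The hand-written SQL never passes through dbplyr's translator; dbplyr simply treats it as the new definition of the table.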
We can do this using a custom function:
unnest <- function(input_tbl, select_columns, array_column, unnested_columns){
  # extract connection
  db_connection <- dbplyr::remote_con(input_tbl)

  # collapse column names into comma-separated lists
  select_columns <- paste0(select_columns, collapse = ", ")
  unnested_columns <- paste0(paste0("un.", unnested_columns), collapse = ", ")

  # build SQL unnest query as plain text, so names are not escaped as string literals
  sql_query <- paste0(
    "SELECT ", select_columns, ", position, ", unnested_columns, "\n",
    "FROM (\n",
    dbplyr::sql_render(input_tbl),
    "\n) AS src\n",
    "CROSS JOIN UNNEST(", array_column, ") AS un WITH OFFSET position"
  )

  # new remote table: same connection, new query
  return(dplyr::tbl(db_connection, dbplyr::sql(sql_query)))
}
Note that I am a dbplyr user, but not a BigQuery user, so my syntax in the above may not be quite perfect. I have followed this question and this one for syntax.
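One way to check the generated text without any database connection is to separate the string building from the tbl() call. The helper below is a hypothetical sketch (build_unnest_sql is my own name, not part of dbplyr) that takes the inner query as plain text and returns the UNNEST query as a string.

```r
# hypothetical helper: builds the UNNEST query text only, no connection needed
build_unnest_sql <- function(inner_sql, select_columns, array_column, unnested_columns){
  select_columns <- paste0(select_columns, collapse = ", ")
  unnested_columns <- paste0(paste0("un.", unnested_columns), collapse = ", ")
  paste0(
    "SELECT ", select_columns, ", position, ", unnested_columns, "\n",
    "FROM (\n",
    inner_sql,
    "\n) AS src\n",
    "CROSS JOIN UNNEST(", array_column, ") AS un WITH OFFSET position"
  )
}

# inner query written by hand here; in practice it would come from dbplyr::sql_render()
query_text <- build_unnest_sql("SELECT *\nFROM table_name", "ID", "array_col", "list")
cat(query_text)
```

The printed text can then be pasted into the BigQuery console to confirm the syntax before wiring it into dbplyr.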
Example use:
remote_table = tbl(bigquery_connection, from = "table_name")
unnested_table = unnest(remote_table, "ID", "array_col", "list")
# check syntax of dbplyr query
unnested_table %>% show_query()
# if this is not a valid bigquery query then next command will error
# view top 10 rows
unnested_table %>% head(10)
If remote_table looks like:
ID  ARRAY_COL
01  list = [a,b,c]
02  list = [d,e]
03  list = [q]
Then unnested_table should look like:
ID  POSITION  un.list
01  0         a
01  1         b
01  2         c
02  0         d
02  1         e
03  0         q
And unnested_table %>% show_query() should look something like:
<SQL>
SELECT ID, position, un.list
FROM (
SELECT *
FROM table_name
) AS src
CROSS JOIN UNNEST(array_col) AS un WITH OFFSET position
Update to match target query
I am aware of no dbplyr feature that will translate _TABLE_SUFFIX BETWEEN "20191101" AND "20191102" easily, so you will have to handle this another way, perhaps by looping over a list of dates in R.
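For a longer date range, the list of table names can be generated rather than typed out. A minimal base R sketch, assuming the tables are named with a YYYYMMDD suffix as the _TABLE_SUFFIX filter implies:

```r
# daily sequence covering the range in the _TABLE_SUFFIX filter
dates <- seq(as.Date("2019-11-01"), as.Date("2019-11-02"), by = "day")

# format as YYYYMMDD suffixes and prepend the table prefix
suffixes <- format(dates, "%Y%m%d")
table_names <- paste0("bigquery-public-data.google_analytics_sample.ga_sessions_", suffixes)
```

Each element of table_names can then be passed to tbl() in turn.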
The first step is to get dbplyr to render the query prior to unnesting. Probably something like:
for(date in c("20191101", "20191102")){
  table_name = paste0("bigquery-public-data.google_analytics_sample.ga_sessions_", date)

  remote_table = tbl(bigquery_connection, from = table_name)

  # keep non-google traffic and the columns needed for unnesting
  remote_table = remote_table %>%
    filter(! (geoNetwork.networkDomain %like% "%google%")) %>%
    select(fullVisitorId, visitId, date, visitStartTime, hits, geoNetwork.networkDomain) %>%
    distinct()
  # remote_table now holds the query for this date only
}
Calling show_query(remote_table) should then produce something equivalent to the following. It will not be exactly identical, because dbplyr writes code differently from humans.
SELECT DISTINCT fullVisitorId, visitId, date, visitStartTime, hits, geoNetwork.networkDomain
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20191101`
WHERE NOT(geoNetwork.networkDomain LIKE "%google%")
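The filter in the first step relies on dbplyr passing infix operators it does not recognise, such as %like%, through to SQL. This can be checked locally without a database. The sketch below uses dbplyr::lazy_frame() with a simulated generic DBI connection (an assumption: this is not the BigQuery backend, so quoting may differ), and the column name domain is invented for illustration.

```r
library(dplyr)
library(dbplyr)

# local lazy table with a simulated connection (assumption: generic DBI, not BigQuery)
local_table <- lazy_frame(domain = "example.com", con = simulate_dbi())

# render the SQL that dbplyr would generate for the negated LIKE filter
rendered <- local_table %>%
  filter(!(domain %like% "%google%")) %>%
  sql_render()

print(rendered)
```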
The second step is to call the custom unnest function:
remote_table = unnest(remote_table,
  select_columns = c("fullVisitorId", "visitId", "date", "visitStartTime", "geoNetwork.networkDomain"),
  array_column = "hits",
  unnested_columns = c("page.pagePath", "time")
)
Calling show_query(remote_table) should then produce the following:
SELECT fullVisitorId, visitId, date, visitStartTime, geoNetwork.networkDomain, position, un.page.pagePath, un.time
FROM (
the_query_from_the_first_step
) AS src
CROSS JOIN UNNEST(hits) AS un WITH OFFSET position
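Note that the loop in the first step overwrites remote_table on every pass, so only the last date survives. One way around this is to store one remote table per date in a list and stack them with union_all. The sketch below shows the pattern on in-memory SQLite tables (assuming RSQLite is installed) rather than BigQuery; the id column and its values are invented for illustration.

```r
library(dplyr)
library(dbplyr)

# made-up contents standing in for two daily tables
table_ids <- list(c(1, 2), 3)

day_tables <- list()
for(i in seq_along(table_ids)){
  # in the real code this would be the filter/select/distinct and unnest steps
  day_tables[[i]] <- memdb_frame(id = table_ids[[i]])
}

# stack the per-date remote tables into a single remote table
combined <- Reduce(union_all, day_tables)

result <- collect(combined)
```

The combined table is still lazy, so show_query(combined) can be used to inspect the UNION ALL query before anything is run.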
That is probably as far as I can assist, as I do not have a BigQuery environment to test this in myself. You may have to adjust the custom unnest function to get it to exactly match your context. Hopefully the above is enough to get you started.