https://twitter.com/tac0x2a/status/1547769068446330886
It feels like `dbt test` has been carved out into its own tool.
```shell
python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install soda-core-bigquery
touch configuration.yml
touch checks.yml
```
```yaml
# configuration.yml
data_source <your_data_source_name>:
  type: bigquery
  connection:
    account_info_json: '{
        "type": "service_account",
        "project_id": "...",
        "private_key_id": "...",
        "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
        "client_email": "...@<project-name>.iam.gserviceaccount.com",
        "client_id": "...",
        "auth_uri": "https://accounts.google.com/o/oauth2/auth",
        "token_uri": "https://accounts.google.com/o/oauth2/token",
        "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
        "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/..."
      }'
    auth_scopes:
    - https://www.googleapis.com/auth/bigquery
    - https://www.googleapis.com/auth/cloud-platform
    - https://www.googleapis.com/auth/drive
    project_id: "<project-name>"
    dataset: <dataset_name>
```
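Pasting the key into `account_info_json` by hand is easy to get wrong: the `\n` sequences in `private_key` have to survive as escaped text, which works here only because the value is a YAML single-quoted string (no backslash processing). A minimal sketch, assuming the key was downloaded to a file named `service-account.json` (hypothetical path), that serializes it into one `account_info_json:` line to paste under `connection:`. The helper name `account_info_json_line` is made up for illustration.

```python
import json
from pathlib import Path


def account_info_json_line(key: dict, indent: int = 4) -> str:
    """Render a service-account key as one `account_info_json:` YAML line.

    json.dumps writes the real newlines inside private_key as the two
    characters backslash + n, and a YAML single-quoted scalar does not
    interpret backslashes, so the value is still valid JSON after YAML
    parsing. Single quotes are doubled, YAML's escape inside '...'.
    """
    serialized = json.dumps(key).replace("'", "''")
    return " " * indent + f"account_info_json: '{serialized}'"


if __name__ == "__main__":
    key_path = Path("service-account.json")  # hypothetical key file location
    if key_path.exists():
        print(account_info_json_line(json.loads(key_path.read_text())))
    else:
        print(f"{key_path} not found")
```

The single-line form should be equivalent to the multi-line one above, since YAML folds the line breaks of a quoted scalar into spaces and JSON ignores that whitespace.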
```yaml
# checks.yml
# Checks for basic validations
checks for attraction:
  - row_count > 0 # SELECT COUNT(*) FROM disney_load.attraction
  - row_count > 1
```
```shell
soda scan -d <your_data_source_name> -c configuration.yml checks.yml
```
Even with `row_count` written twice, only one COUNT query was actually run.
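That behaviour fits a simple design: parse each check into metric, operator, and threshold; compute each distinct metric once; then evaluate every threshold against the cached value. The sketch below is a hypothetical, heavily simplified model of that idea, using SQLite in place of BigQuery. It is not Soda Core's code; `run_checks`, `METRIC_SQL`, and the supported check syntax are made up for illustration.

```python
import re
import sqlite3

# Hypothetical, simplified model of metric sharing; NOT Soda Core's code.
# Supports only checks of the form "<metric> <op> <integer>".
CHECK_PATTERN = re.compile(r"^\s*(\w+)\s*(>=|<=|>|<|=)\s*(-?\d+)\s*$")

COMPARISONS = {
    ">": lambda value, threshold: value > threshold,
    ">=": lambda value, threshold: value >= threshold,
    "<": lambda value, threshold: value < threshold,
    "<=": lambda value, threshold: value <= threshold,
    "=": lambda value, threshold: value == threshold,
}

# SQL used to compute each supported metric (only row_count here).
METRIC_SQL = {"row_count": "SELECT COUNT(*) FROM {table}"}


def run_checks(conn, table, checks, executed_sql):
    """Evaluate checks, querying each distinct metric only once.

    Every SQL statement actually sent to the database is appended to
    executed_sql so the caller can see how many queries ran.
    """
    parsed = []
    for check in checks:
        match = CHECK_PATTERN.match(check)
        if match is None:
            raise ValueError(f"unsupported check: {check!r}")
        metric, op, threshold = match.groups()
        if metric not in METRIC_SQL:
            raise ValueError(f"unknown metric: {metric!r}")
        parsed.append((check, metric, op, int(threshold)))

    # Compute each distinct metric once, in first-seen order.
    values = {}
    for _, metric, _, _ in parsed:
        if metric not in values:
            sql = METRIC_SQL[metric].format(table=table)
            executed_sql.append(sql)
            values[metric] = conn.execute(sql).fetchone()[0]

    # Evaluate every check against the shared metric values.
    return {
        check: COMPARISONS[op](values[metric], threshold)
        for check, metric, op, threshold in parsed
    }


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE attraction (name TEXT)")
    conn.executemany("INSERT INTO attraction VALUES (?)", [("A",), ("B",)])
    executed = []
    print(run_checks(conn, "attraction", ["row_count > 0", "row_count > 1"], executed))
    # {'row_count > 0': True, 'row_count > 1': True}
    print(executed)
    # ['SELECT COUNT(*) FROM attraction']
```

Caching per metric rather than per check is what keeps the query count independent of how many thresholds are attached to the same metric.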
How are you supposed to switch `project_id` or `dataset`, I wonder?

(sob)
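One possible approach to switching `project_id` / `dataset`, assuming (not verified here) that a single configuration.yml may hold several `data_source` blocks and that `-d` picks one of them by name: generate one block per environment. A minimal sketch; the environment names `bq_dev` / `bq_prod`, the project IDs, the key path, and the helper names are all hypothetical.

```python
import json
import sys
from pathlib import Path

# Hypothetical environments: data source name -> (project_id, dataset).
ENVIRONMENTS = {
    "bq_dev": ("my-dev-project", "disney_load"),
    "bq_prod": ("my-prod-project", "disney_load"),
}

AUTH_SCOPES = [
    "https://www.googleapis.com/auth/bigquery",
    "https://www.googleapis.com/auth/cloud-platform",
    "https://www.googleapis.com/auth/drive",
]


def render_data_source(name: str, project_id: str, dataset: str, key: dict) -> str:
    """Render one data_source block with the same layout as configuration.yml."""
    # Single-quoted YAML keeps backslashes literal; a quote is escaped as ''.
    account_info = json.dumps(key).replace("'", "''")
    lines = [
        f"data_source {name}:",
        "  type: bigquery",
        "  connection:",
        f"    account_info_json: '{account_info}'",
        "    auth_scopes:",
        *(f"    - {scope}" for scope in AUTH_SCOPES),
        f'    project_id: "{project_id}"',
        f"    dataset: {dataset}",
    ]
    return "\n".join(lines)


def render_configuration(environments: dict, key: dict) -> str:
    """Concatenate one data_source block per environment."""
    blocks = [
        render_data_source(name, project_id, dataset, key)
        for name, (project_id, dataset) in environments.items()
    ]
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    key_path = Path("service-account.json")  # hypothetical key file location
    if key_path.exists():
        key = json.loads(key_path.read_text())
        sys.stdout.write(render_configuration(ENVIRONMENTS, key))
    else:
        print(f"{key_path} not found", file=sys.stderr)
```

Usage would then be something like `python render_configuration.py > configuration.yml` followed by `soda scan -d bq_prod -c configuration.yml checks.yml`, with the same `checks.yml` reused across environments.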
Apparently the results of jobs run in dbt Cloud can be sent to Soda Cloud.