Schedule export from BigQuery to Pigment

This article explains how to connect the BigQuery service to Pigment.
As an example, we will see how to extract a full table from BigQuery and push it to the Pigment Import API, which will load this data into a Transaction List. We will also see how to schedule a daily export. Please adjust this example to better match your needs.

 

Big Picture: to automate this export from BigQuery to Pigment, we will create a Google Cloud Workflow. This workflow will first export one of your BigQuery tables to CSV format and put this CSV file in a GCS (Google Cloud Storage) bucket. Once done, the workflow will push this file to the Pigment Import API, triggering an import on the Pigment side.

 

Let’s see how we build this! 

 

Create a GCS bucket

 

Creating a GCS bucket is very simple: just follow the steps in the Google documentation. This bucket will be used to temporarily store the data you plan to send to Pigment.

Here are the recommended settings:

  • Name: pigment-bigquery-export
  • Location: “Single region”, preferably the same as your BigQuery dataset region
  • Storage class: Standard
  • Data access: Not Public
  • Encryption: keep default parameters

Once done, your bucket is ready.
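If you prefer to create the bucket programmatically rather than through the console, here is a minimal Python sketch using the google-cloud-storage client library. The project ID and region below are placeholders; adjust them to your own setup.

from google.cloud import storage

# Create the bucket that will temporarily hold the CSV extracts.
client = storage.Client(project="your-project-id")  # placeholder project ID
bucket = storage.Bucket(client, name="pigment-bigquery-export")
bucket.storage_class = "STANDARD"
# Use the same region as your BigQuery dataset (placeholder region below).
client.create_bucket(bucket, location="europe-west1")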

 

Create a Workflow

 

In this Workflow, we will configure the logic of the data transfer to Pigment.

First, go to the Workflows page in your Cloud Console and, if needed, enable the Workflows API.

Just follow the steps in this documentation.

Service Account: 

Grant the necessary roles to the service account so it can properly run the Workflow you will create in the next step. The Editor role should be enough for most cases.

Workflow:

  1. In the Cloud Console, go to the Workflows page.
  2. Select Create.
  3. Enter a name for the new workflow, such as “pigmentExportRevenueData”.
  4. Choose an appropriate region (the same region as the GCS bucket and the BigQuery dataset).
  5. Select the service account you just created.
  6. Select Next.
  7. In the workflow editor, enter the definition for your workflow.
# Step 1: define the parameters of the export (fill in the values, see the list below).
- init:
    assign:
      - project_id: ${sys.get_env("GOOGLE_CLOUD_PROJECT_ID")}
      - dataset_id: "bigquery_dataset_name"
      - table_id: "bigquery_table_name"
      - gcs_bucket: "pigment-bigquery-export"
      - gcs_file: "extract-bigquery"
      - pigment_api_key: "PIGMENT_API_KEY"
      - pipeline_id: "PIGMENT_PIPELINE_ID"
# Step 2: export the BigQuery table as a CSV file into the GCS bucket.
- extract_to_gcs:
    call: googleapis.bigquery.v2.jobs.insert
    args:
      projectId: ${project_id}
      body:
        configuration:
          extract:
            destinationUris: ${"gs://" + gcs_bucket + "/" + gcs_file}
            destinationFormat: "CSV"
            sourceTable:
              projectId: ${project_id}
              datasetId: ${dataset_id}
              tableId: ${table_id}
# Step 3: read the CSV file back from the bucket.
- read_csv:
    call: http.get
    args:
      url: ${"https://storage.googleapis.com/storage/v1/b/" + gcs_bucket + "/o/" + gcs_file}
      query:
        alt: "media"
      auth:
        type: OAuth2
    result: csvContent
# Step 4: push the CSV content to the Pigment Import API, triggering the import.
- send_to_pigment:
    call: http.post
    args:
      url: ${"https://pigment.app/api/importservice/Csv/ApiPush/" + pipeline_id + "?api_key=" + pigment_api_key}
      body: ${csvContent.body}
- the_end:
    return: "Success"

In the init part, fill in the different fields (project_id is read automatically from the environment):

  • dataset_id: name of the BigQuery dataset
  • table_id: name of the BigQuery table
  • gcs_bucket: name of the bucket you created in the previous step
  • gcs_file: any file name (it will be the name of the CSV file)
  • pigment_api_key: authentication key for Pigment
  • pipeline_id: ID of the import configuration you want to use

To retrieve the Pigment API key and pipeline_id, please check the Pigment Import API documentation.
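If you want to test the Import API outside the workflow (for example to check your API key and pipeline ID), here is a minimal Python sketch using the requests library. It posts a local CSV file to the same endpoint the workflow calls; the local file name is a placeholder.

import requests

PIGMENT_API_KEY = "PIGMENT_API_KEY"   # same value as in the workflow
PIPELINE_ID = "PIGMENT_PIPELINE_ID"   # same value as in the workflow
url = (
    "https://pigment.app/api/importservice/Csv/ApiPush/"
    + PIPELINE_ID + "?api_key=" + PIGMENT_API_KEY
)

# Push a local CSV file with the same columns as your Transaction List.
with open("extract-bigquery.csv", "rb") as f:   # placeholder file name
    response = requests.post(url, data=f.read())

response.raise_for_status()
print("Import triggered, HTTP status:", response.status_code)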

Note: as the workflow definition contains the Pigment Import API key, anyone with read access to the workflow effectively has write access to the Pigment data. You might want to secure this API key in a secret store such as Secret Manager.
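For reference, assuming you store the key in a Secret Manager secret (the secret name below is a placeholder), here is a minimal Python sketch of reading it back with the Secret Manager client library; the workflow itself could read it through the Secret Manager connector instead of hard-coding it.

from google.cloud import secretmanager

# Read the Pigment API key from Secret Manager instead of hard-coding it.
client = secretmanager.SecretManagerServiceClient()
name = "projects/your-project-id/secrets/pigment-api-key/versions/latest"  # placeholder
response = client.access_secret_version(request={"name": name})
pigment_api_key = response.payload.data.decode("utf-8")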

 

You are almost done! Click on Deploy and your Workflow is now created!

Before moving to the scheduling part, click on Execute. You will see the data appear in Pigment!
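If you prefer to trigger a test run from code instead of the console button, here is a sketch using the Workflows Executions client library (google-cloud-workflows); the project, location and workflow name are placeholders.

from google.cloud.workflows import executions_v1

# Trigger one execution of the workflow deployed above.
client = executions_v1.ExecutionsClient()
parent = client.workflow_path(
    "your-project-id", "europe-west1", "pigmentExportRevenueData"  # placeholders
)
execution = client.create_execution(request={"parent": parent})
print("Started execution:", execution.name)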

 

 

Schedule the Export

 

To schedule this export, we will create a Cloud Scheduler job that triggers this workflow at the desired frequency. Please follow the Google Cloud documentation.

Here are some examples of cron frequencies you might want to use (Cloud Scheduler format):

  • Daily, every day at 04:00: 0 4 * * *
  • Weekly, every Monday at 09:00: 0 9 * * 1
  • Monthly, on the first day of the month at 04:00: 0 4 1 * *
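If you prefer to create the Cloud Scheduler job programmatically, here is a minimal Python sketch using the google-cloud-scheduler client library. It schedules a daily 04:00 run by calling the Workflow Executions REST endpoint with an OIDC token; the project, location, workflow name and service account are placeholders.

from google.cloud import scheduler_v1

PROJECT = "your-project-id"            # placeholder
LOCATION = "europe-west1"              # placeholder, same region as the workflow
WORKFLOW = "pigmentExportRevenueData"  # name chosen when creating the workflow

client = scheduler_v1.CloudSchedulerClient()
parent = f"projects/{PROJECT}/locations/{LOCATION}"

job = scheduler_v1.Job(
    name=f"{parent}/jobs/pigment-daily-export",
    schedule="0 4 * * *",  # every day at 04:00
    time_zone="Etc/UTC",
    http_target=scheduler_v1.HttpTarget(
        uri=(
            "https://workflowexecutions.googleapis.com/v1/"
            f"projects/{PROJECT}/locations/{LOCATION}/workflows/{WORKFLOW}/executions"
        ),
        http_method=scheduler_v1.HttpMethod.POST,
        oidc_token=scheduler_v1.OidcToken(
            # Service account allowed to invoke the workflow (placeholder).
            service_account_email=f"workflow-invoker@{PROJECT}.iam.gserviceaccount.com"
        ),
    ),
)

client.create_job(parent=parent, job=job)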

 

 

 

