BigQuery is a fully managed, serverless data warehouse on the Google Cloud Platform that lets you run analytics on terabytes of data in real time.
MoEngage <> BigQuery
The MoEngage and BigQuery integration uses MoEngage's S3 Data Exports to transfer data to Google BigQuery for further processing and analytics.
You can set up a Transfer Service to automatically schedule data ingestion from S3 into Google BigQuery.
Step 1: Set up AWS Data Exports
Ensure you have already set up the Data Exports to S3 by following the steps mentioned here. Once the data starts to flow into S3, move on to the next step. This is important because we need the exported files to predefine the schema of our imports.
Note- If you do not have an S3 account, we can set up the export on our S3 bucket and configure the transfer service for you. Please reach out to email@example.com.
Step 2: Prerequisites for initiating the data transfer
Before you create an Amazon S3 transfer:
- You need to verify that you can access the files in your assigned buckets. Download any one .json file to your local computer to validate.
- You need to verify that you have completed all actions required to enable the BigQuery Data Transfer Service.
- Create a BigQuery dataset to store your data.
- Retrieve your Amazon S3 URI, your access key ID, and your secret access key. For more details on managing the access keys, see the AWS documentation.
Note- If you intend to set up Pub/Sub transfer run notifications, you must have pubsub.topics.setIamPolicy permissions. Pub/Sub permissions are not required if you only set up email notifications. For more information, see BigQuery Data Transfer Service run notifications.
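The "download one file and validate" prerequisite above can be automated with a short local check. This is a minimal sketch, assuming the export files are newline-delimited JSON; the function name `validate_jsonl` and the sample path are illustrative, not part of MoEngage's or BigQuery's tooling.

```python
import json

def validate_jsonl(path, max_errors=5):
    """Parse each line of a newline-delimited JSON file and collect bad lines."""
    errors = []
    with open(path, "r", encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            try:
                json.loads(line)
            except json.JSONDecodeError as exc:
                errors.append((lineno, str(exc)))
                if len(errors) >= max_errors:
                    break
    return errors

# Example: validate a file downloaded from your S3 bucket
# (the path below is a placeholder for your downloaded export)
# problems = validate_jsonl("sample_export.json")
# print("OK" if not problems else problems)
```

An empty result means every line parsed as valid JSON, which is what the "JSONL (Newline delimited JSON)" upload in the next step expects.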
Step 3: Create a new table to store the incoming data
Once you have the necessary details in place, you will need to create a new table with the appropriate schema. You can follow the instructions on Create a destination table. Make sure to apply these specific changes once you are on the Create a Table screen:
- Choose "Upload" under "Create table from".
- Select the downloaded file from your S3 bucket.
- File format should be "JSONL (Newline delimited JSON)".
- Select the destination Project, Dataset, and Table.
- Under the Schema section, select the "Auto detect" option.
You can change other configurations (apart from these) as per your requirements. Once you are satisfied with your configuration, click on "Create Table". A job will run in the background to load the data into this table. If all goes well, your new table will be created.
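To build intuition for what the "Auto detect" option does with your sample file, here is a much-simplified sketch of JSONL schema inference. This is not BigQuery's actual algorithm (which also detects dates, timestamps, nested RECORD fields, and repeated fields); the `infer_schema` helper and sample row are illustrative only.

```python
import json

def infer_schema(jsonl_lines):
    """Infer a flat, BigQuery-style column->type mapping from JSONL lines.

    Simplified illustration: the first type seen for a key wins, and only
    scalar types are handled.
    """
    def bq_type(value):
        # bool must be checked before int (bool is a subclass of int in Python)
        if isinstance(value, bool):
            return "BOOLEAN"
        if isinstance(value, int):
            return "INTEGER"
        if isinstance(value, float):
            return "FLOAT"
        return "STRING"

    schema = {}
    for line in jsonl_lines:
        for key, value in json.loads(line).items():
            schema.setdefault(key, bq_type(value))
    return schema

# Hypothetical export row, for illustration:
sample = ['{"uid": "u1", "event_count": 3, "is_active": true}']
# infer_schema(sample)
# -> {"uid": "STRING", "event_count": "INTEGER", "is_active": "BOOLEAN"}
```

If auto-detect produces a column type you do not want (for example, an ID field detected as INTEGER), you can instead define the schema manually on the same screen.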
Step 4: Verify if your data has been loaded
Before setting up the Transfer Service, you need to verify that your data has been loaded correctly and that the schema is in place.
- Navigate to the Explorer dashboard.
- Click on your table in the sidebar to view its schema.
- Use the Preview tab to verify the data.
Once you verify your data here, you are good to go to the next step.
Step 5: Setup the BigQuery Data Transfer Service
Follow the steps on the S3 Transfers page for a detailed and up-to-date guide.
- Go to the BigQuery page in the Google Cloud console.
- Click Create a Transfer.
On the Create Transfer page:
- In the Source type section, for Source, choose Amazon S3.
- In the Transfer config name section, for Display name, enter a name for the transfer, such as My Transfer. The transfer name can be any value that allows you to easily identify the transfer if you need to modify it later.
- In the Schedule options section, for Schedule, leave the default value (Start now) or click Start at a set time.
- In the Destination settings section, for Destination dataset, choose the dataset you created to store your data.
In the Data source details section:
- For Destination table, enter the name of the table you created to store the data in BigQuery. Destination table names support parameters.
- For Amazon S3 URI, enter the URI in the following format: s3://mybucket/myfolder/.... URIs also support parameters.
- For Access key ID, enter your access key ID.
- For Secret access key, enter your secret access key.
- For File format, choose your data format: newline delimited JSON.
- In the JSON, CSV section, check Ignore unknown values to accept rows that contain values that do not match the schema. Unknown values are ignored.
Note- These steps are indicative; you can change configurations as per your requirements. Also, be aware of certain limitations of these data loads.
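The destination table and S3 URI parameters mentioned above let one transfer config write each run into a dated table or read from a dated folder. The sketch below simulates how such templates resolve against a run's start time; it is an illustration of the idea, not the BigQuery Data Transfer Service's own resolver, and the template strings are hypothetical.

```python
from datetime import datetime, timezone

def resolve_params(template, run_time):
    """Illustrative resolution of transfer runtime parameters.

    {run_date} expands to YYYYMMDD; {run_time|"..."} applies a
    strftime-style format to the run's start time.
    """
    resolved = template.replace("{run_date}", run_time.strftime("%Y%m%d"))
    # Handle simple {run_time|"..."} format specifiers
    while "{run_time|" in resolved:
        start = resolved.index("{run_time|")
        end = resolved.index("}", start)
        fmt = resolved[start + len('{run_time|"'):end - 1]
        resolved = resolved[:start] + run_time.strftime(fmt) + resolved[end + 1:]
    return resolved

run = datetime(2024, 3, 15, 8, 0, tzinfo=timezone.utc)
# resolve_params("events_{run_date}", run) -> "events_20240315"
```

With a template like this, a daily schedule would land each run's rows in a per-day table (events_20240315, events_20240316, ...) without any manual intervention.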
Once you set it up, you should see your first transfer run as per your schedule. You can always check the logs if something goes wrong, or visit BigQuery's Troubleshooting section.