Introduction
Microsoft Azure Blob Storage is a massively scalable object storage service for unstructured data, offered by Microsoft as part of the Azure product suite.
MoEngage <> Microsoft Azure Blob
The MoEngage and Microsoft Azure Blob integration uses MoEngage's S3 Data Exports to transfer data to your Azure Blob Storage for further processing and analytics.
Integration
Prerequisites
Data Exports to S3 must already be set up (see Step 4).
You can then set up a script that transfers data from the S3 bucket to your Microsoft Azure Blob Storage on an automatic schedule.
Step 1: Create a storage account on Azure
In your Microsoft Azure account:
- Navigate to Storage Accounts in the sidebar.
- Click + Add to create a new storage account.
- Next, provide a storage account name. The other settings can be left at their defaults.
- Select Review + Create.
Even if you already have a storage account, we recommend creating a new one specifically for your MoEngage data.
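If you prefer the Azure CLI over the portal, the step above can be done with a single command. This is a minimal sketch: the account name, resource group, and location below are illustrative placeholders (the resource group is assumed to already exist), and Standard_LRS is just one reasonable SKU choice.
# Create a storage account (name, resource group, and location are placeholders)
az storage account create \
  --name moengagedata \
  --resource-group my-resource-group \
  --location eastus \
  --sku Standard_LRS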
Step 2: Get the connection string
Once the storage account is deployed, navigate to the Access Keys menu in the storage account and take note of the connection string.
Azure provides two access keys so that you can keep connections running on one key while regenerating the other. You only need the connection string from one of them.
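The connection string can also be fetched with the Azure CLI. The account and resource group names below are the same illustrative placeholders used in the earlier sketch.
# Print the connection string for the storage account
az storage account show-connection-string \
  --name moengagedata \
  --resource-group my-resource-group \
  --output tsv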
Step 3: Create a blob service container
- Navigate to the Blob Service section >> Blobs menu.
- Create a Blob Service Container within the storage account you created earlier.
- Provide a name for your Blob Service Container. The other settings can be left at their defaults.
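Equivalently, with the Azure CLI; the container name here is a placeholder, and the connection string is the one you noted in Step 2.
# Create a blob container using the connection string from Step 2
az storage container create \
  --name moengage-exports \
  --connection-string "<connection-string-from-step-2>"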
Step 4: Set up AWS Data Exports on MoEngage
Ensure that you have already set up Data Exports to S3 by following the steps mentioned here. Once data starts flowing into S3, move to the next step. This is important because the schema of the imports must be predefined from the exported data.
Sample file format
s3://client-moengage-data/event-exports/export_day=2021-07-01/export_hour=06/
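To verify that exports are arriving, you can list an hourly partition with the AWS CLI. The bucket and prefix below are the sample values from the path above, and the profile name is illustrative; substitute your own.
# List one hourly partition of the sample export path
# (assumes an AWS CLI profile with read access to the bucket)
aws s3 ls s3://client-moengage-data/event-exports/export_day=2021-07-01/export_hour=06/ --profile moengage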
Note: If you do not have an S3 account, we can set it up on our S3 bucket and configure the transfer service. Please reach out to support@moengage.com.
Step 5: Script to transfer data from S3 to Azure Blob
You can fetch the data from the MoEngage S3 bucket using AWS CLI commands and ingest it into your Azure Blob Storage, or use Azure tooling such as azcopy to access the S3 bucket and fetch the data directly.
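As an illustration of the second approach, azcopy (v10) can read directly from an S3 endpoint when AWS credentials are exported as environment variables. The bucket path, storage account, container, and SAS token below are placeholders, not values from your setup.
# AWS credentials for the source bucket (azcopy reads these env vars)
export AWS_ACCESS_KEY_ID='<your-access-key>'
export AWS_SECRET_ACCESS_KEY='<your-secret-key>'
# Copy one hourly partition straight from S3 to the blob container
azcopy copy \
  'https://s3.amazonaws.com/client-moengage-data/event-exports/export_day=2021-07-01/export_hour=06/' \
  'https://<storage-account>.blob.core.windows.net/<container>/event-exports/?<SAS-token>' \
  --recursive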
Below is a sample script that uses an intermediate host as middleware to process the data and ingest it into Azure Blob Storage. The script:
- Copies the data from S3 to an intermediate location (a VM) and then to Azure Blob Storage.
- Deletes the data from the intermediate location after 1 day.
- Runs every hour. You can modify it as per your requirements (a sample cron entry is shown after the script).
Note: This is a reference script; you can modify it or use other methods compatible with your infrastructure.
#!/usr/bin/env bash
# Reference script: sync MoEngage exports from S3 to Azure Blob via a local VM.
# The configuration variables used below (command paths, partition formats,
# AWS profile, Azure SAS token, etc.) are expected to be defined in the
# environment or a sourced config file before this script runs.

# Check that azcopy is installed
if ! [ -x "$(command -v "${AZ_COPY_COMMAND_PATH}/azcopy")" ]; then
  echo 'Error: azcopy is not installed.' >&2
  exit 1
fi

# Timestamp (one hour ago) used to name the log files
ONE_HOUR_AGO=$(date -d '1 hour ago' "${MOENGAGE_PARTITION_FORMAT}")

# Build the partition directory from the export timestamps (the 8-hour
# offset accounts for the delay before exports land in S3)
YEAR_DIRECTORY='year='$(date -d '8 hour ago' "${YEAR_PARTITION}")
MONTH_DIRECTORY='month='$(date -d '8 hour ago' "${MONTH_PARTITION}")
DAY_DIRECTORY='day='$(date -d '8 hour ago' "${DAY_PARTITION}")
#HOUR_DIRECTORY='hour='$(date -d '8 hour ago' "${HOUR_PARTITION}")
PARTITION_DIRECTORY='/'${YEAR_DIRECTORY}'/'${MONTH_DIRECTORY}'/'${DAY_DIRECTORY}'/'
S3_MOENGAGE_FINAL_PATH=${S3_MOENGAGE_BASE_PATH}${PARTITION_DIRECTORY}
EVENTS_FINAL_DIRECTORY=${EVENTS_BASE_DIRECTORY}${PARTITION_DIRECTORY}

echo "Start of Sync from S3 bucket"
# Sync data from the Amazon S3 bucket to the local VM
echo "command run: aws s3 sync ${S3_MOENGAGE_FINAL_PATH} ${EVENTS_FINAL_DIRECTORY}"
/usr/local/bin/aws s3 sync "${S3_MOENGAGE_FINAL_PATH}" "${EVENTS_FINAL_DIRECTORY}" --profile "${S3_MOENGAGE_AWS_PROFILE}" | tee "${LOG_PATH_AWS}/${ONE_HOUR_AGO}.log"
echo "Sync from S3 bucket completed"

echo "Start of Sync to Azure Blob"
# Sync data from the local VM to Azure Blob Storage
"${AZ_COPY_COMMAND_PATH}/azcopy" sync "${EVENTS_BASE_DIRECTORY}/" "${AZURE_BLOB_BASE_PATH}/${AZURE_CONTAINER_NAME}/${AZURE_DIRECTORY_PATH}/?${AZURE_SAS_TOKEN}" --recursive | tee "${LOG_PATH_AZURE}/${ONE_HOUR_AGO}.log"
echo "Sync to azure blob completed"

# Clean up the previous day's data on the VM after the day has rolled over
PREVIOUS_DAY=$(date -d '8 hour ago' "${DAY_PARTITION}")
PRESENT_DAY=$(date -d '6 hour ago' "${DAY_PARTITION}")
EVENTS_PREVIOUS_DAY_DIRECTORY=${EVENTS_BASE_DIRECTORY}'/year='$(date -d '24 hour ago' "${YEAR_PARTITION}")'/month='$(date -d '24 hour ago' "${MONTH_PARTITION}")'/day='$(date -d '24 hour ago' "${DAY_PARTITION}")'/'
if [ "${PREVIOUS_DAY}" = "${PRESENT_DAY}" ]; then
  rm -rf "${EVENTS_PREVIOUS_DAY_DIRECTORY}"
else
  echo "Day rollover in progress; skipping cleanup"
fi
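To run the script every hour as described above, you can schedule it with cron. The script path and log file below are illustrative placeholders; adjust them to your environment.
# Run the sync script at the top of every hour (paths are illustrative)
0 * * * * /opt/moengage/s3_to_azure_sync.sh >> /var/log/moengage_sync.log 2>&1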