BigQuery Upload Process

Complete post-test data pipeline from CSV to Google BigQuery

Introduction

This diagram documents the complete data pipeline that transfers test execution results from local CSV files to Google BigQuery for long-term storage, analytics, and reporting. The pipeline is triggered automatically in Maven's post-integration-test phase after all tests complete.

Pipeline Stages

  1. Maven trigger and configuration validation
  2. Google Cloud authentication
  3. Dataset and table validation/creation
  4. CSV parsing and data collection (OpenCSV)
  5. Batch insertion to BigQuery
  6. CSV file archiving

Key Process Steps

1. Maven Trigger

Phase: post-integration-test
Execution: Automatically triggered after test completion
Entry Point: FW_GoogleCloudBigQueryUploader.main()
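
A minimal sketch of the entry-point flow, assuming the flag is read straight from testConfig.properties: only the class and method names (FW_GoogleCloudBigQueryUploader, uploadTestStepsCSV, archiveTestStepsCSV) and the googleCloudBigQueryExport key come from this page; everything else is illustrative.

import java.io.FileInputStream;
import java.util.Properties;

public class FW_GoogleCloudBigQueryUploaderSketch {

    public static void main(String[] args) throws Exception {
        // Read the export flag from testConfig.properties; anything other than
        // "true" takes the "Skip Export" branch shown in the flow diagram.
        Properties config = new Properties();
        try (FileInputStream in = new FileInputStream("src/test/resources/testConfig.properties")) {
            config.load(in);
        }
        if (!Boolean.parseBoolean(config.getProperty("googleCloudBigQueryExport", "false"))) {
            System.out.println("googleCloudBigQueryExport is not true - skipping BigQuery export");
            return;
        }

        // Upload first, then archive the processed CSV.
        uploadTestStepsCSV();
        archiveTestStepsCSV();
    }

    // Placeholders for the two methods documented below.
    static void uploadTestStepsCSV()  { /* validate config, parse CSV, insert rows */ }
    static void archiveTestStepsCSV() { /* move the CSV into the archive folder */ }
}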

2. Configuration Validation

Validates all required properties from testConfig.properties: the Google Cloud project ID, service account credentials file, dataset and table names, schema file location, and CSV file paths.
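
A sketch of the fail-fast validation pattern, assuming the file is loaded with java.util.Properties; the key names match the Configuration Requirements section below, but the helper class itself is hypothetical.

import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;

// Hypothetical helper: every required key must be present and non-empty,
// otherwise the upload aborts before any BigQuery calls are made.
public final class ConfigCheck {

    private static final String[] REQUIRED_KEYS = {
        "googleCloudProjectId",
        "googleCloudServiceAccountJsonFile",
        "googleCloudProjectDataset",
        "googleCloudProjectDatasetTableTestSteps",
        "googleCloudProjectDatasetTableTestStepsSchema",
        "dataOutputPath",
        "dataTestStepsFilename"
    };

    public static Properties loadAndValidate(String path) throws IOException {
        Properties props = new Properties();
        try (FileInputStream in = new FileInputStream(path)) {
            props.load(in);
        }
        for (String key : REQUIRED_KEYS) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                throw new IllegalStateException("Missing required property: " + key);
            }
        }
        return props;
    }
}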

3. Authentication

Loads the service account JSON credentials, creates an authenticated BigQuery client, and verifies client connectivity.
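
A sketch of this step using the standard google-cloud-bigquery client; the credentials path and project ID would come from testConfig.properties, and the surrounding class is illustrative rather than the framework's actual FW_GoogleCloudUtils code.

import com.google.auth.oauth2.GoogleCredentials;
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import java.io.FileInputStream;
import java.io.IOException;

public final class BigQueryAuth {

    // Builds an authenticated BigQuery client from a service account JSON file.
    public static BigQuery createClient(String credentialsFile, String projectId) throws IOException {
        try (FileInputStream stream = new FileInputStream(credentialsFile)) {
            GoogleCredentials credentials = GoogleCredentials.fromStream(stream);
            return BigQueryOptions.newBuilder()
                    .setCredentials(credentials)
                    .setProjectId(projectId)
                    .build()
                    .getService();
        }
    }
}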

4. Dataset & Table Validation

Checks whether the dataset exists (creating it if missing), checks whether the table exists (creating it with the schema if missing), and validates the table schema against the JSON definition.
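
The "create if missing" checks map directly onto the client API. The sketch below is illustrative; in the framework these calls are wrapped by FW_GoogleCloudUtils methods such as checkBigQueryDatasetExist, createBigQueryDataset, checkBigQueryTableExist, and createBigQueryTable.

import com.google.cloud.bigquery.*;

public final class DatasetTableCheck {

    public static void ensureDatasetAndTable(BigQuery bigquery, String datasetName,
                                              String tableName, Schema schema) {
        // Dataset: getDataset() returns null when the dataset does not exist.
        Dataset dataset = bigquery.getDataset(DatasetId.of(datasetName));
        if (dataset == null) {
            bigquery.create(DatasetInfo.newBuilder(datasetName).build());
        }

        // Table: same pattern, created with the schema read from the JSON definition.
        TableId tableId = TableId.of(datasetName, tableName);
        Table table = bigquery.getTable(tableId);
        if (table == null) {
            TableDefinition definition = StandardTableDefinition.of(schema);
            bigquery.create(TableInfo.of(tableId, definition));
        }
    }
}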

5. CSV Processing

Parses the CSV with the OpenCSV library, reads the header row to obtain column names (the header itself is not uploaded, since the columns are already defined in the schema), maps each data row's values to its column names, and collects the rows into a List&lt;RowToInsert&gt;.
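
A sketch of the parsing stage with OpenCSV, following the CSVParser / BufferedReader / CSVReader sequence from the flow diagram; error handling is reduced to the empty-file check and the helper class is hypothetical.

import com.google.cloud.bigquery.InsertAllRequest.RowToInsert;
import com.opencsv.CSVParser;
import com.opencsv.CSVParserBuilder;
import com.opencsv.CSVReader;
import com.opencsv.CSVReaderBuilder;
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.*;

public final class CsvToRows {

    public static List<RowToInsert> parse(String csvPath) throws Exception {
        List<RowToInsert> rows = new ArrayList<>();
        CSVParser parser = new CSVParserBuilder().build();
        try (CSVReader reader = new CSVReaderBuilder(new BufferedReader(new FileReader(csvPath)))
                .withCSVParser(parser)
                .build()) {
            // The header supplies the column names; it is not inserted itself.
            String[] header = reader.readNext();
            if (header == null) {
                throw new IllegalStateException("CSV is empty: " + csvPath);
            }
            String[] line;
            while ((line = reader.readNext()) != null) {
                Map<String, Object> content = new HashMap<>();
                for (int i = 0; i < header.length && i < line.length; i++) {
                    content.put(header[i], line[i]);
                }
                rows.add(RowToInsert.of(content));
            }
        }
        return rows;
    }
}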

6. Data Upload

Performs the batch insert via FW_GoogleCloudUtils.writeBigQueryTableRows(), monitors completion, and applies the configured timeout and retry logic.
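
A sketch of a streaming batch insert with simple retry handling; the real writeBigQueryTableRows() utility is not shown here, and the retry loop only illustrates how a googleCloudBigQueryMaxAttempts-style setting could be applied.

import com.google.cloud.bigquery.*;
import com.google.cloud.bigquery.InsertAllRequest.RowToInsert;
import java.util.List;
import java.util.Map;

public final class BigQueryInsert {

    public static void insertRows(BigQuery bigquery, TableId tableId,
                                  List<RowToInsert> rows, int maxAttempts) throws InterruptedException {
        InsertAllRequest request = InsertAllRequest.newBuilder(tableId).setRows(rows).build();

        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            InsertAllResponse response = bigquery.insertAll(request);
            if (!response.hasErrors()) {
                return; // all rows accepted
            }
            // Log per-row errors, then back off and retry.
            for (Map.Entry<Long, List<BigQueryError>> entry : response.getInsertErrors().entrySet()) {
                System.err.println("Row " + entry.getKey() + " failed: " + entry.getValue());
            }
            Thread.sleep(1000L * attempt);
        }
        throw new IllegalStateException("BigQuery insert failed after " + maxAttempts + " attempts");
    }
}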

7. File Archiving

Moves the processed CSV to the archive folder, appends a timestamp to the filename, and confirms that the archive succeeded.
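
A sketch of the archive step with java.nio.file; the timestamp format and directory handling are assumptions, while the dataOutputPath / dataTestStepsFilename / dataArchivePath names match the configuration listed below.

import java.io.IOException;
import java.nio.file.*;
import java.text.SimpleDateFormat;
import java.util.Date;

public final class CsvArchiver {

    public static void archive(String outputPath, String filename, String archivePath) throws IOException {
        Path source = Paths.get(outputPath, filename);
        if (!Files.exists(source)) {
            System.out.println("WARN: CSV not found, nothing to archive: " + source);
            return;
        }
        // Append a timestamp so successive runs never overwrite earlier archives.
        String timestamp = new SimpleDateFormat("yyyyMMdd_HHmmss").format(new Date());
        String archivedName = filename.replace(".csv", "_" + timestamp + ".csv");

        Path targetDir = Files.createDirectories(Paths.get(archivePath));
        Files.move(source, targetDir.resolve(archivedName), StandardCopyOption.REPLACE_EXISTING);
    }
}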

Complete Flow Diagram

flowchart TD
    Start([Maven Build<br/>post-integration-test]) --> CheckFlag{googleCloudBigQueryExport<br/>= true?}
    CheckFlag -->|No| Skip[Skip Export]
    CheckFlag -->|Yes| Main[FW_GoogleCloudBigQueryUploader<br/>main method]
    Main --> Upload[Call uploadTestStepsCSV]
    subgraph UploadFlow["uploadTestStepsCSV() Method"]
        U1[Start Upload Process]
        U1 --> V1[Get googleCloudProjectId]
        V1 --> V1Check{Valid?}
        V1Check -->|No| V1Fail[ERROR: Exit]
        V1Check -->|Yes| V2[Get googleCloudServiceAccountJsonFile]
        V2 --> V2Check{Valid?}
        V2Check -->|No| V2Fail[ERROR: Exit]
        V2Check -->|Yes| V3[Get googleCloudProjectDataset]
        V3 --> V3Check{Valid?}
        V3Check -->|No| V3Fail[ERROR: Exit]
        V3Check -->|Yes| V4[Get googleCloudProjectDatasetTableTestSteps]
        V4 --> V4Check{Valid?}
        V4Check -->|No| V4Fail[ERROR: Exit]
        V4Check -->|Yes| V5[Get googleCloudProjectDatasetTableTestStepsSchema]
        V5 --> V5Check{Valid?}
        V5Check -->|No| V5Fail[ERROR: Exit]
        V5Check -->|Yes| V6[Get dataOutputPath]
        V6 --> V6Check{Valid?}
        V6Check -->|No| V6Fail[ERROR: Exit]
        V6Check -->|Yes| V7[Get dataTestStepsFilename]
        V7 --> V7Check{Valid?}
        V7Check -->|No| V7Fail[ERROR: Exit]
        V7Check -->|Yes| FileCheck[File Validation]
        FileCheck --> F1[Check CSV file exists]
        F1 --> F1Check{Exists?}
        F1Check -->|No| F1Fail[ERROR: Exit]
        F1Check -->|Yes| F2[Check schema file exists]
        F2 --> F2Check{Exists?}
        F2Check -->|No| F2Fail[ERROR: Exit]
        F2Check -->|Yes| F3[Check service account file exists]
        F3 --> F3Check{Exists?}
        F3Check -->|No| F3Fail[ERROR: Exit]
        F3Check -->|Yes| ReadSchema[Read Schema]
        ReadSchema --> S1[FW_GoogleCloudUtils.readSchemaFromFile]
        S1 --> S1Check{Success?}
        S1Check -->|No| S1Fail[ERROR: Exit]
        S1Check -->|Yes| AuthStart[Authentication]
        AuthStart --> A1[Load service account credentials]
        A1 --> A2[Create GoogleCredentials]
        A2 --> A3[Create BigQuery client]
        A3 --> A3Check{Success?}
        A3Check -->|No| A3Fail[ERROR: Exit]
        A3Check -->|Yes| DatasetCheck[Dataset Validation]
        DatasetCheck --> D1[checkBigQueryDatasetExist]
        D1 --> D1Check{Dataset exists?}
        D1Check -->|Yes| D1Skip[Dataset OK]
        D1Check -->|No| D2[createBigQueryDataset]
        D2 --> D2Check{Created?}
        D2Check -->|No| D2Fail[ERROR: Exit]
        D2Check -->|Yes| D1Skip
        D1Skip --> TableCheck[Table Validation]
        TableCheck --> T1[checkBigQueryTableExist]
        T1 --> T1Check{Table exists?}
        T1Check -->|Yes| T1Skip[Table OK]
        T1Check -->|No| T2[createBigQueryTable with schema]
        T2 --> T2Check{Created?}
        T2Check -->|No| T2Fail[ERROR: Exit]
        T2Check -->|Yes| T1Skip
        T1Skip --> CSVParse[CSV Parsing]
        CSVParse --> CSV1[Create CSVParser]
        CSV1 --> CSV2[Create BufferedReader]
        CSV2 --> CSV3[Create CSVReader]
        CSV3 --> CSV4[Initialize List RowToInsert]
        CSV4 --> CSV5[Read header row]
        CSV5 --> CSV5Check{Header exists?}
        CSV5Check -->|No| CSV5Fail[ERROR: Empty CSV]
        CSV5Check -->|Yes| CSV6[Skip header row]
        CSV6 --> CSV7[Loop: Read next row]
        CSV7 --> CSV8{More rows?}
        CSV8 -->|Yes| CSV9[Create Map]
        CSV9 --> CSV10[Map headers to values]
        CSV10 --> CSV11[Create RowToInsert]
        CSV11 --> CSV7
        CSV8 -->|No| WriteData[Write to BigQuery]
        WriteData --> W1[writeBigQueryTableRows]
        W1 --> W2[Create InsertAllRequest]
        W2 --> W3[Execute batch insert]
        W3 --> W4[Monitor job]
        W4 --> W4Check{Success?}
        W4Check -->|No| W4Fail[ERROR: Insert failed]
        W4Check -->|Yes| W5[Upload Complete]
        W5 --> Complete[Return]
    end
    Complete --> Archive[Call archiveTestStepsCSV]
    subgraph ArchiveFlow["archiveTestStepsCSV() Method"]
        Arch1[Start Archive] --> Arch2[Get config paths]
        Arch2 --> Arch3[Generate timestamp]
        Arch3 --> Arch4[Create archived filename]
        Arch4 --> Arch5[Check CSV exists]
        Arch5 --> Arch5Check{Exists?}
        Arch5Check -->|No| Arch5Fail[WARN: Not found]
        Arch5Check -->|Yes| Arch6[Move to archive]
        Arch6 --> Arch6Check{Success?}
        Arch6Check -->|No| Arch6Fail[ERROR: Move failed]
        Arch6Check -->|Yes| Arch7[Archive Complete]
        Arch7 --> Done[Return]
    end
    Archive --> End([Pipeline Complete])
    Skip --> End
    style Start fill:#e1f5ff
    style End fill:#e1ffe1
    style Main fill:#fff3e1
    style V1Fail fill:#ffcccc
    style V2Fail fill:#ffcccc
    style V3Fail fill:#ffcccc
    style V4Fail fill:#ffcccc
    style V5Fail fill:#ffcccc
    style V6Fail fill:#ffcccc
    style V7Fail fill:#ffcccc
    style F1Fail fill:#ffcccc
    style F2Fail fill:#ffcccc
    style F3Fail fill:#ffcccc
    style S1Fail fill:#ffcccc
    style A3Fail fill:#ffcccc
    style D2Fail fill:#ffcccc
    style T2Fail fill:#ffcccc
    style CSV5Fail fill:#ffcccc
    style W4Fail fill:#ffcccc

Configuration Requirements

Required Properties (testConfig.properties)

# Google Cloud Project Configuration
googleCloudProjectId=project-mw-automation-v01

# Service Account Credentials
googleCloudServiceAccountJsonFile=src/test/resources/.private/project-*.json

# BigQuery Dataset & Tables
googleCloudProjectDataset=dataset_automation_01
googleCloudProjectDatasetTableTestSteps=table_teststeps_02
googleCloudProjectDatasetTableTestStepsSchema=src/test/java/framework/data/FW_GoogleCloudBigQueryTableTestStepsSchema.json

# CSV Data Files
dataOutputPath=target/data/
dataTestStepsFilename=TestStepsData.csv
dataArchivePath=target/data/archive/

# Export Control
googleCloudBigQueryExport=true

# Performance Settings
googleCloudBigQueryTimeout=10
googleCloudBigQueryMaxAttempts=10

Required Files

File | Purpose | Location
CSV Data File | Test step results | target/data/TestStepsData.csv
Schema File | BigQuery table definition | src/test/java/framework/data/FW_GoogleCloudBigQueryTableTestStepsSchema.json
Service Account | GCP authentication | src/test/resources/.private/*.json
Properties File | Configuration | src/test/resources/testConfig.properties

Schema Structure

The BigQuery table schema includes 26 columns for comprehensive test step tracking:

Test Identification (11 columns)

  • clientRefID, projectRefID, applicationRefID - Reference IDs (STRING, REQUIRED)
  • testRunRefID, testSuiteRefID, testCaseRefID, testStepRefID - Test hierarchy IDs
  • testSuiteName, testCaseName - Display names (STRING, NULLABLE)
  • testCaseRepetition, testCaseRepetitions - Repetition tracking (INTEGER)

Test Execution (11 columns)

  • testStepTimestamp - Execution timestamp (TIMESTAMP, REQUIRED) [Partition Column]
  • testStepRefOrder - Step order, testStepPageObject - Page object, testStepMethod - Method
  • testStepResult - Pass/Fail, testStepResultDetails - Details
  • testStepResultMetricKey, testStepResultMetricNumber, testStepResultMetricString - Metrics

Assets & Performance (4 columns)

  • testStepScreenshotRef, testStepScreenshotThumbRef - Screenshot URLs
  • testStepDurationHR (seconds), testStepDurationMS (milliseconds) - Duration
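
For reference, a few of the columns above expressed as BigQuery field definitions in Java; the framework actually reads the full 26-column schema from the JSON file via FW_GoogleCloudUtils.readSchemaFromFile, so this excerpt (and any mode not stated above, such as NULLABLE on the duration columns) is illustrative only.

import com.google.cloud.bigquery.Field;
import com.google.cloud.bigquery.LegacySQLTypeName;
import com.google.cloud.bigquery.Schema;

public final class TestStepsSchemaExcerpt {

    public static Schema excerpt() {
        return Schema.of(
            Field.newBuilder("clientRefID", LegacySQLTypeName.STRING)
                 .setMode(Field.Mode.REQUIRED).build(),
            Field.newBuilder("testStepTimestamp", LegacySQLTypeName.TIMESTAMP)
                 .setMode(Field.Mode.REQUIRED).build(),      // partition column
            Field.newBuilder("testCaseName", LegacySQLTypeName.STRING)
                 .setMode(Field.Mode.NULLABLE).build(),
            Field.newBuilder("testStepDurationMS", LegacySQLTypeName.INTEGER)
                 .setMode(Field.Mode.NULLABLE).build()
        );
    }
}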

Error Handling

Comprehensive error handling covers 15+ scenarios, including configuration errors, file errors, authentication errors, dataset/table errors, parse errors, upload errors, and archive errors.

Related Components

  • FW_GoogleCloudBigQueryUploader - Main orchestration class
  • FW_GoogleCloudUtils - Reusable Google Cloud utility methods
  • FW_TestStepDataProcessor - Generates CSV file during test execution
  • FW_GoogleCloudStorageUploader - Uploads screenshots to Cloud Storage

Related Documentation

Document | Description
Data Collection Flow | How the CSV is generated
Test Assets Scanner | Test discovery process
Maven to Test Case | Complete test execution flow
Framework Overview | Architecture overview