BigQuery Upload Process
Complete post-test data pipeline from CSV to Google BigQuery
Introduction
This diagram documents the complete data pipeline that transfers test execution results from local CSV files to Google BigQuery for long-term storage, analytics, and reporting. The pipeline is automatically triggered by Maven's post-integration-test phase after all tests complete.
Pipeline Stages
- Maven trigger and configuration validation
- Google Cloud authentication
- Dataset and table validation/creation
- CSV parsing and data collection (OpenCSV)
- Batch insertion to BigQuery
- CSV file archiving
Key Process Steps
1. Maven Trigger
Phase: post-integration-test
Execution: Automatically triggered after test completion
Entry Point: FW_GoogleCloudBigQueryUploader.main()
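The entry point itself is not reproduced in this document; the sketch below shows the control flow implied by the flow diagram further down (check the googleCloudBigQueryExport flag, upload, then archive). Reading the flag directly from testConfig.properties and the stubbed method bodies are assumptions for illustration, not the framework's actual implementation.

```java
import java.io.FileInputStream;
import java.util.Properties;

// Simplified sketch of the entry-point control flow; the real logic lives in
// FW_GoogleCloudBigQueryUploader and its helper classes.
public class UploaderEntrySketch {

    public static void main(String[] args) throws Exception {
        Properties config = new Properties();
        try (FileInputStream in =
                new FileInputStream("src/test/resources/testConfig.properties")) {
            config.load(in);
        }

        // Skip the whole export when googleCloudBigQueryExport is not true.
        if (!Boolean.parseBoolean(config.getProperty("googleCloudBigQueryExport"))) {
            System.out.println("BigQuery export disabled - skipping upload");
            return;
        }

        uploadTestStepsCSV();   // validate config, authenticate, parse CSV, insert rows
        archiveTestStepsCSV();  // move the processed CSV into the archive folder
    }

    private static void uploadTestStepsCSV()  { /* documented in steps 2-6 below */ }

    private static void archiveTestStepsCSV() { /* documented in step 7 below */ }
}
```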
2. Configuration Validation
Validates all required properties from testConfig.properties: the Google Cloud project ID, service account credentials file, dataset and table names, schema file location, and CSV file paths.
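As a sketch of this fail-fast validation, the property keys below are the ones listed under Configuration Requirements; reading them with java.util.Properties is an assumption about how the framework accesses its configuration.

```java
import java.io.FileInputStream;
import java.util.Properties;

// Fail-fast property lookup: each required key is fetched and the process
// exits with an error if it is missing or blank, mirroring the V1..V7
// validation steps in the flow diagram.
public class ConfigValidationSketch {

    public static void main(String[] args) throws Exception {
        Properties config = new Properties();
        try (FileInputStream in =
                new FileInputStream("src/test/resources/testConfig.properties")) {
            config.load(in);
        }

        String[] requiredKeys = {
            "googleCloudProjectId",
            "googleCloudServiceAccountJsonFile",
            "googleCloudProjectDataset",
            "googleCloudProjectDatasetTableTestSteps",
            "googleCloudProjectDatasetTableTestStepsSchema",
            "dataOutputPath",
            "dataTestStepsFilename"
        };

        for (String key : requiredKeys) {
            String value = config.getProperty(key);
            if (value == null || value.isBlank()) {
                System.err.println("ERROR: missing required property: " + key);
                System.exit(1);   // any missing property aborts the upload
            }
        }
    }
}
```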
3. Authentication
Loads the service account JSON credentials, creates an authenticated BigQuery client, and verifies client connectivity.
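A minimal sketch of that authentication step using the standard google-cloud-bigquery client API; the class and method names here come from the public client library, not from the framework, and the connectivity check shown is one plausible way to verify the client.

```java
import com.google.auth.oauth2.GoogleCredentials;
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import java.io.FileInputStream;

// Builds an authenticated BigQuery client from the service account JSON file
// referenced by googleCloudServiceAccountJsonFile.
public class BigQueryAuthSketch {

    static BigQuery createClient(String projectId, String serviceAccountJsonPath)
            throws Exception {
        GoogleCredentials credentials;
        try (FileInputStream keyStream = new FileInputStream(serviceAccountJsonPath)) {
            credentials = GoogleCredentials.fromStream(keyStream);
        }

        BigQuery bigQuery = BigQueryOptions.newBuilder()
            .setProjectId(projectId)
            .setCredentials(credentials)
            .build()
            .getService();

        // A lightweight call such as listing datasets confirms connectivity.
        bigQuery.listDatasets(projectId);
        return bigQuery;
    }
}
```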
4. Dataset & Table Validation
Checks whether the dataset exists (creating it if missing), checks whether the table exists (creating it with the schema if missing), and validates the table schema against the JSON definition.
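The create-if-missing branches map naturally onto the client API. A sketch, assuming the framework's checkBigQueryDatasetExist/checkBigQueryTableExist and create helpers wrap calls like these:

```java
import com.google.cloud.bigquery.*;

// Create-if-missing logic for the dataset and table, following the
// D1/D2 and T1/T2 branches of the flow diagram.
public class DatasetTableSketch {

    static void ensureDatasetAndTable(BigQuery bigQuery, String datasetName,
                                      String tableName, Schema schema) {
        // Dataset: check, then create if absent.
        if (bigQuery.getDataset(DatasetId.of(datasetName)) == null) {
            bigQuery.create(DatasetInfo.newBuilder(datasetName).build());
        }

        // Table: check, then create with the JSON-derived schema if absent.
        TableId tableId = TableId.of(datasetName, tableName);
        if (bigQuery.getTable(tableId) == null) {
            TableDefinition definition = StandardTableDefinition.of(schema);
            bigQuery.create(TableInfo.newBuilder(tableId, definition).build());
        }
    }
}
```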
5. CSV Processing
Parses the CSV with the OpenCSV library, reads the header row (not inserted as data, since the columns are already defined in the schema), collects the data rows into a List&lt;RowToInsert&gt;, and maps column names to values.
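A sketch of that parsing loop with OpenCSV, assuming the CSV header names match the BigQuery column names so they can be used directly as map keys:

```java
import com.google.cloud.bigquery.InsertAllRequest.RowToInsert;
import com.opencsv.CSVParser;
import com.opencsv.CSVParserBuilder;
import com.opencsv.CSVReader;
import com.opencsv.CSVReaderBuilder;
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Mirrors the CSVParser -> BufferedReader -> CSVReader chain from the diagram,
// then maps each data row to the header names and wraps it as a RowToInsert.
public class CsvParseSketch {

    static List<RowToInsert> readRows(String csvPath) throws Exception {
        CSVParser parser = new CSVParserBuilder().build();
        List<RowToInsert> rows = new ArrayList<>();

        try (CSVReader reader = new CSVReaderBuilder(
                new BufferedReader(new FileReader(csvPath)))
                .withCSVParser(parser).build()) {

            String[] header = reader.readNext();
            if (header == null) {
                throw new IllegalStateException("ERROR: Empty CSV: " + csvPath);
            }

            String[] values;
            while ((values = reader.readNext()) != null) {
                Map<String, Object> content = new HashMap<>();
                for (int i = 0; i < header.length && i < values.length; i++) {
                    content.put(header[i], values[i]);  // column name -> cell value
                }
                rows.add(RowToInsert.of(content));
            }
        }
        return rows;
    }
}
```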
6. Data Upload
Performs batch insertion via FW_GoogleCloudUtils.writeBigQueryTableRows(), monitors job completion, and handles timeout and retry logic.
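A sketch of a batch insert with bounded retries, assuming writeBigQueryTableRows() wraps the streaming insertAll API (as the InsertAllRequest step in the diagram suggests) and that googleCloudBigQueryTimeout and googleCloudBigQueryMaxAttempts control the pause and attempt count; the actual framework method may differ.

```java
import com.google.cloud.bigquery.*;
import com.google.cloud.bigquery.InsertAllRequest.RowToInsert;
import java.util.List;

// Batch insert via the streaming insertAll API, retried up to maxAttempts
// times with a pause between attempts.
public class BatchInsertSketch {

    static void insertRows(BigQuery bigQuery, TableId tableId,
                           List<RowToInsert> rows, int maxAttempts,
                           long timeoutSeconds) throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            InsertAllResponse response = bigQuery.insertAll(
                InsertAllRequest.newBuilder(tableId).setRows(rows).build());

            if (!response.hasErrors()) {
                return;  // all rows accepted
            }
            System.err.println("Insert attempt " + attempt + " failed: "
                + response.getInsertErrors());
            Thread.sleep(timeoutSeconds * 1000);  // wait before retrying
        }
        throw new IllegalStateException("ERROR: Insert failed after "
            + maxAttempts + " attempts");
    }
}
```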
7. File Archiving
Moves the processed CSV to the archive folder, appends a timestamp to the filename, and confirms the archive succeeded.
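A sketch of the archive step using java.nio; the exact timestamp format is an assumption.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.text.SimpleDateFormat;
import java.util.Date;

// Moves the processed CSV into the archive folder with a timestamp appended
// to the filename, mirroring archiveTestStepsCSV().
public class ArchiveSketch {

    static void archiveCsv(String dataOutputPath, String dataTestStepsFilename,
                           String dataArchivePath) throws Exception {
        Path source = Paths.get(dataOutputPath, dataTestStepsFilename);
        if (!Files.exists(source)) {
            System.err.println("WARN: CSV not found, nothing to archive: " + source);
            return;
        }

        // e.g. TestStepsData.csv -> TestStepsData_20240131-142501.csv
        String timestamp = new SimpleDateFormat("yyyyMMdd-HHmmss").format(new Date());
        String archivedName =
            dataTestStepsFilename.replace(".csv", "_" + timestamp + ".csv");

        Path target = Paths.get(dataArchivePath, archivedName);
        Files.createDirectories(target.getParent());
        Files.move(source, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
```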
Complete Flow Diagram
flowchart TD
Start([Maven Build<br/>post-integration-test]) --> CheckFlag{googleCloudBigQueryExport<br/>= true?}
CheckFlag -->|No| Skip[Skip Export]
CheckFlag -->|Yes| Main[FW_GoogleCloudBigQueryUploader<br/>main method]
Main --> Upload[Call uploadTestStepsCSV]
subgraph UploadFlow["uploadTestStepsCSV() Method"]
U1[Start Upload Process]
U1 --> V1[Get googleCloudProjectId]
V1 --> V1Check{Valid?}
V1Check -->|No| V1Fail[ERROR: Exit]
V1Check -->|Yes| V2[Get googleCloudServiceAccountJsonFile]
V2 --> V2Check{Valid?}
V2Check -->|No| V2Fail[ERROR: Exit]
V2Check -->|Yes| V3[Get googleCloudProjectDataset]
V3 --> V3Check{Valid?}
V3Check -->|No| V3Fail[ERROR: Exit]
V3Check -->|Yes| V4[Get googleCloudProjectDatasetTableTestSteps]
V4 --> V4Check{Valid?}
V4Check -->|No| V4Fail[ERROR: Exit]
V4Check -->|Yes| V5[Get googleCloudProjectDatasetTableTestStepsSchema]
V5 --> V5Check{Valid?}
V5Check -->|No| V5Fail[ERROR: Exit]
V5Check -->|Yes| V6[Get dataOutputPath]
V6 --> V6Check{Valid?}
V6Check -->|No| V6Fail[ERROR: Exit]
V6Check -->|Yes| V7[Get dataTestStepsFilename]
V7 --> V7Check{Valid?}
V7Check -->|No| V7Fail[ERROR: Exit]
V7Check -->|Yes| FileCheck[File Validation]
FileCheck --> F1[Check CSV file exists]
F1 --> F1Check{Exists?}
F1Check -->|No| F1Fail[ERROR: Exit]
F1Check -->|Yes| F2[Check schema file exists]
F2 --> F2Check{Exists?}
F2Check -->|No| F2Fail[ERROR: Exit]
F2Check -->|Yes| F3[Check service account file exists]
F3 --> F3Check{Exists?}
F3Check -->|No| F3Fail[ERROR: Exit]
F3Check -->|Yes| ReadSchema[Read Schema]
ReadSchema --> S1[FW_GoogleCloudUtils.readSchemaFromFile]
S1 --> S1Check{Success?}
S1Check -->|No| S1Fail[ERROR: Exit]
S1Check -->|Yes| AuthStart[Authentication]
AuthStart --> A1[Load service account credentials]
A1 --> A2[Create GoogleCredentials]
A2 --> A3[Create BigQuery client]
A3 --> A3Check{Success?}
A3Check -->|No| A3Fail[ERROR: Exit]
A3Check -->|Yes| DatasetCheck[Dataset Validation]
DatasetCheck --> D1[checkBigQueryDatasetExist]
D1 --> D1Check{Dataset exists?}
D1Check -->|Yes| D1Skip[Dataset OK]
D1Check -->|No| D2[createBigQueryDataset]
D2 --> D2Check{Created?}
D2Check -->|No| D2Fail[ERROR: Exit]
D2Check -->|Yes| D1Skip
D1Skip --> TableCheck[Table Validation]
TableCheck --> T1[checkBigQueryTableExist]
T1 --> T1Check{Table exists?}
T1Check -->|Yes| T1Skip[Table OK]
T1Check -->|No| T2[createBigQueryTable with schema]
T2 --> T2Check{Created?}
T2Check -->|No| T2Fail[ERROR: Exit]
T2Check -->|Yes| T1Skip
T1Skip --> CSVParse[CSV Parsing]
CSVParse --> CSV1[Create CSVParser]
CSV1 --> CSV2[Create BufferedReader]
CSV2 --> CSV3[Create CSVReader]
CSV3 --> CSV4[Initialize List RowToInsert]
CSV4 --> CSV5[Read header row]
CSV5 --> CSV5Check{Header exists?}
CSV5Check -->|No| CSV5Fail[ERROR: Empty CSV]
CSV5Check -->|Yes| CSV6[Skip header row]
CSV6 --> CSV7[Loop: Read next row]
CSV7 --> CSV8{More rows?}
CSV8 -->|Yes| CSV9[Create Map]
CSV9 --> CSV10[Map headers to values]
CSV10 --> CSV11[Create RowToInsert]
CSV11 --> CSV7
CSV8 -->|No| WriteData[Write to BigQuery]
WriteData --> W1[writeBigQueryTableRows]
W1 --> W2[Create InsertAllRequest]
W2 --> W3[Execute batch insert]
W3 --> W4[Monitor job]
W4 --> W4Check{Success?}
W4Check -->|No| W4Fail[ERROR: Insert failed]
W4Check -->|Yes| W5[Upload Complete]
W5 --> Complete[Return]
end
Complete --> Archive[Call archiveTestStepsCSV]
subgraph ArchiveFlow["archiveTestStepsCSV() Method"]
Arch1[Start Archive] --> Arch2[Get config paths]
Arch2 --> Arch3[Generate timestamp]
Arch3 --> Arch4[Create archived filename]
Arch4 --> Arch5[Check CSV exists]
Arch5 --> Arch5Check{Exists?}
Arch5Check -->|No| Arch5Fail[WARN: Not found]
Arch5Check -->|Yes| Arch6[Move to archive]
Arch6 --> Arch6Check{Success?}
Arch6Check -->|No| Arch6Fail[ERROR: Move failed]
Arch6Check -->|Yes| Arch7[Archive Complete]
Arch7 --> Done[Return]
end
Archive --> End([Pipeline Complete])
Skip --> End
style Start fill:#e1f5ff
style End fill:#e1ffe1
style Main fill:#fff3e1
style V1Fail fill:#ffcccc
style V2Fail fill:#ffcccc
style V3Fail fill:#ffcccc
style V4Fail fill:#ffcccc
style V5Fail fill:#ffcccc
style V6Fail fill:#ffcccc
style V7Fail fill:#ffcccc
style F1Fail fill:#ffcccc
style F2Fail fill:#ffcccc
style F3Fail fill:#ffcccc
style S1Fail fill:#ffcccc
style A3Fail fill:#ffcccc
style D2Fail fill:#ffcccc
style T2Fail fill:#ffcccc
style CSV5Fail fill:#ffcccc
style W4Fail fill:#ffcccc
Configuration Requirements
Required Properties (testConfig.properties)
# Google Cloud Project Configuration
googleCloudProjectId=project-mw-automation-v01
# Service Account Credentials
googleCloudServiceAccountJsonFile=src/test/resources/.private/project-*.json
# BigQuery Dataset & Tables
googleCloudProjectDataset=dataset_automation_01
googleCloudProjectDatasetTableTestSteps=table_teststeps_02
googleCloudProjectDatasetTableTestStepsSchema=src/test/java/framework/data/FW_GoogleCloudBigQueryTableTestStepsSchema.json
# CSV Data Files
dataOutputPath=target/data/
dataTestStepsFilename=TestStepsData.csv
dataArchivePath=target/data/archive/
# Export Control
googleCloudBigQueryExport=true
# Performance Settings
googleCloudBigQueryTimeout=10
googleCloudBigQueryMaxAttempts=10
Required Files
| File | Purpose | Location |
|---|---|---|
| CSV Data File | Test step results | target/data/TestStepsData.csv |
| Schema File | BigQuery table definition | src/test/java/framework/data/FW_GoogleCloudBigQueryTableTestStepsSchema.json |
| Service Account | GCP authentication | src/test/resources/.private/*.json |
| Properties File | Configuration | src/test/resources/testConfig.properties |
Schema Structure
The BigQuery table schema includes 26 columns for comprehensive test step tracking:
Test Identification (11 columns)
- clientRefID, projectRefID, applicationRefID - Reference IDs (STRING, REQUIRED)
- testRunRefID, testSuiteRefID, testCaseRefID, testStepRefID - Test hierarchy IDs
- testSuiteName, testCaseName - Display names (STRING, NULLABLE)
- testCaseRepetition, testCaseRepetitions - Repetition tracking (INTEGER)
Test Execution (11 columns)
- testStepTimestamp - Execution timestamp (TIMESTAMP, REQUIRED) [Partition Column]
- testStepRefOrder - Step order
- testStepPageObject - Page object
- testStepMethod - Method
- testStepResult - Pass/Fail
- testStepResultDetails - Details
- testStepResultMetricKey, testStepResultMetricNumber, testStepResultMetricString - Metrics
Assets & Performance (4 columns)
- testStepScreenshotRef, testStepScreenshotThumbRef - Screenshot URLs
- testStepDurationHR (seconds), testStepDurationMS (milliseconds) - Duration
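For illustration, a few of these columns expressed with the client library's Schema and Field builders, with day partitioning on testStepTimestamp. The authoritative definition is the JSON schema file; this partial sketch is not a substitute for it.

```java
import com.google.cloud.bigquery.*;

// A handful of the columns above as BigQuery Field definitions, plus a
// table definition partitioned by day on testStepTimestamp.
public class SchemaSketch {

    static StandardTableDefinition buildDefinition() {
        Schema partialSchema = Schema.of(
            Field.newBuilder("clientRefID", StandardSQLTypeName.STRING)
                 .setMode(Field.Mode.REQUIRED).build(),
            Field.newBuilder("testSuiteName", StandardSQLTypeName.STRING)
                 .setMode(Field.Mode.NULLABLE).build(),
            Field.newBuilder("testCaseRepetition", StandardSQLTypeName.INT64)
                 .setMode(Field.Mode.NULLABLE).build(),
            Field.newBuilder("testStepTimestamp", StandardSQLTypeName.TIMESTAMP)
                 .setMode(Field.Mode.REQUIRED).build());

        return StandardTableDefinition.newBuilder()
            .setSchema(partialSchema)
            .setTimePartitioning(
                TimePartitioning.newBuilder(TimePartitioning.Type.DAY)
                    .setField("testStepTimestamp").build())
            .build();
    }
}
```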
Error Handling
Comprehensive error handling covers 15+ scenarios, including configuration errors, file errors, authentication errors, dataset/table errors, parse errors, upload errors, and archive errors.
Related Components
- FW_GoogleCloudBigQueryUploader - Main orchestration class
- FW_GoogleCloudUtils - Reusable Google Cloud utility methods
- FW_TestStepDataProcessor - Generates CSV file during test execution
- FW_GoogleCloudStorageUploader - Uploads screenshots to Cloud Storage
Related Documentation
| Document | Description |
|---|---|
| Data Collection Flow | How CSV is generated |
| Test Assets Scanner | Test discovery process |
| Maven to Test Case | Complete test execution flow |
| Framework Overview | Architecture overview |