Data Pipeline Modernization

Background:

Curt Larson, Principal Engineer at our organization, undertook a critical project for a large regional bank as a consultant specializing in software development and data engineering. The bank was grappling with legacy systems and needed to modernize its data processing infrastructure.

Challenges:

- Finding a well-maintained open-source library for working with EBCDIC data and COBOL copybook files.

- Handling edge cases and malformed records in the EBCDIC data files.

- Ensuring compatibility with existing mainframe processes.

Curt's Approach:

Curt collaborated closely with the bank's mainframe team to ensure that the new pipeline mirrored the functionality of the existing mainframe jobs. He implemented it with Cobrix, an open-source Apache Spark library for COBOL data sources, running on the bank's existing Spark cluster. Cobrix parses COBOL copybooks to derive a Spark schema, supports a range of EBCDIC record formats (fixed-length, variable-length, and multi-segment), and distributes the parsing across Spark executors so that very large files can be processed efficiently.
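To give a flavor of the approach, here is a minimal Scala sketch of the kind of read-and-validate step such a POC might perform. The "cobol" data source and the options shown are part of Cobrix's documented API; the copybook path, data path, and the ACCOUNT_ID field name are hypothetical placeholders, not details from the bank's actual pipeline.

    import org.apache.spark.sql.{DataFrame, SparkSession}

    object EbcdicLoadPoc {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("ebcdic-load-poc")
          .getOrCreate()

        // Read a fixed-length EBCDIC file; Cobrix parses the copybook
        // to derive the Spark schema. Both paths are placeholders.
        val accounts: DataFrame = spark.read
          .format("cobol")
          .option("copybook", "/data/copybooks/accounts.cpy")
          .option("encoding", "ebcdic") // Cobrix default, shown for clarity
          .option("schema_retention_policy", "collapse_root")
          .load("/data/ebcdic/accounts")

        // Edge-case handling: assuming undecodable values surface as
        // nulls, quarantine rows whose key field failed to decode.
        // ACCOUNT_ID is an illustrative field name.
        val quarantined = accounts.filter(accounts("ACCOUNT_ID").isNull)
        val clean       = accounts.filter(accounts("ACCOUNT_ID").isNotNull)

        println(s"loaded ${clean.count()} rows, quarantined ${quarantined.count()}")

        spark.stop()
      }
    }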

Results:

- Successful creation of a proof-of-concept (POC) pipeline loading EBCDIC files into Teradata; a sketch of the load step follows this list.

- Validated the feasibility of running EBCDIC ETL jobs on Linux.

- Close alignment with existing mainframe processes.

- The POC was a resounding success, paving the way for the bank to move forward with the modernization effort.
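For the Teradata load step, the following is a hedged sketch using Spark's generic JDBC writer. The hostname, database, table, and credential environment variables are placeholders; com.teradata.jdbc.TeraDriver is the standard Teradata JDBC driver class.

    import org.apache.spark.sql.{DataFrame, SaveMode}

    // Load the decoded DataFrame into Teradata over JDBC. All connection
    // details are placeholders; batch size tuning depends on the target.
    def loadToTeradata(df: DataFrame): Unit = {
      df.write
        .format("jdbc")
        .option("driver", "com.teradata.jdbc.TeraDriver")
        .option("url", "jdbc:teradata://td-host.example.com/DATABASE=STAGING")
        .option("dbtable", "STAGING.ACCOUNTS")
        .option("user", sys.env("TD_USER"))
        .option("password", sys.env("TD_PASSWORD"))
        .option("batchsize", "10000")
        .mode(SaveMode.Append)
        .save()
    }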

Conclusion:

Curt Larson's expertise in software development and data engineering proved invaluable to the large regional bank. By delivering a working EBCDIC-to-Teradata pipeline that closely mirrored the existing mainframe processes, he demonstrated strong problem-solving skills, technical depth, and a commitment to the bank's operational efficiency. The bank is now well-positioned to move forward with its modernization effort with confidence, thanks to Curt's work.