Portfolio
ML Utility Library
GitHub Repository: https://github.com/FutureAdLabs/python-mlops/tree/alpha
Introduction
This Python-based utility project was developed to provide the data science team with a comprehensive set of tools to streamline their workflows and enhance productivity. The goal was to create a library that offered custom tools for preparing environments, managing dependencies, and integrating with various external services, including common databases like MongoDB and MySQL. This utility library addresses common tasks and challenges faced by the team, ensuring consistency and efficiency across different projects.
Submodules
- AWS
Tools for managing AWS resources, including EC2, S3, and EMR clusters, facilitating the team's ability to leverage cloud infrastructure for data analysis, storage, and deployment tasks.
- Conda
Scripts for managing Conda environments, ensuring consistent and reproducible setups across projects by handling dependencies efficiently.
- Git
Tools to automate common Git tasks and maintain version control best practices, streamlining the development workflow.
- Google
Integrations with Google APIs such as Google Sheets, automating data retrieval and enabling seamless collaboration across Google services.
- LLM (Large Language Models)
Integrations with language models (e.g., OpenAI), enabling advanced language processing tasks such as natural language understanding and text generation.
- Local Environment
Tools for managing local development environments, mirroring production setups to streamline development.
- Native Utilities
Functions for common tasks like date-time operations, file handling, and data parsing, ensuring code reusability and reliability.
- Slack
Functions for interacting with Slack, enabling communication and notifications directly within workflows.
- Database
Interfaces for MongoDB and MySQL to streamline common database operations like queries, transactions, and data migrations, enhancing data handling efficiency (a usage sketch follows this list).
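Below is a minimal sketch of the kind of helper these submodules wrap, built directly on the public slack_sdk and pymongo clients. The function names, database, collection, and channel are illustrative assumptions, not the library's actual API.

```python
import os
from pymongo import MongoClient
from slack_sdk import WebClient

def notify_slack(channel: str, text: str) -> None:
    """Post a message to Slack; the bot token comes from the environment."""
    WebClient(token=os.environ["SLACK_BOT_TOKEN"]).chat_postMessage(
        channel=channel, text=text
    )

def fetch_recent_runs(mongo_uri: str, limit: int = 10) -> list[dict]:
    """Pull the latest pipeline runs from MongoDB, newest first."""
    coll = MongoClient(mongo_uri)["mlops"]["runs"]  # db/collection names are illustrative
    return list(coll.find().sort("started_at", -1).limit(limit))

if __name__ == "__main__":
    runs = fetch_recent_runs(os.environ["MONGO_URI"])
    notify_slack("#data-science", f"Nightly check: {len(runs)} recent runs found.")
```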
Outcome
This project has proven to be a valuable asset, streamlining Python workflows, reducing setup times, and enabling seamless integration with various services. By following best practices and integrating industry-standard tools, I created a robust and scalable utility library that meets the diverse needs of our data science projects. This library promotes collaboration and consistency across the team, leading to more efficient project delivery.
Real Time Bidding Algorithm
GitHub Repository: https://github.com/FutureAdLabs/ttd-bid-algo
Introduction
The objective of this project was to build a machine learning algorithm using an innovative statistical approach to score advertisement spaces probabilistically and translate these scores into bid values. Factors such as audience appropriateness, client budget, market trends, and ad space performance were considered. The system facilitates real-time bidding for ad inventory, directly improving financial efficiency in programmatic advertising.
Data Engineering
The data engineering phase involved collecting large datasets from a demand-side platform using Python, SQL, and PySpark. While Snowflake was evaluated, we opted for AWS Athena for its robustness in handling large-scale query workloads. Reliable data engineering was critical for building accurate models.
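As a flavor of this phase, here is a minimal PySpark sketch of the kind of per-ad-space aggregation involved; the S3 paths and column names are assumptions, not the production schema.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("bid-feature-build").getOrCreate()

# Raw auction logs exported from the demand-side platform (path is illustrative).
logs = spark.read.parquet("s3://dsp-exports/auction-logs/")

# Aggregate per ad space: win rate, average clearing cost, and total conversions.
features = (
    logs.groupBy("ad_space_id")
        .agg(
            F.avg(F.col("won").cast("double")).alias("win_rate"),
            F.avg("clearing_price").alias("avg_cost"),
            F.sum("conversions").alias("conversions"),
        )
)

features.write.mode("overwrite").parquet("s3://dsp-exports/features/ad-space/")
```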
Machine Learning Model Build
Models were built based on probability distributions for success and cost, focusing on Return on Ad Spend (ROAS). The models dynamically identified appropriate bid values while adjusting to market conditions, ensuring financial optimization of ad campaigns.
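The production model itself is proprietary, but the core idea can be sketched: model the success probability with a Beta posterior over observed outcomes, then set the bid so that expected revenue over spend meets a target ROAS. The prior, values, and numbers below are all illustrative.

```python
from scipy import stats

def bid_for_ad_space(successes: int, trials: int,
                     value_per_success: float, target_roas: float) -> float:
    """Probabilistic bid: expected value per impression divided by the ROAS target.

    Success probability uses a Beta(1, 1) prior updated with observed outcomes.
    """
    posterior = stats.beta(1 + successes, 1 + trials - successes)
    p_success = posterior.mean()
    expected_value = p_success * value_per_success   # expected revenue per impression
    return expected_value / target_roas              # bid so revenue / spend ≈ target

# Example: 30 conversions in 10,000 impressions, $5 per conversion, 3x ROAS target.
print(round(bid_for_ad_space(30, 10_000, 5.0, 3.0), 5))
```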
MLOps
MLOps practices were employed to validate models, version them, and monitor their performance using MLFlow. Each module generated artifacts to ensure robust and reliable deployments, which were essential for managing significant advertising budgets effectively.
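A minimal sketch of the MLflow tracking pattern this describes; the experiment name, parameter, metric, and artifact path are illustrative.

```python
import mlflow

# Track a training run so every model version carries its metrics and artifacts.
mlflow.set_experiment("ttd-bid-algo")  # experiment name is illustrative

with mlflow.start_run():
    mlflow.log_param("target_roas", 3.0)
    mlflow.log_metric("validation_roas", 3.4)
    mlflow.log_artifact("bid_model.pkl")  # serialized model produced by training
```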
Outcome
This project became the core component of Lorenzo, Adludio’s AI-driven platform (Lorenzo AI). It delivers highly targeted, high-performance advertising at scale, and continues to evolve with new business needs, driving successful programmatic advertising campaigns globally.
CI/CD
GitHub Repository: https://github.com/FutureAdLabs/data-science-cicd/tree/alpha
Introduction
This CI/CD project became my focus when I was promoted to Senior Engineer, as the company began to grow. Recognizing the need for improved collaboration and efficiency, I led the team toward adopting industry-standard coding practices. I then integrated these practices with segregated environment setups and a seamless deployment pipeline to production.
The primary goal was to establish proper packaging and versioning for in-house libraries. Each versioned project was deployed into AWS using CloudFormation. This custom design enabled every version to be managed through CloudFormation templates, making integration with peripheral dependencies like AWS Lambda and AWS ECR seamless.
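A minimal boto3 sketch of this per-version create-or-update pattern; the stack naming convention shown here is an assumption, not the exact template design.

```python
import boto3

def deploy_stack(project: str, version: str, template_body: str) -> None:
    """Create or update a per-version CloudFormation stack (naming is illustrative)."""
    cfn = boto3.client("cloudformation")
    stack_name = f"{project}-{version.replace('.', '-')}"  # e.g. bid-algo-1-4-2
    try:
        cfn.create_stack(
            StackName=stack_name,
            TemplateBody=template_body,
            Capabilities=["CAPABILITY_NAMED_IAM"],
        )
    except cfn.exceptions.AlreadyExistsException:
        # Stack for this version already exists, so apply the template as an update.
        cfn.update_stack(
            StackName=stack_name,
            TemplateBody=template_body,
            Capabilities=["CAPABILITY_NAMED_IAM"],
        )
```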
Package Management
Each project version was meticulously tagged using semantic versioning and stored in a private package management server, ensuring that sensitive data and dependencies were secured. This was a key part of the CI/CD process, maintaining security and integrity across all deployments.
Environment Setup
Each project had its own isolated environment, complete with resources like EC2 instances and AWS EMR clusters. Infrastructure, including databases and other dependencies, was automatically built upon code deployment, allowing projects to scale independently based on their specific requirements.
Testing and Deployment
The CI/CD pipeline was built on CircleCI, automating testing and deployment:
- **Automated Testing**
I integrated continuous testing using tools like PyTest, ensuring that code quality was maintained and bugs were caught early (a sketch follows this list).
- **Linting and Code Quality Checks**
Tools like Coala were used to enforce coding standards and ensure high-quality code.
- **Automated Deployment**
Deployment was automated using AWS services and CloudFormation, ensuring consistency and reliability in applying infrastructure changes.
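To give a flavor of the automated testing stage, here is a self-contained pytest example; the version-bump helper under test is hypothetical, not part of the actual codebase.

```python
# test_versioning.py -- run with `pytest`; the helper under test is illustrative.
import pytest

def bump_patch(version: str) -> str:
    """Increment the patch component of a semantic version string."""
    major, minor, patch = version.split(".")
    return f"{major}.{minor}.{int(patch) + 1}"

def test_bump_patch():
    assert bump_patch("1.4.2") == "1.4.3"

def test_bump_patch_rejects_malformed_version():
    with pytest.raises(ValueError):
        bump_patch("1.4")  # not enough components to unpack
```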
Outcome
This project became the backbone of our CI/CD operations and significantly improved deployment efficiency and production stability. The pipeline is now a benchmark for the team, facilitating cross-team collaboration and improving our overall development processes. By integrating industry-standard tools and practices, I built a robust and scalable CI/CD pipeline that meets the needs of our growing company.
TTD Orchestrator
GitHub Repository: https://github.com/FutureAdLabs/python-ttd-orchestrator
Introduction
This TTD campaign orchestration library was developed to automate the process of managing campaigns on The Trade Desk (TTD) platform. Initially, a dedicated team handled these tasks manually; the goal of this project was to identify every manual workflow and automate it reliably.
Around this time, Large Language Models (LLMs) became available and were incorporated into the design to handle the human component of these tasks, enhancing the automation process. The primary objective was to give the data science and marketing teams powerful tools to create, monitor, and optimize ad campaigns efficiently, without human intervention.
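A minimal sketch of how an LLM can stand in for a human decision step, using the OpenAI Python client; the model name, prompt, and action vocabulary are illustrative assumptions, not the production design.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def recommend_action(campaign_summary: str) -> str:
    """Ask an LLM for the next campaign action; prompt and model are illustrative."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "You are a campaign manager. Reply with one of: "
                        "RAISE_BIDS, LOWER_BIDS, PAUSE_AD_GROUP, NO_CHANGE."},
            {"role": "user", "content": campaign_summary},
        ],
    )
    return response.choices[0].message.content.strip()

print(recommend_action("Ad group A: pacing 40% behind plan, CPA 20% below target."))
```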
Submodules
- Monitor
Uses BigQuery to collect large amounts of data periodically and focuses on monitoring and analyzing the performance of both ad groups and campaigns (see the sketch after this list). Key components include:
- Data Analysis: Reviews ad space performance, providing insights into which ad spaces yield the best campaign results.
- Pacing Strategies: Ensures that pacing strategies align with campaign goals without overspending, helping campaigns stay within budget while achieving their objectives.
- Planner
Responsible for planning future campaign strategies. This submodule includes:
- Blocklist Strategies: Develops blocklist strategies at the ad group level to exclude low-performing or harmful ad spaces.
- Bidding Strategies: Designs optimized bidding strategies based on historical data and projected performance to maximize return on ad spend (ROAS).
- Pacing Strategies: Plans pacing strategies to ensure steady and effective use of the budget throughout the campaign period.
- Executor
Manages strategy execution at both ad group and campaign levels, ensuring that all planned actions are carried out accurately and effectively.
- Strategy Implementation: Executes the planned strategies on The Trade Desk platform.
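To make the Monitor's role concrete, here is a minimal BigQuery sketch of a pacing check; the dataset, table, column names, and budget constant are illustrative assumptions.

```python
from google.cloud import bigquery

DAILY_BUDGET = 500.0  # illustrative per-ad-group budget; would come from campaign config

client = bigquery.Client()

# Daily spend per ad group; dataset, table, and column names are illustrative.
query = """
    SELECT ad_group_id, SUM(spend) AS spend
    FROM `ttd_reports.daily_performance`
    WHERE report_date = CURRENT_DATE()
    GROUP BY ad_group_id
"""

for row in client.query(query).result():
    if row.spend > DAILY_BUDGET:
        print(f"Ad group {row.ad_group_id} is over budget: {row.spend:.2f}")
```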
Technology Stack
- Large Language Models (LLMs): Automate human-like decision-making processes.
- BigQuery: Used for extensive data analysis and performance monitoring.
- MongoDB: Handles real-time data management and campaign health monitoring.
- MySQL: Utilized for structured data handling in pre-release environments.
- The Trade Desk Platform: The primary platform targeted by the orchestration library.
This comprehensive suite of tools ensures that every aspect of campaign orchestration, from planning to execution, is automated with a high degree of precision and efficiency.
Outcome
The development of this TTD campaign orchestration library has dramatically improved the efficiency and effectiveness of managing ad campaigns on The Trade Desk platform. By automating campaign monitoring, planning, and execution, the library reduces manual effort and enables data-driven decision-making. This initiative also led to the creation of AdSapiens (https://www.adsapiens.com/about).
The project has proven to be a valuable asset, streamlining workflows, eliminating human intervention, and empowering strategic planning and optimization. By incorporating LLMs and automating campaign management processes, I created a robust and scalable orchestration library that meets the diverse needs of our marketing and data science teams, leading to more efficient and successful ad campaigns.