About

Summary

I’m a Chief Technologist. 25 years in tech. I build products and I build teams that ship continuously.

What I Do

Over my 25 years in tech I have been a Developer, a Manager, a Director, and a CTO. I am a Chief Technologist now (without an O), so I create.

I envision, design, and build products and services. I build, run, and grow teams that I enjoy building things with.

I continuously learn new things. I was a very good Java dev and a very good C# dev. Now I am a very good TypeScript dev, a very good AWS serverless architect, and a very good AI engineer. I am sure I can become a very good Python dev, a very good Rust dev, or a very good <Name Your Tech> specialist if a problem that excites me is best solved with a technology I have not yet mastered.

How I Think

I started a YouTube channel - Coding with AI - where I record how I use AI in my day-to-day work. I read how other people do it too. When I read articles like this one, where developers say that “…we handed these responsibilities to specialists - Product Owners, Architects, Quality Engineers, Platform Engineers…”, I think, “yeah, but why so much specialization in engineering?” After 20 years in services, I have been building products for the last 5. All my teams are engineers. We do not really role-play when it comes to engineering. I have product owners and designers and even a few delivery managers, but when it comes to technology, we are all engineers.

I make technology choices that start not with what’s best in the abstract, but with what will allow me and my teams to work in the most efficient way. My teams deliver continuously. Every feature branch gets an integrated environment auto-provisioned on CI for early integration and testing. All products, platforms, and services that I have built with my teams in the last 5 years run on AWS-native serverless - CloudFront, API Gateway, Lambda and Lambda@Edge, Step Functions, SQS, SNS, S3, EventBridge, DynamoDB (single-table design), OpenSearch, Athena, and Fargate (for long-running workloads).
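
To make the auto-provisioning concrete, here is a minimal sketch of how a per-branch environment could be wired up with CDK; the BRANCH_NAME variable and the stack name are illustrative assumptions, not our actual setup.

```typescript
// Minimal sketch (not the actual setup): derive an isolated stack per feature
// branch so CI can provision and tear down integrated environments on demand.
// BRANCH_NAME and the stack/construct names are illustrative assumptions.
import { App, Stack, StackProps } from 'aws-cdk-lib';
import { Construct } from 'constructs';

const branch = (process.env.BRANCH_NAME ?? 'main').replace(/[^a-zA-Z0-9-]/g, '-');

class ServiceStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);
    // API Gateway + Lambda + DynamoDB resources would be defined here,
    // all namespaced by the branch so environments never collide.
  }
}

const app = new App();
new ServiceStack(app, `my-service-${branch}`);
```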

Everything is as-code and in the same language. Our UIs are all React (TypeScript), our APIs are all Node.js (TypeScript), and our infrastructure is all AWS CDK (TypeScript). There is some Python for our Glue jobs and a few other special-purpose tech alternatives, but the vast majority is TypeScript. API devs own the infrastructure. I have one infrastructure/platform specialist who helped us build the abstractions and the libraries, did all the networking setup, consults us on all the nitty-gritty AWS details, and polices our IAM policies. There are no fences to throw things over, which means speed is limited only by how efficiently an engineer can solve the problem at hand.
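
For a flavor of what “API devs own infrastructure” looks like in practice, here is an illustrative CDK stack where the handler and the table it reads live in the same TypeScript codebase; the names and paths are made up for the example.

```typescript
// Illustrative only: the kind of construct an API developer owns end to end -
// the Lambda handler and the DynamoDB table it reads live in the same
// TypeScript codebase. Names and file paths are assumptions, not real code.
import { Stack, StackProps, RemovalPolicy } from 'aws-cdk-lib';
import { Construct } from 'constructs';
import { AttributeType, BillingMode, Table } from 'aws-cdk-lib/aws-dynamodb';
import { NodejsFunction } from 'aws-cdk-lib/aws-lambda-nodejs';

export class OrdersApiStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // Single-table design: one table, composite key, on-demand billing.
    const table = new Table(this, 'OrdersTable', {
      partitionKey: { name: 'pk', type: AttributeType.STRING },
      sortKey: { name: 'sk', type: AttributeType.STRING },
      billingMode: BillingMode.PAY_PER_REQUEST,
      removalPolicy: RemovalPolicy.DESTROY,
    });

    // The handler is plain TypeScript, bundled by esbuild at synth time.
    const getOrder = new NodejsFunction(this, 'GetOrder', {
      entry: 'src/handlers/get-order.ts',
      environment: { TABLE_NAME: table.tableName },
    });

    table.grantReadData(getOrder);
  }
}
```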

I love to create, and the best way to create is with the least amount of friction, so my teams automate all the things and only reach for a process solution when we (the engineers) know we need one. That also means we welcome all the AI assistance we can get - it just makes us all move faster.

How I Work

I know of only one way to build highly functional, well-performing teams - enable them and work with them. It is a cliché, but I lead by example. I roll up my sleeves and work side by side with my teams. My first priority is always to make sure that my teams have work to do, know what is expected of them, and are not blocked. I only get my hands dirty with tech when I know I have done everything my teams need.

What I’ve Built

Three things that I am extremely proud of from the last five years:

  1. api.epam.com

The kind of API platform that every company I have spoken to wants to have. All our systems stream their data and their events into one big Kafka cluster. Systems can consume it by becoming Kafka consumers, but that typically means ingesting everything you need and building your own small data platform. That is expensive and time-consuming, and people will (and do) make mistakes. I decided instead to ingest it once and expose it via a generic set of APIs with access control, real-time change detection, and a self-service console. Sub-half-second performance at a pretty decent scale.

Technically, this is metadata-driven, multi-topic Kafka consumers running in ECS Fargate, streaming data into OpenSearch (for API access) and into DynamoDB (for change-data-capture-based change detection). Our /search API supports RSQL, but we also offer a /query API for people to run advanced OpenSearch DSL queries. We cache at the CDN with a per-dataset cache policy. Change detection allows consumers to subscribe to the events that interest them (JMESPath is our language of choice for rules), and we can do direct SQS delivery for consumers in AWS as well as webhooks for everybody else. We even use DNS as our edge database for one specific use case.
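
As an illustration of the rule-based change detection, here is a small TypeScript sketch of how a JMESPath subscription rule could be evaluated against a change event; the event shape and the rule are hypothetical, not the actual api.epam.com contract.

```typescript
// A minimal sketch of JMESPath-based subscription rules; the event shape and
// the example rule are illustrative assumptions, not the real contract.
import * as jmespath from 'jmespath';

interface ChangeEvent {
  dataset: string;
  before: Record<string, unknown> | null;
  after: Record<string, unknown>;
}

// Hypothetical rule: notify only when an employee moved to the 'London' office.
const rule = "after.office == 'London' && before.office != 'London'";

export function matches(event: ChangeEvent, expression: string): boolean {
  // A truthy result means the subscriber gets notified
  // (direct SQS delivery in AWS, webhooks for everybody else).
  return Boolean(jmespath.search(event, expression));
}

// matches({ dataset: 'employees', before: { office: 'Kyiv' }, after: { office: 'London' } }, rule) === true
```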

  2. Dataheart

Real-time serverless analytics platform. Instead of running a very capable but somewhat heavy Databricks, we built serverless pipelines to stream the data from the same big Kafka cluster into S3. Lambda pipelines work on all the data as it comes in and create projections. All of it is queryable in Athena at any stage. Data products are built in Glue and exported as Parquet. We stream the data back out to the big Kafka cluster, as well as to BigQuery and other destinations. The most interesting part is the data quality and data correction layer. When a DQ test highlights a data problem or an inconsistency, one can write a correction rule and target a record by key and/or offset, by value patterns, and more. Records are then corrected as they stream through. You can always re-stream, fully or partially, to update your corrected projections every time you update your DQ rule set.
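
To show the idea of streaming corrections, here is an illustrative TypeScript sketch of correction rules being applied to records as they pass through a Lambda; the rule shape and matching semantics are assumptions made for the example, not the Dataheart format.

```typescript
// Illustrative sketch only: what a data-correction rule could look like when
// applied in a streaming Lambda. Field names and matching semantics are
// assumptions to show the idea, not the actual Dataheart rule format.
interface KafkaRecord {
  key: string;
  offset: number;
  value: Record<string, unknown>;
}

interface CorrectionRule {
  // Target a record by key and/or offset, or by a value pattern.
  key?: string;
  offset?: number;
  valuePattern?: Record<string, unknown>;
  // The patch applied to matching records as they stream through.
  patch: Record<string, unknown>;
}

function applies(rule: CorrectionRule, record: KafkaRecord): boolean {
  if (rule.key !== undefined && rule.key !== record.key) return false;
  if (rule.offset !== undefined && rule.offset !== record.offset) return false;
  if (rule.valuePattern) {
    for (const [field, expected] of Object.entries(rule.valuePattern)) {
      if (record.value[field] !== expected) return false;
    }
  }
  return true;
}

export function correct(record: KafkaRecord, rules: CorrectionRule[]): KafkaRecord {
  // Fold all matching rules over the record value; later rules win on conflicts.
  const value = rules
    .filter((rule) => applies(rule, record))
    .reduce((acc, rule) => ({ ...acc, ...rule.patch }), record.value);
  return { ...record, value };
}
```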

Someone on my team has built the Dataheart AI - Talk To Your Data experience - on top of the Dataheart platform with the help of the dbt semantic layer.

  3. Documents Intelligence

LLM-first multi-tenant knowledge management system. Each tenant has entities that are described by a collection of heterogeneous documents. For example, the Contracts tenant brings the contracts (SOWs) with their addendums and the MSAs that govern them, plus emails about them (aka whatever we have in the CRM). The Projects tenant brings the experience that people left in their profiles when they described their past work. The Presales Archive tenant brings the RFP packages that clients submitted and the responses that our sales and technical teams produced. These are Word docs, Excel spreadsheets, PowerPoint decks, emails, images, etc. We ingest them and run tenant-specific pre-processing on them: facts and attributes, summarization, structured and unstructured extractions. We build and embed search-oriented summaries and synopses. All of this enables cross-tenant conversational knowledge search and research. All data ingest and AI pipelines run in concurrency-controlled Step Functions.
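
As a sketch of what a concurrency-controlled pipeline can look like in CDK, here is a hypothetical Step Functions Map state that caps how many documents are pre-processed at once; the stack, handler path, and concurrency value are all illustrative, not the actual pipeline.

```typescript
// A minimal sketch, not the actual pipeline: a Step Functions Map state with
// capped concurrency, one way to keep document pre-processing from
// overwhelming downstream model quotas. All names are illustrative.
import { Stack, StackProps, Duration } from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as sfn from 'aws-cdk-lib/aws-stepfunctions';
import * as tasks from 'aws-cdk-lib/aws-stepfunctions-tasks';
import { NodejsFunction } from 'aws-cdk-lib/aws-lambda-nodejs';

export class DocumentPipelineStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // One Lambda per document: extraction, summarization, embedding, etc.
    const processDocument = new NodejsFunction(this, 'ProcessDocument', {
      entry: 'src/handlers/process-document.ts',
      timeout: Duration.minutes(5),
    });

    // Fan out over an entity's documents, but never run more than a
    // handful of LLM-bound tasks at once.
    const perDocument = new sfn.Map(this, 'PerDocument', {
      itemsPath: sfn.JsonPath.stringAt('$.documents'),
      maxConcurrency: 5,
    });
    perDocument.itemProcessor(
      new tasks.LambdaInvoke(this, 'Process', { lambdaFunction: processDocument })
    );

    new sfn.StateMachine(this, 'IngestPipeline', {
      definitionBody: sfn.DefinitionBody.fromChainable(perDocument),
    });
  }
}
```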

This is not your typical RAG. We are not building a simple search engine for all your documents. We are building a domain-specific knowledge discovery system that is intimately aware of what it is reading and how to present the results back to the users. We extract images from documents and re-inject them into summaries and chat answers where they are relevant.

We have a lot more coming up very soon. This is a very young product with a lot of potential.

Get In Touch

If this resonates with you, I’d love to connect. Email me at pveller@gmail.com.
