DNS as your Edge database

Modern computing has come a long way. Elastic architectures have become commodities. With platforms like AWS and all its serverless offerings, you can build very reliable and very scalable systems. We learned to push static content very close to the end users thanks to the proliferation of CDNs. We then learned to run compute at the edge as well. One thing we still can’t really do effectively is push data to the edge.

What if I told you that you could use DNS? I didn’t come up with the idea. I read about it here some time ago, and when I ran into a problem that sounded like “how do I get my data closer to the edge,” I remembered that blog post and decided to try it.

An important caveat first. The problem I was solving is not a typical OLTP data problem. You are very unlikely to actually be able to replace a database with DNS using the approach I will present here. You can, however, deliver a fairly stable (and fairly small) dataset to the edge and read it from anywhere in the world with low single- to double-digit millisecond response times.

In a Nutshell

At a glance, the architecture looks like this:

Architecture

  • #1 is a lambda function that is basically a CRUD API used by our developer portal to provision and manage your API access
  • #2 is the main DynamoDB table, the source of record for all API key metadata
  • #3 is the stream enabled on the DynamoDB table to stream out any changes
  • #4 is a lambda function subscribed to the stream. Depending on the event captured, it will create, update, or delete a DNS replica using one TXT record per key (see the first sketch after this list)
  • #5 is the viewer-request function, which can now dig the DNS TXT record to quickly check whether the API key is valid and has access to the requested API (see the second sketch after this list)
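To make #4 a little more concrete, here is a minimal sketch of what the stream-subscribed function could look like. This is not the actual code from the full article: the hosted zone, the keys.example.com namespace, the attribute names, and the comma-separated TXT encoding are all hypothetical.

import { Route53 } from 'aws-sdk';
import { DynamoDBStreamEvent } from 'aws-lambda';

const route53 = new Route53();

// Hypothetical values: the hosted zone that holds the replica and its namespace.
const HOSTED_ZONE_ID = process.env.HOSTED_ZONE_ID!;
const ZONE = 'keys.example.com';

// Sketch of #4: mirror every change captured by the DynamoDB stream into
// one TXT record per API key so the edge can resolve key metadata via DNS.
export const handler = async (event: DynamoDBStreamEvent): Promise<void> => {
  for (const record of event.Records) {
    const apiKey = record.dynamodb?.Keys?.apiKey?.S; // hypothetical key attribute
    if (!apiKey) continue;

    // Hypothetical encoding: the list of APIs the key can access, comma-separated.
    // Note: a Route 53 DELETE must match the existing record exactly, so a real
    // handler would carry the old value along (e.g. from OldImage).
    const apis = record.dynamodb?.NewImage?.apis?.SS ?? [];

    await route53
      .changeResourceRecordSets({
        HostedZoneId: HOSTED_ZONE_ID,
        ChangeBatch: {
          Changes: [
            {
              Action: record.eventName === 'REMOVE' ? 'DELETE' : 'UPSERT',
              ResourceRecordSet: {
                Name: `${apiKey}.${ZONE}`,
                Type: 'TXT',
                TTL: 60,
                // TXT record values must be wrapped in double quotes.
                ResourceRecords: [{ Value: `"${apis.join(',')}"` }],
              },
            },
          ],
        },
      })
      .promise();
  }
};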
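And a sketch of #5, the viewer-request function. Node’s built-in dns module makes validating a key a single TXT lookup; the x-api-key header name and the URI-to-API mapping below are made up for illustration.

import { promises as dns } from 'dns';
import { CloudFrontRequestEvent, CloudFrontRequestResult } from 'aws-lambda';

const ZONE = 'keys.example.com'; // same hypothetical namespace as above

// Sketch of #5: validate the API key at the edge with a single DNS TXT lookup.
export const handler = async (event: CloudFrontRequestEvent): Promise<CloudFrontRequestResult> => {
  const request = event.Records[0].cf.request;
  const apiKey = request.headers['x-api-key']?.[0]?.value; // made-up header name
  const api = request.uri.split('/')[1]; // made-up URI-to-API mapping

  if (apiKey) {
    try {
      // resolveTxt returns string[][]: one array of string chunks per TXT record.
      const [chunks] = await dns.resolveTxt(`${apiKey}.${ZONE}`);
      if (chunks.join('').split(',').includes(api)) {
        return request; // valid key: let CloudFront continue to the origin
      }
    } catch {
      // NXDOMAIN means there is no such key; fall through to the 403.
    }
  }

  return { status: '403', statusDescription: 'Forbidden' };
};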

Want to know more and look at the code? Please head over to our Anywhere Engineering blog and read the full article.


Till next time!

Async Recursion with backoff

It’s been a while since I published anything. More than three years! A lot of things have happened since then. The most relevant to mention at the beginning of this post is that I have been super busy building a lot of cool tech with a very talented team here at EPAM Anywhere. We are doing full-stack TypeScript with Next.js and native AWS serverless services and can’t get enough of it. This experience has been challenging me to learn new things every day, and I have a lot to share!

When you work with AWS, you will certainly use aws-sdk and the APIs of different services. Need to send a message to an SQS queue? That’s an HTTP API call and you will use the sdk. Need to update a document in DynamoDB? The same. Need to push a message to Firehose? The same. Many of these APIs have their batch equivalents:

  • SQS: sendMessage → sendMessageBatch
  • DynamoDB: put / delete → batchWrite
  • Firehose: putRecord → putRecordBatch

These batch APIs will throw if something fundamental is wrong. Say your auth is not good, you don’t have enough permissions, or you don’t have connectivity to the service. If the sdk connected successfully to the service but failed to perform some or all of the operations in your batch, the operation won’t throw. It will return an object that tells you which operations succeeded and which ones failed. The most likely reason to get partial failures is throttling. All of these APIs have soft and hard limits, and sooner or later you will attempt to do more than AWS feels comfortable letting you get away with.

We learned it the hard way. It’s all documented, obviously, but things like this one are only obvious in hindsight. Let me show you a neat technique to batch safely:

import { DocumentClient } from 'aws-sdk/clients/dynamodb';

// The backoff strategy maps the attempt number to a delay in milliseconds.
type BackoffStrategy = (attempt: number) => number;

// Default strategy, per the JSDoc below: 100ms * attempt.
const attemptTimesOneHundredMs: BackoffStrategy = (attempt) => attempt * 100;

interface DynamoBatchDeliveryOptions {
  readonly db: DocumentClient;
  readonly table: string;
  readonly retries?: number;
  readonly backoff?: BackoffStrategy;
}

/**
 * Perform a batch write/delete to DynamoDB. All records that failed will be retried with a backoff,
 * up to the specified number of retries (default is 5). The method will throw if we failed to
 * deliver the batch after that many attempts.
 *
 * This factory method returns a reusable function that you can use over and over again if you are
 * sending batches to the same `table`.
 *
 * **NOTE** We are not checking any limits imposed by AWS/Dynamo. As of the time of this writing:
 * - 25 requests per batch
 * - max item size is 400KB including attribute name lengths
 * - the total size of items written cannot exceed 16MB (400x25 is less, btw)
 *
 * @param db instantiated document client
 * @param table the name of the DynamoDB table to work with
 * @param retries how many times to retry. default is 5
 * @param backoff backoff strategy that can be calculated based on the attempt number. the default is 100ms * attempt
 * @returns the reusable function that you call with your batch details
 */
export const deliverBatchToDynamo =
  ({ db, table, retries = 5, backoff = attemptTimesOneHundredMs }: DynamoBatchDeliveryOptions) =>
  (batch: DocumentClient.WriteRequests): Promise<void> => {
    const run = async (batch: DocumentClient.WriteRequests, attempt: number): Promise<void> => {
      if (attempt > retries) {
        throw new Error(`Failed to deliver batch after ${retries} attempts`);
      }

      const { UnprocessedItems } = await db.batchWrite({ RequestItems: { [table]: batch } }).promise();

      if (UnprocessedItems) {
        const retry = UnprocessedItems[table];
        if (retry?.length) {
          // Wait according to the backoff strategy, then recurse with only the failed items.
          return new Promise((resolve) => setTimeout(() => resolve(run(retry, attempt + 1)), backoff(attempt)));
        }
      }
    };

    return run(batch, 1);
  };
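For completeness, here is a hypothetical usage sketch; the table name and the items are made up. The factory is called once, and the returned function is reused for every batch going to that table:

import { DocumentClient } from 'aws-sdk/clients/dynamodb';

// Hypothetical usage: build the delivery function once and reuse it per table.
const deliverUsers = deliverBatchToDynamo({ db: new DocumentClient(), table: 'users' });

// Up to 25 put/delete requests per call (see the NOTE above).
const users = [
  { id: '1', name: 'Ada' },
  { id: '2', name: 'Alan' },
];
await deliverUsers(users.map((user) => ({ PutRequest: { Item: user } })));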

If you want to learn more about the technique, please check out the full version of this post on our engineering blog: https://anywhere.epam.com/en/blog/async-recursion-with-backoff.

Till next time!

Lambda@Edge with serverless

Lambda@Edge allows you to run lambda functions in response to CloudFront events. In order to use a lambda function with CloudFront, you need to make sure that your function’s execution role can be assumed by the edgelambda service principal. I want to show you an easy way to do it with serverless.

Missing identity

As of right now, the serverless framework has no native support for Lambda@Edge. There is a plugin, though, that allows you to associate your lambda functions with a CloudFront distribution’s behavior.

The plugin works great if you deploy and control both your lambda functions and their associations with the CloudFront distributions. You might, however, be deploying a global function that is to be used by different teams on different distributions. Here’s a good example: a function that supports redirecting / to /index.html deeper in the URL hierarchy than the site root (see the sketch below).
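Such a function is tiny. A minimal sketch (a real one would likely guard against URIs that already point at a file):

import { CloudFrontRequestEvent, CloudFrontRequestResult } from 'aws-lambda';

// Rewrite any URI that ends in "/" to its index.html, so deep links
// like /docs/ resolve the same way the site root does.
export const handler = async (event: CloudFrontRequestEvent): Promise<CloudFrontRequestResult> => {
  const request = event.Records[0].cf.request;

  if (request.uri.endsWith('/')) {
    request.uri = `${request.uri}index.html`;
  }

  return request;
};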

Serverless allows you to define additional IAM role statements in the iamRoleStatements block but doesn’t seem to have a shortcut for the IamRoleLambdaExecution role itself. You can certainly configure your own custom IAM::Role, but that’s a pretty involved exercise if all you want to achieve is this:

Lambda@Edge Identity in IAM

Easy way out

If you don’t define your own IAM::Role, serverless will create one for you. The easiest way to see how it looks is to run sls package, look inside your .serverless folder, and inspect the CloudFormation JSON that will orchestrate your deployment. Look for IamRoleLambdaExecution in the Resources group.

Serverless carries a template that it uses as a starting point to build the role definition. The good news is that serverless merges it into the list of other resources that you might have defined in your serverless.yml. Take a look at the code if you want to see how it does it.

The name of the role seems to always default to IamRoleLambdaExecution (here and here). Knowing how lodash’s merge works, all we need to do now is to give our resources definition a little boost.

In my serverless.yml:

Resources:
  IamRoleLambdaExecution:
    Type: "AWS::IAM::Role"
    Properties:
      AssumeRolePolicyDocument:
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - lambda.amazonaws.com
                - edgelambda.amazonaws.com

And that’s it. Serverless will merge its template over this structure and will keep the edgelambda principal in there. Enjoy!
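If you are curious why the edgelambda entry survives, here is lodash’s merge behavior in isolation. This is a sketch of the semantics, not serverless’s actual code:

import merge from 'lodash/merge';

// Our fragment declares two service principals...
const ours = { Service: ['lambda.amazonaws.com', 'edgelambda.amazonaws.com'] };

// ...and the built-in template declares one.
const template = { Service: ['lambda.amazonaws.com'] };

// merge works on arrays index by index: index 0 is overwritten with the same
// value, and index 1 (edgelambda) has no counterpart, so it is kept.
console.log(merge({}, ours, template));
// => { Service: [ 'lambda.amazonaws.com', 'edgelambda.amazonaws.com' ] }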