AWS Lambda (python) with packages through pip (in CloudFormation)

Note: originally posted on Medium, moved here when the blog moved in June 2021.

I absolutely love CloudFormation as a tool for creating small and large systems on AWS. Having code-based infrastructure, easily maintained in git, with visible diffs, etc., is pure joy. There are however (many) times when CloudFormation (or AWS in general) seems to miss some things. In such cases, blogs like this one should help you :).

This document is here mostly for historical reference. There is a new and better method, that doesn’t have some of the security implications present in this post.

TL;DR scroll down to find a CloudFormation custom resource that builds Lambda Layers based on a list of pip packages.

If one wants to create a lambda function in python (the lambda environment by now supports lots of languages; this blog post deals with python only), there are two ways to do this in CloudFormation: either write the code directly into the CloudFormation yaml file, or upload a zipped archive to an S3 bucket and reference that.

The first method is very nice in that it's self-contained: you don't need any external tools to upload the code you just wrote, and you don't need an S3 bucket (which you would presumably have to create in another, earlier, CloudFormation stack). It does help to have a small tool to insert an external python file into the yaml; more on this in another blog post.

The second method however has some major advantages: it supports up to 250MB of python code (whereas the first method is limited to 4096 bytes), and it supports multiple files (whereas the first method is limited to a single file). 4096 bytes is a serious limitation when writing a readable and failure-resistant python program, but it is doable for smaller functions (and these are exactly the lambdas for which you don't want to be bothered to set up a system to upload them to S3 :)). Actually, there seems to be a workaround for this, but we'll get to that in a later post (update: later post is live). The second limitation, only allowing a single file, is also a large restriction, especially in those cases where we would like to include some library from pypi (or another repository).
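For illustration, a minimal "method 1" function, with the python written inline in the template (HelloLambda and its role are made-up names for this sketch; the role resource itself is not shown):

```yaml
HelloLambda:
  Type: AWS::Lambda::Function
  Properties:
    Handler: index.handler
    Runtime: python3.8
    Role: !GetAtt HelloLambdaRole.Arn  # role definition not shown here
    Code:
      ZipFile: |  # the whole program lives here; limited to 4096 bytes, one file
        def handler(event, context):
            return {"message": "hello"}
```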

In the python universe, most libraries are downloaded from pypi.org (the PYthon Package Index), through the pip command. If you type pip install numpy, pip will contact pypi.org to find and download the numpy package, and then install it (building it if necessary). Python ships with a large standard library, but quite often the tools in there are just suboptimal for the job (as an example, the urllib package in the standard library is less robust than the requests package, and the different xml.* packages are not protected against DoS attacks, whereas pypi provides defusedxml). In addition, tools like numpy, pillow, etc. provide functionality that is hard to build yourself on the python runtime. In short, if you cannot use pypi packages, you're very limited in what programs you can write. As a side note: the default AWS Lambda environment always contains the boto3 package, as well as (if you install your lambda through method 1) something called cfnresponse. It used to also include the requests package under botocore.vendored; this will be removed in some weeks though.

For years the only solution for using pypi packages on lambda was to install them locally, zip them together with the lambda function, and upload the whole thing (for CloudFormation this means: adding them to S3, etc.), hoping that (for packages that compile) your compiled code is actually compatible with whatever AWS Lambda runs on. Two years ago AWS released Lambda Layers: you can make a lambda layer with whatever data, code, or libraries you want, and include this in your lambda. This means you can just write your simple, method-1, small lambda, and import a layer with, say, numpy, pillow, requests and abstractcp (shameless plug for my only pypi package :)). All you need is to create this layer…. Below is a CloudFormation Custom Resource to do just that.

Solution

We create a Custom Resource that runs a lambda that calls pip (which luckily nowadays is available in the lambda environment). This will download and build packages for exactly the system and python version that the lambda runs on. It will then save the result as a lambda layer.

Packages can be specified using whatever syntax pip understands, so specific versions, version ranges, etc. are all (or at least should be) allowed.
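The pip call at the heart of the custom resource can be sketched like this (build_pip_command is a made-up helper name for illustration, not part of the template below):

```python
# Minimal sketch of the core trick: run the pip module of the very
# interpreter the lambda itself runs on, installing into a directory
# named "python" -- the folder name Lambda Layers expect for python code.
import sys


def build_pip_command(packages, target_dir):
    """Build the argv that installs `packages` into `target_dir`."""
    return [sys.executable, "-m", "pip", "install", *packages, "-t", str(target_dir)]
```

Using `sys.executable -m pip` means the packages are downloaded and built for exactly the python version and platform of the lambda runtime, which is what makes this approach work for compiled packages like numpy.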

The example below produces a layer called mylayer with the required pypi packages installed. A lambda created later can use this layer by adding a property Layers: [!Ref MyLayer].

The !Ref MyLayer is actually the ARN of the layer version. Any change to the MyLayer resource (e.g. a new package is added, or the description is changed) will result in a new layer version. CloudFormation notices that the ARN has changed, and will therefore also update any lambda using this layer to use the new layer version (and afterwards call the PipLayer custom resource with a Delete request for the old layer version).
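A consuming function might look like this (MyFunction and its role are made-up names for this sketch, not part of the snippet below):

```yaml
MyFunction:
  Type: AWS::Lambda::Function
  Properties:
    Handler: index.handler
    Runtime: python3.8
    Role: !GetAtt MyFunctionRole.Arn  # role definition not shown here
    Layers:
      - !Ref MyLayer  # resolves to the layer version ARN
    Code:
      ZipFile: |
        import numpy  # provided by the layer, not by the runtime

        def handler(event, context):
            return {"numpy": numpy.__version__}
```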

```yaml
# Note: this snippet needs to be placed in the Resources section of a CloudFormation template
# By default this will create a layer for the python version that the custom resource lambda
# is running in, so if you want a layer for python 3.6, just replace the runtime of the
# PipLayerLambda.
# This gist accompanies an article on my blog: https://blog.claude.nl/tech/howto/2021/02/10/aws-lambda-python-with-packages-through-pip-in-cloudformation.html
PipLayerLambdaRole:
  Type: AWS::IAM::Role
  Properties:
    AssumeRolePolicyDocument:
      Version: 2012-10-17
      Statement:
        - Action:
            - sts:AssumeRole
          Effect: Allow
          Principal:
            Service:
              - lambda.amazonaws.com
    Policies:
      - PolicyDocument:
          Version: 2012-10-17
          Statement:
            - Action:
                - logs:CreateLogGroup
                - logs:CreateLogStream
                - logs:PutLogEvents
              Effect: Allow
              Resource:
                - !Sub arn:aws:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/lambda/PipLayer-${AWS::StackName}:*
            - Action:
                - lambda:PublishLayerVersion
                - lambda:DeleteLayerVersion
              Effect: Allow
              Resource:
                - "*"
        PolicyName: lambda
PipLayerLambda:
  Type: AWS::Lambda::Function
  Properties:
    Description: Create layers based on pip
    FunctionName: !Sub "PipLayer-${AWS::StackName}"
    Handler: index.handler
    MemorySize: 1024
    Role: !GetAtt PipLayerLambdaRole.Arn
    Runtime: python3.8
    Timeout: 300
    Code:
      ZipFile: |
        import json
        import logging
        import pathlib
        import re
        import subprocess
        import sys
        import tempfile
        import typing as t
        import shutil

        import boto3
        import cfnresponse

        logger = logging.getLogger()
        logger.setLevel(logging.INFO)


        class PipLayerException(Exception):
            pass


        def _create(properties) -> t.Tuple[str, t.Mapping[str, str]]:
            try:
                layername = properties["LayerName"]
                description = properties.get("Description", "PipLayer")
                packages = properties["Packages"]
            except KeyError as e:
                raise PipLayerException("Missing parameter: %s" % e.args[0])
            description += " ({})".format(", ".join(packages))
            if not isinstance(layername, str):
                raise PipLayerException("LayerName must be a string")
            if not isinstance(description, str):
                raise PipLayerException("Description must be a string")
            if not isinstance(packages, list) or not all(
                    isinstance(p, str) for p in packages):
                raise PipLayerException("Packages must be a list of strings")
            tempdir = pathlib.Path(tempfile.TemporaryDirectory().name) / "python"
            try:
                subprocess.check_call([
                    sys.executable, "-m", "pip", "install", *packages, "-t", tempdir])
            except subprocess.CalledProcessError:
                raise PipLayerException("Error while installing %s" % str(packages))
            zipfilename = pathlib.Path(tempfile.NamedTemporaryFile(suffix=".zip").name)
            shutil.make_archive(
                zipfilename.with_suffix(""), format="zip", root_dir=tempdir.parent)
            client = boto3.client("lambda")
            layer = client.publish_layer_version(
                LayerName=layername,
                Description=description,
                Content={"ZipFile": zipfilename.read_bytes()},
                CompatibleRuntimes=["python%d.%d" % sys.version_info[:2]],
            )
            logger.info("Created layer %s", layer["LayerVersionArn"])
            return (layer["LayerVersionArn"], {})


        def _delete(physical_id):
            match = re.fullmatch(
                r"arn:aws:lambda:(?P<region>[^:]+):(?P<account>\d+):layer:"
                r"(?P<layername>[^:]+):(?P<version_number>\d+)", physical_id)
            if not match:
                logger.warning("Cannot parse physical id %s, not deleting", physical_id)
                return
            layername = match.group("layername")
            version_number = int(match.group("version_number"))
            logger.info("Now deleting layer %s:%d", layername, version_number)
            client = boto3.client("lambda")
            client.delete_layer_version(
                LayerName=layername,
                VersionNumber=version_number)
            logger.info("Done")


        def handler(event, context):
            logger.info('{"event": %s}', json.dumps(event))
            try:
                if event["RequestType"].upper() in ("CREATE", "UPDATE"):
                    # Note: treat UPDATE as CREATE; it will create a new physical ID,
                    # signalling CloudFormation that it's a replace and the old
                    # version should be deleted
                    physicalId, attributes = _create(event["ResourceProperties"])
                    cfnresponse.send(
                        event=event,
                        context=context,
                        responseData=attributes,
                        responseStatus=cfnresponse.SUCCESS,
                        physicalResourceId=physicalId,
                    )
                else:
                    assert event["RequestType"].upper() == "DELETE"
                    _delete(event["PhysicalResourceId"])
                    cfnresponse.send(
                        event=event,
                        context=context,
                        responseData={},
                        responseStatus=cfnresponse.SUCCESS,
                        physicalResourceId=event["PhysicalResourceId"],
                    )
            except Exception as e:
                logger.exception("Internal Error")
                cfnresponse.send(
                    event=event,
                    context=context,
                    responseData=None,
                    responseStatus=cfnresponse.FAILED,
                    reason=str(e))
MyLayer:
  Type: Custom::PipLayer
  Properties:
    ServiceToken: !GetAtt PipLayerLambda.Arn
    Region: !Ref AWS::Region
    LayerName: mylayer
    Packages:
      - numpy==1.20
      - pandas>1.0
      - abstractcp==0.9.6
      - pillow
```