Reference: DevOps, CICD, & IaC2022-02-15
These are my notes regarding software development related to DevOps, CI/CD, and IaC type concerns. I've written some variant of these notes for multiple companies, and I decided to post here for future reference. Much of this is constantly changing and even some of the below is becoming dated already.
The notes below include opinions and arbitrary nomenclature. These are meant only as guidelines to facilitate discussion. Every environment and team is different.
- Development Workstations
- Build Artifacts
- Versioning and Releases
- Git Repo Setup
- Code Formatting and Conventions
- Branching Patterns
- Continuous Integration(CI) and Continuous Deployment (CD)
- Infrastructure as Code (IaC)
- Local, Cloud, and Serverless
- Application Source vs IaC Source vs Pipeline Source
- Stages vs Environments vs Regions/Zones
- AWS Account Management
- Autoscaling, Metrics, and Logging
An all encompassing approach to build pipelines and environments organization that fits all types of projects is not possible. My notes here generally assume a project that :
- Is a SaaS web type project that can be deployed.
- Different pipelines are needed for desktop apps, mobile apps, firmware, reusable libraries, etc.
- Has a CI pipeline that will compile, run unit tests, assemble a complete package, and then deploy or upload that package somewhere that it can be accessed.
- Has some level of continuous deployment. It's project dependent as to how extensive a continuous deployment process can be.
- Has multiple feature branches being worked on simultaneously by multiple developers.
- Has deployments to PROD that will usually require a manual review and approval. (And so is not 100% continuously deployed)
Many of my thoughts follow the 12-factor app principles, and if not familiar, I strongly suggest reading through it even though it is becoming dated.
All source of a project should be managed in version control except for secrets and large binaries. This includes the non-secret configuration variables for all environments, code-as-infrastructure source, container definitions, and possibly even the CI/CD pipelines themselves (more on that below).
Most configuration should be externalized such that a given runtime artifact can be configured to run in another environment without a recompile.
Most SaaS type projects should be able to be developed on OS X, Windows, Windows with WSL, Linux, etc. It should generally be an individual developer choice and preference. Making build processes and developer tools work across all platforms sometimes takes a little extra effort, but I have found that it tends to result in more robust dev environments and makes for CI/CD pipelines that are more closely aligned to developer setups.
There are exceptions to this depending on project and team. For example:
- Use of docker in development teams can make workstation choice less irrelevant. However, using docker for a completely contained dev setup has significant downsides, and I usually do not recommend.
- Similarly, the use of cloud based development environments can make local workstation choice less relevant.
- Certain projects are inherently platform specific. i.e. Native iOS development, native Windows graphics development, etc.
- The availability of WSL on Windows makes Windows specific support in build tools less important. Many teams can/should now expect Windows workstation developers do all build activities in WSL.
Generally, a compiled build, or packaged artifact, should not be added to source control. Artifacts should always be built automatically, ideally by an automated CI/CD pipeline, resulting in a versioned artifact. That versioned artifact is then promoted up through the environments or referenced via dependency management. Built artifacts should not be managed with source control.
Some teams will make use of dependency management systems. Those teams should consider setting up a private repository for managing and sharing internally built dependencies. e.g. AWS CodeArtifact , JFrog Artifactory, etc.
Versioning and Releases
Every build package should be semantically versioned. i.e. MAJOR.MINOR.PATCH. How that actually works is dependent on team and tech stack. I often put an incremental BUILD-NUMBER in place of PATCH, but there are other options. The assigned build version should always be evident somehow and ideally encoded in to the package name. If there's a UI component, the version should be visible in the UI. It should always be relatively easy for anyone, even non-developers, to say "I did/saw/experienced this while using version X.Y.Z". And all automatic logging, error reporting, etc. should be able to include the version number.
There are various conventions that work, but roughly:
- Every shared build gets an incremental BUILDNUM (or incremented PATCH).
- Every "release" results in the MINOR getting incremented. What constitutes a release is dependent on project, but usually after something gets pushed to PROD, or is made available to customers, then the working source gets a version bump in prep for the next release.
- Very large feature changes, or large backwards incompatible or API breaking changes, should result in MAJOR getting incremented.
Note that products might be versioned separately from their components in more complex projects and organizations.
Git Repo Setup
Projects needing to source control binary files of any significant size should make use of Git Large File Storage (LFS).
All projects should include a .gitattributes file in their root. One reason is to configure Git LFS, but just as important is to configure consistent and sane line ending handling by git. Or more specifically, to disable Git's insane default line ending handling.
This is controversial to a few still, but most would agree that early Git versions made a mistake by setting the default line ending handling to be "auto". This default makes working on code across multiple platforms very frustrating, and it can be a hard to diagnose and later fix problem. In modern times, all source files should be using LF line endings, except for Windows specific files. Every modern IDE supports LF line endings without issue.
* text eol=lf
or maybe this:
but never this (which is the default):
Some teams expect developers to configure their git install appropriately with
git config --global core.autocrlf false
, but by using a
.gitattributes, it becomes a non-issue.
Code Formatting and Conventions
Whether IDE specific project files should be added to source control is project and team specific. Often times, it makes sense to share certain project files when multiple developers are using the same IDE. But whenever possible, IDE choice should be up to the individual developer.
Code formatting and convention choices are almost a religious topic to some developers. Having worked across multiple projects and teams over the years, I've become mostly ambivalent. My only suggestions:
- Automate the formatting and any important rules enforcement as much as possible. But try not to over do it. Over bearing rules can have quickly diminishing returns and cause frustration.
- Reference an existing syntax/conventions document specific to the language being used. There's little reason to write one from scratch. Alternatively, specify team specific conventions only as exceptions to any of the default and common industry standard rules for the language/platform being used.
The ideal setup allows for project-wide automatic formatting at any time. This results in less formatting-only diffs as different developers work on the same code over time.
The development world is generally heading towards a set of common patterns, so some of this might seem obvious. Other parts are very arbitrary with multiple differing opinions.
I previously used GitFlow heavily, but gitflow can be complex to use, and is overkill for many projects. It's now considered a legacy git branching strategy by most teams and there are multiple alternatives, most with their own trade-offs. Git Flow still has a place though for teams needing to support multiple product versions simultaneously, and I consider it a sort-of superset of most other strategies.
This an area where being clear about the distinctions between SaaS vs On-premise, Cloud vs. Product, and Continuous-Deployment best practices is important because it changes the recommended git branching strategy for a team.
The general procedure though (for gitflow):
/mainbranch holds the gold copy of source. It generally represents the latest release. It should always be confidently deployable/releasable.
/developbranch holds the latest code changes shared by developers. Any merge to
/developwill immediately and automatically be built/tested and then deployed to
When needed (for long-lived releases and products), a
/release-x.y.zbranch is created for managing hotfixes, etc. This allows work to be done on a release without holding up ongoing development.
Developers work on
When a developer completes their unit of work, they submit a PR against
- Some organizations will automatically run CI and unit tests against the PR when it is created. Many open source projects do this on Github.
When it makes sense, the developer submitting the PR can assign it for review to a specific person. When there is no one specific person that should review it (which is ideal on an agile team), then anyone on the team should review when they have some time. It becomes a shared responsibility, and usually works well in small teams.
Reviewer can supply comments, questions, suggestions on the PR. Sometimes a response from the developer is required, sometimes not.
If/when reviewer feels it's acceptable to merge the PR, then they give a simple :thumbsup. At that point, developer can merge in with whatever way makes the most sense. (Sometimes as merge, sometimes as a rebase.)
Developer can then close and delete PR. Most teams will delete the branch at that point, or soon after, too.
Any merge to the
/develop branch is automatically versioned and built/unit-tested/deployed to the ALPHA environment.
It might also be automatically deployed to a TEST environment where it becomes accessible to others (usually non-public)
with some subset of data preloaded allowing for internal review.
Every build package should generally be available for some period of time. Release builds should be available until there's no chance of them ever being referenced again.
A separate procedure is used by QA and/or release managers to promote a build to
PROD. The version promoted
will have already been deployed to
ALPHA first. There is no separate or new version made for releases. Part of the
PROD procedure should result in the MINOR version getting incremented, and the BUILDNUM/PATCH reset to
0, for whatever comes next on the
/develop branch. i.e. A v1.0.325 gets released to customers, and at the same time, the
develop branch version gets incremented to 1.1.0.
Continuous Integration(CI) and Continuous Deployment (CD)
The basic idea for CI is that every incoming unit-of-work from developers is automatically built/tested then made available for upstream usages. A Jenkins server was commonly used for this in the past, but most modern solutions are cloud based and easier to work with. Some popular choices (the first two being ones I'm most familiar with):
- GitHub Actions,
- AWS CodeBuild / CodePipeline
- Atlassian Bamboo
- Azure Pipelines
Continuous deployment goes further and says that every incoming unit-of-work will get automatically deployed. Generally teams that refer to CD mean automatically deployment all the way to production, assuming the build passes all the automated tests along the way. It requires robust automated testing and an experienced team. It can get complex for team that need to do upgrades and have minimal, to no, downtime. It generally only makes sense for centralized cloud systems where the dev team is in complete control of the rollout. However, it can also make sense for on-premise products if those products include some sort of automatic built-in update routine.
There are many in-between scenarios for CD. A common one being to have CI/CD up to a certain stage, but with deployment to production still gate checked by human operator deciding if/when/which releases to actually put in to production.
Infrastructure as Code (IaC)
“Infrastructure as code, also referred to as IaC, is a type of IT setup wherein developers or operations teams automatically manage and provision the technology stack for an application through software, rather than using a manual process to configure discrete hardware devices and operating systems. Infrastructure as code is sometimes referred to as programmable or software-defined infrastructure.
The concept of infrastructure as code is similar to programming scripts, which are used to automate IT processes. However, scripts are primarily used to automate a series of static steps that must be repeated numerous times across multiple servers. Infrastructure as code uses higher-level or descriptive language to code more versatile and adaptive provisioning and deployment processes.”
IaC approaches tend to be less verbose than more procedural scripted approaches, but aren't always. IaC tends to allow repeatability in that the IaC toolset is responsible for setting things up in a way that the devops team specifies declaratively.
IaC has an emphasis on server and cloud based systems. Common IaC choices:
- CloudFormation (but only makes sense for AWS)
- AWS CDK (my favorite...but only for AWS)
From past experience, I would avoid Puppet and Chef. They are heavy-handed and require significant expertise. I don't know Terraform or Ansible well enough to have strong opinions.
AWS CDK is especially interesting because it is programmatic. The result is ultimately CloudFormation, but generating it can be done programmatically. That makes it more flexible, and importantly, it opens it up to easier reuse and certain levels of automated testing. i.e. CDK constructs can be unit tested.
While IaC tends to emphasize cloud type environments, there are tools that might be considered IaC like for other types of projects. For example, when building custom linux images/firmware:
Local, Cloud, and Serverless
Many modern projects have components that exist only in the cloud. i.e. AWS Lambdas, DynamoDB, API Gateway, etc. Development and testing with these types of components can be cumbersome. There are emerging solutions like LocalStack, aws-sdk-mock, etc.
However, with a smart CDK based pipeline, another option is for developers to create developer specific environments, or even complete pipelines, as needed. I find this approach preferable to most mocking type solutions because it avoids the inevitable mismatch between the facades and reality.
Containers have changed the modern development and deployment landscape. Modern teams should be well versed in the advantages of using Docker, Docker Compose, and similar technologies for development, and testing, and deployment.
DevOps teams at operating at a larger scale will also want to be familiar with Kubernetes.
Application Source vs IaC Source vs Pipeline Source
Consider that for a modern SaaS project, there might be three logical code bases:
- Application Source
- The source that gets compiled and packaged in to an application bundle. Configurable at runtime per environment.
- Environment IaC
- The IaC for setting up a server or "environment". Often, it's configurable for changing the resulting infrastructure size, costs, security, etc.
- Pipeline IaC
- The IaC for setting up a CI/CD pipeline. Perhaps multiple variants of a pipeline, or perhaps multiple pipelines.
Many projects will put all of these in a single repository, and many more combine the environment and pipeline IaC (if they use IaC for the pipeline), but they can be considered as logically distinct code bases.
One reason for considering them distinct is that versioning and deployment aspects are sometimes different. For example, sometimes infrastructure changes will be one-way only and destructive.
The environment IaC might, or might not, deploy the application bundle, instead of the pipeline doing it. That tends to be team and project dependent.
Stages vs Environments vs Regions/Zones
This is my personal nomenclature because I don't know the industry standard naming around much of this. I would generally recommend using naming according to the tools being used. For example, AWS CDK has specific nomenclature. i.e. Constructs, Stacks, Apps, Environments, etc. The AWS CDK notion of "environment" does not map one-to-one with my naming of environment here.
A stage represents logical state in the sequence of a pipeline. A pipeline is implemented with steps that move a build through stages and do actions at each stage. Example stages:
A complete pipeline might look something like this:
An deployment environment represents a complete and contained setup of a project, generally live and running. There will often be an almost one-to-one correspondence between pipeline stages and environments, but not always. Example environments:
Advanced IaC based teams cab allow for fast and easy creation (and teardown) of arbitrary new environments, such that perhaps each developer will have their own DEV-CLOUD environment per branch, testers can create their own environments as needed, etc.
See also: distribution groups, blue/green, rollout targets, availability zones, blast radius, etc. At larger scales and for projects with uptime requirements, redundancy needs, etc., a logical environment (i.e. PROD) might contain multiple subgroups organized by region, blast radius, and other possibilities. An example of regional organization:
AWS Account Management
Environments can be organized multiple ways in AWS. Examples:
- All environments in a single AWS account
- One AWS account per environment
- Multiple environments in a single shared AWS account, but PROD in its own AWS account
- Accounts (and environments) created on demand (perhaps for developer or load testing)
In general, my recommendation is to use one AWS account per environment when the environment will be long-lived and shared. And then one AWS account per developer/tester. reasons:
- It's easier to keep track of costs per environment (and per person) that way.
- Sometimes environments will have external dependencies, integrations, or differing infosec requirements that limit what an environment can contain.
- Giving each developer/tester their own account provides a lot of freedom and flexibility, while still maintaining accountability.
Managing multiple AWS accounts has become much easier over the years, and is now generally considered abest practice for most teams. It can quickly get overly complex if not careful though, and there are many in-between options and recommendations .
Useful AWS services, with the first two being the ones I have the most experience with:
Autoscaling, Metrics, and Logging
As a project gets deployed, being able to auto-scale up (and down!), metrics, logging, alerting, and notifications all become more important. Depending on project and expectations, capabilities for handling those aspects should be considered early on in the project life cycle, even if not needed initially.