Introducing git-xargs: an open source tool to update multiple GitHub repos

git-xargs allows you to run arbitrary commands or scripts against many repos in parallel
Zack Proser
Published April 21, 2021

Today we’re open sourcing git-xargs, a command-line interface (CLI) tool for making updates across multiple GitHub repositories with a single command. You give git-xargs a script to run and a list of repos, and it checks out each repo, runs your script against it, commits any changes, and opens pull requests with the results. At the end of each run, you get a detailed report on exactly what happened with each repo.

For example, have you ever needed to add a particular file across many repos at once? Or to run a search and replace to change your company or product name across 150 repos with one command? What about upgrading Terraform modules to all use the latest syntax? How about adding a CI/CD configuration file, if it doesn’t already exist, or modifying it in place if it does, but only on a subset of repositories you select?

You can handle these use cases and many more with a single git-xargs command. Just to give you a taste, here’s how you can use git-xargs to add a new file to every repository in your GitHub organization:

git-xargs \
--branch-name add-contributions \
--github-org my-example-org \
--commit-message "Add CONTRIBUTIONS.txt" \
touch CONTRIBUTIONS.txt

In this example, every repo in the my-example-org GitHub org will have a CONTRIBUTIONS.txt file added, and an easy-to-read report will be printed to STDOUT.

In this blog post, I’m going to cover the following topics:

  • Example use cases for this tool
  • Why we built git-xargs
  • How git-xargs works under the hood
  • How you can get started with git-xargs quickly

Let’s look at some example use cases for git-xargs

In the following sections we’ll take a look at some of the use cases we found git-xargs useful for tackling internally, as well as some suggested tasks it would be well-suited for in the wild.

Use case: Add a LICENSE.txt file, or update its copyright year, across repos

  1. This script will create a LICENSE.txt file, if it doesn’t exist already, and put the MIT license in it.
  2. If a LICENSE.txt file already exists, it will update the copyright year.

#!/usr/bin/env bash

YEAR=$(date +"%Y")
FULLNAME="Gruntwork, LLC"

function create_license {
cat << EOF > LICENSE.txt
MIT License

Copyright (c) 2016 to $YEAR, $FULLNAME

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
EOF
}

# Copyrights should be declared as "$CREATION_YEAR to $CURRENT_YEAR"
# Therefore, this sed command will look to update the date immediately following the word "to"
function update_license_copyright_year {
  echo "Updating license copyright year to $YEAR..."
  # Test the exit status of sed directly instead of inspecting $? afterwards
  if sed -i "s|to \([1-9][0-9][0-9][0-9]\)|to $YEAR|" LICENSE.txt; then
    echo "Success!"
  else
    echo "Error!"
  fi
}

# if the repo does not contain a LICENSE.txt file, then create one with the correct year
if [ ! -f "LICENSE.txt" ]; then
  echo "Could not find LICENSE.txt at root of repo, so adding one..."
  create_license
else
  update_license_copyright_year
fi
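One portability note on the script above: sed -i with no suffix is GNU sed syntax, while the BSD sed that ships with macOS expects a backup suffix after -i. Here is a minimal sketch of a variant that should behave the same on both, assuming you don't mind a .bak file being created and then deleted. The function name update_copyright_year and its arguments are made up for illustration:

```bash
#!/usr/bin/env bash

# Hypothetical helper: rewrite the year that follows "to " in a license file.
# Usage: update_copyright_year <file> <year>
update_copyright_year() {
  local file="$1" year="$2"
  # "-i.bak" (suffix attached to the flag) is accepted by both GNU and BSD sed.
  # The backup only exists to satisfy BSD sed, so remove it on success.
  sed -i.bak "s|to [1-9][0-9][0-9][0-9]|to ${year}|" "$file" && rm -f "${file}.bak"
}

# Only act when run from a repo root that already has a license file.
if [ -f LICENSE.txt ]; then
  update_copyright_year LICENSE.txt "$(date +%Y)"
fi
```

The trade-off is one extra temporary file per repo in exchange for not having to detect which sed is installed.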

Use case: Updating CircleCI contexts (or other YAML files) across repos

We use CircleCI extensively, and we wanted to begin leveraging their new contexts feature so that we could increase the security and control of our build projects.

Using contexts is more maintainable and secure than copying and pasting the same secrets as environment variables throughout every repository that needs access to the same credentials, such as AWS keys in order to run Terratest integration tests on every build.

By having all of our projects use a single CircleCI context, we could rotate our secrets in one place and have all of our projects pick up the change instantly. However, to get this working we also needed to update our CircleCI workflows syntax version to 2.0 across all our repos, as this is the earliest version that supports contexts. Imagine our YAML files looked something like this:

workflows:
  # We need to upgrade this version to 2 across all our repos to be able to use the context nodes!
  version: 1
  build-and-test:
    jobs:
      - test:
          context:
            - Gruntwork Admin
          filters:
            tags:
              only: /^v.*/
      - build:
          context:
            - Gruntwork Admin
          requires:
            - test
          filters:
            tags:
              only: /^v.*/

      - deploy:
          context:
            - Gruntwork Admin
          requires:
            - build
          filters:
            tags:
              only: /^v.*/
            branches:
              ignore: /.*/

To accomplish this, we leveraged Mike Farah’s excellent yq tool to programmatically update all our .circleci/config.yml files. This script is a simplified version that shows how you might bump your CircleCI workflows to the version that supports contexts, but you can imagine how you could easily extend this to add or rotate CircleCI contexts, etc.

#!/usr/bin/env bash

echo "Upgrading CircleCI workflows syntax to 2..."

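# Note: "yq w" is yq v3 syntax. With yq v4, the equivalent command would be:
#   yq eval -i '.workflows.version = 2' .circleci/config.yml
yq w -i .circleci/config.yml 'workflows.version' 2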

# remove stray merge tags from the final output
sed -i '/!!merge /d' .circleci/config.yml

Use case: Rename all your repositories to follow a new pattern

We recently renamed all of our repos to be compliant with HashiCorp’s Terraform Cloud, which required that every repo name follow the pattern terraform-aws-<module-name>. Without git-xargs, this would have been a pretty painful exercise given the scope of our Terraform library.

After we renamed all our repositories through the GitHub UI, we still needed to update all our internal references throughout our source code to point to the updated repo names (in READMEs, code, and URLs). We were able to write one Bash script that did this for us:

#!/usr/bin/env bash
# We renamed a ton of our repos to the Terraform Registry naming format. This script uses grep and sed to search for
# references to the old names and replace them with the new names.

# Bash doesn't have a good way to escape quotes in strings. The official solution is to list multiple strings next
# to each other (https://stackoverflow.com/a/28786747/483528), but that becomes unreadable, especially with regex.
# Therefore, to make our regex more readable, we are declaring clearly-named variables for the special characters we
# want to match, and including those in a string with no need for escaping or crazy concatenation
readonly DOUBLE_QUOTE='"'
readonly SINGLE_QUOTE="'"
readonly BACKTICK='`'
readonly START_OF_LINE='^'
readonly END_OF_LINE='$'
readonly FORWARD_SLASH='\/'
readonly DOT='\.'
readonly WHITESPACE='\s'
readonly OPEN_BRACKET='\['
readonly CLOSE_BRACKET='\]'
readonly OPEN_PAREN='\('
readonly CLOSE_PAREN='\)'
readonly OPEN_CURLY_BRACE='\{'
readonly CLOSE_CURLY_BRACE='\}'

# When replacing old names with new, these are regular expressions for the characters we allow before a name or after.
# We check these characters explicitly to make sure we don't accidentally replace one of the names when it just happens
# to appear as a substring in some unrelated word. E.g., "module-ci" should NOT be replaced in
# "gruntwork-module-circleci-helpers".
readonly ALLOWED_CHARS_BEFORE="($START_OF_LINE|$WHITESPACE|$FORWARD_SLASH|$DOUBLE_QUOTE|$SINGLE_QUOTE|$BACKTICK|$OPEN_BRACKET|$CLOSE_BRACKET|$OPEN_PAREN|$CLOSE_PAREN|$OPEN_CURLY_BRACE|$CLOSE_CURLY_BRACE)"
readonly ALLOWED_CHARS_AFTER="($END_OF_LINE|$WHITESPACE|$FORWARD_SLASH|$DOUBLE_QUOTE|$SINGLE_QUOTE|$BACKTICK|$OPEN_BRACKET|$CLOSE_BRACKET|$OPEN_PAREN|$CLOSE_PAREN|$OPEN_CURLY_BRACE|$CLOSE_CURLY_BRACE|$DOT)"

# The list of repos to replace, in pairs, where the first entry is the old name and the second entry is the new name
# (we use this ugly array instead of a map because the old Bash version on Mac doesn't support maps).
readonly REPLACEMENT_PAIRS=(
"module-vpc"                  "terraform-aws-vpc"
"module-aws-monitoring"       "terraform-aws-monitoring"
"package-directory-services"  "terraform-aws-directory-services"
"module-file-storage"         "terraform-aws-file-storage"
"module-ecs"                  "terraform-aws-ecs"
"module-security"             "terraform-aws-security"
"cis-compliance-aws"          "terraform-aws-cis-service-catalog"
"aws-service-catalog"         "terraform-aws-service-catalog"
"aws-architecture-catalog"    "terraform-aws-architecture-catalog"
"package-terraform-utilities" "terraform-aws-utilities"
"module-ci"                   "terraform-aws-ci"
"module-asg"                  "terraform-aws-asg"
"module-server"               "terraform-aws-server"
"package-beanstalk"           "terraform-aws-beanstalk"
"package-openvpn"             "terraform-aws-openvpn"
"module-data-storage"         "terraform-aws-data-storage"
"module-load-balancer"        "terraform-aws-load-balancer"
"package-zookeeper"           "terraform-aws-zookeeper"
"package-kafka"               "terraform-aws-kafka"
"package-messaging"           "terraform-aws-messaging"
"module-cache"                "terraform-aws-cache"
"package-static-assets"       "terraform-aws-static-assets"
"package-elk"                 "terraform-aws-elk"
"package-mongodb"             "terraform-aws-mongodb"
"package-lambda"              "terraform-aws-lambda"
"package-datadog"             "terraform-aws-datadog"
"package-waf"                 "terraform-aws-waf"
"package-sam"                 "terraform-aws-sam"
"module-ci-pipeline-example"  "terraform-aws-ci-pipeline-example"
)

# Finds all files in the repo to replace, taking care to exclude the .git folder, Terraform & Terragrunt scratch
# folders, binary files, and other files we shouldn't be touching.
function find_files_to_update {
  find . \
    -not -iwholename '*.git*' \
    -not -iwholename '*.terragrunt-cache*' \
    -not -iwholename '*.terraform*' \
    -not -iwholename '*.png' \
    -not -iwholename '*.jpg' \
    -not -iwholename '*.jpeg' \
    -not -iwholename '*.gif' \
    -not -iwholename '*.bmp' \
    -not -iwholename '*.tiff' \
    -not -iwholename '*.DS_Store*' \
    -not -iwholename '*.go' \
    -not -iwholename '*go.mod' \
    -not -iwholename '*go.sum' \
    -type f
}

# Format a regex replacement string for use with perl. The returned value will be of the format:
#
# s/<REPO_OLD_NAME_1>/<REPO_NEW_NAME_1>/g; s/<REPO_OLD_NAME_2>/<REPO_NEW_NAME_2>/g; s/<REPO_OLD_NAME_3>/<REPO_NEW_NAME_3>/g; ...
#
# This string will allow us to replace multiple values in a single call to Perl.
#
# https://stackoverflow.com/a/8934117/483528
function format_replacement_string {
  local replacements=""
  local i old_name new_name
  for ((i=0;i<"${#REPLACEMENT_PAIRS[@]}";i+=2)); do
    old_name="${REPLACEMENT_PAIRS[i]}"
    new_name="${REPLACEMENT_PAIRS[i+1]}"

    # This is the sed-like regex for the replacements we'll be doing. To help create this regex, I used this handy
    # online regex tester, that not only gives you nice highlighting, but even lets you define a bunch of test cases to
    # check against!
    #
    # https://regexr.com/5l8n4
    #
    replacements+=" s/$ALLOWED_CHARS_BEFORE$old_name$ALLOWED_CHARS_AFTER/\$1$new_name\$2/g;"
  done

  # Strip extra space at start of string: https://stackoverflow.com/a/16623897/483528
  echo "${replacements# }"
}

# The main entrypoint for this script
function update_all_repo_names {
  local replacements
  replacements=$(format_replacement_string)

  # Read one path per line so that file names containing spaces are not split into several words
  local file
  while IFS= read -r file; do
    # I originally used sed, but on Mac, sed added an unnecessary newline at the end of every single file it touched,
    # so I switched to Perl. This also has the added benefit of allowing us to process multiple replacements in a
    # single call.
    perl -i -pe "$replacements" "$file"
  done < <(find_files_to_update)
}

update_all_repo_names

Why did we build git-xargs?

At Gruntwork we maintain over 150 repositories containing hundreds of thousands of lines of code. Thousands of developers rely on this code in production.

This means a large part of what we do is keep our code, especially our Infrastructure as Code (IaC) library, up to date with best practices, new releases (e.g., new Terraform versions), and security patches. In addition to constantly shipping improvements and new features for our IaC library and service catalog, we also have to stay on top of maintenance tasks that are constantly cropping up.

And we do all of this with a team of fewer than 20 engineers! git-xargs helps us perform tasks that require updating many of our repositories at once more quickly and efficiently.

How git-xargs works

git-xargs allows you to run a script (or multiple scripts, written in e.g. Bash, Ruby, or Python) against 5, 50, or 150 repos at once! You can select the exact repos you want to run it against either by supplying the --github-org flag to match every repo in your GitHub org, or by providing a flat file that explicitly lists which repos you want it to operate on.
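As a sketch of the flat-file option, assume the file is passed with a --repos flag and lists one <org>/<repo> entry per line (check git-xargs --help for the exact flag name in your version). The repo names below are placeholders, and validate_repos_file is a hypothetical pre-flight check, not something git-xargs ships:

```bash
#!/usr/bin/env bash

# Placeholder repo names: one <org>/<repo> entry per line.
cat > repos.txt << 'EOF'
my-example-org/repo-one
my-example-org/repo-two
EOF

# Hypothetical pre-flight check (not part of git-xargs): succeed only if
# every non-blank line of the file looks like <org>/<repo>.
validate_repos_file() {
  local file="$1"
  [ -f "$file" ] || return 1
  # grep -v prints the lines that do NOT match; any non-blank output is a bad entry.
  ! grep -v -E '^[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+$' "$file" | grep -q .
}

if validate_repos_file repos.txt; then
  echo "repos.txt looks good"
  # Assumed invocation, left commented out so this sketch has no side effects:
  # git-xargs --repos repos.txt --branch-name add-contributions \
  #   --commit-message "Add CONTRIBUTIONS.txt" touch CONTRIBUTIONS.txt
fi
```

Catching a malformed line up front is cheaper than finding it in the run report after every other repo has already been processed.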

  1. git-xargs will clone each of your selected repos to the /tmp/ directory of your local machine.
  2. It will check out a local branch (whose name you specify).
  3. It will run all your selected scripts against your selected repos.
  4. It will commit any changes in each of the repos (with a commit message you can optionally specify).
  5. It will push your local branch with your new commits to your repo’s remote.
  6. It will call the GitHub API to open a pull request with a title and description that you can optionally specify. If you don’t specify these, git-xargs will use your commit message for the PR title and description, if you provide one, or fall back to defaults for all three if you don’t.
  7. It will print a detailed run summary to STDOUT that explains exactly what happened with each repo and provides links to successfully opened pull requests that you can quickly follow from your terminal. If any repos encountered errors at runtime (for example, they couldn’t be cloned, or a script failed during processing), all of this will be spelled out in detail in the final report so you know exactly what succeeded and what went wrong.
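To make steps 1 through 5 concrete, here is a rough single-repo equivalent written with plain git commands. This is only a sketch of the idea, not how git-xargs is implemented: process_repo is a hypothetical helper, the script path must be absolute, and opening the pull request (step 6) and the final report (step 7) are left out.

```bash
#!/usr/bin/env bash

# Hypothetical helper, not git-xargs code: steps 1-5 for a single repo.
# Usage: process_repo <clone-url> <branch-name> <commit-message> <absolute-script-path>
process_repo() {
  local repo_url="$1" branch="$2" message="$3" script="$4"
  local workdir
  workdir="$(mktemp -d)" || return 1

  # 1. Clone into a scratch directory.
  git clone --quiet "$repo_url" "$workdir/repo" || return 1

  (
    cd "$workdir/repo" || exit 1
    # 2. Check out a local branch.
    git checkout --quiet -b "$branch" || exit 1
    # 3. Run the script from the root of the repo.
    "$script" || exit 1
    # 4. Commit only if the script actually changed something.
    if [ -z "$(git status --porcelain)" ]; then
      echo "no changes"
      exit 0
    fi
    git add --all || exit 1
    git commit --quiet -m "$message" || exit 1
    # 5. Push the branch to the remote.
    git push --quiet origin "$branch" || exit 1
    echo "pushed $branch"
  )
}
```

Running the work in a subshell keeps the cd from leaking into the caller, so the helper can be called for one repo after another.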

git-xargs does all this using goroutines, so it is pretty fast, as it runs against multiple repos concurrently.
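git-xargs itself is written in Go, but the benefit of processing repos concurrently can be illustrated with a shell analogue that uses background jobs. In this sketch process_one is a made-up stand-in for the real per-repo work, and the repo names are placeholders:

```bash
#!/usr/bin/env bash

# Stand-in for the real per-repo work (clone, run scripts, push, open PR).
process_one() {
  sleep 1
  echo "done: $1"
}

# Start one background job per repo, then block until all of them finish.
run_concurrently() {
  local repo
  for repo in "$@"; do
    process_one "$repo" &
  done
  wait
}

# Three repos take roughly one second in total rather than three.
run_concurrently my-example-org/repo-one my-example-org/repo-two my-example-org/repo-three
```

The order of the output lines is not guaranteed, which is one reason a per-repo summary at the end of the run is so useful.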

Additional ideas to implement as scripts for git-xargs

Here are some other starter ideas for scripts that would be good candidates to run via git-xargs:

  • Modify package.json files in place across repos to bump a common Node.js dependency using jq https://stedolan.github.io/jq/
  • Update your Terraform module library from Terraform 0.13 to 0.14.
  • Remove stray files of any kind, when found, across repos using find and its -exec option
  • Add a new file of any kind with conditional contents to repos using heredoc syntax: https://stackoverflow.com/questions/2953081/how-can-i-write-a-heredoc-to-a-file-in-bash-script
  • Rename every instance of a company or product name that has changed using sed
  • Add baseline tests to repos that are missing them by copying over a common local folder where they are defined
  • Refactor multiple Golang tools to use new libraries by executing go get to install and uninstall packages, and modify the source code files’ import references
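
As one illustration, the stray-file idea from the list above could look something like this. It is a minimal sketch: remove_stray_files is a hypothetical helper, and .DS_Store is just an example of a file you might want gone.

```bash
#!/usr/bin/env bash

# Hypothetical helper: delete every file with the given name below the
# current directory, skipping anything inside the .git folder.
remove_stray_files() {
  local name="$1"
  find . -type f -name "$name" -not -path './.git/*' -exec rm -f {} +
}

# git-xargs runs scripts from the root of each cloned repo, so "." is the repo.
remove_stray_files '.DS_Store'
```

If a repo contains no matching files, the script changes nothing and there is nothing to commit for that repo.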

Give it a shot and let us know what you think, and contribute some scripts!

Grab a copy of the binary from the git-xargs releases page, give it execute permissions, and run it with --help to get started:

chmod u+x git-xargs
./git-xargs --help

If you have a good script that you believe is generic enough to be of use to many other people, please open a pull request against our scripts directory so that others may benefit from it!

If you find bugs or have ideas for ways we could extend and improve it, please feel free to file a GitHub issue.