Manage a large number of GitHub repos (command line tools)
View the Project on GitHub salewski/ads-github-tools
2022-10-26
):ads-github-tools-0.3.5.tar.gz
ads-github-pages
' project home pageThe ads-github-tools
project provides command line tools for managing a
large number of GitHub repositories, motivated by the following two related
use cases:
When you work on multiple computers, keeping your local clones of the repos in synch can be a chore.
Similarly, when you have a large number of GitHub repos that are mostly forks of other repos, keeping your forks in synch with the upstream changes can be a chore.
ads-github-tools
.
Typical usage involves ensuring there is an updated cache, and then just invoking a couple of commands.
The cache is updated like this:
$ ads-github-cache -v --update
First time users will want to update the cache prior to running any of the
other commands. It may take a while to poplute the cache the first time
(especially if you have a large number of GitHub repos), but updating it
thereafer is done conditionally and on an as-needed basis. Note that the only
objects cached are those used by the various utilities in the
ads-github-tools
; it does not cache any data that is not used.
The cache is provided in order to avoid unnecessary network round trips to the GitHub API servers and allow the utilities to work quickly in an interactive shell process. The cache storage is specific to each GitHub user. A Unix system user may have multiple GitHub accounts (say, work and personal); the cached data for those GitHub users would be stored separately.
Regular users would probably just run the above cache update command (say, once or twice a day) from a cron job (or similar). That way the cached data is already available when you need it.
With the cached data in hand, you're ready to run the other utilities. Typical invocation involves only one or two commands, both invoked from the parent directory of the user's directory that contains his GitHub projects (forked and otherwise). The 'ads-github-tools' author typcially runs these two commands once or twice daily:
$ ads-github-fetch-all-upstreams -v -c $ ads-github-merge-all-upstreams -v -k -p
Many users would prefer to run those commands from a crontab (or similar); the author runs them manually because he is actively hacking on the tools and wants to inspect the output.
The following invocation is also handy for creating local clones of recently forked GitHub repos:
$ ads-github-fetch-all-upstreams -m
That will clone all of a user's GitHub repos for which there is no working directory beneath the current directory. It can be thought of as creating clones of "all the new stuff (and just the new stuff)"; it operates much more quickly than invocations that operate on all of a user's GitHub repos (assuming only a minority of them are "missing", which is the common case). Note that the '-m' (--missing-only) option was introduced in the version of 'ads-github-fetch-all-upstreams' released with ads-github-tools-0.2.0 (circa 2017).
Installation should be as simple as:
$ ./configure # add a --prefix=path/to/some/location to install someplace other than beneath /usr/local $ make $ make check # optional, but encouraged $ make install
At the time of writing (2022-10-26) there are ten tools:
ads-github-cache - Manipulate the ads-github-tools cache. Mainly used to keep relevant GitHub data at-the-ready for use by the other utilities, but will also be of interest to software developers working with the GitHub v3 API. Because the tool can hide the paging necessary to obtain all of the items for a given resource, the following is a very fast way to get all of data for all of the user's repo:
$ ads-github-cache --get-cached '/user/repos' | jq '.'
ads-github-whoami - Show the currently authenticated GitHub user. This program similar in spirit to the Unix whoami(1) and id(1) commands, but shows information about the authenticated GitHub user (as understood by the GitHub API). Obtains its direct information via the GitHub v3 API call:
GET /user
Default output is plain text for interactive command line use, but we provide a knob to have the raw JSON response emitted, as well. The ads-github-whoami command was added in ads-github-tools-0.3.1.
ads-github-normalize-url - Produces a "normalized" view of a given URL, suitable for use in generating an ID. Currently is a quick 'n dirty implementation optimized for this sole purpose, so there's no guarantee that the normalized variation of the URL will actually work
ads-github-hash-url - Similar in spirit to git-hash-object(1), this tool takes a (presumably normalized) URL and emits a checksum for it. Currently uses the SHA-3 256-bit algorithm variant.
ads-github-show-rate-limits - Show user's GitHub API rate limits ("core", "search", "graphql", others).
ads-github-fetch-all-upstreams - Operates on the working directories of a collection of GitHub-hosted git repositories. The user can specify one or more repositories explicitly to restrict operations to just those repos. Each that is found with an 'upstream' remote defined will have `git fetch upstream` invoked in it.
With the '--clone-if-missing' option, any of the
user's GitHub repos for which there is not a git working directory
beneath the current location will be cloned (using the
'git-hub' tool's clone
operation, which sets up
the 'upstream' remote if the repo is a fork).
There's also an '--upstream-remote-if-missing' option that will add the 'upstream' remote on existing project working directories that do not have it (only if the project is a fork of another project, of course).
'ads-github-tools-0.2.0' (circa 2017) added the '--missing-only' option (which implies '--clone-if-missing'). The '--missing-only' option limits the program's activity to operating on only the user's "missing" GitHub repos. It can be thought of as creating clones of "all the new stuff (and just the new stuff)"; it operates much more quickly than invocations that operate on all of a user's GitHub repos (assuming only a minority of them are "missing", which tends to be the common case).
ads-github-merge-all-upstreams - Operates on the working directories of a collection of GitHub-hosted git repositories. Each that is found with both 'origin' and 'upstream' remotes defined will have:
$ git merge --ff-only upstream/<DEFAULT_BRANCH_NAME>invoked in it. The user can specify one or more repositories explicitly to restrict operations to just those repos. The program is careful to sanity check the local repository before attempting any operations on it. Also, it will skip any repository for which the git index has any changes recorded. Will (temporarily) check out the default branch before merging (if the working directory happens to have some other branch checked out, or if it happens to be on a detached HEAD); will restore the originally checked out branch (or detached HEAD) when done if the temporary switch was necessary.
With one '--push' option, will invoke
'git push origin <DEFAULT_BRANCH_NAME>'
for each repo that has changes merged into it during the program
invocation.
With two '--push' options, will invoke
'git push origin <DEFAULT_BRANCH_NAME>'
for each repo that is not being skipped over for some other
reason (e.g. local changes in the working directory),
regardless of whether changes are merged into it during the
program invocation. This is useful for pushing changes that
were merged but not pushed during earlier runs of the program
invoked without the '--push' option.
ads-github-repo-create - Creates a new GitHub repo with a given name and attributes (description, default branch, etc.). Newly created repo is empty by default (suitable for importing existing Git repos into GitHub), but can optionally be auto-initialized (more suitable for projects being started from scratch). The ads-github-repo-create command was added in ads-github-tools-0.3.1.
ads-github-nproc - Prints on stdout the number of CPUs available to the current process. It provides a common interface to the rest of the ads-github-tools programs that may need that data. To accomplish its task, however, the program will attempt to use a number of platform specific techniques. The code for this program is adapted from the AX_COUNT_CPUS macro from the GNU Autoconf Archive. See ads-github-nproc(1) and the comments in the code for original authorship of the macro, as well a licensing that is inherited from the original m4 macro source file (though still GPL compatible).
parse-netrc - Parses the user's '~/.netrc' file to obtain the first matching record by host machine name (and optionally by user login name). This is used by ads-github-cache to determine the name of the in-effect GitHub user account, using the approach that emulates what curl(1) does. This allows for entirely offline cache operations against the user-specific cache. The tool name does not start with the 'ads-github-' prefix because it is being considered for extraction into a small stand-alone project.
The ads-github-tools-0.3.2 release (October 2020) introduced the foundations for the long-wanted caching system. It delivers on the ability to conditionally retrieve objects "through the cache" using 'ETag:' and 'If-None-Match:' to pull full objects over the network only when necessary. It also delivers on the goal of being easy to use from shell scripts or even in an interactive shell.
Note that conditionally fetching objects as described also has the benefit of avoiding incurring unnecessary hits against the user's GitHub API rate limit. (Not sure where you stand? Try 'ads-github-show-rate-limits -h' to find out!)
At the moment, the caching foundation is in place, and the ads-github-fetch-all-upstreams tool uses it to good effect. But currently it is the only tool that does so. Others will be enhanced, as well, in upcoming releases.
My earlier thinking was that there will be three levels of tools:
low-level tools that store and retrieve objects from the cache, and related
In the above list, 'KEY' is likey a "normalized" url, or perhaps just any url (and the tools would normalize the url behind the scenes to produce the "key").
mid-level tools that make use of the caching tools
high-level tools the build upon the first two levels above
GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl.html>
Unless otherwise stated by a different license notice in a particular file, all files in the `ads-github-tools' project are made available under the GNU GPL version 2, or (at your option) any later version.
See the COPYING file for the full license.
Copyright (C) 2016, 2017, 2019, 2020, 2021, 2022 Alan D. Salewski <ads@salewski.email> This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
The ads-github-tools
project code was written by Alan D. Salewski.
The GitHub project page is located at: