paulund


#git#version-control#code-review#git-log#git-shortlog

Git Commands for Reading New Codebases

When you join a new project or start reviewing an unfamiliar repo, opening random files is a slow way to build context. The commit history already contains a map of the codebase. It tells you which files change constantly, who built what, where bugs keep appearing, and whether the team is shipping with confidence.

These five commands take a couple of minutes to run and tell you exactly where to focus your attention.


1. Find the Most-Changed Files

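git log --format=format: --name-only --since="1 year ago" | grep -v '^$' | sort | uniq -c | sort -nr | head -20

This lists the 20 files touched by the most commits over the past year. These are your churn hotspots, the files where active development is concentrated. The grep -v '^$' step drops the blank separator lines git prints between commits, which would otherwise be counted and show up as the top entry.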

Breaking down the flags:

  • --format=format: suppresses all commit metadata (hash, author, message). You only want the file paths.
  • --name-only outputs the filenames affected by each commit, one per line.
  • --since="1 year ago" limits the search to the last 12 months. You can adjust this window depending on the project's age.

The piped commands do the counting: sort groups identical filenames together, uniq -c counts consecutive duplicates, sort -nr sorts by count in descending order, and head -20 caps the output at 20 results.

Files at the top of this list are where the action is. If a configuration file ranks high, the project probably went through infrastructure changes recently. If a single business logic file dominates, that file may be doing too much and is worth a closer look as a refactoring candidate.
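If you run this often, the pipeline can be wrapped in a small shell function. The following is a minimal sketch rather than part of the original command: the name churn and its SINCE, LIMIT, and pass-through arguments are assumptions made for illustration.

```shell
# churn [SINCE] [LIMIT] [GIT_LOG_ARGS...]
# Hypothetical helper: lists the most frequently changed files.
#   SINCE  time window passed to --since (default: "1 year ago")
#   LIMIT  number of rows to show (default: 20)
# Any further arguments are handed to git log unchanged, e.g. -- src/
churn() {
  since="${1:-1 year ago}"
  limit="${2:-20}"
  # Discard the two positional arguments so "$@" holds only the extras.
  [ "$#" -gt 0 ] && shift
  [ "$#" -gt 0 ] && shift
  git log --format=format: --name-only --since="$since" "$@" \
    | grep -v '^$' \
    | sort | uniq -c | sort -nr | head -n "$limit"
}
```

For example, churn "6 months ago" 10 -- src/ would show the ten most-changed files under src/ over the last six months, where src/ is a placeholder path.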


2. See Who Built the Project

git shortlog -sn --no-merges --since="6 months ago"

This ranks every contributor by commit count over the last six months.

Breaking down the flags:

  • -s (summary) collapses each author's commits into a single count instead of listing every message.
  • -n sorts by number of commits, highest first.
  • --no-merges excludes merge commits, which would inflate the count for whoever handles pull request merges.
  • --since="6 months ago" focuses on recent activity, not the full project lifetime.

What you are looking for is concentration. If one person accounts for 60% or more of the commits, the project has a low bus factor and depends heavily on that contributor. If that person is still active, they are your best source of context. If they have since left and nobody picked up the pace, you are looking at a maintenance risk.

Drop the --since flag to see the all-time picture and compare it with the recent window. A healthy project shows knowledge spreading across the team over time, not concentrating.
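To check the 60% threshold without mental arithmetic, the counts can be converted to percentages. This is a minimal sketch, assuming awk is available; commit_share is a hypothetical helper name, and it reads the output of git shortlog -sn on standard input.

```shell
# commit_share: reads "COUNT<TAB>AUTHOR" lines (the git shortlog -sn format)
# on stdin and prints each author's percentage of the total.
commit_share() {
  awk '
    {
      count[NR] = $1          # first field is the commit count
      total += $1
      $1 = ""                 # blank the count, leaving the author name
      sub(/^[ \t]+/, "")      # trim the separator left behind
      name[NR] = $0
    }
    END {
      for (i = 1; i <= NR; i++)
        printf "%5.1f%%  %s\n", 100 * count[i] / total, name[i]
    }'
}

# Usage (HEAD is given explicitly because git shortlog reads from stdin
# when no revision is passed and stdin is not a terminal):
#   git shortlog -sn --no-merges --since="6 months ago" HEAD | commit_share
```

Running it once with --since and once without gives the two pictures to compare.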


3. Find Where Bugs Cluster

git log -i -E --grep="fix|bug|broken" --name-only --format='' | sort | uniq -c | sort -nr | head -20

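This filters the commit history down to messages containing "fix", "bug", or "broken", then counts which files appear most often in those commits. The match is on substrings, so messages containing words like "prefix" or "debug" are included too. Treat the result as a rough signal rather than an exact bug count.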

Breaking down the flags:

  • -i makes the grep case-insensitive, catching "Fix", "FIX", and "fix" alike.
  • -E enables extended regular expressions so the | (or) operator works without escaping.
  • --grep="fix|bug|broken" filters commits whose messages match any of those terms.
  • --name-only and --format='' work the same way as in command 1, stripping everything except filenames.

The real insight comes from cross-referencing this list with the churn hotspots from command 1. Files that appear in both lists are your highest-risk code. They keep changing and they keep breaking. That pattern usually means the code was patched repeatedly without anyone addressing the root cause.

You can extend the grep pattern to include terms your team uses. If the project follows Conventional Commits, try --grep="^fix" to match only commits typed as fixes.
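The cross-referencing step can be done mechanically by saving both lists and intersecting them. This is a rough sketch under assumptions: risky_files is a hypothetical helper, and churn.txt and bugs.txt are placeholder file names holding the saved output of commands 1 and 3.

```shell
# risky_files CHURN_LIST BUG_LIST
# Both arguments are files containing "COUNT PATH" lines as produced by
# uniq -c. Prints the paths that appear in both lists.
risky_files() {
  paths_tmp=$(mktemp) || return 1
  # Strip the leading count column so only the path remains.
  sed 's/^ *[0-9][0-9]* //' "$1" > "$paths_tmp"
  # -F fixed strings, -x whole-line match, -f read patterns from a file.
  # An empty intersection is not treated as an error.
  sed 's/^ *[0-9][0-9]* //' "$2" | grep -Fxf "$paths_tmp" || true
  rm -f "$paths_tmp"
}

# Usage, with placeholder file names:
#   git log --format=format: --name-only --since="1 year ago" \
#     | grep -v '^$' | sort | uniq -c | sort -nr | head -20 > churn.txt
#   git log -i -E --grep="fix|bug|broken" --name-only --format='' \
#     | grep -v '^$' | sort | uniq -c | sort -nr | head -20 > bugs.txt
#   risky_files churn.txt bugs.txt
```

The output keeps the order of the second list, so the files with the most bug-related commits come first.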


4. Check if the Project Is Accelerating or Dying

git log --format='%ad' --date=format:'%Y-%m' | sort | uniq -c

This gives you commit counts grouped by month for the entire history of the repository.

Breaking down the flags:

  • --format='%ad' outputs only the author date for each commit. The %ad placeholder is one of many format specifiers git log supports.
  • --date=format:'%Y-%m' formats the date as year-month (e.g., 2026-04), which groups commits by calendar month when piped through sort | uniq -c.

The output is a simple time series. Look for trends. A steady or growing count means active development. A sharp drop-off could mean the project shipped, the team shrank, or people moved on. A sudden spike often lines up with a deadline or a major rewrite.

This also helps calibrate your expectations. If the project averages 200 commits a month, a 20-file change is routine. If it averages 10 commits a month, that same change is a major event.
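Trends are easier to spot when the counts are drawn as bars. The following sketch assumes awk is available; month_bars and its SCALE argument (commits per # character) are hypothetical names, not git features.

```shell
# month_bars [SCALE]
# Reads "COUNT YYYY-MM" lines (the uniq -c output above) on stdin and
# prints one bar per month, one # per SCALE commits, rounded up.
month_bars() {
  awk -v scale="${1:-10}" '
    {
      n = int(($1 + scale - 1) / scale)   # ceiling division
      bar = ""
      for (i = 0; i < n; i++) bar = bar "#"
      printf "%s %5d %s\n", $2, $1, bar
    }'
}

# Usage:
#   git log --format='%ad' --date=format:'%Y-%m' | sort | uniq -c | month_bars 10
```

Months with no commits do not appear in the output at all, so a gap in the sequence of months is itself worth noticing.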


5. Measure How Often the Team Is Firefighting

git log --oneline --since="1 year ago" | grep -iE 'revert|hotfix|emergency|rollback'

This searches the last year of commit messages for signs of reactive work: reverts, hotfixes, emergency patches, and rollbacks.

Breaking down the flags:

  • --oneline condenses each commit to a single line (short hash + message), making it easy to scan and pipe.
  • --since="1 year ago" limits the window.

The grep -iE part is not a git flag but a standard Unix filter:

  • -i makes the search case-insensitive.
  • -E enables extended regex for the | operator.
  • 'revert|hotfix|emergency|rollback' matches any of those keywords.

If you see reverts every couple of weeks, the team does not trust its deploy process. That points to unreliable tests, a missing staging environment, or a broken deploy pipeline. Occasional reverts are normal and healthy. Frequent reverts are a process problem, not a code problem.
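The command above lists the matching commits. To see how often they occur, the same filter can be grouped by month. This is a sketch under assumptions: firefights_per_month is a hypothetical helper that expects "YYYY-MM subject" lines on standard input.

```shell
# firefights_per_month: reads "YYYY-MM SUBJECT" lines on stdin, keeps the
# ones whose subject looks like reactive work, and counts them per month.
firefights_per_month() {
  grep -iE 'revert|hotfix|emergency|rollback' \
    | cut -d' ' -f1 \
    | sort | uniq -c
}

# Usage: %ad with --date=format:'%Y-%m' prints the month, %s the subject.
#   git log --since="1 year ago" --format='%ad %s' --date=format:'%Y-%m' \
#     | firefights_per_month
```

Comparing these counts with the monthly totals from command 4 shows what share of the work is reactive.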


The Common Patterns

These five commands share a few building blocks worth understanding on their own.

git log format placeholders

The --format flag is an alias for --pretty. Given a format string, it accepts placeholders for any piece of commit metadata:

Placeholder  Output
%H           Full commit hash
%h           Short commit hash
%an          Author name
%ae          Author email
%ad          Author date (respects --date)
%s           Subject (first line of message)

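Setting --format=format: or --format='' suppresses the commit metadata entirely, which is useful when you combine it with --name-only and only want file paths. Blank separator lines between commits can still appear in the output, so filter them out (for example with grep -v '^$') before counting.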
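As an illustration of combining placeholders, the sketch below prints one pipe-separated line per commit. The function name commit_table and the choice of | as a separator are assumptions; --date=short prints dates as YYYY-MM-DD.

```shell
# commit_table [GIT_LOG_ARGS...]
# Prints "short-hash|author|date|subject" for each commit.
commit_table() {
  git log --format='%h|%an|%ad|%s' --date=short "$@"
}

# Usage: the ten most recent commits
#   commit_table -n 10
```

A fixed separator makes the output easy to feed into cut or awk, although a subject line that itself contains | will add extra fields.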

git log filtering

You can narrow git log output with several filters:

  • --since and --until for date ranges
  • --author="name" for a specific contributor
  • --grep="pattern" for commit message text
  • -- path/to/file for changes to a specific file or directory

These filters compose. You can combine --author, --since, and --grep in a single command to answer very specific questions like "what did Alice fix in the payments module last quarter?"

git log --author="Alice" --grep="fix" --since="3 months ago" -- src/payments/

git shortlog vs git log

git shortlog is a convenience wrapper around git log designed for summarizing contributions. The -sn flags are the most common combination, giving you a sorted leaderboard of commit counts. Without -s, it groups commit messages under each author's name, which is useful for generating changelogs.

The Unix pipeline

Most of the commands above pipe git output through standard Unix tools:

  • sort orders lines alphabetically (needed before uniq)
  • uniq -c collapses adjacent identical lines and prefixes each with a count
  • sort -nr sorts numerically (-n) in reverse order (-r)
  • head -20 takes the top 20 results

This pattern of sort | uniq -c | sort -nr | head -N is a general-purpose frequency counter. It works on any line-oriented output, not just git.
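To illustrate that the counter is not git-specific, here is a small sketch that packages it as a function and applies it to arbitrary lines of text. The function name freq is an assumption, as is the file-extension example in the usage comment.

```shell
# freq [N]: counts identical lines on stdin and prints the N most
# frequent ones (default 20), most frequent first.
freq() {
  sort | uniq -c | sort -nr | head -n "${1:-20}"
}

# Example: count file extensions under the current directory
#   find . -type f -name '*.*' | sed 's/.*\.//' | freq 10
```

Because freq only reads standard input, any command that prints one item per line can be placed in front of it.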


Putting It All Together

Run all five commands on a new codebase and you will know:

  1. Which files to read first (the ones that change the most)
  2. Who to ask questions (the top contributors)
  3. Which areas are fragile (bug hotspots that overlap with churn)
  4. Whether the project is healthy (commit velocity trends)
  5. Whether the team ships with confidence (revert/hotfix frequency)

That is more context than you would get from an hour of browsing random files. Start with the high-churn, high-bug files from commands 1 and 3. Read them with the knowledge of who wrote them (command 2) and how the project's pace has changed (command 4). The commit history is the best documentation most projects have.


Related

  • Writing Good Commit Messages covers how to write the kind of commit messages that make these commands useful in the first place.
  • Git Hooks shows how to automate quality checks locally before commits reach the shared history.
  • Reviewing Pull Requests takes the next step after orienting yourself in a codebase.
