Git Commands for Reading New Codebases
When you join a new project or start reviewing an unfamiliar repo, opening random files is a slow way to build context. The commit history already contains a map of the codebase. It tells you which files change constantly, who built what, where bugs keep appearing, and whether the team is shipping with confidence.
These five commands take a couple of minutes to run and tell you exactly where to focus your attention.
1. Find the Most-Changed Files
git log --format=format: --name-only --since="1 year ago" | sort | uniq -c | sort -nr | head -20
This lists the 20 files with the most commits over the past year. These are your churn hotspots, the files where active development is concentrated.
Breaking down the flags:
- `--format=format:` suppresses all commit metadata (hash, author, message). You only want the file paths.
- `--name-only` outputs the filenames affected by each commit, one per line.
- `--since="1 year ago"` limits the search to the last 12 months. You can adjust this window depending on the project's age.
The piped commands do the counting: sort groups identical filenames together, uniq -c counts consecutive duplicates, sort -nr sorts by count in descending order, and head -20 caps the output at 20 results.
Files at the top of this list are where the action is. If a configuration file ranks high, the project probably went through infrastructure changes recently. If a single business logic file dominates, that file may be doing too much and is worth considering for refactoring.
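The name-only listing can include blank separator lines between commits, and those show up in the count as an empty entry. A minimal sketch of a wrapper that drops them before counting; the function name `count_paths` is made up for illustration, and it reads the `--name-only` output on standard input:

```shell
# Hypothetical helper: count how often each path appears on stdin,
# ignoring blank separator lines. The optional argument caps the
# number of results (default 20).
count_paths() {
  grep -v '^$' | sort | uniq -c | sort -nr | head -n "${1:-20}"
}

# Usage:
#   git log --format=format: --name-only --since="1 year ago" | count_paths 10
```

Keeping the counting step in a function also makes it easy to reuse with a different `--since` window.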
2. See Who Built the Project
git shortlog -sn --no-merges --since="6 months ago"
This ranks every contributor by commit count over the last six months.
Breaking down the flags:
- `-s` (summary) collapses each author's commits into a single count instead of listing every message.
- `-n` sorts by number of commits, highest first.
- `--no-merges` excludes merge commits, which would inflate the count for whoever handles pull request merges.
- `--since="6 months ago"` focuses on recent activity, not the full project lifetime.
What to look for: if one person accounts for 60% or more of the commits, the project effectively has a bus factor of one. It depends heavily on a single contributor. If that person is still active, they are your best source of context. If they have since left and nobody picked up the pace, you are looking at a maintenance risk.
Drop the --since flag to see the all-time picture and compare it with the recent window. A healthy project shows knowledge spreading across the team over time, not concentrating.
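To put a number on that 60% threshold, one option is to pipe the shortlog output through awk. This is a rough sketch: the function name `top_share` is made up for illustration, and it assumes the "count, then author name" layout that `git shortlog -sn` prints.

```shell
# Hypothetical helper: reads `git shortlog -sn` output (count, then author name)
# on stdin and prints the top author's share of all commits listed.
top_share() {
  awk '
    { total += $1 }                                            # first field is the commit count
    NR == 1 { top = $1; $1 = ""; sub(/^ +/, ""); name = $0 }   # first line is the top author
    END { if (total > 0) printf "%s: %d of %d commits (%.0f%%)\n", name, top, total, 100 * top / total }
  '
}

# Usage (HEAD is passed explicitly because shortlog reads a log from stdin
# when no revision is given and stdin is not a terminal):
#   git shortlog -sn --no-merges --since="6 months ago" HEAD | top_share
#   git shortlog -sn --no-merges HEAD | top_share
```

Running it once with `--since` and once without gives the recent and all-time shares side by side.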
3. Find Where Bugs Cluster
git log -i -E --grep="fix|bug|broken" --name-only --format='' | sort | uniq -c | sort -nr | head -20
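This filters the commit history down to messages containing "fix", "bug", or "broken", then counts which files appear most often in those commits. The match is on substrings, so messages containing words like "prefix" or "debug" are included too. Treat the counts as a rough signal.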
Breaking down the flags:
- `-i` makes the grep case-insensitive, catching "Fix", "FIX", and "fix" alike.
- `-E` enables extended regular expressions so the `|` (or) operator works without escaping.
- `--grep="fix|bug|broken"` filters commits whose messages match any of those terms.
- `--name-only` and `--format=''` work the same way as in command 1, stripping everything except filenames.
The real insight comes from cross-referencing this list with the churn hotspots from command 1. Files that appear in both lists are your highest-risk code. They keep changing and they keep breaking. That pattern usually means the code was patched repeatedly without anyone addressing the root cause.
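One way to do that cross-referencing is to save both listings to files and print the paths they share. A minimal sketch, assuming each file holds `uniq -c` style lines (count, then path); the function name `in_both` and the file names `churn.txt` and `bugs.txt` are made up for illustration:

```shell
# Hypothetical helper: given two saved `uniq -c` listings (count, then path),
# print the paths that appear in both, in the order of the second file.
in_both() {
  awk '
    { path = $0; sub(/^ *[0-9]+ /, "", path) }          # strip the leading count
    FILENAME == ARGV[1] { seen[path] = 1; next }        # first file: remember each path
    path != "" && (path in seen) { print path }         # second file: print paths seen in the first
  ' "$1" "$2"
}

# Usage:
#   git log --format=format: --name-only --since="1 year ago" | sort | uniq -c | sort -nr | head -20 > churn.txt
#   git log -i -E --grep="fix|bug|broken" --name-only --format='' | sort | uniq -c | sort -nr | head -20 > bugs.txt
#   in_both churn.txt bugs.txt
```

Because the output follows the second file's order, passing the bug listing second ranks the overlap by bug-fix count.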
You can extend the grep pattern to include terms your team uses. If the project follows Conventional Commits, try --grep="^fix" to match only commits typed as fixes.
4. Check if the Project Is Accelerating or Dying
git log --format='%ad' --date=format:'%Y-%m' | sort | uniq -c
This gives you commit counts grouped by month for the entire history of the repository.
Breaking down the flags:
- `--format='%ad'` outputs only the author date for each commit. The `%ad` placeholder is one of many format specifiers `git log` supports.
- `--date=format:'%Y-%m'` formats the date as year-month (e.g., `2026-04`), which groups commits by calendar month when piped through `sort | uniq -c`.
The output is a simple time series. Look for trends. A steady or growing count means active development. A sharp drop-off could mean the project shipped, the team shrank, or people moved on. A sudden spike often lines up with a deadline or a major rewrite.
This also helps calibrate your expectations. If the project averages 200 commits a month, a 20-file change is routine. If it averages 10 commits a month, that same change is a major event.
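To get that monthly average without eyeballing the list, the time series can be summed with awk. A small sketch, assuming input lines of the form "count YYYY-MM" as printed by the command above; the function name `monthly_average` is made up for illustration. Months with no commits never appear in the input, so the figure is an average over active months only.

```shell
# Hypothetical helper: reads "count YYYY-MM" lines on stdin and prints the
# total, the number of months that had commits, and the mean per active month.
monthly_average() {
  awk '
    { total += $1; months++ }
    END { if (months > 0) printf "%d commits over %d active months, %.1f per month\n", total, months, total / months }
  '
}

# Usage:
#   git log --format='%ad' --date=format:'%Y-%m' | sort | uniq -c | monthly_average
```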
5. Measure How Often the Team Is Firefighting
git log --oneline --since="1 year ago" | grep -iE 'revert|hotfix|emergency|rollback'
This searches the last year of commit messages for signs of reactive work: reverts, hotfixes, emergency patches, and rollbacks.
Breaking down the flags:
- `--oneline` condenses each commit to a single line (short hash + message), making it easy to scan and pipe.
- `--since="1 year ago"` limits the window.
The grep -iE part is not a git flag but a standard Unix filter:
- `-i` makes the search case-insensitive.
- `-E` enables extended regex for the `|` operator.
- `'revert|hotfix|emergency|rollback'` matches any of those keywords.
If you see reverts every couple of weeks, the team probably does not trust its deploy process. That usually points to unreliable tests, a missing staging environment, or a broken deploy pipeline. Occasional reverts are normal and healthy. Frequent reverts are a process problem, not a code problem.
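To judge how frequent "frequent" is, it can help to bucket the matches by month. A minimal sketch that combines the date format from command 4 with the keyword filter above; the function name `firefights_per_month` is made up for illustration, and it expects "YYYY-MM subject" lines on standard input:

```shell
# Hypothetical helper: reads "YYYY-MM subject" lines on stdin and prints,
# per month, how many subjects mention reactive work.
firefights_per_month() {
  grep -iE 'revert|hotfix|emergency|rollback' | cut -d' ' -f1 | sort | uniq -c
}

# Usage:
#   git log --format='%ad %s' --date=format:'%Y-%m' --since="1 year ago" | firefights_per_month
```

Months with no matching commits are simply absent from the output.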
The Common Patterns
These five commands share a few building blocks worth understanding on their own.
git log format placeholders
The `--format` flag (an alias for `--pretty`) accepts placeholders for any piece of commit metadata:
| Placeholder | Output |
|---|---|
| `%H` | Full commit hash |
| `%h` | Short commit hash |
| `%an` | Author name |
| `%ae` | Author email |
| `%ad` | Author date (respects `--date`) |
| `%s` | Subject (first line of message) |
Setting --format=format: or --format='' outputs nothing, which is useful when you combine it with --name-only and only want file paths.
git log filtering
You can narrow git log output with several filters:
- `--since` and `--until` for date ranges
- `--author="name"` for a specific contributor
- `--grep="pattern"` for commit message text
- `-- path/to/file` for changes to a specific file or directory
These filters compose. You can combine --author, --since, and --grep in a single command to answer very specific questions like "what did Alice fix in the payments module last quarter?"
git log --author="Alice" --grep="fix" --since="3 months ago" -- src/payments/
git shortlog vs git log
git shortlog is a convenience wrapper around git log designed for summarizing contributions. The -sn flags are the most common combination, giving you a sorted leaderboard of commit counts. Without -s, it groups commit messages under each author's name, which is useful for generating changelogs.
The Unix pipeline
All of the commands above except `git shortlog` pipe git output through standard Unix tools. The counting commands share the same four stages:
- `sort` orders lines alphabetically (needed before `uniq`)
- `uniq -c` collapses adjacent identical lines and prefixes each with a count
- `sort -nr` sorts numerically (`-n`) in reverse order (`-r`)
- `head -20` takes the top 20 results
This pattern of sort | uniq -c | sort -nr | head -N is a general-purpose frequency counter. It works on any line-oriented output, not just git.
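The reason `sort` has to come first is easiest to see on a tiny input that has nothing to do with git. A quick sketch using a made-up three-line sample:

```shell
# A made-up input where "b" appears twice, but not on adjacent lines.
sample() { printf 'b\na\nb\n'; }

# Without sort, uniq only merges adjacent duplicates, so "b" is listed twice.
sample | uniq -c

# With sort first, the two "b" lines become adjacent and collapse into one count.
sample | sort | uniq -c
```

The first pipeline prints three lines and the second prints two, with "b" counted twice.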
Putting It All Together
Run all five commands on a new codebase and you will know:
- Which files to read first (the ones that change the most)
- Who to ask questions (the top contributors)
- Which areas are fragile (bug hotspots that overlap with churn)
- Whether the project is healthy (commit velocity trends)
- Whether the team ships with confidence (revert/hotfix frequency)
That is more context than you would get from an hour of browsing random files. Start with the high-churn, high-bug files from commands 1 and 3. Read them with the knowledge of who wrote them (command 2) and how the project's pace has changed (command 4). The commit history is the best documentation most projects have.
Related
- Writing Good Commit Messages covers how to write the kind of commit messages that make these commands useful in the first place.
- Git Hooks shows how to automate quality checks locally before commits reach the shared history.
- Reviewing Pull Requests takes the next step after orienting yourself in a codebase.