100 commits a day does not prove a strong engineer.

If you are a recruiter, raw commit volume does not tell you whether the work is genuine. Use this ML-based tool to inspect commit quality, file mix, and implementation depth.

Looks past activity spam

Discounts tiny diffs, docs-only churn, generated files, and repetitive noise.

Checks real engineering signal

Rewards source-heavy work, test coverage, multi-file implementation, and stronger commit structure.

Turns history into a readout

Summarizes the meaningful-commit ratio, padding risk, strongest signal, weakest signal, and top commits.

∞ repo-aware scoring

A→F report grade

40 commits sampled
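The readout described above could be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the tool's code. Only the 40-commit sample and the A→F scale come from this page. The grade cutoffs, the rule that `medium` and `high` labels count as meaningful, and the definition of padding risk as the share of commits labeled `noise` are all hypothetical, as are the names `ScoredCommit`, `letter_grade`, and `readout`.

```python
from dataclasses import dataclass

# Hypothetical cutoffs on a 0-100 average score; the real thresholds are not published.
GRADE_CUTOFFS = [(85.0, "A"), (70.0, "B"), (55.0, "C"), (40.0, "D")]
# Assumption: commits labeled medium or high count as meaningful.
MEANINGFUL_LABELS = {"medium", "high"}


@dataclass
class ScoredCommit:
    sha: str
    label: str    # "noise", "low", "medium", or "high"
    score: float  # final weighted commit score, assumed to be 0-100


def letter_grade(avg_score: float) -> str:
    """Map an average commit score to A-F using the hypothetical cutoffs."""
    for cutoff, grade in GRADE_CUTOFFS:
        if avg_score >= cutoff:
            return grade
    return "F"


def readout(commits: list[ScoredCommit], sample_size: int = 40, top_n: int = 3) -> dict:
    """Summarize the most recent `sample_size` commits (assumed newest first)."""
    sample = commits[:sample_size]
    if not sample:
        return {"grade": None, "meaningful_ratio": 0.0,
                "padding_risk": 0.0, "top_commits": []}
    n = len(sample)
    meaningful = sum(c.label in MEANINGFUL_LABELS for c in sample)
    noise = sum(c.label == "noise" for c in sample)
    avg = sum(c.score for c in sample) / n
    top = sorted(sample, key=lambda c: c.score, reverse=True)[:top_n]
    return {
        "grade": letter_grade(avg),
        "meaningful_ratio": meaningful / n,
        # Hypothetical definition: share of sampled commits labeled noise.
        "padding_risk": noise / n,
        "top_commits": [c.sha for c in top],
    }
```

The strongest-signal and weakest-signal fields are left out. They would need per-feature contributions that this sketch does not model.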


Backend

How the scoring pipeline works

A GitGrade product already exists, but this project is my own attempt to rebuild the idea from scratch. Instead of treating commit count as proof of skill, this version analyzes commit structure, uses an ML classifier to label each commit's likely signal, and blends that prediction with rule-based impact scoring.

Training corpus

2,234 labeled commits in the current merged training set, combining open-source examples with local and user-specific history.

Manual review

332 manually reviewed label overrides, currently available to correct weak supervision and sharpen borderline classifications.

User history

1,074 user-history examples, tracked separately for error analysis, review queues, and future personalization work.

Feature space

47 engineered features extracted from commit structure, file mix, message patterns, change size, and source-vs-non-source ratios.
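The training-corpus figures above imply a merge step in which manually reviewed labels take precedence over weakly supervised ones. Below is a minimal Python sketch of that precedence rule. It assumes labels are keyed by commit SHA. `merge_labels` and the dictionary layout are hypothetical and do not describe the project's actual data format.

```python
VALID_LABELS = {"noise", "low", "medium", "high"}


def merge_labels(weak: dict[str, str], overrides: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Apply manual overrides on top of weak labels.

    Returns the merged labels and the SHAs whose label the review actually changed.
    """
    unknown = set(overrides.values()) - VALID_LABELS
    if unknown:
        raise ValueError(f"unknown labels in overrides: {sorted(unknown)}")
    merged = dict(weak)
    changed = []
    for sha, label in overrides.items():
        if merged.get(sha) != label:
            changed.append(sha)
        merged[sha] = label  # manual review always wins
    return merged, changed
```

Returning the changed SHAs separately makes it easy to measure how often weak supervision disagrees with review. That kind of error analysis is what the separately tracked examples are for.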

Pipeline steps

Ingest

Load recent commits from GitHub App-authorized repositories and normalize file-level change statistics.
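Below is a sketch of the normalization step. It assumes each commit arrives in the shape returned by GitHub's REST "get a commit" endpoint: `sha`, `commit.message`, and a `files` array whose entries carry `filename`, `status`, `additions`, and `deletions`. The dataclasses and `normalize_commit` are hypothetical names. The network fetch and the GitHub App authentication are left out.

```python
from dataclasses import dataclass, field


@dataclass
class FileChange:
    path: str
    status: str      # e.g. "added", "modified", "removed", "renamed"
    additions: int
    deletions: int


@dataclass
class NormalizedCommit:
    sha: str
    message: str
    files: list[FileChange] = field(default_factory=list)

    @property
    def lines_changed(self) -> int:
        return sum(f.additions + f.deletions for f in self.files)


def normalize_commit(payload: dict) -> NormalizedCommit:
    """Flatten one GitHub "get a commit" JSON payload into per-file stats."""
    files = [
        FileChange(
            path=f["filename"],
            status=f.get("status", "modified"),
            additions=int(f.get("additions", 0)),
            deletions=int(f.get("deletions", 0)),
        )
        for f in payload.get("files", [])
    ]
    return NormalizedCommit(
        sha=payload["sha"],
        message=payload["commit"]["message"].strip(),
        files=files,
    )
```

The list-commits endpoint does not include per-file stats, so a pipeline like this would need one "get a commit" request per sampled commit.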

Feature engineering

Build 47 model features including file ratios, message-type cues, tiny-diff flags, and source/test pair signals.
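A handful of the 47 features could be computed as in the Python sketch below. The extension lists, the tiny-diff threshold of 5 lines, the test-path heuristic, and the Conventional Commits-style prefix regex are assumptions for illustration. The page does not publish the actual feature definitions.

```python
import re
from pathlib import PurePosixPath

# Hypothetical extension lists and threshold; the project's real values are not published.
SOURCE_EXTS = {".py", ".ts", ".js", ".go", ".rs", ".java", ".c", ".cpp"}
DOC_EXTS = {".md", ".rst", ".txt"}
TINY_DIFF_LINES = 5
MSG_TYPES = ("feat", "fix", "refactor", "test", "docs", "chore")
# Conventional Commits-style prefix, e.g. "feat(parser): ..." or "fix!: ..."
TYPE_RE = re.compile(r"^(" + "|".join(MSG_TYPES) + r")(\([^)]*\))?!?:", re.IGNORECASE)


def is_test(path: str) -> bool:
    p = PurePosixPath(path)
    return ("tests" in p.parts or p.name.startswith("test_")
            or p.stem.endswith("_test") or ".test." in p.name or ".spec." in p.name)


def extract_features(message: str, files: list[tuple[str, int, int]]) -> dict[str, float]:
    """Compute a few illustrative features from (path, additions, deletions) tuples."""
    n = len(files) or 1  # avoid division by zero on empty commits
    exts = [PurePosixPath(path).suffix.lower() for path, _, _ in files]
    tests = [is_test(path) for path, _, _ in files]
    source = [ext in SOURCE_EXTS and not t for ext, t in zip(exts, tests)]
    lines = sum(a + d for _, a, d in files)
    match = TYPE_RE.match(message)
    msg_type = match.group(1).lower() if match else None
    features = {
        "file_count": float(len(files)),
        "source_ratio": sum(source) / n,
        "test_ratio": sum(tests) / n,
        "docs_only": float(bool(files) and all(e in DOC_EXTS for e in exts)),
        "tiny_diff": float(lines <= TINY_DIFF_LINES),
        "source_test_pair": float(any(source) and any(tests)),
    }
    # One-hot encode the commit-message type so the vector has a fixed width.
    features.update({f"msg_{t}": float(msg_type == t) for t in MSG_TYPES})
    return features
```

Encoding everything as floats with fixed keys keeps the output usable as a row in a tree-based classifier's feature matrix.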

ML prediction

Run a Random Forest classifier with 200 trees and max depth 8 to predict noise, low, medium, or high value.
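The page does not say which library trains the model. If it is scikit-learn, the configuration would be `RandomForestClassifier(n_estimators=200, max_depth=8)`, whose `predict_proba` averages the class probabilities of the individual trees. The stdlib sketch below mirrors only that final soft-voting step. It assumes each tree has already produced a probability for each label. `forest_predict` is a hypothetical name.

```python
LABELS = ("noise", "low", "medium", "high")


def forest_predict(tree_probas: list[dict[str, float]]) -> tuple[str, float]:
    """Soft-vote across trees: average each label's probability, then take the argmax.

    Returns (label, confidence), where confidence is the averaged probability
    of the winning label. Ties go to the label listed first in LABELS.
    """
    if not tree_probas:
        raise ValueError("need at least one tree's probabilities")
    avg = {
        lbl: sum(p.get(lbl, 0.0) for p in tree_probas) / len(tree_probas)
        for lbl in LABELS
    }
    best = max(LABELS, key=lambda lbl: avg[lbl])
    return best, avg[best]
```

The averaged probability of the winning label is a natural candidate for the "model confidence" input to the hybrid score. The page does not confirm that this is how confidence is defined.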

Hybrid scoring

Blend deterministic impact heuristics, label weights, and model confidence into one final weighted commit score.

Scoring blend

Deterministic impact: 55%

Predicted label weight: 35%

Model confidence: 10%
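The blend can be written as a weighted sum, sketched in Python below. The 55/35/10 weights come from this page. The label-to-weight mapping, the [0, 1] input ranges, and the 0 to 100 output scale are assumptions, and `blended_score` is a hypothetical name.

```python
# Weights from the scoring blend above.
WEIGHTS = {"impact": 0.55, "label": 0.35, "confidence": 0.10}
# Hypothetical mapping from predicted label to a 0-1 weight.
LABEL_WEIGHTS = {"noise": 0.0, "low": 0.25, "medium": 0.6, "high": 1.0}


def blended_score(impact: float, label: str, confidence: float) -> float:
    """Combine the three signals into a 0-100 commit score.

    `impact` and `confidence` are assumed to already be scaled to [0, 1].
    """
    for name, value in (("impact", impact), ("confidence", confidence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    raw = (
        WEIGHTS["impact"] * impact
        + WEIGHTS["label"] * LABEL_WEIGHTS[label]
        + WEIGHTS["confidence"] * confidence
    )
    return round(100 * raw, 1)
```

With a plain additive confidence term, a commit confidently predicted as `noise` still earns up to 10 points. Scaling confidence by the label weight would avoid that, but nothing on this page says which form the project uses.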

The deterministic pass rewards source-heavy, multi-file, test-backed implementation work and discounts tiny diffs, docs-only edits, generated files, and non-code churn before the classifier adjusts the final result.
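A rule-based impact pass along these lines could look like the Python sketch below. It shows the shape of the heuristics this paragraph lists: source share, multi-file breadth, test presence, capped change size, a tiny-diff discount, and special handling for docs-only and generated files. Every weight, cap, extension list, and generated-file marker in it is a hypothetical placeholder, not one of the project's real values.

```python
from pathlib import PurePosixPath

# All values below are hypothetical placeholders.
SOURCE_EXTS = {".py", ".ts", ".js", ".go", ".rs", ".java", ".c", ".cpp"}
DOC_EXTS = {".md", ".rst", ".txt"}
GENERATED_SUFFIXES = ("package-lock.json", "yarn.lock", "poetry.lock", ".min.js", "_pb2.py")
TINY_DIFF_LINES = 5
SIZE_CAP = 200  # lines beyond this earn no extra credit


def is_generated(path: str) -> bool:
    return PurePosixPath(path).name.endswith(GENERATED_SUFFIXES)


def is_test(path: str) -> bool:
    p = PurePosixPath(path)
    return "tests" in p.parts or p.name.startswith("test_")


def deterministic_impact(files: list[tuple[str, int, int]]) -> float:
    """Return a 0-1 impact score from (path, additions, deletions) tuples."""
    counted = [(p, a, d) for p, a, d in files if not is_generated(p)]
    if not counted:
        return 0.0  # generated files only: no credit
    exts = [PurePosixPath(p).suffix.lower() for p, _, _ in counted]
    source_files = sum(e in SOURCE_EXTS for e in exts)  # test files count as source here
    if source_files == 0:
        # Docs-only edits get a small floor; other non-code churn gets less.
        return 0.1 if all(e in DOC_EXTS for e in exts) else 0.05
    lines = sum(a + d for _, a, d in counted)
    score = 0.4 * source_files / len(counted)               # source-heavy
    score += 0.2 * min(source_files, 5) / 5                 # multi-file breadth
    score += 0.2 if any(is_test(p) for p, _, _ in counted) else 0.0  # test-backed
    score += 0.2 * min(lines, SIZE_CAP) / SIZE_CAP          # change size, capped
    if lines <= TINY_DIFF_LINES:
        score *= 0.3                                        # tiny-diff discount
    return round(min(score, 1.0), 3)
```

Generated files are filtered out before any ratio is computed. As a result, a lockfile bump bundled with real work neither inflates the size credit nor dilutes the source share.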