Recruiter View
Raw commit volume does not tell a recruiter whether the work is genuine. Use this ML-based tool to inspect commit quality, file mix, and implementation depth.
Report
Connect GitHub and select repositories, or analyze a single repo directly.
Commit mix
No data yet.
File weight
No data yet.
Top commits
High-signal commits appear here after analysis.
Backend
A GitGrade product already exists, but this project is my own attempt to rebuild the idea from scratch. Instead of treating commit count as proof of skill, this version analyzes commit structure, uses an ML classifier to label each commit's likely signal, and blends that label with rule-based impact scoring.
Training corpus
Labeled commits in the current merged training set, combining open-source examples with local and user-specific history.
Manual review
Manually reviewed label overrides currently available to correct weak supervision and sharpen borderline classifications.
User history
User-history examples tracked separately for error analysis, review queues, and future personalization work.
Feature space
Engineered features extracted from commit structure, file mix, message patterns, change size, and source-vs-non-source ratios.
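A minimal sketch of how a handful of such features might be computed from file-level change statistics. The `FileChange` type, the extension set, the message prefixes, and the 10-line tiny-diff threshold are all illustrative assumptions, not the project's actual definitions, and only a small subset of the feature space is shown.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

# Hypothetical values; the real project may classify files differently.
SOURCE_EXTS = {".py", ".ts", ".go", ".rs", ".java"}
MESSAGE_TYPES = ("feat", "fix", "refactor", "docs", "chore", "test")
TINY_DIFF_LINES = 10  # assumed threshold for the tiny-diff flag


@dataclass
class FileChange:
    path: str
    additions: int
    deletions: int


def is_test(path: str) -> bool:
    """Treat files named test_* or living under a tests/ directory as tests."""
    name = PurePosixPath(path).name
    return name.startswith("test_") or "/tests/" in f"/{path}"


def is_source(path: str) -> bool:
    """Source means a known code extension that is not a test file."""
    return PurePosixPath(path).suffix in SOURCE_EXTS and not is_test(path)


def extract_features(message: str, files: list[FileChange]) -> dict[str, float]:
    total = sum(f.additions + f.deletions for f in files)
    source = sum(f.additions + f.deletions for f in files if is_source(f.path))
    has_source = any(is_source(f.path) for f in files)
    has_test = any(is_test(f.path) for f in files)

    features = {
        "files_changed": float(len(files)),
        "lines_changed": float(total),
        "source_ratio": source / total if total else 0.0,
        "is_tiny_diff": float(total <= TINY_DIFF_LINES),
        "has_source_test_pair": float(has_source and has_test),
    }

    # Message-type cue: "feat: ..." or "fix(api): ..." style prefixes.
    prefix = message.split(":", 1)[0].split("(", 1)[0].strip().lower()
    for kind in MESSAGE_TYPES:
        features[f"msg_{kind}"] = float(prefix == kind)
    return features
```

Returning a flat dictionary of floats keeps the output easy to turn into a fixed-order feature vector for a classifier.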
Pipeline steps
Load recent commits from GitHub App-authorized repositories and normalize file-level change statistics.
Build 47 model features including file ratios, message-type cues, tiny-diff flags, and source/test pair signals.
Run a Random Forest classifier with 200 trees and max depth 8 to classify each commit as noise, low, medium, or high value.
Blend deterministic impact heuristics, label weights, and model confidence into one final weighted commit score.
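The final blending step could look roughly like the sketch below. The text does not give the actual formula, so the label weights, the neutral fallback weight, and the `blend_score` function are assumptions made for illustration. The `probabilities` argument stands in for the classifier's per-class output; if the model were built with scikit-learn, the configuration described above would correspond to `RandomForestClassifier(n_estimators=200, max_depth=8)`, though the source does not name the library.

```python
# Assumed label weights; the project's real values are not stated.
LABEL_WEIGHTS = {"noise": 0.0, "low": 0.4, "medium": 0.7, "high": 1.0}
NEUTRAL_WEIGHT = 0.5  # assumed fallback used when the model is unsure


def blend_score(impact: float, probabilities: dict[str, float]) -> float:
    """Combine a deterministic impact score with classifier output.

    `impact` is the rule-based score for the commit; `probabilities`
    maps each label to the model's predicted probability.
    """
    label = max(probabilities, key=probabilities.get)
    confidence = probabilities[label]
    # Trust the predicted label's weight in proportion to model confidence,
    # and fall back toward a neutral weight as confidence drops.
    weight = confidence * LABEL_WEIGHTS[label] + (1.0 - confidence) * NEUTRAL_WEIGHT
    return impact * weight
```

With this shape, a commit confidently labeled as noise scores near zero regardless of its size, while a low-confidence prediction leaves the deterministic score closer to its neutral value.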
Scoring blend
The deterministic pass rewards source-heavy, multi-file, test-backed implementation work. It discounts tiny diffs, docs-only edits, generated files, and non-code churn. The classifier then adjusts that result to produce the final score.
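One way the deterministic pass might be expressed is sketched below. The `CommitStats` fields and every constant (discount, bonuses, tiny-diff threshold) are hypothetical values chosen to illustrate the rewards and discounts described above, not the project's tuned numbers.

```python
from dataclasses import dataclass

# All constants below are assumptions for illustration.
OTHER_WEIGHT = 0.2       # docs, config, and other non-code churn count at a discount
MULTI_FILE_BONUS = 1.2   # applied when two or more source files change
TEST_BONUS = 1.3         # applied when source changes ship with tests
TINY_DIFF_LINES = 10     # at or below this many changed lines, the diff is "tiny"
TINY_PENALTY = 0.5


@dataclass
class CommitStats:
    source_lines: int     # changed lines in source files
    other_lines: int      # changed lines in docs, config, assets
    generated_lines: int  # changed lines in lockfiles or build output
    source_files: int     # number of source files touched
    has_tests: bool       # whether the commit also touches tests


def impact_score(stats: CommitStats) -> float:
    total = stats.source_lines + stats.other_lines + stats.generated_lines
    if total == 0:
        return 0.0
    # Source lines count in full, other lines at a discount,
    # and generated lines contribute nothing.
    score = stats.source_lines + OTHER_WEIGHT * stats.other_lines
    if stats.source_files >= 2:
        score *= MULTI_FILE_BONUS
    if stats.has_tests and stats.source_lines > 0:
        score *= TEST_BONUS
    if total <= TINY_DIFF_LINES:
        score *= TINY_PENALTY
    return score
```

Under these assumed constants, a large lockfile regeneration scores zero, and a docs-only edit scores well below a multi-file, test-backed source change of similar size.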