Enhancing Git Documentation: A Data Model and Reader-Driven Improvements
Introduction: A Fresh Approach to Git Documentation
This past fall, I decided to invest time in improving Git's documentation. I've often written blog posts or zines to clarify confusing aspects of open-source projects, but I wondered whether I could contribute directly to the official docs instead. Working with Marie, I made several improvements. This article outlines that work, focusing on a new data model document and evidence-based updates to key manual pages.
A Comprehensive Data Model for Git
During our documentation work, we noticed that Git frequently uses the terms “object,” “reference,” and “index,” but lacks a clear explanation of how these relate to core concepts like “commit” and “branch.” To address this, we wrote a new “data model” document. It's currently available for preview, and after the next release it will likely appear on the official Git website.
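To make the relationship between these terms concrete, here is a minimal Python sketch of the idea. It is not taken from the data model document. It assumes a repository using SHA-1, Git's long-standing default, and the names `git_object_id`, `store`, `objects`, and `refs` are hypothetical stand-ins for illustration, not Git's real storage code. It shows that an object is content addressed by a hash, that a commit is an object pointing at a tree, and that a branch is simply a reference holding a commit ID.

```python
import hashlib

def git_object_id(obj_type: str, content: bytes) -> str:
    """Hash '<type> <size>\\0<content>', which is how Git derives an object ID (SHA-1 assumed)."""
    header = f"{obj_type} {len(content)}".encode() + b"\x00"
    return hashlib.sha1(header + content).hexdigest()

# Toy in-memory stand-ins (hypothetical; real Git keeps these under .git/)
objects = {}  # object ID -> (type, raw content)
refs = {}     # reference name -> commit ID

def store(obj_type: str, content: bytes) -> str:
    """Save an object in the toy store and return its ID."""
    oid = git_object_id(obj_type, content)
    objects[oid] = (obj_type, content)
    return oid

# A blob holds file contents only, with no filename.
blob_id = store("blob", b"hello world\n")
print(blob_id)  # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad

# A tree maps names to object IDs: "<mode> <name>\0" followed by the raw 20-byte ID.
tree_id = store("tree", b"100644 hello.txt\x00" + bytes.fromhex(blob_id))

# A commit points at a tree and carries metadata (author details here are made up).
commit_text = (
    f"tree {tree_id}\n"
    "author A U Thor <author@example.com> 0 +0000\n"
    "committer A U Thor <author@example.com> 0 +0000\n"
    "\n"
    "Initial commit\n"
)
commit_id = store("commit", commit_text.encode())

# A branch is a reference: a name that holds a commit ID.
refs["refs/heads/main"] = commit_id
```

The blob ID printed above should match what `git hash-object` reports for a file containing `hello world` followed by a newline, which is a quick way to check the sketch against a real Git installation.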
Why This Matters
Understanding how Git organizes its data, and in particular how commits and branches are stored, has always helped me reason about Git's behavior. The new document gives a short (1,600 words!) but accurate overview. Getting it accurate was the hard part: I knew the basics, but the review process taught me details I hadn't known, such as how merge conflicts are stored in the staging area. The final version reflects what I learned.
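As an illustration of the merge conflict detail, here is a small Python sketch of how the staging area (the index) can be thought of during a conflict. It is a toy model rather than Git's actual index format: the `Index` type, the helper names, and the blob IDs such as `"blob-base"` are all hypothetical. What it relies on is that Git records each index entry with a stage number, where stage 0 is a normal entry and a conflicted path instead has entries at stage 1 (common ancestor), stage 2 (our side), and stage 3 (their side).

```python
from typing import Dict, List, Tuple

# Stage numbers as used in Git's index
STAGE_MERGED, STAGE_BASE, STAGE_OURS, STAGE_THEIRS = 0, 1, 2, 3

# Toy index: (path, stage) -> blob ID. Real entries also carry mode, timestamps, etc.
Index = Dict[Tuple[str, int], str]

def conflicted_paths(index: Index) -> List[str]:
    """Return the paths that have any entry at a non-zero stage."""
    return sorted({path for (path, stage) in index if stage != STAGE_MERGED})

def resolve(index: Index, path: str, blob_id: str) -> None:
    """Replace the conflict entries for a path with one stage-0 entry."""
    for stage in (STAGE_BASE, STAGE_OURS, STAGE_THEIRS):
        index.pop((path, stage), None)
    index[(path, STAGE_MERGED)] = blob_id

# Hypothetical state in the middle of a merge: notes.txt conflicts, README.md does not.
index: Index = {
    ("README.md", STAGE_MERGED): "blob-readme",
    ("notes.txt", STAGE_BASE): "blob-base",
    ("notes.txt", STAGE_OURS): "blob-ours",
    ("notes.txt", STAGE_THEIRS): "blob-theirs",
}

print(conflicted_paths(index))  # ['notes.txt']

# Staging a resolved version collapses the three entries into one.
resolve(index, "notes.txt", "blob-merged")
print(conflicted_paths(index))  # []
```

In a real repository, `git ls-files --stage` prints the stage number for each index entry, so the same picture can be observed directly during a conflicted merge.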
Updates to git push, git pull, and More
In addition to the data model, I worked on improving the introductions to several core manual pages. Early on, I realized that simply rewriting them based on my own judgment wouldn't convince maintainers that the changes were superior. A common problem in open-source documentation discussions is that two experts argue about clarity, but experts are notoriously poor judges of what non-experts find confusing. I needed an evidence-based approach.
Gathering Feedback from Test Readers
I turned to Mastodon and asked volunteers to read the current documentation and note confusing parts or questions they had. About 80 test readers responded, providing invaluable feedback. They highlighted:
- Unclear terminology – e.g., “What is a pathspec? What does 'reference' mean? Does 'upstream' have a specific meaning in Git?”
- Confusing sentences – specific phrasings that led to misinterpretation.
- Suggestions for additions – “I use feature X all the time; it should be explained here.”
This reader-driven approach allowed me to identify real pain points rather than relying on assumptions. The feedback was incorporated into revisions of the man pages for git push, git pull, and other commands.
Conclusion: Open Source Documentation That Works
Our project suggests that pairing a clear conceptual model (the data model) with evidence-based revisions (via test readers) can make official documentation noticeably easier to follow. I hope this encourages others to contribute to Git's docs, or to any open-source project, by focusing on what users actually find confusing. The data model and the test reader experiments can serve as templates for future efforts.