Git Query Language: Unleashing the Power of SQL in Your .git Files!
How I Developed a SQL-Inspired Language for Executing Queries on Local Git Repositories
Greetings, dear readers! As a software engineer with an affinity for low-level programming, compilers, and tool development, I recently set out to learn the Rust programming language. But I didn't stop there! I decided to put that new knowledge to work and build a Git client focused on simplicity and productivity. And boy, did it turn out to be quite the adventure!
Now, we all love the analysis page on GitHub, right? It shows how many commits each developer has made and how many lines they've inserted or deleted, which is fascinating. But what if you could take this analysis a step further? What if you could query your Git repository for specific information and organize the results however you like? Sounds intriguing, doesn't it?
This got me thinking: "Why settle for a custom sorting option when I can make it dynamic and SQL-like?" And so the idea was born. I imagined running queries on my local .git files, just as you would against a traditional SQL database. Picture executing a query like this on your local Git repositories:
SELECT name, COUNT(name) AS commit_num
FROM commits
GROUP BY name
ORDER BY commit_num DESC
LIMIT 10
And guess what? I brought this idea to life with a project I call GQL (Git Query Language). In this article, I'm going to take you through the design and implementation of GQL. So buckle up and let's dive right in!
From SQLite Struggles to Creating a Query Language from Scratch
At the start, I considered using SQLite for the task, but I hit two roadblocks: I wanted to customize the syntax, and I didn't want to copy the contents of .git into a separate database first. I wanted everything to run on the fly. That's when inspiration struck: since I had built compilers before, why not take a leap of faith and build a SQL-like language from scratch? And so the journey began!
Abstract Syntax Trees and Type Checking: The Building Blocks of GQL
To kick things off, I decided to support only the SELECT command at first, leaving advanced features like aggregations and joins for later. My plan was to parse the query into an Abstract Syntax Tree (AST), which would make it easy to validate (type checking and helpful error messages) and then evaluate. With the validated tree in hand, I could apply the query to my .git files. Sounds like a solid plan, doesn't it?
Choosing the Right Data Structure: In the world of compilers, ASTs are the go-to data structure. They are flexible, easy to traverse, and let you compose nodes within nodes. For GQL, the tree only needed to keep the information required by the later steps (hence the name, Abstract). It was a match made in heaven!
Validation and Type Checking: With the AST in place, the most critical aspect was type checking. Each value used in the query had to be valid and in its rightful place. For instance, what happens if someone attempts to multiply text by more text? Clearly, that won't fly! With type checks in place, I could catch these mistakes before evaluation and provide informative error messages to guide users in the right direction. It's all about making our users' lives easier!
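To make the "text multiplied by text" case concrete, here is a minimal sketch of such a check in Rust. The `DataType`, `Expr`, and `check` names are hypothetical and made up for this illustration; they are not taken from GQL's source, which may model types and expressions differently.

```rust
// Hypothetical, simplified types for illustration; not GQL's real internals.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Integer,
    Text,
    Boolean,
}

#[derive(Debug)]
enum Expr {
    Integer(i64),
    Text(String),
    Multiply(Box<Expr>, Box<Expr>),
    Like(Box<Expr>, Box<Expr>),
}

// Returns the type an expression evaluates to, or a readable error message.
fn check(expr: &Expr) -> Result<DataType, String> {
    match expr {
        Expr::Integer(_) => Ok(DataType::Integer),
        Expr::Text(_) => Ok(DataType::Text),
        // Multiplication is only defined for two integers in this sketch.
        Expr::Multiply(lhs, rhs) => match (check(lhs)?, check(rhs)?) {
            (DataType::Integer, DataType::Integer) => Ok(DataType::Integer),
            (l, r) => Err(format!(
                "cannot multiply {:?} by {:?}; both sides must be Integer",
                l, r
            )),
        },
        // LIKE compares text against a text pattern and yields a boolean.
        Expr::Like(lhs, rhs) => match (check(lhs)?, check(rhs)?) {
            (DataType::Text, DataType::Text) => Ok(DataType::Boolean),
            (l, r) => Err(format!(
                "LIKE expects Text on both sides, found {:?} and {:?}",
                l, r
            )),
        },
    }
}
```

Calling `check` on a `Multiply` node whose operands are both `Text` returns an `Err` with a message naming the offending types, while the same node with two `Integer` operands returns `Ok(DataType::Integer)`. Because the check recurses through the tree, a type error in a nested expression surfaces before any evaluation happens.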
Creating Schema Representations and Error Detection: In addition to verifying operators and their operands, I had to ensure that each identifier was used appropriately. Whether it was a table, field, alias, or function name, everything needed to be defined before it was used. No room for undefined names! Furthermore, since the branches table only has a handful of fields, a query that asks it for any other field has to be rejected. I set up a table describing every table and its fields, and used it during type checking. If things didn't add up, GQL reports an error that says exactly what went wrong.
The Astounding Evaluation Process: With a validated AST, it was time to dive into evaluation. We traverse the syntax tree and evaluate each node along the way, ending up with the list of results. Let's walk through the evaluation process step by step, using an example query:
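A schema table of this kind could be sketched as a map from table names to their fields. This is only an illustration: `FieldType`, `build_schema`, and `resolve_field` are hypothetical names, and the field lists are limited to fields that appear in the sample queries in this article, so the real GQL schema is larger.

```rust
use std::collections::HashMap;

// Hypothetical field types for illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
enum FieldType {
    Text,
    Integer,
    Boolean,
}

type Schema = HashMap<&'static str, HashMap<&'static str, FieldType>>;

// Maps each table name to its fields and their types.
// Only a few fields are listed; this is not GQL's full schema.
fn build_schema() -> Schema {
    let mut schema = HashMap::new();
    schema.insert(
        "branches",
        HashMap::from([
            ("name", FieldType::Text),
            ("commit_count", FieldType::Integer),
            ("is_head", FieldType::Boolean),
        ]),
    );
    schema.insert(
        "commits",
        HashMap::from([("name", FieldType::Text), ("email", FieldType::Text)]),
    );
    schema
}

// Resolves a field of a table to its type, or explains what is undefined.
fn resolve_field(schema: &Schema, table: &str, field: &str) -> Result<FieldType, String> {
    let fields = schema
        .get(table)
        .ok_or_else(|| format!("unknown table `{}`", table))?;
    fields
        .get(field)
        .copied()
        .ok_or_else(|| format!("table `{}` has no field `{}`", table, field))
}
```

With this in place, `resolve_field(&schema, "branches", "commit_count")` succeeds, while asking for a field the table does not define, or for a table that does not exist, produces an error message that names the missing piece.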
SELECT *
FROM branches
WHERE name LIKE "%/main"
ORDER BY commit_count
LIMIT 5
With this query in mind, let's explore how the AST representation looks. Brace yourselves, folks:
AbstractSyntaxTree {
Select(*, "branches")
Where(Like(name, "%/main"))
OrderBy(commit_count)
Limit(5)
}
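In Rust, a tree like the one above maps naturally onto enums. The sketch below is an assumption about how such nodes could be modeled, not GQL's actual definitions: `Expr`, `Statement`, and `example_query` are hypothetical names, and the query is built by hand in place of a real parser.

```rust
// Hypothetical node types for illustration; GQL's actual AST differs in detail.
#[derive(Debug, PartialEq)]
enum Expr {
    Field(String),
    Text(String),
    Like(Box<Expr>, Box<Expr>),
}

#[derive(Debug, PartialEq)]
enum Statement {
    Select { fields: Vec<String>, table: String },
    Where(Expr),
    OrderBy(String),
    Limit(usize),
}

// Builds the statement list for the example query by hand,
// standing in for what a parser would produce.
fn example_query() -> Vec<Statement> {
    vec![
        Statement::Select {
            fields: vec!["*".to_string()],
            table: "branches".to_string(),
        },
        Statement::Where(Expr::Like(
            Box::new(Expr::Field("name".to_string())),
            Box::new(Expr::Text("%/main".to_string())),
        )),
        Statement::OrderBy("commit_count".to_string()),
        Statement::Limit(5),
    ]
}
```

Nesting `Expr` inside `Statement::Where` is what "nodes within nodes" means in practice: the `Like` node owns its two operands, so a later pass can walk the tree with a single `match`.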
It's time to traverse and evaluate each node, and the order of operations matters here. Just like SQL, we need to follow a fixed sequence to get the right result. For instance, the WHERE statement must be executed before GROUP BY, and HAVING must come after it. Luckily, our example is already in the right order! Let's see what each statement does:
Select(*, "branches"): We select all the fields from the table named "branches" and push them into a list called objects. You might be wondering, "But how does one select directly from a local repository?" Well, my friends, Git stores all the essential information about commits, branches, tags, and more in files inside the .git folder of each repository. To read it, I used libgit2, a pure C implementation of the Git core methods, through the git2 crate, which provides Rust bindings for it. With those, we can extract branch information like so:
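use git2::BranchType;
// `repo` is an already opened git2::Repository.
// Each call returns a Result wrapping an iterator over the matching branches.
let local_branches = repo.branches(Some(BranchType::Local));
let remote_branches = repo.branches(Some(BranchType::Remote));
let local_and_remote_branches = repo.branches(None);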
And just like that, we have a delightful list of branches ready to play with!
Where(Like(name, "%/main")): Time to filter the objects list and get rid of anything that doesn't align with our conditions. In this case, we're only interested in items ending with "/main". Goodbye, distractions!
OrderBy(commit_count): Ah, the delights of sorting! We'll arrange the objects list based on the values of the commit_count field. An orderly world brings joy to our hearts, doesn't it?
Limit(5): Life is sometimes about prioritizing, and our query is no exception. We triumphantly choose the first five items and bid farewell to the rest. Goodbye, excess baggage!
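The three steps after Select can be sketched over an in-memory list. This is a simplified illustration under stated assumptions, not GQL's evaluator: `BranchRow`, `like_suffix`, and `evaluate` are hypothetical names, the rows are plain structs and not data read from a repository, and the LIKE helper only understands a leading `%`.

```rust
// Hypothetical row type standing in for one entry of the `branches` table.
#[derive(Debug, Clone, PartialEq)]
struct BranchRow {
    name: String,
    commit_count: u32,
}

// Simplified LIKE that only understands a leading `%` (suffix match);
// any other pattern is compared for exact equality.
fn like_suffix(value: &str, pattern: &str) -> bool {
    match pattern.strip_prefix('%') {
        Some(suffix) => value.ends_with(suffix),
        None => value == pattern,
    }
}

// Applies WHERE, then ORDER BY, then LIMIT, in that order.
fn evaluate(mut objects: Vec<BranchRow>, pattern: &str, limit: usize) -> Vec<BranchRow> {
    // Where: keep only the rows whose name matches the pattern.
    objects.retain(|row| like_suffix(&row.name, pattern));
    // OrderBy: sort ascending by commit_count.
    objects.sort_by_key(|row| row.commit_count);
    // Limit: keep at most `limit` rows.
    objects.truncate(limit);
    objects
}
```

The order of the three calls is the point of the sketch. Truncating before filtering would drop rows that should have matched, and sorting after truncating would return an arbitrary five rows and not the five with the lowest commit_count.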
Lo and behold, we arrive at our destination: a valid result containing at most five branches whose names end with "/main", sorted by commit_count.
Sample Queries:
SELECT 1
SELECT 1 + 2
SELECT LEN("Git Query Language")
SELECT "One" IN ("One", "Two", "Three")
SELECT "Git Query Language" LIKE "%Query%"
SELECT commit_count FROM branches WHERE commit_count BETWEEN 0 .. 10
SELECT * FROM refs WHERE type = "branch"
SELECT * FROM refs ORDER BY type
SELECT * FROM commits
SELECT name, email FROM commits
SELECT name, email FROM commits ORDER BY name DESC
SELECT name, email FROM commits WHERE name LIKE "%gmail%" ORDER BY name
SELECT * FROM commits WHERE LOWER(name) = "amrdeveloper"
SELECT name FROM commits GROUP BY name
SELECT name FROM commits GROUP BY name HAVING name = "AmrDeveloper"
SELECT * FROM branches
SELECT * FROM branches WHERE is_head = true
SELECT name, LEN(name) FROM branches
SELECT * FROM tags
SELECT * FROM tags OFFSET 1 LIMIT 1
Multi-Repository Support: Analyzing on a Grand Scale!
The joy of developing GQL didn't stop at its initial release. Soon, I received fantastic feedback from the community, along with some feature requests. One suggestion in particular caught my attention: supporting multiple repositories and allowing filtering by repository path. What a marvelous idea! Not only could I analyze multiple projects in one query, but I could also do so concurrently, using multiple threads. Implementation seemed within reach, so I dug in!
Evaluation for Multiple Repositories: With the validation step of our AST complete, it was time to tackle evaluation, but with a twist. Instead of evaluating the query once, we evaluate it once for each repository and merge the results into a single list.
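One way to evaluate per repository and merge is to give each repository its own thread and collect the rows afterwards. The sketch below uses only the standard library and is an assumption about the shape of such code, not a description of how GQL schedules its work: `evaluate_for_repository` and `evaluate_all` are hypothetical names, and the per-repository function returns a made-up row where a real one would open the repository.

```rust
use std::thread;

// Hypothetical stand-in for running one query against one repository.
// A real implementation would open the repository and read its data;
// here we only produce a single placeholder row tagged with the path.
fn evaluate_for_repository(path: &str) -> Vec<String> {
    vec![format!("{}: main", path)]
}

// Evaluates the query once per repository, each on its own thread,
// then merges the rows in the order the paths were given.
fn evaluate_all(paths: Vec<String>) -> Vec<String> {
    let handles: Vec<_> = paths
        .into_iter()
        .map(|path| thread::spawn(move || evaluate_for_repository(&path)))
        .collect();

    let mut merged = Vec::new();
    for handle in handles {
        // join() only returns Err if the worker thread panicked.
        merged.extend(handle.join().expect("worker thread panicked"));
    }
    merged
}
```

Joining the handles in the order they were spawned keeps the merged output deterministic even though the threads themselves may finish in any order. Clauses such as ORDER BY and LIMIT would presumably need to be applied to the merged list and not within each repository, which this sketch leaves out.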
Incorporating Repository Path Filtering: But wait, there's more! How about filtering queries based on the repository path? What seemed challenging initially turned out to be a piece of cake. I introduced a new field, repository_path, that holds the repository's local path, and added it to every table in the schema. With that, GQL gained the ability to run queries like:
SELECT *
FROM branches
WHERE repository_path LIKE "%GQL"
Voila! There you have it!
Join the GQL Adventure Today!
Well, dear readers, that brings us to the end of this journey with GQL. If you like what it can do, don't forget to show it some love on GitHub by giving it a well-deserved star. Visit the GQL website for a guide on how to download and use the project on various operating systems. Remember, this is just the beginning: there are plenty of opportunities to contribute, brainstorm ideas, and report bugs. Join the GQL community, and let's shape the future of Git querying together!