Git Query Language: Unleashing the Power of SQL in Your .git Files! 💪

How I Developed a SQL-Inspired Language for Executing Queries on Local Git Repositories

Greetings, dear readers! As a software engineer with an affinity for low-level programming, compilers, and tool development, I recently embarked on a quest to learn the Rust programming language. But I didn’t stop there! I decided to take my newfound knowledge and build something remarkable—a Git client that would revolutionize simplicity and productivity. And boy, did it turn out to be quite the adventure! 🚀

Now, we all love the analysis page on GitHub, right? The way it tells you how many commits each developer has made and how many lines they’ve inserted or deleted is just fascinating. But what if I told you that you could take this analysis a step further? What if you could query your Git repository for specific information and organize it in unique and exciting ways? Sounds intriguing, doesn’t it?

This got me thinking, “Why settle for a custom sorting option when I can make it dynamic and SQL-like?” And so, a brilliant idea was born. I imagined being able to run queries on my local .git files, just like you would with a traditional SQL database. Picture this: executing a query that looks like this on your local git repositories:

SELECT name, COUNT(name) AS commit_num 
FROM commits 
GROUP BY name 
ORDER BY commit_num DESC 

And guess what? I brought this idea to life with a project I call GQL (Git Query Language). In this article, I’m going to take you through the thrilling journey of designing and implementing the extraordinary functionality of GQL. So buckle up and let’s dive right in! 🌊

From SQLite Struggles to Creating a Query Language from Scratch

At the start of this endeavor, I considered using SQLite for the task. However, I encountered a few roadblocks: I wanted to customize the syntax, and I didn’t want to copy the contents of .git into a separate database first. I wanted everything to run on the fly. That was when inspiration struck: since I had built compilers before, why not take a leap of faith and build a SQL-like language from scratch? And so, the exciting journey began! 🚗💨

Abstract Syntax Trees and Type Checking: The Building Blocks of GQL

To kick things off, I decided to support only the SELECT command at first, without diving straight into advanced features like aggregations and joins. My plan was to parse the query into an Abstract Syntax Tree (AST), which makes it easy to validate (type checking, with helpful error messages) and then evaluate. Then, armed with this validated tree, I could apply the query to my .git files. Sounds like a foolproof plan, doesn’t it? 😉

Choosing the Right Data Structure: In the world of compilers, ASTs are the go-to data structure. With their unparalleled flexibility, they make it a breeze to traverse and compose nodes within nodes. For GQL, I only needed the essential information for subsequent steps (hence the name, Abstract). It was a match made in heaven! 🌈
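To make the "nodes within nodes" idea concrete, here is a minimal sketch of what such a tree could look like in Rust. The type and function names (Expr, Statement, count_nodes, example_query) are invented for this illustration and are not GQL's actual internals.

```rust
// Hypothetical AST types for illustration only; not GQL's real definitions.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Identifier(String),
    Text(String),
    Integer(i64),
    // Nodes hold other nodes, which is what makes the tree composable.
    Like(Box<Expr>, Box<Expr>),
    Add(Box<Expr>, Box<Expr>),
}

#[derive(Debug, Clone, PartialEq)]
pub enum Statement {
    Select { fields: Vec<String>, table: String },
    Where(Expr),
    OrderBy(String),
    Limit(usize),
}

/// Counts every node in an expression by walking the tree recursively.
pub fn count_nodes(expr: &Expr) -> usize {
    match expr {
        Expr::Identifier(_) | Expr::Text(_) | Expr::Integer(_) => 1,
        Expr::Like(lhs, rhs) | Expr::Add(lhs, rhs) => 1 + count_nodes(lhs) + count_nodes(rhs),
    }
}

/// Hand-built tree for: SELECT name FROM commits WHERE name LIKE "%Amr%"
pub fn example_query() -> Vec<Statement> {
    vec![
        Statement::Select {
            fields: vec!["name".to_string()],
            table: "commits".to_string(),
        },
        Statement::Where(Expr::Like(
            Box::new(Expr::Identifier("name".to_string())),
            Box::new(Expr::Text("%Amr%".to_string())),
        )),
    ]
}
```

Notice that the tree keeps only what later steps need: there are no parentheses, keywords, or whitespace in it, just the structure of the query.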

Validation and Type Checking: The path to a rock-solid AST led me to the most critical aspect—type checking. It was imperative to ensure that each value used in the query was valid and in its rightful place. For instance, what happens if someone attempts to multiply text by more text? Clearly, that won’t fly! By implementing robust type checks, I could catch these potential mishaps and provide informative error messages to guide users in the right direction. It’s all about making our users’ lives easier! 🎯
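Here is a small sketch of how a check like "you can't multiply text by text" might be written. The DataType and Expr enums and the type_of function are assumptions made for this example; GQL's real checker may be organized quite differently.

```rust
// Hypothetical types for illustration; not GQL's real checker.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DataType {
    Integer,
    Text,
    Boolean,
}

#[derive(Debug)]
pub enum Expr {
    Integer(i64),
    Text(String),
    Mul(Box<Expr>, Box<Expr>),
    Like(Box<Expr>, Box<Expr>),
}

/// Returns the type an expression evaluates to, or a readable error message.
pub fn type_of(expr: &Expr) -> Result<DataType, String> {
    match expr {
        Expr::Integer(_) => Ok(DataType::Integer),
        Expr::Text(_) => Ok(DataType::Text),
        Expr::Mul(lhs, rhs) => {
            // Check both operands first so nested errors surface too.
            let (l, r) = (type_of(lhs)?, type_of(rhs)?);
            if l == DataType::Integer && r == DataType::Integer {
                Ok(DataType::Integer)
            } else {
                Err(format!("`*` expects Integer operands, found {:?} and {:?}", l, r))
            }
        }
        Expr::Like(lhs, rhs) => {
            let (l, r) = (type_of(lhs)?, type_of(rhs)?);
            if l == DataType::Text && r == DataType::Text {
                Ok(DataType::Boolean)
            } else {
                Err(format!("LIKE expects Text operands, found {:?} and {:?}", l, r))
            }
        }
    }
}
```

Because the check runs on the tree before any evaluation starts, a bad query is rejected with a message instead of failing halfway through a repository scan.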

Creating Schema Representations and Error Detection: In addition to verifying operators and their operands, I had to ensure that every identifier was used correctly. Whether it was a table, field, alias, or function name, each one had to be defined before it was used. Likewise, if the branches table only has a couple of fields, a query must not be allowed to select a field that table does not have. So I set up a table describing every table and its fields, and the checker consults it. If things didn’t add up, I was there to report a clear error. In GQL, we believe in excellence! 🔍
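A lookup table like that could be sketched roughly as follows. The function names build_schema and check_fields are hypothetical, and the schema is deliberately partial: it lists only fields that appear in the sample queries in this article, not GQL's full schema.

```rust
use std::collections::HashMap;

/// Partial, illustrative schema: table name -> field names.
/// Only fields mentioned in this article are listed; the real schema has more.
pub fn build_schema() -> HashMap<&'static str, Vec<&'static str>> {
    let mut schema = HashMap::new();
    schema.insert("branches", vec!["name", "commit_count", "is_head"]);
    schema.insert("commits", vec!["name", "email"]);
    schema
}

/// Verifies that the table exists and that every requested field belongs to it.
pub fn check_fields(
    schema: &HashMap<&str, Vec<&str>>,
    table: &str,
    fields: &[&str],
) -> Result<(), String> {
    let known = schema
        .get(table)
        .ok_or_else(|| format!("unknown table `{}`", table))?;
    for field in fields {
        if !known.iter().any(|k| k == field) {
            return Err(format!("table `{}` has no field `{}`", table, field));
        }
    }
    Ok(())
}
```

With this in place, asking for email from branches is caught during validation, long before the repository is touched.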

The Astounding Evaluation Process: Armed with our rock-solid AST, it was time to dive into evaluation. We traversed the syntax tree with precision, evaluating each node along the way. When all was said and done, we’d have ourselves a beautifully crafted list—a true culmination of hard work and intelligent querying. Let’s take a moment to walk through the evaluation process step by step, using an example query:

SELECT *
FROM branches
WHERE name LIKE "%/main"
ORDER BY commit_count
LIMIT 5

With this query in mind, let’s explore how the AST representation looks. Brace yourselves, folks:

AbstractSyntaxTree {
  Select(*, "branches")
  Where(Like(name, "%/main"))
  OrderBy(commit_count)
  Limit(5)
}

It’s time to traverse and evaluate each node—the order of operations is of utmost importance here. We can’t jump the gun or miss a beat. Just like SQL, we need to follow the same sequence to achieve the desired result. For instance, the WHERE statement must be executed before GROUP BY, and HAVING must come later. Luckily, our example is all set to go! Let’s see what each statement does:

  • Select(*, "branches"): We select all the fields from the table named “branches” and push them into a list called objects. You might be wondering, “But how does one select directly from a local repository?” Well, my friends, all the essential information about commits, branches, tags, and more is stored by Git in files within the .git folder of each repository. To make magic happen, I employed the mighty libgit2 library—a pure C implementation of Git core methods. Accompanied by the powerful git2 crate, we can effortlessly extract branch information like so:
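use git2::{BranchType, Repository};
// `repo` is an opened Repository, e.g. Repository::open(".")?; each call returns a Result
let local_branches = repo.branches(Some(BranchType::Local))?;
let remote_branches = repo.branches(Some(BranchType::Remote))?;
let local_and_remote_branches = repo.branches(None)?;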

And just like that, we have a delightful list of branches ready to play with!

  • Where(Like(name, "%/main")): Time to filter the objects list and get rid of anything that doesn’t align with our conditions. In this case, we’re only interested in items ending with “/main”. Goodbye, distractions! 👋

  • OrderBy(commit_count): Ah, the delights of sorting! We’ll arrange the objects list based on the values of the commit_count field. An orderly world brings joy to our hearts, doesn’t it? 😌

  • Limit(5): Life is sometimes about prioritizing, and our query is no exception. We triumphantly choose the first five items and bid farewell to the rest. Goodbye, excess baggage! ✈️
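The four steps above can be sketched on plain in-memory data. This is only an illustration under a few assumptions: the Branch struct, like, and evaluate are names made up for the example, the rows are handed in directly instead of being read through libgit2, and the LIKE matcher handles only a leading and/or trailing % and is case-sensitive.

```rust
// Hypothetical row type; real rows would be built from libgit2 data.
#[derive(Debug, Clone, PartialEq)]
pub struct Branch {
    pub name: String,
    pub commit_count: u32,
}

/// Minimal LIKE: supports only a leading and/or trailing `%` wildcard.
pub fn like(value: &str, pattern: &str) -> bool {
    let leading = pattern.starts_with('%');
    let trailing = pattern.len() > 1 && pattern.ends_with('%');
    let core = pattern.trim_matches('%');
    match (leading, trailing) {
        (true, true) => value.contains(core),
        (true, false) => value.ends_with(core),
        (false, true) => value.starts_with(core),
        (false, false) => value == core,
    }
}

/// Applies Select, Where, OrderBy, and Limit in that order.
pub fn evaluate(branches: Vec<Branch>, pattern: &str, limit: usize) -> Vec<Branch> {
    // Select(*, "branches"): start with every row in the objects list.
    let mut objects = branches;
    // Where(Like(name, pattern)): drop rows that do not match.
    objects.retain(|b| like(&b.name, pattern));
    // OrderBy(commit_count): ascending, stable sort.
    objects.sort_by_key(|b| b.commit_count);
    // Limit(n): keep at most the first n rows.
    objects.truncate(limit);
    objects
}
```

Calling evaluate(rows, "%/main", 5) mirrors the example query. The order matters: truncating before sorting or filtering would return different rows.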

Lo and behold, we arrive at our destination: a valid result containing at most five branches whose names end with “/main”, sorted by commit_count.


Sample Queries:

  • SELECT 1
  • SELECT 1 + 2
  • SELECT LEN("Git Query Language")
  • SELECT "One" IN ("One", "Two", "Three")
  • SELECT "Git Query Language" LIKE "%Query%"
  • SELECT commit_count FROM branches WHERE commit_count BETWEEN 0 .. 10
  • SELECT * FROM refs WHERE type = "branch"
  • SELECT * FROM refs ORDER BY type
  • SELECT * FROM commits
  • SELECT name, email FROM commits
  • SELECT name, email FROM commits ORDER BY name DESC
  • SELECT name, email FROM commits WHERE name LIKE "%gmail%" ORDER BY name
  • SELECT * FROM commits WHERE LOWER(name) = "amrdeveloper"
  • SELECT name FROM commits GROUP BY name
  • SELECT name FROM commits GROUP BY name HAVING name = "AmrDeveloper"
  • SELECT * FROM branches
  • SELECT * FROM branches WHERE is_head = true
  • SELECT name, LEN(name) FROM branches
  • SELECT * FROM tags

Multi-Repository Support: Analyzing on a Grand Scale! 🌍

The joy of developing GQL didn’t stop at its initial release. Soon, I received fantastic feedback from the community, along with some feature requests. One suggestion, in particular, caught my attention—supporting multiple repositories and allowing filtering by repository path. What a marvelous idea! Not only could I perform analyses on multiple projects, but I could also do so concurrently, harnessing the power of multiple threads. Implementation seemed within reach, so I delved deep! 💡

Evaluation for Multiple Repositories: With the validation step of our AST complete, it was time to tackle evaluation—but with a twist. Instead of evaluating the query once, we would evaluate it for each repository and merge the results. We call it the recipe for success.
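As a rough sketch of "evaluate per repository, then merge", here is one way to do it with scoped threads from the standard library. Everything here is an assumption for illustration: FakeRepo stands in for a real repository, evaluate_one stands in for running the validated query, and GQL's own threading strategy may differ.

```rust
use std::thread;

/// Stand-in for a repository: a path plus its branch names (hypothetical).
#[derive(Debug, Clone)]
pub struct FakeRepo {
    pub path: String,
    pub branches: Vec<String>,
}

/// Hypothetical per-repository evaluation: one (path, branch name) row per branch.
pub fn evaluate_one(repo: &FakeRepo) -> Vec<(String, String)> {
    repo.branches
        .iter()
        .map(|branch| (repo.path.clone(), branch.clone()))
        .collect()
}

/// Evaluates every repository on its own thread and merges the rows.
pub fn evaluate_all(repos: &[FakeRepo]) -> Vec<(String, String)> {
    thread::scope(|scope| {
        // Spawn one scoped thread per repository; scoped threads may borrow `repos`.
        let handles: Vec<_> = repos
            .iter()
            .map(|repo| scope.spawn(move || evaluate_one(repo)))
            .collect();
        // Join in spawn order so the merged result is deterministic.
        let rows: Vec<(String, String)> = handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("evaluation thread panicked"))
            .collect();
        rows
    })
}
```

Since validation already happened once on the AST, each thread only has to do the evaluation work for its own repository.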

Incorporating Repository Path Filtering: But wait, there’s more! How about filtering queries based on the repository path? What seemed challenging initially turned out to be a piece of cake. I introduced a new field, repository_path, to represent the repository’s local path in the schema. Consequently, all tables welcomed this new addition as well. And with that, GQL gained the ability to run queries like:

SELECT *
FROM branches
WHERE repository_path LIKE "%GQL"

Voila! There you have it! 😄
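Under the hood, the idea can be pictured like this: every row gets stamped with the path of the repository it came from, and after that repository_path is filtered like any other field. The Row alias and the two function names below are hypothetical, and the filter handles only the "%suffix" form of LIKE.

```rust
use std::collections::HashMap;

/// Illustrative row: field name -> value as text.
pub type Row = HashMap<String, String>;

/// Adds the `repository_path` field to every row produced for one repository.
pub fn tag_rows(rows: Vec<Row>, repository_path: &str) -> Vec<Row> {
    rows.into_iter()
        .map(|mut row| {
            row.insert("repository_path".to_string(), repository_path.to_string());
            row
        })
        .collect()
}

/// Keeps rows whose `repository_path` ends with `suffix`, i.e. LIKE "%suffix".
pub fn filter_by_path_suffix(rows: Vec<Row>, suffix: &str) -> Vec<Row> {
    rows.into_iter()
        .filter(|row| {
            row.get("repository_path")
                .map_or(false, |path| path.ends_with(suffix))
        })
        .collect()
}
```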

Join the GQL Adventure Today! 🎉

Well, dear readers, that brings us to the end of this exhilarating journey with GQL. If you find yourself dazzled by the possibilities it presents, don’t forget to show it some love on GitHub by giving it a well-deserved ⭐️ star. Visit the GQL website for a thrilling guide on how to download and use the project on various operating systems. Remember, this is just the beginning—there are countless opportunities to contribute, brainstorm ideas, and report bugs. Join the GQL community, and let’s shape the future of Git querying together! 🌟💻
