Git and GitHub - Full Course

Notes from this video: https://www.youtube.com/watch?v=rH3zE7VlIMs

Types of git commands

If we read the README from the first commit of git, we see the following excerpt:

	GIT - the stupid content tracker

"git" can mean anything, depending on your mood.

 - random three-letter combination that is pronounceable, and not
   actually used by any common UNIX command.  The fact that it is a
   mispronounciation of "get" may or may not be relevant.
 - stupid. contemptible and despicable. simple. Take your pick from the
   dictionary of slang.
 - "global information tracker": you're in a good mood, and it actually
   works for you. Angels sing, and a light suddenly fills the room. 
 - "goddamn idiotic truckload of sh*t": when it breaks

The naming convention for git commands comes from the last bullet point:

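  • Porcelain
    • The polished, high-level commands meant for everyday use, such as git add, git commit and git status
  • Plumbing
    • The low-level, nitty-gritty commands that work on git's internals directly, such as git hash-object and git cat-file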

We will mostly work with the Porcelain commands.

git init

  • git init creates a .git folder in the current working directory, containing only a skeleton:
.git/
├── config
├── description
├── HEAD
├── hooks/
│   ├── applypatch-msg.sample*
│   ├── commit-msg.sample*
│   ├── fsmonitor-watchman.sample*
│   ├── post-update.sample*
│   ├── pre-applypatch.sample*
│   ├── pre-commit.sample*
│   ├── pre-merge-commit.sample*
│   ├── pre-push.sample*
│   ├── pre-rebase.sample*
│   ├── pre-receive.sample*
│   ├── prepare-commit-msg.sample*
│   ├── push-to-checkout.sample*
│   ├── sendemail-validate.sample*
│   └── update.sample*
├── info/
│   └── exclude
├── objects/
│   ├── info/
│   └── pack/
└── refs/
    ├── heads/
    └── tags/

9 directories, 18 files
  • We see that this directory is little more than a skeleton.
  • We have a bunch of sample hook files.
  • objects/ and refs/ contain only empty subdirectories because we don’t have any commits yet

git hashing mechanism

  • git by default uses SHA-1
  • But it doesn’t just hash the file contents
  • It first prepends a header: the object type (blob for a file), a space, the size of the contents in bytes, and a null byte \0. The hash is taken over this header followed by the file contents
    • For example an empty file called foo.txt has the hash e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
    • But when you run sha1 foo.txt you will get da39a3ee5e6b4b0d3255bfef95601890afd80709, the plain SHA-1 of empty input
    • Running printf 'blob 0\0' | sha1 will return e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, which is the object ID used by git for foo.txt
    • Notice that the calculation of the hash never involved the filename. If we create a new empty file called bar.txt, it will also hash to e69de29bb2d1d6434b8b29ae775ad8c2e48c5391. git only uses the contents of the file to generate the object ID
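
The header-plus-contents rule above can be checked outside of git. This is a minimal sketch in Python using only the standard hashlib module. The function name git_blob_id is made up for these notes and is not part of git.

```python
import hashlib


def git_blob_id(contents: bytes) -> str:
    """Return the SHA-1 object ID git would assign to a blob with these contents."""
    # Header: object type, a space, the size in bytes, then a null byte.
    header = b"blob " + str(len(contents)).encode("ascii") + b"\0"
    return hashlib.sha1(header + contents).hexdigest()


# Plain SHA-1 of empty input: what `sha1 foo.txt` reports for an empty file.
print(hashlib.sha1(b"").hexdigest())  # da39a3ee5e6b4b0d3255bfef95601890afd80709

# SHA-1 of "blob 0\0": the ID git uses for any empty file, whatever its name.
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

The filename never appears as an argument, which is the point made above. Running git hash-object foo.txt should print the same ID as the second line.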

index file

  • We saw that git only uses the contents of the file to perform the hash
  • What happens if we rename the file?
  • Consider the following scenario:
touch foo.txt
git add foo.txt
mv foo.txt bar.txt
  • git status will output
On branch main

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
	new file:   foo.txt

Changes not staged for commit:
  (use "git add/rm <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	deleted:    foo.txt

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	bar.txt
  • Our .git/objects folder still looks like this
.git/objects
├── e6
│   └── 9de29bb2d1d6434b8b29ae775ad8c2e48c5391
├── info
└── pack

4 directories, 1 file
  • How did git know that foo.txt is gone and that bar.txt is a new file?
  • It used the .git/index file
  • When we say that we are staging a file, we are technically adding it to the index file
  • When we moved foo.txt to bar.txt, git status did the following
    • First, it listed all the staged changes
    • Second, it noticed that one of the staged files is missing from the working directory and showed that as a change not staged for commit
    • Third, it saw that bar.txt is not in the index file and reported it as untracked
  • See 20251229T142012-git_index_file_format for more details
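
The three steps git status took can be modelled with plain set operations. This is a deliberately simplified sketch, not how git is implemented. It assumes there are no commits yet (so every index entry counts as a new file) and it only compares paths, whereas real git also compares file contents and metadata. The function name classify and the section labels are invented for these notes.

```python
def classify(index_paths, worktree_paths):
    """Simplified model of the three lists git status printed above."""
    index = set(index_paths)
    worktree = set(worktree_paths)
    return {
        # Everything in the index is staged (no commits exist yet).
        "to be committed": sorted(index),
        # In the index but missing from the working directory.
        "deleted, not staged": sorted(index - worktree),
        # In the working directory but unknown to the index.
        "untracked": sorted(worktree - index),
    }


# State after: touch foo.txt; git add foo.txt; mv foo.txt bar.txt
report = classify(index_paths={"foo.txt"}, worktree_paths={"bar.txt"})
for section, paths in report.items():
    print(f"{section}: {paths}")
```

In this model foo.txt shows up twice, once as staged and once as deleted, and bar.txt shows up as untracked, which matches the git status output.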

File status

At its core, git is a file tracker: it tracks how a file evolves over its lifetime. A file can be in one of three states:

  • untracked
    • .git/index has no entry for the file. An object with identical contents may still exist, because git stores objects by content, but nothing in the index points to this path
  • staged
    • .git/index has the file but it is not committed yet
  • committed
    • .git/index has the file and there is a commit object and a blob object corresponding to that file

commit

  • Refer to 20240130191938-git-moc for a primer on how a commit is designed under the hood
  • If we use git cat-file -p 3c82b84b3db4de6139871ef7c49609d26b410d13 we get
tree 09a13b897d3d0f528d487c704da540cb952d7606
author Deebakkarthi C R <dbk@deebakkarthi.com> 1767037689 -0500
committer Deebakkarthi C R <dbk@deebakkarthi.com> 1767037689 -0500

Add foo.txt
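  • This is literally what a commit is
    • The ID of a tree object
    • An author
    • A committer
    • The commit message
    • Commits after the first one also record the ID of their parent commit; this one has none because it is the first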
  • What is a tree object? We can run git cat-file -p 09a13b897d3d0f528d487c704da540cb952d7606 to find out
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391	foo.txt
  • This tree represents a snapshot of the repository's root directory. Each entry lists a file mode, an object type, an object ID and a name
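
The same header-then-body hashing applies to trees. This is a rough sketch that rebuilds the tree object for a root directory containing only an empty foo.txt. It assumes the standard binary tree layout, which git cat-file -p pretty-prints rather than shows raw: each entry is the mode, a space, the name, a null byte, then the 20 raw bytes of the object ID. The helper names object_id and tree_entry are made up for these notes.

```python
import hashlib


def object_id(kind: str, body: bytes) -> str:
    """Hash an object the way git does: '<kind> <size>', a NUL byte, then the body."""
    header = f"{kind} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_entry(mode: str, name: str, target_id: str) -> bytes:
    """Serialize one tree entry: '<mode> <name>', a NUL byte, then the raw 20-byte ID."""
    return f"{mode} {name}".encode("utf-8") + b"\0" + bytes.fromhex(target_id)


blob_id = object_id("blob", b"")  # the empty foo.txt
tree_body = tree_entry("100644", "foo.txt", blob_id)
tree_id = object_id("tree", tree_body)

print(blob_id)  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(tree_id)  # compare with the tree line of the commit shown above
```

If the assumed layout is right, the second line should match the tree ID in the commit above. A commit is hashed in the same way, with the kind set to commit and the text printed by git cat-file -p as the body.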

Why store objects under paths like 09/a13b897d3d0f528d487c704da540cb952d7606

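  • You may notice that instead of storing objects directly under .git/objects, git stores each one in a subdirectory named after the first two characters of its hash, with the remaining 38 characters as the filename. Why is that?
  • The video attributes this to inode busting: you have a limited number of inodes on your system, which you can check using df -i. Note that the split does not reduce the number of inodes used, since every subdirectory needs one of its own. The commonly cited benefit is that no single directory grows huge: many filesystems slow down, and some hit hard limits, when one directory holds a very large number of entries. Two hex characters spread the objects over at most 256 subdirectories.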
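
The two-character split can be written as a small path helper. This is only an illustrative sketch; the function name loose_object_path is invented for these notes, and it covers only objects stored as individual files under .git/objects, not anything under pack/.

```python
from pathlib import PurePosixPath


def loose_object_path(object_id: str, git_dir: str = ".git") -> PurePosixPath:
    """Map an object ID to its path: the first two hex characters name the directory."""
    return PurePosixPath(git_dir) / "objects" / object_id[:2] / object_id[2:]


print(loose_object_path("09a13b897d3d0f528d487c704da540cb952d7606"))
# .git/objects/09/a13b897d3d0f528d487c704da540cb952d7606
```

With two hex characters there are 16 * 16 = 256 possible directory names, from 00 to ff.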