Using Overleaf with Git Submodules

Overleaf is useful for editing manuscripts in the cloud, sometimes in real time with collaborators.

However, editing in real time with collaborators on an Overleaf project is exceedingly rare (for me, at least). One may also find compiling on Overleaf slower than compiling locally in a LaTeX editor. In any case, one may still want to keep the manuscript as an Overleaf project while embedding it in a GitHub repo that contains the project’s other scripts (e.g., Python/R/Stata scripts).

Importing an existing GitHub repo into Overleaf

For relatively small and simple projects, the easiest way is Overleaf’s built-in import: click the New Project button and import an existing GitHub repo as an Overleaf project.

Importing a GitHub repo as an Overleaf Project

However, this does not always work. One reason it fails is that the GitHub repo (the superproject) is too large: Overleaf does not support importing large repos.

GitHub repos larger than 50 MB are not supported by Overleaf

The alternative is to keep the Overleaf project as a standalone manuscript repo and add it as a submodule of a superproject that contains all other project assets (e.g., data files).


Using Git Submodules for Overleaf Manuscripts Pt. 0

First, the Overleaf manuscript project should be linked to a GitHub repo. If it’s already linked, skip ahead to Pt. 1.

If the project is not yet linked to a GitHub repo, open the Menu and click the GitHub button, which brings up the GitHub sync modal.

Syncing GitHub repo

The Overleaf project should now reside in a GitHub repo such as user/manuscript (https://github.com/user/manuscript). (You can, of course, give it any other name.)

Now that the Overleaf manuscript is linked to a GitHub repo, we can add it as a submodule to the existing project.


Using Git Submodules for Overleaf Manuscripts Pt. 1

cd into the local clone of the project’s GitHub repo, which contains some existing assets of the project (e.g., a folder of data files data/, a README.md with documentation, and a folder of scripts scripts/).

$ cd my_project
$ ls 
data/ README.md scripts/ 

Add https://github.com/user/manuscript as a submodule (if the GitHub repo does not exist yet, refer to this). Checking git status shows the new changes in your local repo.

$ git submodule add https://github.com/user/manuscript.git
$ git status
On branch main
Your branch is up-to-date with 'origin/main'.

Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

  new file:   .gitmodules
  new file:   manuscript

$ ls
data/ manuscript/ README.md scripts/ 

git submodule add cloned the manuscript repo into a manuscript folder in the root directory of my_project. It also created a new .gitmodules file, a configuration file that tells Git how to map the local manuscript directory to the submodule’s repo on GitHub.

$ cat .gitmodules
[submodule "manuscript"]
        path = manuscript
        url = https://github.com/user/manuscript.git
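To check that mapping without opening the file, git config can read .gitmodules directly. The sketch below is a minimal, offline reproduction of the steps so far, not the author’s setup: it assumes local bare repos under a temporary directory stand in for the GitHub remotes (all paths and file contents are hypothetical), assumes Git 2.28 or newer, and sets protocol.file.allow=always because recent Git versions refuse to clone submodules from local paths by default.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Sandbox: temp HOME so the real ~/.gitconfig is ignored (hypothetical paths).
WORK=$(mktemp -d)
trap 'rm -rf "$WORK"' EXIT
export HOME="$WORK" GIT_CONFIG_NOSYSTEM=1
unset XDG_CONFIG_HOME
git config --global user.name "Demo User"
git config --global user.email "demo@example.com"
git config --global init.defaultBranch main
git config --global protocol.file.allow always   # allow local-path submodules

# Stand-in for github.com/user/manuscript, seeded with one commit.
git init -q "$WORK/seed"
printf '%s\n' '\documentclass{article}' > "$WORK/seed/main.tex"
git -C "$WORK/seed" add main.tex
git -C "$WORK/seed" commit -q -m "Initial manuscript"
git clone -q --bare "$WORK/seed" "$WORK/manuscript.git"

# Stand-in for my_project, with the manuscript added as a submodule.
git init -q "$WORK/my_project"
cd "$WORK/my_project"
git submodule add "$WORK/manuscript.git" manuscript

# Read the path -> URL mapping that git submodule add wrote to .gitmodules.
git config --file .gitmodules --get submodule.manuscript.path   # prints: manuscript
git config --file .gitmodules --get submodule.manuscript.url    # the URL given to add
```

In the real project the second command prints https://github.com/user/manuscript.git, since that is the URL passed to git submodule add.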

Commit the new submodule and push it to the project’s remote repo.

$ git commit -m "Add manuscript submodule"
[main fbace23] Add manuscript submodule
 2 files changed, 4 insertions(+)
 create mode 100644 .gitmodules
 create mode 160000 manuscript

$ git push origin main

The 160000 mode is a special Git mode: it means the submodule is recorded as a commit (a pointer to one specific commit of the manuscript repo) rather than as a subdirectory or a file.
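The gitlink entry can be inspected with git ls-tree. This is a minimal offline sketch under the same hypothetical stand-in setup (local bare repos under a temp directory instead of GitHub, Git 2.28+, protocol.file.allow=always for local-path submodules); it shows that the tree entry for manuscript has mode 160000, type commit, and the hash of the submodule’s checked-out commit.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Sandbox: temp HOME so the real ~/.gitconfig is ignored (hypothetical paths).
WORK=$(mktemp -d)
trap 'rm -rf "$WORK"' EXIT
export HOME="$WORK" GIT_CONFIG_NOSYSTEM=1
unset XDG_CONFIG_HOME
git config --global user.name "Demo User"
git config --global user.email "demo@example.com"
git config --global init.defaultBranch main
git config --global protocol.file.allow always   # allow local-path submodules

# Stand-in for github.com/user/manuscript, seeded with one commit.
git init -q "$WORK/seed"
printf '%s\n' '\documentclass{article}' > "$WORK/seed/main.tex"
git -C "$WORK/seed" add main.tex
git -C "$WORK/seed" commit -q -m "Initial manuscript"
git clone -q --bare "$WORK/seed" "$WORK/manuscript.git"

# Stand-in for my_project: add the submodule and commit it.
git init -q "$WORK/my_project"
cd "$WORK/my_project"
git submodule add "$WORK/manuscript.git" manuscript
git commit -q -m "Add manuscript submodule"

# The tree entry is a gitlink: "160000 commit <hash><TAB>manuscript".
git ls-tree HEAD manuscript
# The hash above is the commit currently checked out inside the submodule.
git -C manuscript rev-parse HEAD
```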

Before editing locally, pull any upstream changes from the submodule’s remote (e.g., edits made on Overleaf and synced to GitHub). This picks up new work in the submodule and helps avoid merge conflicts later. Just cd into the submodule and git pull as usual.

$ cd manuscript
$ git pull origin main
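As an alternative to cd-ing into the submodule, git submodule update --remote fetches the submodule’s remote and checks out the latest commit of its branch, all from the project root. A minimal offline sketch, assuming local bare repos stand in for GitHub (hypothetical paths, Git 2.28+, protocol.file.allow=always for local-path submodules); the -b main on git submodule add is an extra assumption that records the branch to follow in .gitmodules. Note that this leaves the submodule on a detached HEAD, the situation covered in the gotcha below.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Sandbox: temp HOME so the real ~/.gitconfig is ignored (hypothetical paths).
WORK=$(mktemp -d)
trap 'rm -rf "$WORK"' EXIT
export HOME="$WORK" GIT_CONFIG_NOSYSTEM=1
unset XDG_CONFIG_HOME
git config --global user.name "Demo User"
git config --global user.email "demo@example.com"
git config --global init.defaultBranch main
git config --global protocol.file.allow always   # allow local-path submodules

# Stand-in for github.com/user/manuscript, seeded with one commit.
git init -q "$WORK/seed"
printf '%s\n' '\documentclass{article}' > "$WORK/seed/main.tex"
git -C "$WORK/seed" add main.tex
git -C "$WORK/seed" commit -q -m "Initial manuscript"
git clone -q --bare "$WORK/seed" "$WORK/manuscript.git"

# Stand-in for my_project; -b main records the branch to follow.
git init -q "$WORK/my_project"
cd "$WORK/my_project"
git submodule add -b main "$WORK/manuscript.git" manuscript
git commit -q -m "Add manuscript submodule"

# Simulate an Overleaf edit synced to GitHub: a new commit on the remote's main.
printf '%s\n' '% edited on Overleaf' >> "$WORK/seed/main.tex"
git -C "$WORK/seed" commit -q -am "Edit from Overleaf"
git -C "$WORK/seed" push -q "$WORK/manuscript.git" main

# From the project root: fetch and check out the tip of main (detached HEAD).
git submodule update --remote manuscript
git status --short   # manuscript now shows as modified (new commits)
```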

To push local changes to the submodule’s repo, cd into the submodule folder and do the usual add-commit-push to the submodule’s remote. (Changes sometimes end up on a detached HEAD; see the gotcha below.)

$ cd manuscript
$ git add .
$ git commit -m "Some changes from local"
$ git push origin main
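A minimal offline sketch of this add-commit-push inside the submodule, assuming local bare repos stand in for GitHub (hypothetical paths and file contents, Git 2.28+, protocol.file.allow=always for local-path submodules). It adds one guard that is not part of the steps above: git symbolic-ref fails on a detached HEAD, so the script stops before committing there.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Sandbox: temp HOME so the real ~/.gitconfig is ignored (hypothetical paths).
WORK=$(mktemp -d)
trap 'rm -rf "$WORK"' EXIT
export HOME="$WORK" GIT_CONFIG_NOSYSTEM=1
unset XDG_CONFIG_HOME
git config --global user.name "Demo User"
git config --global user.email "demo@example.com"
git config --global init.defaultBranch main
git config --global protocol.file.allow always   # allow local-path submodules

# Stand-in for github.com/user/manuscript, seeded with one commit.
git init -q "$WORK/seed"
printf '%s\n' '\documentclass{article}' > "$WORK/seed/main.tex"
git -C "$WORK/seed" add main.tex
git -C "$WORK/seed" commit -q -m "Initial manuscript"
git clone -q --bare "$WORK/seed" "$WORK/manuscript.git"

# Stand-in for my_project, with the manuscript added as a submodule.
git init -q "$WORK/my_project"
cd "$WORK/my_project"
git submodule add "$WORK/manuscript.git" manuscript

cd manuscript
# Guard: refuse to commit on a detached HEAD (symbolic-ref fails there).
git symbolic-ref -q --short HEAD || { echo "Detached HEAD: run 'git checkout main' first" >&2; exit 1; }
printf '%s\n' '\section{Methods}' >> main.tex
git add .
git commit -q -m "Some changes from local"
git push -q origin main
```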

Finally, the project repo needs to record the submodule’s new commits: they have been pushed to the manuscript repo, but the overall project repo still points at the older submodule commit. Going back to the project root directory, git status shows the submodule folder as modified (new commits) but not yet committed to the project repo.

$ cd ../
$ ls
data/ manuscript/ README.md scripts/ 

$ git status
On branch main
Your branch is up to date with 'origin/main'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   manuscript (new commits)

The usual add-commit-push will resolve this.
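A minimal offline sketch of the whole round trip, assuming local bare repos stand in for both GitHub remotes (hypothetical paths and file contents, Git 2.28+, protocol.file.allow=always for local-path submodules): a commit is pushed from inside the submodule, then git add manuscript in the project root stages the new submodule commit, and the usual commit and push record it in the project repo.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Sandbox: temp HOME so the real ~/.gitconfig is ignored (hypothetical paths).
WORK=$(mktemp -d)
trap 'rm -rf "$WORK"' EXIT
export HOME="$WORK" GIT_CONFIG_NOSYSTEM=1
unset XDG_CONFIG_HOME
git config --global user.name "Demo User"
git config --global user.email "demo@example.com"
git config --global init.defaultBranch main
git config --global protocol.file.allow always   # allow local-path submodules

# Stand-in for github.com/user/manuscript, seeded with one commit.
git init -q "$WORK/seed"
printf '%s\n' '\documentclass{article}' > "$WORK/seed/main.tex"
git -C "$WORK/seed" add main.tex
git -C "$WORK/seed" commit -q -m "Initial manuscript"
git clone -q --bare "$WORK/seed" "$WORK/manuscript.git"

# Stand-in for my_project and its own remote.
git init -q "$WORK/my_project"
cd "$WORK/my_project"
git submodule add "$WORK/manuscript.git" manuscript
git commit -q -m "Add manuscript submodule"
git init -q --bare "$WORK/project.git"
git remote add origin "$WORK/project.git"
git push -q origin main

# A change is committed and pushed inside the submodule...
printf '%s\n' '\section{Results}' >> manuscript/main.tex
git -C manuscript commit -q -am "Some changes from local"
git -C manuscript push -q origin main

# ...so the project root sees manuscript as modified (new commits).
git status --short
git add manuscript
git commit -q -m "Update manuscript submodule"
git push -q origin main
```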


Gotcha with Detached HEADs

Sometimes the commits made inside the submodule folder end up on a detached HEAD, because git submodule update checks out the exact commit recorded in the superproject rather than a branch. To check, run git branch.

$ git branch 
* (HEAD detached at 660da63)
  main

To move these commits onto main (or master), first create a temporary branch tmp at the detached HEAD. Then check out main and merge tmp into it, so the new commits are on main. Finally, delete the temporary branch and continue with the usual add-commit-push.

$ git branch tmp
$ git checkout main
$ git merge tmp
$ git branch -d tmp
$ git branch
* main
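The sketch below reproduces the gotcha offline and applies the same tmp-branch recovery. It assumes local bare repos stand in for GitHub (hypothetical paths and file contents, Git 2.28+, protocol.file.allow=always for local-path submodules), and it uses git checkout --detach only to simulate the detached state that git submodule update would leave behind. Running git checkout main inside the submodule before editing avoids the problem in the first place.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Sandbox: temp HOME so the real ~/.gitconfig is ignored (hypothetical paths).
WORK=$(mktemp -d)
trap 'rm -rf "$WORK"' EXIT
export HOME="$WORK" GIT_CONFIG_NOSYSTEM=1
unset XDG_CONFIG_HOME
git config --global user.name "Demo User"
git config --global user.email "demo@example.com"
git config --global init.defaultBranch main
git config --global protocol.file.allow always   # allow local-path submodules

# Stand-in for github.com/user/manuscript, seeded with one commit.
git init -q "$WORK/seed"
printf '%s\n' '\documentclass{article}' > "$WORK/seed/main.tex"
git -C "$WORK/seed" add main.tex
git -C "$WORK/seed" commit -q -m "Initial manuscript"
git clone -q --bare "$WORK/seed" "$WORK/manuscript.git"

# Stand-in for my_project, with the manuscript added as a submodule.
git init -q "$WORK/my_project"
cd "$WORK/my_project"
git submodule add "$WORK/manuscript.git" manuscript

cd manuscript
# Simulate the detached state: HEAD points at a commit, not at a branch.
git checkout -q --detach
printf '%s\n' '\section{Discussion}' >> main.tex
git commit -q -am "Edit made on a detached HEAD"
git branch            # the detached HEAD is marked with *, main is listed below

# Recovery: park the commit on tmp, merge it into main, clean up, push.
git branch tmp
git checkout -q main
git merge -q tmp      # fast-forward, since main has not moved
git branch -d tmp
git push -q origin main
```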

Cloning a Git repo with a submodule

Start by cloning a git repo as usual.

$ git clone https://github.com/user/project.git
Cloning into 'project'...
...
$ ls
data/ manuscript/ README.md scripts/

However, cd-ing into the manuscript submodule folder shows that it’s still empty.

$ cd manuscript
$ ls
.

We need to run git submodule init to initialize the local configuration file, and then git submodule update to fetch all the data from the manuscript repo and check out the commit recorded in the superproject.

$ git submodule init
Submodule 'manuscript' (https://github.com/user/manuscript.git) registered for path './'

$ git submodule update
Cloning into 'D:/project/manuscript'...
Submodule path './': checked out '405998645a301ee47ab43125ec01fd2e7a48671c'

$ ls
figs/  ms/  tabs/
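Git can also combine these steps: git submodule update --init runs init and update in one command, and git clone --recurse-submodules populates submodules during the clone. A minimal offline sketch of both, assuming local bare repos stand in for GitHub (hypothetical paths and file contents, Git 2.28+, protocol.file.allow=always for local-path submodules). As with a plain git submodule update, the cloned submodule is left on a detached HEAD.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Sandbox: temp HOME so the real ~/.gitconfig is ignored (hypothetical paths).
WORK=$(mktemp -d)
trap 'rm -rf "$WORK"' EXIT
export HOME="$WORK" GIT_CONFIG_NOSYSTEM=1
unset XDG_CONFIG_HOME
git config --global user.name "Demo User"
git config --global user.email "demo@example.com"
git config --global init.defaultBranch main
git config --global protocol.file.allow always   # allow local-path submodules

# Stand-in for github.com/user/manuscript, seeded with one commit.
git init -q "$WORK/seed"
printf '%s\n' '\documentclass{article}' > "$WORK/seed/main.tex"
git -C "$WORK/seed" add main.tex
git -C "$WORK/seed" commit -q -m "Initial manuscript"
git clone -q --bare "$WORK/seed" "$WORK/manuscript.git"

# Stand-in for the project repo (with submodule) and its remote.
git init -q "$WORK/my_project"
cd "$WORK/my_project"
git submodule add "$WORK/manuscript.git" manuscript
git commit -q -m "Add manuscript submodule"
git init -q --bare "$WORK/project.git"
git remote add origin "$WORK/project.git"
git push -q origin main

# Option 1: plain clone, then init + update in a single command.
git clone -q "$WORK/project.git" "$WORK/clone1"
git -C "$WORK/clone1" submodule update --init --recursive

# Option 2: let git clone populate the submodules itself.
git clone -q --recurse-submodules "$WORK/project.git" "$WORK/clone2"
```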

Resources