# Inspecting manuscripts in `LaTeX`

This vignette walks through some utilities to inspect manuscripts compiled from `LaTeX` ([https://github.com/lsys/texCheckmate/](https://github.com/lsys/texCheckmate/)). Examples of inspection include retrieving text with hardcoded numbers (so that they can be checked by hand), finding unreferenced labels, and finding dead links. Much of this is adapted directly from Jonathan Dingel's project template ([https://github.com/jdingel/projecttemplate/](https://github.com/jdingel/projecttemplate/)), which includes a `Makefile` to inspect ("review") a manuscript in `LaTeX`.

<!---------------------------------------------------------------------------->
<!---------------------- Repeated strings ---------------------->
## Repeated strings

To find repeated strings (one of the most common examples being "... the the ..."), use extended `grep`:

```bash
$ egrep "\b([a-zA-Z]+) \1\b" paper.tex -n
```
```text
# Example output
243:the the
```
which grabs repeated words and reports the line number (`-n`). This can be adapted to more than just single words (thanks ChatGPT):
```bash
$ egrep "\b(\S+) (\S+) \1 \2\b" paper.tex -n
```
```text
# Example output
246:flag this flag this
```

A simple shell script for this would be:
```bash
#!/bin/bash
# inspecting/repeated-strings.sh
SRC_TEX="paper.tex"

echo "Repeated unigrams:"
egrep "\b([a-zA-Z]+) \1\b" "$SRC_TEX" -n

echo "Repeated bigrams:"
egrep "\b(\S+) (\S+) \1 \2\b" "$SRC_TEX" -n

echo "Repeated trigrams:"
egrep "\b(\S+) (\S+) (\S+) \1 \2 \3\b" "$SRC_TEX" -n
```
and the `Make` recipe would be:
```makefile
# Makefile
REPEATED_STRINGS_SRC = ./inspecting/repeated-strings.sh
.PHONY: repeated_strings
repeated_strings: ## Check for repeated words
    @echo "==> $@"
    @echo "Check for repeated words (e.g. 'the the table shows...')"
    -$(REPEATED_STRINGS_SRC)
```

```{admonition} Regex nuance
:class: note

`[a-zA-Z]` and `\S` are not the same. `\S` includes digits. So a repeated "Fig 3 Fig 3" is flagged with `\S` but not with `[a-zA-Z]`.

```
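The difference is easy to demonstrate on a synthetic line (the sample string here is made up):

```bash
s="see Fig 3 Fig 3 for details"

# \S matches the digit, so the repeated bigram is flagged (prints 1)
echo "$s" | grep -cE '\b(\S+) (\S+) \1 \2\b'

# letters-only misses it because "3" is not in [a-zA-Z] (prints 0)
echo "$s" | grep -cE '\b([a-zA-Z]+) ([a-zA-Z]+) \1 \2\b' || true
```

(`grep -c` exits non-zero when the count is 0, hence the `|| true`.)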

<!---------------------------------------------------------------------------->
<!---------------------- Finding duplicated labels ---------------------->
## Duplicated labels
To check for duplicated labels (`TeX` does warn about this too), e.g., having used the same `\label{tab:summary}` more than once across all the `*.tex` files, `grep` all such labels:
```bash
$ grep -o '\\label{[^}]*}' paper.tex | sort | uniq -cd
```

Unpacking the above:

* `grep -o '\\label{[^}]*}' paper.tex` finds every `\label{...}` in the source.
* `sort` sorts the labels so that repeats become adjacent.
* `uniq` (which assumes sorted input, hence the `sort`) then finds duplicates.
* `-c` prefixes each line with its number of occurrences.
* `-d` prints only duplicated lines, one for each group.
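As a sanity check, the pipeline can be run against a throwaway file (the filename and labels below are made up):

```bash
# Three labels, one of them duplicated
printf '\\label{tab:one}\n\\label{tab:two}\n\\label{tab:one}\n' > /tmp/labels-demo.tex

# sort makes duplicates adjacent; uniq -cd prints each duplicate with its count
grep -o '\\label{[^}]*}' /tmp/labels-demo.tex | sort | uniq -cd
# prints "\label{tab:one}" with a count of 2
```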

A `Make` recipe can then be defined as:
```makefile
# Makefile
.PHONY: duplicated_labels
duplicated_labels: ## Check for duplicated labels
    @echo "==> $@"
    @echo "Check for duplicated labels"
    grep -o '\\label{[^}]*}' paper.tex | sort | uniq -cd
```

<!---------------------------------------------------------------------------->
<!---------------------- Finding unreferenced labels ---------------------->
## Unreferenced labels
To find unreferenced labels in the `tex` files, use `diff` to compare the labels defined by `\label{...}` against the labels used in the various `\*ref*{...}` commands.

A simple shell script would look like:
```bash
#!/bin/bash
# unreferenced-labels.sh
SRC_TEX="paper.tex"

diff \
  --side-by-side \
  --suppress-common-lines \
  <(grep -o --no-filename 'ref{[A-Za-z0-9:_]*}' "$SRC_TEX" | sed 's/ref//' | sort | uniq) \
  <(grep -o --no-filename 'label{[A-Za-z0-9:_]*}' "$SRC_TEX" | sed 's/label//' | sort | uniq) 
```
```text
# Example output
	> {fig:un}
	> {sec:intro}
	> {sm:results}
```
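The moving parts are easier to see on a throwaway file (filename and labels made up): the left side of the `diff` holds referenced labels, the right side holds defined labels, and `--suppress-common-lines` hides labels that appear on both sides.

```bash
# fig:b is referenced but never defined; fig:c is defined but never referenced
printf '\\label{fig:a}\nSee \\ref{fig:a} and \\ref{fig:b}.\n\\label{fig:c}\n' > /tmp/refs-demo.tex

# diff exits non-zero when the sides differ, hence the trailing "|| true"
diff \
  --side-by-side \
  --suppress-common-lines \
  <(grep -o 'ref{[A-Za-z0-9:_]*}' /tmp/refs-demo.tex | sed 's/ref//' | sort | uniq) \
  <(grep -o 'label{[A-Za-z0-9:_]*}' /tmp/refs-demo.tex | sed 's/label//' | sort | uniq) || true
```

Only `{fig:b}` (left) and `{fig:c}` (right) are printed; `{fig:a}` is suppressed because it is both defined and referenced.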
A `Make` recipe would look like:
```makefile
# Makefile
.PHONY: unreferenced_labels
unreferenced_labels: ## Check for label referencing
    @echo "==> $@"
    @echo "Check for unreferenced labels"
    ./inspecting/unreferenced-labels.sh
```


<!---------------------------------------------------------------------------->
<!---------------------- Hardcoded numbers ---------------------->
## Hardcoded numbers
One of the things to check in a manuscript is whether hardcoded numbers are correct (e.g., "...sample size of 42..."). A quick way to grab all lines of a `tex` file that contain some hardcoded number is to use `grep` (again).

For example, a simple shell script would be:
```bash
#!/bin/bash
# hardcoded-numbers.sh
cat paper.tex | 
  sed -n '/\\begin{document}/,$p' |  # Skip lines until \begin{document}
  sed '/\\begin{table}/,/\\end{table}/d' | # Remove lines within table environments
  sed 's/\\cite{[A-Za-z0-9:,\-]*}//g' | # Drop citations with \cite{} that may contain numbers
  grep '[0-9]'
```
which excludes lines before `\begin{document}`, lines within `table` environments, and lines where numbers come only from the `\cite{}` command. The last line `grep '[0-9]'` then extracts lines with numbers and prints them.
```text
# Example output
Hello world! The tex also, contains error of \emph{gramar and and spelling}. or use this text too see an few of of the
problems that LanguageTool can detecd. What do you thinks of grammar checkers? Please not that they are not perfect. Style
issues get a blue marker: It's 5 P.M. in the afternoon. LanguageTool 3.8 was released on Thursday, 27 June 2017.
Here,I forget to put a space after a comma. I refer to \ref{fig:thisfig1} in the text, but I do not refer to the second one. \\
const seedUrl = "http://127.0.0.1:8080/";
\email{asdfgh.zxcvbnm@123.12345678.ca}
```
using the `example.tex` from [textidote](#textidote-grammar-and-latex-checker). Amongst other numbers, I can then, for example, check that the "5 P.M." is correct or that the `LanguageTool` version is correct, and if not, make the necessary amendments.
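The preamble-skipping and citation-stripping steps can be sanity-checked on a throwaway file (contents made up):

```bash
printf 'Draft v2, 2024\n\\begin{document}\nSample size of 42 \\cite{smith2020}.\nNo numbers here.\n' > /tmp/nums-demo.tex

sed -n '/\\begin{document}/,$p' /tmp/nums-demo.tex |  # drops the preamble line ("Draft v2, 2024")
  sed 's/\\cite{[A-Za-z0-9:,\-]*}//g' |               # strips the \cite key (which contains "2020")
  grep '[0-9]'
# Only "Sample size of 42 ." survives
```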

The `hardcoded-numbers.sh` (fully adapted to omit lines from certain environments at [the end](#all-together)) can be called with a `Make` recipe:
```makefile
# Makefile
.PHONY: hardcodednumbers
hardcodednumbers: ## Find hardcoded numbers
    @echo "==> $@"
    ./inspecting/hardcoded-numbers.sh	
```

<!---------------------------------------------------------------------------->
<!---------------------- Acronyms ---------------------->
## Acronyms
A shell one-liner to find all the acronyms in a file, along with a tally of how often each is used, is:
```bash
$ cat *.tex | grep -wo "[A-Z]\{2,10\}" | sort | uniq -c | sort -gr
```
* `grep -wo "[A-Z]\{2,10\}"` matches whole upper-case words (`-w`) between 2 and 10 characters long
* `sort | uniq -c` sorts the results (as in [Duplicated labels](#duplicated-labels)) before prefixing each with its count (`-c`)
* `sort -gr` then sorts the tally numerically (`-g`) in descending order (`-r`)
```text
# Example output
    167 PFAS
    123 PFBS
     59 GUSTO
     30 SD
     ...
```
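A quick sanity check of the pipeline on made-up input (written here with a single BRE interval, `[A-Z]\{2,10\}`, for the 2-to-10-character match):

```bash
# Only runs of 2-10 capitals count as acronyms; the lone "X" is excluded by the interval
printf 'PFAS and more PFAS, plus SD and a stray X\n' |
  grep -wo '[A-Z]\{2,10\}' | sort | uniq -c | sort -gr
# 2 PFAS, then 1 SD
```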
This can be adapted to exclude certain parts of a `tex` file (as in [Hardcoded numbers](#hardcoded-numbers)):
```bash
#!/bin/bash
# acronyms.sh
cat paper.tex | 
  sed -n '/\\begin{document}/,$p' |  # Skip lines until \begin{document}
  sed '/\\begin{table}/,/\\end{table}/d' | # Remove lines within table environments
  sed 's/\\cite{[A-Za-z0-9:,\-]*}//g' | # Drop citations with \cite{} that may contain acronyms
  grep -wo "[A-Z]\{2,10\}" | sort | uniq -c | sort -gr
```
which can then be called from the `Makefile`:
```makefile
# Makefile
.PHONY: acronyms
acronyms: ## Find and tally acronyms
    @echo "==> $@"
    ./inspecting/acronyms.sh
```

**Resource:**

* [https://doofussoftware.blogspot.com/2012/09/a-linux-one-liner-to-find-all-acronyms.html](https://doofussoftware.blogspot.com/2012/09/a-linux-one-liner-to-find-all-acronyms.html)


<!---------------------------------------------------------------------------->
<!---------------------- Check URLs ---------------------->
## Check URLs
Another thing to check is that the URLs referenced in the manuscripts are live and working. This can be done by combining the use of `grep` to find all URLs specified in the `\url{}` and `\href{}{}` commands before using `wget` to check that the link works:
```bash
#!/bin/bash
# urls.sh
# Extract URLs from \href{}{}, un-escaping any \# back to #
HREF_LIST=$(grep -o 'href{[A-Za-z0-9:/\._?#=]*}' paper.tex | sed 's/.*href{//' | sed 's/}//;s/\\#/#/')
for URL in $HREF_LIST; 
do
  wget --spider --no-verbose "$URL"
done
```
to get output that looks like:
```text
2023-09-29 14:42:49 URL: https://en.wikipedia.org/wiki/ 200 OK
2023-09-29 14:42:49 URL: https://developers.google.com/maps/documentation/javascript 200 OK
wget: unable to resolve host address ‘www.this-should-not-work.com’
...
...
```
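The extraction step can be checked on its own, without any network calls; the file and URLs below are made up:

```bash
printf '\\href{https://example.com/docs}{the docs} and \\url{https://example.org}\n' > /tmp/urls-demo.tex

# URLs inside \href{}{}
grep -o 'href{[A-Za-z0-9:/\._?#=]*}' /tmp/urls-demo.tex | sed 's/.*href{//' | sed 's/}//;s/\\#/#/'
# -> https://example.com/docs

# URLs inside \url{}
grep -o '\\url{[^}]*}' /tmp/urls-demo.tex | sed 's/\\url{//;s/}//;s/\\#/#/g'
# -> https://example.org
```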

```{admonition} fake.com
:class: attention

I tried using `www.fake.com` as an example of a failure, but apparently that's a real website.

```

The `Make` recipe would then be:
```makefile
# Makefile
.PHONY: checkURLs
checkURLs:
    @echo "==> $@"
    ./inspecting/urls.sh
```

<!---------------------------------------------------------------------------->
<!---------------------- Counting words ---------------------->
## Wordcount

To count words, `TeXcount` comes in handy. The basic `TeXcount` command-line syntax is:

```bash
$ texcount paper.tex [opts]
```
```text
# Example output
File: paper.tex
Encoding: utf8
Words in text: 4382
Words in headers: 47
Words outside text (captions, etc.): 266
Number of headers: 20
Number of floats/tables/figures: 6
Number of math inlines: 32
Number of math displayed: 3
Subcounts:
  text+headers+captions (#headers/#floats/#inlines/#displayed)
  1+0+3 (1/0/0/0) _top_
  442+1+0 (1/0/0/0) Section: Introduction}\label{sec:intro
  ...
  ...
```

A simple `.sh` file can be made to log the word count. The `-nobib` option excludes the bibliography and `-quiet` suppresses error messages.
```bash
#!/bin/bash
# wordcount.sh
DETAILED_OUTPUT_FILE="inspecting/wordcount-detailed.txt"
STORE=$(texcount \
  -nobib \
  -quiet \
  paper.tex
)
echo "$STORE" > "$DETAILED_OUTPUT_FILE"
```
which can then be called from a Makefile:
```makefile
# Makefile
.PHONY: wordcount
wordcount: ## Wordcount via texcount
    @echo "==> $@"
    @echo "Check word count using texcount"
    ./inspecting/wordcount.sh
```

```{admonition} Ignore text count from within tex
:class: note

Within the `tex` file(s), use `%TC:ignore` to start ignoring text and end it with `%TC:endignore`.

```
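A minimal sketch of how the markers sit in a `tex` file (the section text here is made up):

```latex
\section{Results}
This paragraph is counted.
%TC:ignore
Notes to self and draft fragments that should not inflate the count.
%TC:endignore
This paragraph is counted again.
```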

```{admonition} Commented text is ignored
:class: caution

Apparently `TeXcount` intelligently ignores text (including notes to self) that is commented out in the `*.tex` files.
```

**Resources:**

* [Texcount document](https://app.uio.no/ifi/texcount/download.php?doc=QuickReference_3_2.pdf)

<!---------------------------------------------------------------------------->
<!---------------------- Textidote ---------------------->
## Textidote: Grammar and `LaTeX` checker
A helpful tool to check both `LaTeX` and language (using [LanguageTool](https://languagetool.org/)) is [textidote](https://github.com/sylvainhalle/textidote). `textidote` can take a pre-specified dictionary of words to ignore (via `--dict path_to_dict`) and output an HTML file with the detected errors.

```bash
$ textidote --check en paper.tex
```
![[https://github.com/sylvainhalle/textidote](https://github.com/sylvainhalle/textidote)](textidote.png)
(*Source:* [https://github.com/PatrBal/vim-textidote](https://github.com/PatrBal/vim-textidote))

```{admonition} Installation
:class: tip

I had issues installing `textidote`. But it turns out it's available via [`brew`](https://formulae.brew.sh/formula/textidote) (although undocumented).

```

The `Make` recipe would look something like:
```makefile
# Makefile
.PHONY: textidote
textidote: ./inspecting/textidote_dict.txt
    @echo "==> $@"
    @echo "Check doc with textidote"
    textidote --check en --dict $< --output html paper.tex > inspecting/textidote.html
```

which outputs the results like in the image above to `inspecting/textidote.html`. It looks like the distinct colors indicate types of error but the colors cannot be configured.

![[https://sylvainhalle.github.io/textidote/](https://sylvainhalle.github.io/textidote/)](textidote2.png)

(Source: [https://sylvainhalle.github.io/textidote/](https://sylvainhalle.github.io/textidote/))

Color meanings:

* <span style="color:red">Red</span>: Spelling errors
* <span style="color:orange">Orange</span>: Grammar errors
* <span style="color:yellow">Yellow</span>: `TeX` syntax style suggestions


**Resources**

* [https://github.com/sylvainhalle/textidote](https://github.com/sylvainhalle/textidote)
* [https://formulae.brew.sh/formula/textidote](https://formulae.brew.sh/formula/textidote)


<!---------------------------------------------------------------------------->
<!---------------------- Due to ---------------------->
## Bonus: "Due to"
<!-- The first time I read about the (still contentious) debate about "due to"s in writing is from the classic (and controversial) `Elements of Style` by Strunk and White. For this and other reasons, I develop a distaste for *bureaucratese*, including phrases like "due to" <s>due to</s> because of some crazy bureaucratic fetish for brevity (including the use of acronyms! and the omission of "the" in writing?! Hello? dude, where is *the* determiner?!) at some soulless attempt at administrative efficiency. YOUR "EFFICIENCY" IS AN OPTICAL ILLUSION. So, `grep` is (again) useful to find all lines containing "due to" in the text. -->
All it takes to find *bureaucratese* like "due to" (`Elements of Style` by Strunk and White) is:
```bash
$ grep -n 'due to' paper.tex
```
which spits out lines containing "due to" that can then be nuked. 
<!-- (This is after understanding that alternatives are longer---e.g., "because of" is longer than "due to"---and so brevity suffers.)  -->
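The same idea extends to a small blacklist of phrases with `grep -E` (alternation) and `-i` (so a "Due to" at a sentence start is caught too); the phrase list and file below are just an illustration:

```bash
printf 'This is due to X.\nIn order to proceed, sign here.\nA fine line.\n' > /tmp/phrases-demo.tex

grep -inE 'due to|in order to' /tmp/phrases-demo.tex
# 1:This is due to X.
# 2:In order to proceed, sign here.
```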

The corresponding `Make` recipe is:
```makefile
# Makefile
.PHONY: dueto
dueto: ## Find "due to"s; Did you mean "because of", "owing to", or "from"?
    @echo "==> $@"
    @echo "Find all the 'due to's in writing"
    grep -n 'due to' paper.tex
```

**Resources**

* Strunk, William, and E. B. White. The Elements of Style. 4th ed., Pearson, 2000.

<!---------------------------------------------------------------------------->
<!---------------------- End note ---------------------->
## All together

All together, the directory containing the `tex` files, the `Makefile`, and the shell scripts would look something like:
```console
.
├── inspecting
│  ├── logs
│  ├── acronyms.sh
│  ├── ay2numeric.sh
│  ├── hardcoded-numbers.sh
│  ├── linkchecker.sh
│  ├── numeric2ay.sh
│  ├── repeated-strings.sh
│  ├── textidote_dict.txt
│  ├── unreferenced-labels.sh
│  └── wordcount.sh
├── Makefile
├── paper.tex
├── paper.pdf
└── references.bib
```

`./inspecting/logs` contains the log files for reference. `ay2numeric.sh` and `numeric2ay.sh` swap citations between author-year and numeric styles.

*Click to expand and see the full `.sh` scripts ([https://github.com/lsys/texCheckmate/](https://github.com/lsys/texCheckmate/)) that log the output:*

````{dropdown} acronyms.sh
```bash
#!/bin/bash

SRC_TEX="paper.tex"
LOG_DIR="./inspecting/logs"
LOG_FILE="$LOG_DIR/acronyms.log"

# Ensure the log directory exists
mkdir -p "$LOG_DIR"

# Overwrite the log file at the start
> "$LOG_FILE"

# Function to process files, renamed to "acronyms"
acronyms() {
  local file="$1"
  
  # Check if file exists
  if [[ ! -f "$file" ]]; then
    echo "Error: $file not found" | tee -a "$LOG_FILE"
    return 1
  fi

  echo "Checking file: $file" | tee -a "$LOG_FILE"
  
  # Output file marker
  echo '===' "$file" '===' | tee -a "$LOG_FILE"
  
  cat "$file" | 
  # Process the file content and log results
  # =======================================================================
  # Ignore lines relating to environment
  sed -n '/\\begin{document}/,$p' |  # Skip lines until \begin{document}
  grep -Ev '^\\newcommand' | # Ignore lines starting with \newcommand (e.g., \newcommand{\nc}{} )
  grep -Ev '^\\setcounter' | # Ignore lines starting with \setcounter
  grep -Ev '^\\vspace' | 
  grep -Ev '^\\hspace' | 
  grep -Ev '^\\usepackage' | 
  grep -Ev '^\\setlength' | 
  # =======================================================================
  # Ignore math, tables, figures
  sed '/begin{equation}/,/end{equation}/d' | sed '/begin{equation\*}/,/end{equation\*}/d' | sed '/begin{align}/,/end{align}/d' | sed '/begin{align\*}/,/end{align\*}/d' | # Remove equation environments
  sed 's/\\input{[A-Za-z0-9_\/\.]*}//g' | # Drop input files that might contain numbers
  grep -v 'includegraphics' | # Drop lines that are graphics filepaths or numbers setting the figure size
  sed 's/[0-9\.]*\\textwidth//g' |
  sed 's/\$[A-Za-z0-9+=_{}\ ]*\$//g' |  # Drop inline equations that may contain numbers
  sed '/\\begin{figure}/,/\\end{figure}/d' | # Remove lines within figure environments
  sed '/\\begin{table}/,/\\end{table}/d' | # Remove lines within table environments
  # =======================================================================
  # Ignore commented-out lines
  sed '/\\iffalse/,/\\fi/d' | # Remove lines between \iffalse and \fi
  sed '/^%/d' | # Remove lines beginning with %
  # =======================================================================
  # Ignore citations
  sed 's/\\cite{[A-Za-z0-9:,\-]*}//g' | # Drop citations with \cite{} that may contain numbers
  sed 's/\\citealt{[^}]*}//g' |  # Drop citations with \citealt{} that may contain numbers
  sed 's/\\citep{[^}]*}//g' |  # This seems to work for \citep and not line 16..
  sed 's/\\citet{[^}]*}//g' |  # This seems to work for \citep and not line 16..
  # =======================================================================
  grep -wo "[A-Z]\{2,20\}" | sort | uniq -c | sort -gr |
  tee -a "$LOG_FILE"  # Log results to file
}

# Main loop to find and process all files named $SRC_TEX
find . -name "$SRC_TEX" | while IFS= read -r file; do
  acronyms "$file"
done
```
````

````{dropdown} hardcoded-numbers.sh
```bash
#!/bin/bash
#This script allow you to review hardcoded numbers appearing in the manuscript
#From https://github.com/jdingel/projecttemplate/blob/master/paper/reviewing/hardcodednumbers.sh
SRC_TEX="paper.tex"
LOG_DIR="./inspecting/logs"
LOG_FILE="$LOG_DIR/hardcoded-numbers.log"

mkdir -p "$LOG_DIR"

for file in $(find . -name "$SRC_TEX")
do
    echo '===' "${file}" '==='
    cat "${file}" | 
    # =======================================================================
    # Ignore lines relating to environment
    sed -n '/\\begin{document}/,$p' |  # Skip lines until \begin{document}
    grep -Ev '^\\newcommand' | # Ignore lines starting with \newcommand (e.g., \newcommand{\nc}{})
    grep -Ev '^\\setcounter' | # Ignore lines starting with \setcounter
    grep -Ev '^\\vspace' | 
    grep -Ev '^\\hspace' | 
    grep -Ev '^\\usepackage' | 
    grep -Ev '^\\setlength' | 
    # =======================================================================
    # Ignore math, tables, figures
    sed '/begin{equation}/,/end{equation}/d' | sed '/begin{equation\*}/,/end{equation\*}/d' | sed '/begin{align}/,/end{align}/d' | sed '/begin{align\*}/,/end{align\*}/d' | #Remove equation environments
    sed 's/\\input{[A-Za-z0-9_\/\.]*}//g' | # Drop input files that might contain numbers
    grep -v 'includegraphics' | #Drop lines that are graphics filepaths or numbers setting the figure size
    sed 's/[0-9\.]*\\textwidth//g' |
    sed 's/\$[A-Za-z0-9+=_{}\ ]*\$//g' |  #Drop inline equations that may contain numbers
    sed '/\\begin{figure}/,/\\end{figure}/d' | # Remove lines within figure environments
    sed '/\\begin{table}/,/\\end{table}/d' | # Remove lines within table environments
    # =======================================================================
    # Ignore commented-out lines
    sed '/\\iffalse/,/\\fi/d' | # Remove lines between \iffalse and \fi
    sed '/^%/d' | # Remove lines beginning with %
    # =======================================================================
    # Ignore internal links
    sed 's/\\cref{[^}]*}//g' |  # Ignore numbers from \cref{}
    sed 's/\\ref{[^}]*}//g' |  # Ignore numbers from \ref{}
    sed 's/\\nameref{[^}]*}//g' |  # Ignore numbers from \nameref{}
    sed 's/\\crefrange{[^}]*}//g' |  # Ignore numbers from \nameref{}
    # =======================================================================
    # Ignore citations
    sed 's/\\cite{[A-Za-z0-9:,\-]*}//g' | #Drop citations with \cite{} that may contain numbers
    sed 's/\\citealt{[^}]*}//g' |  #Drop citations with \citealt{} that may contain numbers
    sed 's/\\citep{[^}]*}//g' |  # This seems to work for \citep and not line 16..
    sed 's/\\citet{[^}]*}//g' |  # This seems to work for \citep and not line 16..
    sed 's/\\cite{[^}]*}//g' |  
    grep '[0-9]' -n 
done > "$LOG_FILE"

cat "$LOG_FILE"
```
````


````{dropdown} linkchecker.sh
```bash
#!/bin/bash
SRC_TEX="paper.tex"
LOG_DIR="./inspecting/logs"
LOG_FILE="$LOG_DIR/linkchecker.log"

# Ensure the log directory exists
mkdir -p "$LOG_DIR"

# Overwrite the log file at the beginning
> "$LOG_FILE"

# Log and output both stdout and stderr
{
    echo "---> Retrieving URL(s) in \href{}{}:"
    # This one reverses the escaping of \# in the URL
    HREF_LIST=$(grep -o 'href{[A-Za-z0-9:/\._?#=]*}' "$SRC_TEX" | sed 's/.*href{//' | sed 's/}//;s/\\#/#/')

    echo "---> Retrieving URL(s) in \url{}:"
    URL_LIST=$(grep -o '\\url{[^}]*}' "$SRC_TEX" | sed 's/\\url{//;s/}//;s/\\#/#/g')

    echo "---> Retrieving all other URLs"
    GLOBAL_LIST=$(cat "$SRC_TEX" | egrep -o "(http|https)://[a-zA-Z0-9./?=_%:-]*" | sort -u)

    ALL_URLS=$(echo "$HREF_LIST $URL_LIST $GLOBAL_LIST" | tr ' ' '\n' | sort -u)
    for URL in $ALL_URLS; 
    do
        echo "Checking URL: $URL"
        wget --spider --no-verbose "$URL"
    done
} 2>&1 | tee -a "$LOG_FILE"
```
````

````{dropdown} repeated-strings.sh
```bash
#!/bin/bash
SRC_TEX="paper.tex"
LOG_DIR="./inspecting/logs"
LOG_FILE="$LOG_DIR/repeated-strings.log"

# Overwrite the log file at the beginning
> "$LOG_FILE"

{
    echo "Repeated unigrams:"
    egrep "\b([a-zA-Z]+) \1\b" "$SRC_TEX" -n

    echo "Repeated bigrams:"
    egrep "\b(\S+) (\S+) \1 \2\b" "$SRC_TEX" -n

    echo "Repeated trigrams:"
    egrep "\b(\S+) (\S+) (\S+) \1 \2 \3\b" "$SRC_TEX" -n
} 2>&1 | tee -a "$LOG_FILE"
```
````


````{dropdown} unreferenced-labels.sh
```bash
#!/bin/bash
# From https://github.com/jdingel/projecttemplate/blob/master/paper/reviewing/Makefile
SRC_TEX="paper.tex"
LOG_DIR="./inspecting/logs"
LOG_FILE="$LOG_DIR/unreferenced-labels.log"

diff \
  --side-by-side \
  --suppress-common-lines \
  <(grep -o --no-filename 'ref{[A-Za-z0-9:_]*}' "$SRC_TEX" | sed 's/ref//' | sort | uniq) \
  <(grep -o --no-filename 'label{[A-Za-z0-9:_]*}' "$SRC_TEX" | sed 's/label//' | sort | uniq) |
  tee "$LOG_FILE"
```
````


````{dropdown} wordcount.sh
```bash
#!/bin/bash
SRC_TEX="paper.tex"
LOG_DIR="./inspecting/logs"

# -total: Do not give sums per file, only total sum.
# -nobib: Do not include bibliography in count (default).
# -sum:   Make sum of all word and equation counts. May also use
#         -sum=#[,#] with up to 7 numbers to indicate how each of the
#         counts (text words, header words, caption words, #headers,
#         #floats, #inlined formulae, #displayed formulae) are summed.
#         The default sum (if only -sum is used) is the same as
#         -sum=1,1,1,0,0,1,1.
# -sub:   Generate subcounts. Option values are none, part, chapter,
#         section or subsection. Default (-sub) is set to subsection,
#         whereas unset is none. (Alternative option name is -subcount.)
# -inc:   Parse included TeX files (as separate files).
# -quiet: Quiet mode, no error messages. Use is discouraged!
TEXCOUNT_OUTPUT=$(texcount \
    -total \
    -nobib \
    -sum \
    -sub \
    -inc \
    -quiet \
    "$SRC_TEX"
)

echo "Word count report for $SRC_TEX: $TEXCOUNT_OUTPUT"

# Store total word count
OUTPUT_FILE="$LOG_DIR/wordcount.txt"

STORE=$(texcount \
    -1 \
    -nobib \
    -nosum \
    -quiet \
    "$SRC_TEX")

echo "$STORE" > "$OUTPUT_FILE"

# Store detailed word count
DETAILED_OUTPUT_FILE="$LOG_DIR/wordcount-detailed.txt"

STORE=$(texcount \
    -nobib \
    -quiet \
    "$SRC_TEX")

echo "$STORE" > "$DETAILED_OUTPUT_FILE"
```
````

The `Makefile` with all the recipes would be

````{dropdown} Makefile
```makefile
# .SUFFIXES: .pdf .tex
SRC_TEX := paper.tex
.DEFAULT_GOAL := help

#==============================================================================
# Inspecting
#==============================================================================
WORDCOUNT_SCR := ./inspecting/wordcount.sh
.PHONY: wordcount
wordcount: ## Wordcount via texcount
    @echo "==> $@"
    @echo "Check word count using texcount"
    dos2unix $(WORDCOUNT_SCR)
    chmod +x $(WORDCOUNT_SCR)
    $(WORDCOUNT_SCR)

ACRONYMS_SRC = ./inspecting/acronyms.sh
.PHONY: acronyms
acronyms: ## Find and tally acronyms
    @echo "==> $@"
    @echo "Check for acronyms"
    dos2unix $(ACRONYMS_SRC)
    chmod +x $(ACRONYMS_SRC)
    $(ACRONYMS_SRC)

.PHONY: dueto
dueto: ## Find "due to"s; Did you mean "because of", "owing to", or "from"?
    @echo "==> $@"
    @echo "Find all the 'due to's in writing"
    grep -n "due to" $(SRC_TEX) > inspecting/logs/duetos.log

.PHONY: duplicated_labels
duplicated_labels: ## Check for duplicated labels
    @echo "==> $@"
    @echo "Check for duplicated labels"
    grep -o '\\label{[^}]*}' $(SRC_TEX) | sort | uniq -cd | tee ./inspecting/logs/duplicated-labels.log

HARDCODEDNUMBERS_SRC = ./inspecting/hardcoded-numbers.sh
.PHONY: hardcodednumbers
hardcodednumbers: ## Find hardcoded numbers
    @echo "==> $@"
    @echo "Check for hardcoded numbers"
    dos2unix $(HARDCODEDNUMBERS_SRC)
    chmod +x $(HARDCODEDNUMBERS_SRC)
    $(HARDCODEDNUMBERS_SRC)

LINKCHECKER_SRC = ./inspecting/linkchecker.sh
.PHONY: linkchecker
linkchecker: ## Check URLs
    @echo "==> $@"
    @echo "Check that URLs work"
    dos2unix $(LINKCHECKER_SRC)
    chmod +x $(LINKCHECKER_SRC)
    $(LINKCHECKER_SRC)

REPEATED_STRINGS_SRC = ./inspecting/repeated-strings.sh
.PHONY: repeated_strings
repeated_strings: ## Check for repeated words
    @echo "==> $@"
    @echo "Check for repeated words (e.g. 'the the table shows...')"
    dos2unix $(REPEATED_STRINGS_SRC)
    chmod +x $(REPEATED_STRINGS_SRC)
    -$(REPEATED_STRINGS_SRC)

.PHONY: textidote
textidote: ## Check with textidote
textidote: ./inspecting/textidote_dict.txt
    @echo "==> $@"
    @echo "Check doc with textidote"
    -textidote --check en --dict $< --output html $(SRC_TEX) > inspecting/logs/textidote.html

UNREFERENCED_LABELS_SRC = ./inspecting/unreferenced-labels.sh
.PHONY: unreferenced_labels
unreferenced_labels: ## Check for label referencing
    @echo "==> $@"
    @echo "Check for unreferenced labels"
    dos2unix $(UNREFERENCED_LABELS_SRC)
    chmod +x $(UNREFERENCED_LABELS_SRC)
    -$(UNREFERENCED_LABELS_SRC)

.PHONY: inspect
inspect: ## Do all inspections of manuscript
inspect: duplicated_labels \
         repeated_strings \
         unreferenced_labels \
         hardcodednumbers \
         acronyms \
         linkchecker \
         textidote \
         dueto \
         wordcount


#==============================================================================
# Additional utilities
#==============================================================================
AY2NUMERIC_SRC = ./inspecting/ay2numeric.sh
.PHONY: aynumeric
aynumeric: ## Change author-year to numeric citation
    @echo "==> $@"
    @echo "Change author-year to numeric citations"
    dos2unix $(AY2NUMERIC_SRC)
    chmod +x $(AY2NUMERIC_SRC)
    $(AY2NUMERIC_SRC)


NUMERICAY_SRC = ./inspecting/numeric2ay.sh
.PHONY: numericay
numericay: ## Change author-year to numeric citation
    @echo "==> $@"
    @echo "Change numeric to author-year citations"
    dos2unix $(NUMERICAY_SRC)
    chmod +x $(NUMERICAY_SRC)
    $(NUMERICAY_SRC)


#==============================================================================
# Help
#==============================================================================
.PHONY: help
help: ## Show this help message and exit    
    @grep -E '^[a-zA-Z_-]+:.*?## .*$$' $(MAKEFILE_LIST) | sort | awk 'BEGIN {FS = ":.*?## "}; {printf "\033[36m%-16s\033[0m %s\n", $$1, $$2}'
```
````

Then, to run all inspections, just do:
```bash
$ make inspect
```

Some shell scripts, like the `diff` used to check unreferenced labels, can return non-`0` status codes and might (annoyingly) turn up as a `Make` error (`make: *** [Makefile:...] Error 1`) even if nothing is actually wrong.
```text
# Example output
...
...
==> unreferenced_labels
Check for unreferenced labels
dos2unix ./inspecting/unreferenced-labels.sh
dos2unix: converting file ./inspecting/unreferenced-labels.sh to Unix format...
./inspecting/unreferenced-labels.sh
    ...
    ...
make: *** [Makefile:93: unreferenced_labels] Error 1
```
To make `Make` ignore this, run `make inspect -k` (the `-k` flag tells `Make` to keep going instead of stopping at the first failure):

```bash
$ make inspect -k
```

Or, add a `-` before any command that may return a non-zero status code (e.g., `-./inspecting/unreferenced-labels.sh` instead of `./inspecting/unreferenced-labels.sh`), which tells `Make` to ignore that command's exit status.
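A minimal illustration of the `-` prefix (the target and commands here are hypothetical):

```makefile
# Makefile (illustration only)
.PHONY: demo
demo:
    -exit 1                     # leading '-': Make logs the failure but carries on
    @echo "this line still runs"
```

Running `make -k` is the coarser version of the same idea: it keeps going across *targets* even when a recipe fails outright.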

## Summary
And this, in summary, is how to procrastinate on submitting a manuscript.

<br>
<a href="https://www.lucasshen.com">
  <img src="../homepage.png" alt="Home" style="width: 25px; height: 25px;"/>
</a>
<strong>Back to <a href="https://www.lucasshen.com">homepage</a>.</strong><br><br>

<a href="https://www.lucasshen.com/notes">
  <img src="../writing.png" alt="Notes" style="width: 25px; height: 25px;"/>
</a>
<strong>See more <a href="https://www.lucasshen.com/notes">notes</a>.</strong><br><br>
