Some R-related gists

Mar 5, 2019

pdf-merge-text.r

A common problem when scanning with a single-sided scanner and a document feeder is ending up with two PDFs: one containing the front of each page and one containing the back. This script writes LaTeX that combines those two PDFs, so that you end up with front1, back1, front2, back2, etc.

The resulting pdfmergetex.tex file looks like the following:

\includepdf[pages=1]{file1.pdf}
\includepdf[pages=12]{file2.pdf}
\includepdf[pages=2]{file1.pdf}
\includepdf[pages=11]{file2.pdf}

which may be input to a LaTeX document like so:

\documentclass[12pt,letterpaper]{article}
\usepackage[utf8]{inputenc}
\usepackage{graphicx}
\usepackage[left=1in,right=1in,top=1in,bottom=1in]{geometry}
\usepackage{pdfpages}
\begin{document}

\input{pdfmergetex.tex} % Input the .tex generated by R here

\end{document}

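When compiled, this document should produce a single, properly collated PDF. The example call below uses reverse = TRUE, which suits backs that were scanned last page first. If the "backs" come out in reverse order from what you expected, then try reverse = FALSE (the default) in the inclpdf call.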

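# Write \includepdf lines that interleave the pages of two PDFs.
# fronts, backs: file names of the front-side and back-side scans
# npages:        number of pages in each PDF
# reverse:       TRUE if the backs were scanned last page first
inclpdf <- function(fronts, backs, npages, reverse = FALSE){
  # seq_len() avoids the 1:0 trap when npages is 0
  for (i in seq_len(npages)){
    # Page of the backs PDF that belongs behind front page i
    j <- if (reverse) npages - i + 1 else i
    cat("\\includepdf[pages=", i, "]{", fronts, "}\n",
        "\\includepdf[pages=", j, "]{", backs, "}\n",
        sep = "")
  }
}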

# Redirect console output to the .tex file, write the lines, then restore output
sink("C:/OCR/pdfmergetex.tex")
inclpdf("file1.pdf", "file2.pdf", npages = 12, reverse = TRUE)
sink()
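
As an alternative to redirecting console output with sink(), the same lines could be built as a character vector and written out with writeLines(). The following is only a sketch under that assumption: the function name collate_lines and the commented-out output path are hypothetical and not part of the original script.

```r
# Hypothetical helper (not in the original gist): build the \includepdf
# lines as a character vector instead of printing them with cat().
collate_lines <- function(fronts, backs, npages, reverse = FALSE) {
  front_pages <- seq_len(npages)
  # With reverse = TRUE the backs are read last page first.
  back_pages <- if (reverse) rev(front_pages) else front_pages
  # rbind() + as.vector() interleaves: front1, back1, front2, back2, ...
  as.vector(rbind(
    sprintf("\\includepdf[pages=%d]{%s}", front_pages, fronts),
    sprintf("\\includepdf[pages=%d]{%s}", back_pages, backs)
  ))
}

tex_lines <- collate_lines("file1.pdf", "file2.pdf", npages = 12, reverse = TRUE)
head(tex_lines, 4)

# Assumed output location; adjust as needed.
# writeLines(tex_lines, "pdfmergetex.tex")
```

Returning the lines rather than printing them makes it easy to inspect the collation order before anything is written to disk.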

image-greyer.r

For every file ending in .png, .jpg, or .jpeg (and not beginning with blank-) in the current working directory, the script will generate a grey image of the same dimensions, annotated with the image's file name. The resulting images are named blank-<file name>. For example, if the original image name is e.png, then the script will create a grey placeholder named blank-e.png. The script may be useful for generating lightweight placeholders for images in other documents, such as a LaTeX article.

In the preamble of a LaTeX document, one could put

\usepackage{etoolbox}
\newtoggle{blank_images}
\toggletrue{blank_images}
%\togglefalse{blank_images}

and in the body where one wishes to include the image

\iftoggle{blank_images}{%
	\includegraphics[scale=1]{blank-e.png}
}{%
	\includegraphics[scale=1]{e.png}
}

As written, the above document will use the blank version blank-e.png. If \togglefalse{blank_images} is uncommented, then it will use the original e.png.

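library(magick)

# Image files in the working directory: the extension must end the name,
# and names that start with "blank-" (existing placeholders) are skipped
imagefiles <- list.files(pattern = "\\.(png|jpe?g)$")
imagefiles <- imagefiles[!grepl("^blank-", imagefiles)]

for (imgname in imagefiles){
  img <- image_read(imgname)
  # Flood-fill with grey; fuzz = 100 lets the fill spread across all colours
  img <- image_fill(img, "lightgray", fuzz = 100)
  # Label the placeholder with the original file name
  img <- image_annotate(img, imgname)

  image_write(img, path = paste0("blank-", imgname))
}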
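
The file selection rule described above (names ending in .png, .jpg, or .jpeg, and not beginning with blank-) can be checked in isolation, without reading any images. This is a minimal sketch: the helper name select_images and the file names are made up for illustration and are not part of the original script.

```r
# Hypothetical helper: keep image files that are not already placeholders.
select_images <- function(files) {
  is_image <- grepl("\\.(png|jpe?g)$", files)  # extension at the end of the name
  is_blank <- grepl("^blank-", files)          # placeholder prefix at the start
  files[is_image & !is_blank]
}

# Made-up file names for illustration.
candidates <- c("e.png", "blank-e.png", "plot.jpeg",
                "notes.png.txt", "not-blank-f.jpg")
selected <- select_images(candidates)
selected
```

Anchoring the patterns with $ and ^ matters here: without them, a name such as notes.png.txt would be picked up, and not-blank-f.jpg would be skipped.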

npp-r-function-parser.xml

Notepad++ has a Function List Panel which, by default, lists the functions defined in the current document for common languages, including C and Python. This Gist provides a simple parser to provide the same functionality for R.

functionList.xml, found in %APPDATA%\Notepad++\ or in the Notepad++ install directory, must be edited to include the Gist code.

First, add the association in <associationMap>...</associationMap> next to the similar-looking lines:

<association langID="54" id="r_function"/>

Second, add the parser code from the Gist (lines 5 through 13) in <parsers>...</parsers>. Again, one should be able to infer the location based on the other parsers defined in the file.

The parser is quite simplistic and will not capture every possible way of defining an R function. However, it should capture the most common syntax. It may also include unwanted leading whitespace in the function's name in the Function List.

<!--Put this in associationMap-->
<association langID="54" id="r_function"/>

<!--Put this in parsers-->
<parser id="r_function" displayName="R function" commentExpr="(#.*?$)">
  <function
    mainExpr="(^\s*[a-zA-Z0-9_\.\$]+[\s&lt;\-=]+function\s*\()"
    displayMode="$functionName">
    <functionName>
      <nameExpr expr="^\s*([a-zA-Z0-9_\.\$]+)"/>
    </functionName>
  </function>
</parser>
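
To get a feel for what the parser does and does not capture, its two expressions can be tried against a few lines of R code from within R itself. This is only an approximation: it assumes R's PCRE engine (perl = TRUE) treats these particular patterns the same way Notepad++'s regex engine does, and the sample lines are made up.

```r
# The parser's expressions written as R strings (backslashes doubled).
main_expr <- "^\\s*[a-zA-Z0-9_\\.\\$]+[\\s<\\-=]+function\\s*\\("
name_expr <- "^\\s*([a-zA-Z0-9_\\.\\$]+)"

# Made-up sample lines of R code.
samples <- c(
  "inclpdf <- function(fronts, backs){",
  "  f = function (x) x",
  "obj$method <- function(x) x",
  "assign(\"g\", function(x) x)",
  "`my fun` <- function(x) x"
)

# Which lines would be treated as function definitions?
matched <- grepl(main_expr, samples, perl = TRUE)
data.frame(line = samples, matched = matched)

# Full match of the name expression, leading whitespace included.
hits <- samples[matched]
found_names <- regmatches(hits, regexpr(name_expr, hits, perl = TRUE))
found_names
```

In this sketch the first three lines match, while the assign() call and the backtick-quoted name do not. The full match for the indented definition keeps its two leading spaces, which is consistent with the whitespace caveat mentioned above.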