Best practices for scientific writing

August 30, 2023   

This is a collection of rules and best practices for scientific writing, compiled mainly by my Ph.D. supervisor, Eric Medvet. I am mainly responsible for curating and regularly updating this list. On this page, (AmE) refers to American English, while (BrE) refers to British English. Scientific publications are normally written in AmE. If you find any of these rules imprecise or wrong, please let me know.

Typography

  • (AmE) i.e. and e.g. have a trailing comma (blellow, i.e., the mix of blue and yellow); in (BrE) there is no trailing comma (blellow, i.e. the mix of blue and yellow).
  • There are several types of dashes:
    • The hyphen (-) is used to join the parts of compound words (thirty-three years of age, out-of-distribution data, late-evening meeting…).
    • The en-dash (–) is a seldom-used dash used to indicate ranges (people aged 20–25) and compound words where the first concept is composed of multiple words (this is a Neural Network–based method, which could also be written as Neural-Network-based method). In LaTeX, you can use the en-dash by typing two consecutive hyphens --.
    • The em-dash (—) is a punctuation mark used in place of commas, parentheses, etc. to set off a subordinate clause (Convolutional Neural Networks, which we previously introduced in lecture 2, are a model… → Convolutional Neural Networks—which we previously introduced in lecture 2—are a model…). Notice that the em-dash is never surrounded by whitespace. Because the em-dash is absent from most keyboards, it is often replaced by a hyphen with leading and trailing whitespace (" - "), although this usage is, under strict typographic rules, wrong. In LaTeX, you can easily produce the em-dash by typing three consecutive hyphens ---.
  • The use of the Oxford comma in lists of more than two elements is strongly recommended (a cat, a dog and a fish → a cat, a dog, and a fish).
  • Words in math formulae should always be written as upright text (i.e., not in italics, as LaTeX and the Word Equation Editor do by default with letters in formulae). Example: accuracy = … (in italics) → accuracy = … (upright). This is especially important in LaTeX, where each letter of a word typed in a math environment is treated as a separate variable, which produces odd spacing. Here you can use the commands \text{} (from the amsmath package) or \mathrm{} to write natural language in formulae.
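
The typographic rules above can be sketched in a minimal LaTeX document. This is only an illustration: the right-hand side of the formula (correct over total) is a placeholder chosen for the example, not something prescribed by the rules.

```latex
\documentclass{article}
\usepackage{amsmath} % provides \text{}

\begin{document}

% Hyphen: a single -
out-of-distribution data

% En-dash: two consecutive hyphens
people aged 20--25

% Em-dash: three consecutive hyphens, no surrounding whitespace
Convolutional Neural Networks---which we previously introduced in lecture 2---are a model.

% Discouraged: each letter is typeset as a separate italic variable
\[ accuracy = \frac{correct}{total} \]

% Preferred: words inside the formula are typeset upright
\[ \text{accuracy} = \frac{\text{correct}}{\text{total}} \]

\end{document}
```

Compiling both formulae side by side makes the difference in letter spacing easy to see.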

Form

  • In scientific writing, never use contracted forms: it’s becomes it is, can’t becomes cannot.
  • Acronyms: recurring technical terms can be abbreviated as acronyms (Neural Network → NN), but only after the first usage: at the first usage, the term has to be written in full, followed by the acronym in parentheses (Neural Network (NN)). If, as in this case, you want to avoid nested parentheses, you can use the em-dash (Neural Network—NN).
    • If the first occurrence of a technical term is in the plural, the acronym needs to include a lowercase plural “s” (Neural Networks—NNs).
    • Notice that the abstract, the main body, and the conclusion count, for this purpose, as three distinct parts of the manuscript, and you should always reintroduce acronyms in each of them (so, if you introduce the acronym Neural Network (NN) in the abstract, you have to introduce it again in the introduction and in the conclusion). There is no need to introduce an acronym in a part where the term appears only once (so, even if you use the acronym NN many times in the main body, you do not need to introduce it in the abstract if the term appears only once there).
    • If writing a long thesis or a book, you may supplement the manuscript with an additional table of acronyms, which can be of great help for the reader.
  • The passive form should be used only when talking about things others did (…they trained a model for 30 epochs… → …the model was trained for 30 epochs…). If you use the passive to describe things you did, it is often unclear to the reader who performed the action (…the model was trained for 30 epochs… → …we [or I] trained the model for 30 epochs…).
  • There is a common convention in scientific writing for verb tenses:
    • Abstract and introduction: present tense (we consider … we propose)
    • Related work: past tense (they did … he experimented with)
    • Materials: present tense (let x be … we compute)
    • Methods and exposition of results: past tense (we trained the model for 30 epochs … we were able to achieve an accuracy of x%)
    • Comments on results: present tense (we can see that this model performs better…)

LaTeX-specific rules

  • Emphasis should be formatted using \emph{}, not \textit{}. The latter is used for, e.g., highlighting foreign words (like per se).
  • Quotation marks: opening quotation marks use two backticks (``), closing quotation marks use two apostrophes (''). Unlike Word and Google Docs, LaTeX does not orient quotation marks according to their position within the text: "quote" (typed with straight double quotes) will be rendered as ”quote” (notice the wrong orientation of the first quotation mark), whereas ``quote'' (notice the leading double backtick) will be correctly rendered as “quote”.
  • Numbers: when writing numbers as quantities outside of math formulae, prefer \num{} from the siunitx package. It groups the digits of large numbers, which improves readability (e.g., \num{123456} → “123 456”).
  • When writing a list of numbers, use \numlist{} from the siunitx package: \numlist{1;2;3} → “1, 2, and 3” (notice the use of ; as a number separator in the code). If you want to format a range instead, you can similarly use \numrange{}.
  • In LaTeX, a dot followed by a whitespace (or newline) is interpreted as an end-of-sentence punctuation mark. Unless a line break is requested, the following sentence is placed on the same line as the previous one, with a whitespace between the dot and the first word of the next sentence. This whitespace is slightly wider than the usual whitespace that separates words. In some cases, you might have dots that do not end a sentence, as in some abbreviations (category → cat.). In LaTeX, the sentence “the image belongs to cat. X” should be written as the image belongs to cat.\ X (forcing a normal whitespace between “cat.” and “X”) to prevent LaTeX from formatting the whitespace after “cat.” as an end-of-sentence whitespace.
  • For cross-referencing another section, figure, equation… from the same project, use \Cref{...} from the cleveref package. Suppose we want to reference the introduction, which we have labeled with \label{sec:introduction}. We can then write \Cref{sec:introduction}, which will be rendered as “Section 1”; there is no need to write Section \ref{sec:introduction} by hand.
  • When using an IEEE-style bibliography (which indexes the bibliographic items with numbers, like this [1]), use \citet from natbib if you want to include the authors’ names in the citation: \cite{vaswani2017attention} → “[1]”; \citet{vaswani2017attention} → “Vaswani et al. [1]”. If using an APA-style bibliography, you can use the command \citeA{} to include the name of the author(s) in the citation.
  • Read this article for formatting quotations.
  • Inline item lists (this is good because (a) … and (b) …) can be written just like a normal \begin{enumerate} list: simply use the enumerate* environment instead (provided by the enumitem package loaded with the inline option). To get alphabetic labels, append [label=(\alph*)] to the \begin{enumerate*} command.
  • Break the line after each sentence. (NB: LaTeX will not print a line break unless [a] the line break is made explicit using \newline, or [b] the source contains two consecutive line breaks, i.e., an empty line, which starts a new paragraph.) This is very useful in Overleaf, where double-clicking a word in the compiled PDF highlights the corresponding line of code. If you write multiple sentences on the same line of code, it becomes much harder to identify the part of the text responsible for, e.g., an error.
  • Leave 3 empty lines after the end of each chapter/(sub)section… This makes the code much more readable.
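
Several of the LaTeX-specific rules above can be put together in one small document. The following is a minimal sketch, not a template: the section titles, the label sec:method, and the example sentences are invented for illustration, and citations are left out because they would require a separate .bib file.

```latex
\documentclass{article}
\usepackage{siunitx}          % \num, \numlist, \numrange
\usepackage[inline]{enumitem} % enumerate* for inline lists
\usepackage{cleveref}         % \Cref (usually loaded last)

\begin{document}

\section{Introduction}
\label{sec:introduction}

% One sentence per line of code
We \emph{strongly} prefer this approach; it is not, \textit{per se}, new.
This is a ``quote'' with correctly oriented quotation marks.
The dataset contains \num{123456} images.
We tested \numlist{1;2;3} layers and \numrange{10}{20} epochs.
% The backslash-space after ``cat.'' forces a normal inter-word space
The image belongs to cat.\ X.
This is good because
\begin{enumerate*}[label=(\alph*)]
  \item it is simple and
  \item it is fast.
\end{enumerate*}



\section{Method}
\label{sec:method}

% Renders with the word ``Section'' added automatically
\Cref{sec:introduction} introduces the problem.

\end{document}
```

The document has to be compiled twice for the cross-reference to resolve. The three empty lines between the sections have no effect on the output, since LaTeX treats any run of empty lines as a single paragraph break; they are there only to make the source easier to scan.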