Andrews University Seminary Studies, Vol. 59, No. 2, 251-271.
Copyright © 2022 Andrews University Seminary Studies

APPLICATION OF THE TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY WEIGHTING SCHEME TO THE PAULINE CORPUS

Brandon I. S. van der Ventel, Stellenbosch University
Richard T. Newman, Stellenbosch University

Abstract

The term frequency-inverse document frequency (TF-IDF) weighting scheme is applied to the text of the thirteen epistles traditionally associated with the apostle Paul. The data for the analysis is the morphologically tagged text of the Society of Biblical Literature's Greek New Testament. The TF-IDF scheme is used to construct the document-term matrix (DTM) for the corpus under consideration. The DTM allows each document to be represented by a multi-dimensional document vector. A query document is then chosen and a vector representation of it is constructed. The cosine similarity between the query document and each document in the corpus is calculated. The following pairs of documents are consistently found to have the highest similarity: (1) Romans and Galatians, (2) Ephesians and Colossians, and (3) 1 Timothy and Titus. It is shown that computational methods may be applied to the thirteen epistles and that the results are in accordance with those obtained from theological or literary analysis.

Keywords: New Testament, Paul, authorship, term frequency-inverse document frequency

Introduction

The New Testament was written in Koine Greek, an ordinary human language, which implies that the text can be analyzed using methods of computational linguistics.¹ Stylometry is the branch of computational linguistics concerned with the statistical analysis of a corpus. One application of stylometry is authorship attribution, and one of the earliest examples applicable to biblical studies is that of Augustus De Morgan, who in 1851 suggested that average word length could be used for authorship attribution. In a letter addressed to a

¹ William D. Mounce, Basics of Biblical Greek (Grand Rapids, MI: Zondervan, 2009), loc. 457.