chrible.blog

Writing about my research and anything interesting.

How to Write PDF Metadata with LuaTeX

  • Markup Languages Typesetting
  • December 30, 2023

LuaTeX is a powerful extension of TeX that makes the capabilities of traditional TeX systems accessible without the large development overhead that extending TeX usually entails. It embeds the Lua scripting language, which runs on top of the TeX runtime. Lua, a multi-paradigm programming language, provides a convenient bridge to parts of the TeX engine through interfaces and callback hooks, which makes it easier to perform complex tasks and customize the typesetting process.

This blog entry explores how we can use LuaTeX to embed metadata – or just about any data – in PDF documents.

Creating an object

PDF files can contain a variety of content types such as text, pictures, metadata, forms, interactive elements and so on. In the PDF file format, all these parts are packed into containers called objects. In LuaTeX we can use the pdf backend library to create a PDF object: the pdf.obj function creates a PDF object and returns its object number. These are the allowed parameters of the function:

Lua
pdf.obj {
  type = <string>,
  immediate = <boolean>,
  objnum = <number>,
  attr = <string>,
  compresslevel = <number>,
  objcompression = <boolean>,
  file = <string>,
  string = <string>,
  nolength = <boolean>,
}

For example, to create a simple object that contains the string “hello world!”, we can use the following snippet:

Lua
local new_obj = pdf.obj {
  type = 'stream',
  attr = '/Type /Text /Subtype /Plain',
  immediate = true,
  compresslevel = 0,
  string = 'hello world!',
}
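
Because pdf.obj returns the object number, the result can be kept and turned into an indirect reference (N 0 R) for use elsewhere in the file. The following is a minimal sketch rather than part of the LuaTeX API: write_text_stream and indirect_ref are hypothetical helper names, and the backend is passed in as an argument so that the helpers can also be tried with a stub outside LuaTeX.

```lua
-- Hypothetical helper: writes `text` as an uncompressed stream object
-- through `backend` (the global `pdf` table inside LuaTeX) and returns
-- the object number reported by backend.obj.
local function write_text_stream(backend, text)
  return backend.obj {
    type = 'stream',
    attr = '/Type /Text /Subtype /Plain',
    immediate = true,
    compresslevel = 0,
    string = text,
  }
end

-- Hypothetical helper: formats an object number as an indirect
-- reference with generation number 0.
local function indirect_ref(objnum)
  return string.format('%d 0 R', objnum)
end

-- The global `pdf` table only exists inside LuaTeX, so guard the call.
if pdf then
  local objnum = write_text_stream(pdf, 'hello world!')
  texio.write_nl('log', 'hello-world stream written as ' .. indirect_ref(objnum))
end
```

Inside LuaTeX the guarded block writes the object and notes its reference in the log file; in a plain Lua interpreter only the two helpers are defined.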

To write this object at the end of the PDF document we can run the code in the finish_pdffile callback.

Lua
luatexbase.add_to_callback('finish_pdffile', function()
  pdf.obj {
    type = 'stream',
    attr = '/Type /Text /Subtype /Plain',
    immediate = true,
    compresslevel = 0,
    string = 'hello world!',
  } 
end, 'finish')

Or directly where we want it to be in the document using LaTeX:

TeX
\directlua{%
  pdf.obj {
    type = 'stream',
    attr = '/Type /Text /Subtype /Plain',
    immediate = true,
    compresslevel = 0,
    string = 'hello world!',
  }
}%

Adding Metadata to the PDF file

PDF metadata contains information such as the title, author or modification dates of the document. Starting with PDF 1.4, metadata can be stored either in metadata streams, which contain XML data, or in a document information dictionary. While the document information dictionary, as a key-value store, is rather restricted in terms of content flexibility, metadata streams can contain arbitrary XML constructs. For metadata streams Adobe recommends the eXtensible Metadata Platform (XMP) standard (sections 1.6.1 and 2.2). XMP is based on RDF, which records information as subject-predicate-object triples. Here is an example of what an uncompressed metadata stream can look like; it describes Earth as a sphere.

XML
<< /Type /Metadata /Subtype /XML /Length 1706 >>
stream
<?xpacket begin='' id='W5M0MpCehiHzreSzNTczkc9d'?>
  <x:xmpmeta xmlns:x="adobe:ns:meta/">
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-...">
      <rdf:Description rdf:about="http://example_uri.net#Earth" xmlns:ex="http://example_uri.net">        
        <ex:has_shape>
          <ex:Sphere/>
        </ex:has_shape>
      </rdf:Description>
    </rdf:RDF>
  </x:xmpmeta>
<?xpacket end='w'?>
endstream
endobj
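
For comparison, the document information dictionary mentioned above holds plain key-value pairs. The sketch below builds such entries and appends them through the pdf.getinfo and pdf.setinfo accessors of LuaTeX's pdf library. The helper names pdf_escape and info_entries and the example field values are assumptions made for illustration, and the sketch only handles ASCII text. Packages such as hyperref may write the same keys, so this is one possible approach, not the only one.

```lua
-- Hypothetical helper: escapes the characters that are special inside a
-- PDF literal string, i.e. backslash and both parentheses.
local function pdf_escape(s)
  return (s:gsub('[\\()]', '\\%0'))
end

-- Hypothetical helper: turns a Lua table of fields into Info dictionary
-- entries such as "/Author (...) /Title (...)". Keys are sorted so the
-- output order is stable.
local function info_entries(fields)
  local keys = {}
  for key in pairs(fields) do
    keys[#keys + 1] = key
  end
  table.sort(keys)
  local parts = {}
  for _, key in ipairs(keys) do
    parts[#parts + 1] = string.format('/%s (%s)', key, pdf_escape(fields[key]))
  end
  return table.concat(parts, ' ')
end

-- Example values, chosen only for illustration.
local entries = info_entries {
  Title = 'How to Write PDF Metadata with LuaTeX',
  Author = 'A. Author (example)',
}

-- The global `pdf` table only exists inside LuaTeX, so guard the call.
if pdf then
  pdf.setinfo((pdf.getinfo() or '') .. ' ' .. entries)
end
```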

PDF files can have multiple packets of XMP metadata. The “main” metadata object can be referenced from the document catalog, the top-level dictionary of the PDF, so that an application that understands PDF can find the newest metadata packet. Let’s see how we can achieve this in LuaTeX.

Lua
local metadata_obj = pdf.obj {
  type = 'stream',
  attr = '/Type /Metadata /Subtype /XML',
  immediate = true,
  compresslevel = 0,
  string = "<?xpacket begin='' id='W5M0MpCehiHzreSzNTczkc9d'?> \
  <x:xmpmeta xmlns:x='adobe:ns:meta/'> \
    <rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-...'> \
      <rdf:Description rdf:about='http://example_uri.net#Earth' xmlns:ex='http://example_uri.net'> \
        <ex:has_shape> \
          <ex:Sphere/> \
        </ex:has_shape> \
      </rdf:Description> \
    </rdf:RDF> \
  </x:xmpmeta> \
<?xpacket end='w'?>",
}
-- add the new object to the catalog as its /Metadata entry,
-- written as an indirect reference: /Metadata <objnum> 0 R
local catalog = pdf.getcatalog() or ''
pdf.setcatalog(catalog .. string.format(' /Metadata %d 0 R', metadata_obj))

And that’s how you can save a custom XMP metadata object with LuaTeX. Of course the general principle here applies to arbitrary additional data added to a PDF file. The XMP standard is not enforced by PDF.
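
The two steps, writing the stream and registering it in the catalog, can be wrapped in a small reusable function. This is a sketch under assumptions: xmp_packet, with_metadata_ref and attach_xmp are hypothetical names, the RDF body is supplied by the caller, and the backend is passed in so that the logic can be checked with a stub outside LuaTeX.

```lua
-- Hypothetical helper: wraps an RDF/XML fragment in the xpacket and
-- x:xmpmeta envelope used in the example above.
local function xmp_packet(rdf_body)
  return table.concat({
    "<?xpacket begin='' id='W5M0MpCehiHzreSzNTczkc9d'?>",
    "<x:xmpmeta xmlns:x='adobe:ns:meta/'>",
    rdf_body,
    "</x:xmpmeta>",
    "<?xpacket end='w'?>",
  }, '\n')
end

-- Hypothetical helper: appends a /Metadata entry that points at object
-- `objnum` to the existing catalog additions (which may be nil).
local function with_metadata_ref(catalog, objnum)
  return (catalog or '') .. string.format(' /Metadata %d 0 R', objnum)
end

-- Hypothetical helper: writes the packet as an uncompressed metadata
-- stream and references it from the catalog. `backend` is the global
-- `pdf` table inside LuaTeX.
local function attach_xmp(backend, rdf_body)
  local objnum = backend.obj {
    type = 'stream',
    attr = '/Type /Metadata /Subtype /XML',
    immediate = true,
    compresslevel = 0,
    string = xmp_packet(rdf_body),
  }
  backend.setcatalog(with_metadata_ref(backend.getcatalog(), objnum))
  return objnum
end

-- Inside LuaTeX, with the RDF block held in a string of your own:
--   attach_xmp(pdf, my_rdf_xml)
```

Building the packet with table.concat avoids the backslash line continuations inside the string, which can be awkward when the Lua code is embedded in a TeX file.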

If you are interested in a practical example that applies these techniques, check out a paper I wrote with my colleagues Oliver Karras and Ildar Baimuratov about a new LaTeX package that embeds the main scientific contributions of a paper in its PDF metadata here, and the code repository of the project here.

LuaTeX manual
XMP standard

Tags: LaTeX, Lua, LuaTeX, Metadata, PDF, PDF creation, RDF, XMP

chrible.blog © 2025. All Rights Reserved.