Retrieve Table of Contents

Retrieve Table of Contents in Java

GroupDocs.Parser, a Java API that is part of Conholdate.Total for Java, can extract the table of contents (TOC) from documents. To do this, call the getToc method of the Parser class:

Iterable<TocItem> getToc();

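The TocItem class has the following members:

Member          Description
getDepth        The depth (nesting) level of the item.
getPageIndex    The index of the page the item refers to; may be null if the item doesn't refer to a page.
getText         The text of the item.
extractText     Extracts the text of the part of the document that this TocItem object refers to.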
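The value returned by getDepth can be used to print the table of contents as an indented outline. The minimal sketch below does not call GroupDocs.Parser. It assumes a hypothetical TocEntry class that mirrors the getDepth, getPageIndex and getText members of TocItem. It also indents each entry relative to the smallest depth found, so it makes no assumption about whether depth numbering starts at 0 or 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TocOutline {
    // Hypothetical stand-in for TocItem, holding the values that
    // getDepth, getPageIndex and getText would return.
    static final class TocEntry {
        final int depth;
        final Integer pageIndex; // may be null when the item has no page
        final String text;

        TocEntry(int depth, Integer pageIndex, String text) {
            this.depth = depth;
            this.pageIndex = pageIndex;
            this.text = text;
        }
    }

    // Renders entries as an outline, indenting two spaces per level
    // below the shallowest entry in the collection.
    static List<String> format(Iterable<TocEntry> toc) {
        List<String> lines = new ArrayList<>();
        if (toc == null) {
            return lines; // no table of contents to render
        }
        // Copy once so the source collection is iterated only a single time
        List<TocEntry> entries = new ArrayList<>();
        toc.forEach(entries::add);

        int minDepth = Integer.MAX_VALUE;
        for (TocEntry entry : entries) {
            minDepth = Math.min(minDepth, entry.depth);
        }
        for (TocEntry entry : entries) {
            StringBuilder line = new StringBuilder();
            for (int level = minDepth; level < entry.depth; level++) {
                line.append("  ");
            }
            line.append(entry.text);
            if (entry.pageIndex != null) {
                line.append(" (page index ").append(entry.pageIndex).append(')');
            }
            lines.add(line.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        List<TocEntry> toc = Arrays.asList(
                new TocEntry(1, 0, "Introduction"),
                new TocEntry(2, 1, "Installation"),
                new TocEntry(2, null, "Licensing"));
        format(toc).forEach(System.out::println);
    }
}
```

With real data, each TocEntry would be built from a TocItem returned by getToc. The outline layout here is only one possible presentation.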

Follow these steps to extract the table of contents from a document:

  • Instantiate a Parser object for the source document;
  • Call the getToc method to obtain a collection of TocItem objects;
  • Check that the collection isn't null (a null value means that table of contents extraction isn't supported for the document);
  • Iterate through the collection and use the page index of each item to extract the page text from the document.
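The control flow of these steps can be sketched without the library. A null collection is treated as "not supported", items without a page index are skipped, and the remaining items are resolved to page text. In this sketch, TocEntry and the readPage function are hypothetical stand-ins for TocItem and for reading a page through Parser.getText(pageIndex). They are not part of GroupDocs.Parser.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.IntFunction;

public class TocSteps {
    // Hypothetical stand-in for TocItem (only getPageIndex and getText).
    static final class TocEntry {
        final Integer pageIndex; // null when the item doesn't point to a page
        final String text;

        TocEntry(Integer pageIndex, String text) {
            this.pageIndex = pageIndex;
            this.text = text;
        }
    }

    // Mirrors the listed steps: check the collection for null, iterate it,
    // and use each page index to look up the page text.
    // Returns null when TOC extraction isn't supported (toc == null).
    static List<String> extract(Iterable<TocEntry> toc, IntFunction<String> readPage) {
        if (toc == null) {
            return null;
        }
        List<String> result = new ArrayList<>();
        for (TocEntry item : toc) {
            if (item.pageIndex == null) {
                continue; // nothing to extract for this item
            }
            result.add(item.text + ": " + readPage.apply(item.pageIndex));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical page texts, looked up by page index
        String[] pages = {"Welcome to the guide.", "Run the installer."};
        List<TocEntry> toc = Arrays.asList(
                new TocEntry(0, "Introduction"),
                new TocEntry(null, "Appendix"),
                new TocEntry(1, "Installation"));
        System.out.println(extract(toc, i -> pages[i]));
    }
}
```

Returning null rather than an empty list keeps "not supported" distinguishable from "supported, but the document has no TOC entries".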

The following example shows how to extract the table of contents from a CHM file:

import com.groupdocs.parser.Parser;
import com.groupdocs.parser.data.TextReader;
import com.groupdocs.parser.data.TocItem;

// Create an instance of the Parser class (Constants.SampleChm is the path to a sample CHM file)
try (Parser parser = new Parser(Constants.SampleChm)) {
    // Check if text extraction is supported (needed to read page text)
    if (!parser.getFeatures().isText()) {
        System.out.println("Text extraction isn't supported.");
        return;
    }
    // Check if table of contents extraction is supported
    if (!parser.getFeatures().isToc()) {
        System.out.println("TOC extraction isn't supported.");
        return;
    }
    // Get the table of contents
    Iterable<TocItem> toc = parser.getToc();
    // Iterate over the TOC items
    for (TocItem item : toc) {
        // Print the text of the TOC item
        System.out.println(item.getText());
        // Skip items that don't refer to a page
        if (item.getPageIndex() == null) {
            continue;
        }
        // Extract the text of the page the item refers to
        try (TextReader reader = parser.getText(item.getPageIndex())) {
            System.out.println(reader.readToEnd());
        }
    }
}
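In the example above, every item that has a page index triggers a separate read of that page. When several TOC items point to the same page, that page's text is printed more than once. If that is undesirable, one option is to track the page indexes that have already been handled. The following minimal sketch shows the idea in plain Java. TocEntry and the readPage function are hypothetical stand-ins for TocItem and for Parser.getText(pageIndex), not part of GroupDocs.Parser.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.IntFunction;

public class TocDistinctPages {
    // Hypothetical stand-in for TocItem (only getPageIndex and getText).
    static final class TocEntry {
        final Integer pageIndex; // null when the item doesn't point to a page
        final String text;

        TocEntry(Integer pageIndex, String text) {
            this.pageIndex = pageIndex;
            this.text = text;
        }
    }

    // Reads each referenced page once, in TOC order, and skips items
    // that have no page index.
    static List<String> readDistinctPages(Iterable<TocEntry> toc, IntFunction<String> readPage) {
        List<String> texts = new ArrayList<>();
        Set<Integer> seen = new HashSet<>();
        for (TocEntry item : toc) {
            // Set.add returns false when this page index was already read
            if (item.pageIndex == null || !seen.add(item.pageIndex)) {
                continue;
            }
            texts.add(readPage.apply(item.pageIndex));
        }
        return texts;
    }

    public static void main(String[] args) {
        // Hypothetical page texts, looked up by page index
        String[] pages = {"Page one text", "Page two text"};
        List<TocEntry> toc = Arrays.asList(
                new TocEntry(0, "Introduction"),
                new TocEntry(0, "Overview"),
                new TocEntry(1, "Installation"));
        readDistinctPages(toc, i -> pages[i]).forEach(System.out::println);
    }
}
```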