Retrieve Table of Contents

Retrieve Table of Contents in Java

GroupDocs.Parser for Java (also available as part of Conholdate.Total for Java) lets you extract the table of contents from a document. To do so, use the getToc method:

Iterable<TocItem> getToc();

The TocItem class has the following members:

Member	Description
getDepth	The depth level.
getPageIndex	The page index.
getText	The text.
extractText	Extracts text from the document that the TocItem object refers to.
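
To illustrate how the depth, page index, and text members fit together, the sketch below renders a flat list of items as an indented outline. It does not use GroupDocs.Parser itself: Entry is a hypothetical stand-in that mirrors getDepth, getPageIndex, and getText, and the sample values are invented for illustration only.

```java
import java.util.Arrays;
import java.util.List;

public class TocOutline {
    /** Hypothetical stand-in that mirrors TocItem's getDepth, getPageIndex, and getText. */
    static final class Entry {
        private final int depth;
        private final Integer pageIndex; // may be null when the item has no page
        private final String text;

        Entry(int depth, Integer pageIndex, String text) {
            this.depth = depth;
            this.pageIndex = pageIndex;
            this.text = text;
        }

        int getDepth() { return depth; }
        Integer getPageIndex() { return pageIndex; }
        String getText() { return text; }
    }

    /** Renders one line per entry, indented two spaces per depth level. */
    static String render(List<Entry> entries) {
        StringBuilder sb = new StringBuilder();
        for (Entry e : entries) {
            for (int i = 0; i < e.getDepth(); i++) {
                sb.append("  ");
            }
            sb.append(e.getText());
            // Only mention the page when the entry actually has one
            if (e.getPageIndex() != null) {
                sb.append(" (page index ").append(e.getPageIndex()).append(')');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Entry> sample = Arrays.asList(
                new Entry(0, 0, "Introduction"),
                new Entry(1, 2, "Getting Started"),
                new Entry(1, null, "Appendix"));
        System.out.print(render(sample));
    }
}
```

With real TocItem objects, the same loop body should apply, since it relies only on the three getters listed in the table.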

Follow the steps below to extract the table of contents from a document:

Instantiate a Parser object for the document;
Call the getToc method to obtain a collection of TocItem objects;
Check that the collection isn't null (null means table of contents extraction isn't supported for the document);
Iterate through the collection and use each item's page index to extract the page text from the document.
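
The null check in the third step matters because a for-each loop over a null Iterable throws a NullPointerException. One possible way to guard against that is sketched below; orEmpty and count are hypothetical helper names, not part of GroupDocs.Parser, and the null variable merely stands in for a result from an unsupported document.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class NullSafeIteration {
    /** Returns the given iterable, or an empty one when it is null. */
    static <T> Iterable<T> orEmpty(Iterable<T> items) {
        return items != null ? items : Collections.<T>emptyList();
    }

    /** Counts elements, treating null as "nothing to iterate". */
    static <T> int count(Iterable<T> items) {
        int n = 0;
        for (T ignored : orEmpty(items)) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        List<String> titles = Arrays.asList("Introduction", "Usage");
        Iterable<String> unsupported = null; // stands in for a null result
        System.out.println(count(titles));
        System.out.println(count(unsupported));
    }
}
```

An explicit `if (toc == null)` check with an early return works just as well; the wrapper is only convenient when the same guard is needed in several places.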

The following example shows how to extract the table of contents from a CHM file:

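import com.groupdocs.parser.Parser;
import com.groupdocs.parser.data.TextReader;
import com.groupdocs.parser.data.TocItem;

// Constants.SampleChm is the path to a sample CHM file, defined in the examples project
// Create an instance of Parser class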
try (Parser parser = new Parser(Constants.SampleChm)) {
    // Check if text extraction is supported
    if (!parser.getFeatures().isText()) {
        System.out.println("Text extraction isn't supported.");
        return;
    }
    // Check if toc extraction is supported
    if (!parser.getFeatures().isToc()) {
        System.out.println("Toc extraction isn't supported.");
        return;
    }
    // Get table of contents
    Iterable<TocItem> toc = parser.getToc();
    // Iterate over items
    for (TocItem i : toc) {
        // Print the Toc text
        System.out.println(i.getText());
        // Check if page index has a value
        if (i.getPageIndex() == null) {
            continue;
        }
        // Extract a page text
        try (TextReader reader = parser.getText(i.getPageIndex())) {
            System.out.println(reader.readToEnd());
        }
    }
}