Extract Images from PDF
Extract Images from PDF in Java
GroupDocs.Parser for Java (which is part of Conholdate.Total for Java) extracts images from PDF documents through the getImages method:
Iterable<PageImageArea> getImages();
This method returns a collection of PageImageArea objects:
Member | Description |
---|---|
getPage | The page that contains the image. |
getRectangle | The rectangular area on the page that contains the image. |
getFileType | The format of the image. |
getRotation | The rotation angle of the image. |
getImageStream | Returns the image stream. |
getImageStream(ImageOptions) | Returns the image stream in a different format. |
save(String) | Saves the image to the file. |
save(String, ImageOptions) | Saves the image to the file in a different format. |
The ImageOptions class defines the format to which the image is converted. The following image formats are supported:
- Bmp
- Gif
- Jpeg
- Png
- WebP
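As a rough illustration of choosing a target format when saving images, the sketch below mirrors the five formats listed above and derives an output file name for each image. The `TargetFormat` enum and the `outputName` helper are hypothetical names introduced for this example and are not part of GroupDocs.Parser; with the library itself, the chosen format would be passed through ImageOptions to save(String, ImageOptions).

```java
import java.util.Locale;

// Hypothetical helper, not part of GroupDocs.Parser: mirrors the five
// formats listed above and derives an output file name for each image.
final class ImageFileNames {

    // Stand-in for the formats the documentation lists as supported.
    enum TargetFormat {
        BMP("bmp"), GIF("gif"), JPEG("jpg"), PNG("png"), WEBP("webp");

        private final String extension;

        TargetFormat(String extension) {
            this.extension = extension;
        }

        String extension() {
            return extension;
        }
    }

    // Builds a name such as "page-0-image-1.png" from a page index,
    // a running image counter and the desired target format.
    static String outputName(int pageIndex, int imageNumber, TargetFormat format) {
        return String.format(Locale.ROOT, "page-%d-image-%d.%s",
                pageIndex, imageNumber, format.extension());
    }

    public static void main(String[] args) {
        System.out.println(outputName(0, 1, TargetFormat.PNG));
        System.out.println(outputName(2, 5, TargetFormat.JPEG));
    }
}
```

Naming files by page index and image counter keeps the output unique even when several images on one page share the same format.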
Here are the steps to extract images from the whole document:
- Instantiate a Parser object for the source document;
- Call the getImages method to obtain a collection of PageImageArea objects;
- Check that the collection isn't null (null means image extraction isn't supported for the document);
- Iterate through the collection and get the size, type and contents of each image.
The following example shows how to extract all images from the whole document:
// Create an instance of Parser class
try (Parser parser = new Parser(Constants.SampleImagesPdf)) {
    // Extract images
    Iterable<PageImageArea> images = parser.getImages();
    // Check if images extraction is supported
    if (images == null) {
        System.out.println("Images extraction isn't supported");
        return;
    }
    // Iterate over images
    for (PageImageArea image : images) {
        // Print a page index, rectangle and image type:
        System.out.println(String.format("Page: %d, R: %s, Type: %s", image.getPage().getIndex(), image.getRectangle(), image.getFileType()));
    }
}
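The example above prints image metadata but does not read the image contents. The following is a minimal sketch of writing an image stream to a file, assuming getImageStream returns a standard java.io.InputStream. The `ImageStreamSaver` class and `saveStream` helper are hypothetical names, and an in-memory stream stands in for a real image so that the code runs without the library. For the common case, the save(String) member listed in the table writes the file directly.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of GroupDocs.Parser.
final class ImageStreamSaver {

    // Copies the whole stream to the target path, replacing any existing
    // file, closes the stream and returns the number of bytes written.
    static long saveStream(InputStream in, Path target) throws IOException {
        try (InputStream source = in) {
            return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in bytes; real code would pass the stream obtained from
        // image.getImageStream() (assumed to be an InputStream).
        byte[] fakeImage = {(byte) 0x89, 'P', 'N', 'G'};
        Path target = Files.createTempFile("image-", ".png");
        long written = saveStream(new ByteArrayInputStream(fakeImage), target);
        System.out.println("Wrote " + written + " bytes to " + target);
        Files.delete(target);
    }
}
```

Closing the stream inside the helper keeps the calling loop short, at the cost of the caller not being able to reuse the stream afterwards.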