How to Extract Images and Text in Order from PDF file using iText on Android

Ask Time: 2012-11-25T09:11:13    Author: Christian Eric Paran

I have already managed to extract text from a PDF, but now I also want to extract the images. The first problem is that the images sit between blocks of text on each page. What I want to know is how to extract the images in order, even when a page is laid out in two columns, and how to determine where each image is placed relative to the text.

Here is the code I have tried.

Image Extraction:

ExtractImages.java:
// Output file pattern: PDF object number, then file extension.
public static final String RESULT = "results/part4/chapter15/Img%s.%s";

public void extractImages(String filename)
    throws IOException, DocumentException {
    PdfReader reader = new PdfReader(filename);
    PdfReaderContentParser parser = new PdfReaderContentParser(reader);
    MyImageRenderListener listener = new MyImageRenderListener(RESULT);
    // Page numbers start at 1.
    for (int i = 1; i <= reader.getNumberOfPages(); i++) {
        parser.processContent(i, listener);
    }
    reader.close();
}

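MyImageRenderListener:
import java.io.FileOutputStream;
import java.io.IOException;
import com.itextpdf.text.pdf.parser.ImageRenderInfo;
import com.itextpdf.text.pdf.parser.PdfImageObject;
import com.itextpdf.text.pdf.parser.RenderListener;
import com.itextpdf.text.pdf.parser.TextRenderInfo;

public class MyImageRenderListener implements RenderListener {
    // Format string for output files: object number, then file extension.
    private final String path;

    public MyImageRenderListener(String path) {
        this.path = path;
    }

    // Text callbacks are unused here; only images are extracted.
    public void beginTextBlock() { }
    public void endTextBlock() { }
    public void renderText(TextRenderInfo renderInfo) { }

    // Called once per image as the page content is processed.
    public void renderImage(ImageRenderInfo renderInfo) {
        try {
            PdfImageObject image = renderInfo.getImage();
            if (image == null) return;
            String filename = String.format(path, renderInfo.getRef().getNumber(), image.getFileType());
            FileOutputStream os = new FileOutputStream(filename);
            try {
                os.write(image.getImageAsBytes());
            } finally {
                os.close(); // close the stream even if the write fails
            }
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}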

The code processes the content of each page, checks for images, and writes each image it finds to an image file (.png, .jpg, etc.).

The problem is that it does not extract the images in order. I want the images in order so that I know which image comes first on a page and which comes last. How do I do that? Also, is it possible to extract the images without writing them to files? My goal is to display each image in my Android application without first saving it as a file. If that is not possible, I will stick to deleting the image files once the user is done with them.

My purpose is to EXTRACT (not VIEW) the text and images from a PDF file and display them in order in an Android application.
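
One possible way to approach the ordering part, as a rough sketch rather than a definitive solution: the order in which a PDF content stream draws things need not match the visual reading order, so each text chunk and image could be recorded together with its position and sorted afterwards. The `ReadingOrder` and `Item` names below are hypothetical and not part of iText, and the fixed `gutterX` column boundary is an assumption that only suits simple two-column pages. The sketch keeps image data as a `byte[]` in memory instead of a file path.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper (not part of iText): sorts extracted items into reading order.
class ReadingOrder {

    // One text chunk or one image, with the position a listener recorded for it.
    static final class Item {
        final int page;     // 1-based page number
        final float x;      // left edge in PDF units (origin is the bottom-left corner)
        final float y;      // vertical position; a larger y is higher on the page
        final String text;  // chunk text, or null for an image
        final byte[] bytes; // encoded image data held in memory, or null for text

        Item(int page, float x, float y, String text, byte[] bytes) {
            this.page = page;
            this.x = x;
            this.y = y;
            this.text = text;
            this.bytes = bytes;
        }

        boolean isImage() {
            return bytes != null;
        }
    }

    // Orders by page, then column (left of gutterX first), then top to bottom.
    // gutterX is an assumed x coordinate that separates the two columns.
    static List<Item> sort(List<Item> items, final float gutterX) {
        List<Item> sorted = new ArrayList<Item>(items);
        Collections.sort(sorted, new Comparator<Item>() {
            @Override
            public int compare(Item a, Item b) {
                if (a.page != b.page) return Integer.compare(a.page, b.page);
                int colA = a.x < gutterX ? 0 : 1;
                int colB = b.x < gutterX ? 0 : 1;
                if (colA != colB) return Integer.compare(colA, colB);
                if (a.y != b.y) return Float.compare(b.y, a.y); // higher y comes first
                return Float.compare(a.x, b.x);
            }
        });
        return sorted;
    }

    public static void main(String[] args) {
        // Made-up sample positions on a page whose columns split at x = 300.
        List<Item> items = new ArrayList<Item>();
        items.add(new Item(1, 320f, 700f, "right column, top", null));
        items.add(new Item(1, 50f, 500f, null, new byte[] {1, 2, 3}));
        items.add(new Item(1, 50f, 700f, "left column, top", null));
        for (Item item : sort(items, 300f)) {
            System.out.println(item.isImage()
                    ? "[image, " + item.bytes.length + " bytes]"
                    : item.text);
        }
    }
}
```

With iText 5, the positions could probably be filled in from the listener callbacks: `renderInfo.getImageCTM()` with `Matrix.I31` and `Matrix.I32` for an image, and `renderInfo.getBaseline().getStartPoint()` for a text chunk. On Android, the in-memory bytes from `getImageAsBytes()` could then be passed to `BitmapFactory.decodeByteArray(bytes, 0, bytes.length)`, assuming the image is in a format Android can decode. Content that spans both columns, such as a full-width figure, would need extra handling beyond this fixed boundary.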

Author: Christian Eric Paran. Reproduced under the CC BY-SA 4.0 license with a link to the original source and this disclaimer.
Link to original article:https://stackoverflow.com/questions/13547359/how-to-extract-images-and-text-in-order-from-pdf-file-using-itext-on-android