I have already managed to extract text from a PDF, but now I also want to extract the images. The first problem is that the images sit between blocks of text on each page. What I want to know is how to extract the images in order, even when a page has a two-column layout, and how to determine where each image is placed relative to the text.
Here is the code I have tried.
Image Extraction:
ExtractImages.java:
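import java.io.IOException;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfReaderContentParser;

public class ExtractImages {
    // Output pattern: first %s is the image object number, second %s the file type
    public static final String RESULT = "results/part4/chapter15/Img%s.%s";

    public void extractImages(String filename) throws IOException, DocumentException {
        PdfReader reader = new PdfReader(filename);
        PdfReaderContentParser parser = new PdfReaderContentParser(reader);
        MyImageRenderListener listener = new MyImageRenderListener(RESULT);
        // Feed the content stream of every page to the listener
        for (int i = 1; i <= reader.getNumberOfPages(); i++) {
            parser.processContent(i, listener);
        }
        reader.close();
    }
}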
MyImageRenderListener:
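import java.io.FileOutputStream;
import java.io.IOException;
import com.itextpdf.text.pdf.parser.ImageRenderInfo;
import com.itextpdf.text.pdf.parser.PdfImageObject;
import com.itextpdf.text.pdf.parser.RenderListener;
import com.itextpdf.text.pdf.parser.TextRenderInfo;

public class MyImageRenderListener implements RenderListener {
    // Format pattern for the output file names
    private final String path;

    public MyImageRenderListener(String path) {
        this.path = path;
    }

    // Text events are ignored; only images are handled here
    public void beginTextBlock() { }
    public void renderText(TextRenderInfo renderInfo) { }
    public void endTextBlock() { }

    public void renderImage(ImageRenderInfo renderInfo) {
        try {
            PdfImageObject image = renderInfo.getImage();
            if (image == null) return;
            // Name the file after the image's object number and detected file type
            String filename = String.format(path, renderInfo.getRef().getNumber(), image.getFileType());
            // try-with-resources closes the stream even if the write fails
            try (FileOutputStream os = new FileOutputStream(filename)) {
                os.write(image.getImageAsBytes());
            }
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}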
The code processes the content of the PDF, checks for images, and then writes each image to an image file (.png, .jpg, etc.).
The problem is that it does not extract the images in order. I want them in order so that I know which image comes first on a page and which comes last. How do I do that? Also, is it possible to extract the images without writing them to files? My goal is to display each image in my Android application without turning it into a file. If that is not possible, I will stick to deleting the image files once the user is done with them.
My purpose is to EXTRACT (NOT VIEW) text and images from a PDF file and display them in order in an Android application.
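To make the goal concrete, here is a minimal sketch of the ordering I have in mind. It is only an illustration under assumptions, not a tested solution. `PlacedImage` and `ImageOrdering` are hypothetical names. I assume each image's bytes (from `image.getImageAsBytes()`) are kept in memory together with a position, which could presumably be read from `renderInfo.getImageCTM()` inside `renderImage`. I also assume two equal columns split at the middle of the page, and PDF coordinates with the origin at the bottom-left, so a larger y means higher on the page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder: one extracted image kept in memory with its page position.
final class PlacedImage {
    final int page;      // 1-based page number
    final float x;       // left edge in PDF user space
    final float y;       // bottom edge in PDF user space (origin at bottom-left)
    final byte[] bytes;  // encoded image data, never written to disk

    PlacedImage(int page, float x, float y, byte[] bytes) {
        this.page = page;
        this.x = x;
        this.y = y;
        this.bytes = bytes;
    }
}

final class ImageOrdering {
    // Column index: 0 = left, 1 = right (assumes two equal columns split at mid-page).
    static int column(PlacedImage img, float pageWidth) {
        return img.x < pageWidth / 2f ? 0 : 1;
    }

    // Returns a new list sorted by page, then column, then top to bottom.
    static List<PlacedImage> readingOrder(List<PlacedImage> images, float pageWidth) {
        List<PlacedImage> sorted = new ArrayList<>(images);
        sorted.sort((a, b) -> {
            if (a.page != b.page) return Integer.compare(a.page, b.page);
            int ca = column(a, pageWidth);
            int cb = column(b, pageWidth);
            if (ca != cb) return Integer.compare(ca, cb);
            return Float.compare(b.y, a.y); // larger y is higher on the page
        });
        return sorted;
    }
}
```

On Android, the `bytes` of each sorted entry could then be decoded with `BitmapFactory.decodeByteArray(bytes, 0, bytes.length)` instead of being read back from a file. The mid-page column split is a simplification and would not handle images that span both columns.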