
Extract images from PDF page section

Ask Time:2022-12-14T19:06:07         Author:Saumini Navaratnam


I need to extract images from a section of a PDF page.

For example, consider a PDF page that has a couple of images at the top and a couple of images at the bottom. I want to extract only the images at the top of the page.

Here is what I have tried so far:

  • Cropped the PDF with Ghostscript - gs -o$croppedPdfFilepath -sDEVICE=pdfwrite -c "[/CropBox [31.46 690.22 560.54 839]" -c "/PAGES pdfmark" -sPageList=12 -f $originalPdfFilepath
  • Then passed the cropped PDF to pdfimages to extract the images - pdfimages -j "$croppedPdfFilepath" $outputDirectory/image
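
For reference, the two steps above can be sketched as a small shell script. This is only a restatement of the commands in the list, not a fix: the function names cropbox_pdfmark and crop_and_extract are made up for illustration, and the script assumes gs and pdfimages are on the PATH.

```shell
#!/usr/bin/env bash
# Sketch of the two steps above. Function names are illustrative only.

# Build the pdfmark fragment that sets the CropBox.
# Args: llx lly urx ury (lower-left and upper-right corners, in PDF points)
cropbox_pdfmark() {
  printf '[/CropBox [%s %s %s %s]' "$1" "$2" "$3" "$4"
}

# Crop one page with Ghostscript, then run pdfimages on the cropped file.
# Args: original PDF, cropped PDF, page number, output prefix, llx lly urx ury
crop_and_extract() {
  local original="$1"
  local cropped="$2"
  local page="$3"
  local prefix="$4"
  shift 4
  gs -o "$cropped" -sDEVICE=pdfwrite \
     -c "$(cropbox_pdfmark "$@")" -c "/PAGES pdfmark" \
     -sPageList="$page" -f "$original" || return 1
  pdfimages -j "$cropped" "$prefix"
}
```

With the values from the question, the call would be crop_and_extract "$originalPdfFilepath" "$croppedPdfFilepath" 12 "$outputDirectory/image" 31.46 690.22 560.54 839.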

But the problem is that pdfimages extracts all the images on that page (from both the top and the bottom), even though the cropped PDF shows only the images at the top of the page when I view it.

After some research, it looks like the CropBox only hides the cropped content from view; the content itself is still present in the PDF source.
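
One way to check this observation is to compare how many images pdfimages reports for the page before and after cropping. The sketch below assumes that pdfimages -list prints two header lines (column names and a dashed rule) followed by one row per image, and that the cropped file contains only the selected page; the function names count_listed_images and images_on_page are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: count the images pdfimages sees on a single page.

# Count image rows in `pdfimages -list` output read from stdin.
# Assumption: the first two lines are headers, every later non-empty line is one image.
count_listed_images() {
  awk 'NR > 2 && NF { n++ } END { print n + 0 }'
}

# Args: PDF file, page number
images_on_page() {
  pdfimages -f "$2" -l "$2" -list "$1" | count_listed_images
}
```

If images_on_page "$originalPdfFilepath" 12 and images_on_page "$croppedPdfFilepath" 1 print the same number, the cropped file still carries every image from the page, which matches the behaviour described above.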

Any guidance on removing the content from the PDF page, or any other approach, would be helpful. I'm using PHP to do this programmatically.

References

Author: Saumini Navaratnam. Reproduced under the CC BY-SA 4.0 license with a link to the original source and this disclaimer.
Link to original article: https://stackoverflow.com/questions/74797285/extract-images-from-pdf-page-section