I need to extract images from a section of a PDF page.
For example, consider a PDF page that has a couple of images at the top of the page and a couple of images at the bottom. I want to extract only the images at the top of the page.
What I have tried so far:
- Cropped the PDF using Ghostscript:
gs -o$croppedPdfFilepath -sDEVICE=pdfwrite -c "[/CropBox [31.46 690.22 560.54 839]" -c "/PAGES pdfmark" -sPageList=12 -f $originalPdfFilepath
- Then passed the cropped PDF to pdfimages to extract the images:
pdfimages -j "$croppedPdfFilepath" $outputDirectory/image
But the problem is that pdfimages extracts all the images on that page (from both the top and the bottom), even though the cropped PDF shows only the images at the top of the page when I view it.
After some research, it looks like the CropBox only hides the cropped content from view, while the PDF source still contains that content.
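One way to check this is sketched below, assuming the poppler-utils tools pdfinfo and pdfimages are installed. The helper name inspect_page is hypothetical, and the page number 1 in the usage line assumes the cropped file contains only the page selected with -sPageList. If the CropBox differs from the MediaBox while the image list still shows every image, the crop only changed the visible region.

```shell
#!/bin/sh
# Hypothetical helper (not part of any tool): print the page boxes and
# the embedded images for a single page of a PDF.
inspect_page() {
    pdf="$1"    # path to the PDF, e.g. the cropped output
    page="$2"   # 1-based page number within that file

    # -box prints the MediaBox, CropBox and other page boxes.
    pdfinfo -box -f "$page" -l "$page" "$pdf"

    # -list prints a table of the embedded images instead of extracting them.
    pdfimages -list -f "$page" -l "$page" "$pdf"
}

# Usage (assumption: the cropped file has a single page):
# inspect_page "$croppedPdfFilepath" 1
```

If the list shows the bottom images too, they are still stored in the page's resources and pdfimages will extract them regardless of the CropBox.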
Any guidance on removing the content from the PDF page, or any other approach, would be helpful. I'm using PHP to do this programmatically.