Perhaps a better option would be to convert the documents to a different file type?
From what I've read, TIFF files tend to be quite large.
Perhaps GIF or JPG would be a better option?
The TIFs are compressed, most of them barely 100 KB. Does the file type matter for the DivideScannedImages script? If it does, I can try converting them and report back here.
SteelMassimo wrote:
GIMP Version: 2.10.8
Operating System: Windows
OS Version: 7 Pro
GIMP Experience: New User
URL or Image link:
List any relevant plug-ins or scripts:
DivideScannedImages, G'MIC and Batch Image Manipulation
The issue I'm having is the following: I have an archive of about 700,000 tif images that were originally microfilms.
They are all black and white and are basically paper with information on it. I'm trying to figure out a way to trim/crop out the black background, so that each image is reduced to just the paper part, while preserving everything inside the paper part.
Thing is, the images are REALLY messy and noisy. The black parts are really noisy, full of white dots and lines, and the paper section of the image is no better, only in reverse (full of black dots on white paper), and the actual information on the paper can be really blurry, sometimes looking like a bunch of black blotches.
So far I've had more success with ImageMagick, thanks to the help of an amazing guy in their forum, but the variation in quality and resolution across the images seems to make it impossible to write a single command line that properly removes the black background and trims the image the way I need.
I've also had some success with the DivideScannedImages script, but for some reason it's ignoring most of the images in the folder I point it at.
Do you guys have any suggestions?
Possible algorithm:
- make a copy of the image
- blur it heavily; this should normally give a light center and very dark edges
- threshold this (threshold to be determined experimentally)
- use the result as a mask to crop the initial image
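For what it's worth, the steps above can be sketched in plain Python. This is only a rough illustration under stated assumptions, not a script anyone in this thread has used: the image is assumed to be a grayscale list of rows (0 = black, 255 = white), a simple box blur stands in for the heavy blur, the mask is reduced to its bounding box for the crop, and the names box_blur, paper_bbox and crop, along with the radius and threshold defaults, are made up for the example.

```python
def box_blur(img, radius):
    """Mean filter over a (2*radius+1) square window, via a summed-area table."""
    h, w = len(img), len(img[0])
    # sat[y][x] holds the sum of all pixels above and to the left of (x, y).
    sat = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        for x in range(w):
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            total = sat[y1][x1] - sat[y0][x1] - sat[y1][x0] + sat[y0][x0]
            out[y][x] = total / ((y1 - y0) * (x1 - x0))
    return out


def paper_bbox(img, radius=2, threshold=128):
    """Blur a copy, threshold it, and return the mask's bounding box.

    Returns (left, top, right, bottom) with right/bottom exclusive,
    or None if no blurred pixel reaches the threshold.
    """
    blurred = box_blur(img, radius)  # the original image is left untouched
    ys = [y for y, row in enumerate(blurred) if any(v >= threshold for v in row)]
    xs = [x for x in range(len(img[0])) if any(row[x] >= threshold for row in blurred)]
    if not ys or not xs:
        return None
    return xs[0], ys[0], xs[-1] + 1, ys[-1] + 1


def crop(img, bbox):
    """Cut the bounding box out of the original (unblurred) image."""
    left, top, right, bottom = bbox
    return [row[left:right] for row in img[top:bottom]]


# Toy 20x20 "scan": black background, white paper, one speck of noise in each.
scan = [[0] * 20 for _ in range(20)]
for y in range(6, 16):
    for x in range(5, 15):
        scan[y][x] = 255
scan[0][0] = 255    # white speck on the black background
scan[10][10] = 0    # black speck on the paper, which must survive the crop

bbox = paper_bbox(scan)
cropped = crop(scan, bbox)
print(bbox, len(cropped[0]), len(cropped))  # (5, 6, 15, 16) 10 10
```

The blur is what keeps the isolated white speck from counting as paper: averaged over its neighbourhood it stays far below the threshold. On real scans the pixels would come from an imaging library, and the radius would presumably need to be a good deal larger than the biggest dots and lines.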
Of course, across the 700K images, some will need a lower or higher threshold. But you can run a first batch with a given threshold value, visually check the results, and rerun the rejected ones with a lower or higher threshold.
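The batch-then-rerun idea could be organised roughly like this. It is just a sketch: crop_in_passes, try_crop and looks_ok are hypothetical names, and looks_ok stands in for the visual check (in practice it might simply look the file up in a hand-made list of rejects).

```python
def crop_in_passes(names, try_crop, looks_ok, thresholds):
    """Run every image at the first threshold, then retry only the rejects.

    Returns (accepted, leftover): accepted maps each name to the threshold
    that gave an acceptable crop; leftover lists names no threshold fixed.
    """
    accepted = {}
    pending = list(names)
    for threshold in thresholds:
        rejected = []
        for name in pending:
            result = try_crop(name, threshold)
            if looks_ok(name, result):
                accepted[name] = threshold
            else:
                rejected.append(name)
        pending = rejected  # only the rejects go on to the next threshold
        if not pending:
            break
    return accepted, pending


# Toy stand-ins (assumptions for the demo only): each "image" is taken to
# crop acceptably once the threshold reaches the value listed for it.
needs = {"a.tif": 100, "b.tif": 150, "c.tif": 220}
accepted, leftover = crop_in_passes(
    names=sorted(needs),
    try_crop=lambda name, threshold: threshold,
    looks_ok=lambda name, result: result >= needs[name],
    thresholds=[128, 160],
)
print(accepted, leftover)
```

Whatever is still in leftover after the last pass would need another threshold or a manual crop.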
Giving a URL where we could find a few sample images would help.
I did, it's in the drive link at the beginning. There are some 40 images there for testing.