060-510 MetaTool Extraction – Advanced Bar Code Rule
Kofax Express includes a fast and accurate barcode reader, as long as barcodes are printed according to standards and strict specifications. Barcodes that fall outside the standard specifications, however, are often not recognized. MetaTool’s Advanced Barcode reader can read barcodes that don’t comply with standard specifications, and it reads more barcode types than the barcode reader included with Kofax Express. It can also perform zonal barcode reading: you define the location where the barcode is printed, which avoids confusion with other barcodes on the document.
Advanced Barcode is defined in the MetaTool Extract tab.
Press the Add button and select Zonal Extraction / Advanced Barcode to add the extraction rule.
The Advanced Barcode Setup window opens.
Select the index field to hold the extracted data, and select the zone you would like to extract the barcodes from. The zone can be full page, top or bottom half or a custom zone specified with the lasso tool.
Next, adjust the Image Processing and/or Barcode settings.
01 Advanced Barcode Rule – Image Processing Settings
By default, the Image Processing settings are not enabled. Check the Apply option to enable Image Processing.
In the example below, you see an image generated by a digital copier. Such images are often dithered: dithering is a technique that simulates gray by using patterns of black and white pixels. This results in holes in the black bars, which cause barcode reading failures.
When we press the Test button without Image Processing enabled, the barcode reader doesn’t recognize the barcode data.
In this case, we activate Image Processing and adjust the Thickening settings to fill the holes in the black bars until the barcode is read correctly. We’ll review every Image Processing setting step by step.
02 – Brightness (represented by the small sun symbol): By increasing the brightness value, you make the scanned image brighter. This can be very useful when working with documents that contain a lot of noise or background pattern. In most cases it is not necessary to adjust the default value.
03 – Drop out: When working with forms with lines and labels in red, green or blue, we can filter these by using the drop out setting.
04 – Thickening: you can use this option to make the barcode bolder in the selected direction(s). The thickening removes the white speckles in the black bars and makes them solid. This improves the recognition considerably.
Increase the vertical thickening by 2 and we can see that the barcode in our example becomes solid and readable.
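The effect of vertical thickening can be sketched as a simple dilation of black pixels. This is only an illustration, not MetaTool's actual implementation: it assumes a bitonal image stored as a list of rows (1 = black, 0 = white), and `thicken_vertical` is a hypothetical helper whose `amount` is assumed to be a number of pixels.

```python
def thicken_vertical(image, amount):
    """Spread every black pixel (1) up and down by `amount` rows.

    Assumption: `image` is a list of equally long rows of 0/1 values.
    """
    height = len(image)
    width = len(image[0]) if image else 0
    result = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if image[y][x] == 1:
                top = max(0, y - amount)
                bottom = min(height - 1, y + amount)
                for yy in range(top, bottom + 1):
                    result[yy][x] = 1
    return result


# One pixel column of a dithered bar: it should be solid black but has holes.
dithered_bar = [[1], [0], [1], [0], [0], [1]]
solid_bar = thicken_vertical(dithered_bar, 2)
```

In this sketch the holes are filled because each remaining black pixel grows into its white neighbors. Thickening across the bars instead would also narrow the white spaces between them, which is one reason to choose the direction with care.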
05 Advanced Barcode – Barcode Settings
06 – On Document(s): you can select a condition by pressing the drop-down arrow:
1) First document: only the first document of the batch will be processed.
2) If field value is equal to…: only documents that pass this condition will be processed. You can set up the condition by pressing the Setup button.
You can set a fixed value or select different system and index values to compose your value by pressing the Setup button.
06 – On Page(s): Sometimes the information is on another page than page 1. With this option, you can exactly define which page to extract data from.
You can choose between the 4 following options:
1) Page: only reads the page with the selected page number
2) All: reads all the pages of the whole document
3) Range: type in the page numbers or ranges separated by commas. Negative numbers identify pages starting from the end of the document.
For example: a range from -2 to -1 covers the page before the last page through the last page.
4) First document only: only reads the page(s) of the first document in the batch
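How negative page numbers map to real pages can be sketched as follows. This is a minimal illustration, not MetaTool's parser: `resolve_page` and `resolve_pages` are hypothetical helpers, and ranges are passed as `(first, last)` tuples rather than typed text so that no assumption is made about the exact input syntax.

```python
def resolve_page(number, page_count):
    """Convert a positive or negative page number to a 1-based page number."""
    if number < 0:
        return page_count + number + 1  # -1 is the last page
    return number


def resolve_pages(entries, page_count):
    """Expand single pages and (first, last) ranges into a sorted page list."""
    pages = set()
    for entry in entries:
        first, last = entry if isinstance(entry, tuple) else (entry, entry)
        start = resolve_page(first, page_count)
        end = resolve_page(last, page_count)
        for page in range(start, end + 1):
            # Pages that do not exist in this document are skipped.
            if 1 <= page <= page_count:
                pages.add(page)
    return sorted(pages)


# In a 10-page document: page 1 plus the last two pages.
selected = resolve_pages([1, (-2, -1)], 10)
```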
07 – Align Zone: when documents in a batch are of varying sizes or mixed orientations (portrait and landscape mixed together), you can align your zone in relation to any of the 4 corners of the image: the top left or right corner or the bottom left or right corner. That way the zone will be positioned correctly on all sizes and orientations.
Top right alignment of a zone on a portrait oriented image
Top right alignment of the same zone on a landscape oriented image
08 – Append to original value: the bar code reader’s result will be added to the value that was already in the index field. Disable this option to overwrite the previous value with the new result.
09 – Clear original value if result is blank: when the barcode reader returns nothing, any value already in the index field generated by previous rules or by Kofax Express will be cleared.
10 Advanced Barcode Rule – Reading Options
The reading options are divided into three categories:
11 – Standard Options:
1) Barcode section:
Types: MetaTool Advanced Barcode supports all the barcode types that Kofax Express supports, plus additional ones.
For example, Kofax Express supports three types of patch codes (II, III and T) to separate documents. MetaTool’s Advanced Barcode can detect all 6 types (I, II, III, IV, VI and T) and any of these types can be defined to trigger the creation of a new document in MetaTool.
The MetaTool Advanced Barcode supports the following barcode types:
Codabar: mostly used for applications that require serial numbers, such as airway bills and member cards. Codabar barcodes can be smaller in size compared to Code 39 barcodes.
Code 2 of 5 interleaved: a very simple and compact numeric code that can encode the digits 0 to 9. It is used commercially on 135 film and on cartons of some products. The code always contains an even number of digits. Values with an odd number of digits are prefixed with a 0 to make the number of digits even.
Code 2 of 5 non-interleaved: mostly used for the airline industry, distribution systems and warehouse sorting systems.
Code 39: mostly used for document management and in non-retail environments like the US Department of Defense, the health industry and postal services. It is unusual in that it can be generated using a font: any software, such as Word or Excel, can generate a Code 39 barcode with a font like the free3of9 font. A Modulo 43 check-digit is optional.
It can contain upper case letters, digits and the following special characters: -, ., $, /, +, %, and space
Because it is very easy to generate, Code 39 is popular. However, Code 128 is preferred because it is much more compact and features a full character set.
Extended Code 39: looks identical to a standard Code 39, but it supports the full ASCII character set by combining two standard code 39 characters to represent a single extended code 39 character. For example, “+A” in extended mode is decoded as “a”. A Modulo 43 check-digit is optional.
Code 93: similar to Code 39, but it can fit more characters in the same space. A Modulo 47 check-digit is optional.
Code 128: can encode the complete ASCII character set and has an internal check digit that is not displayed in the text below the code. It is the most widely used linear barcode across industries and is often used for document management and mail tracking codes because it does not take up a lot of space on the documents.
There is an even more compact variation of the Code 128 format without start/stop characters called Short Code 128.
Databar: designed for point of sales scanning and very small item identification. It’s mainly used in the healthcare and retail industry (for example: coupons).
Datamatrix: a 2D barcode consisting of black and white "cells" or modules arranged in either a square or rectangular pattern, also known as a matrix. The length of the encoded data depends on the number of cells in the matrix. Thanks to a feature called “redundancy” a damaged datamatrix code can still be decoded using error correction and recovering all data.
A Data Matrix symbol can store up to 2,335 alphanumeric characters. It’s mostly used to mark small items.
EAN 13: contains 13 digits and is designed for point of sales scanning and product identification. It’s mainly used in the retail industry. Contains a check digit that is calculated according to modulo 10.
EAN 8: the short (less common) form of EAN 13, which contains only 8 digits. This code is used if the article is too small for an EAN 13 barcode.
PDF417: a 2D barcode based on stacked barcodes. It also applies error correction based on the code length. It’s mostly used for airline boarding passes, ID cards, inventory management and document management. Like most 2D barcodes, it features “redundancy” to make it possible to decode damaged barcodes using error correction.
Micro PDF417: Micro PDF417 was designed for situations where a full PDF417 barcode would be too large. It has the same functions as PDF417 barcodes.
QR Code: a modern 2D barcode with marks indicating the orientation. Mostly used for mobile tagging for cell phones. It uses Reed-Solomon error correction, which makes it possible to decode damaged QR codes. Because it is such an efficient and compact barcode type, it is very popular in document management applications and across most industries.
CaptureBites developed a barcode generator software optimized for document management to easily generate your own QR labels.
UPC-A: very similar to the EAN barcode and used in the US for product coding. It contains 12 digits, including a check digit that is calculated according to modulo 10.
UPC-E: the short version of the UPC-A barcode with 8 digits, always starting with a zero. Has the same functions as UPC-A.
Patch Codes: there are 6 different types of patch codes. They are mostly used on separator sheets. They are often printed along all four edges of the separator sheet. It’s common that separator sheets are discarded after separation has been applied to the document.
Short Code 128 minimum length: defines the minimum length of a Short Code 128 barcode value, including the checksum characters.
Checksum options: Enable this if your Code 2 of 5 or Code 39 barcodes contain a checksum. The checksum feature is especially useful with Code 2 of 5, which does not feature start and stop characters. This could cause partial reads if part of the Code 2 of 5 is damaged. Adding a checksum to the Code 2 of 5 value causes partially read barcodes to be rejected.
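Why a checksum rejects partial reads can be illustrated with the optional Modulo 43 check character of Code 39. The sketch below uses the standard Code 39 character values. The helper names are hypothetical, and the way MetaTool applies the check internally is not documented here.

```python
# The 43 Code 39 characters in value order (0 = '0' ... 42 = '%').
CODE39_CHARSET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ-. $/+%"


def mod43_check_char(data):
    """Return the Modulo 43 check character for a Code 39 data string."""
    total = sum(CODE39_CHARSET.index(ch) for ch in data)
    return CODE39_CHARSET[total % 43]


def accept_with_checksum(value):
    """Return the data without its check character, or None if the check fails."""
    if len(value) < 2 or any(ch not in CODE39_CHARSET for ch in value):
        return None
    data, check = value[:-1], value[-1]
    return data if mod43_check_char(data) == check else None


full_read = accept_with_checksum("CODE39W")    # complete barcode, check passes
partial_read = accept_with_checksum("CODE39")  # last character lost, check fails
```

A truncated value only passes by coincidence, when its last remaining character happens to be a valid check character for the characters before it.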
Code 39 includes start/stop: As mentioned before, Code 39 can be used as a font and therefore easily be made by anyone with any text-editor. But it also uses a start/stop character represented by an asterisk (*).
In the example below you can see that the first portion of bars perfectly matches the end portion. These bars represent the * start/stop characters.
When the start/stop characters are missing at the beginning and the end of the Code 39, Kofax Express cannot recognize the barcode. MetaTool, however, will still recognize it if you disable the option Code 39 includes start/stop.
Multiple read: select this option if you want to read more than one barcode.
Maximum barcodes: the barcode reader stops searching for barcodes once the maximum number of barcodes is found. If the number of barcodes expected on a page is known, it’s highly recommended to set this value to exactly that number and not higher. Unnecessarily high values will slow down the barcode reading process. Set the value to 0 to make the maximum number unlimited (read all barcodes on the image regardless of the quantity).
Confidence level: any barcode with a score equal to or higher than the Preferred value will be recognized. When no barcode meets this standard, the barcode with the highest score that is equal to or higher than the Minimum value will be recognized.
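The two-step Confidence level rule can be written out as a small sketch. The function name, the `(value, score)` tuples and the score scale in the example are assumptions made for illustration only.

```python
def select_barcodes(candidates, preferred, minimum):
    """Apply the Preferred/Minimum rule to a list of (value, score) tuples."""
    # Step 1: every barcode that reaches the Preferred score is accepted.
    accepted = [c for c in candidates if c[1] >= preferred]
    if accepted:
        return accepted
    # Step 2: otherwise only the single best barcode is accepted,
    # provided it reaches the Minimum score.
    fallback = [c for c in candidates if c[1] >= minimum]
    if not fallback:
        return []
    return [max(fallback, key=lambda c: c[1])]


candidates = [("A123", 92), ("B456", 61), ("C789", 40)]
best = select_barcodes(candidates, preferred=80, minimum=50)
```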
2) Image processing section:
Color threshold: this value is the color level used to decide whether a pixel should be considered black or white. Any value other than 0 disables the automatic processing explained in the next setting (Color processing level). It’s recommended to keep the default value (0) and make use of the Color processing level instead.
Color processing level: if Color threshold (see previous setting) is set to 0, color processing is automatic. This automatic process can be tuned with this setting. A high value results in higher accuracy and read-rate levels, but it is slower than a low value.
Skew tolerance: the maximum angle from the horizontal or vertical at which a barcode will be read. When using barcode labels that are not always stuck perfectly straight on the document, we recommend keeping the default value of 29°. However, when barcodes are pre-printed on the documents, it is recommended to set this to a lower value such as 5° to increase speed.
Skew line jump: this is the frequency with which scan lines are sampled when searching for skewed barcodes. Increasing the value will increase the speed at which an image is processed but may decrease the read-rate level.
12 – Advanced options:
1) Barcode section:
Convert UPC-E to EAN13: A UPC-E barcode is actually an EAN13 barcode reduced to an 8 digit number. Using this option will let the engine restore the original EAN13 value of the UPC-E barcode.
This can be useful when working with US-exclusive UPC-E barcodes that need to be used in the rest of the world where EAN13 is the standard barcode format.
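The conversion can be sketched with the standard UPC-E zero-suppression rules and the modulo 10 check digit. This is an illustration of the general technique, not necessarily what the engine does internally, and the function names are hypothetical.

```python
def mod10_check_digit(digits):
    """Modulo 10 check digit for a digit string that excludes the check digit."""
    total = 0
    # Weights alternate 3, 1, 3, ... starting from the rightmost digit.
    for position, ch in enumerate(reversed(digits)):
        total += int(ch) * (3 if position % 2 == 0 else 1)
    return str((10 - total % 10) % 10)


def upce_to_ean13(upce):
    """Expand an 8-digit UPC-E value to its 13-digit EAN-13 equivalent."""
    if len(upce) != 8 or not upce.isdigit():
        raise ValueError("expected 8 digits")
    number_system, body, check = upce[0], upce[1:7], upce[7]
    last = body[5]
    # The last body digit tells where the suppressed zeros go.
    if last in "012":
        expanded = body[0:2] + last + "0000" + body[2:5]
    elif last == "3":
        expanded = body[0:3] + "00000" + body[3:5]
    elif last == "4":
        expanded = body[0:4] + "00000" + body[4]
    else:
        expanded = body[0:5] + "0000" + last
    upca_without_check = number_system + expanded
    if mod10_check_digit(upca_without_check) != check:
        raise ValueError("check digit mismatch")
    # EAN-13 is the 12-digit UPC-A value with a leading zero.
    return "0" + upca_without_check + check


ean13 = upce_to_ean13("01234565")
```

The check digit stays the same after expansion because it is calculated on the full 12-digit value, and a leading zero does not change the modulo 10 sum.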
Barcodes at top of page: This will process an image from the top of the page downwards and will speed up barcode detection when barcodes are mostly located on the top of the page.
This is different to Scan directions, which sets the orientations of barcodes that the barcode reader will recognize.
It should only be enabled if either Multiple read is disabled or if the Maximum number of barcodes is not equal to zero. For other cases it is recommended to leave this option disabled.
Minimum length: this defines the minimum length of a barcode value, including the checksum characters. Barcodes with a value shorter than the set length will be ignored.
Maximum length: this defines the maximum length of a barcode value, including the checksum characters. Barcodes with a value longer than the set length will be ignored.
Numeric barcode: When enabled, only bar codes with a numeric value will be recognized.
Pattern: Enter a regular expression to compare the found barcode to (uses POSIX extended syntax)
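The combined effect of Minimum length, Maximum length, Numeric barcode and Pattern can be sketched as a single filter. Several points are assumptions: the function and parameter names are hypothetical, 0 is treated as "no limit", and the pattern is required to match the whole value. Python's `re` module is not a POSIX extended engine, but simple patterns like the one below behave the same in both.

```python
import re


def passes_filters(value, min_length=0, max_length=0, numeric_only=False, pattern=None):
    """Return True if a barcode value survives all enabled filters."""
    if min_length and len(value) < min_length:
        return False
    if max_length and len(value) > max_length:
        return False
    if numeric_only and not (value.isascii() and value.isdigit()):
        return False
    if pattern is not None and re.fullmatch(pattern, value) is None:
        return False
    return True


# Keep only values that look like an invoice number, e.g. INV-2024-0001.
invoice_pattern = r"INV-[0-9]{4}-[0-9]{4}"
is_invoice = passes_filters("INV-2024-0001", pattern=invoice_pattern)
```

Filters like these are useful when a page carries several barcodes and only one of them belongs in the index field.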
Scan directions: Only barcodes printed in the specified orientation(s) will be recognized.
Quiet zone size: the size, in pixels, of the space around a barcode. 0 implies a quiet zone equal to 10% of the image resolution.
For example: when you have an image with a resolution of 300 dpi, the quiet zone size would be 30 pixels.
Minimum space bar width: the minimum size for a space between bars. 0 automatically selects the best value.
Minimum separation: defines the minimum distance between barcodes with the same value and height. When the distance between 2 barcodes is less than the Minimum separation value, the engine will assume it’s a single barcode that was mistakenly split in 2 parts.
Error correction: when selected, the engine will make a best guess at unreadable barcodes and let it pass through.
2) Image processing:
Timeout: the maximum time in seconds that the engine will allow for scanning a page in a document.
Read skewed linear: when selected, skewed linear barcodes are read without the need to set Skew tolerance. This setting only relates to Codabar, Code 25, Code 39 and Code 128 type barcodes.
Read skewed Datamatrix: when selected, skewed Datamatrix barcodes are read without the need to set Skew tolerance.
Gamma correction: with a value other than the default 100, the set gamma correction is applied to color images.
Noise reduction: this filter removes marks from bitonal (two-color) images. The higher the Noise reduction value, the larger the marks that will be removed from the image.
Be careful, because it could also destroy vital barcode information when set too high. A typical value is around 10.
Despeckle: when the Noise reduction is set to a value that is not 0, this filter will remove white speckles inside the bars of a barcode before removing black marks from the spaces between bars.
Median filter: this filter is useful for cleaning high resolution images that contain speckles of black and white. Not recommended for images where the black bars or white spaces are less than 2 pixels wide.
Use over sampling: this forces the engine to sample 3 scan lines at a time and take the average pixel value. This is useful for images containing both black and white speckles.
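Averaging three scan lines can be sketched as follows. Which three lines the engine actually combines is not documented here, so this example assumes the current row and its direct neighbors, and a grayscale image stored as a list of rows with values from 0 (black) to 255 (white). The helper name is hypothetical.

```python
def oversampled_scan_line(image, row):
    """Average a scan line with the lines directly above and below it."""
    last = len(image) - 1
    # At the top and bottom edge the nearest existing row is reused.
    rows = [image[max(0, row - 1)], image[row], image[min(last, row + 1)]]
    return [sum(column) / 3 for column in zip(*rows)]


# Column 0 is a black bar with one white speckle in the middle row;
# column 1 is white background.
image = [
    [0, 255],
    [255, 255],
    [0, 255],
]
averaged = oversampled_scan_line(image, 1)
```

After averaging, the speckled pixel has a value of 85 instead of 255, so a mid-range threshold would classify it as part of the black bar again.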
Line jump: this is the frequency with which scan lines are sampled when searching for barcodes. Increasing this value will increase the speed at which an image is processed, but it may decrease the read-rate.
Use fast scan: when selected, a fast scan of the image is performed before conducting the normal scan. This only applies when a single barcode is required on a page (so Multiple read should be off or Maximum barcodes should be set to 1).
Fast scan line jump: this is the frequency with which scan lines are sampled for a fast scan.
The default value is 25, which allows a quick capture of any easy-to-read barcodes. Decreasing the value may decrease the overall speed.
13 – Returned Values options:
1) Encoding: Experiment with this option if you use extended ASCII mapping. This applies to barcode types that use full character sets, such as QR codes, PDF417, etc…
Please select the correct encoding type for your application.
2) Allow duplicates: when enabled, multiple barcodes with identical values on the same page will all be reported.
3) Return check digit: returns the barcode check digit. This only applies to barcode types with built in check digits, like Code 128.
4) Unread barcodes: the barcode engine first looks in the document if there is a barcode, then it looks for its value. When the value can’t be read, it will still return the type of barcode it detected with an unrecognized value. You can choose which barcode types to detect by selecting them from the drop down menu.
5) Codabar start/stop: Codabar barcodes always contain a start/stop character pair, which can be a/t, b/n, c/* or d/e. This option defines whether and how the start and stop characters are included when returning the value of a Codabar barcode:
Blank: don’t return the start/stop characters
a, b, c, d / t, n, *, e: return a, b, c or d as the start character and t, n, * or e as the end character
A, B, C, D / T, N, *, E: return A, B, C or D as the start character and T, N, * or E as the end character
A, B, C, D / A, B, C, D: return A, B, C or D as the start and the end character.
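The four Codabar start/stop choices can be sketched as a lookup. The function name and the way the start and stop characters are passed in (as letters A to D) are assumptions for illustration. The mapping of stop characters A, B, C, D to T, N, *, E follows the pairs listed above.

```python
# Start and stop character sets for each option, indexed in A, B, C, D order.
STYLES = {
    "a, b, c, d / t, n, *, e": ("abcd", "tn*e"),
    "A, B, C, D / T, N, *, E": ("ABCD", "TN*E"),
    "A, B, C, D / A, B, C, D": ("ABCD", "ABCD"),
}


def format_codabar(start, data, stop, style=None):
    """Return a Codabar value with start/stop characters in the chosen style.

    `start` and `stop` are given as letters A-D; style=None means Blank.
    """
    if style is None:
        return data
    start_chars, stop_chars = STYLES[style]
    return start_chars["ABCD".index(start)] + data + stop_chars["ABCD".index(stop)]


lower_case = format_codabar("A", "12345", "B", "a, b, c, d / t, n, *, e")
```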