060-590 MetaTool Extraction – Find Number
MetaTool’s Find Number rule finds numbers in documents, such as a Total Amount or a quantity. It is frequently combined with a Remove Characters rule and a Replace Text rule.
The Find Number rule is very useful when you need to extract a number from documents that don’t have a fixed format. A classic example is extracting the total amount from invoices, where the position of the data depends on each supplier’s invoice layout.
For example, to extract the Total Amount on an invoice, you first define an OCR extraction rule that holds the full text, or part of the text, of a scanned document in an index field that we typically call Text Block or Full Text. You then use a Replace Text rule to detach the amounts from currency symbols such as $ or €, and a Remove Characters rule to remove any redundant spaces between digits. Both rules clean up the OCR text and prepare it for extracting the amount.
Finally, you define a Find Number rule to extract the actual Total Amount, which is the highest amount in the cleaned-up text block.
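The three steps above can be approximated outside MetaTool to show the idea. This is a minimal Python sketch, not MetaTool's implementation: the function names clean_ocr_text and find_highest_amount, the regular expressions, and the sample text are all assumptions made for illustration, and amounts are assumed to be in French format ("." as thousand separator, "," as decimal symbol, two decimals).

```python
import re
from decimal import Decimal

def clean_ocr_text(text):
    """Approximate the Replace Text and Remove Characters clean-up steps."""
    # Detach amounts from currency symbols by turning the symbols into spaces.
    text = re.sub(r"[$€]", " ", text)
    # Drop spaces around "." and "," when the separator sits between digits.
    text = re.sub(r"(?<=\d)[ \t]*([.,])[ \t]*(?=\d)", r"\1", text)
    # Remove spaces between digits, so "1 200,00" becomes "1200,00".
    return re.sub(r"(?<=\d) +(?=\d)", "", text)

# Assumed amount format: optional "." thousand groups, "," and two decimals.
AMOUNT = re.compile(r"(?<![\d.,])\d+(?:\.\d{3})*,\d{2}(?!\d)")

def find_highest_amount(text):
    """Return the highest amount in the cleaned text, or None if there is none."""
    amounts = [
        Decimal(match.replace(".", "").replace(",", "."))
        for match in AMOUNT.findall(text)
    ]
    return max(amounts, default=None)

# Hypothetical OCR output with stray spaces and currency symbols.
sample = "TOTAL HT 1 000 , 00 €\nTVA 20% 200 , 00 €\nTOTAL TTC 1 200 , 00 €"
print(find_highest_amount(clean_ocr_text(sample)))  # 1200.00
```

Cleaning before searching matters here: without it, "1 200 , 00" would be read as several small numbers instead of one amount.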
In our use case below we use a set of French invoices to explain how the rules work.
01 Find Number – Add Rule
Find Number is defined in the MetaTool Extract tab.
Press the Add button and select Find – Number to add the find rule.
02 Find Number – Setup
In our example we will make use of the CB MetaTool Factures job. This job is automatically installed when you install CaptureBites MetaTool.
From the image samples below, we want to extract the Total Amount, which is the highest amount in the bottom right corner of each invoice.
Then we remove the spaces around thousand separators and decimal points (“.” and “,”) using a Replace Text rule.
The rule looks like this:
Next we remove the spaces between digits using the Remove Characters rule.
The rule looks like this:
Finally, we extract the Total Amount with the Find Number rule. Select the index field to hold the extracted data.
In this case we select the index field “Montant Total” (Total Amount in French).
Optionally enter a description.
Match whole word: returns only numbers that are not connected to other characters. For example, in 500KG the number is attached to letters, so 500 will not be found. Written as 500 KG, the number 500 is a whole word and is returned. Disable Match whole word to find numbers attached to other words.
For example, with the “Match whole word” option disabled, a Find Number rule finds 500 in the word WEIGHT500KG.
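The difference between the two settings can be illustrated with a small Python sketch. It is only an approximation: the function find_numbers is hypothetical, and "connected" is assumed to mean directly adjacent to a letter, digit, or underscore, which may differ from MetaTool's exact definition.

```python
import re

# A number: digits, optionally followed by a "." or "," and more digits.
NUMBER = r"\d+(?:[.,]\d+)?"

def find_numbers(text, match_whole_word=True):
    """Return number strings found in text, optionally as whole words only."""
    if match_whole_word:
        # Reject numbers that touch a word character on either side.
        pattern = rf"(?<!\w){NUMBER}(?!\w)"
    else:
        pattern = NUMBER
    return re.findall(pattern, text)

print(find_numbers("500KG"))                                # []
print(find_numbers("500 KG"))                               # ['500']
print(find_numbers("WEIGHT500KG", match_whole_word=False))  # ['500']
```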
1) Keep all matches: this will return all numbers.
For example:
From this source:
2) Keep first match: this will return the first number and will skip all following numbers.
For example:
From this source:
3) Keep last match: this will return the last number and will skip all other numbers.
For example:
From this source:
4) Keep highest: this will return the highest number.
For example:
From this source:
5) Keep lowest: this will return the lowest number and will ignore every higher number.
For example:
From this source:
6) Keep highest positive or lowest negative: this will return the number with the highest absolute value, that is, the largest number regardless of its sign. In other words, it keeps either the highest positive number or the lowest negative number, whichever is further from zero.
For example: -5 and 4 have absolute values of 5 and 4. The rule returns -5, because it has the highest absolute value.
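The six options can be summarized in a short Python sketch. The function keep and its mode names are hypothetical labels chosen for this illustration, and the behavior on ties (for example 5 and -5) is an assumption, not documented MetaTool behavior.

```python
def keep(numbers, mode):
    """Reduce a list of found numbers according to one of the six keep options."""
    if not numbers:
        return []
    if mode == "all":
        return list(numbers)
    if mode == "first":
        return [numbers[0]]
    if mode == "last":
        return [numbers[-1]]
    if mode == "highest":
        return [max(numbers)]
    if mode == "lowest":
        return [min(numbers)]
    if mode == "highest_absolute":
        # Highest positive or lowest negative: compare by distance from zero.
        return [max(numbers, key=abs)]
    raise ValueError(f"unknown mode: {mode}")

found = [4, -5, 3]  # numbers in the order they appear in the text
for mode in ("all", "first", "last", "highest", "lowest", "highest_absolute"):
    print(mode, keep(found, mode))
```

With the sample list, "highest" gives 4, while "highest_absolute" gives -5, matching the example above.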
09 – Digits after decimal: enter a minimum and a maximum value to return only numbers whose count of digits after the decimal symbol lies between those two values.
Examples:
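As an illustration of this filter outside MetaTool, here is a minimal Python sketch. The helper names are hypothetical, and it assumes that the last "." or "," in a number is the decimal symbol, so thousand separators should already have been cleaned away.

```python
def decimal_digit_count(number_text):
    """Count the digits after the last "." or "," in a number string."""
    position = max(number_text.rfind("."), number_text.rfind(","))
    return 0 if position == -1 else len(number_text) - position - 1

def filter_by_decimals(candidates, minimum, maximum):
    """Keep only numbers whose decimal digit count lies within the bounds."""
    return [
        candidate for candidate in candidates
        if minimum <= decimal_digit_count(candidate) <= maximum
    ]

candidates = ["1200,00", "200", "19,6", "3,14159"]
print(filter_by_decimals(candidates, 2, 2))  # ['1200,00']
print(filter_by_decimals(candidates, 0, 1))  # ['200', '19,6']
```

Setting both values to 2 is a convenient way to keep amounts and skip quantities, percentages, or invoice numbers.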
10 – Range: optionally restricts the result to numbers that fall within a certain range.
For example, if the number is never above 5000 and never negative, you can give it a range from 0 to 5000:
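The effect of such a range can be sketched in a few lines of Python. The function filter_by_range is hypothetical, and treating both limits as inclusive is an assumption about how the range is applied.

```python
def filter_by_range(numbers, low=0, high=5000):
    """Keep only numbers between low and high (both limits assumed inclusive)."""
    return [number for number in numbers if low <= number <= high]

print(filter_by_range([-20, 0, 1200.0, 5000, 7500]))  # [0, 1200.0, 5000]
```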