Drunk and angry: the motivation
A few weeks ago, I was at a bar with my coworkers from Fintoc discussing how uncomfortable the process of actually paying the bill is. Typically, bars don’t allow tables to split the bill between every participant, so what ends up happening is that one person pays the whole bill and then has to somehow collect the money from everyone else.
As we all know, this is a huge PITA. When you pay the bill, the rest of the participants usually mumble something vague like “send me a picture of the receipt and I’ll transfer you the money immediately” or some other form of the same 🐂shit. But you know that most of them will have forgotten about the damn receipt picture as soon as they get into their cars. Idiots.
At Fintoc, we use a different strategy: whoever pays the bill creates a Google Sheet that contains every item on the bill as rows and every participant as columns. This Google Sheet then gets sent to the rest of the participants, so all they have to do is find their column, write down how much of each item they consumed in its corresponding row, and get the amount they owe directly on the Google Sheet. The idea is to make it so simple for everyone that no one has any excuse not to pay.
As you can imagine, this works! But the participant who actually pays the bill has to manually write down each one of the items consumed at the bar on the Google Sheet (sometimes close to 30 different items). No wonder no one ever wants to pay!
So that’s how, one night coming back home from the bar, I decided that I would give it a shot. Bear in mind, I didn’t (and still fundamentally don’t) know anything about image processing or image-to-text recognition, so everything I talk about here is a result of me trying to make my friends (and me) have a better time when paying the complete bill (also, it was 1 AM).
The Application
Disclaimer: This post won’t really talk much about the application itself. More of that is coming soon…
Obviously, what I needed to do was to write a complete application. This application needed the following characteristics:
- The participant actually paying the bill simply uploads the receipt; the application must read the receipt image and generate a URL with the detected items for the rest of the participants.
- Once participants open the generated URL, they can select the items they consumed directly on the UI.
- Finally, participants can see the total amount they owe directly on the UI.
But in order for me to build this application (which, by the way, I already built), I needed a way to first transform the receipt image into the items used by the application. Since I didn’t really find anything remotely resembling what I was looking for (and since I really love to learn about technology), I decided to write the module myself.
The Receipt Scanner
Disclaimer: Everything I talk about here I learned empirically, so I most certainly am wrong about most of it. If you see an error or a way in which I can improve the algorithm, please contact me! 💖
Enter: receipt-scanner. The idea was simple: write a basic module that receives an image path (or URL), processes said image and retrieves the text found on the receipt. Easier said than done, I found.
To achieve the goal of creating a receipt scanner, I discovered that there are roughly 3 steps involved in reading the text from a receipt image:
- Finding the borders
- Processing the image
- Actually reading the text
Throughout this section, I will show the incremental changes that some filters have over a receipt image example. I will be using the following image as the base:
Finding the borders
To find the borders of the receipt, you first have to work some magic on the image.
First, I add a black border around the image. This is a hacky trick that makes it possible to detect receipts that don’t fit completely inside the frame: without the border, one of the receipt’s edges would fall outside the image, so that edge wouldn’t be detected and no rectangle would be found.
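To illustrate the trick, here is a minimal sketch using plain NumPy (the actual module presumably uses an OpenCV call such as `cv2.copyMakeBorder`; `add_black_border` and the thickness value are made up for this example):

```python
import numpy as np

def add_black_border(image: np.ndarray, thickness: int = 10) -> np.ndarray:
    """Pad the image with black pixels on all sides.

    This guarantees there are dark pixels beyond every receipt edge,
    so edge detection can close the contour even when the receipt
    touches (or leaves) the frame.
    """
    return np.pad(
        image,
        ((thickness, thickness), (thickness, thickness), (0, 0)),
        mode="constant",
        constant_values=0,  # pure black
    )

# A tiny all-white 4x4 RGB image grows to 8x8 with a 2-pixel black frame
framed = add_black_border(np.full((4, 4, 3), 255, dtype=np.uint8), thickness=2)
```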
Then, you need to compress and resize the image. Handling a large image can unnecessarily eat up the RAM on your server (you don’t need fine detail to find the largest rectangular border), and too much detail also makes it impossible to accurately find borders (letters get confused with edges, for example). Since you are going to remove detail later anyway, you might as well compress and resize now.
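A resize step like the one described could be sketched as follows, using plain NumPy nearest-neighbor indexing instead of OpenCV. Note that `resize_to_width` and the target-width value are hypothetical names for illustration, not the module’s API:

```python
import numpy as np

EDGE_DETECTION_TARGET_WIDTH = 500  # made-up value; the real constant lives in the module

def resize_to_width(image: np.ndarray, target_width: int) -> np.ndarray:
    """Downscale to a fixed width, preserving aspect ratio.

    Uses nearest-neighbor sampling for brevity; a real pipeline would
    use a proper interpolation (e.g. cv2.resize with INTER_AREA).
    """
    height, width = image.shape[:2]
    if width <= target_width:
        return image  # never upscale; small images keep what detail they have
    scale = target_width / width
    target_height = max(1, round(height * scale))
    # Map each output pixel back to its nearest source pixel
    rows = np.minimum((np.arange(target_height) / scale).astype(int), height - 1)
    cols = np.minimum((np.arange(target_width) / scale).astype(int), width - 1)
    return image[rows][:, cols]

small = resize_to_width(np.zeros((100, 200, 3), dtype=np.uint8), 50)
```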
After resizing you can proceed to soften the image as much as possible. My implementation starts with a morphological closing operation, which removes stuff like the receipt’s text and textures in general. After that, I apply some blurs over the image, followed by the Canny edge detection algorithm. Finally, I apply a dilation filter so that edges that almost touch get connected.
Once the image is all chewed up, I find the best contour for the receipt.
The code that executes each of the steps mentioned above looks something like this:
```python
original_image = open_image(file_name)
chewed_image = Filter.apply(
    original_image,
    CompressFilter(),
    ResizeFilter(EDGE_DETECTION_TARGET_WIDTH),
    MorphologicalCloseFilter(iterations=4),
    MedianBlurFilter(),
    GaussianBlurFilter(size=3),
    CannyFilter(),
    DilateFilter(),
)
contour = find_contour(chewed_image)
```
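The interesting part of `find_contour` is the selection logic: among all candidate contours, the receipt should be the largest four-sided one. Here is a toy sketch of that idea using the shoelace formula for area (the real implementation presumably relies on OpenCV’s `cv2.findContours` and `cv2.approxPolyDP`, and `best_receipt_contour` is a hypothetical name):

```python
import numpy as np

def polygon_area(points: np.ndarray) -> float:
    """Shoelace formula for the area of a polygon given as an (N, 2) array."""
    x, y = points[:, 0], points[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

def best_receipt_contour(candidates):
    """Pick the largest four-vertex candidate: receipts are rectangles."""
    quads = [c for c in candidates if len(c) == 4]
    return max(quads, key=polygon_area, default=None)

square = np.array([[0, 0], [10, 0], [10, 10], [0, 10]])
small = np.array([[0, 0], [2, 0], [2, 2], [0, 2]])
triangle = np.array([[0, 0], [5, 0], [0, 5]])  # rejected: not four-sided
best = best_receipt_contour([triangle, small, square])
```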
Tip: notice how abstracted every filter is. You can read the implementation of each filter (or the custom-made filter system, for that matter) on the receipt-scanner repository.
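To give an idea of what such a chainable filter system might look like, here is a minimal sketch (the `AddFilter` toy stands in for the real image filters, which live in the repository; this is not the repository’s actual code):

```python
class Filter:
    """Minimal chainable filter interface (hypothetical sketch)."""

    def process(self, image):
        raise NotImplementedError

    @staticmethod
    def apply(image, *filters):
        # Pipe the image through each filter, in order
        for f in filters:
            image = f.process(image)
        return image

class AddFilter(Filter):
    """Toy filter: adds a constant, standing in for a real image operation."""

    def __init__(self, amount):
        self.amount = amount

    def process(self, image):
        return image + self.amount

result = Filter.apply(1, AddFilter(2), AddFilter(3))
```

Each real filter only needs to implement `process`, so new steps can be slotted into the pipeline without touching the others.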
Processing the image
Once I (hopefully) find the contour on the chewed image, the original image is ready to be processed.
I start by warping the perspective of the original image (think stretching it, like CamScanner does). In this process I take the contour found on the chewed image, project it over the original image, cut out that portion and transform it into a perfect rectangle.

Once the perspective has been warped, I resize the resulting image to a fixed target width. This helps a bit with small images, and doesn’t really hurt huge images, so I can save some RAM for those receipts.
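One detail the perspective step hinges on: the four contour corners must be handed to the transform in a consistent order. A common trick, sketched below, orders them by coordinate sums and differences; `order_corners` is a hypothetical helper, not the module’s API, and the ordered points would then feed something like OpenCV’s `cv2.getPerspectiveTransform`:

```python
import numpy as np

def order_corners(pts: np.ndarray) -> np.ndarray:
    """Order four points as top-left, top-right, bottom-right, bottom-left.

    With the origin at the top-left of the image:
    x + y is smallest at the top-left corner and largest at the bottom-right;
    y - x is smallest at the top-right corner and largest at the bottom-left.
    """
    ordered = np.zeros((4, 2), dtype=pts.dtype)
    s = pts.sum(axis=1)
    d = np.diff(pts, axis=1).ravel()  # y - x for each point
    ordered[0] = pts[np.argmin(s)]  # top-left
    ordered[1] = pts[np.argmin(d)]  # top-right
    ordered[2] = pts[np.argmax(s)]  # bottom-right
    ordered[3] = pts[np.argmax(d)]  # bottom-left
    return ordered

shuffled = np.array([[90, 110], [10, 10], [5, 100], [100, 20]])
corners = order_corners(shuffled)
```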
With my resized image, I proceed to apply a median blur, denoise, and then apply a Gaussian blur (this was the application order that behaved best during my not-so-thorough tests).
Finally, I convert the image to grayscale and binarize it (meaning every pixel above a threshold becomes pure white and every other pixel becomes pure black).
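A global binarization step can be sketched in a few lines of NumPy, assuming the common convention where bright pixels (paper) map to white and dark pixels (ink) to black. The threshold value here is made up, and the real module may well use an adaptive threshold instead, which copes better with uneven lighting:

```python
import numpy as np

def binarize(gray: np.ndarray, threshold: int = 150) -> np.ndarray:
    """Global binarization: pixels above the threshold become pure white (255),
    everything else becomes pure black (0)."""
    return np.where(gray > threshold, 255, 0).astype(np.uint8)

gray = np.array([[0, 200], [149, 151]], dtype=np.uint8)
binary = binarize(gray)
```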
The code that executes each of the steps mentioned above looks something like this:
```python
processed_image = Filter.apply(
    original_image,
    PerspectiveWrapperFilter(contour),
    ResizeFilter(TEXT_CLEANUP_TARGET_WIDTH),
    MedianBlurFilter(),
    DenoiseFilter(),
    GaussianBlurFilter(),
    GrayscaleFilter(),
    BinarizeFilter(),
)
```
Reading the text
The last part is so simple it might sound like a joke, but reading the text from the image is actually something of a “solved problem” (not really, but the solutions are pretty awesome).
This part (to the extent of my knowledge) needs to be executed by an ML algorithm. Luckily, a very robust one already exists: Tesseract. So the last bit of code looks like this:
```python
import pytesseract

text = pytesseract.image_to_string(processed_image, config="--psm 4 -l spa+eng")
```
Nothing fancy, just using the available tools will do. What I found about Tesseract is that it is very grumpy and needs to receive an almost perfectly processed image to work well.
Closing
Oh boy did I learn a thing or two about image processing and using Tesseract. But I was able to learn only because of the awesome resources I found online on the subject (countless blogs and how-to’s about many different pieces of the algorithm).
Because I had to work so much to write this module, I released it under the MIT license and uploaded it to PyPI, so you can use it in your projects too!
I invite you to try Split (the application I wrote using the Receipt Scanner) and to write your own tools using receipt-scanner! I enjoyed the journey, and hope to continue learning about this subject 💖.
If you want to talk to me about anything I mentioned on this post or if you simply want to chat, contact me on one of my socials!