Digitizing a graph from its image (jpg/png)
I am working on OCR for checking school students' exam answer sheets. One big challenge is to scan graphs from JPG images and produce a rough digitized version of each graph. I have seen many online tools that do this, but I want to implement it from scratch.
There are several steps you can take to digitize charts from images.
Image preprocessing: This step prepares the image for processing. This includes tasks such as converting images to grayscale, applying thresholds, and removing noise.
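Since you want to build this from scratch, here is a rough, library-free sketch of the grayscale and threshold part of this step. It assumes the image is already loaded as rows of (r, g, b) tuples and that the graph is dark ink on light paper; the function names are only for illustration. Otsu's method is one common way to pick the threshold automatically (in OpenCV, cv2.threshold with the cv2.THRESH_OTSU flag does the same job).

```python
def to_gray(rgb):
    """Convert an RGB image (rows of (r, g, b) tuples) to 8-bit grayscale."""
    # ITU-R BT.601 luma weights
    return [[int(round(0.299 * r + 0.587 * g + 0.114 * b)) for r, g, b in row]
            for row in rgb]


def otsu_threshold(gray):
    """Return the threshold t that maximizes the between-class variance."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(256):
        w_bg += hist[t]              # number of pixels with value <= t
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue                 # no dark pixels yet
        w_fg = total - w_bg
        if w_fg == 0:
            break                    # everything is already in the dark class
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(gray, t):
    """Mark ink pixels as 1, assuming dark ink on light paper."""
    return [[1 if v <= t else 0 for v in row] for row in gray]
```

Calling binarize(gray, otsu_threshold(gray)) gives a 0/1 mask of the ink. Noise removal (for example a small median filter) is left out here to keep the sketch short.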
Edge detection: This step identifies the edges of the graph in the image. This can be done with edge detection algorithms such as Canny or Sobel.
Hough transform: This step identifies straight lines in the edge image. The Hough transform is a popular algorithm for this task: it maps the edge pixels into a parameter space in which each line is represented as a point, so that lines show up as peaks of votes.
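To make the "lines become points" idea concrete, here is a minimal from-scratch sketch of the voting step. It assumes you already have a list of (x, y) edge pixels and uses the usual normal form rho = x*cos(theta) + y*sin(theta); the function name and parameters are my own, not a library API. In OpenCV, cv2.HoughLines and cv2.HoughLinesP provide a tuned version of this.

```python
import math
from collections import Counter


def hough_lines(points, n_theta=180, min_votes=3):
    """Vote in (rho, theta) space.

    Returns [(votes, rho, theta_deg), ...] sorted with the strongest bins first.
    """
    acc = Counter()
    for x, y in points:
        # Each edge pixel votes for every line that could pass through it
        for k in range(n_theta):
            theta = math.pi * k / n_theta
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            acc[(rho, k)] += 1
    peaks = [(votes, rho, k * 180 / n_theta)
             for (rho, k), votes in acc.items() if votes >= min_votes]
    return sorted(peaks, reverse=True)
```

For ten pixels on the horizontal line y = 5, the bin (rho = 5, theta = 90 degrees) collects all ten votes. Neighbouring bins collect many votes too, so a real implementation would also suppress non-maximum bins before reporting lines.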
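Line intersections: Once the lines are identified, their intersections can be used to locate data points on the chart, for example the corners of a piecewise-linear plot or the origin where the two axes cross. You can use an image processing library such as OpenCV to detect the lines in the image.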
Data extraction: Once you have your data points, you can extract their x and y pixel coordinates, convert them to the units of the axes, and use them to plot a graph.
Plotting: The graph can be drawn with a plotting library such as Matplotlib or Seaborn.
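One detail of the data extraction step: the detected points are in pixel coordinates, so they need to be mapped to the units of the axes. Below is a small sketch of that mapping. It assumes linear axes and two known tick positions per axis; the calibration numbers are made up for illustration, and in practice you would read them off the image by hand or with OCR.

```python
def make_axis_map(pix_a, val_a, pix_b, val_b):
    """Return a linear map from a pixel coordinate to a data value on one axis."""
    scale = (val_b - val_a) / (pix_b - pix_a)
    return lambda pix: val_a + (pix - pix_a) * scale


# Hypothetical calibration: two known ticks per axis
x_of = make_axis_map(50, 0.0, 450, 10.0)   # column 50 -> x = 0, column 450 -> x = 10
y_of = make_axis_map(400, 0.0, 100, 6.0)   # row 400 -> y = 0, row 100 -> y = 6


def pixels_to_data(pixel_points):
    """Convert (column, row) pixel positions to (x, y) data values."""
    return [(x_of(col), y_of(row)) for col, row in pixel_points]
```

Image rows grow downward, which is why the y calibration runs from row 400 (bottom) to row 100 (top). A logarithmic axis would need the same mapping applied to the logarithm of the tick values instead.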
Note that this process may not be perfect, especially if the image is blurry or noisy. You can also use OCR to extract text, such as axis labels and tick values, from the image.
I hope this gives you an idea of the steps you can take to digitize a chart from an image. If you have any specific questions or would like further information, feel free to contact me.
Muhammad John Abbas
Welcome to the community, @Husain_Malwat!
I think it’s great that you want to implement it from scratch, since I am convinced you can learn a lot this way. As mentioned before, I would also suggest taking a look at OpenCV!
Regarding the Hough Line Transform:
It will surely work if you have only linear plots (or circles), but in general its suitability is limited for non-linear patterns.
In non-linear cases, OpenCV should still have you covered. Feel free to take a look at these tutorials:
After detecting your edges and extracting the data, you could describe your data with a model, either with splines or e.g. with a regression model. Here scikit-learn might also be helpful.
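As a rough from-scratch illustration of this idea for a non-linear plot: the sketch below traces a single curve column by column from a 0/1 ink mask and then fits a polynomial by least squares. It assumes one curve with one y value per x, and the function names are hypothetical; numpy.polyfit or a scikit-learn regression would replace the hand-written fit in practice.

```python
def trace_curve(mask):
    """For each column, return (column, mean row of the ink pixels)."""
    height, width = len(mask), len(mask[0])
    points = []
    for col in range(width):
        rows = [r for r in range(height) if mask[r][col]]
        if rows:                      # skip columns without ink
            points.append((col, sum(rows) / len(rows)))
    return points


def polyfit(points, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns [a0, a1, ...] for a0 + a1*x + a2*x**2 + ...
    Needs at least degree + 1 distinct x values, otherwise the system is singular.
    """
    n = degree + 1
    # Augmented matrix [A^T A | A^T y]
    m = [[sum(x ** (i + j) for x, _ in points) for j in range(n)]
         + [sum(y * x ** i for x, y in points)]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]
```

The fit here is in pixel coordinates; the result still has to be converted to axis units. With several overlapping or crossing curves the column scan mixes them up, so they would have to be separated first.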
[During my studies we had to implement the SIFT algorithm, and there too a lot of manual pre- and postprocessing of the data was needed, as we know it from CRISP-DM. I guess that, depending on the complexity of your plots, this will also be the case here, especially if you have several overlapping or crossing plots: additional processing will be needed, such as separation or contour detection, filtering, etc.]
As an idea:
- depending on your model, you can (preferably) draw the line in a vectorized way (so that the quality does not suffer when you zoom in). This tutorial may be interesting for you.