Newspaper Ad Extraction And Classification


In these changing times, it is essential to know what original sources are publishing. Technology today offers many digital channels for getting the data we need. Along with the benefits come problems: fake publishers target users' payment methods, collect users' personal information, and so on. In this scenario, advertisements in an authentic newspaper remain a reliable source of information.

Every day, advertisements are published in categories such as real estate, tender notices, education, and job vacancies. Classifying these advertisements and manually extracting details such as date, organization, location, and address, along with items such as application end date, vacant post, and salary details, is a difficult task in which errors are inevitable.

Cognub has a data extraction framework that addresses the problems of collecting, extracting, and securing this valuable data. Our web app lets users upload PDF versions of the newspaper. Ads are detected and classified, and the valuable information is extracted at the same time and presented in a tabular format. Because the data is secured and well maintained, it is easy to check and retrieve previously collected information. The approach is customizable for various industry sectors and can be added to a business's data extraction pipeline, so that our customers can concentrate solely on enhancing their core capabilities to achieve their targets.
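To make the tabular output more concrete, here is a minimal sketch of what one row per ad could look like. The `AdRecord` class, its field names, and the `records_to_csv` helper are hypothetical, chosen only to mirror the details listed above; they are not Cognub's actual schema or API.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Optional


@dataclass
class AdRecord:
    # Hypothetical schema: field names are assumptions based on the
    # details the article says are extracted from each ad.
    category: str
    organization: Optional[str] = None
    location: Optional[str] = None
    publication_date: Optional[str] = None
    application_end_date: Optional[str] = None
    vacant_post: Optional[str] = None
    salary: Optional[str] = None


def records_to_csv(records):
    """Serialize ad records to CSV text, one row per ad."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(AdRecord)])
    writer.writeheader()
    for record in records:
        # Fields that were not found in the ad stay None and are written as empty cells.
        writer.writerow(asdict(record))
    return buffer.getvalue()


if __name__ == "__main__":
    sample = AdRecord(category="job vacancy", organization="Example Corp", vacant_post="Clerk")
    print(records_to_csv([sample]))
```

Keeping missing fields as empty cells, rather than dropping the row, makes it easy to see later which details could not be extracted from a given ad.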

Technologies we use

* OCR and random forest ML models.

* Image processing techniques.

* NER (named entity recognition) model for further classification and extraction.
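As a rough illustration of the classify-then-extract flow, the sketch below works on text that is assumed to have already come out of OCR. It deliberately swaps in simple stand-ins: keyword counting in place of a trained random forest classifier, and regular expressions in place of an NER model. The keyword lists, patterns, and function names are all hypothetical and are not Cognub's implementation.

```python
import re

# Hypothetical keyword lists; a trained classifier (for example a random
# forest) would learn such signals from labeled ads instead.
CATEGORY_KEYWORDS = {
    "job vacancy": ("vacancy", "vacancies", "recruitment", "salary", "apply"),
    "tender notice": ("tender", "bid", "quotation"),
    "real estate": ("for sale", "plot", "apartment", "sq ft"),
    "education": ("admission", "course", "college"),
}

# Simple patterns standing in for an NER model (assumed formats only).
DATE_PATTERN = re.compile(r"\b(\d{1,2}[/-]\d{1,2}[/-]\d{4})\b")
SALARY_PATTERN = re.compile(r"\b(?:Rs\.?|INR)\s?\d[\d,]*", re.IGNORECASE)


def classify_ad(text):
    """Return the category whose keywords occur most often, or 'unknown'."""
    lowered = text.lower()
    scores = {
        category: sum(lowered.count(keyword) for keyword in keywords)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


def extract_fields(text):
    """Collect the category, any dates, and the first salary mention."""
    salary = SALARY_PATTERN.search(text)
    return {
        "category": classify_ad(text),
        "dates": DATE_PATTERN.findall(text),
        "salary": salary.group(0) if salary else None,
    }


if __name__ == "__main__":
    ad = "Recruitment notice: vacancy for Clerk. Salary Rs. 25,000. Apply before 15/08/2024."
    print(extract_fields(ad))
```

Rules like these break down quickly on real newspaper layouts and noisy OCR output, which is the kind of variation that trained classification and NER models are meant to handle.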

Benefits

* Extracting and classifying large volumes of ads becomes easier.

* Details stored in a tabular format are easy to understand and maintain.

* All data collected on a day-to-day basis is stored and secured.
