PROJECT BACKGROUND:
Our company develops a price comparison web service. When a visitor searches for a product, we display all the products matching the search, grouping those that share the same “model”. A product is characterized by, among other things, a category (e.g. digital camera), a brand (e.g. Sony), a model (e.g. DSC T9)…
Example:
A user searches our site for a digital camera: they click on the “HiFi-Photo-Video” category, then on the “Digital Camera” category. We then display all the digital camera models. For example, the products whose model is “DSC T9” are grouped together and the price range 319€ to 369€ is displayed.
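As an illustration only, the grouping and price range described above could be sketched in Python as follows. The field names, the in-memory list and the "EXAMPLE X1" product are hypothetical and do not reflect the real database schema; the sketch also assumes the model has already been identified for each product.

```python
from collections import defaultdict

# Hypothetical rows: the field names and the "EXAMPLE X1" product are
# illustrative only. The three DSC T9 prices come from the example feeds.
products = [
    {"model": "DSC T9", "merchant": "A", "price": 349},
    {"model": "DSC T9", "merchant": "B", "price": 369},
    {"model": "DSC T9", "merchant": "C", "price": 319},
    {"model": "EXAMPLE X1", "merchant": "A", "price": 100},
]


def price_ranges(rows):
    """Group rows by model and return {model: (lowest price, highest price)}."""
    prices_by_model = defaultdict(list)
    for row in rows:
        prices_by_model[row["model"]].append(row["price"])
    return {model: (min(p), max(p)) for model, p in prices_by_model.items()}


ranges = price_ranges(products)
low, high = ranges["DSC T9"]
print(f"DSC T9: {low}€ to {high}€")
```

The grouping only works once every product carries the same model string, which is exactly what the feeds do not guarantee.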
The product data is sent to us by about 50 internet vendors (Virgin, Fnac ...) and affiliate networks (NetAffiliation, TradeDoubler…). It arrives every night as XML or CSV files that we need to process and then load into our database. We receive about 500,000 products in total. In these feeds, the products are characterized by a category, a brand, a product name, a model, a price, a stock level…
Example:
The feed from merchant A contains the following product data:
Category: Digital Camera
Brand: Sony
Model:
ProductName: Cybershot DSC T9
Description: This digital camera is the last born from Sony, it combines…
Price: 349€
The feed from merchant B contains the following product data:
Category: Camera
Brand: Sony
Model:
ProductName: DSC-T9
Description: Sony Cybershot T9 is a of 5 megapixel digital camera, it…
Price: 369€
The feed from merchant C contains the following product data:
Category: Numerical camera
Brand: Sony
Model: Cybershot DSC T9
ProductName: Sony T9
Description: T9 of Sony has a stabilizer of image which…
Price: 319€
PROBLEM:
There are two distinct problems:
1. The model is sometimes not given in the model field, but appears in the ProductName or in the description.
2. The same model is named in different ways (here DSC T9, DSC-T9, Cybershot DSC T9…). We cannot ask all the internet merchants to use the same model naming or to always put the model in the "model" field.
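One possible way to approach both problems, shown here only as a minimal sketch: strip case, spaces and punctuation before comparing, and look for a known model in the Model field first, then in ProductName, then in Description. The function names, the dictionary keys and the KNOWN list are assumptions made for this illustration.

```python
import re


def normalize(text):
    """Upper-case the text and drop every non-alphanumeric character."""
    return re.sub(r"[^A-Z0-9]", "", text.upper())


def find_model(product, known_models):
    """Return the first known model found in Model, ProductName or Description.

    Longer model names are tried first so that a longer known name is
    preferred over a shorter one it contains.
    """
    candidates = sorted(known_models, key=lambda m: len(normalize(m)), reverse=True)
    for field in ("Model", "ProductName", "Description"):
        haystack = normalize(product.get(field, ""))
        for model in candidates:
            if normalize(model) in haystack:
                return model
    return None


# Hypothetical list of known models; the three products are the example feeds.
KNOWN = ["DSC T9"]
FEED = [
    {"Model": "", "ProductName": "Cybershot DSC T9"},
    {"Model": "", "ProductName": "DSC-T9"},
    {"Model": "Cybershot DSC T9", "ProductName": "Sony T9"},
]

found = [find_model(p, KNOWN) for p in FEED]
```

With this normalization, "DSC T9", "DSC-T9" and "Cybershot DSC T9" all contain the same key "DSCT9". Plain substring matching can still produce false positives when one model name is a prefix of another that is not yet in the known list, so some manual checking would remain necessary.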
DESCRIPTION OF THE NEED:
We need a "matching module" that can identify, in this example, that the products from merchants A, B and C are actually the same model (the Sony Cybershot DSC T9). To do this matching, the module may use a database or table containing all "known" brands and models. The module must run the matching automatically every night when the product data feeds are received from the internet merchants. Part of the process can be manual, to validate new brands or models. The module can be developed in any language, but it will afterwards need to be integrated into our platform, which is based on PHP and MySQL.
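To make the request more concrete, here is a rough sketch of what one nightly run could look like. It is written in Python for brevity and is not a proposed design. The reference table KNOWN_MODELS, the "id" field, the NightlyResult structure and product "D" are all hypothetical. Products with no known model are put aside for manual validation.

```python
import re
from dataclasses import dataclass, field


def key(text):
    """Comparison key: upper-case, alphanumeric characters only."""
    return re.sub(r"[^A-Z0-9]", "", text.upper())


# Hypothetical reference table: brand -> known model names.
# In production this would presumably live in a MySQL table.
KNOWN_MODELS = {"Sony": ["DSC T9"]}


@dataclass
class NightlyResult:
    matched: dict = field(default_factory=dict)    # product id -> (brand, model)
    to_review: list = field(default_factory=list)  # ids needing manual validation


def run_matching(feed, known=KNOWN_MODELS):
    """Match each product to a known (brand, model) or queue it for review."""
    result = NightlyResult()
    brands = {key(b): b for b in known}
    for product in feed:
        brand = brands.get(key(product.get("Brand", "")))
        # Only models of the product's own brand are considered, longest first.
        candidates = sorted(known.get(brand, []), key=lambda m: len(key(m)), reverse=True)
        fields = [key(product.get(f, "")) for f in ("Model", "ProductName", "Description")]
        model = next((m for m in candidates if any(key(m) in f for f in fields)), None)
        if model is None:
            result.to_review.append(product["id"])
        else:
            result.matched[product["id"]] = (brand, model)
    return result


# Products A, B and C are the example feeds; D is a made-up unknown product.
feed = [
    {"id": "A", "Brand": "Sony", "Model": "", "ProductName": "Cybershot DSC T9"},
    {"id": "B", "Brand": "Sony", "Model": "", "ProductName": "DSC-T9"},
    {"id": "C", "Brand": "Sony", "Model": "Cybershot DSC T9", "ProductName": "Sony T9"},
    {"id": "D", "Brand": "Sony", "Model": "", "ProductName": "Unlisted model"},
]

result = run_matching(feed)
```

In this sketch, A, B and C end up under the same (brand, model) pair, while D lands in the review list, where an operator could add the new model to the reference table before the next run.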