IJACSA journal (Special Issue): ''Automating the Shaping of Metadata Extracted from a Company Website with Open Source Tools'' - 13/07/2014 - RobertViseur.Be - Personal journal

Post of 13/07/2014

[My publications]
[13-07-2014] IJACSA journal (Special Issue): ''Automating the Shaping of Metadata Extracted from a Company Website with Open Source Tools''

In early 2014, the international open-access journal IJACSA issued a call for papers for a special issue dedicated to natural language processing: ''IJACSA Special Issue on Natural Language Processing 2014''.

The article I published there presents a practical application: extracting terminology and named entities (of type Person) from a set of Web pages in order to automatically generate tag clouds that reflect the content of the websites. It covers the practical problems encountered, the tools used and the results achieved.



Abstract:

As part of a market analysis process, the objective was to automate the task of identifying the activities and skills of a collection of enterprises, namely Belgian and French open source companies. In order to avoid manual annotation through visual analysis of the websites' content, a tool chain was developed to collect the content of websites and extract the important terms. Standard software libraries were identified, allowing to clean up HTML documents and to perform the part-of-speech tagging process used for extracting terminology. This procedure is supplemented by the extraction and the recognition of named entities. The terms extracted in the HTML pages of a company website were then merged and filtered and a circular tags cloud was generated. This presentation facilitates the identification of important terms, commonly referred to as activities and technologies supported by the company. Several changes are planned for this prototype, including, in particular, the extension to the texts in French, the association of extracted terms to the vocabulary of a classification scheme and the automatic generation of dashboards to facilitate the monitoring of the evolution of the industrial sector.

The article can be downloaded from a local copy on this site. The IJACSA NLP2014 special issue can be consulted online.
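
To make the tool chain described in the abstract more concrete, here is a minimal sketch of its overall shape: clean up the HTML, extract candidate terms from each page, merge and filter them per site, then compute display sizes for a tag cloud. It is not the code from the article. In particular, the article relies on part-of-speech tagging and named-entity recognition, whereas this sketch swaps in a simple stopword-filtered word count so that it runs with the Python standard library alone. All names (`html_to_text`, `extract_terms`, `merge_and_filter`, `tag_weights`), the stopword list and the thresholds are assumptions made for illustration.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Tiny illustrative stopword list (assumption); a real tool chain would
# use part-of-speech tagging rather than a stopword filter.
STOPWORDS = {"and", "the", "of", "to", "in", "for", "with", "on", "is", "are"}


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the content of script and style tags."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Strip markup from one HTML page and return its visible text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def extract_terms(text):
    """Count lowercase words of two or more letters that are not stopwords."""
    words = re.findall(r"[a-z][a-z\-]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def merge_and_filter(pages, min_count=2):
    """Merge term counts over all pages of a site; drop rare terms."""
    total = Counter()
    for html in pages:
        total.update(extract_terms(html_to_text(html)))
    return {term: n for term, n in total.items() if n >= min_count}


def tag_weights(counts, min_size=10, max_size=40):
    """Map term counts linearly to font sizes for a tag cloud."""
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid division by zero when all counts are equal
    return {
        term: min_size + (n - lo) * (max_size - min_size) // span
        for term, n in counts.items()
    }


if __name__ == "__main__":
    site = [
        "<html><head><style>p{color:red}</style></head>"
        "<body><p>Open source consulting and Linux support</p>"
        "<script>var linux=1;</script></body></html>",
        "<p>Linux training, open source integration</p>",
    ]
    print(tag_weights(merge_and_filter(site)))
```

Replacing `extract_terms` with a function built on a part-of-speech tagger (for example, keeping only noun phrases) would bring the sketch closer to the approach the abstract describes, without changing the merging and weighting steps.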

About the author

Robert VISEUR
Mons (BE), 40 years old

