AUFAURE Marie-Aude

 


 

 

 

Research themes

 

·         Document mining (content and/or structure) and its use

 

1.     Ontology extraction from web pages

 

We proposed an approach for constructing ontologies from Web pages, based on contextual and incremental clustering of terms. The approach defines and evaluates a context-based clustering algorithm for ontology learning, integrated into a global architecture for knowledge discovery for the Semantic Web. The algorithm applies the K-means partitioning algorithm incrementally and is guided by a structural context derived from the HTML structure and the location of words in the documents. This contextual representation helps the clustering algorithm delimit the context of each word by improving the word weighting, the similarity between word pairs and the selection of the semantically closest co-occurrents of each word. The algorithm refines the context of each word cluster, which improves the conceptual quality of the resulting clusters and consequently of the extracted concepts. We have defined a set of criteria for evaluating the ontological concepts. We tested the contextual clustering algorithm on a corpus of French-language HTML documents from the tourism domain and evaluated the extracted ontological concepts. The results show that an appropriate context definition and successive refinements of the clusters improve the relevance of the extracted concepts compared with a simple K-means algorithm. Our evaluation of ontological concepts can be applied to any domain and provides both qualitative and quantitative criteria.
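As a rough illustration of the structural context described above, the following Python sketch weights each term according to the HTML tag that encloses it. It covers only the weighting step, not the incremental K-means clustering. The tag weights, the class name StructuralWeighter and the function weigh_terms are assumptions made for this example and do not come from the original work.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser

# Hypothetical tag weights: the actual values used in the work are not given here.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.5, "h2": 2.0, "b": 1.5}


class StructuralWeighter(HTMLParser):
    """Accumulate a weight per term according to the enclosing HTML tags."""

    def __init__(self):
        super().__init__()
        self.stack = []                    # tags that are currently open
        self.weights = defaultdict(float)  # term -> accumulated weight

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        # The heaviest enclosing tag wins; plain text counts 1.0.
        weight = max((TAG_WEIGHTS.get(t, 1.0) for t in self.stack), default=1.0)
        for term in re.findall(r"[^\W\d_]+", data.lower()):
            self.weights[term] += weight


def weigh_terms(html):
    """Return a dict mapping each term of the page to its structural weight."""
    parser = StructuralWeighter()
    parser.feed(html)
    parser.close()
    return dict(parser.weights)


page = ("<html><head><title>Hotel Paris</title></head>"
        "<body><h1>Hotel</h1><p>A hotel near the <b>museum</b></p></body></html>")
print(weigh_terms(page))
```

In this toy page, a term that appears in the title and in a heading accumulates more weight than one that appears only in running text. Such weights could then feed the term vectors passed to a clustering step.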

 

2.     Contextual Information Retrieval and Extraction – Social Networks analysis – visualisation paradigms for large data sets

 

In this work, we define an information retrieval methodology that uses Formal Concept Analysis in conjunction with semantics to provide contextual answers to Web queries. The conceptual context can be global (i.e. stable) or instantaneous (i.e. bounded by the global context). The methodology consists first of a pre-processing step that provides the global conceptual context, and then of an online contextual processing of user requests, each associated with an instantaneous context. The pre-processing step computes a concept lattice offline from the data sources in order to build an overall conceptual context. Information retrieval is then performed in real time: users formulate their query with terms from the thesaurus/ontology, and may then navigate within the lattice by generalizing or, on the contrary, refining their query. A similarity measure has been defined to find the closest concepts from an entry point of the lattice, in order to help the user navigate. The information retrieval process was illustrated through experimental results in the tourism domain. One benefit of our approach is more relevant and refined information retrieval, closer to the user’s expectations. We add a semantic layer to the conceptual and data layers. The similarity measure helps the user navigate through large lattices by ranking the neighbouring concepts. The method is generic and can be applied to any heterogeneous data sources (Web data, personal data, social networks, etc.). We also define conceptual and visual footprints for characterizing online social networks. We experiment with visualization methods for large data sets based on pixel-oriented techniques and compare them with traditional visualisation methods based on Multi-Dimensional Scaling.
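To make the lattice-based navigation more concrete, here is a minimal Python sketch of Formal Concept Analysis on a toy context. The context, the function names and the Jaccard ranking are all assumptions chosen for illustration. In particular, the similarity measure defined in the work is not specified here, so Jaccard similarity on intents merely stands in for it.

```python
from itertools import combinations

# Toy formal context (hypothetical): object -> set of attributes.
CONTEXT = {
    "hotel_a": {"paris", "pool"},
    "hotel_b": {"paris", "spa"},
    "camping_c": {"nice", "pool"},
}
ALL_ATTRS = sorted(set().union(*CONTEXT.values()))


def extent(attrs):
    """Objects that have every attribute in attrs."""
    return frozenset(o for o, a in CONTEXT.items() if attrs <= a)


def intent(objs):
    """Attributes shared by every object in objs."""
    return frozenset(a for a in ALL_ATTRS if all(a in CONTEXT[o] for o in objs))


def formal_concepts():
    """Enumerate all (extent, intent) pairs by closing each attribute subset."""
    concepts = set()
    for size in range(len(ALL_ATTRS) + 1):
        for subset in combinations(ALL_ATTRS, size):
            objs = extent(frozenset(subset))
            concepts.add((objs, intent(objs)))
    return concepts


def refinements(query_attrs):
    """Rank strictly more specific, non-empty concepts by Jaccard similarity
    of intents (an assumed stand-in for the similarity measure)."""
    start = intent(extent(frozenset(query_attrs)))
    ranked = []
    for objs, attrs in formal_concepts():
        if objs and start < attrs:
            score = len(start & attrs) / len(start | attrs)
            ranked.append((score, sorted(attrs), sorted(objs)))
    return sorted(ranked, reverse=True)


print(len(formal_concepts()))
print(refinements({"paris"}))
```

Brute-force enumeration over attribute subsets is exponential and only suits toy contexts. An offline lattice computation on real data would use a dedicated algorithm.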

 

Collaboration: University Paris 6 (LIP 6 Lab.), Complex Networks Team (http://www.complexnetworks.fr/, http://www-rp.lip6.fr/site_npa/site_rp/graphs.php)

 

 

·         Knowledge Management

 

1.     A Knowledge Base for Ontology Building: application to Semantic Information Retrieval

 

Our objective here is to propose a semi-automatic construction of ontologies from web pages. To achieve this, we build a knowledge base that represents web knowledge and is specified using a metaontology containing the knowledge related to the task of domain knowledge extraction. Our architecture is based on ontological components, defined by the metaontology, and related to the content, the structure and the services of a given domain. In this architecture, we specify three interrelated ontologies: the domain ontology, the structure ontology and the services ontology. Our metaontology is able to store the knowledge related to different techniques and methods for ontology construction. We have defined a semantic on-line information retrieval system using this web knowledge architecture. This system enriches the user query with domain concepts and classifies the web documents according to the concepts and the services; it also gives the user the opportunity to detect a set of services related to a given concept. The comparison with other systems shows that the precision is improved.

 

Collaboration: ENSI Tunis (RIADI Lab.)

Funded by STIC INRIA-Tunisia project on geographical web information retrieval and personalisation

 

2.     Ontological Knowledge Evolution Methodology

 

Ontologies are a key element of semantic modelling, offering a consensual and formal specification of knowledge. They are increasingly applied to open and dynamic environments and used to model knowledge that evolves continuously. To take all evolving aspects into account, ontologies have to be adapted to change requirements. In this work, we propose a methodological approach for ontological knowledge maintenance, focusing particularly on OWL ontologies. Ontology evolution raises several problems: capturing change requirements, change specification, change application, change traceability, change propagation to dependent artefacts, etc. The goal of the methodology is to manage ontology evolution in a systematic and optimized manner while maintaining consistency and evaluating the impact of changes on ontology quality. We propose a pattern-oriented ontology change management approach, namely Onto-Evoal. The modelled patterns correspond to changes, inconsistencies and resolution alternatives. Based on these patterns and the links between them, we propose an optimized and automated change management process that guides and controls change application while maintaining the consistency of the evolved ontology. In addition, a quality model is proposed to evaluate the impact of the different alternatives on ontology quality and to guide the user in resolving inconsistencies.

Change management depends on the ontology representation model; we focus on the OWL model and take into account the impact of changes on logical consistency with respect to OWL DL constraints.

 

Funded by RNTL Dafoe Project

 

 

·         Semantic Information Retrieval and Personalisation

 

1.     Semantic Information Retrieval using personal fuzzy ontologies

 

An ontology can be seen as a semantic layer that helps find documents more relevant to a user’s query. Fuzzy logic is used in IR to address ambiguity and vagueness, by defining flexible queries or fuzzy indexes. In this work, we have extended an existing prototype (see Knowledge Management section) with fuzzy ontologies. SIROF uses the fuzzy ontology for query reformulation and for indexing documents and queries. Each user owns a personalized fuzzy ontology, whose weights are modified according to the user’s queries.

The main contributions of our system are: (1) automatic fuzzification of a domain ontology taking account of both taxonomic and non-taxonomic relations, (2) query reformulation based on the weights associated with all the relations existing in the fuzzy ontology, and (3) use of this fuzzy ontology to classify documents by services.
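The idea of weight-based query reformulation with a per-user fuzzy ontology can be sketched in Python as follows. This is not SIROF's implementation: the relation degrees, the threshold and the update rule in reinforce are hypothetical and chosen only to illustrate the principle.

```python
def make_user_ontology():
    """Hypothetical fuzzy ontology: (concept, related concept) -> degree in [0, 1]."""
    return {
        ("hotel", "accommodation"): 0.9,
        ("hotel", "spa"): 0.6,
        ("hotel", "restaurant"): 0.4,
    }


def reformulate(query_terms, ontology, threshold=0.5):
    """Expand the query with related concepts whose degree reaches the threshold."""
    expanded = {term: 1.0 for term in query_terms}
    for (source, target), degree in ontology.items():
        if source in query_terms and degree >= threshold:
            expanded[target] = max(expanded.get(target, 0.0), degree)
    return expanded


def reinforce(ontology, query_terms, rate=0.1):
    """Assumed update rule: when two related concepts are queried together,
    move the degree of their relation a step towards 1."""
    for (source, target), degree in ontology.items():
        if source in query_terms and target in query_terms:
            ontology[(source, target)] = degree + rate * (1.0 - degree)


ontology = make_user_ontology()
print(reformulate({"hotel"}, ontology))
reinforce(ontology, {"hotel", "restaurant"})
reinforce(ontology, {"hotel", "restaurant"})
print(reformulate({"hotel"}, ontology))
```

After the user queries "hotel" and "restaurant" together twice, the degree of that relation crosses the threshold and "restaurant" starts to appear in the expanded query. In this way each user's ontology drifts towards that user's own habits.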

 

 

2.     Integration of spatial constraints in a personalised information retrieval system

 

We propose a new approach to personalizing information, especially spatial information, by considering the spatial and semantic contexts together. This approach can serve as a navigation aid, for example while travelling in an urban space, by highlighting locations that might be of interest while respecting the spatial constraints. The personalization approach analyses users’ navigation to build a user profile describing their interests. The user profile is used to filter the contents and extract the semantically relevant information. The proposed approach develops a model oriented towards the representation and approximation of users' profiles and preferences. We build a network of users and use both the spatial information of the network peers and information on semantic similarity. A prototype was developed and applied to the tourism domain.
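One simple way to combine a spatial constraint with semantic relevance is sketched below in Python. The scoring formula, the 2 km radius, the weighting factor alpha and all names are assumptions for illustration. The user-network component of the approach is not modelled.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def cosine(u, v):
    """Cosine similarity between two sparse interest vectors (dicts)."""
    dot = sum(weight * v.get(key, 0.0) for key, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_places(profile, position, places, max_km=2.0, alpha=0.5):
    """Drop places beyond max_km, then rank the rest by a weighted mix of
    semantic similarity and proximity (hypothetical scoring)."""
    ranked = []
    for name, (lat, lon, topics) in places.items():
        dist = haversine_km(position[0], position[1], lat, lon)
        if dist > max_km:
            continue  # violates the spatial constraint
        proximity = 1.0 - dist / max_km
        score = alpha * cosine(profile, topics) + (1 - alpha) * proximity
        ranked.append((score, name))
    return [name for score, name in sorted(ranked, reverse=True)]


places = {
    "louvre": (48.8606, 2.3376, {"art": 1.0}),
    "cafe": (48.8570, 2.3530, {"food": 1.0}),
    "versailles": (48.8049, 2.1204, {"art": 1.0}),
}
print(rank_places({"art": 1.0}, (48.8566, 2.3522), places))
```

With an art-oriented profile, the nearby museum outranks the closer cafe, and the distant palace is filtered out by the spatial constraint.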

 

 

Funded by STIC INRIA-Tunisia project on geographical web information retrieval and personalisation

 

3.     Personalized web content retrieval based on web usage mining

 

This work is part of the Eiffel project, which aims to develop a semantic search engine dedicated to tourism. We have tried to address the exploratory search problem by adding personalization facilities to our solution, based on user preferences and profile. These preferences and profile are part of a user model that represents the whole context of the user’s navigation on tourism websites. This user model is enriched with information extracted from log files using Web Usage Mining techniques. We have defined a methodology for processing web logs acquired from many sources. The extracted information, together with additional information (such as spatial location), is stored in a data warehouse.
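A typical first step in Web Usage Mining is to group raw log records into user sessions. The Python sketch below shows one common heuristic, a 30-minute inactivity timeout. The record layout, the timeout value and the function name sessionize are assumptions. The log-processing methodology actually defined in the project is not detailed here.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# A widely used heuristic, assumed here: a gap of more than 30 minutes ends a session.
SESSION_GAP = timedelta(minutes=30)


def sessionize(records):
    """Group (user, ISO timestamp, url) records into per-user sessions."""
    by_user = defaultdict(list)
    for user, stamp, url in records:
        by_user[user].append((datetime.fromisoformat(stamp), url))

    sessions = []
    for user, hits in by_user.items():
        hits.sort()  # chronological order within each user
        current = [hits[0][1]]
        for (previous, _), (now, url) in zip(hits, hits[1:]):
            if now - previous > SESSION_GAP:
                sessions.append((user, current))
                current = []
            current.append(url)
        sessions.append((user, current))
    return sessions


log = [
    ("u1", "2009-05-01T10:00:00", "/hotels"),
    ("u1", "2009-05-01T10:05:00", "/hotels/paris"),
    ("u1", "2009-05-01T12:00:00", "/museums"),
    ("u2", "2009-05-01T10:01:00", "/beaches"),
]
for user, pages in sessionize(log):
    print(user, pages)
```

Each resulting session could then be stored as a row in a data warehouse and joined with additional attributes such as location.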

 

Funded by RNTL Eiffel project “Semantic Web and e-Tourism”

Students

 

·         PhD students

 

Myriam Hadjouni (PhD ENSI Tunis & University Paris 11 – 3rd year): Spatial Web Personalisation

Nesrine Ben Mustapha (PhD ENSI Tunis & co-direction Centrale Paris – 2nd year): Collaborative ontology learning and Semantic Search

Rania Soussi (PhD ENSI Tunis & Centrale Paris – 2nd year): Social Network extraction from relational databases

Raphaël Thollot (PhD Centrale Paris and SAP Business Objects – 1st year): A situational platform for Business Intelligence

Micheline Elias (PhD Centrale Paris and SAP Business Objects – 1st year): Human Computer Information Retrieval in BI dashboards

Nicolas Beauger (PhD Centrale Paris and SAP Business Objects – 1st year): Query & Answering in a Business Intelligence Context

 

·         Postdoctoral students

 

Rim Djedidi: CSDL Project (Complex Systems Design Lab): decision-making collaborative environment for complex systems

 

·         Past students

 

Rim Djedidi (PhD University Paris 11): Ontological Knowledge Evolution Methodology (2009)

Riadh Trad (ENSI Tunis – internship): Visualisation and interpretation of large conceptual graphs (2009)

Ramzi Haddad (ENSI Tunis - internship): Integration of spatial constraints in a personalisation system (2008)

Paul Barbotin and François Thisse (Naval Academy - internships): Analysis of data modelled by graphs (2008)

Lobna Karoui (PhD Supelec & University Paris 11): Ontology extraction from web pages (2008)

Zeina Jrad (post-doc): User modelling and web personalisation (2006 – 2008)

Zied Boulila (internship – ENSI Tunis): Ontology Evolution (2008)

Nesrine Ben Mustapha (master - ENSI Tunis): A framework for ontology building: application to the Semantic Web (2007)

Rania Soussi (master - ENSI Tunis): Fuzzy Ontology and Semantic Information Retrieval (2007)

Thomas Monjo and Lucile Beguin (Naval Academy - internships): Visualisation of Large Data Sets (2007)

Saoussen Sakji (master - Supelec): Semantic Interpretation of Web sites’ content (2006)

Hassane Abboute (master - Supelec): Evolution and Enrichment of a domain ontology (2006)

Christine Bonhomme (PhD student): A visual language for querying Geographical Information Systems (2000)

Ahmed Lbath (PhD student): A Visual Tool Case for Geographic Information Systems (1997)