ZIVIANI PROJETO DE ALGORITMOS PDF
Nivio Ziviani. Berthier Ribeiro-Neto. Altigran Silva. Cavalcanti. Renato A.
One of our main research focuses in this area is the automatic categorization of Web documents (Calado et al.). We investigated how accurately link information can predict document categories. We have used a Bayesian network framework (Calado et al.).
As a result, we have obtained a method that improves accuracy over an initial average micro F1 value of 41. Further, we found that the best categorization results are obtained by using only the titles of Web pages, combined with anchor text and link information. This means the full text can be discarded during categorization, which significantly reduces the computational effort needed to determine the class of each page.
In another research direction, we are studying methods for determining the geographical scope of Web pages (Moraes). We propose a geographical classification procedure for Web pages that takes advantage of different sources of information to enable the classification of individual Web pages, instead of whole sites.
Since Web pages usually contain very little data, we propose the use of text in other previously classified Web pages as a complementary source of information. Again, we show that classification accuracy can be increased by combining this information with link-based measures, but now using links to estimate how related a page is to a location.
First experiments indicate that, as in the case of general Web page categorization, link measures can significantly improve the accuracy of the final categorization. This work therefore provides an effective approach to classifying Web pages according to their geographical scope by making use of two sources of information commonly available on the Web: links and text. In the same research line, we have also worked on the automatic association between geographic information system (GIS) databases and Web documents (Borges et al.).
The Web is nowadays the richest data source available online on many different subjects. Some of these sources store daily facts that often involve textual geographic descriptions. These descriptions can be perceived as indirectly geo-referenced data.
Under this perspective, the Web becomes a large geospatial database, often providing up-to-date local or regional information. We describe an environment that allows the extraction of geospatial data from Web pages, converts it to XML format, and uploads the converted data into spatial databases for later use in urban GIS.
The effectiveness of our approach is demonstrated by a real urban GIS application that uses street addresses as the basis for integrating data from different Web sources, combining these data with high-resolution imagery.
A third research direction related to categorization is the development of a mechanism for detecting and retrieving Web documents that are similar to a suspicious document.
The classification problem in this case is to determine whether or not a document is plagiarized (Pereira Jr and Ziviani). Such an algorithm has important practical applications, such as verifying the originality of exams, school homework, and articles submitted to conferences, and possibly also helping authors find related work.
We have so far proposed and studied several strategies for solving this problem. The most successful approach is composed of three stages. In the first stage, the fingerprint of the suspicious document is used as its identification. The fingerprint is composed of representative sentences of the document.
In the second stage, the sentences composing the fingerprint are used as queries submitted to a search engine.
The documents identified by the URLs returned from the search engine are collected to form a set of similarity candidate documents. In the third stage, the candidate documents are compared to the suspicious document. The process of comparing the documents uses two different methods: Shingles and Patricia trees.
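The comparison in the third stage can be illustrated with shingles. The following is a minimal sketch, not the authors' implementation: it assumes word-level shingles of width `w`, Jaccard resemblance, and a containment measure, and the function names are hypothetical.

```python
def shingles(text, w=4):
    """Return the set of w-word shingles (contiguous word windows) in text."""
    words = text.lower().split()
    if len(words) < w:
        # Texts shorter than one window yield a single short shingle (or none).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def resemblance(a, b, w=4):
    """Jaccard resemblance between the shingle sets of two texts."""
    sa, sb = shingles(a, w), shingles(b, w)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def containment(suspect, candidate, w=4):
    """Fraction of the suspect's shingles that also occur in the candidate."""
    sa, sb = shingles(suspect, w), shingles(candidate, w)
    return len(sa & sb) / len(sa) if sa else 0.0


suspect = "the quick brown fox jumps over the lazy dog"
candidate = "a quick brown fox jumps over the lazy cat"
print(resemblance(suspect, candidate))
print(containment(suspect, candidate))
```

A candidate document whose containment value is high shares long word sequences with the suspicious document; the threshold at which that counts as plagiarism is a separate design decision that this sketch does not address.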
Preliminary results with these methods indicate that we are getting close to a very efficient solution to this problem (Pereira Jr and Ziviani).
Another practical categorization problem we have studied is determining the best advertisement to show for each Web page presented to a user in a Web portal.
In this problem, an advertising company has a set of clients that are willing to pay every time a user clicks on their advertisements. Thus, advertisements should be presented so as to maximize the chance of reaching people interested in their subjects. The final goal is to maximize the revenue from the advertisements shown. To solve this problem we are studying approaches based on k nearest neighbors and a model that represents the problem as a Markovian system.
This work is in a very preliminary stage, but the first experimental and theoretical results obtained indicate that we will probably produce a very competitive solution, when compared against the available commercial products.
One example of a problem addressed in this research area is data integration, where it is necessary to integrate two databases that may contain redundant objects modeled in different ways or with differences in textual elements, such as person names, book titles and so on.
This problem has been addressed directly and indirectly in several research efforts of the group in the last two years (Camillo et al.). This problem has also been addressed extensively, both directly and indirectly, by the group. One interesting specific work related to data extraction was the development of agents for data extraction (Lage et al.). As the Web grows, more and more data has become available under dynamic forms of publication, such as legacy databases accessed through an HTML form (the so-called hidden Web).
In situations such as this, integration of this data relies more and more on the fast generation of agents that can automatically fetch pages for further processing. As a result, there is an increasing need for tools that can help users generate such agents. In our research work, we have created a method for automatically generating agents to collect hidden Web pages.
This method uses a pre-existing data repository to identify the contents of these pages and takes advantage of patterns found among Web sites to identify the navigation paths to follow. To demonstrate the accuracy of our method, we have carried out experiments with sites from different domains (Lage et al.).
Another work related to this topic is the conversion of unstructured queries to structured queries (Goncalves et al.).
We have proposed and studied different solutions to this problem. Our best approach uses keywords, as in a Web search engine, for querying databases over the Web. The approach is based on a Bayesian network model and provides a suitable alternative to interfaces based on multiple forms with several fields.
Two major steps are involved when querying a Web database using this approach. First, structured database-like queries are derived from a query composed only of the keywords specified by the user. Next, the structured queries are submitted to a Web database, and the retrieved results are presented to the user as ranked answers. To demonstrate the feasibility of the approach, a simple prototype Web search system was developed and carefully tested.
Experimental results obtained with this system indicate that our approach allows for accurately structuring the user queries and retrieving appropriate answers with minimum intervention from the user.
We have worked on a version of our system for use with on-line information services. Since Web users are non-specialists and have a great variety of interests, interfaces for Web databases must be simple and uniform.
We have experimented with our approach for querying Web databases using keywords only. Again, according to our approach, the user inputs a query through a simple search-box interface. From the input query, one or more plausible structured queries are derived and submitted to Web databases.
The results are then retrieved and presented to the user as ranked answers. In these experiments our approach has reduced the complexity of existing on-line interfaces and offered a solution to the problem of querying several distinct Web databases with a single interface. The applicability of the proposed approach was demonstrated by experimental results with three databases, obtained with a prototype search system that implements it.
We have also worked on integrating data from multiple Web sources (Camillo et al.). In this research line, we have studied and proposed ways of identifying similar identities among objects from multiple Web sources. In the best approach we have found so far, object identification works like the relational join operation, with a similarity function taking the place of the equality condition.
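The idea of a join whose equality condition is replaced by a similarity function can be sketched as follows. This is an illustration only: the cosine measure over raw word counts, the threshold value, and the names `cosine` and `similarity_join` are assumptions, not the group's actual similarity function.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between the word-count vectors of two strings."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_join(left, right, key, threshold=0.5):
    """Nested-loop join that pairs records whose key fields are similar enough."""
    matches = []
    for l in left:
        for r in right:
            score = cosine(l[key], r[key])
            if score >= threshold:  # takes the place of l[key] == r[key]
                matches.append((l, r, score))
    return matches


source_a = [{"title": "Modern Information Retrieval"}]
source_b = [{"title": "modern information retrieval book"}, {"title": "Compilers"}]
for l, r, score in similarity_join(source_a, source_b, "title"):
    print(l["title"], "<->", r["title"], round(score, 3))
```

The nested loop compares every pair of records, which is quadratic in the input size; it is kept here only because it mirrors the textbook definition of a join.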
This similarity function is based on information retrieval techniques. Our approach differs from others in the literature since it can be used to identify more complexly structured objects. In some practical situations the best solution for extracting data from the Web is to develop domain-oriented methods. Although several techniques have been developed for Web data extraction, their use is still not widespread, mostly because of the need for extensive human intervention and the low quality of the extraction results.
In this research line, we discuss the application of our domain-oriented approach to automatically extracting news from Web sites. Our approach is based on a highly efficient tree structure analysis that produces very effective results. We have tested it with several important Brazilian on-line news sites and achieved very precise results.
A fourth research line related to document management is the development of algorithms to deal with vagueness when processing queries over XML documents.
The classical approaches for accessing data, query languages and keyword search, cannot be directly applied by applications that access data whose representation is unknown to the user. This can happen in a database whose instances are the results of Web data extraction, and when the user's conditions contain misspellings (Dorneles et al.).
This problem creates a scenario where queries with equality operators can lead to empty results. A solution is to use similarity metrics for comparing data. In this research line, we are proposing and studying methods for accessing XML documents that use textual similarity metrics. Further, since in XML we deal with nested structures, we are also working on aggregated metrics for the nested structure.
The current results we have obtained in this research line are a useful similarity search approach to deal with vagueness and a set of metrics for comparing elements of different types in XML documents.
Finally, a very important problem we have also studied in document and data management is that of updating relational databases through XML views (Braganholo et al.).
Using query trees to capture the notions of selection, projection, nesting, grouping, and heterogeneous sets found throughout most XML query languages, we have studied how XML views expressed using query trees can be mapped to a set of corresponding relational views.
We have then studied how updates on the XML view are mapped to updates on the corresponding relational views. Existing work on updating relational views can then be leveraged to determine whether or not the relational views can be updated with respect to the relational updates and, if so, to translate the updates to the underlying relational database.
Information retrieval models determine the accuracy in providing relevant answers to the users, and are also the key technological component for determining the success of an IR system.
Therefore, we have concentrated significant research efforts on developing new information retrieval models (Ahnizeret et al.).
A first research line in this topic is the development of a model that combines knowledge from the data mining area with traditional information retrieval models. As a result, we have a new technique for computing term weights for index terms, which leads to a new ranking mechanism, referred to as the set-based model. The components in our new model are no longer terms, but termsets.
The novelty is that we compute term weights using a data mining technique called association rules, which is time efficient and yet yields important improvements in retrieval effectiveness. The set-based model function for computing the similarity between a document and a query considers the termset frequency in the document and its scarcity in the document collection.
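To make the notion of termsets concrete, the sketch below scores a document by summing, over the query termsets it contains, a weight that grows with the termset's frequency in the document and with its scarcity in the collection. Several parts are assumptions made for illustration and are not taken from the text: the tf-idf-style formula, the definition of a termset's in-document frequency as the minimum frequency of its terms, the exhaustive enumeration of subsets (practical only for short queries), and the names `termsets` and `score`.

```python
import math
from collections import Counter
from itertools import combinations


def termsets(query_terms, docs, min_support=1):
    """Map each subset of the query terms found in >= min_support docs to its document count."""
    terms = sorted(set(query_terms))
    doc_sets = [set(d) for d in docs]
    found = {}
    for size in range(1, len(terms) + 1):
        for combo in combinations(terms, size):
            df = sum(1 for ds in doc_sets if ds.issuperset(combo))
            if df >= min_support:
                found[combo] = df
    return found


def score(query_terms, doc, docs, min_support=1):
    """Sum termset weights: in-document frequency times scarcity in the collection."""
    tf = Counter(doc)
    n = len(docs)
    total = 0.0
    for ts, df in termsets(query_terms, docs, min_support).items():
        sf = min(tf[t] for t in ts)  # assumed definition of termset frequency
        if sf > 0:
            total += (1 + math.log(sf)) * math.log(1 + n / df)
    return total


docs = [["web", "search", "engine"], ["web", "page"], ["search", "tree"]]
query = ["web", "search"]
for d in docs:
    print(d, round(score(query, d, docs), 3))
```

In this toy collection the first document ranks highest because it is the only one containing the two-term termset, which is also the scarcest.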
Experimental results show that our model improves the average precision of the answer set for all three collections evaluated. Another research line related to IR models is the design of models to improve the quality of Web intra-site search systems.
The idea in this case is to modify the Web site design in order to improve the effectiveness of the IR systems developed for the site modeled. In Web site design, a principle accepted by many authors is separation between information content, navigation structure and visualization.
This idea promotes a better understanding of the data requirements (content), the underlying architecture of the site (navigation) and an appropriate user interface (visualization). Furthermore, it makes maintenance tasks easier, as each of those components can be managed separately (Cavalcanti and Robertson; Vasconcelos and Cavalcanti). Recent technologies such as XML, XSL and style sheets also promote separation between content and visualization, encouraging and facilitating the development of methods for Web site construction based on those concepts.
Our proposal for Web site development is based on these ideas but innovates by modeling IR aspects of the application. Our assumption is that, by modeling specific IR attributes of the information content of a Web site, it is possible to develop search engines that achieve a significant improvement in overall ranking quality.
Our proposal merges an IR-aware methodology and model-aware intra-site search engine development (Ahnizeret et al.). We are now working on an evolution of this idea where we automatically discover the structure of a Web site and apply our structured IR model using this structure.
We have previously tested this idea in the medical area (Vale et al.). In this study, the ICD codes are represented as a directed acyclic graph and supplemented with acronym and synonym dictionaries. For each section of each document, the acronyms and synonyms are converted to code strings and root node codes are identified.
A window of document terms around each root node term is created and the longest path from the graph including these terms is extracted.
These codes are assigned to the document in a ranked order by relative path length for that root. As a result, we have a model that allows the development of high-quality information retrieval systems and high-quality categorization systems that deal with medical documents. We are now working on a generalization of this strategy in order to apply it to other knowledge areas, such as classifying news databases, processing juridical information (Silveira and Ribeiro-Neto), and classifying documents at a company, among others.
A fourth research line related to IR models uses previous queries submitted to a system to determine the relation among these queries and uses this information to improve the results for new queries.
In this research line, we have proposed and studied a method to automatically generate suggestions of related queries submitted to Web search engines.
The method extracts information from the log of past queries submitted to search engines using algorithms for mining association rules. Experimental results obtained with a commercial search engine indicate that our model can produce correct suggestions. Further, the related queries can also be used as information for a query expansion model, resulting in an improvement in the final quality of the answers provided by the systems (Fonseca et al.).
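The mining step can be illustrated with pairwise association rules over a query log. This is a minimal sketch under assumptions that are not stated in the text: the log is already grouped into user sessions, only rules between pairs of queries are mined, and the support and confidence thresholds and the name `related_queries` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def related_queries(sessions, min_support=2, min_confidence=0.5):
    """Mine rules 'query a -> query b' from sessions; return suggestions per query."""
    single = Counter()
    pair = Counter()
    for session in sessions:
        queries = sorted(set(session))  # count each query once per session
        single.update(queries)
        pair.update(combinations(queries, 2))

    suggestions = {}
    for (a, b), n in pair.items():
        if n < min_support:  # support: number of sessions containing both queries
            continue
        for src, dst in ((a, b), (b, a)):
            confidence = n / single[src]  # estimate of P(dst in session | src in session)
            if confidence >= min_confidence:
                suggestions.setdefault(src, []).append((dst, confidence))
    for src in suggestions:
        suggestions[src].sort(key=lambda item: (-item[1], item[0]))
    return suggestions


log = [
    ["java", "jvm"],
    ["java", "jvm", "eclipse"],
    ["java", "coffee"],
    ["jvm", "java"],
]
print(related_queries(log))
```

Queries that co-occur in only one session are filtered out by the support threshold, which keeps incidental pairs from becoming suggestions.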
One of our research lines related to this area is the development of new distributed query processing strategies for search engines (Badue). The novelty of our study in this research line is a real distributed architecture implementation that offers concurrent query service.
The distributed system we are proposing adopts a network of workstations model and the client-server paradigm. The document collection is indexed with an inverted file. We adopt two distinct strategies of index partitioning in the distributed system, namely local index partitioning and global index partitioning.
In both strategies, documents are ranked using the vector space model along with a document filtering technique for fast ranking. We evaluate and compare the impact of the two index partitioning strategies on query processing performance. Experimental results on retrieval efficiency show that, within our framework, the global index partitioning outperforms the local index partitioning.
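The difference between the two partitioning strategies can be sketched with a toy inverted file. The code below is an illustration under assumptions: documents and terms are assigned to servers round-robin, ranking and filtering are omitted, and all function names are hypothetical. It shows only where the postings live, not the performance behavior reported above.

```python
from collections import defaultdict


def build_index(docs):
    """Inverted file: term -> sorted list of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def local_partition(docs, n_servers):
    """Local partitioning: each server indexes only its own subset of documents."""
    subsets = [dict() for _ in range(n_servers)]
    for i, doc_id in enumerate(sorted(docs)):
        subsets[i % n_servers][doc_id] = docs[doc_id]
    return [build_index(subset) for subset in subsets]


def global_partition(docs, n_servers):
    """Global partitioning: one index for the whole collection, split by term."""
    full = build_index(docs)
    parts = [dict() for _ in range(n_servers)]
    for i, term in enumerate(sorted(full)):
        parts[i % n_servers][term] = full[term]
    return parts


def servers_holding(parts, term):
    """Servers that store postings for the term."""
    return [i for i, part in enumerate(parts) if term in part]


def lookup(parts, term):
    """Merge the postings for the term from every server that holds them."""
    ids = set()
    for part in parts:
        ids.update(part.get(term, []))
    return sorted(ids)


docs = {1: "web search", 2: "web page", 3: "search tree", 4: "web tree"}
local, glob = local_partition(docs, 2), global_partition(docs, 2)
print(servers_holding(local, "web"), lookup(local, "web"))
print(servers_holding(glob, "web"), lookup(glob, "web"))
```

Both layouts return the same documents for a term. Under local partitioning the postings of a term are spread over the servers, while under global partitioning they sit on a single server.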
Another research direction we are exploring is the development of new pruning methods for search engines (Fernandes). One way to address query processing efficiency without losing effectiveness is to reduce the amount of data to be processed at query time.
We are using text summarization as a compression tool. The novelty of this work arises from the fact that standard text summarization techniques stem from the domain of natural language processing, which in turn pays a premium for keeping the summarized text readable. Our concern, on the other hand, is to keep the summarized text retrievable.
That is, we aim at imposing a much lower overhead in query processing time and resource consumption while still keeping the loss in retrieval effectiveness at a minimum.
A third research line in the area of efficiency is the use of data compression algorithms to reduce the size of text and indexes (Ziviani; Ziviani and Moura) and, at the same time, to improve the efficiency of IR systems.
We have in the past studied and presented several data compression features to provide economical storage, faster indexing, and accelerated searches in IR.
Using results obtained from these previous efforts, we are now working on the topic of XML compression. XML has become a de facto standard for data exchange over the Internet. However, efficiently storing and querying XML data is still an open problem. Thus, several recent efforts have been made to deploy techniques for querying compressed XML data directly. We have worked on developing a system that efficiently compresses XML files representing semistructured documents and allows querying over the compressed file without requiring decompression.
This research line is based on results previously obtained by researchers of our group and consists of adapting compression methods proposed for plain text to the context of semistructured documents.