We updated our Flamingo Package (2.0.1) for compatibility with the latest GCC version (4.3.2). We are glad to release the second version of our Flamingo Package on approximate string matching. We are glad to release the Flamingo Toolkit that contains UDF functions for MySQL.
Contributors
Alexander Behm (Ph.D. Student)
Shengyue Ji (Ph.D. Student)
Liang Jin, graduated from UC Irvine in 2005.
Chen Li (Faculty)
Jiaheng Lu, postdoc, 2006-2008. Now a faculty member at Renmin University, China.
Yiming Lu, graduated from UC Irvine in 2008.
Rares Vernica (Ph.D. Student)
Getting Started
Please refer to the Flamingo Getting Started Guide.
Introduction
This release (in C++) includes the source code of several algorithms for approximate string matching developed at UC Irvine. It includes algorithms for approximate selection queries, selectivity estimation for approximate selection queries, approximate queries on mixed types, and others. Although an implementation for approximate joins is included, the focus of this release is on approximate selection queries.
Here is a brief explanation of the terms used above:
Approximate String Search: Given a collection of strings and a single query string, how do we find the strings in the collection that are 'similar to' the query string? This functionality is implemented by the modules Common, FilterTree, ListMerger, StringMap, and PartEnum. We recommend starting with the FilterTree module for this purpose.
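The following is a minimal C++ sketch that makes 'similar to' concrete. It assumes that the similarity criterion is Levenshtein distance at most a threshold k. It is a linear scan that only defines the query semantics; the index-based modules exist to avoid exactly this full scan. The functions editDistance and approximateSearch are hypothetical names and are not part of the Flamingo API.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Levenshtein (edit) distance via the classic dynamic program, keeping two rows.
int editDistance(const std::string& a, const std::string& b) {
  std::vector<int> prev(b.size() + 1), cur(b.size() + 1);
  for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = static_cast<int>(j);
  for (std::size_t i = 1; i <= a.size(); ++i) {
    cur[0] = static_cast<int>(i);
    for (std::size_t j = 1; j <= b.size(); ++j) {
      int substitution = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
      cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, substitution});
    }
    std::swap(prev, cur);  // prev now holds row i
  }
  return prev[b.size()];
}

// Linear-scan selection query: every string within edit distance k of the query.
std::vector<std::string> approximateSearch(const std::vector<std::string>& collection,
                                           const std::string& query, int k) {
  std::vector<std::string> results;
  for (const std::string& s : collection)
    if (editDistance(s, query) <= k) results.push_back(s);
  return results;
}
```

For example, searching {"flamingo", "flamenco", "flamingos", "pelican"} for "flamingo" with k = 1 returns "flamingo" and "flamingos". "flamenco" is excluded because it is two substitutions away.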
Selectivity Estimation for Approximate String Search: Given a collection of strings and a single string, how can we estimate the number of strings that are 'similar to' the given string? This functionality is implemented in the SEPIA module.
Approximate String Join: Given two collections of strings (possibly the same collection), how do we find the pairs of strings that are 'similar to' each other?
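The following sketch illustrates the join semantics by computing an approximate join by brute force. It assumes edit distance with threshold k. Before verification it applies one inexpensive filter: two strings whose lengths differ by more than k cannot be within edit distance k. The name approximateJoin is hypothetical. This is a nested-loop baseline, not the join implementation shipped in this release.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <utility>
#include <vector>

// Levenshtein (edit) distance via the classic two-row dynamic program.
int editDistance(const std::string& a, const std::string& b) {
  std::vector<int> prev(b.size() + 1), cur(b.size() + 1);
  for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = static_cast<int>(j);
  for (std::size_t i = 1; i <= a.size(); ++i) {
    cur[0] = static_cast<int>(i);
    for (std::size_t j = 1; j <= b.size(); ++j) {
      int substitution = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
      cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, substitution});
    }
    std::swap(prev, cur);
  }
  return prev[b.size()];
}

// All index pairs (i, j) with editDistance(left[i], right[j]) <= k.
// Pass the same vector twice for a self-join.
std::vector<std::pair<std::size_t, std::size_t>> approximateJoin(
    const std::vector<std::string>& left, const std::vector<std::string>& right, int k) {
  std::vector<std::pair<std::size_t, std::size_t>> pairs;
  for (std::size_t i = 0; i < left.size(); ++i) {
    for (std::size_t j = 0; j < right.size(); ++j) {
      long lengthGap = static_cast<long>(left[i].size()) - static_cast<long>(right[j].size());
      if (std::labs(lengthGap) > k) continue;  // length filter: cannot be within k edits
      if (editDistance(left[i], right[j]) <= k) pairs.emplace_back(i, j);
    }
  }
  return pairs;
}
```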
There are various string similarity functions, such as Levenshtein Distance (aka Edit Distance), Jaccard Similarity, Cosine Similarity, and Dice Similarity. The following is a description of the modules, corresponding to the source directory structure:
Common: This module contains classes for supporting the following similarity functions / distance measures: Levenshtein Distance (aka Edit Distance), Jaccard Similarity, Cosine Similarity, Dice Similarity. It also provides functionality for decomposing strings into grams.
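The gram-based measures can be sketched as follows. The sketch assumes non-padded q-grams and treats each string as a set of grams, so repeated grams are ignored. Flamingo's own gram decomposition may differ, for example by padding the string boundaries. All function names here are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Decompose s into its overlapping substrings of length q (q-grams), left to right.
// Assumption: no padding characters are added at the string boundaries.
std::vector<std::string> decomposeGrams(const std::string& s, std::size_t q) {
  std::vector<std::string> grams;
  for (std::size_t i = 0; q > 0 && i + q <= s.size(); ++i) grams.push_back(s.substr(i, q));
  return grams;
}

// The measures below compare strings as *sets* of grams.
std::set<std::string> gramSet(const std::string& s, std::size_t q) {
  std::vector<std::string> grams = decomposeGrams(s, q);
  return std::set<std::string>(grams.begin(), grams.end());
}

// |A intersect B|
std::size_t overlapSize(const std::set<std::string>& a, const std::set<std::string>& b) {
  std::size_t shared = 0;
  for (const std::string& g : a) shared += b.count(g);
  return shared;
}

// Jaccard: |A intersect B| / |A union B|
double jaccardSimilarity(const std::set<std::string>& a, const std::set<std::string>& b) {
  if (a.empty() && b.empty()) return 1.0;
  double shared = static_cast<double>(overlapSize(a, b));
  return shared / (a.size() + b.size() - shared);
}

// Dice: 2 |A intersect B| / (|A| + |B|)
double diceSimilarity(const std::set<std::string>& a, const std::set<std::string>& b) {
  if (a.empty() && b.empty()) return 1.0;
  return 2.0 * overlapSize(a, b) / (a.size() + b.size());
}

// Cosine: |A intersect B| / sqrt(|A| * |B|)
double cosineSimilarity(const std::set<std::string>& a, const std::set<std::string>& b) {
  if (a.empty() && b.empty()) return 1.0;
  if (a.empty() || b.empty()) return 0.0;
  return overlapSize(a, b) / std::sqrt(static_cast<double>(a.size()) * b.size());
}
```

With q = 2, "flamingo" and "flamenco" each have 7 distinct grams and share 3 of them (fl, la, am). This gives a Jaccard similarity of 3/11 and Dice and cosine similarities of 3/7.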
FilterTree: This module provides functionality for approximate string search using an inverted-list index. Query performance can be further improved by adding filters, i.e., partitioning the string collection into disjoint subsets according to some property (e.g., the length of the strings). The use of filters is facilitated by a hierarchical structure (the FilterTree), in which each level of the tree corresponds to one filter. We have implemented the length and charsum filters. This package contains three flavors of indexes: compressed and uncompressed in-memory indexes, and a disk-based index.
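The following is a minimal sketch of an inverted-list index with a single length filter, i.e., a one-level tree whose partitions are string lengths. It assumes edit distance with threshold k and non-padded q-grams. It uses the standard count filter: a string within edit distance k of the query must share at least (number of query grams) - k * q grams with it. The class LengthPartitionedIndex is hypothetical, not Flamingo's FilterTree class, and it omits the charsum filter and index compression.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Levenshtein (edit) distance via the classic two-row dynamic program.
int editDistance(const std::string& a, const std::string& b) {
  std::vector<int> prev(b.size() + 1), cur(b.size() + 1);
  for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = static_cast<int>(j);
  for (std::size_t i = 1; i <= a.size(); ++i) {
    cur[0] = static_cast<int>(i);
    for (std::size_t j = 1; j <= b.size(); ++j)
      cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1)});
    std::swap(prev, cur);
  }
  return prev[b.size()];
}

// Hypothetical one-level filter tree: strings are partitioned by length, and each
// partition has its own inverted lists mapping a q-gram to the ids containing it.
class LengthPartitionedIndex {
 public:
  LengthPartitionedIndex(std::vector<std::string> data, std::size_t q) : data_(std::move(data)), q_(q) {
    for (std::size_t id = 0; id < data_.size(); ++id) {
      Partition& part = partitions_[data_[id].size()];
      part.ids.push_back(static_cast<int>(id));
      for (const std::string& g : grams(data_[id])) part.lists[g].push_back(static_cast<int>(id));
    }
  }

  // Sorted ids of all strings within edit distance k (k >= 0) of the query.
  std::vector<int> search(const std::string& query, int k) const {
    const std::size_t lo = query.size() > static_cast<std::size_t>(k) ? query.size() - k : 0;
    const std::size_t hi = query.size() + k;
    const std::vector<std::string> queryGrams = grams(query);
    // Count filter: each edit operation destroys at most q of the query's grams.
    const long t = static_cast<long>(queryGrams.size()) - static_cast<long>(k) * static_cast<long>(q_);
    std::vector<int> result;
    // Length filter: only visit partitions whose length lies in [|query| - k, |query| + k].
    for (auto it = partitions_.lower_bound(lo); it != partitions_.end() && it->first <= hi; ++it) {
      const Partition& part = it->second;
      std::vector<int> candidates;
      if (t <= 0) {
        candidates = part.ids;  // count filter is vacuous; verify every string
      } else {
        std::unordered_map<int, long> counts;  // may overcount repeated grams, which is safe
        for (const std::string& g : queryGrams) {
          auto list = part.lists.find(g);
          if (list == part.lists.end()) continue;
          for (int id : list->second) ++counts[id];
        }
        for (const auto& entry : counts)
          if (entry.second >= t) candidates.push_back(entry.first);
      }
      for (int id : candidates)  // verification step removes false positives
        if (editDistance(data_[id], query) <= k) result.push_back(id);
    }
    std::sort(result.begin(), result.end());
    return result;
  }

 private:
  struct Partition {
    std::vector<int> ids;
    std::unordered_map<std::string, std::vector<int>> lists;
  };

  std::vector<std::string> grams(const std::string& s) const {
    std::vector<std::string> out;
    for (std::size_t i = 0; i + q_ <= s.size(); ++i) out.push_back(s.substr(i, q_));
    return out;
  }

  std::vector<std::string> data_;
  std::size_t q_;
  std::map<std::size_t, Partition> partitions_;
};
```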
ListMerger: Answering approximate string queries based on an inverted-list index requires finding elements that occur at least T times on the inverted lists belonging to the grams in the query string (T depends on the similarity metric and the similarity threshold). This problem is commonly referred to as the T-occurrence problem. This module implements several algorithms for solving the T-occurrence problem, as described in 'Efficient Merging and Filtering Algorithms for Approximate String Searches', Chen Li, Jiaheng Lu and Yiming Lu, ICDE 2008. In addition, we have implemented efficient algorithms for disk-based indexes.
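For edit distance with non-padded q-grams, a common choice of T is the count-filter bound in occurrenceThreshold below. The merging step can then be sketched with one counter per string id, scanning every list once. This is a simple baseline in the spirit of the ScanCount approach, not the optimized merge algorithms the module implements. The function names are hypothetical, and ids are assumed to be dense integers in [0, numStrings).

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Occurrence threshold T for edit distance k with non-padded q-grams: a query of
// length n has n - q + 1 grams, and each edit operation destroys at most q of them.
int occurrenceThreshold(int queryLength, int q, int k) {
  return queryLength - q + 1 - k * q;
}

// Counting-based merge: one counter per string id. Scans every inverted list once
// and returns, sorted, the ids that occur at least t times in total.
std::vector<int> scanCount(const std::vector<std::vector<int>>& lists, int numStrings, int t) {
  std::vector<int> result;
  if (t <= 0) {  // every id trivially satisfies the threshold
    for (int id = 0; id < numStrings; ++id) result.push_back(id);
    return result;
  }
  std::vector<int> counts(static_cast<std::size_t>(numStrings), 0);
  for (const std::vector<int>& list : lists)
    for (int id : list)
      if (++counts[id] == t) result.push_back(id);  // report each id exactly once
  std::sort(result.begin(), result.end());
  return result;
}
```

For example, with q = 2 and k = 1, the 8-character query "flamingo" yields T = 8 - 2 + 1 - 2 = 5.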
MAT-Tree: MAT-tree is an indexing structure to support queries on data with an approximate string predicate and a numeric predicate. A typical query is: 'Find employee records whose name is similar to Speilberg and whose age is close to 45.' The indexing structure is proposed in the following paper: 'Indexing Mixed Types for Approximate Retrieval,' Liang Jin, Nick Koudas, Chen Li, Anthony K.H. Tung, VLDB 2005, Trondheim, Norway.
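The example query can be stated as a brute-force filter, which shows what the MAT-tree is meant to accelerate. This sketch is not the MAT-tree. It assumes that 'similar' means edit distance at most a threshold and that 'close to' means an absolute age difference within a tolerance. The Employee type and the mixedQuery function are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <utility>
#include <vector>

// Levenshtein (edit) distance via the classic two-row dynamic program.
int editDistance(const std::string& a, const std::string& b) {
  std::vector<int> prev(b.size() + 1), cur(b.size() + 1);
  for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = static_cast<int>(j);
  for (std::size_t i = 1; i <= a.size(); ++i) {
    cur[0] = static_cast<int>(i);
    for (std::size_t j = 1; j <= b.size(); ++j)
      cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1)});
    std::swap(prev, cur);
  }
  return prev[b.size()];
}

// Hypothetical record type for the example query.
struct Employee {
  std::string name;
  int age;
};

// Mixed-type predicate: name within maxNameDistance edits AND age within maxAgeDiff.
// The cheap numeric test runs first, so the edit distance is computed only for
// records that pass it.
std::vector<Employee> mixedQuery(const std::vector<Employee>& records, const std::string& name,
                                 int maxNameDistance, int age, int maxAgeDiff) {
  std::vector<Employee> out;
  for (const Employee& e : records)
    if (std::abs(e.age - age) <= maxAgeDiff && editDistance(e.name, name) <= maxNameDistance)
      out.push_back(e);
  return out;
}
```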
SEPIA: This technique solves the problem of estimating the selectivity of an approximate string predicate. It can answer questions such as: 'From a collection of strings, how many of them have an edit distance within 3 to a given string?' Such information can be used to optimize approximate string matching queries. The technique was published in the paper: 'Selectivity Estimation for Fuzzy String Predicates in Large Data Sets,' Liang Jin and Chen Li, VLDB 2005, Trondheim, Norway.
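The sketch below illustrates what a selectivity estimate is. It uses plain uniform random sampling, which is a different and much simpler technique than SEPIA. It evaluates the edit-distance predicate on a sample drawn without replacement and scales the hit count up to the collection size. The function name, its parameters and the explicit seed are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <string>
#include <utility>
#include <vector>

// Levenshtein (edit) distance via the classic two-row dynamic program.
int editDistance(const std::string& a, const std::string& b) {
  std::vector<int> prev(b.size() + 1), cur(b.size() + 1);
  for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = static_cast<int>(j);
  for (std::size_t i = 1; i <= a.size(); ++i) {
    cur[0] = static_cast<int>(i);
    for (std::size_t j = 1; j <= b.size(); ++j)
      cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1)});
    std::swap(prev, cur);
  }
  return prev[b.size()];
}

// Estimated number of strings within edit distance k of the query, from a uniform
// random sample of sampleSize strings (without replacement). NOT the SEPIA technique.
double estimateSelectivity(const std::vector<std::string>& collection, const std::string& query,
                           int k, std::size_t sampleSize, unsigned seed) {
  if (collection.empty() || sampleSize == 0) return 0.0;
  sampleSize = std::min(sampleSize, collection.size());
  std::vector<std::size_t> order(collection.size());
  for (std::size_t i = 0; i < order.size(); ++i) order[i] = i;
  std::mt19937 rng(seed);
  std::shuffle(order.begin(), order.end(), rng);  // the first sampleSize entries form the sample
  std::size_t hits = 0;
  for (std::size_t i = 0; i < sampleSize; ++i)
    if (editDistance(collection[order[i]], query) <= k) ++hits;
  return static_cast<double>(hits) * collection.size() / sampleSize;  // scale to the full collection
}
```

When the sample is as large as the collection, the estimate equals the exact count. Smaller samples trade accuracy for speed.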
StringMap: This algorithm maps strings from the edit-distance metric space to a high-dimensional Euclidean space, and uses a multi-dimensional indexing structure to answer approximate queries. The algorithm is published in the paper: 'Efficient Record Linkage in Large Data Sets,' by Liang Jin, Chen Li, and Sharad Mehrotra, in the 8th International Conference on Database Systems for Advanced Applications (DASFAA) 2003, Kyoto, Japan.
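The core geometric step can be sketched as follows. We assume here that StringMap follows FastMap in computing each coordinate from a pair of pivot strings via the law of cosines. Pivot selection and the recursion to further dimensions are omitted, and pivotCoordinate is a hypothetical name.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Levenshtein (edit) distance via the classic two-row dynamic program.
int editDistance(const std::string& a, const std::string& b) {
  std::vector<int> prev(b.size() + 1), cur(b.size() + 1);
  for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = static_cast<int>(j);
  for (std::size_t i = 1; i <= a.size(); ++i) {
    cur[0] = static_cast<int>(i);
    for (std::size_t j = 1; j <= b.size(); ++j)
      cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1)});
    std::swap(prev, cur);
  }
  return prev[b.size()];
}

// FastMap-style coordinate of s on the axis through pivots a and b. If a, b and s were
// points in Euclidean space, this would be the distance from a to the projection of s
// onto line ab (law of cosines).
double pivotCoordinate(const std::string& s, const std::string& pivotA, const std::string& pivotB) {
  const double dAB = editDistance(pivotA, pivotB);
  if (dAB == 0.0) return 0.0;  // identical pivots define no axis
  const double dAS = editDistance(pivotA, s);
  const double dBS = editDistance(pivotB, s);
  return (dAS * dAS + dAB * dAB - dBS * dBS) / (2.0 * dAB);
}
```

In FastMap, later dimensions reuse this formula on residual distances, d'(x, y)^2 = d(x, y)^2 - (x_1 - y_1)^2. Because edit distance is not Euclidean, these residuals can become negative, a case this sketch does not handle.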
PartEnum: This algorithm is published in the paper: 'Efficient Exact Set-Similarity Joins,' Arvind Arasu, Venkatesh Ganti, Raghav Kaushik, VLDB 2006. We implemented the algorithm to support approximate string matching queries, excluding approximate joins.
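PartEnum works on sets viewed as binary vectors under Hamming distance. The sketch below shows only the simplest case of its partitioning idea, which is a pigeonhole argument. If two vectors differ in at most k positions and the positions are split into k + 1 disjoint partitions, the vectors agree exactly on at least one partition. The enumeration component, signature hashing and the conversion from strings to sets are omitted. Binary vectors are assumed to be equal-length '0'/'1' strings, and all names are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using Signature = std::pair<std::size_t, std::string>;  // (partition index, bits in that partition)

// Split the positions of a '0'/'1' string into k + 1 contiguous, near-equal partitions
// and emit one signature per partition.
std::vector<Signature> partitionSignatures(const std::string& bits, std::size_t k) {
  const std::size_t parts = k + 1, n = bits.size();
  std::vector<Signature> signatures;
  for (std::size_t p = 0; p < parts; ++p) {
    const std::size_t begin = p * n / parts, end = (p + 1) * n / parts;
    signatures.emplace_back(p, bits.substr(begin, end - begin));
  }
  return signatures;
}

// Candidate test for equal-length vectors: any pair within Hamming distance k shares at
// least one signature, so pairs sharing none are safely discarded. Pairs that do share
// a signature must still be verified, since sharing one does not imply distance <= k.
bool isCandidatePair(const std::string& u, const std::string& v, std::size_t k) {
  const std::vector<Signature> su = partitionSignatures(u, k), sv = partitionSignatures(v, k);
  for (std::size_t p = 0; p < su.size(); ++p)
    if (su[p] == sv[p]) return true;
  return false;
}
```

In a real join, the signatures would be hashed so that candidate pairs are found without comparing every pair. The pairwise test here only shows the guarantee.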
TopK: This package contains algorithms for efficient Top-K approximate string search.
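A brute-force top-k baseline makes the query semantics concrete. During one scan it keeps a bounded max-heap of the k best (distance, position) pairs, so ties are broken by position in the collection. This is not the algorithm from the TopK module. Edit distance as the ranking function and the name topK are assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Levenshtein (edit) distance via the classic two-row dynamic program.
int editDistance(const std::string& a, const std::string& b) {
  std::vector<int> prev(b.size() + 1), cur(b.size() + 1);
  for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = static_cast<int>(j);
  for (std::size_t i = 1; i <= a.size(); ++i) {
    cur[0] = static_cast<int>(i);
    for (std::size_t j = 1; j <= b.size(); ++j)
      cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1)});
    std::swap(prev, cur);
  }
  return prev[b.size()];
}

// The k strings closest to the query, in ascending (distance, position) order.
std::vector<std::string> topK(const std::vector<std::string>& collection, const std::string& query,
                              std::size_t k) {
  using Entry = std::pair<int, std::size_t>;  // (edit distance, position in collection)
  std::priority_queue<Entry> heap;            // max-heap: worst of the current top k on top
  for (std::size_t i = 0; i < collection.size(); ++i) {
    Entry e{editDistance(collection[i], query), i};
    if (heap.size() < k) {
      heap.push(e);
    } else if (k > 0 && e < heap.top()) {  // better than the current worst: replace it
      heap.pop();
      heap.push(e);
    }
  }
  std::vector<Entry> best;
  while (!heap.empty()) {
    best.push_back(heap.top());
    heap.pop();
  }
  std::reverse(best.begin(), best.end());  // heap pops worst first
  std::vector<std::string> result;
  for (const Entry& e : best) result.push_back(collection[e.second]);
  return result;
}
```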
In addition, we have provided some commonly used functions in the util directory.
Changes in Version 3.0 (compared to Version 2.0.1)
Added Compressed Indexers based on the Techniques from: 'Space-Constrained Gram-Based Indexing for Efficient Approximate String Search', by Alexander Behm, Shengyue Ji, Chen Li, and Jiaheng Lu, in ICDE 2009
Added Module for Top-K Approximate String Search from: 'Efficient top-k algorithms for fuzzy search in string collections', by Rares Vernica, Chen Li, in KEYS 2009: 9-14 (Workshop on Keyword Search on Structured Data, co-located with SIGMOD 2009)
Added Disk-Based Inverted Index, Disk-Based StringContainer and Efficient Search Algorithms using the Disk-Based Components from: 'Answering Set-Similarity Selection Queries on Large Disk-Resident Data Sets', by Alexander Behm, Chen Li, Michael J. Carey, UCI Technical Report 2010
Added Some Auto-Tuning Features, e.g. Automatic Choice of Partitioning Filter
Bibtex
Name | Size
flamingo-3.0.tgz | 2.4M
README.txt
Acknowledgements: This release is partially supported by the NSF CAREER Award No. IIS-0238586, the NSF award No. IIS-0742960, the NSF-funded RESCUE project, a Google Research Award, a gift fund from Microsoft, a fund from CalIt2, the NSF CluE Project and the ASTERIX Project funded by the NSF. Many thanks to Minh Doan and Kensuke Ohta for their valuable testing and feedback on the code and documentation.
Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
Neither the name of The Regents nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
The end-user understands that the program was developed for research purposes and is advised not to rely exclusively on the program for any reason. THE SOFTWARE PROVIDED IS ON AN 'AS IS' BASIS, AND THE REGENTS AND CONTRIBUTORS HAVE NO OBLIGATION TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS. THE REGENTS AND CONTRIBUTORS SPECIFICALLY DISCLAIM ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, EXEMPLARY OR CONSEQUENTIAL DAMAGES, INCLUDING BUT NOT LIMITED TO PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES, LOSS OF USE, DATA OR PROFITS, OR BUSINESS INTERRUPTION, HOWEVER CAUSED AND UNDER ANY THEORY OF LIABILITY WHETHER IN CONTRACT, STRICT LIABILITY OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE AND ITS DOCUMENTATION, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. If you do not agree to these terms, do not download or use the software. This license may be modified only in a writing signed by authorized signatory of both parties.
For any questions regarding this release, please send email to flamingo AT ics.uci.edu