Introduction
Elasticsearch is an API search product that took the search world by storm when it was first introduced in 2010. Elasticsearch enjoyed the benefits of being open source for 10 years, but in recent years the company saw its market share challenged by cloud providers such as AWS.
Up until 2021, Elasticsearch was under an Apache license, but to remain competitive and protect its IP, Elastic changed the licensing structure by moving Elasticsearch under the Elastic License and Server Side Public License (SSPL).
It’s now debatable whether Elasticsearch is still truly an open source search engine. The change has created a fissure in the open source community, raised concern about future licensing changes, and opened the question of which service is best given the multitude of options. This has led many companies to rethink their search strategy.
There’s also the question of what the best use case for Elasticsearch is. The company has moved aggressively into the log analytics and SIEM markets with its popular ELK stack, and this is where Elasticsearch shines. However, because it’s better suited to massive log analysis applications, it is much more complex than needed for core site search.
Let’s take a look at what Elasticsearch is, why it’s so darn good at log analytics, and what alternatives are available for building site search applications more rapidly.
What is Elasticsearch?
Elasticsearch is a specialized search engine that has built a massive community around logging analytics projects with its popular ELK stack, which was open source up until 2021.
Elasticsearch combines the flexibility of a full text search engine with the power of a JSON document database’s indexing. It offers a tool for rich data analysis on large volumes of data, ready to power catalogs, autocompletion, log analysis and more. Searches can be fuzzy, and results are scored for relevance, making it ideal for finding near misses and close matches.
Elasticsearch is built on top of Lucene (as are Apache Solr and other legacy search tools). Like Lucene, Elasticsearch builds an inverted index of your data for full text information retrieval; essentially, it indexes your data by keyword. Elasticsearch is able to achieve fast search responses because it searches an index rather than the text itself. Combined with its distributed architecture, this makes Elasticsearch well suited to scaling to very large datasets.
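To make the idea of an inverted index concrete, here is a minimal sketch in Python. It is not how Lucene or Elasticsearch implement indexing; the function names (`tokenize`, `build_inverted_index`, `search`) and the sample documents are assumptions made purely for illustration.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple alphanumeric tokens."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return cleaned.split()


def build_inverted_index(docs):
    """Map each token to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return IDs of documents containing every query token (AND semantics)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    postings = [index.get(token, set()) for token in tokens]
    return set.intersection(*postings)


# Hypothetical sample catalog, for illustration only.
docs = {
    1: "Red running shoes",
    2: "Blue running jacket",
    3: "Red rain jacket",
}
index = build_inverted_index(docs)
print(sorted(search(index, "red jacket")))  # prints [3]
```

The query never scans the document text. It looks up each keyword's list of document IDs and intersects them, which is why lookups stay fast as the collection grows.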
Elasticsearch core features and APIs include full-text search, typo tolerance, sorting, ranking, and much more. In addition, Elasticsearch offers a rich set of APIs, plugins, and libraries for building search interfaces, working with Python, Rust, Java and other clients, and adding plugins for file management, security, and deep analysis.
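Typo tolerance is commonly built on edit distance: a query term matches an indexed term if only a small number of single-character edits separate them. The sketch below illustrates that general technique in plain Python. It is not Elasticsearch's implementation, and `edit_distance`, `fuzzy_lookup`, `max_edits`, and the vocabulary are hypothetical names chosen for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum single-character edits to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def fuzzy_lookup(term, vocabulary, max_edits=1):
    """Return vocabulary terms within max_edits of term, closest first."""
    scored = [(edit_distance(term, word), word) for word in vocabulary]
    return [word for dist, word in sorted(scored) if dist <= max_edits]


# Hypothetical indexed terms, for illustration only.
vocabulary = ["jacket", "packet", "racket", "shoes"]
print(fuzzy_lookup("jackt", vocabulary))  # prints ['jacket']
```

Raising `max_edits` widens the net (here, a value of 2 would also return "packet" and "racket"), which is the usual trade-off between recall and precision in fuzzy matching.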
Prior to 2018, Elastic offered Elasticsearch only for on-prem deployment. Elastic built an entirely new revenue stream with the release of their fully-managed hosted service which competes with similar offerings from cloud providers.
Why is Elasticsearch so popular?
There are two ways to answer this question: by licensing and use case.
Historically it was Elasticsearch’s impressive features combined with free open source licensing that made it the go-to search engine for new projects. Companies simply defaulted to Elasticsearch for these reasons — it was free, offered ideal licensing terms, and had loads of APIs — for adding search functionality to their applications.
Beyond its once-flexible licensing terms, people selected Elasticsearch as a backend NoSQL database for searching through massive amounts of data. Where Elasticsearch shines is in running log analysis and SIEM projects across massive data sets. The Elastic Stack of Elasticsearch, Logstash, and Kibana (the ELK stack) has become a de facto DevOps platform for log analytics. In fact, DevOps.com calls Elasticsearch and Splunk the “big two” in log analytics.
Why is Elasticsearch so good for log analytics? Immutable data. Log files never change and Lucene-based search engines like Elasticsearch with its distributed architecture are great at indexing and searching across these files.
For more mutable indexes, such as e-commerce search applications, Lucene-based search engines are not ideal. Elasticsearch index files are created and saved; then, aside from flagging deletes in a special "deletes file," they are never updated. Instead, they are merged and rewritten into new files periodically as the data state diverges. So if you make a small change to an item, Elasticsearch stores a flag to say that the old record is deleted and creates an entirely new record in memory.
Because Elasticsearch does not throw out old values right away, queries slow down: it needs to run the search, then check the differential for changes on the way out. Periodically the memory buffer fills up, the difference is reconciled, and all the files with any changes get merged and rewritten to disk.
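The update behavior described above can be sketched with a toy model. This is a deliberate simplification and not Lucene's actual on-disk format: the `ToySegmentIndex` class, its methods, and the sample record are all hypothetical, and each write creates its own segment purely to keep the example short.

```python
class ToySegmentIndex:
    """Toy model of write-once index files with delete flags."""

    def __init__(self):
        self.segments = []  # each segment: dict of doc_id -> document, never edited
        self.deleted = []   # per-segment set of doc IDs flagged as deleted

    def add(self, doc_id, doc):
        """Write the document into a brand-new segment."""
        self.segments.append({doc_id: doc})
        self.deleted.append(set())

    def update(self, doc_id, doc):
        """Flag every older copy as deleted, then write a full new copy."""
        for segment, flags in zip(self.segments, self.deleted):
            if doc_id in segment:
                flags.add(doc_id)
        self.add(doc_id, doc)

    def search(self, predicate):
        """Scan all segments, filtering out flagged copies on the way out."""
        hits = []
        for segment, flags in zip(self.segments, self.deleted):
            for doc_id, doc in segment.items():
                if doc_id not in flags and predicate(doc):
                    hits.append((doc_id, doc))
        return hits

    def merge(self):
        """Rewrite all live documents into one segment, dropping stale copies."""
        merged = {}
        for segment, flags in zip(self.segments, self.deleted):
            for doc_id, doc in segment.items():
                if doc_id not in flags:
                    merged[doc_id] = doc
        self.segments = [merged]
        self.deleted = [set()]

    def stored_copies(self):
        """Count every stored copy, including ones flagged as deleted."""
        return sum(len(segment) for segment in self.segments)


index = ToySegmentIndex()
index.add("sku-1", {"name": "rain jacket", "price": 80})
index.update("sku-1", {"name": "rain jacket", "price": 75})
index.update("sku-1", {"name": "rain jacket", "price": 70})

print(index.stored_copies())  # prints 3: two stale copies plus the live one
print(index.search(lambda doc: "jacket" in doc["name"]))
index.merge()
print(index.stored_copies())  # prints 1 after the merge rewrites the data
```

Two price changes leave three stored copies of one product, and every search has to skip the stale ones until a merge rewrites the data. With log data that never changes, that overhead never appears.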
For e-commerce and many site search use cases, there are changes all the time to the index.
Elasticsearch alternatives
If you’re building a log analysis project, Elasticsearch should be at the top of your list. But, if your goal is site search for e-commerce, website search, or web or mobile app search — where you want to optimize search for conversions, customer success, and customer satisfaction — there are newer, better alternatives that require fewer resources for success.
Resourcing
If you’re Amazon.com and can afford 1000 search engineers, it could make sense to invest your engineering resources in customizing Elasticsearch functionality. But for most site search and e-commerce search applications, Elasticsearch is overly complicated. Newer SaaS-based search platforms can provide greater functionality and configurability with far fewer engineering resources.
Managed SaaS vs cloud data center
For many of the same reasons as above, fully-managed SaaS search solutions are much less resource intensive. You may eventually bring the solution in-house or want to co-locate it with your application hosting, but for faster time to value with fewer resources and internal costs, managed services offer tremendous upside at little cost compared with cloud services that demand frequent updates, resource management, and optimization.
Relevance optimization
Core search features can be set up in minutes with Elasticsearch, but deep configuration and optimization to improve search result relevance require a team of engineers. Elasticsearch’s strengths (scalability, configurability, extensibility) can also be liabilities. And, as mentioned in the previous section, performance issues are just around the corner for indexes that require frequent updates.
Today, alternative search solutions built on cloud-native architecture can offer the same degree of configurability and scale in much less time. Let’s look at a few.
Sajari
Sajari is a user-friendly site search engine built from the ground up for developers. It’s an entirely new hosted service built on proprietary technology.
Sajari has the combined power of a full-text search engine and a database. It uses real-time indexes, its own data layout and flow, and its own binary encoding methodology to provide lightning-fast results without taking the performance hit that Lucene-based solutions do.
It offers tremendous flexibility and ease of configuration built on top of a cloud-native architecture for elastic scale. Machine learning (more specifically, reinforcement learning) is built into the core product for continuous improvement of search performance. Because it’s fully-hosted and battle-tested with billions of queries, you can spend more time working on your core business without having to manage search scale.
Sajari features include:
- Instant indexing with full-text crawler, including document (PDF, DOCX) search
- Easy to add advanced search capabilities via simple YAML-based configuration
- Machine learning included for always-on improvement
- Fully hosted and performant to thousands of queries per second
- Search UI generation and UI component libraries to build search experiences
Best use cases:
Additionally, Sajari has taken a different approach to configuration and extensibility, moving configuration from config.xml files to a core, built-in feature called pipelines. Pipelines are YAML-based scripts that define a series of steps which are executed sequentially when indexing a record (record pipeline) or performing a query (query pipeline). With pipelines, you can configure the search algorithm to improve search relevance or even A/B test different algorithms to determine which one provides the best search experience.
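The pipeline concept, a series of steps executed in order with the option to compare alternative pipelines, can be sketched in a few lines of Python. This is only an illustration of the idea. It does not reproduce Sajari's YAML syntax or API, and every step name, field name, and the synonym table below is a hypothetical assumption.

```python
def lowercase(query):
    """Normalize the query text to lowercase."""
    return {**query, "text": query["text"].lower()}


def expand_synonyms(query):
    """Append synonyms for known terms (hypothetical synonym table)."""
    synonyms = {"sneakers": ["trainers"], "tv": ["television"]}
    terms = query["text"].split()
    extra = [syn for term in terms for syn in synonyms.get(term, [])]
    return {**query, "text": " ".join(terms + extra)}


def boost_in_stock(query):
    """Record a ranking boost for in-stock items (hypothetical field name)."""
    boosts = query.get("boosts", []) + [("in_stock", 2.0)]
    return {**query, "boosts": boosts}


def run_pipeline(steps, query):
    """Run each step in order, feeding each step's output to the next."""
    for step in steps:
        query = step(query)
    return query


# Two candidate query pipelines that could be compared in an A/B test.
pipeline_a = [lowercase, expand_synonyms]
pipeline_b = [lowercase, expand_synonyms, boost_in_stock]

query = {"text": "Sneakers"}
print(run_pipeline(pipeline_a, query))
print(run_pipeline(pipeline_b, query))
```

Because each pipeline is just an ordered list of steps, trying a different ranking approach means swapping or adding a step rather than rewriting the search application.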
Core features, such as crawling, autocomplete, schema configuration, document indexing, synonyms, filters, faceted search, etc. are all baked in. In addition, Sajari offers a REST-like API for connecting to business data and Node, PHP, and Go SDKs, and React and JavaScript libraries for complete front-end customization.
Sign up for a free 14-day trial of Sajari.
Algolia
Like Sajari, Algolia is a new search engine built from the ground up. Originally, Algolia was developed for mobile search use cases, but has since been extended to more traditional search projects. Algolia can boast about its retrieval speed; it’s milliseconds faster than the competition. Those few milliseconds won’t matter for most use cases, but if speed is important, Algolia is worth a look. As a fully-hosted product, Algolia also eliminates the need for cluster management.
Algolia features include:
- Typo tolerant full-text search
- Simple and easy to understand tie-breaking relevance algorithm
- Global language support
- Very fast information retrieval
Best use cases:
- App search
- Mobile search
Algolia is popular because of how simple and easy it is to get started. It’s a great general purpose search engine. But it has its critics too, particularly around pricing and the complexity of managing custom rules and configurations. For example, anytime Algolia re-indexes the database, such as for A/B testing, it counts against the monthly search query quota. Features such as machine learning are add-ons that also cost more. Its ranking algorithm is a simple tie-breaking algorithm, which is easier to understand but also less flexible and powerful than other solutions on the market.
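A tie-breaking ranking algorithm can be sketched as a sort on an ordered list of criteria, where each later criterion is consulted only when all earlier ones are equal. The example below illustrates that general technique and is not Algolia's implementation; the criteria, field names, and records are assumptions chosen for the example.

```python
def tie_break_rank(records, criteria):
    """Sort by the first criterion; use later criteria only to break ties."""
    return sorted(records, key=lambda r: tuple(c(r) for c in criteria))


# Hypothetical matched records with precomputed match statistics.
records = [
    {"name": "rain jacket", "typos": 0, "words_matched": 2, "popularity": 40},
    {"name": "rain jackets", "typos": 1, "words_matched": 2, "popularity": 95},
    {"name": "red rain jacket", "typos": 0, "words_matched": 2, "popularity": 70},
]

# Criteria are applied strictly in order; lower key values rank first.
criteria = [
    lambda r: r["typos"],           # fewer typos first
    lambda r: -r["words_matched"],  # then more matched words
    lambda r: -r["popularity"],     # then higher popularity
]

ranked = tie_break_rank(records, criteria)
print([r["name"] for r in ranked])
# prints ['red rain jacket', 'rain jacket', 'rain jackets']
```

The result is easy to explain, but it also shows the limitation: the most popular record ranks last because it lost on the first criterion, and no amount of popularity can compensate.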
Coveo
If you’re looking to wean off of Elasticsearch while still maintaining your investment in Elasticsearch, Coveo may be a good choice. Coveo has built its own enterprise search technology but also allows people to leverage Elasticsearch as another searchable database.
Historically, Coveo was used for building secure enterprise search applications and knowledge bases. With its tight coupling to Sitecore, Salesforce, ServiceNow, and other B2B enterprise applications, as well as SQL and NoSQL databases, Coveo offers fast search across datastores for internal KBs and other enterprise use cases. More recently, the company has acquired AI and e-commerce technology to extend its footprint into e-commerce use cases.
Coveo features include:
- Machine learning-powered intelligent search
- Recommendation and personalization features to deliver the right content faster to the right users across datastores
- Native integrations with B2B services to provide comprehensive enterprise search
Best use cases:
- Designed for mid-to-large enterprises with multiple systems and datastores to index
- Backend search for internal knowledge base
- E-commerce and site search
Coveo is not as general-purpose a search platform as Sajari or Algolia (or Elasticsearch for that matter). It’s a platform to ingest and transform different data types into searchable, accessible content, with Coveo’s proprietary search, machine learning, and recommendations engines built on top. Reviewers have mentioned that initial indexing can be slow, that updates are equally slow and cumbersome, and that the user experience and UI are average, but once your content is integrated into Coveo it can be a very powerful tool.