Sajari is now Search.io!

Security

Is there a disaster recovery plan in place?

Yes. Managed infrastructure and global load balancing mean minimal risk of disaster. Our applications are replicated across Google's data centers globally, are backed up before each release, and can be restored or rolled back in under 5 minutes. Data is stored in a triple-redundancy block store. The underlying operating system is patched by Google Cloud security, and our containers are moved as necessary with "live migration" (zero downtime).

Infrastructure

Where are Sajari's data centers located?

Sajari's data centers are hosted in the USA and Australia. We find little speed degradation communicating transcontinentally but if you require hosting closer to your physical location, don't hesitate to ask.

What browsers do you support?

For Site Search, Sajari provides native support for all the stable versions of the current major browsers: Chrome, Firefox, Safari, Edge, IE11, and IE10.

What programming language is Sajari built in?

Sajari is built 100% with Go. Go is a great language for speed, concurrency, and distributed programming. Check out our GitHub project for more content on Go.

Analytics

How do I integrate custom analytics with my search interface?

If you've created a search interface from the Sajari Console, you can trigger your own analytics on searches by subscribing to events from the search interface in JavaScript.

Instructions

Interfaces are created using setup which is included when generating the interface from the Console.

You can subscribe to events by calling your interface with the "sub" value followed by the pipeline (either pipeline or instantPipeline) and event name, then a callback. It takes the form:


For example, if you are using the default inline interface and want to listen to the search-sent event, you would write:


For information on the events, please refer to the Website Search Integration documentation on GitHub.


Accounts

Why can’t I see the invites that I sent to team members in my console?

If you're not seeing invited users in the console after the invitation has been sent, here's what to do.

The invites are sent out via email and the link expires after 24 hours. The Team Management section will display users only after they have created an account using the invite link.

If you are wondering whether a user has accepted an invite:

  • Go to the Team Management section in your account and look for the user in question.
  • If the user is not listed, and the invite was sent more than 24 hours ago, invite the user again.

What types of user roles are available?

You can invite your team members to your Sajari project and give them specific user roles and permissions.

The permissions and visibility for each user role are detailed below:

  • Owner: This user role is assigned to the member who created the account. Ownership cannot be transferred to other team members.
  • Admin: This user role has the same level of access as the account owner and can view API keys, view/edit Billing details & Plan. An Admin cannot delete an Owner.
  • Engineer: This user role can manage Collections, Domains, Schema, Rules, Promotions, Synonyms, and Re-index. An Engineer can also view Analytics, Team Members, and Credentials (API Key).
  • Editor: This user role can manage Synonyms, Rules, and Promotions. An Editor can also view Analytics, Team Members, Collections, Schema, and Sites.
  • Viewer: This user role can only view data and settings within Analytics, Collections, Promotions, Rules, Schema, Sites, Synonyms, Team Members. This user cannot manage settings.

What happens if I exceed my monthly quota?

Overage charges (calculated in multiples of 10,000 queries used above 100,000) will be applied in the following month, and for each successive month that your usage exceeds 100,000 queries.
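
As a rough illustration of the arithmetic, the sketch below counts how many 10,000-query blocks a month's usage reaches above the 100,000-query threshold. The function name is hypothetical, and rounding a partially used block up to a full block is an assumption; the price per block depends on your plan and is not modeled here.

```python
import math

INCLUDED_QUERIES = 100_000  # monthly threshold named above
BLOCK_SIZE = 10_000         # overage is counted in multiples of this


def overage_blocks(queries_used: int) -> int:
    """Return the number of 10,000-query overage blocks for one month.

    Assumption: a partially used block counts as a full block.
    """
    excess = max(0, queries_used - INCLUDED_QUERIES)
    return math.ceil(excess / BLOCK_SIZE)
```

For example, under these assumptions a month with 125,000 queries falls into three overage blocks, and a month with 100,000 queries or fewer falls into none.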

Can I edit or delete permissions for other team members?

Team members who have 'Admin' or 'Account Owner' roles can delete team members and edit the roles of other team members.

However, the account owner cannot be deleted and the account ownership cannot be transferred to other team members.

Can I use my Sajari account with multiple domains, sites, or applications?

Yes, you can use your Sajari account with multiple domains, sites, or applications. You can choose to have multiple domains in one collection or have multiple collections having one domain, depending on your use cases.

There are limits to the number of collections you can create on each plan. See our pricing page for details.

  • To add another site or domain to your collection, go to the Domains section of your console and click on “Add Domain”.
  • To create a new collection, go to the Collections page in your console.

Terminology

What is an operation?

An operation is any process that communicates with the Sajari engine.

Operations are incurred when:

  • a search request is made
  • search suggestions are fetched
  • content is added to, deleted from, or updated in your collection
  • a search result is interacted with and that interaction is recorded
  • a schema or pipeline is viewed, added, updated, or deleted in a collection

You can always keep track of your operations in the Usage section of your Console.

What is a domain?

Domain names are used in URLs to identify websites.

For example, in the URL https://www.sajari.com/faq/getting-started the domain name is 'sajari.com'.

Sajari only counts domains towards your plan where the root domain is different. For example, we treat www.acme.com, acme.com, and blog.acme.com as one domain, but acme.com and acme-blogs.com as two different domains.
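
To make the counting rule concrete, here is a minimal sketch that groups hostnames by root domain. The function names are hypothetical, and the sketch assumes a single-label suffix such as .com; multi-part suffixes such as .co.uk would need a public-suffix list, which is not handled here.

```python
from typing import Iterable


def root_domain(hostname: str) -> str:
    """Naive root domain: the last two dot-separated labels.

    Assumption: the suffix is a single label (e.g. ".com").
    """
    labels = hostname.lower().strip(".").split(".")
    return ".".join(labels[-2:])


def count_domains(hostnames: Iterable[str]) -> int:
    """Count distinct root domains among the given hostnames."""
    return len({root_domain(h) for h in hostnames})
```

With this sketch, www.acme.com, acme.com, and blog.acme.com collapse to the single root acme.com, while acme.com and acme-blogs.com remain two separate roots.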

Customization

What customizations can be applied to the crawler?

You can set up a variety of rules for crawling your site.

From your Console, you can choose which domains are stored in your collection, and whether crawling is active for each domain.

You can also create exclude rules based on URL structure, domain, and a variety of metadata.

Exclude rules will remove all matching records from your collection, and Sajari will not re-crawl any records that match in the future.

How do I hide a field in a search interface?

Background

When you generate an interface via the console for a Site Search collection, we return title, description, URL, and image (optional) in the search results. In some instances, you might want to hide the title, description, or URL.

Limitation

Our default interface uses the URL field for click-tracking, so it must be returned in the response; otherwise, click-tracking won’t function. Hence, if you try to remove the URL field, an error is returned.

Instructions

To hide the ‘title’ or ‘description’ field from the search interface:

  1. Generate an interface from the Integrate section in the console.
  2. After choosing the relevant options and generating an interface, click on “View code”.
  3. Add the “fields” parameter in the values object. See the example below, which will only return and render ‘title’ and ‘URL’:

How do I exclude a directory from a search result?

You can exclude a directory from search results by adding exclude rules in your console. To exclude a specific directory, select either the dir1 or dir2 field, set it to 'Equals', and then enter the name of the directory that you wish to exclude from search results.

For example, if you have a site called www.acme.com with a section containing legacy publications you wish to exclude at www.acme.com/old, you would create an exclude rule where 'dir1' equals 'old'. This will remove all content found within 'old' from the index.

If the content instead lived in www.acme.com/publication/old then you would set the rule where 'dir2' equals 'old'.
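
The sketch below shows one way to think about how a URL maps to these fields and how an 'Equals' rule would match. It assumes dir1 is the first path segment and dir2 the second, as the examples above suggest; the function names are hypothetical and this is not the crawler's actual implementation.

```python
from urllib.parse import urlparse


def directory_fields(url: str) -> dict:
    """Map path segments to dir1, dir2, ... keys.

    Assumption: dir1 is the first path segment, dir2 the second, and so on.
    """
    if "://" not in url:
        url = "https://" + url  # urlparse needs a scheme to locate the path
    segments = [s for s in urlparse(url).path.split("/") if s]
    return {f"dir{i}": seg for i, seg in enumerate(segments, start=1)}


def is_excluded(url: str, field: str, value: str) -> bool:
    """Apply a single 'Equals' exclude rule to a URL."""
    return directory_fields(url).get(field) == value
```

Under these assumptions, a rule where 'dir1' equals 'old' matches www.acme.com/old/report.html, while www.acme.com/publication/old/report.html is matched only by a rule on 'dir2'.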

Does Sajari offer the ability to re-order search results for specific queries, or eliminate results that might not be contextually relevant?

At Sajari we do not believe ordering results for specific queries is sustainable or desirable.

For any site with more than a few hundred pages, it is not possible for humans to predict, in exact order, the content a user is searching for over time. User behaviors change, your content changes, and like everyone else, you're busy. Instead, we provide you with access to tune your results via query rules, in combination with our machine learning model.

Integration

How do I integrate Google Analytics with my site search?

If you've created a search interface from the Sajari Console, you can trigger your own analytics on searches by subscribing to events from the search interface in JavaScript.

Instructions

Interfaces are created using setup which is included when generating the interface from the Console.

You can subscribe to events by calling your interface with the "sub" value followed by the pipeline (either pipeline or instantPipeline) and event name, then a callback. It takes the form:

For example, if you are using the default inline interface and want to listen to the search-sent event, you would write:


For information on the events, please refer to the documentation on our GitHub.

Crawler

How do I index a site map?

A sitemap can be helpful in pointing our crawler to webpages that are not internally linked within your website. When you add a new domain, we look for a sitemap at the root of the domain (e.g. www.website.com/sitemap.xml) and index all the links on the sitemap.

Follow these steps to add your sitemap to your collection:

  1. Check if the sitemap exists on your website by going to www.yourwebsite.com/sitemap.xml. If a sitemap is already present, skip to step 3.
  2. If the sitemap is missing, ask your website developer or technical team to add one to your website.
  3. The name of the sitemap must end with "sitemap.xml" in order to be crawled. It can be located under any directory (e.g. www.yourwebsite.com/site/first-sitemap.xml).
  4. Log in to your console and select the relevant collection.
  5. Navigate to the Domains section and click on "Diagnose".
  6. Enter the URL of the sitemap (e.g. www.yourwebsite.com/sitemap.xml) and press "Diagnose".
    It will return the message "Page not found in the index." Press "Add to Index".

The sitemap will be indexed. This might take a few minutes or a few hours, depending on the number of pages on your website and the load in our index queues.

If you click on "See extended debug information", you might see a MIME error on the Page Debug tool. This error can be ignored, and your sitemap and all the links on your sitemap will be indexed.
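
The naming rule from step 3 can be checked before you submit a URL. The following is a minimal sketch with a hypothetical function name; it assumes the match is case-sensitive and applies to the URL path only, which the text above does not confirm.

```python
from urllib.parse import urlparse


def has_crawlable_sitemap_name(url: str) -> bool:
    """Return True if the URL path ends with "sitemap.xml".

    Assumption: the check is case-sensitive and ignores query strings.
    """
    if "://" not in url:
        url = "https://" + url  # urlparse needs a scheme to locate the path
    return urlparse(url).path.endswith("sitemap.xml")
```

Both www.yourwebsite.com/sitemap.xml and www.yourwebsite.com/site/first-sitemap.xml pass this check, while a name such as sitemap_index.xml does not.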

How does the crawler handle 301 or 302 redirects?

Sajari Site Search handles HTTP redirects automatically when a page is reindexed. Redirected pages are removed from the Collection (index) when they return a 301 or 302 HTTP status code.

The destination page of the redirect gets added to the Collection and will be shown in the search results.

Note: Any page that doesn’t have any meta changes detected will be re-crawled after 3-6 days in any case. The redirected pages might still appear in the search results until the next re-crawl takes place.

How to remove redirected pages immediately from a Collection:

You can also manually trigger the crawler to remove the page if you don't want to wait by following these steps:

  1. Log in to your Sajari account
  2. Select the relevant Collection
  3. Navigate to 'Domains' and click on 'Diagnose'
  4. Enter the URL that you have removed/redirected and press "Diagnose"
  5. The result and details of the record will be returned. Press "Add to Index"

The page will be removed from the index in a few minutes, and the "State" will change to "Redirect" the next time it is diagnosed.

Does the crawler automatically crawl my website content?

Yes, the Sajari crawler will visit your website periodically to update or remove existing content in your collection. However, if you have the ping-back code installed on your website, any changes will be applied to your collection instantly when that page is visited for the first time and the ping-back code is triggered.

If a page is in a collection, and then its HTTP status code changes to 404, Sajari will immediately remove this page from your collection as soon as the page is visited and the ping-back code is triggered.

Similarly, when a new page is published and viewed for the first time, this will be added to your collection instantly.

Capabilities

Does Sajari support fuzzy search?

Yes, we support fuzzy search. Our fuzzy matching algorithm is very fast and can handle any character sequence. Sajari will build your own personalized dictionary from the content that appears on your site, which includes jargon and brand-specific terms. More information can be found on our Synonyms page.

What impact does Sajari have on ROI?

High quality search drives increased revenue, improved user experience, and general site performance for a variety of reasons.

Search that is fast keeps people on sites for longer. According to Google's own research, slow search has a real impact on site and cart abandonment rates.

Search that is relevant delights your users and makes your content more discoverable. It improves your click-through rates, and finds the information or products your users are after, quickly.

Search that is intelligent gets better over time, as if it's reading your users' minds. Our machine learning model improves the ranking of results based on user behavior.

Sajari's search technology is fast, relevant, and intelligent. Our customers typically see CTR improvements of 50%+ after implementing our search technology.

How does Sajari handle stemming?

Stemming is used by search engines to return relevant results for search queries using a shared word stem even if the query the user has typed differs from the available results. For example, a user types in the query 'dental' but your website only uses the word 'dentist'.

Sajari uses a stemming algorithm to ensure that the user is still returned results for their query as the two terms share the same word stem.
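
To illustrate the general idea only, here is a toy suffix-stripping stemmer. It is not Sajari's stemming algorithm, which is not described here; the suffix list and function names are invented purely for this example.

```python
# Illustrative suffix list only; real stemmers use far richer rules.
SUFFIXES = ("ist", "al", "ing", "s")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def share_stem(query_term: str, document_term: str) -> bool:
    """Return True if both terms reduce to the same toy stem."""
    return toy_stem(query_term) == toy_stem(document_term)
```

With this toy rule set, 'dental' and 'dentist' both reduce to 'dent', so a query for one can match content that only contains the other.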

How is Sajari different from other search solutions like Algolia, Swiftype, and Elastic?

While some core features are shared by many search solutions, certain things make Sajari different. When we set out to build Sajari, the intention was to address the problems inherent in current search technologies. In doing so, we think we have made it uniquely capable in some important areas.

Some Sajari specific benefits include:

  • Real-time reinforcement machine learning. To continually and dynamically optimize relevance, Sajari employs reinforcement learning by default for all customers. Within two to four weeks Sajari’s ML system will develop a clear understanding of which pages or records are most relevant to each of your users' queries. What’s more, you can gain visibility on how the system is being trained in the Learning section of your Console. The process can be sped up with ‘manual’ tuning as described below, but it’s the best thing to fall back on to constantly improve relevance.
  • Dynamic query boosting. Sajari is engineered to allow users to run multiple algorithms on their data without using caches or seeing any negative effect on performance. Multiple algorithms can be run over the same data set at speed, e.g. one algorithm can be used for site search, while another produces dynamic content blocks tailored to different customer segments. This principle can be applied in any search or matching context, allowing for the creation of much richer interactions.
  • Custom score creation, on-the-fly. As results scoring is performed dynamically as queries are run, Sajari allows different blends of the factors and weightings that score results to be used without rebuilding the entire index. This allows Sajari users to tune, optimize and iterate search performance much faster.
  • Multi-dimensional indexing factors. Along with keywords, Sajari can take into account a plethora of business data all at once, for more nuanced relevance scoring. For example, in ecommerce search, Sajari customers can use different combinations of stock levels, profit margins, and conversion rates when ranking their search results, boosting product results accordingly. Most competitors only allow the use of these factors one by one to order results.
  • Use records as queries. Queries don't have to be limited to keywords or phrases. You can use an entire document, user profile, or product as a query to search for similar items.
  • Speed. We've spent a lot of time making our engine as fast and efficient as possible, for better UX. In tests against competitor search technologies, we've recorded about 10x faster indexing and about 100x faster searching.


Schedule a demo to see how Sajari can help your business or how it compares against your current search provider.

Can I index DOC, DOCX, and PDFs?

Yes, Sajari supports indexing of DOC, DOCX, and PDFs for all customers on more advanced plans. See our pricing for details.

Troubleshooting

Why are there no records in my website collection?

Our crawler may have encountered issues with your site during initial crawl. If our crawler encounters errors or if you have redirects or canonicals which cause redirect loops, we will abandon crawling.

  1. The first thing you should do is add your website’s homepage URL to the page debug tool.
  2. The page debug tool will indicate whether the website crawler was able to crawl your homepage. Read the fetch log to see if there were any errors, or any redirects before the page was parsed (crawled).
  3. Check for these common issues:
    • Canonical tags that are all set to your homepage. In this case we may only ever end up indexing your homepage. To resolve, update the canonical tags to point to the correct page, or leave them blank.
    • A homepage in a canonical loop (i.e. the homepage redirects to a different page, which redirects back to the homepage, and so on). To resolve, either remove the canonical tag from the website’s homepage or add the correct canonical tag.
    • Path-relative URLs used instead of root-relative URLs. Path-relative URLs can cause redirect loops. The crawler does not follow path-relative URLs (e.g. <a href="example-page"> when there is no base path on the page). Please make sure you use root-relative URLs (e.g. <a href="/example-page">) or absolute URLs (e.g. <a href="https://www.acme.com/example-page">) on your site. For further reading on URL types, please see here.
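
If you want to audit the links on a page for the URL types mentioned above, a small classifier along these lines may help. This is a sketch with a hypothetical function name; it also labels protocol-relative links (starting with "//"), which are not discussed above, and it treats anything else without a scheme and host as path-relative.

```python
from urllib.parse import urlparse


def classify_href(href: str) -> str:
    """Classify an href as absolute, protocol-relative, root-relative, or path-relative."""
    parsed = urlparse(href)
    if parsed.scheme and parsed.netloc:
        return "absolute"
    if href.startswith("//"):
        return "protocol-relative"  # e.g. //cdn.acme.com/x (not covered above)
    if href.startswith("/"):
        return "root-relative"
    return "path-relative"
```

Links reported as path-relative are the ones to rewrite as root-relative or absolute URLs.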

Notes:

  • Sajari does not fully crawl third party sites until our pingback code is installed.
  • Please don't add "amazon.com", "google.com", or other domains that you do not own as your domains to be crawled - we do not fully crawl third party domains without authentication.

How do I prevent pages from being crawled?

You can add data-sj-noindex anywhere in a page and it will not be indexed. Most commonly this will be defined in the <head> of an HTML page as follows:

  1. Locate the <head> tag of the page you want to prevent from being crawled.
  2. Add the following code within the <head>:
    <meta name="robots" content="noindex" data-sj-noindex="">
  3. Save the changes. The crawler will ignore this page next time it comes across it.

Additionally, you can use crawling rules to programmatically exclude sections or certain pages of your website. You can also set individual pages to not be indexed from the data sources tab of the admin Console.
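
To confirm that a page actually carries the data-sj-noindex attribute after you deploy the change, you could scan its HTML with a check like the one below. This is a minimal sketch using Python's standard html.parser module; the class and function names are hypothetical, and it only inspects the markup you pass in, not what the crawler has stored.

```python
from html.parser import HTMLParser


class NoIndexFinder(HTMLParser):
    """Records whether any tag carries a data-sj-noindex attribute."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        if any(name == "data-sj-noindex" for name, _ in attrs):
            self.found = True


def has_sj_noindex(html: str) -> bool:
    """Return True if the attribute appears on any tag in the HTML."""
    finder = NoIndexFinder()
    finder.feed(html)
    return finder.found
```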

How do canonicals impact indexing?

A canonical tag (aka "rel canonical") is a way of telling search engines that a specific URL represents the master copy of a page. This is done by setting the canonical tag in the head section of the page, as below.

Canonicals are used for a variety of reasons, such as choosing the preferred domain, http vs https preference, and consolidation of ranking "juice" for a given piece of content. Good canonicals can also help improve SEO. For more information, read how Google handles canonical tags and why the SEO community considers them important.

Canonicals are very important to the way Sajari works and are one of the most common reasons the crawler fails to index content correctly. They are a very strong signal and we generally won't index a URL if it has a canonical pointing elsewhere; we will instead try to index the canonical URL. The biggest mistakes we see with canonicals are:

  • Redirect loops: The canonical will point to a different URL, which will redirect back to the original, and so on.
  • Unresolvable: The URL in the canonical tag is either not a URL, does not exist, or cannot be resolved.
  • Self-referential: Sometimes developers and CMSs set the canonical for each page as itself, defeating the point of canonicals.
  • All the same: Every page on a site has the exact same canonical URL (often the root domain or homepage).

You can tell if you have some of these issues using our content debug tool. You should either a) fix these issues or b) remove canonical tags from your pages altogether. Removing all canonicals is much better than setting them incorrectly.
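
As a complement to the debug tool, two of the mistakes listed above can be spotted offline from page HTML. The sketch below reads the standard <link rel="canonical" href="..."> tag and flags canonicals that are not absolute http(s) URLs, as well as sites where every page shares the same canonical. The names are hypothetical, and redirect loops and non-existent targets are not detected because that would require network requests.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class CanonicalFinder(HTMLParser):
    """Captures the href of a <link rel="canonical"> tag, if present."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and (attrs.get("rel") or "").lower() == "canonical":
            self.canonical = attrs.get("href")


def canonical_of(html: str):
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


def canonical_issues(pages: dict) -> dict:
    """pages maps page URL -> HTML. Returns {page URL: issue} for flagged pages."""
    canonicals = {url: canonical_of(html) for url, html in pages.items()}
    present = [c for c in canonicals.values() if c]
    all_same = (
        len(pages) > 1 and len(present) == len(pages) and len(set(present)) == 1
    )
    issues = {}
    for url, canonical in canonicals.items():
        if canonical is None:
            continue  # no canonical tag: nothing to flag
        parsed = urlparse(canonical)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            issues[url] = "unresolvable"  # assumption: only absolute URLs count
        elif all_same:
            issues[url] = "all the same"
    return issues
```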

How can I test search in staging or development environments?

When you create a Website Search collection and add a domain, e.g. www.website.com, we only authorize searches from the domain that you have added. If a search request is made from a staging site or development site, e.g. www.staging.website.com or www.dev-website.com, you will get an authorization error i.e. "Authorization for this request failed. Check your credentials".

To test search in staging or development environments, you need to add the domain or URL of the staging or development site in the Domains section of your Collection.

Add one or more search domains.


Make sure that "Search from Domain" is enabled. You can also add and authorize IP addresses in the same way.

How can I test search in a password-protected staging environment?

If your staging environment is not publicly accessible, then you will need to allow our crawler access to it.

There are a number of ways to achieve this:

  • The easiest way to do this is to look for Sajaribot - at the start of the user-agent in HTTP requests, and allow these requests.
  • It is also possible to whitelist a range of IP addresses used by our crawling infrastructure. We generally recommend that you check these often as they are likely to change. Our primary crawling system runs within Google Cloud and has a very large and dynamic address range. Raise a request and we will get back to you with the current IP list.
  • If this is difficult, you can always index your production site instead, and then test new search interface integration on your staging site using your production data. This presents no performance issues and will not change the search functionality of your production site. This method allows the UI to be developed without the need for us to index your staging site.
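
The first option above can be sketched as a simple server-side check. The function name is hypothetical, the example user-agent strings in the usage note are made up, and the check assumes the crawler's user-agent begins with the literal text "Sajaribot" as described. Because user-agent headers can be set by any client, treat this as a convenience for staging rather than as authentication.

```python
def is_sajari_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent header starts with "Sajaribot".

    Assumption: the match is case-sensitive and anchored at the start.
    """
    return (user_agent or "").startswith("Sajaribot")
```

A staging server could call this on each incoming request and skip the password prompt when it returns True, for example for a header such as "Sajaribot - example" but not for "Mozilla/5.0 (compatible; Sajaribot)".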

How can I fix PDFs and DOCs that fail to index or have the wrong title?

If a few PDF and DOC files are not added to your collection or have the wrong title, here are some steps to take.

  1. The first thing to do is check how the crawler views your document. Do this by adding the URL of the document to the debug page.
  2. If the debug page shows that the page is indexed correctly, then go to the Domains section of the console and use “Diagnose” to see the current crawl status of the page. A status of no-index or redirect means that rules in the collection, or a no-index tag in the document, are preventing us from crawling it.
  3. If the debug page shows an error and mentions that it can't download the document, then it's likely a corrupt file. Some systems may still be able to open the file, but not all. We recommend re-saving or exporting with a different program or version.

For documents that have the wrong title, note that we take the title from the metadata of the document. If no title is present, then we use the filename instead. You can do the following to update the title:

  1. Update either the metadata or the filename and upload the file to your CMS/website
  2. Once added, we will index the PDF on the next crawl cycle. If you want the change to reflect immediately, then re-index the URL of the PDF document via our Diagnose tool in the Domains section.
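
The title fallback described above can be pictured as follows. This is a sketch, not the crawler's code: the function name is hypothetical, and deriving the filename from the last URL path segment (extension included) is an assumption.

```python
import posixpath
from urllib.parse import unquote, urlparse


def document_title(metadata_title, url: str) -> str:
    """Prefer the document's metadata title; otherwise fall back to the filename."""
    if metadata_title and metadata_title.strip():
        return metadata_title.strip()
    # Assumption: the filename is the last path segment of the document URL.
    return unquote(posixpath.basename(urlparse(url).path))
```

Seen this way, a PDF with an empty title field will surface under its filename, so fixing either the metadata or the filename changes what appears in results.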

Install

Where can I find my API key?

Your API key can be found in your console under 'Credentials'.

Is Sajari available for download or on-prem deployment?

No, Sajari is hosted, i.e. "search as a service" only. You cannot download it, but you can create an account and be up and running in minutes as a service.

We manage a dedicated Kubernetes cluster of machines across multiple availability zones specifically to save you time and resources. Our cluster has many machines and services operating with round-the-clock monitoring.

How long does it take to implement Sajari?

Installing Sajari for your website search is an easy process. With some basic web development skills you should be able to get up and running in about 5 minutes.

For apps and custom deployments (such as connecting multiple data sources), you can use our SDKs or API directly. This will take a little longer and requires more advanced development knowledge to set up.

Can I install Sajari with Google Tag Manager (GTM)?

Yes, follow the steps below.

Step 1

Sajari is a "custom HTML tag". You can select this when adding a new tag, as per below:


Step 2

Sajari is typically a global install, so you can activate across your entire site as per below. Don't worry about hiding various pages from your site search, you can easily exclude them later using "crawling rules".


Step 3

You can access your install code from within the Sajari app itself (if logged in). Or you can cut and paste directly from below.



Copy the code on the left, and replace <project> and <collection> with your actual company and collection names.

Once you've completed the above steps, your content will begin to index and build statistics around popularity, recency, etc. to optimize your search and recommendations.

Billing

When will my credit card be charged?

If you are currently on a free trial and have provided credit card details, you will be charged for your first billing period (i.e. month or year) on the day after the end of the trial.
