
Adding server-side scripting


Recently, I’ve spent some time adding support for server-side scripts in Coveo’s Search API component. In a Coveo setup, the Search API provides the backend through which our JS UI framework executes queries on the index, using REST requests. Queries go through what we call the Query Pipeline. The Query Pipeline provides various ways to alter the query before it’s finally sent to the index server, and now offers scriptable extension points where you can implement complex custom logic when needed.

Choosing a script language

I had many choices of languages I could support. The Search API runs on the JVM, so I could simply load additional jars. But that requires compilation, and I wanted something users can simply edit, save and reload to test. Some kind of script language would work better. The Java ecosystem offers many options (Jython, JRuby, Groovy, several JavaScript runtimes, etc.). In the end, I settled on JavaScript, mainly because working with the JS UI already requires JS skills.

There are several options for running JavaScript code on the JVM. At the time I started working on that, Java 8 was not yet released and thus the standard way to go was to use the Rhino engine that comes standard with the JDK (vs the new Nashorn engine in Java 8). So I started with Rhino.
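For readers who haven't embedded a JS engine before, here's a minimal sketch of what evaluating JavaScript from the JVM looks like through the standard javax.script API (illustrative only, not the actual Search API code; the object name is made up):

import javax.script.ScriptEngineManager

object EmbeddedJsExample {
  def main(args: Array[String]): Unit = {
    // On Java 7 this resolves to the bundled Rhino engine;
    // on Java 8 the same lookup returns Nashorn instead.
    val engine = new ScriptEngineManager().getEngineByName("JavaScript")

    // Evaluate a JS snippet and get the result back as a Java object.
    val result = engine.eval("var greeting = 'Hello'; greeting + ', world';")
    println(result) // Hello, world
  }
}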

Pretty quickly I had the basics working: a query pipeline folder could contain a main.js file that would be loaded at initialization, and to which I could expose ways to interact with the query pipeline when queries are executed. Whenever the JavaScript code changes, it’s automatically reloaded so that the next query uses the latest code.
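The reload mechanism itself can be as simple as a timestamp check performed before each query; here's a minimal sketch of the idea (the names and structure are hypothetical, and the real Search API logic is more involved):

import java.io.File

// Before each query, compare the script's last-modified timestamp
// and re-evaluate it only when it changed.
class ReloadableScript(file: File, load: File => Unit) {
  private var lastLoaded = 0L

  def ensureFresh(): Unit = synchronized {
    val modified = file.lastModified()
    if (modified > lastLoaded) {
      load(file)
      lastLoaded = modified
    }
  }
}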

I could now do stuff like this:

Coveo.onResolveIdentity(function (authenticated) {
    var windowsDomainUser = new UserId();
    windowsDomainUser.user = 'DOMAIN\\someone';
    windowsDomainUser.provider = 'Active Directory';
    return authenticated.concat(windowsDomainUser);
});

This code adds an additional identity to use when executing the query. Coveo.onResolveIdentity is a method I implemented in the Scala Search API code. It gets passed some kind of handle to the JavaScript function, and later on I'm able to call it and collect the result (with lots of marshalling).
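To give a rough idea of what the Scala side of such a registration method might look like, here is a hedged sketch using Rhino's embedding API (the class, its names and the surrounding plumbing are hypothetical; the real implementation does much more marshalling):

import org.mozilla.javascript.{Context, Scriptable, Function => JsFunction}

// Hypothetical sketch: keep handles on the registered JS functions and
// invoke them at query time. Not the actual Coveo implementation.
class IdentityResolvers(scope: Scriptable) {
  private var resolvers = List.empty[JsFunction]

  // Exposed to the script as Coveo.onResolveIdentity(fn): we just keep the handle.
  def onResolveIdentity(fn: JsFunction): Unit = resolvers :+= fn

  // Called when a query executes; each resolver receives the authenticated identities.
  def resolve(authenticated: AnyRef): Seq[AnyRef] = {
    val cx = Context.enter()
    try resolvers.map(fn => fn.call(cx, scope, scope, Array(authenticated)))
    finally Context.exit()
  }
}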

Runtime libraries

I was pretty happy at this point, and then I started thinking: “How do I allow the JavaScript code to perform more sophisticated stuff, like reading files, calling stuff over the network, etc?” Code running under Rhino does have access to the Java libraries in the classpath, but it felt weird using Java libs in JavaScript. People won’t be expecting that.

Another option was to provide 100% custom APIs for performing the common tasks. This would require quite a large amount of work, would be completely specific to our environment, and I’d also need to document all this. Hmm. Honestly, I prefer writing code.

Then Greg on my team mentioned it'd be nice if we could make use of libs from NodeJS. As a matter of fact, I had previously investigated whether I could somehow embed the real NodeJS runtime inside my process, but it turns out it's not quite possible yet (Node being essentially a single-threaded system). It's also not very convenient to call Node as a child process — I wanted to be able to expose a Java object and have my script code call its methods. So Node was out. But then Greg asked, isn't there some Java implementation of Node, like there is for Python, Ruby, etc.?

Well, it turns out, there is.

The Nodyn project aims to implement a NodeJS runtime on the JVM. It's being built by some nice folks working for Red Hat. It hasn't reached its first release yet, but they already have quite a lot of the core APIs working. Also, they support using packages from NPM, so any package that doesn't depend on something that isn't implemented yet should work fine.

But there was one problem: they don’t use Rhino. Instead, they use the DynJS JavaScript runtime (built by the same folks). So before I could try using Nodyn in my stuff, I had to port all my JS runtime code to work with DynJS. In the end, it wasn’t very hard, and in fact, I find the DynJS API to be much nicer than Rhino’s, so even without Nodyn the switch was a plus.

From there, loading the Nodyn environment into my JS environment was very easy:

runtime.global.defineGlobalProperty("__dirname", System.getProperty("user.dir"))
runtime.global.defineGlobalProperty("__filename", "v1")

val nodeJs = classOf[org.projectodd.nodyn.Main].getClassLoader.getResourceAsStream("node.js")
runtime.dynJs.newRunner().withSource(new InputStreamReader(nodeJs)).evaluate()

That’s it. Well, of course I’ve added a Maven dependency to the Nodyn lib. But then I only need to arrange for the embedded node.js file to execute whenever I’m initializing a JavaScript context and then I’m good to go. Suddenly, I could do stuff like this in my JS code and everything would just work:

Coveo.registerQueryExtension('stock', function (query, callback) {
    http.request('http://d.yimg.com/autoc.finance.yahoo.com/autoc?callback=YAHOO.Finance.SymbolSuggest.ssCallback&query=goog', function (res) {
        var json = '';
        res.on('data', function (chunk) {
            json += chunk;
        });
        res.on('end', function () {
            var data = eval(('' + json).replace('YAHOO.Finance.SymbolSuggest.ssCallback', ''));
            // Very secure, please do that at home kids.
            var tickers = _.map(data.ResultSet.Result, function (result) {
                return result.symbol;
            });
            console.log(tickers);
        });
    }).end();
});

Yay! No need to provide custom calls for everything. Customers can simply use stock Node APIs or packages from NPM. My empty sandbox suddenly had thousands of libraries available.

Handling callbacks

One thing about Node: it never blocks threads. Well, mostly. That's why it scales so well. So, pretty much all code written for Node uses callbacks for just about everything. This means I had to handle that as well in my code.

In my first implementation, when a JS function returned, it was expected that the return value was definitive and fully computed. It made it impossible to use Node APIs that use callbacks to signal that an async operation (like an HTTP request) completed.

Right now, the implementation for the Search API’s REST service doesn’t use an async model. This means a thread is allocated to each request and is kept in use until the processing is finished. I want to change that at some point, but for now any remote call will simply block the thread.

I needed a way for my main request thread to block if the JS code I’ve invoked is executing some asynchronous process. Also, I wanted to keep support for synchronous usage (e.g. returning a value directly), because that’s often just simpler and simple is good.

To address this, I arranged for all my calls to JS code to go through a single call point that checks the function being invoked from Scala code. If the function has one more formal parameter than expected (based on the params I'm passing it), I assume the additional parameter is a callback, and I pass it a special object. Then, if the function doesn't return anything meaningful (e.g. undefined), I block the main request thread until the callback has been called, or until a specific timeout expires.

Here’s the code (well, part of it):

def call(runtime: JsRuntime, function: JSFunction, theThis: JSFunction, args: Seq[Object]): Object = {
  runtime.executeInEventLoopThread(defaultTimeout)(cb => {
    // If the function takes one more parameter than what we've been provided, assume
    // it's a callback and enable waiting on it. Looks like a hack, but hey why not?
    val jsCallback = if (function.getFormalParameters.length > args.length) {
      Some(new JsCallback(runtime.global, cb))
    } else {
      None
    }

    val result = runtime.context.call(function, theThis, args :+ jsCallback.getOrElse(Types.UNDEFINED): _*)

    result match {
      case Types.UNDEFINED if jsCallback.isDefined =>
        // When a function returns undefined, we assume it'll call the callback
        // function that was provided to it automatically. There is no need to
        // wait in this thread because executeInEventLoop will do that for us.
      case other =>
        // Otherwise, when a function returns a value we call the callback ourselves
        cb(Right(other))
    }
  })
}

Under the hood, Nodyn uses Vert.x to provide event loops usable for async operations (among many other things). So, every time I make a JS call, I arrange for it to happen in a Vert.x event loop. By design, all subsequent callbacks are invoked in the same event loop (i.e. no parallel execution), so I only have to wait in my main request thread for the result to become available.
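Stripped of the Vert.x specifics, the waiting mechanism boils down to a latch. Here's a minimal sketch of the idea, with a single-threaded executor standing in for the event loop (names are illustrative, not the actual Search API code):

import java.util.concurrent.atomic.AtomicReference
import java.util.concurrent.{CountDownLatch, Executors, TimeUnit}

object EventLoopBlocking {
  // Stand-in for the Vert.x event loop: one thread, serial execution.
  private val eventLoop = Executors.newSingleThreadExecutor()

  def executeInEventLoopThread[T](timeoutMs: Long)(body: (Either[Throwable, T] => Unit) => Unit): T = {
    val latch = new CountDownLatch(1)
    val result = new AtomicReference[Either[Throwable, T]](Left(new IllegalStateException("no result")))

    // All JS execution (and any callback it triggers) happens on this one thread.
    eventLoop.execute(() => body { r =>
      result.set(r)
      latch.countDown()
    })

    // The request thread waits here until the callback runs, or the timeout expires.
    if (!latch.await(timeoutMs, TimeUnit.MILLISECONDS))
      throw new RuntimeException("Timed out waiting for the JavaScript callback")
    result.get.fold(e => throw e, identity)
  }
}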

Using NPM packages

At this point I started showing this to some coworkers and PS guys (professional services consultants). One of those conveniently had a need to override some stuff based on data retrieved from a SOAP service ಠ_ಠ, and he was willing to beta test my stuff. There are many libs in NPM to call SOAP services, and in the end it boils down to making an HTTP request somewhere. Should be a piece of cake, right? I mean, I did that at least once.

Well, not so fast here cowboy.

As I said previously, Nodyn hasn’t reached a first release yet, and this means there are some rough corners. In particular, it had issues with the request NPM package, which the SOAP library used under the hood. I had to fix a couple of glitches in the Nodyn and DynJS code to get the package to work as expected. I’ve submitted those changes to the maintainers, and the fixes are now merged in the official code.

A more annoying thing was that request seems to access undocumented fields of Node's HTTP request, which for obvious reasons aren't present in Nodyn's implementation. For now I worked around this by “enhancing” the Nodyn objects with some stubs (only when running in my environment, since it's too ugly for a pull request). Still, I'd like to find a better solution to this. The Nodyn devs are currently rewriting the HTTP stack directly on top of Netty, so I'll wait a little and then check if there's something better to do.

UPDATE: I just learned that the Nodyn devs switched to a different approach for implementing Node’s core APIs. Instead of replicating the user facing APIs with their own implementation, they now only implement the native APIs from which Node’s JS runtime depends. This means they are now using Node’s own JS code as-is, effectively eliminating this family of issues once and for all. Great!

With those changes, I was able to build a client from the service’s WSDL and use it to call some methods. The only problem remaining is performance: the parser that processes the WSDL runs pretty slowly on DynJS. Right now I’m not using the JIT feature of the interpreter, because I had a weird error when I tried it, so that might explain the performance issue.

In any case, I've seen mentions that Nodyn might also support Nashorn as a JS engine in the future, which should take care of performance issues if I can't get DynJS to run faster. Also, the problem really shows up when CPU-intensive work is being done in JavaScript, which often ain't the case anyway.

Of course, I expect other issues to appear with other NPM packages. I’ll try to address those as they come. Still, what’s already working is a pretty interesting addition.


Better unit test assertions in Java


If you’ve ever written a unit test in Java, you’ve probably used JUnit 4 and done something like this:

public class CompanyTest {
    private Company company;
    @Before
    public void setup()
    {
        company = new Company();
    }
    
    @Test
    public void getEmployees()
    {
        company.addEmployee("John");
        company.addEmployee("Susan");
        assertEquals(company.getEmployees().size(), 2);
    }	    
}

This is okay and gets the job done. However, I think there are some issues with this approach.

If there’s a bug in your Company class and the addEmployee method does not work, you will get the following message:

java.lang.AssertionError: 
Expected :0
Actual   :2

Expected 0? Oops, assertEquals takes the expected value first and the actual value second. That’s an easy mistake to make. So you fix your test and write it as:

assertEquals(2, company.getEmployees().size());

And now you get:

java.lang.AssertionError: 
Expected :2
Actual   :0

That’s better. Now you know that the test expects an integer with a value of 2 but received an integer with a value of 0.

But is that really what your test expects? I would argue that you were expecting a collection with a size of 2, not an integer with a value of 2. Sadly, it is not easy to express this with JUnit 4.

Introducing Fest Assertions (or Fluent Assertions)!

Fest Assertions introduces a new static method, assertThat, that gives you the power to easily express complex assertions in a way that allows the assertion engine to know what you want to verify.

The best way to write the above unit test would become:

assertThat(company.getEmployees()).hasSize(2);

If this test fails, you will get the following message:

java.lang.AssertionError: expected size:<2> but was:<0> for <[]>

This is much better! Now we know that we expected a collection with size 2 but the actual size was zero. We can even see the content of the collection being tested. This technique also addresses the problem of assertEquals() parameter order.

Here are some more examples on how assertThat can help you.

Another cleaner error message

The test

// Before
assertTrue(company.getEmployees().isEmpty());
// After
assertThat(company.getEmployees()).isEmpty();

The assertion message

// Before (yes, that's all it says!)
java.lang.AssertionError
// After
java.lang.AssertionError: expecting empty, but was:<['John']>

Code that reads more like english

The test

// Before
assertFalse(company.isHiring());
// After
assertThat(company.isHiring()).isFalse();

Powerful assertions

The test

// Before
assertTrue(company instanceof Company);
// After
assertThat(company).isInstanceOf(Company.class);

The assertion message

// Before
java.lang.AssertionError
// After
java.lang.AssertionError: expected instance of:<com.coveo.Company> but was instance of:<java.lang.Object>

EDIT: Here is another way you can enhance the readability of your tests using Java 8.

Coveo for Sitecore troubleshooting


Coveo for Sitecore has been out for a few months now, and we had to rethink the way we index content for search and, of course, for us support and maintenance people, the way we tackle the issues popping up here and there.

Our Search Provider implementation pushes documents to RabbitMQ, which keeps them warm until Coveo picks them up for indexing. The configuration is set in the Coveo.SearchProvider.config file and is also synchronized with the indexing back-end through the Coveo Admin Service. The diagram below is from the Coveo Online Help; for now I will only address the Search Provider part.

The process is simpler than before but the troubleshooting will require a change of mindset.

IMPORTANT: The CES Console is your dear friend for all your Coveo operations. This is still the case with Coveo for Sitecore. The first step before starting any type of troubleshooting is to arm yourself with that friendly tool. Be aware that you can filter by source or system events.

NEW STUFF: For all of you using June 2014’s release or newer, the R&D team added a built-in Coveo Diagnostic Page. I will not lie, this is one of my favorite new features.

The new process and where you should look first when something is wrong

Sitecore

1. Admin service issue.

The first blocking issue you might encounter is directly in the Sitecore Indexing Manager. As I mentioned before, the Coveo Admin Service is the messenger for configuration-related calls, which means that it is also the first component of the chain that can fail. The Coveo for Sitecore diagnostic page should also tell you if anything is not running properly. If you do not have that page, you can see the service state in the Windows Services, where it can be stopped or started. You can also access the service directly through your browser using http://localhost/AdminService or http://localhost/AdminService?wsdl.

Now, what can go wrong with it? Well, if you are using Coveo for Sitecore with a Coveo instance that you have been using with other connectors, you might not have installed the Coveo Admin Service in the first place. It is not installed by default by the Coveo Enterprise Search installation wizard; you need to choose the Custom Install path and make sure that it is selected.

The other common issue is with the https certificate. If you are using a restrictive certificate imposing an instance name, then the default local host will not pass the validation step. Simply change the <AdminServiceUri> value from localhost/AdminService to [qualified name]/AdminService in the Coveo.SearchProvider.config.

Finally, you must leave some breathing space for the service, so do not forget to enable port 80 or 443 depending on the security of your service.  The ports can be managed in the Coveo.CES.AdminService.exe.config file located by default under C:\Program Files\Coveo Enterprise Search 7\Bin\AdminServiceSecurity. An issue with the Admin Service could return this type of error in the Sitecore Indexing manager:

System.ServiceModel.EndpointNotFoundException: There was no endpoint listening at http://127.0.0.1/AdminService that could accept the message.

NOTE: The Coveo Admin Service has its own logging, stored by default under C:\Program Files\Coveo Enterprise Search 7\Bin.

2. Version mismatch issue.

This is a common one, and the mistake is easy to make. Even for minor upgrades, both the CES instance and the Coveo for Sitecore package, including the Search Provider and Search Interface, need to be upgraded. Failing to do so might produce several errors at indexing and querying time; one of the most common is a “method not found” exception. To double-check your versions, use the property options on the Coveo DLLs in the bin folder of your Sitecore instance. Consult the Coveo Online Help for more information on both the Coveo Enterprise Search 7 and Coveo for Sitecore versions.

3. Duplicate Key Exception.

This is a hidden issue fixed in July 2014’s version of Coveo for Sitecore. The Coveo Admin Service will not be able to create a new Security Provider if an old User Identity is already active. If the indexing manager fails with the following Exception:

Fail to add security provider “Security Provider” on source …

then simply navigate to the Coveo Administration Tool under the Configuration>> Security tab. Choose the User Identity menu and delete any User identity with the following name:  Sitecore Admin for [Your instance name].

4. QueueURI invalid.

Not much to say on this one: the QueueURI is set in the Coveo.SearchProvider.config file.

Make sure it matches the one set in the RabbitMQ Management page. See the RabbitMQ section below for more details on how to reach this page. If the QueueURI is not set properly, you will most likely hit this error while indexing:

RabbitMQ.Client.Exceptions.BrokerUnreachableException: “None of the specified endpoints were reachable.”

TIP: Sitecore produces an enormous amount of logging, but everything related to the Coveo Search Provider at the Sitecore level can be found in the logs with the following naming convention: log[timestamp].txt. It sits with all the other logs in the Data/logs folder of your Sitecore instance. Up to this point, there is no need to look in the Coveo file system.

IMPORTANT: Closing the dialog window will not stop the indexing process. The only way to force Sitecore to stop is through a recycle/reset in IIS.

WARNING: Depending on how you customize your Coveo for Sitecore, you might play around with the config files under the App_Config folder of your Sitecore instance. Do remember that Sitecore reads EVERY file with the .config extension. So if you keep a few files as backup, keep in mind that you should use .config.backup and not .backup.config as an extension. It is common Sitecore knowledge, but since a mistake is easily made, I prefer to mention it here. Remember that Sitecore provides you with a merge of all the config files in the Show Config page: http://[sitecoreinstance]/sitecore/admin/showconfig.aspx

Coveo

1. Live monitoring stopped.

As I mentioned before, the new Coveo for Sitecore integration has the “Everything is done in Sitecore” mindset. 95% of the time, you will be conducting publishing, indexing and even troubleshooting operations only in Sitecore. Still, if the indexing is working properly in Sitecore but the CES Console is not logging any activity, then you might want to take a look at the Coveo Administration Tools. Under the Index>> Sources and Collections tab, you will find the Sitecore Search Provider collection with the sources corresponding to the database name set in your Coveo.SearchProvider.config.

NOTE: Adding databases or changing source name for scaling scenarios is described in further details in our Scaling Guide.

There is no point in trying to rebuild or refresh these sources manually; they are plugged into the search provider and are commanded under the covers by the Sitecore Indexing Manager. What you can do, however, is the good old “turn it off and on again” that the IT world has been relying on since the beginning of the computer era. To do so here, click on one of the sources; it will lead you to the source status page. From there, you will be able to disable and re-enable the live monitoring operation. Keep an eye on the CES Console since it will tell you straight away (and in bright red) if anything is wrong. You can also use the check boxes to select multiple sources and use the drop-down box to disable and enable live monitoring. Most of the time, this will resume the indexing operation.

2. The security provider handbrake.

The security provider is now created automatically by the Admin Service, thus reducing the configuration errors which were frequent with earlier connectors. Still, a downed Sitecore can prevent the security provider from loading properly, putting a handbrake on the indexing operation. Usually, Coveo will be nice enough to warn you with the following message on the status page of the Coveo Administration Tool:

A security provider is not properly configured.

Simply navigate to your Sitecore Security Provider under the Configuration>> Security tab and reload it by clicking on Accept Changes at the bottom of the page. Once it is done, simply disable and enable the live monitoring operation on the source and you should be good to go.

3. QueueURI invalid part 2.

There are two locations where you can set the QueueURI: the Coveo.SearchProvider.config file and the Coveo Administration Tool. As I mentioned before, an invalid QueueURI in the .config file will break the indexing process in Sitecore. The same goes on the Coveo side, but it can be harder to spot, and if it is not fixed quickly, it could end up filling the Queue, and your disk at the same time.

The QueueURI is defined in the source’s general properties under the Index>> Sources and Collections tab. It is set automatically by the Admin Service when you first perform a rebuild in Sitecore, so a mismatch is not a common error but it is worth keeping it as a reminder in your troubleshooting guide. If you have to change it, do not forget to accept the changes at the bottom of the page.

RabbitMQ

The newest fluffy addition to the Coveo and Sitecore relationship is RabbitMQ. The default access point of the management page is http://localhost:15672/, and the default username and password are guest/guest. RabbitMQ will give you a simple and user-friendly overview of the current crawling operation. While rebuilding, an empty queue is a clear sign that nothing left Sitecore, while a full and idle queue reveals a Coveo issue.

WARNING: If you ran several indexing operations that are not being picked up by Coveo, or changed the source name multiple times, the documents will not be automatically cleared from the Queue. This could end up filling the drive on which your RabbitMQ is installed. By clicking on the Queue, you will be able to Delete or Purge it. We don’t recommend using Purge unless the Queue is no longer in use, for example after a source name change.

LAST TIP: Coveo supports log4net; do not hesitate to use it!

Getting the most out of relevancy


You have probably wondered why one result comes first and another doesn’t when performing a query. Why are search results ranked in a certain way, and what can you do about it?

How to check your ranking:

The best way to check your ranking is to enable the ranking information.

To enable the ranking information, just perform the query: “enablerankinginformation”

This will add to the result page some information about the ranking weights given to the documents (under Details - Ranking weights).

Before enabling the ranking information:

[image: before enabling the ranking information]

After enabling the ranking information:

[image: after enabling the ranking information]

6634 is the total of all the weights under the Ranking weights tab. Do not include the “100” values; these are not ranking weights but percentages, used by Coveo engineers for debugging purposes.

Example

If you look at the total weight, you’ll see that the first document listed is the one with the highest weight.

[image: ranking total weight]

Notes:

  • This information is only available to the user who enabled the ranking information
  • Enabling the ranking information uses a lot of additional memory in the ranking phase, so it should only be enabled temporarily for analyzing a ranking problem. To leave the ranking information mode, just perform the query: “disablerankinginformation”

Relevance ranking steps:

Relevance ranking has phases, each of them working on the documents sorted by the preceding phase.

  1. The first phase is performed on all documents, and tries to weight the query terms (is a term in the title, in the summary, in the concepts, etc.).
  2. The second phase is performed on a maximum of 50,000 documents, and weights the date and quality of the documents.

    A phase has been inserted between the second and the third, and was named the Collaborative Ratings phase. This phase takes collaborative ratings information into account.

  3. The third phase is more computing intensive, the goal here is to weight terms while taking their number of occurrences into account.
  4. The last phase, the adjacency phase, is where we compute the proximity of query terms, giving more weight to documents having the terms close together.

As usual, every application comes with a default setting, and most of the time users can modify those settings so the ranking will match their own preferences.

The ranking process is not an exception to that rule. Coveo comes with many features that will help you achieve your goal.

In the next topics, I’ll provide you with detailed information on some of the features that you can use to personalize or customize the way your documents should be ranked.

Grouping related search results


The JavaScript Search Framework is a step forward in website integration. It was a natural fit when we decided to launch our Q&A site. Here is how we managed to display a question and its answer(s) together in the search results.

Indexing the content

If you have already visited answers.coveo.com, you may have noticed that it is powered by OSQA. Before replacing the search on that platform, we needed to index its content. OSQA does not have an API that we could use to extract content. However, everything is stored in a database so we went with our very flexible Database Connector to retrieve all the questions and answers. Here is the mapping file that we are using:

<?xml version="1.0" encoding="utf-8" ?>
<ODBC>
    <CommonMapping>
        <AllowedUsers>
            <AllowedUser type="Windows" allowed="true">
                <Name>everyone</Name>
                <Server></Server>
            </AllowedUser>
        </AllowedUsers>
    </CommonMapping>
    <Mapping type="Questions">
        <Accessor type="query">
            select forum_node.id as id, title, body, added_at, last_activity_at, node_type, tagnames, username, email, parent_id, score from forum_node inner join auth_user on forum_node.author_id = auth_user.id where node_type='question' AND state_string!='(deleted)';
        </Accessor>
        <Fields>
            <Uri>https://answers.coveo.com/questions/%[id]</Uri>
            <ClickableUri>https://answers.coveo.com/questions/%[id]</ClickableUri>
            <FileName>%[ID].txt</FileName>
            <ModifiedDate>%[added_at]</ModifiedDate>
            <ContentType>text/html</ContentType>
            <Title>%[title]</Title>
            <Body>
                %[body]
            </Body>
            <CustomFields>
                <CustomField name="OrderDate">%[last_activity_at]</CustomField>
                <CustomField name="nodetype">%[node_type]</CustomField>
                <CustomField name="tags">%[tagnames]</CustomField>
                <CustomField name="sysAuthor">%[username]</CustomField>
                <CustomField name="foldingid">%[id]</CustomField>
                <CustomField name="questionid">%[id]</CustomField>
                <CustomField name="questiontitle">%[title]</CustomField>
                <CustomField name="score">%[score]</CustomField>
            </CustomFields>
        </Fields>
    </Mapping>
    <Mapping type="Answers">
        <Accessor type="query">
            select forum_node.id as id, state_string, body, last_activity_at, node_type, tagnames, username, email, parent_id, score from forum_node inner join auth_user on forum_node.author_id = auth_user.id where node_type='answer';
        </Accessor>
        <Fields>
            <Uri>https://answers.coveo.com/questions/%[id]</Uri>
            <ClickableUri>https://answers.coveo.com/questions/%[parent_id]</ClickableUri>
            <FileName>%[ID].txt</FileName>
            <Title>Answer to question id %[parent_id]</Title>
            <ModifiedDate>%[last_activity_at]</ModifiedDate>
            <ContentType>text/html</ContentType>
            <Body>
                %[body]
            </Body>
            <CustomFields>
                <CustomField name="OrderDate">%[last_activity_at]</CustomField>
                <CustomField name="nodetype">%[node_type]</CustomField>
                <CustomField name="sysAuthor">%[username]</CustomField>
                <CustomField name="foldingid">%[parent_id]</CustomField>
                <CustomField name="answerid">%[id]</CustomField>
                <CustomField name="answerstate">%[state_string]</CustomField>
                <CustomField name="score">%[score]</CustomField>
            </CustomFields>
        </Fields>
    </Mapping>
</ODBC>

We wanted to display the answers with the questions in the search results. An answer without the context of the question is only a part of the information you are looking for. We also wanted to pull the question into the result list if the searched term was only found in the answer and vice versa. In the Coveo jargon, this type of result interaction is called Folding. To fold questions and answers, we created several custom fields.

First, we needed a field containing the same value: @foldingid. On the question, @foldingid contains the id of the question in the OSQA database. On the answer, @foldingid contains the parent_id value which represents the id of the related question in the OSQA database. To organize the documents in a group of related results, we needed a way to distinguish the parent from its children. For that purpose, we set the field @questionid on all the questions and the field @answerid on all the answers. Those three fields will be used by the JavaScript Search Framework to fold results together.

Parent-Child relations

We then used the Folding Component to group questions and answers together. Here is the HTML that we are using:

<div class='CoveoFolding'
     data-field='@foldingid'
     data-parent-field='@questionid'
     data-child-field='@answerid'
     data-range='5'
     data-rearrange="date ascending"></div>

We set the data-field property to the common value shared by both the question and its answer(s). We then set the properties to specify the field to use for the parent and for the children. Those fields are either on the question or on the answer but not on both. Finally, we set the data-rearrange property to "date ascending". This way, the results are displayed in the same order as they were created (question, answer 1, answer 2, …).

Rendering the results using templates

We are only displaying questions and answers in this result list. The default result template is used to display the question; here it is:

<script class="result-template" type="text/x-underscore-template">
    <div class='coveo-date'><%-dateTime(raw.sysdate)%></div>
    <div class='coveo-title'>
        <a class='CoveoResultLink'><%=title?highlight(title, titleHighlights):clickUri%></a>
    </div>
    <div class='coveo-excerpt'>
        <%=(state.q)?highlight(excerpt, excerptHighlights):highlight(firstSentences, firstSentencesHighlights)%>
    </div>
    <div class='field-table'>
        <span class='answers-author'>
            <span class="CoveoFieldValue" data-field="@sysauthor"></span>
        </span>
        <span class='answers-score'>
            <img src="/upfiles/image/like_button.png"/>
            <span class="CoveoFieldValue" data-field="@score"></span>
        </span>
        <% if(raw.tags){ %> <span class="CoveoFieldValue CoveoTags" data-field="@tags" data-facet="TagsFacet" data-split-values="true"></span><% } %>
    </div>
    <% if (childResults.length) { %>
        <img class="folding-picture" src="/upfiles/image/fleche_attention.png"/>
    <% } %>
    <div class='CoveoResultFolding'
        data-result-template-id="answer-template" 
        data-more-caption="ShowMoreReplies" 
        data-less-caption="ShowLessReplies"></div>
</script>

We have the ResultFolding Component near the end of the template. While the Folding component folds the questions and answers together, the ResultFolding Component renders the folded results. The template used to render the folded results is an option of the ResultFolding Component. In our case, the "answer-template" template is used to render the children.

We use a specific template for the children since, among other things, an answer does not have a meaningful title. Here is the template that we are using:

<script id="answer-template" type="text/x-underscore-template">
    <div class='coveo-date'><%-dateTime(raw.sysdate)%></div>
    <div class='coveo-excerpt'>
        <%=highlight(excerpt, excerptHighlights)%>
    </div>
    <span class='answers-author'>
        <span class="CoveoFieldValue" data-field="@sysauthor"></span>
    </span>
    <span class='answers-score'>
        <img src="/upfiles/image/like_button.png"/>
        <span class="CoveoFieldValue" data-field="@score"></span>
    </span>
</script>

The end result for a question with two answers will look like this:

[image: answers sample]

And that’s how you can easily group related content in a Coveo result list.

Using Java 8 with Mockito


Java 8 is pretty cool. We (finally!) got Lambda Expressions in Java and a lot of other goodies. At Coveo, we started working with Java 8 as soon as the first stable release was available. As mentioned in my previous blog post, one way we’re using Java 8 is in our unit tests.

Let’s say we want to test that adding an Employee to a Company correctly invokes the CompanyDal class with the right information.

public interface CompanyDal {
    void registerEmployee(Employee e);
}
public class Company {
    private CompanyDal dal;

    public Company(CompanyDal dal) {
        this.dal = dal;
    }

    public void addEmployee(Employee employee) {
        // Pretend there's a deep copy here
        Employee copiedEmployee = new Employee(employee.getName());
        dal.registerEmployee(copiedEmployee);
    }
}
public class Employee {
    private String name;

    public Employee(String name) {
        this.name = name;
    }

    public String getName() {
        return this.name;
    }
}

When testing the Company class we’ll want to mock our CompanyDal interface. We use the great Mockito library for our mocking needs.

public class CompanyTest {
    private Company company;
    private CompanyDal mockedDal;

    @Before
    public void setup() {
        mockedDal = mock(CompanyDal.class);
        company = new Company(mockedDal);
    }

    @Test
    public void addingAnEmployeeRegistersItInTheDal() {
        Employee employee = new Employee("John");

        company.addEmployee(employee);

        // TODO: Verify that the employee was registered with the DAL.
    }
}

This is a good test as it validates that the Company class interacts with the CompanyDal class as expected. Now, how can we do this verification?

We could be tempted to do

@Test
public void addingAnEmployeeRegistersItInTheDal() {
    Employee employee = new Employee("John");

    company.addEmployee(employee);

    verify(mockedDal).registerEmployee(employee);
}

But that won’t work since the two Employee instances are not the same. This is caused by the deep copy in the Company class.

Note: For the sake of this example, we need to pretend that we can’t override Employee’s equals() method to use value equality instead of reference equality.

What we need to do is verify that the Employee passed to the CompanyDal has the same name property as the one passed to the Company class. This can be done using Mockito Matchers.

@Test
public void addingAnEmployeeRegistersItInTheDal() {
    Employee employee = new Employee("John");

    company.addEmployee(employee);

    verify(mockedDal).registerEmployee(argThat(new ArgumentMatcher<Employee>() {
        @Override
        public boolean matches(Object item) {
            return ((Employee) item).getName().equals(employee.getName());
        }

        @Override
        public void describeTo(Description description) {
            description.appendText("Employees must have the same name");
        }
    }));
}

This will produce the following message in case of test failure

Argument(s) are different! Wanted:
companyDal.registerEmployee(
    Employees must have the same name
);
-> at com.coveo.CompanyTest.canFindEmployee(CompanyTest.java:71)
Actual invocation has different arguments:
companyDal.registerEmployee(
    com.coveo.Employee@7f560810
);
-> at com.coveo.Company.addEmployee(CompanyTest.java:46)

This is good! But this test suddenly went from 4 lines of code to 15 lines! The matches() method could easily be replaced by a java.util.function.Predicate!

But we’ll need an adapter class to bridge Mockito’s Matcher with Predicate. Introducing LambdaMatcher!

import java.util.Optional;
import java.util.function.Predicate;

import org.hamcrest.BaseMatcher;
import org.hamcrest.Description;

public class LambdaMatcher<T> extends BaseMatcher<T> {
    private final Predicate<T> matcher;
    private final Optional<String> description;

    public LambdaMatcher(Predicate<T> matcher) {
        this(matcher, null);
    }

    public LambdaMatcher(Predicate<T> matcher, String description) {
        this.matcher = matcher;
        this.description = Optional.ofNullable(description);
    }

    @SuppressWarnings("unchecked")
    @Override
    public boolean matches(Object argument) {
        return matcher.test((T) argument);
    }

    @Override
    public void describeTo(Description description) {
        this.description.ifPresent(description::appendText);
    }
}

LambdaMatcher is really easy to use. Here’s our rewritten addingAnEmployeeRegistersItInTheDal test.

@Test
public void canFindEmployee() {
    Employee employee = new Employee("John");

    company.addEmployee(employee);

    verify(mockedDal).registerEmployee(argThat(new LambdaMatcher<>(e -> e.getName().equals(employee.getName()))));
}

And if you want a description

@Test
public void canFindEmployee() {
    Employee employee = new Employee("John");

    company.addEmployee(employee);

    verify(mockedDal).registerEmployee(argThat(new LambdaMatcher<>(e -> e.getName().equals(employee.getName()),
                                                                   "Employees must have the same name")));
}

This new code will generate the same nice error message as above, but it is much simpler! If you want to save even more time, you can create the following static method.

public static <T> T argThatMatches(Predicate<T> predicate) {
    LambdaMatcher<T> matcher = new LambdaMatcher<>(predicate);
    return Matchers.argThat(matcher);
}

Which will result in the following test.

@Test
public void canFindEmployee() {
    Employee employee = new Employee("John");

    company.addEmployee(employee);

    verify(mockedDal).registerEmployee(argThatMatches(e -> e.getName().equals(employee.getName())));
}

That’s it! Feel free to use the LambdaMatcher class in your own projects and add some Lambda to your unit tests!

Ad-hoc parsing using parser combinators


If you work for any given amount of time as a software developer, one problem you’ll end up with is parsing structured text to extract meaningful information.

I’ve faced this issue more than I care to remember. I even think that’s why I learned regular expressions in the first place. But, as Internet wisdom will teach you, regexes are only suitable for a subset of the parsing problems out there. We learn it at school: grammars are the way to go. Still, it is my deep belief that most engineers, when faced with a parsing problem, will first try to weasel their way out using regular expressions. Why? Well…

[image]

(do try that trick with the forks, it’s awesome)

There are many tools you can use to generate parsers in various languages. Most of those involve running some custom compilation step over your grammar file, from which a set of source code files will be produced. You’ll have to include those in your build, and then figure out a way to use them to transform your input text into a meaningful structure. Hmm. Better go fetch the duct tape.

Enter parser combinators. As Wikipedia states, parser combinators are essentially a way of combining higher order functions recognizing simple inputs to create functions recognizing more complex input. It might all sound complex and theoretical, but it’s in fact pretty simple. Here’s an example: suppose I have two functions able to recognize the ( and ) tokens in some text input. Using parser combinators, I could assemble those to recognize a sequence in that input made up of an opening parens and then a closing one (of course in the real world you’d want stuff in between too).

Still too complex? Let’s see some real world code now. Let’s implement a simple parser for performing additions and subtractions (ex: 1 + 2 + (3 + 4) + 5). I’ll use Scala because its base library comes with built-in support for parser combinators, but similar functionality is available for other languages too.

First, let’s define a few classes to hold our AST:

abstract class Expression {
  def evaluate(): Int
}

case class Number(value: Int) extends Expression {
  def evaluate() = value
}

case class Parens(expression: Expression) extends Expression {
  def evaluate() = expression.evaluate()
}

case class Addition(left: Expression, right: Expression) extends Expression {
  def evaluate() = left.evaluate() + right.evaluate()
}

case class Substraction(left: Expression, right: Expression) extends Expression {
  def evaluate() = left.evaluate() - right.evaluate()
}

For the curious, a case class in Scala is essentially a shorthand for immutable classes holding a few properties. They automatically come with a proper equals and toString implementation (among other things). They are perfect for this purpose.
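For instance, here’s a quick illustration with a made-up Point class:

case class Point(x: Int, y: Int)

// Structural equality and a readable toString come for free:
println(Point(1, 2) == Point(1, 2)) // true: compares the values, not the references
println(Point(1, 2))                // prints Point(1,2)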

Now here’s the parser that goes with it:

object SimpleExpressionParser extends RegexParsers {
  def parse(s: String): Expression = {
    parseAll(expression, s) match {
      case Success(r, _) => r
      case NoSuccess(msg, _) => throw new Exception(msg)
    }
  }

  val expression: Parser[Expression] = binary | parens | number

  val parens = "(" ~ expression ~ ")" ^^ {
    case "(" ~ e ~ ")" => Parens(e)
  }

  val binary = (number | parens) ~ operator ~ expression ^^ {
    case l ~ "+" ~ r => Addition(l, r)
    case l ~ "-" ~ r => Substraction(l, r)
  }

  val number = regex("""\d+""".r) ^^ {
    case v => Number(v.toInt)
  }

  val operator = "+" | "-"
}

The parser is in fact a class (in this case a singleton, but that’s not mandatory). Its methods define a set of rules that can be used for parsing. Notice how close those look to a typical BNF grammar? That’s right, you won’t be needing that duct tape after all.

Rules can be very simple such as operator which recognizes simple tokens using either strings or regular expressions, or more complex such as binary which combines other rules using special operators. Note that those operators are just methods from the base RegexParsers class. The Scala libraries provide many operators and methods to define how parsers can be combined. In this case I’m using the ~ operator which denotes a sequence. It’s also possible to match variable sequences, optional values, and many, many others.

The return value for each rule is in fact a Parser[T] where T is the type of item that is recognized. Simple rules based on strings or regular expressions return a Parser[String] without the need for further processing. Rules that combine multiple values or that need the raw token to be transformed in some way (such as number) can be followed by the ^^ operator applied to a partial function that’ll match the recognized stuff using pattern matching and then produce the resulting value. For example, binary returns either an Addition or a Substraction, which means its inferred return value is Parser[Expression].
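For instance, here’s a hypothetical rule (not part of the grammar above, though it reuses its number rule) that recognizes an optional, parenthesized, comma-separated list of numbers using the opt and repsep combinators:

// Matches e.g. "(1, 2, 3)" or nothing at all, and yields a List[Number].
val arguments = opt("(" ~> repsep(number, ",") <~ ")") ^^ {
  case Some(numbers) => numbers
  case None          => Nil
}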

Here’s how this parser could be used:

assertEquals(1, SimpleExpressionParser.parse("1").evaluate())
assertEquals(3, SimpleExpressionParser.parse("1 + 2").evaluate())
assertEquals(15, SimpleExpressionParser.parse("1 + (2 + 3) + 4 + 5").evaluate())

The parse method either returns an Expression or throws an exception when a parsing error occurs. Then calling evaluate on that expression recursively computes the expression, and returns the result.

If I were to use the toString method on the root expression, for an input string 1 + 2 I would end up with Addition(Number(1),Number(2)), which shows that the result from the parsing is a nice, easy to use AST.

Dealing with left recursive grammars

You might have noticed that in my example the definition for the binary rule didn’t use expression on the left side of the operator. Why can’t I do something like this?

def binary = expression ~ operator ~ expression

The problem with this rule is that it makes my grammar a left recursive one, and by default the Scala parser combinators don’t handle that quite well. While processing the input, the expression rule is called, which eventually digs into binary, which then invokes expression again on the same input… and then you end up with (and on) Stack Overflow.

So how to work around this? One possibility is to ensure that your grammar never recurses into the same rule without consuming at least one character (e.g. do not use left recursion). That’s why I used a slightly more complex form in my initial sample. Another possibility when using Scala parser combinators is to mix the PackratParsers trait into your class, which enables support for left recursion:

object SimpleExpressionParser extends RegexParsers with PackratParsers {
  def parse(s: String): Expression = {
    parseAll(expression, s) match {
      case Success(r, _) => r
      case NoSuccess(msg, _) => throw new Exception(msg)
    }
  }

  val expression: PackratParser[Expression] = binary | parens | number

  val parens = "(" ~ expression ~ ")" ^^ {
    case "(" ~ e ~ ")" => Parens(e)
  }

  val binary = expression ~ operator ~ expression ^^ {
    case l ~ "+" ~ r => Addition(l, r)
    case l ~ "-" ~ r => Substraction(l, r)
  }

  val number = regex("""\d+""".r) ^^ {
    case v => Number(v.toInt)
  }

  val operator = "+" | "-"
}

Much better isn’t it?

Conclusion

In this post I’ve only shown a very simple parser, but using the same techniques it’s possible to build much more complex ones that can process just about any kind of structured expression, without the need for external tools, and using very little code. All of a sudden, recognizing complex expressions is no longer an issue, and this opens up many possibilities when faced with situations where custom text input is being used. So give it a try!

Customize Ranking with QRE


A query ranking expression (QRE) modifies the ranking score of search results matching a query expression.

What is the impact on the final ranking score?

The QRE feature is computed during the first phase of the ranking algorithm. It has an impact on all matching documents, especially in cases where the QRE is the only thing boosting a document’s weight.

Since the September 2013 build, a QRE interface has been available in the Interface Editor. It can be found under Interface Editor - Search Interfaces - Features - Ranking.

[image: the QRE interface]

With this interface, QREs can be modified directly from the Interface Editor without using CES. All QREs created within the Interface Editor are specific to the search interface selected in the “current Interface” dropdown menu. Therefore, a QRE will affect every query made in that search interface.

Example

A query is made for “pdf”; as you can see, the first result is not a PDF file. We can boost the weight of PDF documents by using a QRE.

[image: total ranking weight]

A QRE is created for the @sysfiletype==”pdf” query expression with the +100 modifier value. In the search results, all documents whose file type is PDF will have 1000 added to their normal ranking score. The ranking score of returned documents with other file types is not affected.

[image: adding a QRE]

[image: QRE modifier]

The impact on the final score is:

  • (+100) will increase your document QRE weight by 1000
  • (+50) will increase your document QRE weight by 500
  • (-50) will decrease your document QRE weight by 500
  • (-100) will decrease your document QRE weight by 1000

You can also select “custom” in order to enter a custom value.

Now if we perform the same query, some PDF type documents will be listed first.

[image: QRE value]

Why is there a QRE by default for some documents and not for others?

[image: document with a QRE value]

[image: document without a QRE value]

You need to compare the language of your search interface with the @syslanguage of the document in the CES Admin Tool - Content - Index Browser.

[image: document language]

  • If the language of the document matches the language of the search interface, the QRE of the document will automatically be 600.
  • If the language of the document doesn’t match the language of the search interface, or @syslanguage is missing under the Field tab, the QRE of the document will automatically be 0.

How to Implement QREs?


Reusing templates with UnderscoreJS


UnderscoreJS is probably one of my favorite template engines. I’ve worked with Mustache and Handlebars, and they just don’t offer the same flexibility as Underscore. What’s more, we’ve been using this framework extensively in our JavaScript UI Framework, always delivering great results.

It may happen that you want to render multiple templates in a single page. For instance, let’s say that we’d like to have a page displaying multiple search result categories: Products, Downloads, etc.

Defining our templates

Let’s start by creating our first Underscore template for the Products section. It basically just renders a list of products with their pictures and name. There’s also a section that will be used to display more/less results.

<div id="productsContainer">
   <script class="productsTemplate" type="text/x-underscore-template">
   <% if (results.results.length > 0) { %>
       <h2>Products</h2>
       <span class="showMoreResults">Show more results in Products</span>
       <span class="showLessResults">Show fewer results in Products</span>
       <div class="resultsContainer">
           <div class="results">
               <% _.each(results.results, function(result, index) { %>
                       <div class="productElement">
                           <img src='<%= result.raw.@Html.Coveo().ToCoveoFieldName("imageurl", false)%>'/>
                           <div class="productText">
                               <a class="clickableUri" href="<%= result.clickUri %>">
                                   <span><%= result.title %></span>
                               </a>
                           </div>
                       </div>
               <% }); %>
           </div>
       </div>
   <% } %>
   </script>
</div>

PS: Don’t mind the “ToCoveoFieldName” syntax, this is coming from a Coveo for Sitecore example.

We then need to compile our template:

var productsTemplate = _.template($("script.productsTemplate").html());

Obviously, if we want the whole thing to actually do something, we’ll need to render our template with relevant results. Let’s perform a query directly on the REST endpoint to get our results from the CES index. In this example, productsQuery and numberOfResults are parameters passed to this function; they could be anything.

Coveo.Rest.SearchEndpoint.endpoints['default'].search({
    q: productsQuery,
    numberOfResults: numberOfResults
}).done(function (results) {
    if (results.results.length > 0) {
        $("#productsContainer").append(productsTemplate({ results: results }));
    }
});

This will output the resulting HTML in the page. You may notice that I’m using the results object, which contains the Query Results object returned by the REST endpoint. So the whole thing works, and we have products and rainbows displayed in our page. SUDDENLY, someone asks us to display another section, based on Downloads this time. Alright! Let’s create another template.

<div id="downloadsContainer" class="searchContainer">
  <script class="downloadsTemplate" type="text/x-underscore-template">
      <% if (results.results.length > 0) { %>
          <h2>Downloads</h2>
          <span class="showMoreResults">Show more results in Downloads</span>
          <span class="showLessResults">Show fewer results in Downloads</span>
          <div class="results">
         <% _.each(results.results, function(result) { %>
              <div class="downloadsElement">
                  <div class="downloadsInformations">
                      <a class="clickableUri" href="<%= result.clickUri %>">
                          <%= result.title %>
                      </a>
                      <div>
                          <span class="downloadsSize"><%= result.raw.@Html.Coveo().ToCoveoFieldName("size", false)%>kB</span> 
                          <span class="downloadsExtension"><%= result.raw.@Html.Coveo().ToCoveoFieldName("extension", false)%></span>
                      </div>
                  </div>
              </div>
          <% }); %>
         </div>
      <% } %>
  </script>
</div>

This time, we’re outputting slightly different information: we have a link, as well as the file size and the file extension. Just like we did before for the Products, we need to compile and then render our template. We’ll extract our query into a utility function, since we don’t want to repeat code, do we?

var productsTemplate = _.template($("script.productsTemplate").html());
var downloadsTemplate = _.template($("script.downloadsTemplate").html());

performQuery(productsQuery, numberOfResults, "#productsContainer", productsTemplate);
performQuery(downloadsQuery, numberOfResults, "#downloadsContainer", downloadsTemplate);

function performQuery(queryToPerform, numberOfResults, container, template) {
    Coveo.Rest.SearchEndpoint.endpoints['default'].search({
        q: queryToPerform,
        numberOfResults: numberOfResults
    }).done(function (results) {
        if (results.results.length > 0) {
            $(container).append(template({ results: results }));
        }
    });
}

Excellent. We have Products and Downloads rendered, our customer is happy and we’re riding unicorns. But we’re not 100% happy. Take a look at this little piece of HTML that gets repeated :

<h2>Products</h2>
 <span class="showMoreResults">Show more results in Products</span>
 <span class="showLessResults">Show fewer results in Products</span>

This may seem small, but let’s say we add 6 more templates to our page (that actually happened): it gets impossible to maintain. Especially if you decide to change the CSS class of one of these elements, for instance.

Extracting a header template

My solution to our repetition problem is to create a new header template, by extracting and parameterizing our little piece of markup. Our header template will take a category as parameter.

<script class="headerTemplate" type="text/x-underscore-template">
 <h2><%= category%></h2>
 <span class="showMoreResults">Show more results in <%= category%></span>
 <span class="showLessResults">Show fewer results in <%= category%></span>
</script>

We then need to edit our JavaScript to take this new template into account.

var headerTemplate = _.template($("script.headerTemplate").html());
var productsTemplate = _.template($("script.productsTemplate").html());
var downloadsTemplate = _.template($("script.downloadsTemplate").html());

performQuery(productsQuery, numberOfResults, "#productsContainer", productsTemplate);
performQuery(downloadsQuery, numberOfResults, "#downloadsContainer", downloadsTemplate);

function performQuery(queryToPerform, numberOfResults, container, template) {
    Coveo.Rest.SearchEndpoint.endpoints['default'].search({
        q: queryToPerform,
        numberOfResults: numberOfResults
    }).done(function (results) {
        if (results.results.length > 0) {
            $(container).append(template({ header: headerTemplate, results: results }));
        }
    });
}

You can see that when we append the HTML, we now pass the headerTemplate to the current template we’re rendering. The last thing we need is to edit our templates to use this new header parameter.

<div id="productsContainer">
   <script class="productsTemplate" type="text/x-underscore-template">
   <% if (results.results.length > 0) { %>
       <%= header({category : "Products"})%>
       <div class="resultsContainer">
           <div class="results">
               <% _.each(results.results, function(result, index) { %>
                       <div class="productElement">
                           <img src='<%= result.raw.@Html.Coveo().ToCoveoFieldName("imageurl", false)%>'/>
                           <div class="productText">
                               <a class="clickableUri" href="<%= result.clickUri %>">
                                   <span><%= result.title %></span>
                               </a>
                           </div>
                       </div>
               <% }); %>
           </div>
       </div>
   <% } %>
   </script>
</div>
<div id="downloadsContainer" class="searchContainer">
  <script class="downloadsTemplate" type="text/x-underscore-template">
      <% if (results.results.length > 0) { %>
          <%= header({category : "Downloads"})%>
          <div class="results">
         <% _.each(results.results, function(result) { %>
              <div class="downloadsElement">
                  <div class="downloadsInformations">
                      <a class="clickableUri" href="<%= result.clickUri %>">
                          <%= result.title %>
                      </a>
                      <div>
                          <span class="downloadsSize"><%= result.raw.@Html.Coveo().ToCoveoFieldName("size", false)%>kB</span> 
                          <span class="downloadsExtension"><%= result.raw.@Html.Coveo().ToCoveoFieldName("extension", false)%></span>
                      </div>
                  </div>
              </div>
          <% }); %>
         </div>
      <% } %>
  </script>
</div>

When rendering one of these templates, Underscore renders the child templates as well (like the header template here). Much cleaner!

That’s it for now! Feel free to use this trick in your projects or ask for more examples!

Customizing the Insight Panel


If you haven’t deployed the insight panels in your organization yet, here is a link to the procedure to get it done: Installing the Coveo for Salesforce Application.

Out of the box there are several predefined Insight Panels with pertinent queries. That is a great starting point, but even more usefulness can be attained by customizing the insight panels to meet the needs and the ways of working of the users.

One good example to demonstrate the usage is a “related comments” insight panel. Here at Coveo, we document our cases in the case comments, so looking up the comments relevant to the current case can give us precious insight and help orient us towards a path to resolution more quickly. In this post we will walk through the procedure for creating a customized insight panel using the Query Builder.

Once the Insight Panels are deployed, they should look something like this: Coveo Insight Panel

Let’s create a new tab to hold our custom-built query by pressing the menu icon (☰) at the far right and selecting “add a tab”, and call it “Related Comments”.

Add a Tab

Create a Coveo Insight Tab

Next we add a box to it by pressing the menu icon (☰) at the far right and selecting “add a box”, or by pressing the orange “add a box” button.

Add a Box

And then select the Query Builder.

Query Builder

The Contextual Query tab allows us to build a query by selecting criteria from pick lists. In this example, we want to return Case Comments in our Insight Panel.

Case Comment

Linking fields boosts the ranking of results whose content matches the current Salesforce context. In this case, we want to give more credence to Case Comments that match the Subject and Summary of the current case.

Match Subject and Summary

Next we use the advanced filtering to boost the relevancy of the Case Comments that match the Case’s subject or summary. In this example, we decide that a match in “Subject” should be considered more relevant than a match in “Summary”.

Boost the Subject

We’ll also use the custom filters to make sure comments from the last year appear first by giving them a relevancy boost. We could have made this condition mandatory, but we’d like to keep older comments in the search results if there are any; we just want to make sure they appear after more recent results.

Boost Recent Comments

The Contextual Query tab generates the query in the Coveo query extension language based on our configuration.

Generated Contextual Query

Saving this brings us back to our case, where we can see that the Coveo Insight Panel is now returning related case comments:

Related Recent Comments

We can see that the Query Builder can easily be used to create complex and useful queries that automatically present personalized, context-related content in search interfaces, without end users even having to type a query.

Even more flexibility in the query can be obtained by Creating a Coveo Insight Box Using the Advanced Mode in Query Extension Language directly. The full reference for the query extension language is available on the Coveo query extension language page.

Optimizing web application build process


At Coveo, we love to tackle new technologies. Last year was all about TypeScript, preprocessed stylesheets with LESS/SASS and optimizing our JavaScript applications with minification, concatenation and compression. Since we have a Java stack in the cloud, using the build tools that were already in place was a no-brainer. We started with an Ant script to manage the build process of the Cloud Admin Web Application.

We created Ant tasks for compiling TypeScript, concatenating JavaScript libraries and compiling LESS to CSS. We then added sprite compilation, unit tests and underscore templates… up to a point where building the whole application took about 30 seconds.

That means that every time a developer modified a stylesheet and wanted to test it in a browser, he had to run ant less… and wait 6 or 7 seconds. After that, he could refresh the page and see the results. It was the same thing if he had to modify a TypeScript file, add a new library or add a new icon.

We then started to be annoyed by these preprocessors and compilers and by having to manually trigger each task. We were also annoyed by the big pile of XML configuration for Ant. We needed a new build tool better suited to our JavaScript web application needs.

##Then came Gulp

Working with Gulp is a pleasure. What I must insist on is Gulp’s philosophy of Code Over Configuration. The biggest pain about Ant is its XML configuration; I love that we can code the build tasks just as we would code regular JavaScript. Also, Gulp is blazing fast since it runs tasks that don’t depend on each other simultaneously. Porting to Gulp led to a 9-second build time.

There are tons of great official and third-party plugins targeting a JavaScript stack. The plugins are all searchable right on Gulp’s website. Each plugin can be installed via npm and saved to your package.json’s devDependencies.

##Great plugins we use

###Watch

Watch is integrated into Gulp and must be one of its greatest features. Watch lets us select files and monitor them for changes. We can hook those changes to specific tasks, so modifications to LESS files trigger LESS compilation automatically. Watch improves our productivity a lot!

var gulp = require('gulp');

gulp.task('watch', function() {
    gulp.watch('./stylesheets/**/*.less', ['less']);
    gulp.watch('./templates/**/*.ejs', ['templates']);
    gulp.watch('./src/**/*.ts', ['ts:compile']);
    gulp.watch('./test/src/**/*.ts', ['test:compile']);
});

###Template compile

The Cloud Admin Web Application is a Single Page Application built with Marionette.JS, and our template engine of choice is Underscore.JS. When developing, we want each template separated into its own EJS file, but in production we need the best performance. Sure, we could use underscore’s _.template method on the client as stated in Vincent’s latest post, but gulp-template-compile goes a step further by concatenating all of our templates into a single file and generating plain JavaScript functions that are usable later.

var gulp = require('gulp');
var template = require('gulp-template-compile');
var concat = require('gulp-concat');

gulp.task('templates', function() {
    return gulp.src('./templates/**/*.ejs')
        .pipe(template({ namespace: 'CoveoTemplates' }))
        .pipe(concat('templates.js'))
        .pipe(gulp.dest('target/package/templates'));
});

Templates are now in target/package/templates/templates.js and accessible with window.CoveoTemplates.templateName(data).

###Gzip

As you may know, the Cloud Admin Web application is hosted on AWS S3. Since S3 is for static file serving and doesn’t allow on-the-fly compression, we need to gzip files before uploading them to S3. With gulp-gzip we simply pipe gzip() into our task’s stream and we have a compressed gzipped file.

var gulp = require('gulp');
var gzip = require('gulp-gzip');

// Gzip the file CoveoJsAdmin.js
gulp.task('gzip', function() {
    return gulp.src('target/package/js/CoveoJsAdmin.js')
        .pipe(gzip())
        .pipe(gulp.dest('target/package/js'));
});

This will output the gzipped CoveoJsAdmin.js.gz file in target/package/js.

###Minify, concatenate and source maps

gulp-concat allows us to combine all the needed files into a single file, while gulp-uglify minifies (and uglifies!) our files to shrink them to a minimum weight. gulp-sourcemaps lets us know which code belongs to which file for easier debugging. gulp-if allows us to add conditions to tasks; we use a --min flag to determine whether we minify and gzip or not.

var gulp = require('gulp');
var gutil = require('gulp-util');
var gulpif = require('gulp-if');
var concat = require('gulp-concat');
var rename = require('gulp-rename');
var uglify = require('gulp-uglify');
var gzip = require('gulp-gzip');

// Check for '--min' flag (true if present)
var useMinifiedSources = gutil.env.min;

// Minify and gzip
gulp.task('libraries', function() {
    return gulp.src('lib/**/*')
        .pipe(concat('CoveoJsAdmin.Dependencies.js'))
        .pipe(gulp.dest('target/package/js'))
        .pipe(gulpif(useMinifiedSources, uglify()))
        .pipe(gulpif(useMinifiedSources, rename('CoveoJsAdmin.Dependencies.min.js')))
        .pipe(gulpif(useMinifiedSources, gzip()))
        .pipe(gulpif(useMinifiedSources, gulp.dest('target/package/js')));
});

This will take all our library files (Backbone, jQuery, Underscore, etc.) located in /lib and bundle them into CoveoJsAdmin.Dependencies.js. If the --min flag is enabled when calling gulp, it will also minify and rename the bundle to CoveoJsAdmin.Dependencies.min.js and gzip it to CoveoJsAdmin.Dependencies.min.js.gz. The generated files are output to target/package/js.

###Other notable plugins we use

  • gulp-clean, to delete generated files and get a fresh start
  • gulp-util, for easier logging in tasks and passing arguments via the CLI
  • gulp-less, for those awesome stylesheets
  • gulp-tsc, to compile our TypeScript

###What’s coming next

Microsoft’s roadmap to 2.0 is very promising! Especially the latest tsc 1.1.0.1 compiler, which should save us more than 30% compile time; we’re currently updating the compiler and adapting our codebase to support it. Some plugins offer incremental builds, and we’re looking at those to shorten compile time even further. Also, we still have development scripts in Python that we would love to translate to JavaScript, so the project could be built as easily as

    hg pull
    npm install
    gulp

We’ll get there soon!

Working Remotely


I’ve been working remotely for more than two years now, and I thought I’d add my own experience to the rather large amount of similar posts out there on the web.

First, some background. I’m a long-time Coveo employee, and before going remote I worked on-site for many years. I’m a Canadian who, having married a Frenchwoman, now lives in France. My job is split between development and some management, with a strong emphasis on the former. I head a 7-person team responsible for our Salesforce integration, JS UI front-end and REST backend, and also our mobile apps.

One peculiar aspect of my situation (I think) is the 6-hour time zone difference between me and the home office. I should also note that most Coveo R&D staff are on-site; we do have some other remote developers, but none as far away as me.

When thinking about remote work, communication is always the first thing that comes up, so I’ll dedicate this post to that topic.

Find ways to build social ties

When I started working remotely, I learned quickly that you can’t depend only on email, scheduled meetings and phone calls to properly interact with coworkers. Social relationships are built on more than strict information transfer, and you need to form ties with your coworkers in order to create a productive environment. In an on-site setup this happens through casual conversations (the coffee machine comes to mind). In short, you need a way to hang out with the folks.

Here at Coveo we use Slack to host chat rooms dedicated to various topics. Each team typically has its own room where members discuss whatever comes up. Other employees can also use that channel to ask questions, enquire about issues they are having, etc. Discussions often fall outside of strictly work-related subjects (we even have a #phones channel), and that’s a great thing for remote workers, as it partly replaces the casual conversations you’d have in the hallway.

Now, for 100% remote teams, maybe Slack (or any similar service) is good enough, but in my case I’m working with a bunch of folks sitting right next to each other. Often they’ll find it easier to just discuss something instead of bringing the discussion to Slack.

What I came up with is using a permanent Google Hangout running on an old laptop sitting in the office near my teammates. Whenever I’m working at the same time as folks in Quebec, I’ll connect to this hangout and let the audio feed play on my laptop speakers. I also put the video feed in a small window in the corner of one of my (numerous) screens. This way, the hangout feels like a window to the office. I can hear conversations between my teammates, and jump in whenever I like (which I do quite often, being of the opinionated trollish kind).

Some might think it’s creepy having a coworker essentially eavesdropping on all conversations, but when you think about it, I’d hear the same stuff if I was sitting there, and whenever I’m connected my face is shown on full screen, so it’s pretty obvious I’m “here”. In fact, I never had anyone complain about the situation. I did frighten some of the cleaning staff while doing tests, but that’s another story.

On top of that, 4 or 5 times a year I jump on a plane and go spend a week or two at the office. It’s essentially a social operation; there is not really any task that requires me to be onsite, but to me it’s essential nonetheless. By going there I’m able to meet people with whom I’d otherwise never talk just because we work on unrelated things. I try to use the opportunity to go and meet new hires, although with the company growing it’s becoming somewhat harder. Also, having the opportunity to go drink a beer with the guys really helps with the whole social relationship thing I’ve mentioned earlier.

Be good at email

The last thing I wanted to mention is that, as a remote worker, you need to be good at email. Even with all the futuristic stuff I mentioned above, email is going to be a big part of your life. Personally this is something I haven’t mastered yet, but I’m getting better… I think.

Sometimes, there is a situation or an opinion that I need to convey to someone. When I was on-site, I’d just barge in that person’s office and trust body language and frantic hand movements to put the right emphasis on things. By looking at his/her reaction I’d get a good feeling of how my message was being understood and interpreted. Not so much with email.

My first reaction to that was to include as much information as possible in my emails. I created long, flowing dissertations painstakingly explaining the minute details of whatever was my subject. And nobody read them. Hell, you’re probably not even reading this far into this post.

So my advice is: keep those emails short and to the point. Re-read them, and remove every word or sentence that isn’t important. You’ll certainly omit details, but you’ll be able to reach people. There are always replies or follow-up discussions to work out the subtler nuances.

There are many more things to discuss about being a remote worker. In fact, I removed whole paragraphs from this post; I might use them in a followup sometime. Anyway. That’s it for now.

Hackathon - formerly known as Itch Scratching Development


##The Non-Scratching

Our days are packed. We work on bug fixes, we work on features that have been sold to clients, on pieces required before the next rollout or it may be on the new architecture. We work on things that have a due date or that somebody is waiting for. We don’t have much time to ponder and wonder about all those little things we would like to change or those ideas we have; for all those itches we lack the time to scratch.

How do we encourage people to scratch the itch even if they are not part of our planning? Enter the Hackathon.

##The Scratching

We had our first Hackathon last Saturday. The rules were simple:

  • Work on something related to Coveo.
  • Work on something you would not work on in your normal workweek (aka, don’t plan on “working”).
  • You demo what you did before you leave.
  • No beer before noon.

Ok, the last one is not a rule, only a recommendation.

I was impressed by the number of working projects presented at the end of the day. We saw some complex NLP tried out, some time-saving macros, and some UI hacks (by non-UI people). Some even prototyped other technologies to replace limitations in the ones we have, or added new APIs to our infrastructure, and much more. We can say it was an all-around success.

Everybody who was present deserves huge kudos.

##The Next Time

This was a success that must be repeated. Things to change for the next edition? We’ll try to get more people involved. Since many participants did projects with technology they do not use daily, we should have the computers ready to hack, with the setup for the expected technology done the day before. Also, stuff the fridge with beer to ensure it’s cold enough for noon.

Meanwhile, be on the lookout. Some of those itches may come back in the planned roadmap.

Taking Enums to the next level with Java 8


In our awesome cloud Usage Analytics API, there is a call that returns analytics data as data points (these are meant to be used to build a graph). Recently, we added a feature allowing the user to choose the time period (initially, only days were available). Problem is, the code was strongly coupled to the day period…

image

For example, take this snippet:

private static List<DataPoint> createListWithZerosForTimeInterval(DateTime from,
                                                                  DateTime to,
                                                                  ImmutableSet<Metric<? extends Number>> metrics)
{
    List<DataPoint> points = new ArrayList<>();
    for (int i = 0; i <= Days.daysBetween(from, to).getDays(); i++) {
        points.add(new DataPoint().withDatas(createDatasWithZeroValues(metrics))
                                  .withDayOfYear(from.withZone(DateTimeZone.UTC)
                                                     .plusDays(i)
                                                     .withTimeAtStartOfDay()));
    }
    return points;
}

Note: Days, as well as Minutes, Hours, Weeks and Months in the snippet a little further below, come from the Joda-Time Java date and time API.

Even if the name of the method does not reflect it, it is very strongly bound to the concept of days.

As I was looking for a way to use different time periods (months, weeks, hours, for example), I saw the oh-so-nasty switch/case statement slowly sneaking its way into the code.
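
To make the danger concrete, here is a rough sketch of where that was heading; this switch is hypothetical, reconstructed from the day-bound snippet above rather than taken from our actual codebase:

// Hypothetical switch on the requested time period. Every new period means
// editing this block (and every other period-dependent block in the code).
int numberOfPoints;
switch (timePeriod) {
    case DAY:
        numberOfPoints = Days.daysBetween(from, to).getDays() + 1;
        break;
    case WEEK:
        numberOfPoints = Weeks.weeksBetween(from, to).getWeeks() + 1;
        break;
    case MONTH:
        numberOfPoints = Months.monthsBetween(from, to).getMonths() + 1;
        break;
    default:
        throw new IllegalArgumentException("Unsupported time period: " + timePeriod);
}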


You have to understand that the notion switch/case = evil was drilled into my mind when I was attending college and in two internships that I had, so I tend to try to avoid those at any cost, mainly because they often violate the open-closed principle. I strongly believe that this principle is one of the most important best practices for writing object-oriented code. And I am not the only one. Robert C. Martin once said:

In many ways [the open-closed principle] is at the heart of object oriented design. Conformance to this principle is what yields the greatest benefits claimed for object oriented technology; i.e. reusability and maintainability.1


I told myself: “We just started using Java 8. Maybe I can find a way to use some awesome new feature to avoid that switch/case death trap.” Using the new functions in Java 8 (well, not so new, but you know what I mean), I decided to add a little meat to an enum representing the different time periods available.

public enum TimePeriod
{
    MINUTE(Dimension.MINUTE,
           (from, to) -> Minutes.minutesBetween(from, to).getMinutes() + 1,
           Minutes::minutes,
           from -> from.withZone(DateTimeZone.UTC)
                       .withSecondOfMinute(0)
                       .withMillisOfSecond(0)),
    HOUR(Dimension.HOUR,
         (from, to) -> Hours.hoursBetween(from, to).getHours() + 1,
         Hours::hours,
         from -> from.withZone(DateTimeZone.UTC)
                     .withMinuteOfHour(0)
                     .withSecondOfMinute(0)
                     .withMillisOfSecond(0)),
    DAY(Dimension.DAY,
        (from, to) -> Days.daysBetween(from, to).getDays() + 1,
        Days::days,
        from -> from.withZone(DateTimeZone.UTC).withTimeAtStartOfDay()),
    WEEK(Dimension.WEEK,
         (from, to) -> Weeks.weeksBetween(from, to).getWeeks() + 1,
         Weeks::weeks,
         from -> from.withZone(DateTimeZone.UTC).withDayOfWeek(1).withTimeAtStartOfDay()),
    MONTH(Dimension.MONTH,
          (from, to) -> Months.monthsBetween(from, to).getMonths() + 1,
          Months::months,
          from -> from.withZone(DateTimeZone.UTC).withDayOfMonth(1).withTimeAtStartOfDay());

    private Dimension<Timestamp> dimension;
    private BiFunction<DateTime, DateTime, Integer> getNumberOfPoints;
    private Function<Integer, ReadablePeriod> getPeriodFromNbOfInterval;
    private Function<DateTime, DateTime> getStartOfInterval;

    private TimePeriod(Dimension<Timestamp> dimension,
                       BiFunction<DateTime, DateTime, Integer> getNumberOfPoints,
                       Function<Integer, ReadablePeriod> getPeriodFromNbOfInterval,
                       Function<DateTime, DateTime> getStartOfInterval)
    {
        this.dimension = dimension;
        this.getNumberOfPoints = getNumberOfPoints;
        this.getPeriodFromNbOfInterval = getPeriodFromNbOfInterval;
        this.getStartOfInterval = getStartOfInterval;
    }

    public Dimension<Timestamp> getDimension()
    {
        return dimension;
    }

    public int getNumberOfPoints(DateTime from, DateTime to)
    {
        return getNumberOfPoints.apply(from, to);
    }

    public ReadablePeriod getPeriodFromNbOfInterval(int nbOfInterval)
    {
        return getPeriodFromNbOfInterval.apply(nbOfInterval);
    }

    public DateTime getStartOfInterval(DateTime from)
    {
        return getStartOfInterval.apply(from);
    }
}

Using this enum, I was able to easily change the code to allow the user to specify the time periods for the graph data points.

This:

for (int i = 0; i <= Days.daysBetween(from, to).getDays(); i++)

became this (note that the timePeriod is passed to the method after being specified by the user in a query param):

for (int i = 0; i < timePeriod.getNumberOfPoints(from, to); i++)

The code behind the getGraphDataPoints call of the Usage Analytics service is now completely independent of and unaware of the time period. And as a bonus, it respects that open/closed principle I was talking about earlier.
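
To give a concrete idea of how the enum’s pieces fit together, here is a minimal sketch of what the refactored helper could look like; I’m reconstructing it from the original snippet, so treat the shape of the surrounding code as an assumption rather than the actual production version:

private static List<DataPoint> createListWithZerosForTimeInterval(DateTime from,
                                                                  DateTime to,
                                                                  TimePeriod timePeriod,
                                                                  ImmutableSet<Metric<? extends Number>> metrics)
{
    List<DataPoint> points = new ArrayList<>();
    // Align the first point on the boundary of the requested period (start of day, of week, etc.).
    DateTime start = timePeriod.getStartOfInterval(from);
    for (int i = 0; i < timePeriod.getNumberOfPoints(from, to); i++) {
        // Offset each point from the aligned start by i periods, whatever the period is.
        // withDayOfYear keeps its original (now slightly misleading) name from the snippet above.
        points.add(new DataPoint().withDatas(createDatasWithZeroValues(metrics))
                                  .withDayOfYear(start.plus(timePeriod.getPeriodFromNbOfInterval(i))));
    }
    return points;
}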

Making an online community


image

As the UX design department grows at Coveo we start to get enough manpower to actually create stuff before it’s implemented (yep…). One of these concepts is for an online community.

The idea for this project came from an internal initiative to unify the search on our various websites. You’re probably familiar with how great Coveo is at this kind of problem; basically, it’s what we do. I was tasked with creating mockups for this initiative. As I was working on it, another idea came up. We have Coveo staff communicating directly with developers. We have online resources (a lot) that are frequently updated. Maybe we could bring all of this together. We talked about it in our meetings with the design team, and soon it started consuming me. What if we could go further than only unifying the search? What if we could create the sense of an online community? Not only would people easily get access to the help they need, but they could also contribute and bring their own insights to Coveo’s products. Each user would have a real sense of doing something great. My journey began.

image

image

Making an online community is not something you do in 24 hours. It’s something that has to be well thought out if you want it to succeed. Those familiar with UX design know that we try to start projects with research to back up our design decisions. This part is especially important considering how big this community project can be. We all have a vague idea of how beneficial an online community can be to a business. Often, vague won’t be good enough for this kind of investment, and so my first quest appeared.

Benefits 0/1

Find tangible information on the benefits of an online community

Online Community 0/1

Build an online community

Luckily, this quest wasn’t too hard. A lot of research has been done on online communities. Ridings and Gefen (2004) suggest that a first category of benefits simply comes from forming a social group. Easy enough: when you form a social group you can:

  • Exchange information
  • Give and receive emotional support
  • Develop friendships
  • Have fun

It makes sense; those of you who have friends will agree.

The second category is related to access to online information. An online community will often have 24/7 access and operation, asynchronous interactions, global geographical reach and, hopefully, some kind of history of what happened in the community. Well, all of our assumptions are confirmed, and the idea of an online community looks great for a business like Coveo. But wait, there’s more. Hagel and Armstrong (1997) also found that online communities are believed to promote customer loyalty. Of course! Customers get more personally attached to a product because the product not only represents a solution but also a network of online relationships. I’m a network ambassador for Frank & Oak (great clothing if you’re looking for clothes - here’s a $25 coupon as well). Contrary to what one might believe, the community there doesn’t only talk about fashion, but also about lifestyle. We can exchange ideas about music as well as games, or just do a meetup in a city somewhere. We also actively participate with Frank & Oak staff in some design decisions by providing feedback. This attachment makes all of us even more loyal to the brand, but it also makes us feel special for being a part of it. It feels like the right emotion to have towards a community: the one you get when you are part of something greater than yourself. Now I’m overstepping a bit into the next part, which is how to ensure the success of an online community.

image

Benefits 1/1

Find tangible information on the benefits of an online community

Online Community 0/1

Build an online community

So we’ve identified the benefits. You can pat yourself on the back. The second big step is to make sure we actually succeed in creating an online community, since only then will we get all these benefits. But wait a second, how do you measure success in an online community? Surely it must be different from any other web application.

Benefits 1/1

Find tangible information on the benefits of an online community

Success 0/1

Identify the criteria of success

Online Community 0/1

Build an online community

I’m going to take this quote from Iriberri & Leroy (2009):

Researchers who focus on measuring success [in an online community] agree that, the larger the volume of messages posted and the closer members feel to each other, the more successful the online community becomes.

Basically, the more an online community feels like a real one, the more successful it will be (take this dumbed-down summary with a grain of salt; of course there are many other considerations when you do stuff online). One suggested approach is to build the community in stages. These stages must always take into consideration the needs of the members and the state of the community as a whole. For instance, losing a certain privilege or functionality at a given stage could have a very negative impact on the user experience. The first stage is the inception stage.

Benefits 1/1

Find tangible information on the benefits of an online community

Success 1/1

Identify the criteria of success

Online Community 0/1

Build an online community

image

image

This stage is composed of 5 key elements. I’m trying to keep this post short so I will go over these elements quickly.

Leitmotiv

A clear purpose will generally serve you well when building an online community. It can be explicitly written in the interface (as a tagline) and will inform potential members of what your community is about. A good example of this (taken from Iriberri & Leroy 2009) is the tagline of slashdot.org: News for Nerds. Stuff that matters.

This strong tagline instantly defines the community and its purpose.

Focus

A focus on the users and their characteristics is also essential. The content needs to be formatted for your particular user. You can take into consideration his interests, age, gender, and even ethnicity if it matters.

Rules

It is generally okay for a community to have explicitly written rules on how to behave and other aspects of the community (such as legal ones, like being of a certain age).

Branding

Like I said earlier, a great tagline will help in this regard. You can also use design patterns that you know are appealing to your audience. For instance, with the Frank & Oak network, we can see a visual style that is similar on their network site and on their e-commerce site.

Feedback & Continuity

image

Finally, a community also needs to keep building on its foundation. Malone & Crumlish (2009), who talk about social interfaces, suggest not completely finishing the design: users should be able to “finish” some parts of it as they use the interface. There is also a need for consistency between each version of the design, at least for some features. Not to repeat myself, but it is important to note that if a factor is present in one stage, it should continue to exist in the other stages.

Stay tuned, next week I will update this with a visual design of how this can be brought to life.

image

With all that said, now we can talk about visual design. I will spare some of the design decisions that were made (in order to keep it short) but the overall idea is to keep the Coveo brand while providing delightful details that will appeal to developers (which is the main focus of our community).

image

As we can see, the landing page presents the tagline (as said before, it is essential) with a clear sentence that further explains the purpose of the community. The logo took some care and we hid some details in order to make the interface more delightful (as you discover them). I see these as some kind of micro interactions that greatly add to the playful idea of the interface. It must feel great to be in there (bet you didn’t see the heart before).

imageimage

When a user logs in, he is brought to his profile page, which is unsurprisingly called “Home”. There, he has access to updates from various Coveo sites, as well as other features (such as social media, etc.).

image

Finally, when a user goes to another user’s profile, he can see that user’s achievements, tweets and other information, such as his fellowships: the same idea as a guild, except it would be automatically generated from the user’s activity on different sites. Users would then know more about each other and could share knowledge.

image

TO BE CONTINUED

It doesn’t stop there, there are many more explanations I could give on smaller features (private messaging, private information on profiles, etc.) but this post ends here.

SOURCE

CRUMLISH, C. & MALONE, E. (2009). Designing social interfaces: Principles, patterns, and practices for improving the user experience, O’Reilly Media, Inc., 518 p.

HAGEL, J. I. & ARMSTRONG, A. G. (1997). Net Gain: Expanding Markets through Virtual Communities., Harvard Business School Press, Boston, MA, 235 p.

IRIBERRI, A. & LEROY, G. (2009). « A life-cycle perspective on online community success. », ACM, Computing Surveys (CSUR), 41(2), p. 11-22.

RIDINGS, M. & GEFEN, D. (2004). « Virtual community attraction: Why people hang out online. », Journal of Computer‐Mediated Communication 10, 10 (1), p. 00-00.


Creating custom model in Coveo for Sitecore


Coveo for Sitecore offers various extension points to easily customize almost every part of it: indexing, queries, interfaces, etc. In the developer documentation, we often give examples of custom pipelines or JavaScript snippets, but what happens if you want to do more than that?

For instance, you may want to use a Coveo Search Component and add even more parameters that will be available from the Page Editor (or in Presentation/Details in the Content Editor). In this example, we’ll try to add a configurable placeholder text for the search box. We’ll use MVC components, but it also applies to Web Forms with slight modifications.

1. We must code!

Step 1. Install Visual Studio
Step 2. Code
Step 3. ????
Step 4. Profit

We’ll create a new Class Library project in Visual Studio that we’ll call CustomCoveoSearch. Let’s add references to Coveo.UI.dll and Sitecore.Mvc.dll that should be in the bin folder of your Sitecore instance.

Then, let’s add a new class in our project named CustomCoveoModel. The code should simply look like this:

using Coveo.UI.Mvc.Models;

namespace CustomCoveoSearch
{
    public class CustomCoveoModel : SearchModel
    {
        public string PlaceholderText
        {
            get { return ParametersHelper.GetStringParam("PlaceholderText"); }
        }
    }
}

The code is fairly simple. We need a public property to make our new parameter usable in the .cshtml layout. We use the ParametersHelper from Coveo’s base SearchModel to get the proper value from our Sitecore item.

Build the project and add CustomCoveoSearch.dll to the bin folder of your Sitecore instance.

2. We must create Sitecore items!

We then need to create a few Sitecore items to make it work. First, if you open your Sitecore Content Editor and navigate to the Templates section, you should see a CoveoModule/Search folder. Let’s create a new CoveoCustom folder and duplicate the Coveo Search Parameters item into it. We’ll name it Custom Coveo Search Parameters. Working with custom items ensures that our work will not get overridden if we upgrade to a new version of Coveo for Sitecore.

image

In the Advanced field section of this item, let’s add a property named PlaceholderText with the Single-Line Text type. We’ll set the title of this field to ‘Placeholder text for search box’.

image

Then, if we navigate to Layouts/Models, we’ll also create a CoveoCustom folder, in which we’ll create a new Model named Custom Search Model. The model type should point to the type we created in step 1, which is CustomCoveoSearch.CustomCoveoModel, CustomCoveoSearch.

image

Finally, we’ll navigate to Layouts/Renderings, create a CoveoCustom folder there as well, and duplicate the Coveo Search View item into a new Custom Coveo Search View. We’ll need to adjust the Model, Parameters Template and Datasource Template fields to point to our custom model and our custom parameters item, respectively. Let’s also change the path to ‘/Views/CoveoCustom/CustomSearchView.cshtml’; we’ll set that up in step 3.

imageimage

Make sure to save and publish all these new items.

3. We must create a proper cshtml!

In the Website folder of your Sitecore instance, you should have a SearchView.cshtml file located under ‘/Views/Coveo’. Duplicate it into the path you just set at the end of step 2, probably ‘/Views/CoveoCustom/CustomSearchView.cshtml’. Copy the Web.config file from the first folder to your new folder (the author of these lines actually forgot this step and lost some time troubleshooting why the example was not working…). Open the ‘CustomSearchView.cshtml’ file with your favorite text editor.

Look for the following tag at the end of the file:

<script type="text/javascript">Coveo.$(function(){Coveo.$('#search').coveoForSitecore('init',CoveoForSitecore.componentsOptions);});</script>

And replace it with:

<script type="text/javascript">Coveo.$(function(){Coveo.$('#search').coveoForSitecore('init',CoveoForSitecore.componentsOptions);Coveo.$('#search').find("input.CoveoQueryBox").attr("placeholder",'@Model.PlaceholderText');});</script>

This makes the link between our new model and the JS UI search box. Save and close the file.

4. We must test it out!

Let’s create a new Coveo Search Page (MVC) under /sitecore/content/home. Go into Presentation/Details and replace the Coveo Search View with our new Custom Coveo Search View. (Obviously, you could also try it on any item you want; just make sure you have a Coveo Search View Resources rendering in your layout as well.) Also make sure that the placeholder is coveo-search-mvc.

image

Save and close the Presentation/Details window. Let’s now open our item in the Page Editor! In the properties of the view, we should see our placeholder text property:

image

What’s magical is that it actually works! If we enter something as a placeholder and save the item, it displays our new value:

image

5. We must conclude!

Lots of steps for a small result, you may say! But like I said earlier, all those duplicates are really important, because your work will not get affected by upgrades of Coveo for Sitecore. These kinds of customizations can give a nice bonus to your end users :)

Thanks for reading!

image

Searching Jekyll


This blog is powered by Jekyll, a static site generator. There is no database, nothing to search against, the site is simply composed of a few HTML pages referencing each other. Being search experts, we can leverage Coveo to add powerful search to Jekyll.

Indexing the content

Since we are dealing with a structured site containing static HTML pages, our Web Pages Connector will get the job done.

In addition to creating a new Web Pages Connector source to index the blog, we created a custom field to have faceted tags on the search result page.

“Installing” the Coveo JavaScript Search Framework in Jekyll

The JavaScript Search Framework is simply a zip file containing images, CSS and JavaScript files. For this straightforward integration, we added the following files to the Jekyll folder structure:

  • css/CoveoFullSearch.css
  • js/CoveoJsSearch.WithDependencies.min.js
  • js/d3.min.js
  • images/sprites.png
  • images/wait.gif
  • images/wait_circle_small.gif
  • images/wait_facet.gif
  • images/wait_facet_search.gif

Adding the search page

Each Jekyll page is rendered using a layout; we have two:

  • base
  • post

We created a new page at the root of the site: search.html

This page is using the base layout, just like the home page, and contains the following:

---
layout: base
---

<article class="container">
  <div id="search" class="CoveoSearchInterface" data-enable-history="true">
    <span class="CoveoAnalytics" data-search-hub="Blog" data-token="16aa3ce3-d77a-4c25-97a2-18be59347d9e" data-user="coveokmua@coveo.com" />
      <div class="coveo-tab-section">
        <a class="CoveoTab"
          data-id="Blog"
          data-caption="Blog"
          data-icon="coveo-sprites-documentType-all-content"
        data-expression='@syssource==BlogTech'></a>
      </div>
      <div class="coveo-results-section">
        <div class="coveo-facet-column">
          <div data-tab="Blog">
            <div class="CoveoFacet" data-title="Author" data-field="@sysauthor" data-show-icon="true"></div>
          </div>
          <div class="CoveoFacet" data-title="Tags" data-field="@tags" data-is-multi-value-field="true"></div>
        </div>
        <div class="coveo-results-column">
          <div class="CoveoShareQuery"></div>
          <div class="CoveoPreferencesPanel">
            <div class="CoveoResultsPreferences"></div>
            <div class="CoveoResultsFiltersPreferences"></div>
          </div>
          <div class="CoveoBreadcrumb"></div>
          <div class="coveo-results-header">
            <div class="coveo-summary-section">
              <span class="CoveoQuerySummary"></span>
              <span class="CoveoQueryDuration"></span>
            </div>
            <div class="coveo-sort-section">
              <span class="CoveoSort" data-sort-criteria="relevancy">Relevance</span>
              <span class="CoveoSort" data-sort-criteria="date descending,date ascending">Date</span>
            </div>
            <div class='coveo-clear'></div>
          </div>
          <div class="CoveoHiddenQuery"></div>
          <div class="CoveoDidYouMean"></div>
          <div class="CoveoErrorReport" data-pop-up='false'></div>
          <div class="CoveoResultList" data-wait-animation="fade">
            <script id="Default" type="text/x-underscore-template">
              <%= fromFileTypeToIcon(obj) %>
              <div class="coveo-date"><%-dateTime(raw.sysdate)%></div>
              <div class="coveo-title">
              <a class="CoveoResultLink" target="_blank"><%=title?highlight(title, titleHighlights):clickUri%></a>
              <%= loadTemplate("DefaultQuickView") %>
              </div>
              <div class="coveo-excerpt">
              <%=highlight(excerpt, excerptHighlights)%>
              </div>
              <div class="CoveoPrintableUri"></div>
              <table class="CoveoFieldTable">
              <tr data-field="@sysauthor" data-caption="Author"></tr>
              </table>
            </script>
            <script id="DefaultQuickView" type="text/x-underscore-template">
              <div class="CoveoQuickView" data-title="<%= attrEncode(fromFileTypeToIcon(obj) + title) %>" data-fixed="true" data-template-id="DefaultQuickViewContent">
              </div>
            </script>
            <script id="DefaultQuickViewContent" type="text/x-underscore-template">
              <div class="coveo-quick-view-header">
              <table class="CoveoFieldTable">
              <tr data-field="@sysdate" data-caption="Date" data-helper="dateTime"></tr>
              <tr data-field="@objecttype" data-caption="Type"></tr>
              </table>
              </div>
              <div class="CoveoQuickViewDocument"></div>
            </script>
            <script class="result-template" type="text/x-underscore-template">
              <%=
              loadTemplates({
              'Default' : 'default'
              })
              %>
            </script>
          </div>
          <div class="CoveoPager"></div>
        </div>
      </div>
      <div style="clear:both;"></div>
    </div>
  </article>

The code above results in a simple search page with two facets:

  • Author (out of the box, no custom field is needed)
  • Tags (based on the tags custom field we created earlier)

Searching from anywhere on the blog

We wanted to have a search box on every page. We proceeded to change our two layout files to include the following HTML element in the header:

<divid="searchBox"class="CoveoSearchBox"data-activate-omnibox="true"></div>

For the search box and the search page to work, the JavaScript Search Framework resources must be included in all the pages of the blog. These two lines were also added to the base and post layouts:

{% include _search.html %}
<linkrel="stylesheet"href="/css/CoveoFullSearch.css"/>

As you can see, we created the _search.html file in Jekyll’s _includes folder to centralize the JavaScript dependencies. Here is what it contains:

<script src="/js/d3.min.js"></script><script src="/js/CoveoJsSearch.WithDependencies.min.js"></script><script type="text/javascript">$(function(){Coveo.Rest.SearchEndpoint.endpoints["default"]=newCoveo.Rest.SearchEndpoint({restUri:'https://developers.coveo.com/coveorest/search',anonymous:true});if(document.location.pathname!=="/search/"){$('#searchBox').coveo('initSearchBox','/search');}elseif(document.location.pathname=="/search/"){varmySearchBox=$("#searchBox");$("#search").coveo("init",{externalComponents:[mySearchBox]})}});</script>

With the Search Endpoint configured in our _search.html file, the search page was already working. However, we had two problems:

  • the search interface was initialized twice
  • the search box was only working on the search page itself

To avoid the double-init problem, we added logic to prevent the standalone search box from initializing on the search page. If we are not on the search page, we initialize the search box directly and give it the path of our search page, i.e. /search:

if (document.location.pathname !== "/search/") {
    $('#searchBox').coveo('initSearchBox', '/search');
}

If however we are on the search page, we pass the mySearchBox variable in the externalComponents option when initializing the search page:

else if (document.location.pathname == "/search/") {
    var mySearchBox = $("#searchBox");
    $("#search").coveo("init", { externalComponents: [mySearchBox] });
}

You can learn more about the Standalone Search Box possibilities in our documentation.

And that’s how you integrate search in Jekyll using the JavaScript Search Framework.

The tale of a programming contest


It all began…

It was in September 2010 that the idea of doing a programming contest came to life. Coveo was a startup with highly passionate employees, ready to grow and bring more passionate people on board.

Did I say passionate?

Our primary source for hiring talented and passionate - I know, but it’s really important to us - software developers was (and still is) universities. We bring students on board during an internship and show them what it’s like to work with us - the challenges, the environment, the people, … - it usually works really well and we end up hiring almost 33% of the interns.

Back in 2010, Coveo was little known in Quebec City, let alone Montreal or Sherbrooke. We had to find a way to spread the word on how Coveo is such a great place to work at.

I was at the coffee machine when Jean (our CFO) came in and asked:

Jean: How’s hiring doing?

Marc: It’s been better. Our last internship posting only got 4 applicants, and we’ll be lucky to get 1 intern.

Jean: What can we do to get more applicants, to get our name out there?

Marc: *joking* We should do a programming contest for students!

Jean: What a great idea - let’s do that!

Ok, that was not totally a joke, but not totally serious either! The idea came from a programming contest held in Quebec City called “Le Bivouac Urbain”, where contestants had to create a full video game in 48 hours.

Defining the contest

The decision to do a programming contest was made - yeah, that’s really all it took to start the project! But we still needed to do a lot of things to make it a reality and the most important one was defining the challenge.

But first, let’s talk about the name, because this contest needed one. Since I’m not that good at coming up with names, I asked the people from our R&D team (27 people back then) for suggestions. Here are a few I received:

  • Programmathon Coveo

  • Coveo 8 hours of Search

  • Coveo Search Blitz

  • CoveoPalooza

  • Coveo Challenge

No surprise here, we decided on Coveo Blitz.

Now, back to the challenge itself. At first, we floated the idea of a 24-hour contest where students had to create the best web site they could based on the criteria we provided. We brainstormed around that idea and finally changed it completely, tying it a lot more to our business.

We wanted people to know that creating a search engine that works is challenging. Here’s what was going to be the first project for Coveo Blitz:

  • Create a search engine to index 45,000 documents (48,223 documents to be exact)

  • Support boolean operators and exact phrases

  • Rank documents by relevance

  • Create a web page to display the results

The final score would be based on several criteria: performance, code quality, quality & design of the UI, and of course, results precision. The correction would be manual, meaning that Coveo’s staff would evaluate the different aspects of the project.

Preparing the day

Our objective was to get 5 or 6 teams of 4 students. Based on the limited success we’d had attracting students, achieving that goal would definitely help spread the word about Coveo. We reached out to the different universities & colleges to promote the contest. We received more than 12 registrations in a few weeks! We were really excited, but at the same time, …

That was a problem, since we initially wanted to host the contest in our office, but we didn’t have a room large enough for all these people plus staff! We fixed this through a fantastic collaboration with Laval University, which offered to rent us a classroom, so we decided to hold the contest there. We also limited the contest to 8 teams and asked each team to provide a small demo project so we could rank them. I evaluated all those projects and selected the 8 teams.

One of the teams presented an algorithm to compute stuff I can’t remember, but I sure remember that they put a nice image while the computation was running.

Image presented while the algorithm was computing

We wanted the contest to be one of the best experiences the students would have in such an event, so that they would have a highly positive opinion of Coveo and its people. The first thing we did was take care of feeding the participants, partly dressing them, and teasing them with the prize for the winning team. Here’s a list of things we provided:

  • Free drinks (juice, water, soda and of course coffee and Red Bull)

  • Free food (breakfast, lunch, dinner, snacks)

  • Free beer - yeah, for real!

  • 4 Alienware laptops for the winners

  • T-Shirts with nice quotes - that’s also kind of a publicity for us afterwards as shown in this tweet

One participant tweeting about his Coveo Blitz T-Shirt

Coveo Blitz 2011 T-Shirt (Black for students we hired, blue for staff & white for students)

Since the contest was based on performance, we wanted to make sure that everyone had the same chance at winning. We decided to provide laptops for all contestants - that meant 32 laptops to buy, configure, move, install, move back and sell! I know, I know! At the time, it seemed like a good idea, but in hindsight it was a bad one, and it definitely belongs on the don’ts list!

We spent a number of hours preparing the room the day before the contest and cleaning it up afterwards. We brought a router and quite a number of network cables & electrical extensions.

A lot of people at Coveo got excited by Coveo Blitz and helped make it a reality. I can’t stress enough that without their help, Coveo Blitz would not be possible. And that is true, year after year, so THANK YOU!

The router and cables

The room and participants

Lessons learned

We learned several things by doing all these editions of Coveo Blitz. Here are the most important ones, at least, for me!

You are never too prepared

We are now at our fifth edition of Coveo Blitz, and I can tell you that after every edition, we told ourselves that the following year we’d be more prepared. We start working on the next edition several months in advance, often 8-10 months before, but you need more than enough time to succeed, because as Parkinson’s law says: work expands so as to fill the time available for its completion.

I suggest treating the preparation of a contest like any other project and setting milestones. Make sure you respect them; otherwise, you may end up like this on the day of the contest!

Coveo’s version of Grumpy Cat

Favor automatic correction over manual

We planned about 60 minutes to correct the team solutions in the first edition, and it took more than 3 hours! Also, automatic correction opens up a lot of opportunities - see below.

Multiple evaluation runs with dynamic scoring/dashboard

If you ask anyone who participated in Coveo Blitz from 2013 onwards what the highlights were, I bet they will tell you “dashboard & scoring”.

Being able to do multiple evaluation runs, with a live dashboard displaying the score in real time as the run executes, is just awesome! It’s like watching a sports game; everyone is standing and cheering!

An evaluation run at Coveo Blitz 2013

Host the event in your office

If possible, hosting the event in your office is a plus. At least, it was for us.

  • It shows the candidates your work environment and gives them a feel of what it might be like to work there - although we’re not always working on 8-hour deadlines!

  • It is easier to setup since all the infrastructure is there - network, desk, chairs, …

  • You can prepare in advance since it’s in your office

Have people bring their own hardware

I said it earlier: do not supply the computers. Have people bring their own laptops, desktops or mainframes, whatever they want. When you do the evaluation, use a method that does not depend on their hardware to compute the score. For example, we now use Amazon Web Services to run the contest and do evaluation runs, which we can control somewhat (by machine type, for example).

Innovate

We tried to innovate at every edition of Coveo Blitz by modifying the challenges but also by modifying the infrastructure - dashboards or achievements, for example.

We also have a few surprises for the 2015 edition, and it started when we launched the web site: we added a small challenge just to register for the contest. Students had to create a small REST API that replied with a proper answer along with information about the people on their team. We had fun creating the error messages for wrong responses; had you noticed them? :)

One of the teams built a server on a Nintendo DS - How cool is that?!

Free food & beer

I think that giving people free food & drink during the contest is a plus. We also have free beer to celebrate at the end of the day, and a tower of pizzas. Everyone appreciates this and I bet the Coveo team appreciates the beer even more than the students: celebrating the success of the contest after having worked on this single day for the last several months is a must!

This is also the moment where everyone can talk together and where we make contact with the students. We can talk about what the work is like at Coveo and also talk about current opportunities. Remember that the #1 reason for the contest is recruiting.

The pizzas at Coveo Blitz 2013!

Fun facts of the 2012 edition

  • 48 participants

  • +30 Coveo employees worked on this edition

  • +600 hours invested by Coveo employees

  • 166 tweets related to this edition

  • 64 t-shirts

  • 35 tables & + 60 chairs

  • 657 meters of network cable

  • 171 meters of electric extension

  • 16 network switches

  • 15 power bar & 3 UPS

  • 1 wifi network

  • 60 laptops & 2 desktop

  • 1 daskeyboard ultimate

  • 1 whiteboard

  • 4,900 lines of code for the Coveo platform

  • A 7GB database containing 10,715 products

  • Around 200 evaluation runs

  • 10 x 12 donuts

  • 6 x 12 muffins

  • 50 litres of coffee

  • 60 litres of juice, water & soda + 7 litres of Red Bull

  • 100+ sandwiches

  • 200 slices of pizza

  • 407 pictures

  • 2 hours of video

In conclusion

The first edition was a success and we got a few people hired out of that edition. We also got the word out, if not about how cool Coveo was, at least how free the beer was at the contest! :)

We have hired more than 35 people in R&D since the first edition of Coveo Blitz. Out of those 35, 13 participated in Coveo Blitz and 32 learned about Coveo from Coveo Blitz. Another impact is the number of interns we get in R&D every year: we welcomed 19 interns in the summer of 2013 and 15 in the summer of 2014. All of them participated in, or heard from friends about, Coveo Blitz.

If you know me, you know I really like pie charts, so let’s try to represent these numbers in a nice pie chart!

When I see that most people I interview know about Coveo Blitz, or probably heard about Coveo through Blitz, I tell myself “Mission accomplished!”, and I’m always happy to see Coveo Blitz mentioned in the contest section of someone’s resume.

In just a few weeks, it’s going to be the fifth edition of Coveo Blitz, and the Coveo team can’t wait. Several people have been working on the project since last March - yes, it takes time to build the challenges, test them and create all those nice dashboards!

I hope to see you then!

Logos from all Coveo Blitz editions to date

Distributed resource locking using memcached


Small update: having noticed comments about this post on Twitter, I think it’s important to specify that the word “lock” may have been a poor choice here. This is more of a best-effort mechanism to reduce the frequency of situations where certain operations need to be retried. It’s perfectly fine if, for any reason, those operations still occur concurrently (indeed, before this there was no locking whatsoever); the only adverse effect would be slightly decreased performance for the querying user. So, kids, do not use this approach if you need real resource exclusion.

As Coveo’s cloud usage analytics product matures, more and more events are logged every second, and a lot more people are using the analytics dashboards from the cloud admin. That increased load is great, and the usage analytics service handles it easily, but there was one thing we did not see coming: transaction cycles in the database. They did not happen often, but they were still problematic, since a continuous increase in the load on the service only meant more transaction cycles. These cycles were the result of scheduled jobs running while insertions of new events and reporting queries occurred at the same time.

Amazon Redshift already does a pretty good job of handling concurrent operations on the cluster, but we needed a little more, so we decided to go with resource locking (where the resources are the different tables in the database). There are a lot of ways to implement such a solution, but we also had some constraints:

  • Our service is running on multiple instances, so we needed a distributed locking mechanism.
  • The resource locking had to be transparent to read operations (i.e. no impact on reporting queries).
  • Deadlocks were not acceptable.
  • Locks had to have a short lifespan. We did not want to lock insertions and updates for an extended period of time.

With these in mind, we quickly discarded some solutions, like the LOCK feature offered by Amazon Redshift, because it impacted all queries, from the simple select 1; to the complicated triple full outer join of doom, not to mention every insert and update.

After consideration, we were left with two possible solutions:

  • Sharing locks using a cache service (in our case, a Memcached cluster provided by Amazon ElastiCache)
  • Sharing locks using a table in the database

We finally decided to go with the cache service, mainly because its timeout capabilities would allow us to easily circumvent the deadlock issue, it offered better performance, and it was much simpler to implement than the database option.

Here is what it looks like:

public class MemcachedResourceLocker implements ResourceLocker {
    private CacheService resourceLockCache;
    private String[] resourceId;
    private String lockerId;
    private Duration lockDuration;

    public MemcachedResourceLocker(CacheService resourceLockCache, String[] resourceId, Duration lockDuration) {
        this.resourceLockCache = resourceLockCache;
        this.resourceId = resourceId;
        this.lockDuration = lockDuration;

        // Build a locker id that identifies the host and thread holding the lock.
        try {
            this.lockerId = InetAddress.getLocalHost().getHostName() + "-" + Thread.currentThread().getName() + "-" + DateTime.now().toString();
        } catch (UnknownHostException e) {
            this.lockerId = "unknown_service_host" + "-" + Thread.currentThread().getName() + "-" + DateTime.now().toString();
        }
    }

    @Override
    public void lock() throws CouldNotAcquireLockException {
        // add() is atomic and only succeeds when the key is absent; the expiration
        // guarantees the entry eventually vanishes even if unlock() is never called.
        if (!resourceLockCache.add(RESOURCE_LOCK_CACHE_PREFIX, resourceId, lockerId, lockDuration.toStandardSeconds().getSeconds())) {
            logger.debug("Could not acquire lock on resource '{}'. Someone else has it.", Arrays.toString(resourceId));
            throw new CouldNotAcquireLockException(Arrays.toString(resourceId));
        }
    }

    @Override
    public void unlock() {
        // Delete the entry only if we still own it; the lock may have expired
        // and been re-acquired by another locker in the meantime.
        String currentLockerId = resourceLockCache.get(RESOURCE_LOCK_CACHE_PREFIX, resourceId, String.class);
        if (currentLockerId != null && currentLockerId.equals(lockerId)) {
            resourceLockCache.delete(RESOURCE_LOCK_CACHE_PREFIX, resourceId);
        }
    }
}

Note: The ResourceLocker interface was not included, to keep the code to a minimum. It is a simple interface that includes the lock() and unlock() methods.
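
For reference, here is a minimal sketch of what that interface could look like; since its declaration is not shown in this post, the exception type is inferred from the implementation above:

public interface ResourceLocker {
    // Attempts to acquire the lock once; throws if another locker already holds it.
    void lock() throws CouldNotAcquireLockException;

    // Releases the lock, but only if this locker still owns it.
    void unlock();
}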

public class RetryableLockResource extends AbstractRetryableTask {
    // Default retry settings for the static helper below; the original snippet
    // called a one-argument constructor that was not shown, so these are assumed.
    private static final Duration DEFAULT_DELAY = Duration.standardSeconds(1);
    private static final int DEFAULT_MAX_RETRY = 5;

    private ResourceLocker resourceLocker;

    public RetryableLockResource(Duration delay, int maxRetry, ResourceLocker resourceLocker) {
        super(delay, maxRetry);
        this.resourceLocker = resourceLocker;
    }

    @Override
    protected void call() throws RetryableTaskException {
        try {
            resourceLocker.lock();
        } catch (CouldNotAcquireLockException e) {
            // Wrapping the exception tells the base class this attempt failed and may be retried.
            throw new RetryableTaskException(e);
        }
    }

    public static void tryLock(ResourceLocker resourceLocker) throws RetryableTaskFailedException {
        new RetryableLockResource(DEFAULT_DELAY, DEFAULT_MAX_RETRY, resourceLocker).execute();
    }
}

public abstract class AbstractRetryableTask {
    private Duration delay;
    private int maxRetry;

    public AbstractRetryableTask(Duration delay, int maxRetry) {
        this.delay = delay;
        this.maxRetry = maxRetry;
    }

    protected abstract void call() throws RetryableTaskException;

    protected boolean handle(RetryableTaskException e) {
        return false;
    }

    public void execute() throws RetryableTaskFailedException {
        boolean exit = false;
        for (int iteration = 1; !exit; iteration++) {
            try {
                call();
                exit = true;
            } catch (RetryableTaskException e) {
                if (iteration >= maxRetry) {
                    throw new RetryableTaskFailedException(e);
                } else {
                    exit = handle(e);
                    if (!exit) {
                        ThreadUtils.sleepNoThrow(delay);
                    }
                }
            }
        }
    }
}

Here is an example of how we use it in an actual “real code” situation, in a scheduled job (scheduling is done with Quartz):

public void execute(JobExecutionContext context) throws JobExecutionException {
    String account = context.getJobDetail().getJobDataMap().getString(SchedulerWrapper.ACCOUNT_NAME_PARAM);
    ResourceLocker allEventsLocker = null;
    try {
        allEventsLocker = resourceLockerFactory.createLockerForAllEvents(account);
        RetryableLockResource.tryLock(allEventsLocker);
        dal.updateAllEvents(account);
    } catch (RuntimeException | RetryableTaskFailedException | FailedToRetrieveDimensionException e) {
        throw new JobExecutionException(e);
    } finally {
        RetryableUnlockResource.tryUnlock(allEventsLocker);
    }
}
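
RetryableUnlockResource is not shown in this post either; here is a plausible minimal sketch, assuming it simply mirrors RetryableLockResource (the default delay and retry count are assumptions). The null check matters because the locker may never have been created if an earlier step failed:

public class RetryableUnlockResource extends AbstractRetryableTask {
    // Assumed defaults, mirroring RetryableLockResource; not shown in the original post.
    private static final Duration DEFAULT_DELAY = Duration.standardSeconds(1);
    private static final int DEFAULT_MAX_RETRY = 5;

    private ResourceLocker resourceLocker;

    public RetryableUnlockResource(Duration delay, int maxRetry, ResourceLocker resourceLocker) {
        super(delay, maxRetry);
        this.resourceLocker = resourceLocker;
    }

    @Override
    protected void call() throws RetryableTaskException {
        resourceLocker.unlock();
    }

    public static void tryUnlock(ResourceLocker resourceLocker) {
        if (resourceLocker == null) {
            return; // the lock was never acquired
        }
        try {
            new RetryableUnlockResource(DEFAULT_DELAY, DEFAULT_MAX_RETRY, resourceLocker).execute();
        } catch (RetryableTaskFailedException e) {
            // Worst case, the cache entry expires on its own after lockDuration.
        }
    }
}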

This solution works great. Since we implemented our distributed resource locking using Memcached, we have increased the number of scheduled jobs and the load on the service has more than doubled, yet we have not seen a single transaction cycle or any performance degradation, which is nice.

Distributed resource locking using memcached - part 2


Following some strong reactions to my last post (caused, I believe, by my poor choice of the word “lock”), I decided to write a little follow-up post to remedy the situation. I will explain why what we wanted was to avoid duplicate work, not prevent it at all cost.¹

Let’s start by taking a look at the definition of a lock. Wikipedia says:

In computer science, a lock is a synchronization mechanism for enforcing limits on access to a resource in an environment where there are many threads of execution. A lock is designed to enforce a mutual exclusion concurrency control policy.

Locking resources using memcached respects the first part of that definition: it does enforce limits on access to a resource. The issue is that it does not respect the second part (which is kind of the important part). Mutual exclusion cannot be achieved using memcached because the information about the lock is not persistent, there can be synchronization problems when memcached is running on a cluster composed of multiple instances, lock state can be lost during a network partition, and so on. What would happen if, for instance, the memcached server crashed, or an entity held the lock for a long time and the cache entry expired? It would be as if the resource had never been locked, and concurrent operations on a critical section could (and, according to Murphy’s law, would) happen. That kind of locking is called ghetto locking by the fine people of Memcached. We could also call it probabilistic locking or resource access control.
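
To make the caveat concrete, here is roughly what the pattern looks like when talking to memcached directly. This is a sketch using the spymemcached client; the key name, TTL and owner value are illustrative, and it exhibits exactly the best-effort behaviour described above, not mutual exclusion:

import java.net.InetSocketAddress;
import net.spy.memcached.MemcachedClient;

public class GhettoLockExample {
    public static void main(String[] args) throws Exception {
        MemcachedClient client = new MemcachedClient(new InetSocketAddress("localhost", 11211));

        // add() succeeds only if the key is absent, so at most one caller "wins"...
        // unless the entry expires or the node restarts, in which case a second
        // caller can win too. Best effort, not a guarantee.
        boolean acquired = client.add("lock:all-events", 30 /* TTL in seconds */, "owner-1").get();
        if (acquired) {
            try {
                // do the work we would rather not run twice concurrently
            } finally {
                client.delete("lock:all-events");
            }
        }
        client.shutdown();
    }
}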

This brings me to the second confusing point of my previous post: the use case I described was, well, not so well described. I will try to explain it better here. We use Amazon Redshift for our analytics application, into which we insert a lot of data that is rarely updated (some flag values are updated based on what is added afterwards). This data is exposed via an API and is then used to generate highly customizable reports that help our customers improve their search experience. We ran into a problem called transaction cycles, which are caused by concurrent transactions touching multiple tables in a database: for example, a scheduled task updating table A based on data contained in table B, a process inserting events into table B, and an API query joining tables A and B. A transaction cycle would cause the queries to fail, which is quite problematic. Thanks to our fault-tolerance mechanisms, the operations would eventually succeed: the scheduled task would be rescheduled later, and the insertion would be retried immediately after a failure was detected. Redshift already does an amazing job of managing concurrent write operations on its cluster, but with perfect timing these transaction cycles could still happen. That is why we needed an external mechanism to avoid falling into these annoying traps.

We decided to go with ghetto locking because it met the needs I described in my original post (distributed, transparent to read operations, no deadlocks and short-lived locks). Also, we did not want to add another building block to our stack (which would have been required for persistent locks in a database, or for a distributed synchronization system such as ZooKeeper), as it necessarily introduces more risk. Persistent locks in a database would have required an additional component, because the database we use has strong limitations when it comes to locks; as a matter of fact, the only lock supported is an access exclusive lock on a table. Also, from what we have experienced with Redshift, numerous small queries on the database (which would have been required to check lock states) had a negative impact on the overall performance of the cluster, making a lock table undesirable.

As I said in the introduction, what we truly wanted to achieve was to avoid duplicate work, not prevent it at all cost. With this solution, we achieved exactly that: we went from 2 or 3 transaction cycles per week to none since the change reached the production environment.

  1. Special thanks to Mr. Sean Cribbs for this way of putting it
