PIUG Annual Conference 2017

Our abstract proposal, “Single Pass Numerical Matching: A linguistic solution to numerical searching”, has been accepted by PIUG, so we will be off to Atlanta, Georgia in May. You can find the published abstract, together with the other speakers named so far, by clicking the PIUG link above. You can see the full proposal by clicking on the title above.

This arose from meeting David Goodchild while we were in Madrid. David's company, David Goodchild Ltd, specialises in patent information for the metals industries, and he had a particular problem searching for the ranges of the elements used in alloys that are specified in patent claims. We worked together to produce a solution and decided this was of general interest to the patent search community, so we submitted a co-authored proposal.

One very important feature not explicitly in the abstract is the speed of response. The full report, 3.5 MB in size, was produced in 10 seconds by processing 8652 claims containing one or more of the search specification elements, drawn from 1400 patents held in a single 7.5 MB file. No index is used, so the coverage is exhaustive, with every claim in each patent being read in a linear fashion, as we do as humans.

The 1400 patents in the input data were collected by David Goodchild using Questel Orbit, with regular keyword/boolean queries and an output specification, and saved in UTF-8.

The input query is a simple UTF-8 plain text file. It contains this:


No further query construction is needed.

The process is linear, so the time taken depends on three factors:

  1. The complexity of the specification
  2. The number and length of the claims in each patent
  3. The quantity of matching to record and display.
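As a rough illustration only, and not the program's own code, the index-free linear pass described above could be sketched as follows. The element list, the sample claims and the function name `scan_claims` are all assumptions made for the example; the real query file format is not reproduced here.

```python
import re

# Hypothetical specification: element symbols of interest.
# This is an assumed stand-in, not the real query file.
SPEC_ELEMENTS = ["Cr", "Ni", "Mo"]

def scan_claims(claims, elements):
    """Read every claim once, in order, and record which elements it mentions."""
    # One compiled pattern per element: a longer specification means more
    # patterns to try against each claim.
    patterns = {el: re.compile(r"\b" + re.escape(el) + r"\b") for el in elements}
    hits = []
    for number, text in enumerate(claims, start=1):  # no index, just a linear read
        found = [el for el, pat in patterns.items() if pat.search(text)]
        if found:
            hits.append((number, found))  # recording matches adds to the work
    return hits

claims = [
    "An alloy comprising 10 to 12 wt% Cr and 0.5 to 2 wt% Mo.",
    "The alloy of claim 1, further comprising Ni.",
    "The alloy of claim 1 in sheet form.",
]
print(scan_claims(claims, SPEC_ELEMENTS))
```

The three loops in the sketch correspond to the three factors: the number of patterns, the number and length of the claims, and the number of hits stored for the report.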


II-SDV Conference 2017 Nice

We attended the 2017 II-SDV Conference in Nice, 24 - 25 April 2017. We went to catch up with the latest developments and also took with us our own latest development, SpanMatch, which is what we will be talking about in Atlanta next month. This is a numerical search program specifically aimed at finding and calculating ranges against a given specification.
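To give a feel for what "calculating ranges against a given specification" can mean, here is a minimal sketch, assuming ranges are written in forms such as "10 to 12 wt% Cr" or "0.5-2 % Mo". The pattern, the `spec` values and the helper names are hypothetical and are not taken from SpanMatch itself.

```python
import re

# Assumed phrasing of a range in a claim; real claims vary far more than this.
RANGE = re.compile(
    r"(\d+(?:\.\d+)?)\s*(?:to|-)\s*(\d+(?:\.\d+)?)\s*(?:wt\s*)?%\s*([A-Z][a-z]?)\b"
)

def find_ranges(text):
    """Return {element: (low, high)} for each range found in the text."""
    return {
        m.group(3): (float(m.group(1)), float(m.group(2)))
        for m in RANGE.finditer(text)
    }

def overlap(claimed, wanted):
    """Return the shared part of two closed ranges, or None if they do not meet."""
    low, high = max(claimed[0], wanted[0]), min(claimed[1], wanted[1])
    return (low, high) if low <= high else None

# Hypothetical search specification, in percent.
spec = {"Cr": (11.0, 13.0), "Mo": (3.0, 4.0)}
claim = "An alloy comprising 10 to 12 wt% Cr and 0.5-2 % Mo."

for element, claimed in find_ranges(claim).items():
    if element in spec:
        print(element, claimed, overlap(claimed, spec[element]))
```

In this example the claimed chromium range overlaps the specification between 11 and 12 percent, while the molybdenum range falls entirely below it.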

We developed it in association with David Goodchild, of David Goodchild Limited, whom we met in Madrid.  We extended the initial English language version to cover Chinese, Japanese and Korean when we saw the state of some of the machine translations into English.  You can find illustrations of it in use on the Patent Program tab.  

We also discussed possible contributions we can make to the improvement of machine translation from our background in multilingual concordancing.


This was our first visit to an EPOPIC.  We had done some consultancy work in the patent area and thought that our methodologies and presentation style would be of interest to the patent search community, so decided to go and ask them.  In general the answer appears to be "Yes".  

We attended Discussion Group 7, Freedom-to-operate, and were pleased to note at the outset that three items on the wish list of the questionnaire were met by our program: the ability to highlight search phrases instead of only words, the ability to save users' highlighting of text in descriptions and claims, and multi-screen viewing to compare documents or parts of a document.

The discussion group was run in such a way that each table of 7 tackled 2 of the 5 assignments set by the chair, Susanne Hantos. As the designated note taker, I had the privilege of hearing 12 professionals give their take, with great clarity and organisation, on the problems of handling a) geographical scope and b) inventions versus designs, in a time frame of 20 minutes for each team of 6.

From the conference as a whole it became clear that FTO was a very hard search task but equally a vitally important one for companies to ensure that their R&D efforts were geared towards products that had a chance of making it to market. It was also clear that, although there had been huge advances in machine translation, this was still an area where there was a lack of confidence in the accuracy of the current tools, particularly over extended passages.

As CFL works in both document comparison and multilingual concordancing, we think we should be able to make further contributions to the power and accuracy of FTO search tools.


We took our latest program with us. CFL Patent PIPEline includes several distinctive features in the area of patent searching, not least the complete absence of a need for keyword/boolean searching.

You can take a look at the screenshots we showed in Madrid, to much interest from patent searchers, by choosing it from the Programs tab. Patent PIPEline is an API, so these screenshots are indicative of what can be done with the output. The programs ingest UTF-8 plain text and can return HTML, XML, JSON or whatever else an end user wants. The output displayed on page 2 of the PDF is actually working directly from the XML source files where the matched patents are found.

The report illustrated in the header is the result of starting with the ultrasonic surgical shears illustrative sample from the USPTO Examination Guide. The program is not using the title, but all the words in the sample.  The numbers on the left are the number of terms it has found in common between the sample and each document shown.  If this had been your idea, then you would quickly find out that Ethicon Endo-Surgery, Inc. had got there before you!  
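The idea of ranking by terms in common can be sketched roughly as below. This is a simplification under stated assumptions: single lower-cased words stand in for the program's terminology identification, and the sample text, document ids and function names are invented for the example.

```python
import re

def terms(text):
    """Lower-cased word set; an assumed stand-in for real terminology extraction."""
    return set(re.findall(r"[a-z]+", text.lower()))

def rank_by_shared_terms(sample, documents):
    """Return (shared_count, doc_id) pairs, highest count first."""
    sample_terms = terms(sample)
    scored = [
        (len(sample_terms & terms(body)), doc_id)
        for doc_id, body in documents.items()
    ]
    # Sort by descending count, then by id so that ties come out in a stable order.
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))

sample = "ultrasonic surgical shears with a vibrating blade"
documents = {
    "doc-A": "surgical shears having an ultrasonic blade",
    "doc-B": "a garden hose reel",
    "doc-C": "ultrasonic cleaning bath",
}

for count, doc_id in rank_by_shared_terms(sample, documents):
    print(count, doc_id)
```

The count printed on the left plays the same role as the numbers on the left of the report: the more terms a document shares with the sample, the higher it appears.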

Here are some of the other features:

  • Search input is normally a description of the invention or an outline application, although it is possible to use a patent or application number instead.
  • Automatic identification of all terminology in the search document.
  • Searching on all those terms; in a full patent this can be over 2000 terms.
  • Results ranked by number of search terms found in the full patent.
  • Filters can be applied for classification codes or parts of patents.
  • Extended report giving primary classification, assignee and title in addition to the patent number and date of publication.
  • Comparison of each matched patent with the search document.
  • Parallel display of the matched sentences, with location in the patent identified (Abstract, Brief Summary, Claims, Description).
  • Automatic collection of the citations for each matched patent.
  • Ability to compare any of the citations with the search description, or the citing patent, with parallel display of the results as for patents.
  • Comparative vocabulary use for each comparison pair.
  • Unified operation across US and EPO datasets.
  • Searching in original languages, including Chinese.