Our abstract proposal, “Single Pass Numerical Matching: A linguistic solution to numerical searching”, has been accepted by PIUG, so we will be off to Atlanta, Georgia in May. You can find the published abstract, together with the other speakers named so far, by clicking the PIUG link above. You can see the full proposal by clicking on the title above.
This arose from meeting David Goodchild while we were in Madrid. David's company, David Goodchild Ltd, specialises in patent information for the metals industries, and he had a particular problem: searching for the ranges of elements used in alloys, as specified in patent claims. We worked together to produce a solution and decided it was of general interest to the patent search community, so we submitted a co-authored proposal.
One very important feature not explicitly mentioned in the abstract is the speed of response. In 10 seconds the program reads a single 7.5 MB file of 1400 patents and processes the 8652 claims that contain one or more of the search specification elements, producing a full report of 3.5 MB. No index is used, so the coverage is exhaustive: every claim in each patent is read in a linear fashion, as a human reader would.
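For a sense of scale, the quoted figures can be turned into rough rates. This is simple arithmetic on the numbers given, not a benchmark, and the variable names are illustrative only.

```python
# Figures quoted in the post.
matching_claims = 8652   # claims containing at least one specification element
patents = 1400
input_mb = 7.5           # size of the single input file
seconds = 10

# Rough rates implied by those figures.
claims_per_second = matching_claims / seconds
patents_per_second = patents / seconds
mb_per_second = input_mb / seconds

print(f"{claims_per_second:.1f} matching claims per second")
print(f"{patents_per_second:.1f} patents per second")
print(f"{mb_per_second:.2f} MB of input per second")
```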
The 1400 patents in the input data were collected by David Goodchild using Questel Orbit, with regular keyword/boolean queries and an output specification, and saved in UTF-8.
The input query is a simple UTF-8 plain text file. It contains this:
No further query construction is needed.
The process is linear, so the time taken depends on three factors:
- The complexity of the specification
- The number and length of the claims in each patent
- The quantity of matching to record and display
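To make the idea of a linear, index-free pass concrete, here is a minimal sketch. It is not the actual program: the regular expression, the `spec` format (element symbol mapped to a range of interest) and the sample claims are all assumptions made up for illustration. Each claim is read once, in order, and a hit is recorded when a range stated in the claim overlaps the range asked for.

```python
import re

# Hypothetical pattern: "0.5 to 1.5" or "0.5-1.5", an optional percent unit,
# then an element symbol such as Mn or Cr. Real claims are far more varied.
RANGE = re.compile(
    r"(\d+(?:\.\d+)?)\s*(?:to|[-\u2013])\s*(\d+(?:\.\d+)?)"
    r"\s*(?:wt\s*%|%)?\s*([A-Z][a-z]?)\b"
)

def match_claims(claims, spec):
    """Read every claim once, in order, and record overlapping ranges."""
    hits = []
    for number, text in claims:
        for m in RANGE.finditer(text):
            low, high, element = float(m.group(1)), float(m.group(2)), m.group(3)
            if element not in spec:
                continue
            want_low, want_high = spec[element]
            # Two closed ranges overlap when each starts before the other ends.
            if low <= want_high and want_low <= high:
                hits.append((number, element, low, high))
    return hits

# Made-up claims and specification, for illustration only.
claims = [
    (1, "An alloy comprising 0.5 to 1.5 wt% Mn and 2.0-3.0 % Cr."),
    (2, "The alloy of claim 1, further comprising 0.01 to 0.05 wt% Ti."),
    (3, "The alloy of claim 1, wherein the alloy is cast."),
]
spec = {"Mn": (1.0, 2.0), "Ti": (0.1, 0.2)}

print(match_claims(claims, spec))
```

In this sketch the three factors listed above show up directly: a larger `spec` means more lookups, more and longer claims mean more text to scan, and more hits mean more to record and display.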
We attended ECOPIC 2016 in Madrid this November, taking our latest program with us. CFL Patent PIPEline includes several distinctive features in the area of patent searching, not least the complete absence of a need for keyword/boolean searching.
You can take a look at the screenshots we showed in Madrid, to much interest from patent searchers, by clicking here. Patent PIPEline is an API, so these screenshots are indicative of what can be done with the output. The programs ingest UTF-8 plain text and can return HTML, XML, JSON or whatever format an end user requires. The output displayed on page 2 of the PDF works directly from the XML source files in which the matched patents are found.
The report illustrated in the header is the result of starting with the ultrasonic surgical shears illustrative sample from the USPTO Examination Guide. The program is not using the title, but all the words in the sample. The numbers on the left are the number of terms it has found in common between the sample and each document shown. If this had been your idea, then you would quickly find out that Ethicon Endo-Surgery, Inc. had got there before you!
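The ranking idea can be sketched in a few lines. This is an illustration under stated assumptions, not the program's method: the `terms` function here simply collects lowercase words, standing in for whatever terminology identification the real system performs, and the document identifiers and texts are invented.

```python
import re

def terms(text):
    """Return the set of lowercase words in text (a crude stand-in for term extraction)."""
    return set(re.findall(r"[a-z]+", text.lower()))

def rank_by_shared_terms(sample, documents):
    """Score each document by how many of the sample's terms it contains, highest first."""
    sample_terms = terms(sample)
    scored = [
        (len(sample_terms & terms(body)), doc_id)
        for doc_id, body in documents.items()
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

# Invented sample and documents, for illustration only.
sample = "Ultrasonic surgical shears with a clamp arm and blade"
documents = {
    "doc-a": "Surgical shears having an ultrasonic blade and a clamp arm",
    "doc-b": "A clamp for garden hoses",
    "doc-c": "Ultrasonic cleaning bath",
}

for count, doc_id in rank_by_shared_terms(sample, documents):
    print(count, doc_id)
```

The number printed beside each identifier plays the same role as the numbers on the left of the report: a count of terms shared between the sample and that document.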
Here are some of the other features:
- Search input is normally a description of the invention or an outline application, although it is possible to use a patent or application number instead.
- Automatic identification of all terminology in the search document.
- Searching on all those terms; in a full patent this can be over 2000 terms.
- Results ranked by number of search terms found in the full patent.
- Filters can be applied for classification codes or parts of patents.
- Extended report giving primary classification, assignee and title in addition to the patent number and date of publication.
- Comparison of each matched patent with the search document.
- Parallel display of the matched sentences, with location in the patent identified (Abstract, Brief Summary, Claims, Description).
- Automatic collection of the citations for each matched patent.
- Ability to compare any of the citations with the search description, or the citing patent, with parallel display of the results as for patents.
- Comparative vocabulary use for each comparison pair.
- Unified operation across US and EPO datasets.
- Searching in original languages, including Chinese.