IPCCAT Categorizer Help

 

 

The following text briefly explains how the categorizer works and what the various on-screen fields are.

 

Select One of the Following:

 


1.     Browser Compatibility

2.     What is this tool for?

3.     What is this tool not for?

4.     How does it Work?

5.     How do I use it?

6.     How do I input text/patent information?

7.     Do I have to specify a document language?

8.     What are 'Predicted Categories'?

9.     What is the 'Confidence' indicator for?

10.       What happens if I select 'Refine'?

11.       Example Usage

12.       IPC Coverage and precision

13.       Limitations

14.       Further information on WIPO Computer-Assisted Categorization of Patent Documents in the IPC

15.       IPCCAT Web Service

 

If your questions are not answered here, please Contact Us and we will do our best to answer them.

 

 

1.         Browser Compatibility

 

Browser compatibility was tested for Internet Explorer 7, 8, 9 and Firefox 5. For Internet Explorer 10 and 11, add https://www3.wipo.int/ipccat/ to the “Compatibility View settings”.

 

2.         What is this tool for?

 

This tool has been designed primarily for small and medium-sized patent offices, to assist them in classifying applications according to the International Patent Classification (IPC). The tool provides predictions based on a given text, at Class, Sub-Class or Main Group level.

 

For further information regarding the classification of patents please consult the IPC documentation.

 

3.         What is this tool not for?

The categorizer is a classification aid, not a keyword search tool. It has been designed to work with full phrases describing the technical subject matter.
While the categorizer will try to perform its task even with only a few keywords as input, the results obtained in that case should be treated with caution (see Limitations).

 

4.         How does it Work?

 

  1. The user enters a section of text or imports a text file that contains the patent information to be categorized.
  2. The user defines how many predictions the tool is allowed to make and chooses the level of classification.
  3. When the user clicks the 'Classify' button, the subject and meaning of the imported / entered text are examined in order to select important words and phrases which could lead to a proper classification. By default the classification is performed at the subclass level, unless otherwise specified by the user. The system allows for further refinement by selection of the "Refine" double arrow behind a prediction. The system will then refine its predictions within the selected category. Each prediction returned by the system has a level of confidence.

 

 

5.         How do I use it?

 

This tool has been designed for new users, so the actual operation of the tool is very simple.
An example of how to use the tool follows:

o   Insert text into the tool either by:
i) Clicking the "Browse" button to upload a text file (e.g. a patent application) or;
ii) Typing or pasting text (e.g. from a patent application) into the largest (central) text box.

o   Select the number of predictions that the tool should make.

o   Select the IPC level at which the system should make predictions (Class, Sub-Class, Main Group). The default level is Sub-Class.

o   Click on the 'Classify' button.


The tool will now attempt to predict which categories the patent application should be assigned to, based on the text supplied by the user. Once the system has completed its calculations a new screen will be displayed containing initial predictions and the related confidence. At this point the user may request a further refinement down to Main Group level by clicking on a "Refine" double arrow on the IPC prediction line. The system will then display 3 predictions for Main Groups within the chosen subclass.

The user may further choose to classify directly at main group level by clicking on the Main Group button under "Change Classification Level".
Important: this classification feature is different from the "Refine" one because by clicking on the Main Group button under "Change Classification Level" the user asks the system to categorize directly at main group level, as if it had not made any previous prediction at any other level before. The user may also ask for predictions at coarser levels by pressing on the class or subclass buttons.
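
The difference between "Refine" and direct Main Group classification can be sketched in a few lines of Python. This is only an illustration, not IPCCAT's actual implementation: the scores below are invented, and the function names (top_n, classify_at_main_group, refine) are hypothetical.

```python
from typing import Dict, List

# Hypothetical main-group scores, invented purely for illustration.
SCORES: Dict[str, float] = {
    "G06K 9/00": 0.62,
    "G06T 7/00": 0.55,
    "H04N 5/00": 0.31,
    "G06K 7/00": 0.20,
    "G06T 1/00": 0.12,
}


def top_n(scores: Dict[str, float], n: int) -> List[str]:
    """Return the n symbols with the highest scores, best first."""
    return sorted(scores, key=lambda symbol: scores[symbol], reverse=True)[:n]


def classify_at_main_group(scores: Dict[str, float], n: int = 3) -> List[str]:
    """Direct classification: rank all main groups, ignoring earlier predictions."""
    return top_n(scores, n)


def refine(scores: Dict[str, float], subclass: str, n: int = 3) -> List[str]:
    """Refinement: rank only the main groups inside the chosen subclass."""
    within = {s: v for s, v in scores.items() if s.startswith(subclass)}
    return top_n(within, n)
```

With these invented scores, refine(SCORES, "G06K") only ever returns main groups of G06K, whereas classify_at_main_group(SCORES) may return main groups from several subclasses.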

 

 

6.         How do I input text/patent information?

 

There are three methods for entering text to be categorized into IPCCAT:


People doing patent classification may have at their disposal a patent document in electronic form.

o   The text file of this patent document can be uploaded through the "Browse" button.
The uploaded file can be in any of the following formats: Word (doc, wri, rtf, dot), Adobe PDF (pdf) and Plain Text (txt, asc, ans).
Warning: Plain Text (txt) documents must use either the ANSI character set (as produced by default in MS Windows) or the UTF-8 format (as saved by Notepad when UTF-8 output is selected).
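
If a plain-text file is in an unknown encoding, it can be re-saved as UTF-8 before upload. The following Python sketch is one possible way to do this and is not part of IPCCAT. It assumes that "ANSI" means Windows code page 1252, which is the default only on Western European Windows systems; the function name to_utf8 is hypothetical.

```python
from pathlib import Path


def to_utf8(src: str, dst: str, fallback: str = "cp1252") -> str:
    """Re-save a plain-text file as UTF-8 and return the encoding it was read with."""
    raw = Path(src).read_bytes()
    try:
        # "utf-8-sig" accepts UTF-8 with or without a byte order mark.
        text = raw.decode("utf-8-sig")
        used = "utf-8"
    except UnicodeDecodeError:
        # Assumption: non-UTF-8 input is Windows "ANSI" (code page 1252).
        # Bytes that are undefined in that code page are replaced, not rejected.
        text = raw.decode(fallback, errors="replace")
        used = fallback
    # Write bytes directly so that existing line endings are left untouched.
    Path(dst).write_bytes(text.encode("utf-8"))
    return used
```

For example, to_utf8("application.txt", "application_utf8.txt") writes a UTF-8 copy of the file and returns the encoding with which the original was read.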

 

o   If the file is not available but the text can be copied, the second method is to paste it into the largest (central) text box. This method allows the user to import text from non-supported file formats.

 

If the text to be categorized cannot be captured by one of the above methods:

o   The third method is to manually type the text into the largest (central) text box.

 

 

7.         Do I have to specify a document language?

The language is automatically detected. IPCCAT supports the following languages:
- English
- French

 

8.         What are 'Predicted Categories'?

When the user submits a text (e.g. a patent abstract) to be categorized, he/she chooses the number of predictions that the tool will attempt to make. The predictions initially displayed are the tool's best guesses as to which IPC symbols the submitted text best relates to (given the selected classification level).

From this initial guess the user may either refine the predictions by clicking on the "Refine" double arrow to obtain predictions for Main Groups within that selected subclass, or may "coarsen" the predictions by clicking on the "Class" button under "Change Classification Level".

 

Predictions are based upon previous training of the system based on patents classified by human experts.

 

As a default setting, predictions are initially delivered at the subclass level.

 

9.         What is the 'Confidence' indicator for?

The tool makes predictions based on the text entered by the user.

The confidence indicator shows how certain the tool is of its predictions, according to its previous training. It is displayed as a number of stars proportional to the level of confidence.

From time to time the tool will be re-trained on highly controlled data to ensure that it makes use of new vocabulary and recent reclassification of patent documents in the IPC. This ensures that the confidence indicator remains meaningful.
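
A star display proportional to confidence can be sketched as follows. This is purely illustrative: this help text does not state the scale IPCCAT uses, so the 0 to 1 confidence range, the maximum of five stars and the function name stars are all assumptions.

```python
def stars(confidence: float, max_stars: int = 5) -> str:
    """Render a confidence value as a row of stars.

    Assumptions (not taken from IPCCAT): confidence lies between 0 and 1,
    and the display uses at most max_stars stars.
    """
    clamped = min(max(confidence, 0.0), 1.0)   # keep the value inside 0..1
    count = int(clamped * max_stars + 0.5)     # round to the nearest whole star
    return "*" * count
```

Under these assumptions, a prediction with confidence 0.62 is shown with three stars and one with confidence 1.0 with five.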

 

10.       What happens if I select 'Refine'?

When the user submits an abstract to the categorizer, the prediction results are displayed at the subclass level by default, i.e. the tool shows the class and subclass to which it believes the submitted text should belong.

At this point the user may wish for a more detailed level in the IPC, i.e. for the tool to be yet more specific within each of its prediction(s). In this case the user may click the "Refine" double arrow on the IPC prediction line so that the tool shows the lower level predictions (Main Group).

 

Note: The number of lower level predictions is determined by the number of predictions set on the initial text entry page.

 

It is also possible for the user to select the class button and see the predictions at a higher level, before refining them within a specific class.

Alternatively, if after reading a set of predictions the user thinks that the categorizer is wrong and that the patent application should very probably be classified under a particular category (for instance the A01B subclass), he/she can force the categorizer to make predictions under this subclass. To do so, the user types the required category code (in our example, A01B) in the box located at the bottom of the result screen and clicks on the "Start From..." button.
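
Before typing a code into the "Start From..." box, it can help to check that it has the shape of an IPC section, class or subclass symbol. The Python sketch below is not part of IPCCAT and the function name ipc_level is hypothetical. It checks the shape only (a section letter A to H, then two digits for the class, then a letter for the subclass), not whether the category actually exists in the IPC.

```python
import re

# Section letter, optionally followed by a two-digit class,
# optionally followed by a subclass letter.
_IPC_SHAPE = re.compile(r"^[A-H](?:(\d{2})([A-Z])?)?$")


def ipc_level(symbol: str) -> str:
    """Return "section", "class" or "subclass" for a well-formed symbol.

    Raises ValueError if the text does not have the shape of one of these.
    """
    cleaned = symbol.strip().upper()
    match = _IPC_SHAPE.match(cleaned)
    if match is None:
        raise ValueError(f"not an IPC section, class or subclass: {symbol!r}")
    class_digits, subclass_letter = match.groups()
    if subclass_letter:
        return "subclass"
    if class_digits:
        return "class"
    return "section"
```

For the example above, ipc_level("A01B") returns "subclass", while ipc_level("A01") returns "class" and ipc_level("A") returns "section".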

 

11.       Example Usage


The following example is included to illustrate how IPCCAT may be used for testing purposes and as such uses data from Patentscope.

  1. In the Patentscope window, enter keywords, for example "image recognition" (do not forget the double quotes), in the central text entry box, and click on the "Search" button.
  2. Copy the abstract text of one of the documents.
  3. Now return to the IPCCAT browser window and paste the text into the large, central text entry box above the "Classify" button.
  4. Decide how many predictions are required for the categorization of the text, using the 'Number of predictions' field. The number selected here determines how many results will be returned.
  5. Click once on the 'Classify' button to launch the categorization, and wait until the suggested IPC categories are returned.
  6. You will now see the predicted categories for the text. By default, these predictions are at subclass level.
  7. Confirm the preferred subclass by consulting the IPC Scheme for each subclass. To do so, click on the relevant entry in the description column.
  8. If you are not satisfied with the predicted subclass, it is possible to manually influence the predicted subclass. To do so, click the 'Class' button displayed at the top of the screen to coarsen predictions to class level.
  9. From one of the likely predicted subclasses, it is possible to refine to the Main Group level. To do so, click on the relevant "Refine" double arrow in the refined column.
  10. When you go through the "Refine" step, the classifier predicts what the classification at Main Group level should be within the corresponding subclass. At this point you get an additional number of predictions at Main Group level.
  11. You can find more details about a predicted IPC symbol by consulting it in the IPC Internet publication. To do so, click on the relevant entry in the description column.
  12. You can start a classification from a particular section, class or subclass. To do so, type the desired section, class or subclass in the box at the left of "Start From" and click the "Start From" button.
  13. To start a new classification, click the "Start Over" button.

 

12.       IPC Coverage and precision

An IPC category is considered as covered by IPCCAT if it contains at least 10 documents.

The following statistics reflect IPCCAT's current coverage:

 

                                                                    English Set   French Set
Data Source                                                         DOCDB         DOCDB
Number of Training Patents                                          15 517 590    2 614 226
Number of Testing Patents                                           3 879 398     653 557
Total Number of Example Patents                                     19 396 988    3 267 783
Total Number of Classes in IPC 2012.01                              129           129
Number of Trained Classes                                           121           121
Coverage at Class Level                                             93.8%         93.8%
Precision of Classification at Class Level ("Three Guesses")        91.0%         93.0%
Total Number of Sub-Classes in IPC 2012.01                          631           631
Number of Trained Sub-Classes                                       620           619
Coverage at Sub-Class Level                                         98.3%         98.1%
Precision of Classification at Sub-Class Level ("Three Guesses")    87.1%         89.0%
Total Number of Main Groups in IPC 2012.01                          7 400         7 400
Number of Trained Main Groups                                       6 731         5 820
Coverage at Main Group Level                                        91.0%         78.6%
Precision of Classification at Main Group Level ("Three Guesses")   80.0%         81.1%

 

 


These statistics show that for certain Main Group areas IPCCAT's predictions will not be as reliable as those of an IPC specialist.

The training corpus is a collection prepared from the DOCDB XML database of March 2013, including patent documents reclassified in IPC 2012.01.
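
The coverage percentages in the statistics above follow directly from the category counts. The short Python sketch below reproduces them; the function names is_covered and coverage_percent are hypothetical, and the code is a worked check of the arithmetic rather than part of IPCCAT.

```python
MIN_DOCUMENTS = 10  # a category counts as covered with at least 10 documents


def is_covered(document_count: int) -> bool:
    """Apply the coverage rule stated at the start of this section."""
    return document_count >= MIN_DOCUMENTS


def coverage_percent(trained: int, total: int) -> float:
    """Share of IPC categories that were trained, as a percentage."""
    return round(100 * trained / total, 1)


# (trained categories, total categories in IPC 2012.01), from the statistics above
COUNTS = {
    ("Class", "English"): (121, 129),
    ("Class", "French"): (121, 129),
    ("Sub-Class", "English"): (620, 631),
    ("Sub-Class", "French"): (619, 631),
    ("Main Group", "English"): (6731, 7400),
    ("Main Group", "French"): (5820, 7400),
}

for (level, language), (trained, total) in COUNTS.items():
    print(f"{level:<10} {language:<8} {coverage_percent(trained, total)}%")
```

The same counts show that the French set covers 911 fewer Main Groups than the English set (6 731 against 5 820).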

 

13.       Limitations

- The system was initially trained using the title and abstract of each patent. Therefore, if the user does not want to use the full text of the patent application, it is suggested that at least the abstract be used, and not a random selection of text from the description.

- Categorization result consistency: Because of the large number of main groups in the IPC, it is difficult to find a large number of good training examples for each of them. More generally, users should be aware that not all categories of the IPC are evenly documented, and that some predictions are therefore less reliable than others.
With this in mind, for certain Main Group areas the categorizer's predictions will not be as accurate as a human classification.

 

14.      Further information on WIPO Computer-Assisted Categorization of Patent Documents in the IPC

 

 

15.      IPCCAT Web Service

Documentation

Web Service description