- 6.1 The LifeQuest Database
- 6.2 Field List
- 6.3 How Patent Documents, Kind Codes, and Updates are Handled
- 6.4 How Patent Families are Handled
- 6.4.1 Simple Family Definition
- 6.4.2 Extended Family Definition
6.1 The LifeQuest Database
LifeQuest contains over 16 million patent documents from a long list of patent authorities. The large majority are full text. We provide machine translations for non-English documents; however, the PDFs are in the original language.
The database consists of documents that were carefully chosen to belong to the domain of biology. To determine biology relatedness, we took three steps. First, we examined patent documents from life science companies in a wide variety of domains, from pharma to med tech to agrochemical to food and cosmetics. Second, we extracted all the CPC/IPCR codes from our GQ-Pat sequence database. Finally, we carefully examined the CPC classification documentation. We chose the broadest interpretation of biology while attempting to avoid most non-pharmaceutical chemistry.
From the CPC codes, we then selected the equivalent IPCR and ECLA codes. We attempted to find the equivalent US codes, but the CPC-to-US code pages provided by the USPTO contain errors.
To obtain documents based on their CPC, IPCR and ECLA codes, we selected all documents with those codes, but also all documents in the same tight family. This helps us be thorough since assignment of classification codes to documents is done differently by different authorities.
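The selection step above can be sketched as follows. This is only an illustration, not the actual LifeQuest loader: the record layout (`pn`, `family`, `codes`), the function name, and the use of prefix matching on classification codes are all assumptions made for the example.

```python
def select_with_family_expansion(documents, class_prefixes):
    """Return the PNs of all documents whose family contains a class-code hit.

    documents: list of dicts with hypothetical keys 'pn', 'family', 'codes'.
    class_prefixes: classification prefixes of interest, e.g. ['A61', 'C12'].
    """
    prefixes = tuple(class_prefixes)
    # Step 1: find every family with at least one document carrying a wanted code.
    hit_families = {
        doc["family"]
        for doc in documents
        if any(code.startswith(prefixes) for code in doc["codes"])
    }
    # Step 2: keep every document of those families, classified or not.
    return [doc["pn"] for doc in documents if doc["family"] in hit_families]


docs = [
    {"pn": "EP1", "family": "F1", "codes": ["A61K 31/00"]},
    {"pn": "JP1", "family": "F1", "codes": []},            # same family, no code assigned
    {"pn": "US2", "family": "F2", "codes": ["H04L 9/00"]},
]
selected = select_with_family_expansion(docs, ["A61", "C12"])
```

In this made-up sample, `JP1` is selected even though it carries no matching code, because it shares a family with `EP1`.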
6.1.1 CPC Codes Included
The complete table of classes contained in LifeQuest is available upon request to customers and potential customers.
The most common classes in LifeQuest are:
- A61 which concerns medical science
- C07D and C07K which are about heterocyclic compounds and peptides
- C12 which is about biochemistry, genetic engineering and wine/beer
Please send an email to [email protected] to request the full list of CPC codes in LifeQuest.
6.1.2 Document Counts by Authority upon Initial Release (will be updated)
Authority | # Docs | % Full Text |
---|---|---|
AP | 5808 | 0.03% |
AR | 53689 | 4.84% |
AT | 237777 | 14.61% |
AU | 619160 | 46.95% |
BE | 70164 | 91.45% |
BG | 16121 | 18.44% |
BR | 128929 | 38.11% |
CA | 475761 | 90.44% |
CH | 75305 | 65.04% |
CN | 1312306 | 99.88% |
CU | 2421 | 11.85% |
CZ | 32846 | 5.34% |
DD | 22867 | 57.40% |
DE | 794878 | 64.92% |
DK | 155443 | 29.94% |
EA | 21382 | 47.11% |
EC | 6865 | 0.06% |
EE | 4555 | 0.35% |
EP | 1183538 | 50.15% |
ES | 240300 | 47.99% |
FI | 126559 | 45.87% |
FR | 328869 | 79.07% |
GB | 308925 | 68.84% |
GR | 27034 | 1.50% |
GT | 3911 | 0.03% |
HR | 18490 | 0.17% |
HU | 94367 | 0.01% |
IE | 52264 | 28.50% |
IL | 131387 | 2.37% |
IN | 124405 | 60.27% |
IT | 106649 | 28.50% |
JP | 2105871 | 60.22% |
KR | 403810 | 94.72% |
LT | 2692 | 0.22% |
LU | 11426 | 65.98% |
LV | 3472 | 0.03% |
MC | 987 | 69.60% |
MD | 2378 | 1.93% |
MX | 96769 | 61.96% |
MY | 17922 | 3.35% |
NL | 102761 | 82.51% |
NO | 129684 | 2.97% |
NZ | 63679 | 0.09% |
OA | 5655 | 1.40% |
PL | 70642 | 0.60% |
PT | 84418 | 21.24% |
RO | 17812 | 0.58% |
RU | 226311 | 65.74% |
SE | 98341 | 37.74% |
SI | 14270 | 0.06% |
SK | 16296 | 0.44% |
SU | 84054 | 92.08% |
TR | 14977 | 3.32% |
TW | 144990 | 89.54% |
UA | 32723 | 0.23% |
US | 1853033 | 99.88% |
UY | 7047 | 19.11% |
WO | 915630 | 76.13% |
ZA | 89640 | 0.00% |
6.1.3 Publication Languages and Their Codes
At present, LifeQuest only provides a searchable index in English. Many foreign language documents are included in LifeQuest – these are machine translated into the English language before they are made searchable. We will expose native language queries for many common languages in future releases.
You can query on the original publication language via the Publication Language (la) field. The codes to use are shown below.
6.2 Field List
The list of fields available to constrain your search is provided below.
Field Name | Abbreviation | Description | Available as Column in Workfile? |
---|---|---|---|
Full Text | all | The Title, Abstract, Claims, Description, Applicants, Inventors, and Assignees. | No |
Title | ttl | The title of the document. | Yes |
Title, Abstract | ttl_abst | The title and abstract of the document. | No |
Title, Abstract, Claims | tac | The title, abstract, and claims of the document. | No |
Claims | clm | The claims of the document. | Yes |
Abstract | abst | The abstract of the document. | Yes |
Description | desc | The description of the document. | No |
Names | name | An aggregate field that searches Applicants, Assignees, and Inventors. | No |
Patent Assignee or Applicant (Full) | pa | An aggregate field that searches Applicants and Assignees as they occur naturally in the source document. | No |
Patent Assignee or Applicant (Normalized) | pan | An aggregate field that searches normalized Applicants and Assignees as they occur naturally in the source document. | Yes |
Inventors | inv | The Inventors of the patent document. | Yes |
Patent or Publication Number | pn | The Patent Number or Publication Number for the document. You can search for the general patent document without a kind letter, in which case you will receive every PN-KL combination, or for a specific PN-KL, e.g., pn:US123456B, in which case you will only receive the B version of the document. | Yes |
Application Number | an | The Application Number associated with the document. | Yes |
Filing Date | fd | The date that the application was filed with the corresponding patent authority. | Yes |
Priority Date | prd | The Priority Date associated with the patent, that is, the date used to establish the novelty and/or obviousness of the invention relative to other art. | Yes |
Publication Date | pd | The date that the document was published by the authority. | Yes |
CPC or IPCR Classification | class | An aggregate field that contains both the CPC and the IPCR classification. | No |
CPC Classification | cpc | The CPC classification code. | Yes |
IPCR Classification | ipcr | The IPCR classification code. | Yes |
Patent Kind Code | kc | The Kind Code of the document. | Yes |
Authority | au | The two-letter authority code for the document. | Yes |
Legal Status | ls | The Legal Status of the document. Currently one of Application or Grant. In some cases, the value undefined may also occur, in which case the application does not know the Legal Status of the document. | Yes |
Publication Language | la | The Publication Language of the document. See Publication Language Codes for a list of the language codes. | Yes |
LQ Date of Entry | ed | The date on which the document first entered the LifeQuest system. | Yes |
LQ Family Date of Entry | fed | The date on which the first member of the family entered the LifeQuest system. | Yes |
LQ Last Update | lu | The date when the document was last updated. This field is keyed to the PN-KL representative (the latest numeric version within a kind letter). So if a document has an A1, A2, and A3 and the A1 is updated, this field will still show the last time the A3 document was updated, unless the update fills in missing information in the A3. | Yes |
LQ Family Last Update | flu | The date when any document in the family was last updated. This field is updated for every document in the family when any document in the family is updated. It uses the same PN-KL strategy as the LQ Last Update field. | Yes |
Sequences | seq | Whether the document contains sequences as annotated in GenomeQuest. This field supports two values: yes or no. | Yes |
Protein Sequence Count | ed | The number of protein sequences in the document. Note that this can either be a specific number, or it can be a range. | Yes |
Nucleotide Sequence Count | nuc | The number of nucleotide sequences in the document. Note that this can either be a specific number, or it can be a range. | Yes |
Publication Type | pubtype | One of Grant, Application, or undefined. | Yes |
6.3 How Patent Documents, Kind Codes, and Updates are Handled
Patent documents are accessed via a patent number (PN) and kind code (KC). Kind codes mark major legal events in the life of a document. Very often, kind code A followed by a number denotes a pending application, whereas kind code B followed by a number denotes a granted patent, although this is not always the case. Most commonly, a kind code is a letter and a number, more rarely just a letter, and even less commonly two letters.
6.3.1 Kind Letters (KL)
We introduce the concept of kind letter (KL). A kind letter is defined as just the part of the kind code that contains letters.
Kind Letters (KL) Corresponding to Sample Kind Codes
Kind Code | Corresponding Kind Letter |
---|---|
A1 | A |
A3 | A |
B8 | B |
A | A |
XX | XX |
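The mapping in the table above can be sketched in a few lines. The function name `kind_letter` is hypothetical, and the sketch assumes kind codes always start with their letters, as in the samples shown.

```python
import re


def kind_letter(kind_code: str) -> str:
    """Return the letter part of a kind code, e.g. 'A3' -> 'A', 'XX' -> 'XX'."""
    match = re.match(r"[A-Za-z]+", kind_code.strip())
    if match is None:
        # Assumption: a kind code without leading letters is treated as invalid.
        raise ValueError(f"kind code has no leading letters: {kind_code!r}")
    return match.group(0).upper()


samples = ["A1", "A3", "B8", "A", "XX"]
letters = [kind_letter(kc) for kc in samples]
```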
Now, we define a PN-KL combination as the latest numeric version within a KL. So, for instance, if a patent had an A1, A2, A3, and a B1 document, we define two PN-KL combinations: the A3 document, which represents the A kind letter, and the B1 document, which represents the B kind letter. In such cases, we attempt to augment the PN-KL document with any missing information from earlier documents. So, if the A3 document was missing its claims section, we would augment it with the A2 information. See the augmentation process in 6.3.3, below.
6.3.2 What is Available vs What is Searchable
LifeQuest will store every PN-KC combination available to it and make these all viewable in the Full Patent Viewer. However, it will only make the PN-KL documents searchable in the search engine. That is, when LifeQuest documents are searched, hits are organized by kind letter, where an A document represents the latest of the A1, A2, A4, … documents. This simple trick allows us to present documents from authorities that try to keep the same PN for applications and grants (EPO for instance) in the same way as authorities that do not (USPTO for instance).
Available in Full Patent Viewer | What is Searched in Search Page | Explanation |
---|---|---|
XX | XX | The only document available. |
A1, A2, A3, B1 | A3, B1 | The A3 corresponds to the latest A document, and the B1 to the latest B document. |
A1, A2, A7, B1, B3, C2 | A7, B3, C2 | The A7 corresponds to the latest A document, the B3 to the latest B document, and the C2 is the only C document. |
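The rows of this table can be reproduced with a short sketch that keeps the latest numeric version within each kind letter. The function name is hypothetical, and treating a kind code without a number as version 0 is an assumption made for the example.

```python
import re


def searchable_kind_codes(kind_codes):
    """Pick the latest numeric version within each kind letter."""
    latest = {}  # kind letter -> (version, kind code)
    for kc in kind_codes:
        m = re.fullmatch(r"([A-Z]+)(\d*)", kc)
        if m is None:
            raise ValueError(f"unexpected kind code: {kc!r}")
        letter, digits = m.group(1), m.group(2)
        # Assumption: a kind code with no number counts as version 0.
        version = int(digits) if digits else 0
        if letter not in latest or version > latest[letter][0]:
            latest[letter] = (version, kc)
    return [latest[letter][1] for letter in sorted(latest)]


row2 = searchable_kind_codes(["A1", "A2", "A3", "B1"])
row3 = searchable_kind_codes(["A1", "A2", "A7", "B1", "B3", "C2"])
```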
6.3.3 Augmentation of Document Information
But what is a patent document? When our providers send us documents, we get the latest version of a document. For instance, there could be several updates to a US1234A4 document. Moreover, some documents are partial, for instance with no description. This sometimes happens when a document goes from US1234A1 to US1234A4: the description might not be included in the A4 because it is the same as in the A1.
We have an augmentation process to make documents complete. Let us say we have a US1234A7 document with missing claims and description. We will look first in the document with the previous kind code, i.e. US1234A4. If the claims are in the A4, we will copy the A4 claims to the A7 document. But since US1234A4 does not have a description either, we will look in US1234A1. We will copy the A1 description to the A4 and A7 documents.
This augmentation process is done within Kind Letters, in other words, an A4 document will not be augmented by a section in a B1 document and vice versa.
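A minimal sketch of this augmentation, assuming each document is a dictionary of sections keyed by its kind code. The data layout, section names, and function names are assumptions for illustration, not the actual LifeQuest implementation.

```python
import re

SECTIONS = ("claims", "description")  # assumed section names


def _split(kind_code):
    """Split 'A7' into ('A', 7); a missing number is treated as 0."""
    m = re.fullmatch(r"([A-Z]+)(\d*)", kind_code)
    if m is None:
        raise ValueError(f"unexpected kind code: {kind_code!r}")
    return m.group(1), int(m.group(2) or 0)


def augment(docs):
    """Fill missing sections from the nearest earlier document with the same kind letter."""
    result = {kc: dict(sections) for kc, sections in docs.items()}
    last_seen = {}  # (kind letter, section) -> most recent available text
    # Sorting by (letter, version) walks each kind letter from oldest to newest.
    for kc in sorted(result, key=_split):
        letter, _ = _split(kc)
        for section in SECTIONS:
            text = result[kc].get(section)
            if text:
                last_seen[(letter, section)] = text
            elif (letter, section) in last_seen:
                result[kc][section] = last_seen[(letter, section)]
    return result


docs = {
    "A1": {"claims": "claims v1", "description": "full description"},
    "A4": {"claims": "claims v4", "description": None},
    "A7": {"claims": None, "description": None},
    "B1": {"claims": None, "description": "granted description"},
}
augmented = augment(docs)
```

Here the A7 receives the A4 claims and the A1 description, the A4 receives the A1 description, and the B1 claims stay empty because no other B document can supply them.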
6.3.4 Advantages and Disadvantages of the PNKL Approach
The major advantages of the kind letter organization:
- All authorities are viewed the same way. Whether an authority changes the PN on grant or keeps it makes no difference, since the kind letter changes (mostly from A to B). The USPTO and the EPO are handled identically.
- Some Chinese documents that have the same PN and different KL are in fact totally unrelated documents. This happened when the two indexing schemas used by SIPO for grants and applications overlapped. These documents now appear and are handled separately, since they have different kind letters.
A few disadvantages:
- You will see both the application and the grant of the same document, which might look redundant. Switch the search from document mode to family mode to alleviate this.
- Some small authorities using two-letter kind codes will not follow the same schema, and all of their kind codes will be shown (this is mostly for Thailand).
6.4 How Patent Families are Handled
The LifeQuest system stores patents with patent-number kind-code level granularity, and makes patents searchable at the patent-number kind-letter (KL) level. For more on this, see 6.3, above. For a given family, every document in the family should be available inside of the database for viewing, and the most recent document within a kind-letter should be searchable.
At this time, our Simple and Extended families are computed by a proprietary algorithm, the logic of which is described in more detail in 6.4.1 and 6.4.2. This algorithm uses a very strict definition of “priority”. For example, US provisional or PCT application numbers meant as priority documents are sometimes present in INID 63 or INID 86, respectively, rather than in the INID 30 (priority) field of a patent document. In this type of situation, the application number field (INID 21) is used in its place as the foundational record for family formation.
6.4.1 Simple Family Definition
LifeQuest currently supports two different family definitions, the Simple Family and the Extended Family. The Simple Family is akin to the SIMPLE FAMILY as described in the WIPO Handbook on Industrial Property Information and Documentation. That is, a Simple Patent Family is a patent family relating to the same invention, each member of which has for the basis of its “priority right” exactly the same originating application or applications.
Simple Family Example
Patent family members | Originating (priority) applications |
---|---|
AT 319372 B | CH 13201/72 |
CH 538767 A | CH 13201/72 (first application) |
DE 2323735 A1 | CH 13201/72 |
ES 418590 A1 | CH 13201/72 |
FR 2199213 A1 | CH 13201/72 |
FR 2199213 B1 | CH 13201/72 |
JP 49-68240 A2 | CH 13201/72 |
SE 380138 B | CH 13201/72 |
SE 380138 C | CH 13201/72 |
US 3851217 A | CH 13201/72 |
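The simple family definition amounts to grouping documents by their exact set of originating applications. The sketch below illustrates the definition only. It is not the proprietary LifeQuest algorithm, and the function name and input layout are assumptions.

```python
from collections import defaultdict


def simple_families(priorities_by_doc):
    """Group documents whose sets of priority applications are exactly equal."""
    groups = defaultdict(list)
    for doc, priorities in priorities_by_doc.items():
        # frozenset makes the priority set hashable and order-independent.
        groups[frozenset(priorities)].append(doc)
    return [sorted(members) for members in groups.values()]


example = {
    "AT 319372 B": ["CH 13201/72"],
    "DE 2323735 A1": ["CH 13201/72"],
    "US 3851217 A": ["CH 13201/72"],
}
families = simple_families(example)
```

All three sample documents share the single priority CH 13201/72, so they fall into one simple family.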
6.4.2 Extended Family Definition
“Extended patent family” means a patent family relating to one or more inventions, each member of which has for the basis of its “priority right” at least one originating application in common with at least one other member of the family. For example, note how the US4670093A document draws these two simple families together.
Extended Family Example
Patent family members | Originating (priority) applications |
---|---|
DE 3610333 A1 | DE 3514301 |
EP 200874 A1 | DE 3514301 |
JP 61-242026 A2 | DE 3514301 |
US 4670093 A | DE 3514301, DE 3610333 |
EP 240776 A1 | DE 3610333 |
JP 62-232129 A2 | DE 3610333 |
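One way to read the extended family definition is as connected components: documents are linked whenever they share at least one priority application, directly or through other members. The union-find sketch below illustrates that reading with the example above. It is an assumption about how the definition could be computed, not the proprietary LifeQuest algorithm, and the function name is hypothetical.

```python
def extended_families(priorities_by_doc):
    """Group documents connected through at least one shared priority application."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Link every document to each of its priority applications.
    for doc, priorities in priorities_by_doc.items():
        find(("doc", doc))
        for priority in priorities:
            union(("doc", doc), ("prio", priority))

    families = {}
    for doc in priorities_by_doc:
        families.setdefault(find(("doc", doc)), []).append(doc)
    return [sorted(members) for members in families.values()]


example = {
    "DE 3610333 A1": ["DE 3514301"],
    "EP 200874 A1": ["DE 3514301"],
    "JP 61-242026 A2": ["DE 3514301"],
    "US 4670093 A": ["DE 3514301", "DE 3610333"],
    "EP 240776 A1": ["DE 3610333"],
    "JP 62-232129 A2": ["DE 3610333"],
}
families = extended_families(example)
```

Because US 4670093 A claims both priorities, all six documents end up in a single extended family. Without it, the documents would form two separate groups.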