6.0 LifeQuest Content

6.1 The LifeQuest Database

LifeQuest contains over 16 million patent documents from a long list of patent authorities. The large majority are full text. We provide machine translations for non-English documents; however, the PDFs are in the original language.

The database consists of documents that were carefully chosen to belong to the domain of biology. To determine biology relatedness, we took three steps. First, we examined patent documents from life science companies in a wide variety of domains, from pharma to med tech to agrochemical to food and cosmetics. Second, we extracted all the CPC/IPCR codes from our GQ-Pat sequence database. Finally, we carefully examined the CPC classification documentation. We chose the broadest interpretation of biology while attempting to avoid most non-pharmaceutical chemistry.

From the CPC codes, we then selected the equivalent IPCR and ECLA codes. We attempted to find the equivalent US codes, but the CPC-to-US code mapping pages provided by the USPTO can be erroneous.

To obtain documents based on their CPC, IPCR and ECLA codes, we selected all documents with those codes, and also all documents in the same tight family. This helps us be thorough, since different authorities assign classification codes to documents differently.
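The selection rule described above can be sketched as follows. This is only an illustration, not the production pipeline: the `Doc` record, its `family_id` field, the prefix-based code matching, and the sample data are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Doc:
    """Hypothetical minimal patent record used only for this sketch."""
    pn: str
    family_id: str
    codes: frozenset = field(default_factory=frozenset)


def select_documents(docs, wanted_prefixes):
    """Return PNs of docs carrying a wanted code, plus all their family members."""
    prefixes = tuple(wanted_prefixes)
    # Step 1: documents that carry at least one wanted classification code.
    direct = [d for d in docs if any(c.startswith(prefixes) for c in d.codes)]
    # Step 2: expand to every member of the families hit in step 1.
    families = {d.family_id for d in direct}
    return sorted(d.pn for d in docs if d.family_id in families)


docs = [
    Doc("EP1", "F1", frozenset({"C12N15/00"})),
    Doc("US1", "F1", frozenset()),               # no code assigned by this authority
    Doc("JP1", "F2", frozenset({"H01L21/00"})),  # unrelated class, unrelated family
]
selected = select_documents(docs, ["A61", "C07D", "C07K", "C12"])
print(selected)
```

In this sketch, US1 carries no matching code of its own but is still selected because it shares a family with EP1.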

6.1.1 CPC Codes Included

The complete table of classes contained in LifeQuest is available upon request to customers and potential customers.

The most common classes in LifeQuest are:

  • A61 which concerns medical science
  • C07D and C07K which are about heterocyclic compounds and peptides
  • C12 which is about biochemistry, genetic engineering and wine/beer

Please send an email to [email protected] to request the full list of CPC codes in LifeQuest.

6.1.2 Document Counts by Authority upon Initial Release (will be updated)

Authority # Docs % Full Text
AP 5808 0.03%
AR 53689 4.84%
AT 237777 14.61%
AU 619160 46.95%
BE 70164 91.45%
BG 16121 18.44%
BR 128929 38.11%
CA 475761 90.44%
CH 75305 65.04%
CN 1312306 99.88%
CU 2421 11.85%
CZ 32846 5.34%
DD 22867 57.40%
DE 794878 64.92%
DK 155443 29.94%
EA 21382 47.11%
EC 6865 0.06%
EE 4555 0.35%
EP 1183538 50.15%
ES 240300 47.99%
FI 126559 45.87%
FR 328869 79.07%
GB 308925 68.84%
GR 27034 1.50%
GT 3911 0.03%
HR 18490 0.17%
HU 94367 0.01%
IE 52264 28.50%
IL 131387 2.37%
IN 124405 60.27%
IT 106649 28.50%
JP 2105871 60.22%
KR 403810 94.72%
LT 2692 0.22%
LU 11426 65.98%
LV 3472 0.03%
MC 987 69.60%
MD 2378 1.93%
MX 96769 61.96%
MY 17922 3.35%
NL 102761 82.51%
NO 129684 2.97%
NZ 63679 0.09%
OA 5655 1.40%
PL 70642 0.60%
PT 84418 21.24%
RO 17812 0.58%
RU 226311 65.74%
SE 98341 37.74%
SI 14270 0.06%
SK 16296 0.44%
SU 84054 92.08%
TR 14977 3.32%
TW 144990 89.54%
UA 32723 0.23%
US 1853033 99.88%
UY 7047 19.11%
WO 915630 76.13%
ZA 89640 0.00%

6.1.3 Publication Languages and Their Codes

At present, LifeQuest only provides a searchable index in English. Many foreign language documents are included in LifeQuest – these are machine translated into the English language before they are made searchable. We will expose native language queries for many common languages in future releases.

You can query on the original publication language via the Publication Language (la) field. The codes to use are shown below.

Language Code
Arabic ar
Bulgarian bg
Chinese zh
Croatian hr
Czech cz
Danish da
Dutch nl
English en
Estonian et
Finnish fi
French fr
German de
Georgian ka
Greek el
Hebrew he
Hungarian hu
Icelandic is
Indonesian id
Italian it
Japanese ja
Korean ko
Latvian lv
Lithuanian lt
Malayalam ml
Mongolian mn
Norwegian no
Persian fa
Polish pl
Portuguese pt
Romanian mo
Russian ru
Serbian sr
Slovenian sl
Spanish es
Swedish sv
Turkish tr
Ukrainian uk
Unknown xx
Vietnamese vi
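
As a small illustration, a language constraint can be assembled from the table above. The sketch assumes the `field:value` query form used elsewhere in this manual (for example `pn:US123456B`); the `LANGUAGE_CODES` dictionary and the `language_query` helper are hypothetical names, and only a few rows of the table are copied in.

```python
# A few rows copied from the language table above (not the full list).
LANGUAGE_CODES = {
    "Chinese": "zh",
    "French": "fr",
    "German": "de",
    "Japanese": "ja",
    "Korean": "ko",
}


def language_query(language):
    """Build a Publication Language (la) constraint for a language name."""
    try:
        code = LANGUAGE_CODES[language]
    except KeyError:
        raise ValueError(f"no code known for language: {language!r}") from None
    return f"la:{code}"


print(language_query("Japanese"))
```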

6.2 Field List

The list of fields available to constrain your search is provided below.

Field Name Abbreviation Description Available as Column in Workfile?
Full Text all The Title, Abstract, Claims, Description, Applicants, Inventors, and Assignees. No
Title ttl The title of the document. Yes
Title, Abstract ttl_abst The title and abstract of the document. No
Title, Abstract, Claims tac The title, abstract, and claims of the document. No
Claims clm The claims of the document. Yes
Abstract abst The abstract of the document. Yes
Description desc The description of the document. No
Names name An aggregate field that searches Applicants, Assignees, and Inventors. No
Patent Assignee or Applicant (Full) pa An aggregate field that searches Applicants and Assignees as they occur naturally in the source document. No
Patent Assignee or Applicant (Normalized) pan An aggregate field that searches the normalized forms of the Applicants and Assignees. Yes
Inventors inv The Inventors of the patent document. Yes
Patent or Publication Number pn The Patent Number or Publication Number for the document. You can search for the general patent document without a kind letter, in which case you will receive every PN-KL combination, or for a specific PN-KL (e.g., pn:US123456B), in which case you will receive only the B version of the document. Yes
Application Number an The Application Number associated with the document. Yes
Filing Date fd The date that the application was filed with the corresponding patent authority. Yes
Priority Date prd The Priority Date associated with the patent – the date used to establish the novelty and/or obviousness of the invention relative to other art. Yes
Publication Date pd The date that the document was published by the authority. Yes
CPC or IPCR Classification class An aggregate field that contains both the CPC and the IPCR classification. No
CPC Classification cpc The CPC classification code. Yes
IPCR Classification ipcr The IPCR classification code. Yes
Patent Kind Code kc The Kind Code of the document. Yes
Authority au The two letter authority code for the document. Yes
Legal Status ls The Legal Status of the document. Currently one of Application or Grant. In some cases, the value undefined may also occur, in which case the application does not know the Legal Status of the document. Yes
Publication Language la The Publication Language of the document. See Publication Language Codes for a list of the language codes. Yes
LQ Date of Entry ed The date on which the document first entered the LifeQuest system. Yes
LQ Family Date of Entry fed The date on which the first member of the family entered the LifeQuest system. Yes
LQ Last Update lu The date when the document was last updated. This field is keyed to the PNKL representative (the latest numeric version within a kind letter). So if a document has an A1, A2, and A3 and the A1 is updated, this field will still show the last time the A3 document was updated, unless the A1 update fills in missing information in the A3. Yes
LQ Family Last Update flu The date when any document in the family was last updated. This field is updated for every document in the family when any document in the family is updated. It uses the same PNKL strategy as the LQ Last Update field. Yes
Sequences seq Whether the document contains sequences as annotated in GenomeQuest. This field supports two values: yes or no. Yes
Protein Sequence Count ed The number of protein sequences in the document. Note that this can either be a specific number, or it can be a range. Yes
Nucleotide Sequence Count nuc The number of nucleotide sequences in the document. Note that this can either be a specific number, or it can be a range. Yes
Publication Type pubtype One of Grant, Application, or undefined. Yes
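
The behavior of the pn field described in the table (with or without a kind letter) can be illustrated with a short sketch. It is not the search engine's implementation: the `parse_pn` and `pn_matches` helpers, the assumed number format (two-letter authority, digits, optional single kind letter), and the sample index are all hypothetical.

```python
import re


def parse_pn(value):
    """Split a pn query value into (patent number, optional kind letter)."""
    match = re.fullmatch(r"([A-Z]{2}\d+)([A-Z]?)", value)
    if match is None:
        raise ValueError(f"unexpected pn value: {value!r}")
    return match.group(1), match.group(2)


def pn_matches(query, indexed):
    """Return the (pn, kind letter) pairs from `indexed` that the query selects."""
    number, letter = parse_pn(query)
    # Without a kind letter every PN-KL combination matches;
    # with one, only that kind letter matches.
    return [(pn, kl) for pn, kl in indexed
            if pn == number and (not letter or kl == letter)]


indexed = [("US123456", "A"), ("US123456", "B"), ("US999", "A")]
print(pn_matches("US123456", indexed))
print(pn_matches("US123456B", indexed))
```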

6.3 How Patent Documents, Kind Codes, and Updates are Handled

Patent documents are accessed via a patent number (PN) and kind code (KC). Kind codes define major legal events in a document. Very often, a kind code of A followed by a number denotes a pending application, whereas a kind code of B followed by a number denotes a granted patent, although this is not always the case. Most commonly, a kind code is a letter and a number, more rarely just a letter, and even less commonly two letters.

6.3.1 Kind Letters (KL)

We introduce the concept of kind letter (KL). A kind letter is defined as just the part of the kind code that contains letters.

Kind Letters (KL) Corresponding to Sample Kind Codes

Kind Code Corresponding Kind Letter
A1 A
A3 A
B8 B

Now, we define a PN-KL combination as the latest numeric version within a KL. So, for instance, if a patent had an A1, A2, A3, and a B1 document, we define two PN-KL combinations, the A3 document which represents the A kind letter, and the B1 document which represents the B kind letter. In such cases, we attempt to augment the KL document with any missing information from earlier documents. So, if the A3 document was missing its claims section, we would augment it with the A2 information. See the augmentation process, below.
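
The kind letter and PN-KL definitions above can be expressed as a short sketch. It is an illustration under stated assumptions, not the system's code: the function names are hypothetical, a kind code with no number is treated as version 0, and the special handling of authorities with two-letter codes is ignored.

```python
import re


def split_kind_code(kind_code):
    """Split a kind code such as 'A3' into its kind letter 'A' and number 3."""
    match = re.fullmatch(r"([A-Z]+)(\d*)", kind_code)
    if match is None:
        raise ValueError(f"unexpected kind code: {kind_code!r}")
    letters, digits = match.groups()
    # Assumption: a bare letter (e.g. 'B') counts as version 0.
    return letters, int(digits) if digits else 0


def pnkl_representatives(kind_codes):
    """Return, for each kind letter, the kind code with the highest number."""
    latest = {}  # kind letter -> (number, kind code)
    for kc in kind_codes:
        letter, number = split_kind_code(kc)
        if letter not in latest or number > latest[letter][0]:
            latest[letter] = (number, kc)
    return [latest[letter][1] for letter in sorted(latest)]


print(split_kind_code("B8"))
print(pnkl_representatives(["A1", "A2", "A3", "B1"]))
```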

6.3.2 What is Available vs What is Searchable

LifeQuest will store every PN-KC combination available to it and make these all viewable in the Full Patent Viewer. However, it will only make the PN-KL documents searchable in the search engine. That is, when LifeQuest documents are searched, hits are organized by kind letter, where an A document represents the latest of the A1, A2, A4, … documents. This simple trick allows us to present documents from authorities that try to keep the same PN for applications and grants (EPO for instance) in the same way as authorities that do not (USPTO for instance).

Available in Full Patent Viewer What is Searched in Search Page Explanation
XX XX The only document available.
A1, A2, A3, B1 A3, B1 The A3 corresponds to the latest A document, and the B1 to the latest B document.
A1, A2, A7, B1, B3, C2 A7, B3, C2 The A7 corresponds to the latest A document, the B3 to the latest B document, and the C2 is the only C document.

6.3.3 Augmentation of Document Information

Basic Augmentation

But what is a patent document? When our providers send us documents, we get the latest version of each document. For instance, there could be several updates to a US1234A4 document. Moreover, some documents are partial, for instance with no description. This sometimes happens when a document goes from US1234A1 to US1234A4: the description might be omitted from the A4 because it is the same as in the A1. We have an augmentation process to make documents complete. Say we have a US1234A7 document with missing claims and description. We first look in the document with the previous kind code, i.e. US1234A4. If the claims are in the A4, we copy the A4 claims to the A7 document. But since US1234A4 does not have a description either, we then look in US1234A1 and copy the A1 description to both the A4 and A7 documents.

This augmentation process is done within Kind Letters, in other words, an A4 document will not be augmented by a section in a B1 document and vice versa.
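
A minimal sketch of this augmentation logic is shown below. It assumes each version of a document is a dictionary of section texts with `None` marking a missing section; the `augment` and `kind_parts` names and the sample data are hypothetical, and the real process may differ in detail.

```python
import re


def kind_parts(kind_code):
    """Return (kind letter, number) for a kind code; a missing number counts as 0."""
    letters, digits = re.fullmatch(r"([A-Z]+)(\d*)", kind_code).groups()
    return letters, int(digits or 0)


def augment(versions):
    """Fill missing sections from earlier versions with the same kind letter.

    `versions` maps a kind code to a dict of section name -> text (or None).
    A new dict is returned; the input is left unchanged.
    """
    result = {kc: dict(sections) for kc, sections in versions.items()}
    last_seen = {}  # (kind letter, section) -> most recent available text
    for kc in sorted(result, key=kind_parts):
        letter, _ = kind_parts(kc)
        for section, text in list(result[kc].items()):
            if text:
                last_seen[(letter, section)] = text
            elif (letter, section) in last_seen:
                # Copy from the nearest earlier version within the same kind letter.
                result[kc][section] = last_seen[(letter, section)]
    return result


docs = {
    "A1": {"claims": "claims v1", "description": "full description"},
    "A4": {"claims": "claims v4", "description": None},
    "A7": {"claims": None, "description": None},
    "B1": {"claims": None, "description": "granted description"},
}
augmented = augment(docs)
print(augmented["A7"])
```

Here the A7 receives its claims from the A4 and its description from the A1, while the B1 claims stay empty because no earlier B document supplies them.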

6.3.4 Advantages and Disadvantages of the PNKL Approach

The major advantages of the kind letter organization:

  • All authorities are viewed the same way. Whether or not an authority changes the PN upon grant makes no difference, since the kind letter changes (mostly from A to B). The USPTO and the EPO are handled identically.
  • Some Chinese documents that have the same PN and different KL are in fact totally unrelated documents. This happened when the two indexing schemas used by SIPO for grants and applications overlapped. These documents now appear separately and are handled as distinct documents since they have different kind letters.

A few disadvantages:

  • Both the application and the grant of the same document appear in the results, which might look redundant. Switch the search from document mode to family mode to alleviate this.
  • Some small authorities using two-letter kind codes do not follow the same schema, so all of their kind codes are shown (this is mostly for Thailand).

6.4 How Patent Families are Handled

The LifeQuest system stores patents with patent-number kind-code level granularity, and makes patents searchable at the patent-number kind-letter (KL) level. For more on this, see 6.3, above. For a given family, every document in the family should be available inside of the database for viewing, and the most recent document within a kind-letter should be searchable.

At this time, our Simple and Extended families are computed by a proprietary algorithm, the logic of which is described in more detail in 6.4.1 and 6.4.2. This algorithm uses a very strict definition of “priority”. For example, US provisional or PCT application numbers meant as priority documents are sometimes present in INID 63 or INID 86 respectively, rather than in the INID 30 (priority) field of a patent document. In this type of situation, the application number field (INID 21) is used in its place as the foundational record for family formation.

6.4.1 Simple Family Definition

LifeQuest currently supports two different family definitions, the Simple Family and the Extended Family. The Simple Family is akin to the SIMPLE FAMILY as described in the WIPO Handbook on Industrial Property Information and Documentation. That is, a Simple Family is a patent family relating to the same invention, each member of which has for the basis of its “priority right” exactly the same originating application or applications.

Simple Family Example

Patent family members          Originating (priority) applications

AT 319372 B CH 13201/72
CH 538767 A CH 13201/72 (first application)
DE 2323735 A1 CH 13201/72
ES 418590 A1 CH 13201/72
FR 2199213 A1 CH 13201/72
FR 2199213 B1 CH 13201/72
JP 49-68240 A2 CH 13201/72
SE 380138 B CH 13201/72
SE 380138 C CH 13201/72
US 3851217 A CH 13201/72
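
Read literally, the Simple Family definition groups documents whose sets of originating applications are identical. The sketch below illustrates that reading only; it is not the proprietary LifeQuest algorithm, the `simple_families` name is hypothetical, and the sample data is a subset of the table above.

```python
from collections import defaultdict


def simple_families(priorities):
    """Group documents that share exactly the same set of priority applications.

    `priorities` maps a document identifier to its priority application numbers.
    """
    groups = defaultdict(list)
    for doc, prios in priorities.items():
        # frozenset makes the grouping independent of the order of priorities.
        groups[frozenset(prios)].append(doc)
    return sorted(sorted(members) for members in groups.values())


priorities = {
    "AT 319372 B": ["CH 13201/72"],
    "DE 2323735 A1": ["CH 13201/72"],
    "FR 2199213 A1": ["CH 13201/72"],
    "US 3851217 A": ["CH 13201/72"],
}
families = simple_families(priorities)
print(families)
```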

6.4.2 Extended Family Definition

“Extended patent family” means a patent family relating to one or more inventions, each member of which has for the basis of its “priority right” at least one originating application in common with at least one other member of the family. For example, note how the US 4670093 A document, which claims priority from both DE 3514301 and DE 3610333, draws two simple families together.

Extended Family Example

Patent family members                   Originating (priority) applications

DE 3610333 A1 DE 3514301
EP 200874 A1 DE 3514301
JP 61-242026 A2 DE 3514301
US 4670093 A DE 3514301, DE 3610333
EP 240776 A1 DE 3610333
JP 62-232129 A2 DE 3610333
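
The Extended Family definition amounts to finding connected groups of documents linked through shared priority applications. The sketch below illustrates this with a small union-find structure over the priorities listed in the table above; it is not the proprietary LifeQuest algorithm, and the `extended_families` name is hypothetical.

```python
def extended_families(priorities):
    """Group documents connected, directly or indirectly, by shared priorities.

    `priorities` maps a document identifier to its priority application numbers.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        parent[find(a)] = find(b)

    for doc, prios in priorities.items():
        find(("doc", doc))  # register documents that list no priority
        for prio in prios:
            union(("doc", doc), ("prio", prio))

    groups = {}
    for doc in priorities:
        groups.setdefault(find(("doc", doc)), []).append(doc)
    return sorted(sorted(members) for members in groups.values())


priorities = {
    "DE 3610333 A1": ["DE 3514301"],
    "EP 200874 A1": ["DE 3514301"],
    "JP 61-242026 A2": ["DE 3514301"],
    "US 4670093 A": ["DE 3514301", "DE 3610333"],
    "EP 240776 A1": ["DE 3610333"],
    "JP 62-232129 A2": ["DE 3610333"],
}
families = extended_families(priorities)
print(len(families), len(families[0]))

# Without the US document, the listed priorities no longer connect the two groups.
without_us = {d: p for d, p in priorities.items() if d != "US 4670093 A"}
split = extended_families(without_us)
print(len(split))
```

With the priorities exactly as listed, all six documents form one extended family; removing US 4670093 A leaves two separate groups in this sketch.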