Artificial Intelligence Driven Knowledge Graph Generation

Published: May 20, 2025

Assignee: Not available in the USPTO data we have

Inventors: Sanat MOHANTY, Omkar Krishnat PATIL

Patent Claims

20 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A data processing system comprising: a processor; and a machine-readable medium storing executable instructions that, when executed, cause the processor alone or in combination with other processors to perform operations comprising: accessing content items from a plurality of data sources associated with pharmaceutical development, medical device development, or both; generating a knowledge graph by analyzing each of the content items with a first language model to obtain embedding vectors representing each first content items, the embedding vectors representing one or more categories of information associated with each of the content items; receiving a query and an indication of a format for results of the query from a first client device, the query identifying one or more categories of information to search for using the knowledge graph, and the indication of the format for the results of the query indicating a format in which results of the query are to be presented; generating query embeddings for the query using the first language model; searching the knowledge graph based on the query embeddings to obtain the results of the query; generating a representation of the results of the query according to the indication of the format for the results of the query; and causing the first client device to present the representation of the results of the query on a user interface of the first client device.

2. The data processing system of claim 1, wherein the content items include one or more of press releases, news articles, documents submitted to regulatory agencies both domestically and internationally, journal articles and/or other publications, abstracts of publications, published patent applications and issued patents, financial filings, and analyst call transcripts.

3. The data processing system of claim 1, wherein the first language model is a Large Language Model (LLM) or Small Language Model (SLM), the first language model having an encoder-decoder architecture.

4. The data processing system of claim 1, wherein the representation of the results of the query comprises a graphical representation of results of the query providing a visualization of the results of the query.

5. The data processing system of claim 1, wherein searching the knowledge graph based on the query embeddings to obtain the results of the query comprises searching the knowledge graph using a vector search.

6. The data processing system of claim 1, wherein the machine-readable medium includes instructions configured to cause the processor alone or in combination with other processors to perform operations of, prior to generating the knowledge graph, extracting information from the content items and converting the information to a standard format used to train the first language model.

7. The data processing system of claim 1, wherein the machine-readable medium includes instructions configured to cause the processor alone or in combination with other processors to perform operations of: automatically checking the plurality of data sources for new content items, updated content items, or both; and automatically updating the knowledge graph by analyzing the new content items, the updated content items, or both using the first language model.

8. The data processing system of claim 7, wherein the machine-readable medium includes instructions configured to cause the processor alone or in combination with other processors to perform operations of: causing the first client device to present a source configuration user interface that includes a control for configuring a frequency at which the automatic checks for a selected data source of the plurality of data sources is checked for new content items, updated content items, or both; receiving an indication of a first frequency from the first client device; and automatically checking the selected data source for new content items, updated content items, or both according to the first frequency.

9. The data processing system of claim 1, wherein the generating the knowledge graph includes associating each content item with content item source information that provides an indication of the data source from the plurality of data sources from which the content item can be obtained.

10. The data processing system of claim 1, wherein the representation of the results of the query include controls, which when activated, cause the client device to present content source information associated with each of the content items from which the representation is derived.

11. The data processing system of claim 1, wherein generating the knowledge graph further comprises generating connection information indicating connections between the content items and known biomedical entities.

12. The data processing system of claim 11, wherein generating the connection information further comprises: generating a final confidence score by combining a similarity score associated with a first content item and a first known biomedical entity generating by a vector search algorithm and with an output of a last neural layer of the vector search algorithm; and creating a connection between the first content item and the first known biomedical entity in response to the final confidence score indicating that a confidence score matrix indicating that the first content item includes a reference to the first known biomedical entity.

13. A method implemented in a data processing system for analyzing content items, the method comprising: accessing content items from a plurality of data sources associated with pharmaceutical development, medical device development, or both; generating a knowledge graph by analyzing each of the content items with a first language model to obtain embedding vectors representing each first content items, the embedding vectors representing one or more categories of information associated with each of the content items; receiving a query and an indication of a format for results of the query from a first client device, the query identifying one or more categories of information to search for using the knowledge graph, and the indication of the format for the results of the query indicates a format in which results of the query are to be presented; generating query embeddings for the query using the first language model; searching the knowledge graph based on the query embeddings to obtain the results of the query; generating a representation of the results of the query according to the indication of the format for the results of the query; and causing the first client device to present the representation of the results of the query on a user interface of the first client device.

14. The method of claim 13, wherein the content items include one or more of press releases, news articles, documents submitted to regulatory agencies both domestically and internationally, journal articles and/or other publications, abstracts of publications, published patent applications and issued patents, financial filings, and analyst call transcripts.

15. A data processing system comprising: a processor; and a machine-readable medium storing executable instructions that, when executed, cause the processor alone or in combination with other processors to perform operations comprising: obtaining, using a data access unit, known biomedical entity information for a plurality of known biomedical entities, the known biomedical entities comprising one or more of disease information, biomarker information, or mechanisms of action information; selecting, using the data access unit, information for a first biomedical entity from among the plurality of known biomedical entities as a first query parameter; querying, using the data access unit, content items from one or more data sources associated with pharmaceutical development, medical device development, or both; compressing, using the data compression unit, the first query parameter and the each of the content items using a first compression algorithm to create compressed content entries; analyzing, a candidate entity selection unit, the compressed content entries to select candidate content entries to be added to a knowledge graph; constructing, using a graph construction unit, the knowledge graph using the candidate content entries; and validating, using a graph validation unit, the knowledge graph to identify and remove discrepancies from the knowledge graph.

16. The data processing system of claim 15, wherein the biomedical entity information includes one or more of disease information, biomarker information, or mechanisms of action information.

17. The data processing system of claim 15, wherein the first compression algorithm is one of a lossless compression algorithm or a lossy compression algorithm.

18. The data processing system of claim 15, wherein compressing the first query parameter and the each of the content items using the first compression algorithm to create compressed content entries further comprises: compressing the first query parameter and a first content item using the first compression algorithm to create a parameter and content item compressed entry; compressing the first query parameter using the first compression algorithm to create a query parameter compressed entry; compressing the first content item using the first compression algorithm to create a content item compressed entry.

19. The data processing system of claim 18, wherein analyzing the compressed content entries to select candidate content entries further comprises: determining a normalized compression distances between the query and parameter content item and the query parameter compressed entry and between the query and parameter content item and the content item compressed entry; selecting the first content item as a candidate content entry responsive to the normalized compression distances satisfying a predetermined threshold; and analyzing the candidate content entries with a encoder-based neural network to determine whether each of the candidate content entries are valid matches.

20. The data processing system of claim 15, wherein validating the knowledge graph to identify and remove discrepancies from the knowledge graph further comprises: analyzing the knowledge graph using a transformer-based algorithm to identify discrepancies in associations between biomedical entities and content items in the knowledge graph.
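Claims 1 and 5 recite searching the knowledge graph with query embeddings using a vector search. The claims do not name a similarity measure or an index structure, so the following is only a minimal sketch that assumes cosine similarity over a brute-force scan. The node layout, the three-dimensional toy embeddings, and the function names are hypothetical and stand in for whatever the first language model and graph store would actually provide.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def vector_search(query_embedding, graph_nodes, top_k=2):
    """Return the top_k nodes ranked by similarity to the query embedding."""
    scored = [
        (cosine_similarity(query_embedding, node["embedding"]), node)
        for node in graph_nodes
    ]
    # Sort on the score only, so the node dicts are never compared.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [
        {"id": node["id"], "source": node["source"], "score": score}
        for score, node in scored[:top_k]
    ]


# Hypothetical graph nodes: each content item keeps its embedding and the
# data source it came from (compare claim 9).
GRAPH_NODES = [
    {"id": "press-release-1", "source": "press releases", "embedding": [0.9, 0.1, 0.0]},
    {"id": "journal-article-7", "source": "journal articles", "embedding": [0.1, 0.9, 0.1]},
    {"id": "financial-filing-3", "source": "financial filings", "embedding": [0.0, 0.2, 0.9]},
]

# Hypothetical query embedding; a real system would obtain it from the same
# language model that embedded the content items.
results = vector_search([0.8, 0.2, 0.0], GRAPH_NODES, top_k=2)
for hit in results:
    print(hit["id"], hit["source"], round(hit["score"], 3))
```

A brute-force scan is enough to show the idea. A production system would more likely use an approximate nearest-neighbor index, which the claims neither require nor exclude.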
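Claims 7 and 8 recite automatically checking each data source at a frequency configured from the client device. One way this could look is sketched below, assuming a per-source interval and a caller that supplies the current time. The `SourceSchedule` class, its method names, and the source labels are hypothetical; the claims do not describe a particular scheduler.

```python
from datetime import datetime, timedelta


class SourceSchedule:
    """Tracks a check interval per data source and reports which sources are due."""

    def __init__(self):
        self._intervals = {}      # source name -> timedelta between checks
        self._last_checked = {}   # source name -> datetime of the last check

    def set_frequency(self, source, interval):
        """Store the interval chosen in the source configuration user interface."""
        if interval <= timedelta(0):
            raise ValueError("interval must be positive")
        self._intervals[source] = interval

    def due_sources(self, now):
        """Sources never checked, or last checked at least one interval ago."""
        due = []
        for source, interval in self._intervals.items():
            last = self._last_checked.get(source)
            if last is None or now - last >= interval:
                due.append(source)
        return due

    def mark_checked(self, source, now):
        """Record that a source was checked at the given time."""
        self._last_checked[source] = now


schedule = SourceSchedule()
schedule.set_frequency("regulatory-filings", timedelta(hours=1))
schedule.set_frequency("journal-abstracts", timedelta(days=1))

start = datetime(2025, 1, 1, 0, 0)
first_pass = schedule.due_sources(start)       # nothing checked yet, so both are due
for name in first_pass:
    schedule.mark_checked(name, start)

# Two hours later only the hourly source is due again.
second_pass = schedule.due_sources(start + timedelta(hours=2))
print(first_pass, second_pass)
```

Passing `now` in explicitly keeps the sketch deterministic. A real deployment would call `due_sources` from a timer or job queue and then re-analyze any new or updated content items with the language model.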
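Claims 18 and 19 recite compressing the query parameter, the content item, and the two together, then comparing normalized compression distances against a threshold. The sketch below assumes the commonly used definition NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), with zlib as the compressor and "at or below the threshold" as the selection rule. All three choices are assumptions; the claims do not fix the formula, the compression algorithm, or the direction of the threshold test. The sample texts are invented for illustration.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (lossless, maximum level)."""
    return len(zlib.compress(data, 9))


def normalized_compression_distance(x: bytes, y: bytes) -> float:
    """Assumed NCD formula: near 0 for very similar inputs, near 1 for unrelated ones."""
    c_x = compressed_size(x)        # query parameter compressed entry
    c_y = compressed_size(y)        # content item compressed entry
    c_xy = compressed_size(x + y)   # parameter and content item compressed entry
    return (c_xy - min(c_x, c_y)) / max(c_x, c_y)


def select_candidates(query_parameter: str, content_items: dict, threshold: float):
    """Keep items whose distance to the query parameter is at or below the threshold."""
    candidates = []
    for item_id, text in content_items.items():
        distance = normalized_compression_distance(
            query_parameter.encode("utf-8"), text.encode("utf-8")
        )
        if distance <= threshold:
            candidates.append((item_id, distance))
    return sorted(candidates, key=lambda pair: pair[1])


# Invented sample data: one item repeats the query text, the other is unrelated.
query = "non-small cell lung cancer biomarker PD-L1 expression " * 10
items = {
    "same-topic": query,
    "unrelated": "quarterly dividend guidance for shareholders and analysts " * 10,
}

d_same = normalized_compression_distance(query.encode("utf-8"), items["same-topic"].encode("utf-8"))
d_other = normalized_compression_distance(query.encode("utf-8"), items["unrelated"].encode("utf-8"))
print(round(d_same, 3), round(d_other, 3))

# A threshold halfway between the two distances keeps only the related item.
selected = select_candidates(query, items, threshold=(d_same + d_other) / 2)
print(selected)
```

General-purpose compressors add fixed overhead, so distances on very short strings are noisy. Claim 19 follows this screening step with an encoder-based neural network to decide whether each candidate is a valid match, which this sketch does not attempt.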

Patent Metadata

Filing Date

Unknown

Publication Date

May 20, 2025

Inventors

Sanat MOHANTY

Omkar Krishnat PATIL
