A method in one embodiment comprises receiving as inputs a plurality of application programming interface (API) documents respectively configured as electronic files; analyzing the electronic files via a first classifier to determine whether one or more schema types are present in them; generating a first classification based on the schema types for the electronic files determined to include the schema types; analyzing, via a second classifier, the electronic files determined not to include the schema types, where the second classifier uses one or more predictive classifiers generated via one or more machine learning techniques; generating a second classification based on the predictive classifiers for those files; and outputting a plurality of classified API documents based on the first and second classifications.
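The two-stage flow summarized above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation. The first classifier here recognizes WSDL and CSDL documents by the XML namespace of their root element and MOF documents by a simple text pattern (the three schema types named in the claims). The second classifier is any caller-supplied function. The namespace sets, the regular expression, and the names `detect_schema_type` and `classify_documents` are hypothetical choices made for illustration.

```python
import re
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Optional, Tuple

# Assumed heuristics: root-element namespaces that mark WSDL and CSDL documents.
WSDL_NAMESPACES = {
    "http://schemas.xmlsoap.org/wsdl/",  # WSDL 1.1
    "http://www.w3.org/ns/wsdl",  # WSDL 2.0
}
CSDL_NAMESPACES = {
    "http://docs.oasis-open.org/odata/ns/edmx",  # OData v4 EDMX wrapper
    "http://docs.oasis-open.org/odata/ns/edm",  # OData v4 CSDL Schema element
}
# Assumed heuristic: MOF is plain text, so look for a line starting with a
# class declaration, an "instance of" declaration, or a #pragma directive.
MOF_PATTERN = re.compile(
    r"^\s*(class\s+\w+|instance\s+of\s+\w+|#pragma\s+\w+)", re.MULTILINE
)


def detect_schema_type(text: str) -> Optional[str]:
    """First classifier (sketch): return "WSDL", "CSDL", "MOF", or None."""
    try:
        # Note: the stdlib XML parser is not hardened against malicious input.
        root = ET.fromstring(text)
    except ET.ParseError:
        root = None
    if root is not None:
        if root.tag.startswith("{"):
            namespace = root.tag[1:].split("}", 1)[0]
            if namespace in WSDL_NAMESPACES:
                return "WSDL"
            if namespace in CSDL_NAMESPACES:
                return "CSDL"
        return None  # well-formed XML, but not a recognized schema type
    if MOF_PATTERN.search(text):
        return "MOF"
    return None


def classify_documents(
    documents: Dict[str, str],
    predictive_classifier: Callable[[str], str],
) -> List[Tuple[str, str, str]]:
    """Return (name, label, stage) tuples; stage is "schema" or "predictive"."""
    results = []
    for name, text in documents.items():
        schema_type = detect_schema_type(text)
        if schema_type is not None:
            # First classification: a schema type was found.
            results.append((name, schema_type, "schema"))
        else:
            # Negative determination: fall back to the predictive classifier.
            results.append((name, predictive_classifier(text), "predictive"))
    return results


if __name__ == "__main__":
    docs = {
        "service.wsdl": '<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"/>',
        "model.mof": "class CIM_Example : CIM_ManagedElement\n{\n  string Name;\n};",
        "spec.json": '{"openapi": "3.0.0"}',
    }

    def keyword_rule(text: str) -> str:
        # Hypothetical stand-in for a trained machine learning model.
        return "OpenAPI" if "openapi" in text.lower() else "non-API"

    for row in classify_documents(docs, keyword_rule):
        print(row)
```

Passing the second classifier in as a function keeps the schema check and the machine learning stage independent, so a trained model can replace the keyword stand-in without changing the routing logic.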
1. An apparatus comprising: at least one processing platform comprising a plurality of processing devices; said at least one processing platform being configured to: receive as inputs a plurality of application programming interface (API) documents, wherein the plurality of API documents are respectively configured as electronic files; analyze the electronic files via a first classifier to determine whether one or more schema types are present in the electronic files; generate a first classification based on the one or more schema types for a first set of the electronic files determined to include the one or more schema types; analyze a second set of the electronic files via a second classifier responsive to a negative determination regarding whether the one or more schema types are present in the second set of the electronic files, wherein the analyzing via the second classifier is performed using one or more predictive classifiers generated via one or more machine learning techniques; generate a second classification based on the one or more predictive classifiers for the second set of the electronic files; and output a plurality of classified API documents based on the first and second classifications; wherein the one or more schema types comprise at least one of a managed object schema type, a web service schema type, and a conceptual definition schema type.
2. The apparatus of claim 1 wherein the electronic files are respectively configured in a plurality of formats.
3. The apparatus of claim 2 wherein the electronic files comprise metadata content for API data.
4. The apparatus of claim 1 wherein the managed object schema type comprises managed object format (MOF), the web service schema type comprises web services description language (WSDL), and the conceptual definition schema type comprises conceptual schema definition language (CSDL).
5. The apparatus of claim 1 wherein said at least one processing platform is further configured to validate and parse the one or more schema types.
6. The apparatus of claim 1 wherein the predictive classifiers comprise a plurality of API classification vectors.
7. The apparatus of claim 6 wherein said at least one processing platform is further configured to generate the plurality of API classification vectors using term frequency-inverse document frequency (TF-IDF) vectorization.
8. The apparatus of claim 6 wherein said at least one processing platform is further configured to invoke an artificial neural network model to classify the plurality of API classification vectors into API formats and reject non-API data.
9. The apparatus of claim 1 wherein said at least one processing platform is further configured to generate a training data set for the second classifier by training a neural network on a customized text corpus.
10. The apparatus of claim 9 wherein the customized text corpus is dynamic.
11. The apparatus of claim 1 wherein the plurality of classified API documents comprise API reference documents comprising API definitions.
12. The apparatus of claim 1 wherein the plurality of classified API documents comprise API reference documents comprising metadata.
13. A method comprising: receiving as inputs a plurality of application programming interface (API) documents, wherein the plurality of API documents are respectively configured as electronic files; analyzing the electronic files via a first classifier to determine whether one or more schema types are present in the electronic files; generating a first classification based on the one or more schema types for a first set of the electronic files determined to include the one or more schema types; analyzing a second set of the electronic files via a second classifier responsive to a negative determination regarding whether the one or more schema types are present in the second set of the electronic files, wherein the analyzing via the second classifier is performed using one or more predictive classifiers generated via one or more machine learning techniques; generating a second classification based on the one or more predictive classifiers for the second set of the electronic files; and outputting a plurality of classified API documents based on the first and second classifications; wherein the one or more schema types comprise at least one of a managed object schema type, a web service schema type, and a conceptual definition schema type; and wherein the method is performed by at least one processing platform comprising at least one processing device comprising a processor coupled to a memory.
14. The method of claim 13 wherein the managed object schema type comprises managed object format (MOF), the web service schema type comprises web services description language (WSDL), and the conceptual definition schema type comprises conceptual schema definition language (CSDL).
15. The method of claim 13 further comprising validating and parsing the one or more schema types.
16. The method of claim 13 wherein the predictive classifiers comprise a plurality of API classification vectors.
17. The method of claim 16 further comprising generating the plurality of API classification vectors using term frequency-inverse document frequency (TF-IDF) vectorization.
18. The method of claim 16 further comprising invoking an artificial neural network model to classify the plurality of API classification vectors into API formats and reject non-API data.
19. The method of claim 13 further comprising generating a training data set for the second classifier by training a neural network on a dynamic customized text corpus.
20. A computer program product comprising a non-transitory processor-readable storage medium having stored therein program code of one or more software programs, wherein the program code when executed by at least one processing platform causes said at least one processing platform to: receive as inputs a plurality of application programming interface (API) documents, wherein the plurality of API documents are respectively configured as electronic files; analyze the electronic files via a first classifier to determine whether one or more schema types are present in the electronic files; generate a first classification based on the one or more schema types for a first set of the electronic files determined to include the one or more schema types; analyze a second set of the electronic files via a second classifier responsive to a negative determination regarding whether the one or more schema types are present in the second set of the electronic files, wherein the analyzing via the second classifier is performed using one or more predictive classifiers generated via one or more machine learning techniques; generate a second classification based on the one or more predictive classifiers for the second set of the electronic files; and output a plurality of classified API documents based on the first and second classifications; wherein the one or more schema types comprise at least one of a managed object schema type, a web service schema type, and a conceptual definition schema type.
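Claims 7 and 8 (and 17 and 18) recite generating API classification vectors by TF-IDF vectorization and then classifying those vectors into API formats while rejecting non-API data. The Python sketch below illustrates the vectorization step with a small hand-written TF-IDF (smoothed IDF, L2 normalization). In place of the claimed artificial neural network, it deliberately swaps in a simpler nearest-centroid rule with a similarity threshold so that the rejection of non-API input is easy to follow. The tokenizer, the 0.2 threshold, the class names, and the toy corpus are assumptions for illustration, not details taken from the patent.

```python
import math
import re
from collections import Counter
from typing import Dict, List

Vector = Dict[str, float]
TOKEN_PATTERN = re.compile(r"[a-z0-9_]+")


def tokenize(text: str) -> List[str]:
    """Lowercase and split into alphanumeric tokens (an assumed tokenizer)."""
    return TOKEN_PATTERN.findall(text.lower())


def l2_normalize(vec: Vector) -> Vector:
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm > 0 else {}


def dot(a: Vector, b: Vector) -> float:
    """Dot product; equals cosine similarity for L2-normalized vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(t, 0.0) for t, v in a.items())


class SimpleTfidf:
    """TF-IDF with smoothed IDF: idf(t) = ln((1 + N) / (1 + df(t))) + 1."""

    def fit(self, corpus: List[str]) -> "SimpleTfidf":
        doc_freq: Counter = Counter()
        for text in corpus:
            doc_freq.update(set(tokenize(text)))  # count each term once per doc
        n_docs = len(corpus)
        self.idf = {
            t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()
        }
        return self

    def transform(self, text: str) -> Vector:
        # Out-of-vocabulary tokens are ignored.
        counts = Counter(t for t in tokenize(text) if t in self.idf)
        total = sum(counts.values())
        if total == 0:
            return {}
        return l2_normalize({t: (c / total) * self.idf[t] for t, c in counts.items()})


class CentroidApiClassifier:
    """Nearest-centroid stand-in for the claimed neural network classifier."""

    def __init__(self, threshold: float = 0.2) -> None:
        self.threshold = threshold  # minimum similarity to accept as API data

    def fit(self, texts: List[str], labels: List[str]) -> "CentroidApiClassifier":
        self.vectorizer = SimpleTfidf().fit(texts)
        sums: Dict[str, Counter] = {}
        for text, label in zip(texts, labels):
            sums.setdefault(label, Counter()).update(self.vectorizer.transform(text))
        self.centroids = {label: l2_normalize(dict(s)) for label, s in sums.items()}
        return self

    def predict(self, text: str) -> str:
        vec = self.vectorizer.transform(text)
        best_label, best_score = "non-API", 0.0
        for label, centroid in self.centroids.items():
            score = dot(vec, centroid)
            if score > best_score:
                best_label, best_score = label, score
        # Reject documents that are not close enough to any known API format.
        return best_label if best_score >= self.threshold else "non-API"


if __name__ == "__main__":
    # Toy, made-up training snippets for illustration only.
    texts = [
        "openapi: 3.0.0 paths: /users get responses",
        "openapi: 3.1.0 info title paths /orders post",
        "#%RAML 1.0 title: Users API /users: get:",
        "#%RAML 1.0 baseUri: https://example.com /orders: post:",
    ]
    labels = ["OpenAPI", "OpenAPI", "RAML", "RAML"]
    clf = CentroidApiClassifier().fit(texts, labels)
    for sample in ["openapi paths responses", "raml baseuri", "lorem ipsum dolor"]:
        print(sample, "->", clf.predict(sample))
```

A trained neural network, as the claims describe, would take the place of the nearest-centroid rule and consume the same TF-IDF vectors as input features. The threshold stands in for whatever rejection criterion such a model would apply to non-API data.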
July 20, 2018
September 29, 2020