Legal claims defining the scope of protection, as filed with the USPTO.
1. A computer system comprising: a plurality of processors for managing capture and processing of job listing information data captured through a data network from a plurality of job related sources for compilation into a searchable data structure; an administrative portal module implemented on one of the plurality of processors for providing system administration and operational control through a network interface; a scraping management module implemented on one of the plurality of processors for coordinating operation of and communication between one or more job scraping engines to obtain scraped job information data sets from corporate career sites and job boards identified by a site management module in the administrative portal module and store the scraped data sets in a database, each scraped data set comprising data fields; a quality management module implemented on one of the plurality of processors coupled to the scraping management module for comparing the data fields of each scraped job data set stored in the database with predetermined quality rules, wherein the quality rules include document rules, and wherein if the data set fails one or more of the document rules, the data set will not be indexed in the database and the data set will be flagged for quality review; a job listing data categorization module implemented on one of the plurality of processors operable to examine and categorize each job data set stored in the database into one or more predetermined job categories based upon a volume of the scraped job data sets and return categorized job data sets to the database; and a search bank synchronizer implemented on one of the plurality of processors for communicating with the database for compiling and transferring categorized job data sets from the database to a job search bank.
2. The system according to claim 1 wherein the data network includes the Internet.
3. The system according to claim 1 wherein the categorization module comprises: a job categorization database; and a categorization module determining a confidence value in each predetermined job category for each scraped job listing information data set by comparing text of each scraped job data set with previously categorized job data text in the job categorization database.
4. The system according to claim 3 wherein the administration portal comprises a categorization review module permitting a reviewer to verify categorizations determined by the document categorization platform service in the categorization module.
5. The system according to claim 3 wherein each job data set returned to the database includes an assigned job category determined by the categorization module and an assigned confidence value for that category.
6. The system according to claim 5 wherein each data set returned to the database further includes a confidence value for each predetermined job category.
7. The system according to claim 3 wherein each job data set returned to the database includes a manual review flag set if the assigned confidence value is below a predetermined threshold value.
8. The system according to claim 7 wherein the administration portal comprises a categorization review module permitting a reviewer to verify categorizations determined by the categorization module.
9. The system according to claim 1 wherein the administrative portal further comprises a quality review module communicating with the quality management module permitting a reviewer to manually examine job data sets that have been flagged.
10. A method of obtaining, handling and compiling job information data sets comprising: scraping, by a processor, job information data sets from one or more job listings on one or more corporate career sites or job boards available through the Internet; storing, by the processor, a job data set corresponding to each scraped job listing found in a database, each job data set comprising data fields; comparing, by the processor, each data field of each scraped data set stored in the database with predetermined quality criteria, wherein the quality criteria include document rules, and wherein if the data set fails one or more of the document rules, the data set will not be indexed in the database and the data set will be flagged for quality review; categorizing, by the processor, each data set stored in the database into one or more predetermined job categories based upon a volume of the scraped data sets and returning the categorized job information data set to the database; and communicating, by the processor, with the database for compiling and transferring categorized job data sets from the database to a job search bank.
11. The method according to claim 10 further comprising obtaining job information data sets from one or more of customer sites through an XML feed.
12. The method according to claim 10 wherein the categorizing operation further comprises: assigning a confidence value for each job information data set for each of the predetermined job categories.
13. The method according to claim 10 wherein the categorizing operation comprises: comparing text of each scraped job information data set with text of previously categorized job information data sets in a job categorization database; and determining a confidence value in each predetermined category for each scraped data set.
14. The method according to claim 13 further comprising: flagging each categorized scraped data set that has a confidence value below a predetermined value for manual review; and providing a manual review module permitting a reviewer to verify any flagged categorizations through an administration portal.
15. The method according to claim 10 further comprising assigning a confidence value for the job category assigned to each data set returned to the database.
16. The method according to claim 15 further comprising flagging any data set returned to the database having an assigned confidence level below a predetermined threshold.
17. The method according to claim 10 further comprising: transferring selected categorized job information data sets from the job search bank through a web client server cluster to a job seeker in response to a query by the job seeker.
18. The method according to claim 10 wherein the scraping operation further comprises: accessing one of the job boards or corporate career sites through the Internet; flagging any scraped job information data set comprising data fields not meeting the predetermined quality criteria; permitting a manual review of flagged job information data sets returned to the database, and the categorizing operation further comprises: comparing data in each scraped job information data set with previously categorized job data set data in a categorization database; and determining a confidence value in each predetermined job category for each scraped job information data set.
19. The method according to claim 18 further comprising: flagging each categorized scraped data set that has a confidence value below a predetermined value for manual review; and providing a manual review module in an administration portal permitting a reviewer to verify any flagged categorizations.
20. The method according to claim 18 further comprising transferring selected categorized data sets from the search bank through a web server to a user in response to a query by the user.
21. A computer readable medium tangibly encoding a computer program of instructions for executing a computer process for scraping job description data from corporate career sites and job boards, the computer process comprising: scraping listing information data from one or more listings on sites available through the Internet; storing a scraped data set corresponding to each scraped listing information in a database, each scraped data set comprising data fields; comparing data fields of each scraped data set stored in the database with predetermined quality criteria, wherein the quality criteria include document rules, and wherein if the data set fails one or more of the document rules, the data set will not be indexed in the database and the data set will be flagged for quality review; categorizing each data set stored in the database into one or more predetermined categories based upon a volume of the scraped data sets and returning the categorized data set to the database; and communicating with the database for compiling and transferring categorized job data sets from the database to a job search bank.
22. The computer readable medium of claim 21 wherein the process further comprises: flagging any scraped data set comprising data fields not meeting the predetermined quality criteria; permitting a manual review of flagged data sets returned to the database, and wherein the categorizing operation further comprises: comparing text in each scraped data set with previously categorized data set text in a categorization database; and determining a confidence value in each predetermined category for each scraped data set.
23. The method of claim 10 wherein the categorizing each data set stored in the database into one or more predetermined job categories further comprises choosing the value c′ for category c that maximizes p(c|x), expressed as c′=arg max c p(c|x) where x is a feature vector of the data set and p(c|x) is a conditional probability.
24. The method of claim 23 wherein the categorizing each data set stored in the database into one or more predetermined job categories further comprises computing a discriminant function d(x,c).
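As an illustration only, and not as the claimed implementation: claims 23 and 24 recite choosing c′ = arg max c p(c|x) and computing a discriminant function d(x,c), while claims 7 and 16 recite flagging a data set whose assigned confidence value falls below a predetermined threshold. The claims do not name a particular model, so the sketch below assumes a multinomial naive Bayes classifier with Laplace smoothing, where d(x,c) is taken to be the log of p(c) times the word likelihoods. The class name `NaiveBayesCategorizer`, the function `categorize`, the `description` field, and the 0.8 threshold are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; the claims do not specify one)."""
    return text.lower().split()


class NaiveBayesCategorizer:
    """Hypothetical stand-in for the job categorization database plus classifier."""

    def __init__(self, labeled_examples):
        # labeled_examples: iterable of (text, category) pairs that were
        # previously categorized, as in claims 3 and 13.
        self.doc_counts = Counter()   # number of training texts per category
        self.word_counts = {}         # category -> Counter of token frequencies
        self.vocab = set()
        for text, category in labeled_examples:
            self.doc_counts[category] += 1
            counts = self.word_counts.setdefault(category, Counter())
            for token in tokenize(text):
                counts[token] += 1
                self.vocab.add(token)

    def discriminant(self, tokens, category):
        """d(x, c) = log p(c) + sum of log p(token | c), with Laplace smoothing."""
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[category] / total_docs)
        counts = self.word_counts[category]
        denom = sum(counts.values()) + len(self.vocab)
        for token in tokens:
            score += math.log((counts[token] + 1) / denom)
        return score

    def posteriors(self, text):
        """Return p(c | x) for every category by normalizing exp(d(x, c))."""
        tokens = tokenize(text)
        scores = {c: self.discriminant(tokens, c) for c in self.doc_counts}
        top = max(scores.values())  # subtract the max for numerical stability
        weights = {c: math.exp(s - top) for c, s in scores.items()}
        total = sum(weights.values())
        return {c: w / total for c, w in weights.items()}


def categorize(job, categorizer, threshold=0.8):
    """Assign c' = arg max_c p(c | x) and set a manual review flag if confidence is low."""
    confidences = categorizer.posteriors(job["description"])
    best = max(confidences, key=confidences.get)
    return {
        **job,
        "category": best,
        "confidence": confidences[best],
        "confidences": confidences,  # one value per category, as in claim 6
        "manual_review": confidences[best] < threshold,
    }
```

In this sketch the posterior for the winning category doubles as the assigned confidence value, so a data set whose text matches several categories about equally well would be flagged for manual review.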
April 20, 2010