Legal claims defining the scope of protection, as filed with the USPTO.
1. A method for flash-optimized data layout of a dataset for queries, wherein the dataset comprises a table having a plurality of columns, the method comprising:
    storing, by a processor, selection columns in flash memory according to a selection optimized layout, the selection optimized layout being configured to optimize predicate matching and data skipping;
    wherein the selection optimized layout, for each selection column, is formed by:
        storing a selection column dictionary filled with unique data values in a given selection column, the unique data values stored in sorted order in the selection column dictionary; and
        storing row position designations corresponding to each row position that the unique data values are present within the given selection column, without duplicating storage of any of the unique data values that occur more than once in the given selection column;
    storing projection columns in the flash memory according to a projection optimized layout, the projection optimized layout being configured to optimize random accesses to the projection columns;
    wherein the projection optimized layout, for each projection column, is formed by:
        storing a projection column dictionary filled with unique data values in a given projection column, the unique data values in the given projection column stored in sorted order in the projection column dictionary; and
        storing a lookup structure comprising an index per row position to each of the unique data values in the given projection column, without duplicating storage of any of the unique data values that occur more than once in the given projection column;
    storing hybrid projection and selection columns in the flash memory according to a hybrid projection and selection optimized layout, wherein each hybrid projection and selection column has a sorted order such that duplicative data values in a given hybrid projection and selection column are arranged in consecutive rows within that hybrid projection and selection column;
    wherein the hybrid projection and selection optimized layout, for each hybrid projection and selection column and without duplicating storage of any of the duplicative data values in the given hybrid projection and selection column, is formed by:
        storing a hybrid projection and selection column dictionary filled with one of each duplicative data value in the given hybrid projection and selection column, the ones of each duplicative data values stored in sorted order in the hybrid projection and selection column dictionary;
        storing a combined lookup and row position designation structure, the combined lookup and row position designation structure comprising, for each data value stored in the hybrid projection and selection column dictionary, a pair of row position designations corresponding to a first row position and a last row position that the given data value is present within the given hybrid projection and selection column, the combined lookup and row position designation structure further comprising, for each pair of row position designations, a pointer to a corresponding data value in the hybrid projection and selection column dictionary; and
        storing a plurality of offsets, each offset corresponding to a data value in the hybrid projection and selection column dictionary and serving to indicate a location of a first row position of a given pair of row position designations corresponding to that data value;
    determining, for each column of the plurality of columns and based on selection part versus projection part query usage statistics for that column, whether to classify that column for storage in the flash memory as one of the selection columns, one of the projection columns, or one of the hybrid projection and selection columns;
    receiving a query to be run against the dataset, the query comprising a projection part and a selection part, the selection part including a predicate to be matched to data values in a first hybrid projection and selection column according to rules in the selection part and the projection part including a requirement to retrieve particular data from a second hybrid projection and selection column to use in answering the query;
    executing the selection part by:
        matching the predicate to a first data value in a first hybrid projection and selection column dictionary for the first hybrid projection and selection column;
        identifying among a first plurality of offsets for data values in the first hybrid projection and selection column dictionary a first offset that corresponds to the first data value; and
        locating, using the first offset and within a first combined lookup and row position designation structure for the first hybrid projection and selection column, a first pair of row position designations corresponding to a first row position and a last row position that the first data value is present within the first hybrid projection and selection column; and
    executing the projection part by:
        identifying, using the first pair of row position designations, a second pair of row position designations in a second combined lookup and row position designation structure for the second hybrid projection and selection column, wherein the second pair row position designations overlaps the first pair of row position designations;
        locating, using a second pointer in the second combined lookup and row position designation structure corresponding with the second pair row position designations, a second data value in a second projection and selection column dictionary that is present within the second pair of row positions in the second hybrid projection and selection column; and
        using the second data value to answer to the query.
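The three layouts and the two-step query execution recited in claim 1 can be illustrated with a small in-memory sketch. This is not the patented implementation and says nothing about how the structures are placed on flash. All names (`selection_layout`, `projection_layout`, `HybridColumn`, `build_hybrid`, `select_eq`, `project`) and the sample data are hypothetical. The sketch assumes an equality predicate and assumes each value in a hybrid column occupies a single contiguous run of rows.

```python
from bisect import bisect_left
from dataclasses import dataclass


def selection_layout(column):
    """Sorted dictionary plus, per unique value, the row positions holding it."""
    dictionary = sorted(set(column))
    positions = {value: [] for value in dictionary}
    for row, value in enumerate(column):
        positions[value].append(row)
    return dictionary, [positions[value] for value in dictionary]


def projection_layout(column):
    """Sorted dictionary plus one dictionary index per row position."""
    dictionary = sorted(set(column))
    code = {value: i for i, value in enumerate(dictionary)}
    return dictionary, [code[value] for value in column]


@dataclass
class HybridColumn:
    dictionary: list  # one copy of each value, in sorted order
    runs: list        # (first_row, last_row, dictionary_index), in row order
    offsets: list     # offsets[i] = index into runs for dictionary[i]


def build_hybrid(column):
    """Collapse consecutive equal values into (first, last) row pairs."""
    raw = []
    for row, value in enumerate(column):
        if raw and raw[-1][2] == value:
            raw[-1][1] = row  # extend the current run
        else:
            raw.append([row, row, value])
    dictionary = sorted(value for _, _, value in raw)
    if len(set(dictionary)) != len(dictionary):
        raise ValueError("assumed: each value forms one contiguous run")
    code = {value: i for i, value in enumerate(dictionary)}
    runs = [(first, last, code[value]) for first, last, value in raw]
    offsets = [0] * len(dictionary)
    for run_index, (_, _, dict_index) in enumerate(runs):
        offsets[dict_index] = run_index
    return HybridColumn(dictionary, runs, offsets)


def select_eq(col, value):
    """Selection part: dictionary match, then offset, then (first, last) pair."""
    i = bisect_left(col.dictionary, value)
    if i == len(col.dictionary) or col.dictionary[i] != value:
        return None
    first, last, _ = col.runs[col.offsets[i]]
    return first, last


def project(col, first, last):
    """Projection part: values of every run overlapping rows first..last.

    A linear scan keeps the sketch short; because runs are kept in row
    order, a binary search over them would also work.
    """
    return [col.dictionary[d] for f, l, d in col.runs if f <= last and l >= first]


# Hypothetical table: rows sorted by country, cities clustered within country.
country = build_hybrid(["DE", "DE", "DE", "FR", "FR", "US", "US", "US"])
city = build_hybrid(["Berlin", "Berlin", "Munich", "Lyon", "Paris",
                     "Austin", "Boston", "Boston"])

# Roughly: SELECT city WHERE country = 'FR'
row_range = select_eq(country, "FR")
result = project(city, *row_range)
```

In this sketch the selection part touches only the dictionary, one offset, and one row pair of the first column, and the projection part reads only the row pairs of the second column that overlap the selected range, so no per-row data is scanned in either column.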
August 14, 2018