I am developing a handheld data logger. The CPU is an STM32F10x and the storage chip is a 16 MByte SPI flash.
Flash chip details: sector size 4096 bytes, programming page size 256 bytes.
Currently I am using a simple file system: I append each acquired record to a binary file. The data does not strictly have to be stored as a file.
The structure of the record:
    typedef struct strRecord {
        char     Info[56];
        uint32_t TimeStamp;
        uint32_t SomeNumber;
        ....          // some other data
    } Record;
The length of the record is fixed. Before a record is appended to the binary file, the system must check whether a record with the same Info (the first member of the structure) already exists in the file. If such a record exists, the new record is merged into it; otherwise a new record is appended. There is no relation between the records; they arrive in random order.
Currently I use an exhaustive approach: read the file from beginning to end and compare. That only works while the file holds a few hundred records or fewer.
The problem: as the file grows, each query becomes very slow, and the system needs to handle tens of thousands of records.
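For reference, the lookup today is essentially the following (sketched with stdio calls purely for illustration; the real code goes through the simple file system's own read/seek functions):

    #include <stdio.h>
    #include <string.h>
    #include <stdint.h>

    typedef struct strRecord {              /* same layout as shown above */
        char     Info[56];
        uint32_t TimeStamp;
        uint32_t SomeNumber;
    } Record;

    /* Exhaustive lookup: scan every record and compare Info.
       Returns the offset of the matching record, or -1 if there is none. */
    long find_record(FILE *f, const char *info)
    {
        Record rec;
        long   offset = 0;

        fseek(f, 0, SEEK_SET);
        while (fread(&rec, sizeof rec, 1, f) == 1) {
            if (strncmp(rec.Info, info, sizeof rec.Info) == 0)
                return offset;              /* found: merge into this one */
            offset += (long)sizeof rec;
        }
        return -1;                          /* not found: append a new record */
    }

So every query costs O(n) flash reads, which is what kills me once n reaches tens of thousands.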
I attempted to port an older SQLite version to the system, but I have only about 140 KB of flash and 48 KB of RAM to spare. I could only trim the library down to about 200 KB, so the attempt failed.
Since each record contains a fixed-length string, maybe some trivial hash-table-style structure would rescue me. But every string hash-table implementation I know of keeps the table in RAM.
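To be clear, the hashing step itself is easy; something like FNV-1a over the fixed 56-byte Info field would do (shown only to make "hash the Info string" concrete). The open question is where to keep the table:

    #include <stdint.h>

    /* FNV-1a over the fixed-length Info field (stops at NUL or 56 bytes). */
    static uint32_t hash_info(const char info[56])
    {
        uint32_t h = 2166136261u;           /* FNV offset basis */
        for (uint32_t i = 0; i < 56 && info[i] != '\0'; i++) {
            h ^= (uint8_t)info[i];
            h *= 16777619u;                 /* FNV prime */
        }
        return h;
    }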
Could anyone give me a hint on how to implement a hash table on top of a simple file system or on raw flash?
Isn't hashing a Computer Science staple?
Not sure of the rationale in jumping to SQLite to manage this; you'd be better off looking at code from the '80s and '90s written for memory-constrained systems (PC DOS). Linker and librarian tools come to mind.
What you're likely going to need is a mix of caching and hashing. Decide what you can keep in RAM, how much RAM you can commit to it, and how you want to index the data, or re-index it at startup. Think about how to manage the indexing so its cost is spread over your use of it rather than paid as a front-loaded cost of indexing everything at once before first use.
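As a concrete sketch of that idea: with tens of thousands of records you could keep nothing in RAM but an array of bucket heads (4096 buckets * 4 bytes = 16 KB, well inside 48 KB) and chain the records themselves together on flash. Here `log_read_at`/`log_append` are stand-ins for whatever your storage layer provides, and `hash_info` is any string hash over the Info field:

    #include <stdint.h>
    #include <string.h>
    #include <stdbool.h>

    #define NUM_BUCKETS  4096u              /* 4096 * 4 bytes = 16 KB of RAM */
    #define NO_RECORD    0xFFFFFFFFu

    typedef struct strRecord {              /* the record from the question */
        char     Info[56];
        uint32_t TimeStamp;
        uint32_t SomeNumber;
    } Record;

    /* Each record is written with a small header that links it to the
       previous record that hashed into the same bucket, so the chains
       live on flash, not in RAM.                                        */
    typedef struct {
        uint32_t prev;                      /* offset of previous record in this bucket */
        Record   rec;
    } StoredRecord;

    /* RAM side of the index: offset of the newest record in each bucket.
       Must be filled with NO_RECORD before the startup rebuild scan.     */
    static uint32_t bucket_head[NUM_BUCKETS];

    /* Hypothetical storage primitives supplied by the file-system layer. */
    bool     log_read_at(uint32_t offset, StoredRecord *out);
    uint32_t log_append(const StoredRecord *sr);    /* returns new offset */
    uint32_t hash_info(const char *info);           /* e.g. FNV-1a        */

    /* Query: hash Info, then walk the (short) chain for that bucket. */
    uint32_t index_find(const char *info)
    {
        uint32_t off = bucket_head[hash_info(info) % NUM_BUCKETS];
        StoredRecord sr;

        while (off != NO_RECORD && log_read_at(off, &sr)) {
            if (strncmp(sr.rec.Info, info, sizeof sr.rec.Info) == 0)
                return off;                 /* found: caller merges into it */
            off = sr.prev;
        }
        return NO_RECORD;                   /* not found: caller appends    */
    }

    /* Append: link the new record in front of its bucket's chain. */
    void index_append(const Record *rec)
    {
        uint32_t b = hash_info(rec->Info) % NUM_BUCKETS;
        StoredRecord sr;

        sr.prev = bucket_head[b];
        sr.rec  = *rec;
        bucket_head[b] = log_append(&sr);
    }

At startup `bucket_head` can be rebuilt with one sequential pass over the file, or snapshotted to flash periodically so only the records written since the last snapshot need re-scanning; with 40,000 records over 4096 buckets a lookup averages roughly ten reads instead of a full scan.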
Is the time stamp linear? Could you use it to binary-search for records that might be matches, or at least to narrow down close potential matches?
The time stamp is not linear and is not always in sorted order. Sometimes the logger's clock is inaccurate and the user will deliberately adjust the date and time (that is permitted). The time stamp is only for reference.
I don't have a deep understanding of the old systems of the '80s and '90s, such as PC DOS, and I doubt that they are better than SQLite.
I figure what I need is something between a plain binary file and an embedded database: hash-table lookup efficiency without the cost of even an embedded database. My system has only about 140 KB of flash and 48 KB of RAM to spare, so any purely RAM-based index or cache is out of the question.
My next attempt will be to create a big index file (several MBytes within the 16 MByte flash) and try some hash-table scheme on top of it.
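Roughly what I have in mind (a sketch, not tested): the index file is a fixed array of buckets, each holding a fixed number of slots, and a slot stores the full hash of the Info string plus the record's offset in the data file, so a query costs one bucket read plus reads of only those records whose stored hash matches. `idx_read_bucket`, `idx_write_bucket` and `rec_read_at` are placeholders for the file-system calls:

    #include <stdint.h>
    #include <string.h>
    #include <stdbool.h>

    #define NUM_BUCKETS       8192u         /* 8192 * 512-byte buckets = 4 MB index file */
    #define SLOTS_PER_BUCKET    63u
    #define NOT_FOUND         0xFFFFFFFFu

    typedef struct strRecord {              /* the record from the question */
        char     Info[56];
        uint32_t TimeStamp;
        uint32_t SomeNumber;
    } Record;

    typedef struct {
        uint32_t hash;                      /* full 32-bit hash of Info        */
        uint32_t offset;                    /* record offset in the data file  */
    } Slot;

    typedef struct {                        /* exactly 512 bytes               */
        uint32_t used;                      /* number of occupied slots        */
        uint32_t spare;
        Slot     slot[SLOTS_PER_BUCKET];
    } Bucket;

    /* Placeholders for the simple-file-system (or raw-flash) access layer. */
    bool idx_read_bucket(uint32_t bucket_no, Bucket *b);
    bool idx_write_bucket(uint32_t bucket_no, const Bucket *b);
    bool rec_read_at(uint32_t offset, Record *out);
    uint32_t hash_info(const char *info);   /* e.g. the FNV-1a shown earlier */

    /* Query: one bucket read, then only records whose stored hash matches. */
    uint32_t idx_find(const char *info)
    {
        uint32_t h = hash_info(info);
        Bucket   b;
        Record   rec;

        if (!idx_read_bucket(h % NUM_BUCKETS, &b))
            return NOT_FOUND;

        for (uint32_t i = 0; i < b.used && i < SLOTS_PER_BUCKET; i++) {
            if (b.slot[i].hash == h &&
                rec_read_at(b.slot[i].offset, &rec) &&
                strncmp(rec.Info, info, sizeof rec.Info) == 0)
                return b.slot[i].offset;
        }
        return NOT_FOUND;
    }

    /* Insert: the record is appended to the data file as before; then its
       bucket is read, a free slot is filled with (hash, offset), and the
       bucket is written back.                                              */
    void idx_add(const char *info, uint32_t record_offset)
    {
        uint32_t n = hash_info(info) % NUM_BUCKETS;
        Bucket   b;

        if (!idx_read_bucket(n, &b) || b.used >= SLOTS_PER_BUCKET)
            return;                         /* overflow handling omitted here */

        b.slot[b.used].hash   = hash_info(info);
        b.slot[b.used].offset = record_offset;
        b.used++;
        idx_write_bucket(n, &b);
    }

With 8192 buckets and tens of thousands of records a bucket averages only a handful of occupied slots, so overflow should be rare, though it still needs a plan (a second hash or an overflow area). On raw flash there is one more thing to watch: rewriting a bucket in place forces a 4096-byte sector erase, so it may be better to size one bucket as one sector and only ever append slots into still-erased (0xFF) space, which NOR flash normally allows without an erase.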