Recycle Your Data | Save Space and Money

A few years ago, I worked on a FileMaker project where the client told us that every week they need to import about 7,000 rows of data that represent some information that this company uses. And every week there might be some modified records, some records that don’t show up any longer, and some new records that hadn’t already existed. For this example, let’s just call it this data “Products”.

Over the years that weekly information grew to about 12,000 records per week. This PRODUCT table itself had 35 fields of data. So, without a way to manage the duplicates, every week lots of redundant data would need to be imported.

To give you an idea of how much data this solution grew to … For 132 weeks of data, the system was:
  • Storing 1,483,700 records
  • Taking up 628 MB on disk
  • Easily approaching 1 GB.

The solution? We put a new approach in place that only stored the changed records. The total number of records, with duplicates removed, came out to 269,855 records. That was about an 80% reduction in redundant data.

Especially now, if you are hosting a solution with FileMaker Cloud then every little bit of data adds up: storage, processing time, data traffic, etc. So, techniques like this could come in really handy in directly saving real money.

NOTE: This approach could apply to anything that has a recurring import, whether your frequency is daily, weekly, monthly or  quarterly.

Here is a flowchart and ERD that describes this approach:
flowchart of lots of data importing

The WEEKLY_DATA table can also be an Excel file. And you could also have the import fully automated to run unattended on FileMaker Server. See related post: What Are Your Imports Waiting For.

The IMPORT table is where all records are removed, then a new imported set is brought in and processed. It has an auto-enter calculation like the following.

GetContainerAttribute ( List ( FIELD1; FIELD2; FIELD3… ); “MD5” ) The same calculation is in the DATA table.

The DATA table has a DATE_START and DATE_END. When a new record is created  DATE_START and DATE_END will have the same date. If the next week we see the same MD5 hash then that means that there has been no change for this record, represented by the MD5 hash, then all we do is set the DATE_END to the latest date for the matching MD5 hash record.

If you ever wanted to see the history of a Product you can perform a range search in the DATE_START and DATE_END field.

If in the following week’s import a particular product no longer appears, then that record would not appear in the import and thus never get updated. The last DATE_END value will stay.

Below is the script for this — copied compliments of InspectorPro 🙂 Note that this is pulling data from the historical table as prototype/proof of concept. Otherwise we would be processing a weekly import. And this script is set up in such a way that you could process more than 1 week’s/period’s worth of data

Script for reducing data

 

Vincenzo Menanno is Director of FileMaker Development at Beezwax, and creator of InspectorPro. In addition to FileMaker, Vince’s other most recent passions include visualizations, sharp knives, organization and architecture.

 

Header photo credit: climb (smaller) mountains of data ; photo by vincenzo menanno

Leave a Reply