PAWS Data Pipeline Update #1
UPDATE FROM OCT 22 MEETUP
paws_data_pipeline is off to a great start! Thanks to all those who came out Tuesday night. Great contributions across the board.
THE TEAM AND WHAT WE NEED
We had great representation from people who can make it happen. We still need one or more program managers, project managers, or project leads who love to keep things organized! Karla and Chris will be involved every step along the way but lack the bandwidth to do week-by-week, task-by-task planning and tracking. CAN YOU HELP HERE??? IF SO… PLEASE JOIN THE SLACK CHANNEL AND MAKE YOURSELF KNOWN!
WHAT WE DID
After the kickoff presentation and some Q&A, we broke into two groups. One group focused on ideating around data extraction and creation of the data lake. The other group looked ahead to cleansing, matching, and validating the data and linking it back into systems of record such as Salesforce. Key points from each discussion are below.
Data Lake - Identified our source systems, discussed extraction methods briefly, and considered data lake architecture and construction. Next steps are to dig into one or two data sources, figure out extraction methods (APIs are not widely available, although .csv exports are), inventory the records, structure, and elements to be moved into the data lake, and then test out the process.
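To make the "inventory" step concrete, here is a minimal sketch of how a .csv export could be summarized before it moves into the data lake. It is only an illustration: the function name inventory_csv, the column names, and the sample rows are all made up, and a real export from any of our source systems will look different.

```python
import csv
import io


def inventory_csv(text):
    """Summarize a CSV export: row count, column names, and filled-in values per column."""
    reader = csv.DictReader(io.StringIO(text))
    columns = list(reader.fieldnames or [])
    filled = {name: 0 for name in columns}
    rows = 0
    for row in reader:
        rows += 1
        for name in columns:
            # Count a cell as filled only if it holds something besides whitespace.
            if (row.get(name) or "").strip():
                filled[name] += 1
    return {"rows": rows, "columns": columns, "filled": filled}


# Hypothetical sample export (not real PAWS data).
sample_export = (
    "id,name,email,notes\n"
    "1,Ann Lee,ann@example.org,\n"
    "2,Bo Diaz,,prefers cats\n"
)

summary = inventory_csv(sample_export)
print(summary)
```

A summary like this could be produced for each export so the team can compare which fields exist, and how complete they are, across systems.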
Cleansing/Matching, etc. - Discussed the options for matching data across sources and making sense of the data from multiple systems (e.g. estimating the complexity of the cleaning process required and potential data architecture challenges). Next steps are to 1) make a few data sources available (probably on the volunteers/donors integration stream) so the team can understand what variables exist and in what format, and 2) clarify data privacy issues for doing this work. Additionally, PAWS staff will assess whether additional data structures (e.g. labels) should be created as an alternative to the currently unstructured text notes.
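As one possible starting point for matching across sources, the sketch below pairs volunteer and donor records that share an email address after basic cleanup. This is an assumption for illustration, not the approach the group settled on: the field names (id, email), the sample records, and the choice of email as the matching key are all hypothetical, and real matching will likely need names, addresses, and fuzzier rules.

```python
def normalize_email(value):
    """Lowercase and trim an email so trivially different spellings compare equal."""
    return (value or "").strip().lower()


def match_by_email(volunteers, donors):
    """Return (matches, unmatched): ID pairs sharing an email, and volunteer IDs with no match."""
    donors_by_email = {}
    for donor in donors:
        key = normalize_email(donor.get("email"))
        if key:
            donors_by_email.setdefault(key, []).append(donor)

    matches = []
    unmatched = []
    for volunteer in volunteers:
        key = normalize_email(volunteer.get("email"))
        candidates = donors_by_email.get(key, []) if key else []
        if candidates:
            for donor in candidates:
                matches.append((volunteer["id"], donor["id"]))
        else:
            unmatched.append(volunteer["id"])
    return matches, unmatched


# Hypothetical sample records (not real PAWS data).
volunteers = [
    {"id": "V1", "email": " Ann@Example.org "},
    {"id": "V2", "email": ""},
]
donors = [
    {"id": "D7", "email": "ann@example.org"},
    {"id": "D8", "email": "bo@example.org"},
]

matches, unmatched = match_by_email(volunteers, donors)
print(matches)
print(unmatched)
```

Keeping the unmatched list alongside the matches gives a quick sense of how much cleaning a given pair of sources would need.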
We are planning the next meetup for (we think) next Tuesday, though it might be November 5 instead. Details on the next scheduled hack night are coming, and we will share them through the Slack channel #paws_data_pipeline.
Some people asked for a remote option, perhaps meeting remotely each week and in person every other week or once a month. This is still TBD. Thank you for the suggestions.
Stay tuned to the Slack channel for more!
AND… IF YOU ARE NOT INVOLVED YET AND WANT TO BE…. PLEASE JOIN UP!!!!
Chris & Karla