
03.20.2017

Obtaining and Validating Big Data

In this video, Konstantinos Pouliasis presents a toolkit that combines bash scripting, a refined Postgres database design and Node.js libraries to fetch, validate and store large data sets that government sources make publicly available.
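As a rough illustration of the cleanup and validation stage, here is a minimal Python sketch. It is not the toolkit from the video, which does this work with sed and on the database side. The sketch makes three assumptions: the file has a header row, no quoted field spans several lines, and a valid row is simply one with as many fields as the header. The function names `clean_csv_text` and `validate_csv` are hypothetical.

```python
import csv
import io


def clean_csv_text(raw: str) -> str:
    """First-pass cleanup in the spirit of a sed filter: normalize CRLF/CR
    line endings to LF, strip trailing whitespace, and drop blank lines.
    Assumes no quoted field spans more than one line."""
    lines = raw.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    kept = [line.rstrip() for line in lines if line.strip()]
    return "\n".join(kept) + "\n" if kept else ""


def validate_csv(text: str) -> tuple[list[str], list[list[str]], list[int]]:
    """Parse cleaned CSV and accept only rows with as many fields as the header.

    Returns (header, good_rows, bad_row_numbers); row numbers are 1-based,
    with the header counted as row 1."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, [])
    good: list[list[str]] = []
    bad: list[int] = []
    for row_no, row in enumerate(reader, start=2):
        if len(row) == len(header):
            good.append(row)
        else:
            bad.append(row_no)
    return header, good, bad
```

The sketch reports rejected rows by row number instead of dropping them silently. That way, malformed lines in a large download can be inspected before anything reaches the database.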

The video covers basic Unix scripting constructs and a few specific commands. The curl command fetches the zipped CSV files, and unzip is combined with sed to extract the data and perform a first round of cleanup. Next, the video shows how to validate the integrity of the CSV data on the database side. It then explains a database design technique based on Postgres partitioning that suits large seasonal data sets. Finally, it presents the pg-pool library, whose pool of concurrent database connections makes it a good complement for automating data insertion with Node.js.
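The partitioning and insertion steps can be sketched together. The Python below is only an illustration, not the Node.js code from the video. It routes each row to a quarterly child table and turns each group into multi-row INSERT statements with `$1`, `$2`, ... placeholders. That is the parameter style node-postgres uses for the `text` and `values` passed to `pool.query`.

Several details are assumptions:

* the quarterly granularity;
* the table and column names;
* the inheritance-plus-CHECK style of partitions, which is what PostgreSQL offered before declarative partitioning arrived in version 10.

`plan_statements` and its helpers are hypothetical.

```python
from collections import defaultdict
from collections.abc import Iterator
from datetime import date


def quarter_bounds(d: date) -> tuple[date, date]:
    """Half-open [start, end) range of the calendar quarter containing d."""
    start_month = 3 * ((d.month - 1) // 3) + 1
    start = date(d.year, start_month, 1)
    end = date(d.year + 1, 1, 1) if start_month == 10 else date(d.year, start_month + 3, 1)
    return start, end


def partition_name(parent: str, d: date) -> str:
    """Hypothetical naming scheme: <parent>_<year>_q<quarter>."""
    return f"{parent}_{d.year}_q{(d.month - 1) // 3 + 1}"


def partition_ddl(parent: str, date_col: str, d: date) -> str:
    """Child table inheriting from parent, with a CHECK constraint bounding
    its date range (the pre-PostgreSQL-10 inheritance style of partitioning)."""
    start, end = quarter_bounds(d)
    return (
        f"CREATE TABLE IF NOT EXISTS {partition_name(parent, d)} ("
        f"CHECK ({date_col} >= DATE '{start.isoformat()}' "
        f"AND {date_col} < DATE '{end.isoformat()}')"
        f") INHERITS ({parent});"
    )


def batched_inserts(table: str, columns: list[str], rows: list[tuple],
                    batch_size: int) -> Iterator[tuple[str, list]]:
    """Yield one (sql, values) pair per batch: a multi-row INSERT whose
    $n placeholders are numbered from $1 within each statement."""
    ncols = len(columns)
    for i in range(0, len(rows), batch_size):
        batch = rows[i:i + batch_size]
        groups, values = [], []
        for r, row in enumerate(batch):
            groups.append("(" + ", ".join(f"${r * ncols + c + 1}" for c in range(ncols)) + ")")
            values.extend(row)
        yield f"INSERT INTO {table} ({', '.join(columns)}) VALUES {', '.join(groups)}", values


def plan_statements(parent: str, date_col: str, columns: list[str],
                    rows: list[tuple], batch_size: int = 500) -> list[tuple[str, list]]:
    """Group rows (tuples ordered like columns) by quarterly child table and
    emit, per child: its DDL with an empty value list, then its INSERT batches."""
    idx = columns.index(date_col)
    groups: defaultdict[str, list[tuple]] = defaultdict(list)
    for row in rows:
        groups[partition_name(parent, row[idx])].append(row)
    statements: list[tuple[str, list]] = []
    for name, part_rows in sorted(groups.items()):
        statements.append((partition_ddl(parent, date_col, part_rows[0][idx]), []))
        statements.extend(batched_inserts(name, columns, part_rows, batch_size))
    return statements
```

Each CREATE TABLE statement must run before the inserts that follow it. Once a partition exists, however, its batches are independent. This is where a connection pool helps, since several batches can be in flight on separate connections at once.

Inserting straight into the child table also avoids the routing trigger that inheritance-based partitioning otherwise needs on the parent. Table and column names are interpolated directly into the SQL, so they must come from trusted configuration, never from the downloaded data.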

Project Members: Konstantinos Pouliasis
