August 2018

Big Data

tpch-kit to generate data

TPC-H: http://www.tpc.org/tpch/
The kit to generate data is available here: https://github.com/gregrahn/tpch-kit
# make a dir to

Amazon AWS, Big Data, ETL

Using AWS EMR and Spark to Perform ETL

https://www.rittmanmead.com/blog/2016/12/etl-offload-with-spark-and-amazon-emr-part-1/
https://www.rittmanmead.com/blog/2016/12/etl-offload-with-spark-and-amazon-emr-part-2-code-development-with-notebooks-and-docker/
https://www.rittmanmead.com/blog/2016/12/etl-offload-with-spark-and-amazon-emr-part-3-running-pyspark-on-emr/
https://www.rittmanmead.com/blog/2016/12/etl-offload-with-spark-and-amazon-emr-part-4-analysing-the-data/
https://www.rittmanmead.com/blog/2016/12/etl-offload-with-spark-and-amazon-emr-part-5/
http://spark.apache.org/docs/latest/index.html
