Big Data

Big Data

Big Data Blog Posts That I Found Interesting – I am sure there are many others

Best Of Amazon AWS Big Data Blog
https://aws.amazon.com/blogs/big-data/orchestrate-multiple-etl-jobs-using-aws-step-functions-and-aws-lambda/
https://github.com/aws-samples/aws-etl-orchestrator
https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/best-practices.html#organizingstacks
https://aws.amazon.com/blogs/big-data/build-a-data-lake-foundation-with-aws-glue-and-amazon-s3/
https://aws.amazon.com/blogs/big-data/orchestrate-apache-spark-applications-using-aws-step-functions-and-apache-livy/

Big Data

tpch-kit to generate data

TPC-H – http://www.tpc.org/tpch/
Here is where you get the kit to generate data – https://github.com/gregrahn/tpch-kit
# make a dir to
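
Once the kit has generated its data, the files still need to be read by something. Here is a minimal Python sketch for that, assuming the usual dbgen `.tbl` layout: pipe-delimited fields with a trailing `|` at the end of each line. The `read_tbl` helper and the two sample rows are my own illustration, not part of tpch-kit; the column names are those of the TPC-H REGION table.

```python
import csv
import io

# Column names of the TPC-H REGION table.
REGION_COLUMNS = ["r_regionkey", "r_name", "r_comment"]


def read_tbl(stream, columns):
    """Yield one dict per row of a dbgen-style .tbl stream (hypothetical helper)."""
    reader = csv.reader(stream, delimiter="|", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        # Assumption: each line ends with "|", which csv reports as an
        # empty final field. Drop it before matching against the columns.
        if fields[-1] == "":
            fields = fields[:-1]
        if len(fields) != len(columns):
            raise ValueError(f"expected {len(columns)} fields, got {len(fields)}")
        yield dict(zip(columns, fields))


# Made-up sample rows in the assumed format (not real dbgen output).
sample = io.StringIO(
    "0|AFRICA|sample comment|\n"
    "1|AMERICA|another sample comment|\n"
)
rows = list(read_tbl(sample, REGION_COLUMNS))
print(rows[0]["r_name"])  # AFRICA
```

To read a real file, pass an open file handle instead of the `io.StringIO` sample. All values come back as strings, so numeric keys would need converting before use.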

Amazon AWS, Big Data, ETL

Using AWS EMR and Spark to Perform ETL

https://www.rittmanmead.com/blog/2016/12/etl-offload-with-spark-and-amazon-emr-part-1/
https://www.rittmanmead.com/blog/2016/12/etl-offload-with-spark-and-amazon-emr-part-2-code-development-with-notebooks-and-docker/
https://www.rittmanmead.com/blog/2016/12/etl-offload-with-spark-and-amazon-emr-part-3-running-pyspark-on-emr/
https://www.rittmanmead.com/blog/2016/12/etl-offload-with-spark-and-amazon-emr-part-4-analysing-the-data/
https://www.rittmanmead.com/blog/2016/12/etl-offload-with-spark-and-amazon-emr-part-5/
http://spark.apache.org/docs/latest/index.html
