{"id":942,"date":"2021-06-17T20:49:27","date_gmt":"2021-06-17T15:19:27","guid":{"rendered":"https:\/\/techieshouts.com\/?p=942"},"modified":"2022-08-09T19:03:41","modified_gmt":"2022-08-09T13:33:41","slug":"spark-reading-from-oracle","status":"publish","type":"post","link":"https:\/\/techieshouts.com\/home\/spark-reading-from-oracle\/","title":{"rendered":"Spark reading from Oracle"},"content":{"rendered":"\n<p>In an earlier post, we saw how to read a Hive table in Spark. In this post, we will see how to read data from an Oracle table over JDBC.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">from pyspark.sql import SparkSession\n\nappname = \"Application name\"\nspark = SparkSession.builder.appName(appname).getOrCreate()\n\n# Replace ORACLE_SERVER, PORT and SID with your own connection details.\n# The thin driver expects an @ before the host; this is the host:port:SID form.\noracleDF = spark.read \\\n    .format(\"jdbc\") \\\n    .option(\"url\", \"jdbc:oracle:thin:@ORACLE_SERVER:PORT:SID\") \\\n    .option(\"dbtable\", \"schema.tablename\") \\\n    .option(\"user\", \"username\") \\\n    .option(\"password\", \"*****\") \\\n    .option(\"driver\", \"oracle.jdbc.driver.OracleDriver\") \\\n    .load()\n\n# count() is a method; calling it runs the query and returns the row count.\nprint(oracleDF.count())<\/pre>\n\n\n\n<p>This loads the data from the Oracle table into a DataFrame. 
After that, we can perform any DataFrame operation the program needs.<\/p>\n\n\n\n<p>The Spark program needs the Oracle JDBC driver jar on its classpath to connect to Oracle, so we pass it to spark-submit.<\/p>\n\n\n\n<h3>Shell script to call Python<\/h3>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"shell\" data-enlighter-theme=\"classic\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">JDBC_JAR=\"\/localpath\/ojdbc6.jar\"\n# SPARK_SCRIPT must be set to the path of the PySpark script shown above\nspark-submit --master yarn --deploy-mode client \\\n  --conf spark.dynamicAllocation.enabled=true \\\n  --conf spark.dynamicAllocation.minExecutors=1 \\\n  --conf spark.dynamicAllocation.maxExecutors=30 \\\n  --conf spark.dynamicAllocation.initialExecutors=1 \\\n  --jars \"${JDBC_JAR}\" \\\n  \"${SPARK_SCRIPT}\"<\/pre>\n\n\n\n<p>Notice that we pass ${JDBC_JAR} to the Spark application with --jars. This jar contains the Oracle JDBC driver class named in the driver option of the read.<\/p>\n\n\n\n<p>Also read:<\/p>\n\n\n\n<ol><li><a href=\"https:\/\/techieshouts.com\/spark-reading-from-hive-table\/\">Spark reading from Hive table<\/a><\/li><li><a href=\"https:\/\/techieshouts.com\/spark-execution-modes\/\">Spark execution modes<\/a><\/li><\/ol>\n","protected":false},"excerpt":{"rendered":"<p>In an earlier post, we saw how to read a Hive table in Spark. In this post, we will see how to read data from an Oracle table over JDBC. This loads the data from the Oracle table into a DataFrame. 
After that, we can perform any DataFrame operation the program needs. We need to pass\u2026 <span class=\"read-more\"><a href=\"https:\/\/techieshouts.com\/home\/spark-reading-from-oracle\/\">Read More &raquo;<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[11,136],"tags":[144,143],"_links":{"self":[{"href":"https:\/\/techieshouts.com\/home\/wp-json\/wp\/v2\/posts\/942"}],"collection":[{"href":"https:\/\/techieshouts.com\/home\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/techieshouts.com\/home\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/techieshouts.com\/home\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/techieshouts.com\/home\/wp-json\/wp\/v2\/comments?post=942"}],"version-history":[{"count":3,"href":"https:\/\/techieshouts.com\/home\/wp-json\/wp\/v2\/posts\/942\/revisions"}],"predecessor-version":[{"id":947,"href":"https:\/\/techieshouts.com\/home\/wp-json\/wp\/v2\/posts\/942\/revisions\/947"}],"wp:attachment":[{"href":"https:\/\/techieshouts.com\/home\/wp-json\/wp\/v2\/media?parent=942"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/techieshouts.com\/home\/wp-json\/wp\/v2\/categories?post=942"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/techieshouts.com\/home\/wp-json\/wp\/v2\/tags?post=942"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}