aws sdk - Why can't I load data from Amazon S3 in DSX notebook?
I used the following code to load data from Amazon S3:

    from ingest import Connectors
    from pyspark.sql import SQLContext

    sqlContext = SQLContext(sc)

    s3loadoptions = {
        Connectors.AmazonS3.ACCESS_KEY: 'akiajycjafzyennpacna',
        Connectors.AmazonS3.SECRET_KEY: 'a6voqu3caccbfi0peqlkwqxkrquqyxqqnousondy',
        Connectors.AmazonS3.SOURCE_BUCKET: 'ngpconnector',
        Connectors.AmazonS3.SOURCE_FILE_NAME: 'addresses3.csv',
        Connectors.AmazonS3.SOURCE_INFER_SCHEMA: '1',
        Connectors.AmazonS3.SOURCE_FILE_FORMAT: 'csv',
    }

    s3df = sqlContext.read.format('com.ibm.spark.discover').options(**s3loadoptions).load()
    s3df.printSchema()
    s3df.show(5)

But when I run this code snippet, I get the following error. I get a similar error message when I load data from another source, such as dashDB.
    AttributeError                            Traceback (most recent call last)
    <ipython-input-1-9da344857d7e> in <module>()
          4
          5 s3loadoptions = {
    ----> 6     Connectors.AmazonS3.ACCESS_KEY : 'akiajycjafzyennpacna',
          7     Connectors.AmazonS3.SECRET_KEY : 'a6voqu3caccbfi0peqlkwqxkrquqyxqqnousondy',
          8     Connectors.AmazonS3.SOURCE_BUCKET : 'ngpconnector',

    AttributeError: 'NoneType' object has no attribute 'AmazonS3'
Please use this alternative to ingest, if you like.

For Spark 1.6:

    hconf = sc._jsc.hadoopConfiguration()

For Spark 2.0:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    hconf = spark.sparkContext._jsc.hadoopConfiguration()

Set the S3 parameters in the Hadoop configuration:

    # Replace the placeholders with your Amazon access key and secret key
    hconf.set("fs.s3a.access.key", "<put-your-access-key>")
    hconf.set("fs.s3a.secret.key", "<put-your-secret-key>")

Then read:

    spark = SparkSession.builder.getOrCreate()
    df_data_1 = spark.read\
        .format('org.apache.spark.sql.execution.datasources.csv.CSVFileFormat')\
        .option('header', 'true')\
        .load('s3a://<your-bucket-name>/<foldername>/<filename>.csv')
    df_data_1.take(5)

To write back:

    # save() with no explicit format writes Parquet by default
    df_data_1.write.save("s3a://charlesbuckets31/folderb/users.parquet")

Thanks, Charles.
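The configuration step above can also be wrapped in a few small helpers so that the keys are not pasted into the notebook. This is only a sketch under stated assumptions: the helper names (s3a_settings_from_env, apply_settings, s3a_url) are hypothetical and not part of Spark or DSX, and it assumes the credentials are available in the conventional AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables, which may not be set in your notebook environment.

```python
import os


def s3a_settings_from_env(env=None):
    """Build the two s3a credential properties from environment variables.

    Assumption: AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are set.
    """
    env = os.environ if env is None else env
    names = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {
        "fs.s3a.access.key": env["AWS_ACCESS_KEY_ID"],
        "fs.s3a.secret.key": env["AWS_SECRET_ACCESS_KEY"],
    }


def apply_settings(hconf, settings):
    """Copy each property into an object that exposes set(key, value),
    such as the Hadoop configuration obtained as hconf above."""
    for key, value in settings.items():
        hconf.set(key, value)


def s3a_url(bucket, *parts):
    """Join a bucket name and path segments into an s3a:// URL."""
    key = "/".join(part.strip("/") for part in parts if part)
    return "s3a://{}/{}".format(bucket, key)
```

In the notebook this would be used as apply_settings(hconf, s3a_settings_from_env()), followed by .load(s3a_url('<your-bucket-name>', '<foldername>', '<filename>.csv')) in the read call shown earlier.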