How can I use dataproc to pull data from bigquery that is not in the same project as my dataproc cluster?

By : Hussaini Muhammad
Date : November 25 2020, 01:01 AM
To use service account keyfile authorization, you need to set the mapred.bq.auth.service.account.enable property to true and point the BigQuery connector to a service account JSON keyfile using the mapred.bq.auth.service.account.json.keyfile property (at the cluster or job level). Note that this property value is a local path, which is why you need to distribute the keyfile to all the cluster nodes beforehand, for example with an initialization action.
Alternatively, you can use any of the authorization methods described here, but you need to replace the fs.gs properties prefix with mapred.bq for the BigQuery connector.
code :
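# A minimal sketch, not from the original answer: the cluster name, bucket
# paths, and keyfile location below are placeholder assumptions.
# 1) Distribute the service-account keyfile to every node with an init action
#    (copy-bq-keyfile.sh is a hypothetical script that stages the keyfile).
gcloud dataproc clusters create my-cluster \
    --region=us-central1 \
    --initialization-actions=gs://my-bucket/copy-bq-keyfile.sh

# 2) Point the BigQuery connector at the local keyfile when submitting a job.
gcloud dataproc jobs submit hadoop \
    --cluster=my-cluster \
    --region=us-central1 \
    --jar=gs://my-bucket/my-bq-job.jar \
    --properties=mapred.bq.auth.service.account.enable=true,mapred.bq.auth.service.account.json.keyfile=/etc/hadoop/conf/bq-key.json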


read data from BigQuery and/or Cloud Storage GCS into Dataproc

By : rico
Date : March 29 2020, 07:55 AM
Using the BigQuery connector is best for cases where you want to abstract away the GCS export/import as much as possible and don't want to explicitly manage datasets inside of GCS.
If you already have the dataset inside of GCS, it is likely better to use the GCS dataset directly: you avoid the additional export steps and can use simpler filesystem interfaces. The downside is that it is more costly to maintain two copies of your dataset (one in GCS and one in BigQuery) and keep them in sync. But if the size isn't prohibitive and the data isn't updated too frequently, you might find it easiest to keep the GCS dataset around for direct access.
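For illustration, here is a hedged sketch of the manual export step that the connector would otherwise handle for you, using the bq CLI (the project, dataset, table, and bucket names are placeholders):
code :
# export a BigQuery table to GCS as newline-delimited JSON; the gs:// output
# can then be read directly from Dataproc with plain filesystem interfaces
bq extract \
    --destination_format=NEWLINE_DELIMITED_JSON \
    'my-project:my_dataset.my_table' \
    'gs://my-bucket/exports/my_table/*.json'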
Data Fusion Provisioning of Dataproc Cluster Fails

By : user2901914
Date : March 29 2020, 07:55 AM
Because the Dataproc cluster remains in "provisioning", my suspicion is that the network used for the Dataproc cluster is not configured such that the cluster's nodes can communicate with each other. For more information, see https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/network#overview.
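As one hedged example, assuming a custom VPC network (the rule name, network, and source range below are placeholders; the range must match your subnet), a firewall rule that lets the cluster's nodes reach each other could look like:
code :
# allow all internal TCP/UDP/ICMP traffic between nodes on the network
gcloud compute firewall-rules create allow-dataproc-internal \
    --network=my-custom-network \
    --allow=tcp:0-65535,udp:0-65535,icmp \
    --source-ranges=10.128.0.0/20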
Error: permission denied on resource project when launching Dataproc cluster

By : Krax
Date : March 29 2020, 07:55 AM
One possible reason is that you are using the wrong project identifier: you should use your project ID, not your project name. Sometimes they are the same, sometimes not. I just ran into this problem, and this may be useful for other people who find this question in the future.
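To make the distinction concrete, a quick sketch (the cluster name, region, and project ID below are placeholders):
code :
# the PROJECT_ID column, not NAME, is what gcloud expects; they can differ
gcloud projects list

# pass the project ID explicitly when creating the cluster
gcloud dataproc clusters create my-cluster \
    --region=us-central1 \
    --project=my-project-id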
Asking the appropriate spec of cluster for Google Dataproc to handle our data

By : Nilakanta Kshetriya
Date : March 29 2020, 07:55 AM
First of all, I will address the Compute Engine vs. Dataproc question, and then move on to sizing the cluster.
Compute Engine is Google's IaaS offering; it is basically a service for spinning up VMs. Google Dataproc uses Compute Engine to spin up the virtual machines that act as the master and worker nodes of your cluster. Moreover, Dataproc already installs and configures several things on the nodes, so you don't have to take care of that yourself. If you need more on the nodes, Google maintains a set of initialization scripts that can be used to install additional dependencies on the cluster. So, to answer your question: you need Compute Engine in the sense that without it you won't be able to spin up a cluster, and if you're already set on using PySpark, Dataproc is the right choice.
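As a minimal sketch of what creating such a cluster looks like (the machine types, worker count, and init-action path are placeholder assumptions to size against your own workload):
code :
gcloud dataproc clusters create analytics-cluster \
    --region=us-central1 \
    --master-machine-type=n1-standard-4 \
    --num-workers=2 \
    --worker-machine-type=n1-standard-8 \
    --initialization-actions=gs://my-bucket/install-extra-deps.sh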
Spinning up a Dataproc cluster with Spark BigQuery Connector

By : Moises Rocha
Date : September 30 2020, 11:00 PM
The connectors init action applies only to the Cloud Storage and BigQuery connectors for Hadoop from the bigdata-interop repository.
Generally, you should not use the BigQuery connector for Hadoop if you are using Spark, because there is a newer BigQuery connector for Spark in the spark-bigquery-connector repository, which you are already adding with the --jars parameter.
code :
# copy the connector jar into Spark's classpath on each node
# (typically run from an initialization action)
gsutil cp gs://path/to/spark-bigquery-connector.jar /usr/lib/spark/jars/
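Alternatively, a hedged sketch of skipping the copy step entirely and attaching the Spark connector at submit time (the job file, cluster name, and region are placeholders; the gs://spark-lib path is Google's public connector bucket):
code :
gcloud dataproc jobs submit pyspark my_job.py \
    --cluster=my-cluster \
    --region=us-central1 \
    --jars=gs://spark-lib/bigquery/spark-bigquery-latest_2.12.jar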