This question is the same as Number of partitions of a spark dataframe created by reading the data from Hive table.
But I think that question did not get a correct answer. Note that the question asks how many partitions will be created when the dataframe is created by executing a SQL query against a Hive table using the SparkSession.sql method.
IIUC, the question above is distinct from asking how many partitions will be created when the dataframe is created by executing code like spark.read.json("examples/src/main/resources/people.json"), which loads the data directly from the filesystem (which could be HDFS). I think the answer to this latter question is given by spark.sql.files.maxPartitionBytes:
> spark.sql.files.maxPartitionBytes: 134217728 (128 MB). The maximum number of bytes to pack into a single partition when reading files.
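As a minimal sketch of how this can be checked (assuming the sample people.json that ships with the Spark distribution, and a local SparkSession built here purely for illustration):

```scala
import org.apache.spark.sql.SparkSession

// Read a file directly from the filesystem and inspect the partition count.
val spark = SparkSession.builder()
  .appName("partition-count-check")
  .master("local[*]")
  .getOrCreate()

val df = spark.read.json("examples/src/main/resources/people.json")
println(df.rdd.getNumPartitions)

// For a sufficiently large input, lowering spark.sql.files.maxPartitionBytes
// should increase the partition count of a file-based read.
spark.conf.set("spark.sql.files.maxPartitionBytes", 16L * 1024 * 1024) // 16 MB
println(spark.read.json("examples/src/main/resources/people.json").rdd.getNumPartitions)
```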
Experimentally, I have tried creating a dataframe from a Hive table, and the number of partitions I get is not explained by total data in the Hive table / spark.sql.files.maxPartitionBytes.
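For reference, the experiment looks like this (a sketch, assuming Hive support is enabled on the session; my_db.my_table is a placeholder table name):

```scala
import org.apache.spark.sql.SparkSession

// Create a dataframe via SparkSession.sql against a Hive table and inspect
// its partition count. Requires a session built with enableHiveSupport().
val hiveSpark = SparkSession.builder()
  .appName("hive-partition-check")
  .enableHiveSupport()
  .getOrCreate()

val hiveDf = hiveSpark.sql("SELECT * FROM my_db.my_table") // placeholder table name
println(hiveDf.rdd.getNumPartitions)
```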
Also, adding to the OP's question: it would be good to know how the number of partitions can be controlled, i.e., when one wants to force Spark to use a different number than it would use by default.
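The general techniques I am aware of (not specific to Hive reads) are repartition and coalesce on the resulting dataframe, plus spark.sql.shuffle.partitions for shuffle output; a sketch, reusing hiveDf from the snippet above:

```scala
// repartition(n) triggers a full shuffle and yields exactly n partitions.
val repartitioned = hiveDf.repartition(200)
println(repartitioned.rdd.getNumPartitions) // 200

// coalesce(n) avoids a full shuffle and yields at most n partitions;
// it can only reduce the partition count, not increase it.
val coalesced = hiveDf.coalesce(10)
println(coalesced.rdd.getNumPartitions)

// Shuffle-producing operations (joins, aggregations, etc.) use
// spark.sql.shuffle.partitions (default 200) for their output partitioning.
hiveSpark.conf.set("spark.sql.shuffle.partitions", 50L)
```

But none of these explains the partition count of the initial read itself, which is what the original question is about.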