
How to get the size of a table in Cassandra?



By : user2952235
Date : November 19 2020, 12:41 AM
I want to know the size of a table in Cassandra. How can I find it?
code :
nodetool cfstats -- <keyspace>.<table>
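If you want to pull the figure out of that output programmatically, a minimal sketch in Java, assuming the cfstats output has already been captured as text (the sample string here is illustrative):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CfstatsParser {

    // Matches lines like "Space used (total): 554671040"
    private static final Pattern SPACE_USED =
            Pattern.compile("Space used \\(total\\):\\s*(\\d+)");

    public static long spaceUsedTotal(String cfstatsOutput) {
        Matcher m = SPACE_USED.matcher(cfstatsOutput);
        if (m.find()) {
            return Long.parseLong(m.group(1));
        }
        throw new IllegalArgumentException("No 'Space used (total)' line found");
    }

    public static void main(String[] args) {
        String sample = "Table: users\n"
                + "        Space used (live): 554671040\n"
                + "        Space used (total): 554671040\n";
        System.out.println(spaceUsedTotal(sample)); // prints 554671040
    }
}
```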


Cassandra: how to get total table size / estimate row count



By : Kunwar Pal Singh
Date : March 29 2020, 07:55 AM
I think the best way to do this would be to access the statistics directly through JMX (which is how nodetool actually works). Each node provides a wide range of metrics, but the ones you would be interested in are:
code :
org.apache.cassandra.metrics
  ColumnFamily
    cf_name
       TotalDiskSpaceUsed
       MemtableDataSize
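Those metrics are addressed through JMX object names. A minimal sketch of building the name for TotalDiskSpaceUsed with the standard javax.management API; the keyspace and table names are placeholders, and the exact key properties (e.g. ColumnFamily vs. Table) vary between Cassandra versions:

```java
import javax.management.ObjectName;

public class TableMetricName {

    // Builds the JMX object name for a per-table metric, following the
    // org.apache.cassandra.metrics layout described above.
    public static ObjectName diskSpaceMetric(String keyspace, String table) throws Exception {
        return new ObjectName(String.format(
                "org.apache.cassandra.metrics:type=ColumnFamily,keyspace=%s,scope=%s,name=TotalDiskSpaceUsed",
                keyspace, table));
    }

    public static void main(String[] args) throws Exception {
        ObjectName name = diskSpaceMetric("my_keyspace", "my_table");
        System.out.println(name);
        // Against a live node you would read it through an MBeanServerConnection
        // obtained from a JMX connector; the attribute name depends on the
        // metric type (counters expose "Count").
    }
}
```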
Cassandra - Limiting Table Size



By : klemensior15
Date : March 29 2020, 07:55 AM
No, you can't.
By design, C* will vary the amount of disk space used, e.g. during compaction, when saving key/row caches to disk, and for index files, bloom filters, snapshots, etc. (all configuration dependent), so it may not be just the data you've inserted that you need to account for. What should be included in or excluded from such a hard limit?
Cassandra - >500mb CSV file produces ~50mb size table?



By : Héctor Concha Ferxz
Date : March 29 2020, 07:55 AM
I am new to Cassandra and trying to figure out how sizing works. I created a keyspace and a table, then wrote a Java program to generate 1 million rows into a CSV file and insert it into my database. The CSV file was ~545 MB in size. I loaded it into the database, ran the nodetool cfstats command, and received the output below. It says the total space used is 50555052 bytes (~50 MB). How can this be? With the overhead of indexes, columns, etc., how can my total data be smaller than the raw CSV data (not just smaller, but so much smaller)? Maybe I am not reading something here correctly, but does this seem right? I am using Cassandra 2.2.1 on a single machine.

So I thought of the biggest 3 pieces of data:
code :
import java.io.FileWriter;
import java.io.IOException;
import java.sql.Timestamp;
import java.util.Date;

public class Main {

    private static final String ALPHA_NUMERIC_STRING = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

    public static void main(String[] args) {
        generateCassandraCSVData("users.csv");
    }

    // Builds a random alphanumeric string of the given length.
    public static String randomAlphaNumeric(int count) {
        StringBuilder builder = new StringBuilder();
        while (count-- != 0) {
            int character = (int) (Math.random() * ALPHA_NUMERIC_STRING.length());
            builder.append(ALPHA_NUMERIC_STRING.charAt(character));
        }
        return builder.toString();
    }

    // Writes one million CSV rows of fake user data to the given file.
    public static void generateCassandraCSVData(String sFileName) {
        Date date = new Date();

        try {
            FileWriter writer = new FileWriter(sFileName);
            for (int i = 0; i < 1000000; i++) {
                writer.append("Username " + i);
                writer.append(',');
                writer.append(new Timestamp(date.getTime()).toString());
                writer.append(',');
                writer.append("myfakeemailaccnt@email.com");
                writer.append(',');
                writer.append(new Timestamp(date.getTime()).toString());
                writer.append(',');
                writer.append(randomAlphaNumeric(150));
                writer.append(',');
                writer.append(randomAlphaNumeric(150));
                writer.append(',');
                writer.append(randomAlphaNumeric(150));
                writer.append(',');
                writer.append("tr");
                writer.append('\n');
                // generate whatever data you want
            }
            writer.flush();
            writer.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Table: users
        SSTable count: 4
        Space used (live): 554671040
        Space used (total): 554671040
        Space used by snapshots (total): 0
        Off heap memory used (total): 1886175
        SSTable Compression Ratio: 0.6615549506522498
        Number of keys (estimate): 1019477
        Memtable cell count: 270024
        Memtable data size: 20758095
        Memtable off heap memory used: 0
        Memtable switch count: 25
        Local read count: 0
        Local read latency: NaN ms
        Local write count: 1323546
        Local write latency: 0.048 ms
        Pending flushes: 0
        Bloom filter false positives: 0
        Bloom filter false ratio: 0.00000
        Bloom filter space used: 1533512
        Bloom filter off heap memory used: 1533480
        Index summary off heap memory used: 257175
        Compression metadata off heap memory used: 95520
        Compacted partition minimum bytes: 311
        Compacted partition maximum bytes: 770
        Compacted partition mean bytes: 686
        Average live cells per slice (last five minutes): 0.0
        Maximum live cells per slice (last five minutes): 0
        Average tombstones per slice (last five minutes): 0.0
        Maximum tombstones per slice (last five minutes): 0
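One way to sanity-check those numbers: the SSTable Compression Ratio reported by cfstats is the compressed size divided by the uncompressed size, so the on-disk figure can be projected back to the raw data size. A small sketch using the values from the output above:

```java
public class SizeEstimate {
    public static void main(String[] args) {
        // Values taken from the cfstats output above.
        long spaceUsedTotal = 554_671_040L;            // bytes on disk
        double compressionRatio = 0.6615549506522498;  // compressed / uncompressed

        // Approximate size of the data before SSTable compression.
        double uncompressed = spaceUsedTotal / compressionRatio;
        System.out.printf("~%.0f MB uncompressed%n", uncompressed / (1024 * 1024));
    }
}
```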
Calculating the size of a table in Cassandra



By : Emiliano Liotta
Date : March 29 2020, 07:55 AM
It's because of the internal storage structure of Cassandra versions < 3:
- There is only one entry for each distinct partition key value.
- For each distinct partition key value, there is only one entry per static column.
- There is an empty entry for the clustering key.
- For each column in a row, there is a single entry for each clustering key column.
code :
CREATE TABLE my_table (
    pk1 int,
    pk2 int,
    ck1 int,
    ck2 int,
    d1 int,
    d2 int,
    s int static,
    PRIMARY KEY ((pk1, pk2), ck1, ck2)
); 
 pk1 | pk2 | ck1 | ck2  | s     | d1     | d2
-----+-----+-----+------+-------+--------+---------
   1 |  10 | 100 | 1000 | 10000 | 100000 | 1000000
   1 |  10 | 100 | 1001 | 10000 | 100001 | 1000001
   2 |  20 | 200 | 2000 | 20000 | 200000 | 2000000
             |100:1000:  |100:1000:d1|100:1000:d2|100:1001:  |100:1001:d1|100:1001:d2|  
-----+-------+-----------+-----------+-----------+-----------+-----------+-----------+
1:10 | 10000 |           |  100000   |  1000000  |           |  100001   |  1000001  |


             |200:2000:  |200:2000:d1|200:2000:d2|
-----+-------+-----------+-----------+-----------+ 
2:20 | 20000 |           |  200000   |  2000000  |
Single Partition Size = (4 + 4 + 4 + 4) + 4 + 2 * ((4 + (4 + 4)) + (4 + (4 + 4))) byte = 68 byte

Estimated Table Size = Single Partition Size * Number Of Partition 
                     = 68 * 2 byte
                     = 136 byte
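The arithmetic above can be reproduced directly. A small sketch mirroring the estimate in the answer, with the grouping taken from the formula (every column in the example table is a 4-byte int):

```java
public class PartitionSizeEstimate {

    public static void main(String[] args) {
        int intSize = 4; // every column in the example table is an int

        // Partition key (pk1, pk2) and clustering key (ck1, ck2) entries: 4 ints.
        int keyOverhead = 4 * intSize;
        // One static column entry per distinct partition key value.
        int staticColumn = intSize;
        // Per data column entry, grouped as in the formula above: 4 + (4 + 4).
        int perColumnEntry = intSize + (intSize + intSize);
        // Two data columns, d1 and d2.
        int perRow = 2 * perColumnEntry;
        int rowsPerPartition = 2;

        int partitionSize = keyOverhead + staticColumn + rowsPerPartition * perRow;
        int numPartitions = 2;
        System.out.println(partitionSize);                  // 68
        System.out.println(partitionSize * numPartitions);  // 136
    }
}
```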
Regarding Cassandra Table Size



By : jlo0312
Date : March 29 2020, 07:55 AM
"nodetool tablestats" replaces the older command "nodetool cfstats"; in other words, both are the same. The output of this command lists the size of each of the tables within a keyspace.
In that output, you are looking for the "Space used (total)" value. It's the total number of bytes of disk space used by SSTables belonging to this table, including obsolete SSTables waiting to be GC'd.
© ourworld-yourmove.org