How to speed up counting the occurrences of a word in large files?

By : user2950867
Date : November 17 2020, 11:58 AM
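Assuming there are no extremely long lines in the file, you can read it line by line, using something like the first loop below. The second loop is an alternative that scans the stream for '<' directly and does not depend on line length.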
code :
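// Option 1: read the file line by line
for (std::string line; std::getline(in, line); ) {
    // find the number of "<page>" strings in line
}

// Option 2: scan the stream for '<' (needs <algorithm> and <iterator>)
for (std::istreambuf_iterator<char> it(in), end;
     (it = std::find(it, end, '<')) != end; ) {
    // match "<page>" at the start of the sequence [it, end);
    // the match must advance it past '<', or the loop never terminates
}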


Java Weka StringToWordVector is not counting word occurrences properly


By : Kakerdu
Date : March 29 2020, 07:55 AM
I'm using the Weka machine learning library's Java API and I have the following code (not shown).

Gee... all those lines of code. How about these few lines instead?
code :
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordCount {
    // Count how often each word (a run of \w characters) occurs in the input
    public static Map<String, Integer> countWords(String input) {
        Map<String, Integer> map = new HashMap<>();
        Matcher matcher = Pattern.compile("\\b\\w+\\b").matcher(input);
        while (matcher.find())
            map.merge(matcher.group(), 1, Integer::sum);
        return map;
    }

    public static void main(String[] args) {
        System.out.println(countWords("sample, repeat sample, of text"));
    }
}
Output (HashMap iteration order is not guaranteed):
{of=1, text=1, repeat=1, sample=2}
Python: faster way of counting occurrences in numpy arrays (large dataset)


By : user4355016
Date : March 29 2020, 07:55 AM
I am new to Python. I have a numpy.array whose size is 66049x1 (66049 rows and 1 column). The values are floats sorted from smallest to largest, and some of them are repeated.
code :
# Example 1: histogram of the counts
import numpy as np
import pandas as pd
from collections import Counter
import matplotlib.pyplot as plt

arr = np.random.randint(0, 100, (100000,1))

df = pd.DataFrame(arr)

cnt = Counter(df[0])

df_p = pd.DataFrame(cnt, index=['data'])

df_p.T.plot(kind='hist')

plt.show()

# Example 2: cumulative sum of the counts as a line plot
import numpy as np
import pandas as pd
from collections import Counter
import matplotlib.pyplot as plt

arr = np.random.randint(0, 100, (100000,1))

df = pd.DataFrame(arr)

cnt = Counter(df[0])

# sort by value so the cumulative sum runs from smallest to largest
df_p = pd.DataFrame(cnt, index=['data']).T.sort_index()


df_p['cumu'] = df_p['data'].cumsum()

df_p['cumu'].plot(kind='line')

plt.show()

# Example 3: scatter plot of the counts against their cumulative sum
import numpy as np
import pandas as pd
from collections import Counter
import matplotlib.pyplot as plt

arr = np.random.randint(0, 100, (100000,1))

df = pd.DataFrame(arr)

cnt = Counter(df[0])

# sort by value so the cumulative sum runs from smallest to largest
df_p = pd.DataFrame(cnt, index=['data']).T.sort_index()


df_p['cumu'] = df_p['data'].cumsum()

df_p.plot(kind='scatter', x='data', y='cumu')

plt.show()
C Programming: Counting word length occurrences in a string


By : dileep
Date : March 29 2020, 07:55 AM
How can I count word lengths and output their occurrences from a string read with gets() or fgets()? I have code that does this using getchar(). I think using gets() would make it easier to handle all of the delimiters in the program, rather than having to write a separate if statement for each one, would it not?

You can build your own delimiter-detection function.
code :
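// globals
const char *delim = " .,;:!?\n\0";  // the explicit '\0' is a delimiter too
const int n_delim = 9;              // 8 listed characters + '\0'

// Return 1 if c is one of the delimiters, 0 otherwise
int is_delim(int c)
{
    int i;
    for (i = 0; i < n_delim; i++)
        if (c == delim[i]) return 1;
    return 0;
}

// Fragment: buffer, wl, words, length, i and len are declared elsewhere;
// needs <stdio.h> and <string.h>
fgets(buffer, 200, stdin);
len = strlen(buffer);  // compute the length once, not on every iteration

// i <= len so that the terminating '\0' flushes the last word
for (i = 0; i <= len; i++) {
    if (is_delim(buffer[i])) {
        if (length > 0)  // skip runs of consecutive delimiters
            wl[words++] = length;
        length = 0;
        continue;
    }
    length++;
}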
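
The fragment above stores one length per word but does not yet tally how often each length occurs. As a rough sketch of that last step, written in C++ rather than C, the function below counts words per length directly. The name word_length_counts and the default delimiter set are assumptions made for illustration, not part of the original answer.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper: maps each word length to the number of words
// of that length. Any character found in `delims` ends the current word.
std::map<std::size_t, int> word_length_counts(
        const std::string& text,
        const std::string& delims = " .,;:!?\n") {
    std::map<std::size_t, int> counts;
    std::size_t length = 0;
    for (char c : text) {
        if (delims.find(c) != std::string::npos) {
            if (length > 0) ++counts[length];  // a word just ended
            length = 0;
        } else {
            ++length;
        }
    }
    if (length > 0) ++counts[length];  // last word without a trailing delimiter
    return counts;
}
```

A std::map keeps the lengths in ascending order, so iterating over the result prints the tally sorted by word length. A line read with std::getline can be passed in directly.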
Word occurrences in VBA: how to speed up


By : Pigi
Date : March 29 2020, 07:55 AM
Add a reference to the Microsoft Scripting Runtime (Tools -> References...). Then use the following:
code :
Private Sub CommandButton1_Click()
    Const SpanishLCID = 3082
    Dim dict As New Scripting.Dictionary, word As Variant, fixedWord As String
    Dim key As Variant

    dict.CompareMode = SpanishLCID
    For Each word In ActiveDocument.Words
        fixedWord = Trim(StrConv(Trim(word), vbLowerCase, SpanishLCID))
        If Not dict.Exists(fixedWord) Then
            dict(fixedWord) = 1
        Else
            dict(fixedWord) = dict(fixedWord) + 1
        End If
    Next

    ListBox1.Clear
    For Each key In dict.Keys
        ListBox1.AddItem key & "=" & dict(key)
    Next
End Sub
Function IsValidWord(s As String) As Boolean
    Const validChars As String = "abcdefghijklmnopqrstuvwxyz"
    Dim i As Integer, char As String * 1
    For i = 1 To Len(s)
        char = Mid(s, i, 1)
        If InStr(1, validChars, char, vbTextCompare) = 0 Then Exit Function
    Next
    IsValidWord = True
End Function
' RegExp requires a reference to Microsoft VBScript Regular Expressions 5.5
Dim regex As RegExp
Function IsValidWord2(s As String) As Boolean
    If regex Is Nothing Then
        Set regex = New RegExp
        regex.Pattern = "[^a-z]"
        regex.IgnoreCase = True
    End If
    IsValidWord2 = Not regex.Test(s)
End Function
Dim regex2 As RegExp
Function GetValidWord(s As String) As String
    'GetValidWord("Introduction.......3") will return "Introduction"
    If regex2 Is Nothing Then
        Set regex2 = New RegExp
        regex2.Pattern = "[^a-z]"
        regex2.Global = True
        regex2.IgnoreCase = True
    End If
    GetValidWord = regex2.Replace(s, "")
End Function
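    ' Updated loop for CommandButton1_Click: strip invalid characters before counting
    For Each word In ActiveDocument.Words
        fixedWord = Trim(StrConv(Trim(word), vbLowerCase, SpanishLCID))
        fixedWord = GetValidWord(fixedWord)
        If Not dict.Exists(fixedWord) Then
        ' ... continue as in CommandButton1_Click above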
Counting the number of occurrences of each word in a PDF file (Java)


By : user2012331
Date : March 29 2020, 07:55 AM
I am writing a Java program using PDFBox that reads any PDF file and counts how many times each word appears in it, but for some reason nothing appears when I run the program. I expect it to print each word with its number of occurrences next to it. Thanks in advance.

I have tried to fix the logic:
code :
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

public class Extractor {

    public static void main(String[] args) throws FileNotFoundException {
        Map<String, Integer> wordFrequencies = new TreeMap<String, Integer>();
        Map<Character, Integer> charFrequencies = new TreeMap<Character, Integer>();
        PDDocument pd;
        File input = new File("C:\\Users\\Ammar\\Desktop\\Application.pdf");
        try {
            pd = PDDocument.load(input);
            PDFTextStripper stripper = new PDFTextStripper();
            stripper.setEndPage(20);
            String text = stripper.getText(pd);
            for(int i=0; i<text.length(); i++)
            {
                char c = text.charAt(i);
                int count = charFrequencies.get(c) != null ? (charFrequencies.get(c)) + 1 : 1;
                charFrequencies.put(c, count);
            }
            // split on any run of whitespace so line breaks also separate words
            String[] texts = text.trim().split("\\s+");
            for (String txt : texts) {
                int count = wordFrequencies.get(txt) != null ? (wordFrequencies.get(txt)) + 1 : 1;
                wordFrequencies.put(txt, count);

            }

            System.out.println("Printing the number of words");
            for (String key : wordFrequencies.keySet()) {
                System.out.println(key + ": " + wordFrequencies.get(key));
            }

            System.out.println("Printing the number of characters");
            for (char charKey : charFrequencies.keySet()) {
                System.out.println(charKey + ": " + charFrequencies.get(charKey));
            }

            if (pd != null) {
                pd.close();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}