how can I complete the text classification task using less memory


By : ayesh
Date : November 14 2020, 04:51 PM
The main problem you're facing is that you're using far too many features. It's actually quite extraordinary that you've managed to generate 542401 features from documents that contain just 400 words! I've seen SVM classifiers separate spam from non-spam with high accuracy using just 150 features: word counts of selected words that say a lot about whether the document is spam. These use stemming and other normalization tricks to make the features more effective.
You need to spend some time thinning out your features. Think about which features are most likely to contain information useful for this task, and experiment with different feature sets. As long as you keep throwing in everything but the kitchen sink, you'll get memory errors. Right now you're trying to pass 10000 data points with 542401 dimensions each to your SVM. That's 542401 * 10000 * 4 bytes, or about 21 gigabytes (conservatively) of data. My computer only has 4 gigabytes of RAM. You've got to pare this way down.
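As a rough illustration of the advice above, here is a minimal sketch that reproduces the memory estimate and then caps the vocabulary with scikit-learn. The toy corpus, the labels, and the specific `max_features` and `min_df` values are assumptions made for the example, not taken from the original question.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Dense cost of the matrix described above: rows * columns * 4 bytes per value.
n_docs, n_features = 10000, 542401
dense_bytes = n_docs * n_features * 4
print(f"{dense_bytes / 1e9:.1f} GB")  # 21.7 GB

# Toy corpus and labels, made up for illustration (1 = spam, 0 = not spam).
docs = [
    "win free money now",
    "free prize claim now",
    "meeting agenda for monday",
    "monday project meeting notes",
]
labels = [1, 1, 0, 0]

# Cap the vocabulary size and drop words that occur in only one document.
vectorizer = TfidfVectorizer(max_features=150, min_df=2)
X = vectorizer.fit_transform(docs)  # SciPy sparse matrix, not a dense array

clf = LinearSVC().fit(X, labels)
print(X.shape, sorted(vectorizer.vocabulary_))
```

Keeping the matrix sparse and passing it straight to a linear SVM avoids ever allocating the dense array; on real data the cutoffs would need tuning against validation accuracy.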
C# not releasing memory after task complete


By : mi777
Date : March 29 2020, 07:55 AM
Okay, I've been following this... I think there are a couple of issues, some of which people have touched on, but I don't think anyone has answered the real question (which, admittedly, took me a while to recognize, and I'm not sure I'm answering what you want even now).
Best scikit classifier for text classification task


By : nupur agrawal
Date : March 29 2020, 07:55 AM
The problem is not with the classifier, it is with the vectorizer. TfidfVectorizer has a parameter token_pattern : string, a regular expression denoting what constitutes a "token", which is only used if analyzer == 'word'. The default regexp selects tokens of 2 or more alphanumeric characters (punctuation is completely ignored and always treated as a token separator). The tokenizer therefore throws out the one-letter word i, resulting in an empty document. Naive Bayes then classifies that as class 1, because this is the most frequent class in the training data.
Depending on the data, you might want to consider using a uniform prior for Naive Bayes.
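The behaviour described above can be checked with a short sketch. The replacement pattern `r"(?u)\b\w+\b"` and the three toy documents are assumptions chosen for illustration; `fit_prior=False` is the scikit-learn switch that gives Naive Bayes a uniform class prior.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# The default token_pattern requires at least two word characters per token.
default_tokens = TfidfVectorizer().build_analyzer()("i")
print(default_tokens)  # []

# A pattern that also accepts one-character tokens (illustrative choice).
keep_short = TfidfVectorizer(token_pattern=r"(?u)\b\w+\b")
short_tokens = keep_short.build_analyzer()("i")
print(short_tokens)  # ['i']

# Toy data, made up for the example; class 1 is the more frequent class.
docs = ["i", "you and me", "we like it"]
labels = [0, 1, 1]
X = keep_short.fit_transform(docs)

# fit_prior=False replaces the learned class prior with a uniform one.
nb = MultinomialNB(fit_prior=False).fit(X, labels)
```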
Text classification scheme for a classification task with 120 classes


By : Kevin 8Ball Pool
Date : March 29 2020, 07:55 AM
I assume that the classes do not overlap (that is, exactly one class per message).
A useful approach for imbalanced classes is to use asymmetric misclassification costs, which force the classifier to focus on the less represented class by assigning its errors a much higher cost than those of the other classes.
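One way to express asymmetric misclassification costs in scikit-learn is the `class_weight` parameter. The sketch below is only an illustration: the labels, the single numeric feature, and the manual weight of 10 are made-up assumptions, and two classes stand in for the 120 in the question.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.utils.class_weight import compute_class_weight

# Toy imbalanced labels (made up): 8 examples of class 0, 2 of class 1.
y = np.array([0] * 8 + [1] * 2)
X = np.linspace(0.0, 1.0, 10).reshape(-1, 1)  # one illustrative feature

# "balanced" weights are n_samples / (n_classes * count_per_class).
# Here that gives 10 / (2 * 8) = 0.625 and 10 / (2 * 2) = 2.5.
weights = compute_class_weight(
    class_weight="balanced", classes=np.array([0, 1]), y=y
)
print(weights)

# Let scikit-learn derive the costs from class frequencies...
clf_balanced = LinearSVC(class_weight="balanced").fit(X, y)
# ...or set them by hand, making errors on class 1 ten times as costly.
clf_manual = LinearSVC(class_weight={0: 1.0, 1: 10.0}).fit(X, y)
```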
NLP data preparation and sorting for text-classification task


By : Aggy Tank
Date : March 29 2020, 07:55 AM
You can try a OneVsAll / OneVsRest strategy. This lets you do both: predict exactly one category, without having to strictly commit to a single label.
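A minimal sketch of the one-vs-rest idea with scikit-learn's `OneVsRestClassifier` follows. The corpus, the category names, and the choice of `LinearSVC` as the base estimator are assumptions for illustration. One binary classifier is trained per category, so you can either take the single best label from `predict` or inspect every per-category score from `decision_function`.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

# Toy corpus and category names, made up for the example.
docs = [
    "goal match team score",
    "team wins the match",
    "stock market prices fall",
    "market shares and stock",
    "new phone battery screen",
    "phone screen review",
]
labels = ["sports", "sports", "finance", "finance", "tech", "tech"]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)

# One binary LinearSVC is fitted per category.
clf = OneVsRestClassifier(LinearSVC()).fit(X, labels)

query = vectorizer.transform(["stock market news"])
prediction = clf.predict(query)       # single highest-scoring category
scores = clf.decision_function(query)  # one score per category
print(prediction, dict(zip(clf.classes_, scores[0])))
```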
Which model (GPT2, BERT, XLNet and etc) would you use for a text classification task? Why?


By : shineneo1
Date : October 03 2020, 05:00 AM
It depends heavily on your dataset, and it is part of the data scientist's job to find which model is most suitable for a particular task in terms of the selected performance metric, training cost, model complexity, etc.
When you work on the problem you will probably test all of the above models and compare them. Which one should you choose first? Andrew Ng, in "Machine Learning Yearning", suggests starting with a simple model so you can quickly iterate and test your idea, data preprocessing pipeline, etc.
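As one possible reading of "start with a simple model", here is a sketch of a quick baseline to set up before trying GPT2, BERT, or XLNet. The TF-IDF plus logistic regression combination and the toy reviews are assumptions for illustration, not a recommendation from the original answer.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy data, made up for the example.
docs = [
    "great film loved it",
    "wonderful acting great story",
    "terrible plot boring film",
    "boring and terrible acting",
]
labels = ["pos", "pos", "neg", "neg"]

# A cheap baseline: bag-of-words features feeding a linear classifier.
baseline = make_pipeline(TfidfVectorizer(), LogisticRegression())
baseline.fit(docs, labels)

prediction = baseline.predict(["great story"])
print(prediction)
```

A baseline like this trains in seconds, so it gives a reference score that any heavier model has to beat on the same metric.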