Reading csv files with messy structure with pandas



By : Malinda McCurdy
Date : November 18 2020, 11:13 AM
Hope this helps. One way, inefficient but effective, is to reserve more columns than you'll need by passing extra names to read_csv, then drop the columns that end up entirely empty:
code :
>>> df = pd.read_csv("knop.csv", names=range(6))
>>> df
      0     1     2    3   4   5
0  Col1  Col2  Col3  NaN NaN NaN
1     a     b     c  NaN NaN NaN
2     a     b     c  NaN NaN NaN
3     a     a     b    c NaN NaN
4     a     b     c    c NaN NaN
5     a     b     c  NaN NaN NaN
>>> df = df.dropna(axis=1,how='all')
>>> df
      0     1     2    3
0  Col1  Col2  Col3  NaN
1     a     b     c  NaN
2     a     b     c  NaN
3     a     a     b    c
4     a     b     c    c
5     a     b     c  NaN
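
If guessing a column count feels too loose, the width can be measured first. The following is a minimal sketch of that variation, not part of the original answer: the helper name read_ragged_csv is hypothetical, and the sample text is an assumed reconstruction of knop.csv based on the output shown above.

```python
import csv
import io

import pandas as pd

# Assumed contents of knop.csv, reconstructed from the output above.
RAGGED_CSV = "Col1,Col2,Col3\na,b,c\na,b,c\na,a,b,c\na,b,c,c\na,b,c\n"


def read_ragged_csv(text: str) -> pd.DataFrame:
    """Read CSV text whose rows have differing numbers of fields."""
    # First pass: find the widest row so no guess is needed.
    width = max(len(row) for row in csv.reader(io.StringIO(text)))
    # Second pass: give pandas exactly that many column names.
    # header=None keeps the first line as data, as in the answer above.
    return pd.read_csv(io.StringIO(text), header=None, names=range(width))


df = read_ragged_csv(RAGGED_CSV)
print(df)
```

This reads the data twice, so it trades some speed for not needing the dropna step afterwards.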


PHP-Structure overview messy project



By : Hector
Date : March 29 2020, 07:55 AM
This might help, although I'm not sure it truly will; it's too big for a comment.
There is a program called Doxygen that can generate documentation from PHP source files. Bear with me...
Reading in 'messy' looking XML file into R



By : Caron
Date : March 29 2020, 07:55 AM
I hope this fixes the issue. You really should read up on namespaces in XML, how they work in R, and XPath in general. Also, xml2 is a newer XML package with some nice features you should look into.
code :
library(xml2)

# read the doc
doc <- read_xml("http://data.treasury.gov/feed.svc/DailyTreasuryYieldCurveRateData?$filter=year(NEW_DATE)%20eq%202005")

# libxml2 + R == "meh" handling of default namespaces
ns <- xml_ns_rename(xml_ns(doc), d1="default")

# all the info is in the properties tag so focus on it
props <- xml_find_all(doc, "//default:entry/default:content/m:properties", ns)

# lots of ways to extract, but this data is "regular" enough to take a
# rather simplistic approach. Extract all the node values which will be 
# separated by newlines. Convert newlines to tabs, trim the whole thing
# and read it in as a table.
dat <- read.table(text=trimws(gsub("\n", "\t", unlist(lapply(props, xml_text)))), 
                  sep="\t", stringsAsFactors=FALSE)

# column names would be good, so build those from one property node
colnames(dat) <- xml_name(xml_children(props[[1]]))

# boom: done
str(dat)
## 'data.frame': 250 obs. of  14 variables:
##  $ Id              : int  3040 3041 3042 3043 3044 3045 3046 3047 3048 3049 ...
##  $ NEW_DATE        : chr  "        2005-11-14T00:00:00" "        2005-11-10T00:00:00" "        2005-11-15T00:00:00" "        2005-11-17T00:00:00" ...
##  $ BC_1MONTH       : num  3.93 3.89 4.01 3.98 4 ...
##  $ BC_3MONTH       : num  4.02 3.97 4.01 4.01 4 ...
##  $ BC_6MONTH       : num  4.35 4.3 4.34 4.3 4.3 ...
##  $ BC_1YEAR        : num  4.4 4.34 4.38 4.32 4.34 ...
##  $ BC_2YEAR        : num  4.5 4.44 4.47 4.37 4.42 ...
##  $ BC_3YEAR        : num  4.52 4.48 4.5 4.39 4.43 ...
##  $ BC_5YEAR        : num  4.54 4.49 4.51 4.39 4.43 ...
##  $ BC_7YEAR        : num  4.57 4.51 4.52 4.42 4.45 ...
##  $ BC_10YEAR       : num  4.61 4.55 4.56 4.46 4.49 ...
##  $ BC_20YEAR       : num  4.9 4.85 4.83 4.75 4.77 ...
##  $ BC_30YEAR       : logi  NA NA NA NA NA NA ...
##  $ BC_30YEARDISPLAY: int  0 0 0 0 0 0 0 0 0 0 ...
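
For comparison, the same namespace idea can be sketched in Python with the standard library. This is only an illustration under assumptions: the SAMPLE document is a hand-written miniature in the shape of an OData Atom feed, not the real Treasury response, and the prefix "atom" is an arbitrary label chosen here for the default namespace.

```python
import xml.etree.ElementTree as ET

# Hand-written miniature feed (assumption), shaped like an OData Atom response.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"
      xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices">
  <entry><content><m:properties>
    <d:Id>3040</d:Id><d:BC_1MONTH>3.93</d:BC_1MONTH>
  </m:properties></content></entry>
  <entry><content><m:properties>
    <d:Id>3041</d:Id><d:BC_1MONTH>3.89</d:BC_1MONTH>
  </m:properties></content></entry>
</feed>"""

# The default namespace has no prefix in the document, so give it one here.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "m": "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata",
}

root = ET.fromstring(SAMPLE)
rows = []
for props in root.findall("atom:entry/atom:content/m:properties", NS):
    # ElementTree tags look like "{namespace-uri}Name"; keep only the name.
    rows.append({child.tag.split("}", 1)[1]: child.text for child in props})

print(rows)
```

As in the R version, the key step is binding an explicit prefix to the default namespace before writing the path expression.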
Reading messy JSON using php



By : Alvin De Cruz
Date : March 29 2020, 07:55 AM
I think the issue is the following. This piece of code should help you. Implement your own logic for when status is empty.
code :
<?php

$jsonString = '{
        "capacity_test": {
          "date": "2017-03-01",
          "status": "done",
          "PROPERTIES": {
            "fail": {
              "capacity_test": {
                "date": "2017-03-02",
                "status": "done",
                "PROPERTIES": {
                  "boolean": false
                }
              },
              "def": [
                {
                  "drop_test": {
                    "Properties": {
                      "date": "2017-03-05",
                      "status": "done"
                    }
                  },
                  "waves_test": {
                    "date": "2018-03-06",
                    "status": "done"
                  }
                },
                {
                  "drop_test": {
                    "Properties": null
                  },
                  "waves_test": {
                    "date": "2018-03-06",
                    "status": "done"
                  }
                },
                {
                  "drop_test": {
                    "Properties": null
                  },
                  "waves_test": {
                    "date": "2018-03-06",
                    "status": ""
                  }
                }
              ]
            },
            "final_test": {
              "Properties": null
            }
          }
        }
      }';

$ar = json_decode($jsonString);

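// Walk the decoded JSON recursively. json_decode() returns nested stdClass
// objects and arrays, and foreach can iterate over both.
function recArr($array) {
    foreach ($array as $k => $v) {
        if (is_array($v) || is_object($v)) {
            recArr($v);
        } elseif ($k === 'status' && $v === '') {
            // Put your own handling for an empty status here
            echo $k, PHP_EOL;
        }
    }
}

// recArr() prints as it goes and returns nothing, so call it directly
recArr($ar);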
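
The same recursive walk can be sketched in Python, here collecting the location of each empty status instead of printing it. The function name find_empty_status and the small JSON string are hypothetical, chosen only to keep the example short.

```python
import json


def find_empty_status(node, path=()):
    """Return the key path of every "status" entry whose value is ""."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "status" and value == "":
                found.append(path + (key,))
            else:
                found.extend(find_empty_status(value, path + (key,)))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            found.extend(find_empty_status(value, path + (index,)))
    # Scalars (strings, numbers, booleans, None) contain nothing to search.
    return found


# Small illustrative document (assumption), not the JSON from the question.
doc = json.loads(
    '{"a": {"status": "done"}, "b": [{"status": ""}, {"c": {"status": ""}}]}'
)
paths = find_empty_status(doc)
print(paths)
```

Returning the paths rather than echoing inside the loop makes it easier to plug in whatever handling the empty case needs.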
Reading a messy CSV file



By : Ted Goggleye
Date : November 15 2020, 04:01 AM
This should help.
If a space follows the comma after the performer, it works fine. But if there isn't one, it just prints the groupName.
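
One common way to make parsing indifferent to whether a space follows the comma is to let the csv module skip it. This is a sketch under assumptions: the original code is not shown, so the sample line and its field values are made up for illustration.

```python
import csv
import io

# Made-up line (assumption): one comma has no space after it, one does.
SAMPLE = "Queen,Freddie Mercury, Brian May\n"

# skipinitialspace=True drops whitespace that directly follows a delimiter,
# so both spacing styles produce the same fields.
rows = list(csv.reader(io.StringIO(SAMPLE), skipinitialspace=True))
print(rows)
```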
Pandas - Reading multiple excel files into a single pandas Dataframe



By : user3487567
Date : March 29 2020, 07:55 AM
Hope this helps. Rather than creating the pd.DataFrame from the list, use pd.concat to concatenate the frames, i.e.
code :
file = pd.concat(list_)
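
A slightly fuller sketch of that call follows. The two frames are in-memory stand-ins (an assumption) for what pd.read_excel would return per workbook, so the example runs without any files on disk; the column names are made up.

```python
import pandas as pd

# Stand-ins for frames that would normally come from pd.read_excel(path),
# one per workbook.
list_ = [
    pd.DataFrame({"name": ["a", "b"], "value": [1, 2]}),
    pd.DataFrame({"name": ["c"], "value": [3]}),
]

# ignore_index=True renumbers the rows 0..n-1 instead of repeating each
# file's own index.
combined = pd.concat(list_, ignore_index=True)
print(combined)
```

Without ignore_index=True the result keeps each frame's original row labels, which can leave duplicate index values after concatenation.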