Questions tagged [text-parsing]
Text parsing is a form of parsing: breaking a stream of text into its components and capturing the relationships between those components.
1,308 questions
2327 votes, 21 answers, 2.5m views
How to delete from a text file, all lines that contain a specific string?
How would I use sed to delete all lines in a text file that contain a specific string?
892 votes, 21 answers, 723k views
How to convert string representation of list to a list
I was wondering what the simplest way is to convert a string representation of a list like the following to a list:
x = '[ "A","B","C" , " D"]'
Even in cases ...
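For the literal shown in this excerpt, one commonly used option is `ast.literal_eval` from the Python standard library, which evaluates Python literals without running arbitrary code. The following is only a minimal sketch, assuming the input is always a valid Python list literal. The helper name `parse_list_literal` and the optional whitespace stripping are assumptions, since the excerpt is cut off before it states the full requirements.

```python
import ast

def parse_list_literal(text, strip_items=False):
    """Parse a string holding a Python list literal into a real list."""
    value = ast.literal_eval(text.strip())
    if not isinstance(value, list):
        raise ValueError("expected a list literal")
    if strip_items:
        # Assumption: surrounding whitespace inside string items is unwanted.
        value = [item.strip() if isinstance(item, str) else item for item in value]
    return value

x = '[ "A","B","C" , " D"]'
print(parse_list_literal(x))
print(parse_list_literal(x, strip_items=True))
```

Without `strip_items`, the leading space in `" D"` is preserved, because it is part of the quoted string rather than of the list syntax.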
115 votes, 6 answers, 319k views
Split pipe-delimited values of a flat array into an associative array
//go through each question
foreach($file_data as $value) {
//separate the string by pipes and place in variables
list($category, $question) = explode('|', $value);
//place in assoc array
$...
105 votes, 27 answers, 108k views
Split string containing command-line parameters into string[] in C#
I have a single string that contains the command-line parameters to be passed to another executable and I need to extract the string[] containing the individual parameters in the same way that C# ...
80 votes, 4 answers, 134k views
Get integer value from malformed query string
I'm looking for a way to parse a substring using PHP and have come across preg_match; however, I can't seem to work out the rule that I need.
I am parsing a web page and need to grab a numeric value ...
76 votes, 43 answers, 21k views
Evaluating a string of simple mathematical expressions [closed]
Challenge
Here is the challenge (of my own invention, though I wouldn't be surprised if it has previously appeared elsewhere on the web).
Write a function that takes a single argument that is a
...
72 votes, 4 answers, 99k views
Difference between parsing a text file in r and rb mode
What makes parsing a text file in 'r' mode more convenient than parsing it in 'rb' mode?
Especially when the text file in question may contain non-ASCII characters.
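The difference the question asks about can be seen in a few lines. This is a rough sketch assuming Python 3 and a UTF-8 encoded file. The file name and sample content are made up for illustration. In text mode the bytes are decoded to `str` with the given encoding, while binary mode returns raw `bytes` and leaves decoding to the caller.

```python
import os
import tempfile

# Write a small UTF-8 file containing non-ASCII text (sample content is made up).
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "wb") as f:
    f.write("café\nnaïve\n".encode("utf-8"))

# Text mode: bytes are decoded to str using the encoding given.
with open(path, "r", encoding="utf-8") as f:
    text_lines = f.read().splitlines()

# Binary mode: raw bytes come back; decoding is left to the caller.
with open(path, "rb") as f:
    raw = f.read()

print(text_lines)                      # a list of str
print(type(raw), len(raw))             # bytes; each accented letter takes 2 bytes in UTF-8
print(raw.decode("utf-8").splitlines() == text_lines)
```

Passing `encoding` explicitly avoids depending on the platform default, which is where text mode most often goes wrong with non-ASCII input.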
67 votes, 2 answers, 53k views
What is CoNLL data format?
I am using an open-source jar (Mate Parser) which outputs in the CoNLL 2009 format after dependency parsing. I want to use the dependency parsing results for Information Extraction, however, I only ...
52 votes, 12 answers, 68k views
Replace a whole line where a particular word is found in a text file
How can I replace a particular line of text in a file using PHP?
I don't know the line number. I want to replace a line containing a particular word.
48 votes, 6 answers, 84k views
How to get the first column of every line from a CSV file?
How do I get the first column of every line in an input CSV file and output it to a new file? I am thinking of using awk but am not sure how.
37 votes, 13 answers, 25k views
Can awk deal with CSV file that contains comma inside a quoted field?
I am using awk to sum one column of a CSV file. The data format is something like:
id, name, value
1, foo, 17
2, bar, 76
3, "I am the, question", 99
I was using this awk ...
36 votes, 9 answers, 107k views
PHP - parsing a txt file
I have a .txt file that has the following details:
ID^NAME^DESCRIPTION^IMAGES
123^test^Some text goes here^image_1.jpg,image_2.jpg
133^hello^some other test^image_3456.jpg,image_89.jpg
What I'd like ...
34 votes, 9 answers, 40k views
Python parsing bracketed blocks
What would be the best way in Python to parse out chunks of text contained in matching brackets?
"{ { a } { b } { { { c } } } }"
should initially return:
[ "{ a } { b } { { { c } } }" ]
putting ...
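One way to get the first result shown in this excerpt is a single pass with a depth counter. The sketch below is an illustration under stated assumptions, not the asker's or any answer's code. The function name `top_level_blocks` is hypothetical, brackets are assumed to be single characters, and calling it again on a returned block is only one possible way to descend a level, since the excerpt is cut off before describing the next step.

```python
def top_level_blocks(text, open_ch="{", close_ch="}"):
    """Return the stripped contents of each outermost bracketed block."""
    blocks = []
    depth = 0
    start = None
    for i, ch in enumerate(text):
        if ch == open_ch:
            if depth == 0:
                start = i + 1  # content begins just after the opening bracket
            depth += 1
        elif ch == close_ch:
            if depth == 0:
                raise ValueError(f"unmatched {close_ch!r} at index {i}")
            depth -= 1
            if depth == 0:
                blocks.append(text[start:i].strip())
    if depth != 0:
        raise ValueError("unclosed bracket")
    return blocks

s = "{ { a } { b } { { { c } } } }"
first = top_level_blocks(s)
print(first)
print(top_level_blocks(first[0]))
```

A counter is enough here because only one bracket type is involved; mixed bracket types would need a stack instead.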
34 votes, 4 answers, 32k views
What do NN, VBD, IN, DT, NNS, RB mean in NLTK?
When I chunk text, I get lots of codes in the output like
NN, VBD, IN, DT, NNS, RB.
Is there a list documented somewhere which tells me the meaning of these?
I have tried googling nltk chunk code ...
34 votes, 2 answers, 85k views
Match and parse the content of a square-braced placeholder in text
I have a PHP variable ($content) where I need to find a certain pattern that looks like this:
[gallery::name/of/the/folder/]
I would like to search:
- starting with literal characters `[gallery::`
- ...
29 votes, 5 answers, 40k views
Best way to get all digits from a string [duplicate]
Is there any better way to take a string such as "(123) 455-2344" and get "1234552344" from it than doing this:
var matches = Regex.Matches(input, @"[0-9]+", RegexOptions.Compiled);
return ...
27 votes, 6 answers, 44k views
Parse the first level keys of $_POST, then use the numeric suffix while looping
I have a form that contains a number of fields with names item1, item2, item13, item43, etc. The fields differ each time because they are populated in the form with AJAX.
When the user submits I ...
26 votes, 13 answers, 44k views
How should I detect which delimiter is used in a text file?
I need to be able to parse both CSV and TSV files. I can't rely on the users to know the difference, so I would like to avoid asking the user to select the type. Is there a simple way to detect which ...
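In Python, the standard library's `csv.Sniffer` offers one heuristic for this. The sketch below assumes the only candidates are comma and tab, and that the first kilobyte is representative of the file. The helper names `detect_delimiter` and `read_rows` and the sample data are hypothetical. The sniffer is a heuristic and raises `csv.Error` when it cannot decide, so real code would likely want a fallback.

```python
import csv
import io

def detect_delimiter(sample, candidates=",\t"):
    """Guess the field delimiter of a text sample, restricted to candidates."""
    dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
    return dialect.delimiter

def read_rows(text):
    """Parse delimited text after sniffing the delimiter from its first 1024 characters."""
    delimiter = detect_delimiter(text[:1024])
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))

csv_text = "name,age\nalice,30\nbob,25\n"
tsv_text = "name\tage\nalice\t30\nbob\t25\n"
print(read_rows(csv_text))
print(read_rows(tsv_text))
```

Restricting `delimiters` to the formats actually expected keeps the sniffer from picking an unrelated character that happens to occur consistently.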
25 votes, 6 answers, 32k views
Split alphanumeric string between leading digits and trailing letters
I have a string like:
$Order_num = "0982asdlkj";
How can I split that into the 2 variables, with the number as one element and then another variable with the letter element?
The number ...
22 votes, 2 answers, 6k views
Create Great Parser - Extract Relevant Text From HTML/Blogs
I'm trying to create a generalized HTML parser that works well on blog posts. I want to point my parser at the specific entry's URL and get back clean text of the post itself. My basic approach (from ...
21 votes, 4 answers, 23k views
Powershell command to trim path if it ends with "\"
I need to trim a path if it ends with \.
C:\Ravi\
I need to change to
C:\Ravi
In some cases the path will not end with \ (then trimming must be skipped).
I tried with .EndsWith("\"), but it fails when I ...
21 votes, 9 answers, 7k views
Elegant structured text file parsing
I need to parse a transcript of a live chat conversation. My first thought on seeing the file was to throw regular expressions at the problem but I was wondering what other approaches people have used....
19 votes, 4 answers, 14k views
Saving nltk drawn parse tree to image file
Is there any way to save the drawn image from tree.draw() to an image file programmatically? I tried looking through the documentation, but I couldn't find anything.
17 votes, 3 answers, 9k views
How to find the shortest dependency path between two words in Python?
I am trying to find the dependency path between two words in Python, given a dependency tree.
For the sentence
Robots in popular culture are there to remind us of the awesomeness of unbound human agency.
I ...
16 votes, 4 answers, 5k views
Parse a pipe-delimited string into 2, 3, 4 or 5 variables (depending on the input string)
I have a line like this in my code:
list($user_id, $name, $limit, $remaining, $reset) = explode('|', $user);
The last 3 parameters may or may not be there. Is there a function similar to list that ...
16 votes, 11 answers, 11k views
Create acronym from a string containing only words
I'm looking for a way that I can extract the first letter of each word from an input field and place it into a variable.
Example: if the input field is "Stack-Overflow Questions Tags Users" then the ...
16 votes, 4 answers, 27k views
Python: Read configuration file with multiple lines per key
I am writing a small DB test suite, which reads configuration files with queries and expected results, e.g.:
query = "SELECT * from cities WHERE name='Unknown';"
count = 0
level ...
16 votes, 3 answers, 34k views
How do I keep a Scanner from throwing exceptions when the wrong type is entered?
Here's some sample code:
import java.util.Scanner;
class In
{
public static void main (String[]arg)
{
Scanner in = new Scanner (System.in) ;
System.out.println ("how many are ...
14 votes, 6 answers, 5k views
How to transpose the contents of lines and columns in a file in Vim?
I know I can use Awk, but I am on a Windows box, and I am making a function for others that may not have Awk. I also know I can write a C program, but I would love not to have something that requires ...
14 votes, 2 answers, 5k views
Javascript, Text Annotations and Ideas
I am very curious to hear input from others on a problem I've been contemplating for some time now.
Essentially I would like to present a user with a text document and allow him/her to make ...
13 votes, 5 answers, 14k views
How to clean comments from a raw SQL file
I have a problem cleaning comments and empty lines from an existing SQL file.
The file has over 10k lines, so cleaning it manually is not an option.
I have a little Python script, but I have ...
13 votes, 3 answers, 16k views
Split & Trim in a single step
In PS 5.0 I can split and trim a string in a single line, like this
$string = 'One, Two, Three'
$array = ($string.Split(',')).Trim()
But that fails in PS 2.0. I can of course do a foreach to trim ...
13 votes, 4 answers, 13k views
Making links clickable in Javascript?
Is there a simple way of turning a string from
Then go to http://example.com/ and foo the bar!
into
Then go to <a href="http://example.com">example.com</a> and foo the bar!
in ...
13 votes, 4 answers, 12k views
Should I use cut or awk to extract fields and field substrings?
I have a file with pipe-separated fields. I want to print a subset of field 1 and all of field 2:
cat tmpfile.txt
# 10 chars.|variable length num|text
ABCDEFGHIJ|99|U|HOMEWORK
JIDVESDFXW|8|C|CHORES
...
13 votes, 4 answers, 7k views
NLTK Chunking and walking the results tree
I'm using NLTK RegexpParser to extract noungroups and verbgroups from tagged tokens.
How do I walk the resulting tree to find only the chunks that are NP or V groups?
from nltk.chunk import ...
13 votes, 5 answers, 2k views
Strategy for parsing natural language descriptions into structured data
I have a set of requirements and I'm looking for the best Java-based strategy / algorithm / software to use. Basically, I want to take a set of recipe ingredients entered by real people in natural ...
13 votes, 2 answers, 8k views
Why is there no std::from_string()?
Why is there no
template <typename T>
T std::from_string(const std::string& s);
in the C++ standard? (Seeing how there's an std::to_string() function, I mean.)
PS - If you have an idea ...
12 votes, 13 answers, 1k views
Code Golf: Quickly Build List of Keywords from Text, Including # of Instances
I've already worked out this solution for myself with PHP, but I'm curious how it could be done differently - better even. The two languages I'm primarily interested in are PHP and Javascript, but I'd ...
12 votes, 3 answers, 15k views
How do I tokenize this string in Ruby?
I have this string:
%{Children^10 Health "sanitation management"^5}
And I want to tokenize it into an array of hashes:
[{:keywords=>"children", :boost=>10}, {:keywords=>"...
12 votes, 8 answers, 15k views
Expand array of numbers and hyphenated number ranges to array of integers [duplicate]
I'm trying to normalize/expand/hydrate/translate a string of numbers as well as hyphen-separated numbers (as range expressions) so that it becomes an array of integer values.
Sample input:
$array = [&...
12 votes, 1 answer, 2k views
How can I parse a string to a function in Haskell?
I want a function that looks something like this
readFunc :: String -> (Float -> Float)
which operates something like this
>(readFunc "sin") (pi/2)
>1.0
>(readFunc "(+2)") 3.0
>...
11 votes, 18 answers, 2k views
Convert "1d2h3m" to ["day" => 1, "hour" => 2, "minutes" => 3]
I am trying to parse a time expression string into an associative array with full-word keys.
My input:
$time = "1d2h3m";
My desired output:
array(
"day" => 1,
"...
11 votes, 2 answers, 24k views
Splitting large text file by a delimiter in Python
I imagine this is going to be a simple task, but I can't find exactly what I am looking for in previous StackOverflow questions, so here goes...
I have large text files in a proprietary format that look ...
11 votes, 2 answers, 1k views
Which Perl modules are good for data munging?
Nine years ago, when I started parsing HTML and free text with Perl, I read the classic Data Munging with Perl. Does someone know if David is planning to update the book or if there are similar books ...
10 votes, 1 answer, 10k views
Chunking with rule-based grammar in spacy
I have this simple example of chunking in nltk.
My data:
data = 'The little yellow dog will then walk to the Starbucks, where he will introduce them to Michael.'
...pre-processing ...
data_tok = ...
9 votes, 6 answers, 13k views
Extract floating point numbers from a delimited string in PHP
I would like to convert a string of delimited dimension values into floating numbers.
For example
152.15 x 12.34 x 11mm
into
152.15, 12.34 and 11
and store in an array such that:
$dim[0] = 152.15;
$...
9 votes, 8 answers, 3k views
Populate array of integers from a comma-separated string of numbers and hyphenated number ranges
I want to translate/hydrate/expand/parse a comma-separated string of integers and hyphenated integer range expressions and populate an array with its equivalent values as individual integer elements.
...
9 votes, 2 answers, 429 views
Parsing ASCII file efficiently in Haskell
I wanted to reimplement some of my ASCII parsers in Haskell since I thought I could gain some speed. However, even a simple "grep and count" is much slower than a sloppy Python implementation.
Can ...
9 votes, 2 answers, 828 views
How can I wrap the previous, current, and next word inside a tag using jQuery?
Not sure if the title is well chosen...
I am trying to simulate text selection in HTML/JS/CSS to get rid of the action bubble on mobile devices when actually selecting text.
To be more specific, I'm ...
8 votes, 6 answers, 3k views
Iterating over a text file using Fortran like format in C++
I am making an application that deals with txt file data.
The idea is that txt files may come in different formats, and each should be read into C++.
One example might be 3I2, 3X, I3, which should be ...