Questions tagged [text-parsing]

Text parsing is a variation of parsing which refers to the action of breaking a stream of text into different components, and capturing the relationship between those components.

2327 votes
21 answers
2.5m views

How to delete from a text file, all lines that contain a specific string?

How would I use sed to delete all lines in a text file that contain a specific string?
A Clockwork Orange's user avatar
892 votes
21 answers
723k views

How to convert string representation of list to a list

I was wondering what the simplest way is to convert a string representation of a list like the following to a list: x = '[ "A","B","C" , " D"]' Even in cases ...
harijay's user avatar
  • 11.6k
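
One possible approach, sketched here under the assumption that the string is a valid Python literal: the standard library's `ast.literal_eval` parses it without executing arbitrary code. The `.strip()` pass on each item is an optional extra, since the excerpt does not say whether stray spaces such as the one in `" D"` should be kept.

```python
import ast

# The sample string from the question excerpt.
x = '[ "A","B","C" , " D"]'

# literal_eval only accepts Python literals (lists, strings, numbers, ...),
# so it is safer than eval() for untrusted input.
items = ast.literal_eval(x)

# Optional: drop surrounding whitespace from each element.
cleaned = [item.strip() for item in items]

print(items)
print(cleaned)
```

Input that is not a valid literal makes `literal_eval` raise an exception (typically `ValueError` or `SyntaxError`), so real code would likely wrap the call in a `try` block.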
115 votes
6 answers
319k views

Split pipe-delimited values of a flat array into an associative array

//go through each question foreach($file_data as $value) { //separate the string by pipes and place in variables list($category, $question) = explode('|', $value); //place in assoc array $...
Phil's user avatar
  • 11.1k
105 votes
27 answers
108k views

Split string containing command-line parameters into string[] in C#

I have a single string that contains the command-line parameters to be passed to another executable and I need to extract the string[] containing the individual parameters in the same way that C# ...
80 votes
4 answers
134k views

Get integer value from malformed query string

I'm looking for a way to parse a substring using PHP, and have come across preg_match; however, I can't seem to work out the rule that I need. I am parsing a web page and need to grab a numeric value ...
MonkeyBlue's user avatar
  • 2,234
76 votes
43 answers
21k views

Evaluating a string of simple mathematical expressions [closed]

Challenge Here is the challenge (of my own invention, though I wouldn't be surprised if it has previously appeared elsewhere on the web). Write a function that takes a single argument that is a ...
72 votes
4 answers
99k views

Difference between parsing a text file in r and rb mode

What makes parsing a text file in 'r' mode more convenient than parsing it in 'rb' mode? Especially when the text file in question may contain non-ASCII characters.
MxLDevs's user avatar
  • 19.2k
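
A small sketch of the difference, assuming the file is UTF-8 encoded (the question does not state the encoding): text mode decodes the bytes into `str`, while binary mode returns the raw `bytes` and leaves decoding to the caller. The temporary file and the sample word are made up for illustration.

```python
import os
import tempfile

# Write known UTF-8 bytes so the result does not depend on platform defaults.
raw = "café\n".encode("utf-8")
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(raw)
    path = tmp.name

try:
    # Text mode: bytes are decoded with the given encoding into a str.
    with open(path, "r", encoding="utf-8") as f:
        as_text = f.read()
    # Binary mode: no decoding, the result is a bytes object.
    with open(path, "rb") as f:
        as_bytes = f.read()
finally:
    os.remove(path)

print(type(as_text), len(as_text))    # characters
print(type(as_bytes), len(as_bytes))  # bytes; "é" takes two bytes in UTF-8
```

Passing `encoding` explicitly in text mode avoids depending on the platform's default encoding, which is a common source of errors with non-ASCII text.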
67 votes
2 answers
53k views

What is CoNLL data format?

I am using an open source jar (Mate Parser) which outputs in the CoNLL 2009 format after dependency parsing. I want to use the dependency parsing results for Information Extraction, however, I only ...
swapna sourav rout's user avatar
52 votes
12 answers
68k views

Replace a whole line where a particular word is found in a text file

How can I replace a particular line of text in a file using PHP? I don't know the line number. I want to replace a line containing a particular word.
kishore's user avatar
  • 1,017
48 votes
6 answers
84k views

How to get the first column of every line from a CSV file?

How do I get the first column of every line in an input CSV file and output it to a new file? I am thinking of using awk but am not sure how.
Junba Tester's user avatar
37 votes
13 answers
25k views

Can awk deal with CSV file that contains comma inside a quoted field?

I am using awk to compute the sum of one column in a CSV file. The data format is something like: id, name, value 1, foo, 17 2, bar, 76 3, "I am the, question", 99 I was using this awk ...
maguschen's user avatar
  • 785
36 votes
9 answers
107k views

PHP - parsing a txt file

I have a .txt file that has the following details: ID^NAME^DESCRIPTION^IMAGES 123^test^Some text goes here^image_1.jpg,image_2.jpg 133^hello^some other test^image_3456.jpg,image_89.jpg What I'd like ...
terrid25's user avatar
  • 1,936
34 votes
9 answers
40k views

Python parsing bracketed blocks

What would be the best way in Python to parse out chunks of text contained in matching brackets? "{ { a } { b } { { { c } } } }" should initially return: [ "{ a } { b } { { { c } } }" ] putting ...
Martin's user avatar
  • 12.5k
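
A minimal sketch of one way to do this, not necessarily the approach taken in the answers: scan the string once, track the nesting depth, and cut out the contents of each top-level `{ ... }` pair. The function name `top_level_blocks` is hypothetical.

```python
def top_level_blocks(text):
    """Return the stripped contents of every top-level {...} pair in text."""
    blocks = []
    depth = 0
    start = None
    for i, ch in enumerate(text):
        if ch == "{":
            if depth == 0:
                start = i + 1  # content begins just after the opening brace
            depth += 1
        elif ch == "}":
            if depth == 0:
                raise ValueError(f"unmatched '}}' at index {i}")
            depth -= 1
            if depth == 0:
                blocks.append(text[start:i].strip())
    if depth != 0:
        raise ValueError("unclosed '{'")
    return blocks


outer = top_level_blocks("{ { a } { b } { { { c } } } }")
print(outer)                       # ['{ a } { b } { { { c } } }']
print(top_level_blocks(outer[0]))  # ['a', 'b', '{ { c } }']
```

Calling the function again on each result peels off one more level of nesting, which matches the "should initially return" wording in the question.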
34 votes
4 answers
32k views

What does NN VBD IN DT NNS RB mean in NLTK?

When I chunk text, I get lots of codes in the output like NN, VBD, IN, DT, NNS, RB. Is there a list documented somewhere which tells me the meaning of these? I have tried googling nltk chunk code ...
Knows Not Much's user avatar
34 votes
2 answers
85k views

Match and parse the content of a square-braced placeholder in text

I have a PHP variable ($content) where I need to find a certain pattern that looks like this: [gallery::name/of/the/folder/] I would like to search: - starting with literal characters `[gallery::` - ...
VVV's user avatar
  • 7,583
29 votes
5 answers
40k views

Best way to get all digits from a string [duplicate]

Is there any better way to take a string such as "(123) 455-2344" and get "1234552344" from it than doing this: var matches = Regex.Matches(input, @"[0-9]+", RegexOptions.Compiled); return ...
Chris Marisic's user avatar
27 votes
6 answers
44k views

Parse the first level keys of $_POST, then use the numeric suffix while looping

I have a form that contains a number of fields with names item1, item2, item13, item43 etc. Each time those fields are different because they are populated in the form with AJAX. When the user submits I ...
bikey77's user avatar
  • 6,472
26 votes
13 answers
44k views

How should I detect which delimiter is used in a text file?

I need to be able to parse both CSV and TSV files. I can't rely on the users to know the difference, so I would like to avoid asking the user to select the type. Is there a simple way to detect which ...
samiz's user avatar
  • 1,053
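
The question does not name a language, so the following is only a Python sketch of one option: the standard library's `csv.Sniffer` can guess the delimiter from a sample of the file. Restricting the candidates to comma and tab is an assumption based on the CSV/TSV wording above, and `detect_delimiter` is a hypothetical helper name.

```python
import csv


def detect_delimiter(sample, candidates=",\t"):
    """Guess which of the candidate delimiters a text sample uses.

    csv.Sniffer raises csv.Error when it cannot decide, so callers may
    want to catch that and fall back to a default.
    """
    dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
    return dialect.delimiter


tsv_sample = "a\tb\tc\n1\t2\t3\n4\t5\t6\n"
csv_sample = "id,name,value\n1,foo,17\n2,bar,76\n"

print(repr(detect_delimiter(tsv_sample)))
print(repr(detect_delimiter(csv_sample)))
```

The sniffer is heuristic: it works best on a sample of several complete lines, and it can be fooled by files where both characters appear with similar regularity.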
25 votes
6 answers
32k views

Split alphanumeric string between leading digits and trailing letters

I have a string like: $Order_num = "0982asdlkj"; How can I split that into the 2 variables, with the number as one element and then another variable with the letter element? The number ...
David19801's user avatar
  • 11.3k
22 votes
2 answers
6k views

Create Great Parser - Extract Relevant Text From HTML/Blogs

I'm trying to create a generalized HTML parser that works well on Blog Posts. I want to point my parser at the specific entry's URL and get back clean text of the post itself. My basic approach (from ...
21 votes
4 answers
23k views

Powershell command to trim path if it ends with "\"

I need to trim path if it ends with \. C:\Ravi\ I need to change to C:\Ravi I have a case where path will not end with \ (Then it must skip). I tried with .EndsWith("\"), but it fails when I ...
Ravichandra's user avatar
  • 2,222
21 votes
9 answers
7k views

Elegant structured text file parsing

I need to parse a transcript of a live chat conversation. My first thought on seeing the file was to throw regular expressions at the problem but I was wondering what other approaches people have used....
russtbarnacle's user avatar
19 votes
4 answers
14k views

Saving nltk drawn parse tree to image file

Is there any way to save the drawn image from tree.draw() to an image file programmatically? I tried looking through the documentation, but I couldn't find anything.
John's user avatar
  • 3,067
17 votes
3 answers
9k views

How to find the shortest dependency path between two words in Python?

I am trying to find the dependency path between two words in Python, given a dependency tree. For the sentence Robots in popular culture are there to remind us of the awesomeness of unbound human agency. I ...
Sean's user avatar
  • 1,161
16 votes
4 answers
5k views

Parse a pipe-delimited string into 2, 3, 4 or 5 variables (depending on the input string)

I have a line like this in my code: list($user_id, $name, $limit, $remaining, $reset) = explode('|', $user); The last 3 parameters may or may not be there. Is there a function similar to list that ...
MikeG's user avatar
  • 1,265
16 votes
11 answers
11k views

Create acronym from a string containing only words

I'm looking for a way that I can extract the first letter of each word from an input field and place it into a variable. Example: if the input field is "Stack-Overflow Questions Tags Users" then the ...
dmschenk's user avatar
  • 379
16 votes
4 answers
27k views

Python: Read configuration file with multiple lines per key

I am writing a small DB test suite, which reads configuration files with queries and expected results, e.g.: query = "SELECT * from cities WHERE name='Unknown';" count = 0 level ...
Adam Matan's user avatar
  • 132k
16 votes
3 answers
34k views

How do I keep a Scanner from throwing exceptions when the wrong type is entered?

Here's some sample code: import java.util.Scanner; class In { public static void main (String[]arg) { Scanner in = new Scanner (System.in) ; System.out.println ("how many are ...
David's user avatar
  • 14.7k
14 votes
6 answers
5k views

How to transpose the contents of lines and columns in a file in Vim?

I know I can use Awk, but I am on a Windows box, and I am making a function for others that may not have Awk. I also know I can write a C program, but I would love not to have something that requires ...
ojblass's user avatar
  • 21.4k
14 votes
2 answers
5k views

Javascript, Text Annotations and Ideas

I am very curious to hear input from others on a problem I've been contemplating for some time now. Essentially I would like to present a user with a text document and allow him/her to make ...
Mario Zigliotto's user avatar
13 votes
5 answers
14k views

How to clean comments from raw sql file

I have a problem with cleaning comments and empty lines from an already existing sql file. The file has over 10k lines so cleaning it manually is not an option. I have a little python script, but I have ...
Szymon Lukaszczyk's user avatar
13 votes
3 answers
16k views

Split & Trim in a single step

In PS 5.0 I can split and trim a string in a single line, like this $string = 'One, Two, Three' $array = ($string.Split(',')).Trim() But that fails in PS 2.0. I can of course do a foreach to trim ...
Gordon's user avatar
  • 6,595
13 votes
4 answers
13k views

Making links clickable in Javascript?

Is there a simple way of turning a string from Then go to http:/example.com/ and foo the bar! into Then go to <a href="http://example.com">example.com</a> and foo the bar! in ...
max's user avatar
  • 29.6k
13 votes
4 answers
12k views

Should I use cut or awk to extract fields and field substrings?

I have a file with pipe-separated fields. I want to print a subset of field 1 and all of field 2: cat tmpfile.txt # 10 chars.|variable length num|text ABCDEFGHIJ|99|U|HOMEWORK JIDVESDFXW|8|C|CHORES ...
user3486154's user avatar
13 votes
4 answers
7k views

NLTK Chunking and walking the results tree

I'm using NLTK RegexpParser to extract noungroups and verbgroups from tagged tokens. How do I walk the resulting tree to find only the chunks that are NP or V groups? from nltk.chunk import ...
Vincent Theeten's user avatar
13 votes
5 answers
2k views

Strategy for parsing natural language descriptions into structured data

I have a set of requirements and I'm looking for the best Java-based strategy / algorithm / software to use. Basically, I want to take a set of recipe ingredients entered by real people in natural ...
Jizzoe's user avatar
  • 131
13 votes
2 answers
8k views

Why is there no std::from_string()?

Why is there no template <typename T> T std::from_string(const std::string& s); in the C++ standard? (Seeing how there's an std::to_string() function, I mean.) PS - If you have an idea ...
einpoklum's user avatar
  • 124k
12 votes
13 answers
1k views

Code Golf: Quickly Build List of Keywords from Text, Including # of Instances

I've already worked out this solution for myself with PHP, but I'm curious how it could be done differently - better even. The two languages I'm primarily interested in are PHP and Javascript, but I'd ...
12 votes
3 answers
15k views

How do I tokenize this string in Ruby?

I have this string: %{Children^10 Health "sanitation management"^5} And I want to tokenize this into an array of hashes: [{:keywords=>"children", :boost=>10}, {:keywords=>"...
Radamanthus's user avatar
12 votes
8 answers
15k views

Expand array of numbers and hyphenated number ranges to array of integers [duplicate]

I'm trying to normalize/expand/hydrate/translate a string of numbers as well as hyphen-separated numbers (as range expressions) so that it becomes an array of integer values. Sample input: $array = [&...
muffin's user avatar
  • 2,054
12 votes
1 answer
2k views

How can I parse a string to a function in Haskell?

I want a function that looks something like this readFunc :: String -> (Float -> Float) which operates something like this >(readFunc "sin") (pi/2) >1.0 >(readFunc "(+2)") 3.0 >...
user2407038's user avatar
  • 14.6k
11 votes
18 answers
2k views

Convert "1d2h3m" to ["day" => 1, "hour" => 2, "minutes" => 3]

I am trying to parse a time expression string into an associative array with full-word keys. My input: $time = "1d2h3m"; My desired output: array( "day" => 1, "...
Jack jdeoel's user avatar
  • 4,584
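
The question is about PHP; purely as an illustration of the idea, here is a Python sketch that pairs each number with its unit letter using a regular expression. The key names follow the question title, while the function name `parse_duration`, the `UNIT_NAMES` table, and the choice to reject unknown units are assumptions.

```python
import re

# Maps each unit letter to the full-word key used in the question title.
UNIT_NAMES = {"d": "day", "h": "hour", "m": "minutes"}


def parse_duration(expr):
    """Turn a string such as '1d2h3m' into {'day': 1, 'hour': 2, 'minutes': 3}."""
    # Validate the whole string first so stray characters are not silently skipped.
    if not re.fullmatch(r"(?:\d+[dhm])+", expr):
        raise ValueError(f"unsupported time expression: {expr!r}")
    return {
        UNIT_NAMES[unit]: int(number)
        for number, unit in re.findall(r"(\d+)([dhm])", expr)
    }


print(parse_duration("1d2h3m"))  # {'day': 1, 'hour': 2, 'minutes': 3}
```

The same pattern should carry over to PHP's `preg_match_all`, though that translation is not shown here.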
11 votes
2 answers
24k views

Splitting large text file by a delimiter in Python

I imagine this is going to be a simple task but I can't find exactly what I am looking for in previous StackOverflow questions, so here goes... I have large text files in a proprietary format that look ...
Kevin's user avatar
  • 1,133
11 votes
2 answers
1k views

Which Perl modules are good for data munging?

Nine years ago when I started parsing HTML and free text with Perl I read the classic Data Munging with Perl. Does someone know if David is planning to update the book or if there are similar books ...
Pablo Marin-Garcia's user avatar
10 votes
1 answer
10k views

Chunking with rule-based grammar in spacy

I have this simple example of chunking in nltk. My data: data = 'The little yellow dog will then walk to the Starbucks, where he will introduce them to Michael.' ...pre-processing ... data_tok = ...
ben_aaron's user avatar
  • 1,492
9 votes
6 answers
13k views

Extract floating point numbers from a delimited string in PHP

I would like to convert a string of delimited dimension values into floating numbers. For example 152.15 x 12.34 x 11mm into 152.15, 12.34 and 11 and store in an array such that: $dim[0] = 152.15; $...
Tian Bo's user avatar
  • 551
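
The question asks for PHP; as a language-neutral illustration, this Python sketch pulls every integer or decimal number out of the sample string with one regular expression. The function name `extract_dimensions` is hypothetical, and the pattern assumes unsigned numbers written with a dot as the decimal separator.

```python
import re


def extract_dimensions(text):
    """Return every unsigned integer or decimal number in text as a float."""
    # \d+ matches the integer part; (?:\.\d+)? optionally matches a fraction.
    return [float(match) for match in re.findall(r"\d+(?:\.\d+)?", text)]


dim = extract_dimensions("152.15 x 12.34 x 11mm")
print(dim)  # [152.15, 12.34, 11.0]
```

Because the pattern ignores everything that is not part of a number, the separator (`x`) and the trailing unit (`mm`) need no special handling.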
9 votes
8 answers
3k views

Populate array of integers from a comma-separated string of numbers and hyphenated number ranges

I want to translate/hydrate/expand/parse a comma-separated string of integers and hyphenated integer range expressions and populate an array with its equivalent values as individual integer elements. ...
Matthew Higgins's user avatar
9 votes
2 answers
429 views

Parsing ASCII file efficiently in Haskell

I wanted to reimplement some of my ASCII parsers in Haskell since I thought I could gain some speed. However, even a simple "grep and count" is much slower than a sloppy Python implementation. Can ...
tamasgal's user avatar
  • 25.6k
9 votes
2 answers
828 views

How can I wrap the previous, current, and next word inside a tag using jQuery?

Not sure if the title is well chosen... I am trying to simulate text-selection in HTML/JS/CSS to get rid of the action bubble on mobile devices when truly selecting text. To be more specific, I'm ...
Cybrix's user avatar
  • 3,278
8 votes
6 answers
3k views

Iterating over a text file using Fortran like format in C++

I am making an application that deals with txt file data. The idea is that txt files may come in different formats, and it should be read into C++. One example might be 3I2, 3X, I3, which should be ...
PascalVKooten's user avatar
