Questions tagged [sampling]
In signal processing, sampling is the reduction of a continuous signal to a discrete signal. In statistics, sampling is the selection of a subset of individuals from within a statistical population to estimate characteristics of the whole population.
1,610 questions
93 votes, 13 answers, 75k views
Take n random elements from a List<E>?
How can I take n random elements from an ArrayList<E>? Ideally, I'd like to be able to make successive calls to the take() method to get another x elements, without replacement.
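A minimal sketch of the successive take() idea, written in Python rather than Java. It runs a partial Fisher-Yates shuffle over a private copy of the list, where each call swaps n more randomly chosen elements into the "already taken" prefix. The class name RandomTaker is hypothetical; in Java the same loop can be written over an ArrayList with Collections.swap.

```python
import random
from typing import Generic, List, Optional, TypeVar

T = TypeVar("T")


class RandomTaker(Generic[T]):
    """Hypothetical helper: successive take(n) calls return random elements
    without replacement, using a partial Fisher-Yates shuffle."""

    def __init__(self, items: List[T], rng: Optional[random.Random] = None) -> None:
        self._items = list(items)  # private copy; the caller's list is untouched
        self._next = 0             # items[:_next] have already been handed out
        self._rng = rng if rng is not None else random.Random()

    def take(self, n: int) -> List[T]:
        remaining = len(self._items) - self._next
        if n < 0 or n > remaining:
            raise ValueError(f"cannot take {n}; {remaining} elements left")
        for i in range(self._next, self._next + n):
            # Swap a uniformly chosen, not-yet-taken element into slot i.
            j = self._rng.randrange(i, len(self._items))
            self._items[i], self._items[j] = self._items[j], self._items[i]
        taken = self._items[self._next:self._next + n]
        self._next += n
        return taken
```

Each call costs O(n), never repeats an element returned by an earlier call, and raises ValueError once the list is exhausted.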
79 votes, 2 answers, 28k views
What does replacement mean in numpy.random.choice?
The documentation explains the function numpy.random.choice. However, I am confused about the third parameter, replace. What is it? And in which cases is it useful? Thanks!
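The replace flag chooses between sampling with replacement (the default, replace=True: a drawn element goes back into the pool and can be drawn again) and without replacement (replace=False: each element appears at most once, so the sample size cannot exceed the population size). Python's standard library draws the same distinction with random.choices and random.sample, which this sketch uses so that it runs without NumPy:

```python
import random

rng = random.Random(42)
deck = ["A", "B", "C", "D", "E"]

# With replacement (like numpy.random.choice(..., replace=True), the default):
# every draw sees the full population, so elements can repeat and the
# sample may even be larger than the population.
with_repl = rng.choices(deck, k=8)

# Without replacement (like replace=False): each element is drawn at most
# once, so k may not exceed len(deck).
without_repl = rng.sample(deck, k=5)
```

Sampling with replacement suits bootstrap resampling or simulating independent repeated draws; sampling without replacement suits picking distinct items, such as dealing cards.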
58 votes, 8 answers, 23k views
Algorithms for determining the key of an audio sample
I am interested in determining the musical key of an audio sample. How would (or could) an algorithm go about trying to approximate the key of a musical audio sample?
Antares Autotune and Melodyne ...
55 votes, 1 answer, 3k views
Abysmal OpenCL ImageSampling performance vs OpenGL TextureSampling
I've recently ported my volume raycaster from OpenGL to OpenCL, which decreased the raycaster's performance by about 90 percent. I tracked the performance decrease to OpenCL's image sampling ...
48 votes, 5 answers, 108k views
Random Sample of a subset of a dataframe in Pandas
I have a pandas DataFrame with 100,000 rows and want to split it into 100 sections with 1000 rows in each of them.
How do I draw a random sample of certain size (e.g. 50 rows) of just one of the 100 ...
40 votes, 12 answers, 20k views
How to generate a random 4 digit number not starting with 0 and having unique digits?
This works almost fine but the number starts with 0 sometimes:
import random
numbers = random.sample(range(10), 4)
print(''.join(map(str, numbers)))
I've found a lot of examples but none of them ...
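One way to avoid the leading zero without rejection is to draw the first digit from 1-9 and the remaining three digits from the nine digits not yet used (this pool includes 0). Every valid number (9 * 9 * 8 * 7 = 4536 of them) is then equally likely. A sketch; four_unique_digits is a hypothetical name:

```python
import random


def four_unique_digits(rng=random) -> int:
    # Leading digit from 1-9, so the number never starts with 0.
    first = rng.choice(range(1, 10))
    # Three more distinct digits from the nine digits not used yet (0 allowed here).
    rest = rng.sample([d for d in range(10) if d != first], 3)
    return int("".join(map(str, [first] + rest)))
```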
40 votes, 1 answer, 52k views
What are chunks, samples and frames when using pyaudio
After going through the documentation of pyaudio and reading some other articles on the web, I am not sure whether my understanding is correct.
This is the code for audio recording found on pyaudio's site:...
38 votes, 8 answers, 98k views
Stratified random sampling from data frame
I have a data frame in the format:
head(subset)
# ants 0 1 1 0 1
# age 1 2 2 1 3
# lc 1 1 0 1 0
I need to create a new data frame with random samples according to age and lc. For example I want ...
27 votes, 3 answers, 866 views
FloatingPointError from PyMC in sampling from a Dirichlet distribution
After being unsuccessful in using decorators to define the stochastic object of the "logarithm of an exponential random variable", I decided to manually write the code for this new distribution using ...
22 votes, 1 answer, 32k views
How to draw waveform of Android's music player? [closed]
One of the default live wallpapers that came with my phone displayed the waveform of music playing in the background in real time. I was wondering how one could go about doing ...
22 votes, 3 answers, 3k views
Use R to Randomly Assign Participants to Treatments on a Daily Basis
The Problem:
I am attempting to use R to generate a random study design where half of the participants are randomly assigned to "Treatment 1" and the other half are assigned to "Treatment 2". ...
19 votes, 1 answer, 13k views
sample random point in triangle [closed]
Suppose you have an arbitrary triangle with vertices A, B, and C. This paper (section 4.2) says that you can generate a random point, P, uniformly from within triangle ABC by the following convex ...
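A sketch of the widely used convex combination P = (1 - sqrt(r1)) A + sqrt(r1) (1 - r2) B + sqrt(r1) r2 C with r1, r2 uniform on [0, 1). This is the standard form of the technique; it is not claimed here to be the exact expression in the paper the question cites, and random_point_in_triangle is a hypothetical name:

```python
import math
import random
from typing import Tuple

Point = Tuple[float, float]


def random_point_in_triangle(a: Point, b: Point, c: Point, rng=random) -> Point:
    r1, r2 = rng.random(), rng.random()
    s = math.sqrt(r1)
    # Barycentric weights: non-negative and summing to 1, so P lies in ABC.
    # The square root on r1 is what makes P uniform over the triangle's area.
    wa, wb, wc = 1.0 - s, s * (1.0 - r2), s * r2
    return (wa * a[0] + wb * b[0] + wc * c[0],
            wa * a[1] + wb * b[1] + wc * c[1])
```

Without the square root (using r1 directly), points would cluster near vertex A.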
17 votes, 5 answers, 19k views
Random Sampling from Mongo
I have a mongo collection with documents. There is one field in every document which is 0 OR 1. I need to randomly sample 1000 records from the database and count the number of documents that have that ...
17 votes, 2 answers, 7k views
Is there an algorithm for weighted reservoir sampling? [closed]
Is there an algorithm for how to perform reservoir sampling when the points in the data stream have associated weights?
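One known algorithm is the A-Res scheme of Efraimidis and Spirakis: each item with weight w gets the key u^(1/w), where u is uniform on [0, 1), and the reservoir keeps the k items with the largest keys. A minimal sketch, assuming the stream yields (item, weight) pairs; the function name weighted_reservoir is hypothetical:

```python
import heapq
import random
from typing import Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def weighted_reservoir(stream: Iterable[Tuple[T, float]], k: int, rng=random) -> List[T]:
    """Keep k items from a stream of (item, weight) pairs, A-Res style."""
    heap: List[Tuple[float, int, T]] = []  # min-heap of (key, arrival index, item)
    for idx, (item, weight) in enumerate(stream):
        if weight <= 0:
            continue  # items with non-positive weight are never selected
        key = rng.random() ** (1.0 / weight)  # larger weight -> key closer to 1
        if len(heap) < k:
            heapq.heappush(heap, (key, idx, item))
        elif key > heap[0][0]:
            # New key beats the smallest kept key: evict that item.
            heapq.heapreplace(heap, (key, idx, item))
    return [item for _, _, item in heap]
```

The arrival index breaks ties between equal keys, so the items themselves never need to be comparable. Memory is O(k) and each item costs O(log k).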
15 votes, 3 answers, 30k views
How to perform undersampling in scikit-learn?
We have a retinal dataset wherein the diseased eye information constitutes 70 percent of the information whereas the non-diseased eye constitutes the remaining 30 percent. We want a dataset wherein the ...
15 votes, 6 answers, 8k views
How to keep a random subset of a stream of data?
I have a stream of events flowing through my servers. It is not feasible for me to store all of them, but I would like to periodically be able to process some of them in aggregate. So, I want to ...
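Reservoir sampling (Vitter's Algorithm R) keeps a uniform random subset of fixed size k from a stream of unknown length using O(k) memory. A minimal sketch, assuming events arrive as any Python iterable; reservoir_sample is a hypothetical name:

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, rng=random) -> List[T]:
    reservoir: List[T] = []
    for i, event in enumerate(stream):
        if i < k:
            reservoir.append(event)   # fill the reservoir with the first k events
        else:
            j = rng.randrange(i + 1)  # uniform in [0, i]
            if j < k:
                # The (i+1)-th event is kept with probability k/(i+1),
                # replacing a uniformly chosen current member.
                reservoir[j] = event
    return reservoir
```

At any point every event seen so far has the same probability of being in the reservoir, so the aggregate processing can run on the reservoir periodically.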
15 votes, 3 answers, 30k views
How to do a random stratified sampling with Python (Not a train/test split)?
I am looking for the best way to do random stratified sampling, as in surveys and polls. I don't want to use sklearn.model_selection.StratifiedShuffleSplit since I am not doing a supervised learning ...
14 votes, 4 answers, 3k views
Random sampling to give an exact sum
I want to sample 140 numbers between 1000 to 100000 such that the sum of these 140 numbers is around 2 million (2000000):
sample(1000:100000,140)
such that:
sum(sample(1000:100000,140)) == 2000000
...
14 votes, 6 answers, 16k views
Select cells randomly from NumPy array - without replacement
I'm writing some modelling routines in NumPy that need to select cells randomly from a NumPy array and do some processing on them. All cells must be selected without replacement (as in, once a cell ...
14 votes, 1 answer, 6k views
Efficiently picking a random element from a chained hash table?
Just for practice (and not as a homework assignment) I have been trying to solve this problem (CLRS, 3rd edition, exercise 11.2-6):
Suppose we have stored n keys in a hash table of size m, with
...
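A common approach to this exercise is rejection sampling. The sketch below assumes (this is not stated in the truncated excerpt) that the length L of the longest chain is known, and models each chain as a Python list; random_key is a hypothetical name:

```python
import random
from typing import Hashable, List


def random_key(table: List[List[Hashable]], max_chain: int, rng=random) -> Hashable:
    """Return a key chosen uniformly from a chained hash table.

    table[b] is the chain for bucket b; max_chain (L) must be at least the
    length of the longest chain.
    """
    if not any(table):
        raise ValueError("hash table is empty")
    while True:
        bucket = rng.randrange(len(table))  # uniform over the m buckets
        pos = rng.randrange(max_chain)      # uniform over slots 0..L-1
        chain = table[bucket]
        if pos < len(chain):
            return chain[pos]               # landed on a stored key: accept
        # Landed on an empty slot: reject and try again.
```

Each trial hits any particular stored key with probability 1/(mL), so accepted keys are uniform. A trial succeeds with probability n/(mL) = α/L, giving L/α expected trials; with linked-list chains and known chain lengths, only the accepted trial walks a chain (O(L)), for O(L(1 + 1/α)) expected time overall.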
14 votes, 8 answers, 20k views
OpenCV, how to use arrays of points for smoothing and sampling contours?
I am having trouble getting my head around smoothing and sampling contours in OpenCV (C++ API).
Let's say I have a sequence of points retrieved from cv::findContours (for instance, applied to this ...
13 votes, 4 answers, 30k views
Stratified splitting of pandas dataframe into training, validation and test set
The following extremely simplified DataFrame represents a much larger DataFrame containing medical diagnoses:
medicalData = pd.DataFrame({'diagnosis':['positive','positive','negative','negative','...
13 votes, 1 answer, 40k views
Taking a disproportionate sample from a dataset in R
If I have a large dataset in R, how can I take a random sample of the data, taking into consideration the distribution of the original data, particularly if the data are skewed and only 1% belong to a ...
13 votes, 4 answers, 5k views
Profiling a (possibly I/O-bound) process to reduce latency
I want to improve the performance of a specific method inside a larger application.
The goal is improving latency (wall-clock time spent in a specific function), not (necessarily) system load.
...
12 votes, 4 answers, 7k views
Oversampling functionality in Tensorflow dataset API
I would like to ask whether the current Dataset API allows implementing an oversampling algorithm. I deal with a highly imbalanced class problem. I was thinking that it would be nice to oversample ...
11 votes, 2 answers, 8k views
Profilers Instrumenting Vs Sampling
I am doing a study comparing profilers, mainly instrumenting and sampling ones.
I have come up with the following info:
sampling: stop the execution of the program, take the PC and thus deduce where the program is
...
10 votes, 5 answers, 44k views
Latin hypercube sampling with python
I would like to sample a distribution defined by a function in multiple dimensions (2,3,4):
f(x, y, ...) = ...
The distributions might be ugly, non-standard (like a 3D spline on data, sum of ...
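A Latin hypercube design on the unit cube needs only the standard library: split each axis into n equal strata, draw one point per stratum, and shuffle the strata independently per dimension. This sketch only produces the stratified uniform design; turning it into samples from the question's f (for example by weighting each point by f) is not shown, and latin_hypercube is a hypothetical name. SciPy also ships a ready-made sampler, scipy.stats.qmc.LatinHypercube.

```python
import random
from typing import List


def latin_hypercube(n: int, dims: int, rng=random) -> List[List[float]]:
    """Return n points in [0, 1)^dims forming a Latin hypercube design."""
    columns = []
    for _ in range(dims):
        # One uniform point inside each stratum [i/n, (i+1)/n) of this axis ...
        col = [(i + rng.random()) / n for i in range(n)]
        # ... shuffled so strata are paired at random across dimensions.
        rng.shuffle(col)
        columns.append(col)
    # Transpose the per-dimension columns into n points.
    return [list(point) for point in zip(*columns)]
```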
10 votes, 2 answers, 16k views
Android: startRecording() called on an uninitialized AudioRecord when SAMPLERATE set to 44100
I get an error when I set the sampling rate to 44100 for the AudioRecord object. When it's 22050 it works fine.
02-16 10:45:45.099 24021-24021/com.vlad.jackcomms E/AudioRecord﹕ frameCount 1024 < ...
10 votes, 1 answer, 17k views
How to sample on condition with pandas?
I have a dataframe df like the following:
Col1 Col2
0 1 T
1 1 B
2 3 S
3 2 A
4 1 C
5 2 A
etc...
I would like to create two dataframes: ...
9 votes, 2 answers, 30k views
Audio samples per second?
I am wondering about the relationship between a block of samples and its time equivalent. Given my rough idea so far:
Number of samples played per second = total filesize / duration.
So say, I have a 1....
9 votes, 2 answers, 4k views
Does audio sample frequency depend on the number of channels?
If you have audio encoded at 44100Hz that means you have 44100 samples per second. Does this mean 44100 samples/sec for a channel, or for all channels?
For example if a song is stereo and encoded at ...
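For uncompressed PCM audio, the sample rate counts samples per second per channel: a stereo stream at 44100 Hz carries 44100 frames per second, each frame holding one left and one right sample. File size divided by duration therefore gives (roughly, ignoring headers) a byte rate rather than a sample rate, and for compressed formats such as MP3 it says nothing direct about the sample rate. The arithmetic as a small sketch, assuming 16-bit samples; pcm_bytes_per_second is a hypothetical name:

```python
def pcm_bytes_per_second(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    # sample_rate_hz is per channel (frames per second); each frame
    # stores one sample for every channel.
    bytes_per_sample = bits_per_sample // 8
    return sample_rate_hz * bytes_per_sample * channels


# CD-quality stereo: 44100 frames/s x 2 bytes x 2 channels
print(pcm_bytes_per_second(44100, 16, 2))  # 176400
```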
9 votes, 3 answers, 3k views
Sampling from MultiIndex DataFrame
I'm working with the following panel data in a MultiIndex pandas DataFrame called df_data:
y x
n time
0 0 0.423607 -0.307983
1 0.565563 -0....
9 votes, 2 answers, 7k views
How to equidistant resample a line (or curve)?
I have a line l_1 given by a point series p_1, ..., p_n. I now want a new line l_2 with k points q_1, ..., q_k such that |q_i - q_{i+1}| = const for all i in {1, ..., k-1}, meaning the segments of l_2 ...
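A common starting point is to space the new points evenly by arc length along l_1 and interpolate linearly inside each segment. Note that this makes the distances measured along l_1 equal; the straight-line distances |q_i - q_{i+1}| asked for coincide with them on straight stretches but come out slightly shorter where l_2 cuts across a corner. A sketch; resample_polyline is a hypothetical name:

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def resample_polyline(points: List[Point], k: int) -> List[Point]:
    """Return k points spaced evenly by arc length along the polyline."""
    if k < 2 or len(points) < 2:
        raise ValueError("need k >= 2 and at least two input points")
    # Cumulative arc length at each input vertex.
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]
    out: List[Point] = []
    seg = 0
    for i in range(k):
        target = total * i / (k - 1)
        # Advance to the segment that contains the target distance.
        while seg < len(points) - 2 and cum[seg + 1] < target:
            seg += 1
        seg_len = cum[seg + 1] - cum[seg]
        t = 0.0 if seg_len == 0 else (target - cum[seg]) / seg_len
        (x0, y0), (x1, y1) = points[seg], points[seg + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out
```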
9 votes, 4 answers, 1k views
How to select points at a regular density
How do I select a subset of points at a regular density? More formally,
Given
a set A of irregularly spaced points,
a metric of distance dist (e.g., Euclidean distance),
and a target density d,
how ...
9 votes, 2 answers, 8k views
Android MediaRecorder Sampling Rate and Noise
I have an issue using Android's MediaRecorder to record sound from microphone to .m4a files (AAC-LC, MPEG-4 container). Starting from API level 18, the default sampling rate drops from 44.1 or 48 kHz ...
9 votes, 1 answer, 12k views
librosa.load() takes too long to load(sample) mp3 files
I am trying to sample (convert analog to digital) mp3 files via the following Python code using the librosa library, but it takes too much time (around 4 seconds for one file). I suspect this is ...
9 votes, 3 answers, 3k views
What is the difference between sampled_softmax_loss and nce_loss in tensorflow?
I noticed there are two functions for negative sampling in TensorFlow that compute the loss (sampled_softmax_loss and nce_loss). The parameters of these two functions are similar, but I really want to ...
8 votes, 2 answers, 1k views
Why does random sampling scale with the dataset not the sample size? (pandas .sample() example)
When sampling randomly from distributions of varying sizes I was surprised to observe that execution time seems to scale mostly with the size of the dataset being sampled from, not the number of ...
8 votes, 1 answer, 3k views
Pandas: Sampling from a DataFrame according to a target distribution
I have a Pandas DataFrame containing a dataset D of instances drawn from a distribution x. x may be a uniform distribution, for example.
Now, I want to draw n samples from D, sampled according to some new ...
8 votes, 2 answers, 2k views
How do I sample each group from a pandas data frame at different rates?
I have a data frame containing information about a population that I wish to generate a sample from. I also have a dataframe sample_info that details how many units of each group in the population ...
8 votes, 2 answers, 2k views
How to sample/partition panel data by individuals (preferably with the caret library)?
I would like to partition panel data and preserve the panel nature of the data:
library(caret)
library(mlbench)
#example panel data where id is the persons identifier over years
...
8 votes, 0 answers, 888 views
Sampling from a joint distribution in Pyro
I understand how to sample from multidimensional categorical, or multivariate normal (with dependence within each column). For example, for a multivariate categorical, this can be done as below:
...
7 votes, 3 answers, 2k views
Stratified sampling on factor
I have a dataset of 1000 rows with the following structure:
device geslacht leeftijd type1 type2
1 mob 0 53 C 3
2 tab 1 64 G 7
3 pc ...
7 votes, 1 answer, 12k views
"incorrect number of probabilities" error using sample()
I was trying sample(); however, whenever I pass custom probabilities to it, it keeps displaying "incorrect number of probabilities".
I've tried pretty much everything but still stuck. Kindly guide me ...
7 votes, 2 answers, 17k views
Why set.seed() affects sample() in R
I always thought set.seed() only makes random variable generators (e.g., rnorm) generate a unique sequence for any specific set of input values.
However, I'm wondering, why when we set the set.seed(...
7 votes, 7 answers, 634 views
Name of algorithm related to load balancing / redistribution
Given an array [x1, x2, x3, ..., xk ] where xi is the number of items in box i,
how can I redistribute the items so that no box contains more than N items? N is close to sum(xi)/k; that is, N is ...
7 votes, 2 answers, 9k views
Sampling from a given probability distribution using R
Given the probability distribution as follows:
x-coordinate represents hours, y-coordinate means the probability for each hour.
The problem is how to generate a set of 1000 random data points that follow ...
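Sampling from a discrete distribution given as per-outcome probabilities is a weighted draw with replacement; in R, sample() takes those probabilities through its prob argument together with replace = TRUE. The same idea in Python with random.choices. The 24 weights below are made up for illustration, since the question's actual distribution is in a chart not reproduced here:

```python
import random
from collections import Counter

hours = list(range(24))
# Hypothetical per-hour weights; they must be non-negative but need not sum to 1.
weights = [1, 1, 1, 1, 1, 2, 4, 6, 8, 8, 7, 6, 6, 6, 6, 7, 8, 8, 6, 4, 3, 2, 1, 1]

rng = random.Random(0)
# 1000 independent draws; hour h is chosen with probability weights[h] / sum(weights).
draws = rng.choices(hours, weights=weights, k=1000)
counts = Counter(draws)  # empirical frequencies should track the weights
```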
7 votes, 1 answer, 13k views
How to draw N random samples from a vector in R?
I have a vector with 663 elements. I would like to create random samples from the vector equal to the length of the vector (i.e. 663). Said differently, I would like to take random samples from all ...
7 votes, 1 answer, 2k views
How to repeat 1000 times this random walk simulation in R?
I'm simulating a one-dimensional and symmetric random walk procedure:
y[t] = y[t-1] + epsilon[t]
where white noise is denoted by epsilon[t] ~ N(0,1) in time period t. There is no drift in this ...
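The recursion y[t] = y[t-1] + epsilon[t] can be wrapped in a function that produces one path and then called once per replicate; in R, replicate(1000, ...) around such a function plays the same role. A sketch in Python, assuming a path length of 100 steps and a start value y[0] = 0 (both hypothetical choices); random_walk is a hypothetical name:

```python
import random
from typing import List


def random_walk(n_steps: int, rng=random, y0: float = 0.0) -> List[float]:
    # y[t] = y[t-1] + epsilon[t] with epsilon[t] ~ N(0, 1); no drift term.
    path = [y0]
    for _ in range(n_steps):
        path.append(path[-1] + rng.gauss(0.0, 1.0))
    return path


rng = random.Random(123)
# 1000 independent replicates of the same simulation.
walks = [random_walk(100, rng) for _ in range(1000)]
final_values = [w[-1] for w in walks]
```

Since the steps are independent N(0, 1), the final value of each 100-step path has variance 100, which gives a quick sanity check on the replicates.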
7 votes, 5 answers, 5k views
Randomly sampling unique subsets of an array
If I have an array:
a = [1,2,3]
How do I randomly select subsets of the array, such that the elements of each subset are unique? That is, for a the possible subsets would be:
[]
[1]
[2]
[3]
[1,2]
[...
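If the goal is to draw subsets from the power set listed above without getting the same subset twice, one approach is to number the 2^n subsets by bitmask and sample those numbers without replacement. A sketch; random_distinct_subsets is a hypothetical name:

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def random_distinct_subsets(items: Sequence[T], m: int, rng=random) -> List[List[T]]:
    """Draw m different subsets of items, uniformly and without replacement."""
    n = len(items)
    # Each subset corresponds to one bitmask in [0, 2**n): bit i set means
    # items[i] is included, so distinct masks give distinct subsets.
    masks = rng.sample(range(2 ** n), m)
    return [[items[i] for i in range(n) if (mask >> i) & 1] for mask in masks]
```

For a single uniformly random subset, including each element independently with probability 1/2 is equivalent.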