DS Cleaner Package
dscleaner.IFileInfo class
class dscleaner.ifileinfo.IFileInfo(file)
Bases: abc.ABC
Interface that must be implemented to support your own file type. It is used as an argument to FileWriter, FileUtil and FileMerger.
addSamples(samples)
Adds the samples given by samples.
Parameters: samples – An array containing samples. It must be shaped like (n, c), where c is the number of channels.
close()
Defines the behavior the class should have when it leaves the context manager.
If a file descriptor is being used, you should always define the close method.
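A minimal sketch of what implementing such an interface can look like. `FileInfoInterface` and `InMemoryInfo` are hypothetical stand-ins written for this example, not dscleaner classes; whether the real IFileInfo provides the `__enter__`/`__exit__` hooks itself is an assumption.

```python
from abc import ABC, abstractmethod


class FileInfoInterface(ABC):
    """Hypothetical stand-in for dscleaner's IFileInfo (not the real class)."""

    @abstractmethod
    def addSamples(self, samples):
        """Add rows shaped like (n, c)."""

    @abstractmethod
    def close(self):
        """Release whatever the object holds open."""

    # Context-manager hooks (assumed) so that `with` always ends in close().
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False  # do not swallow exceptions


class InMemoryInfo(FileInfoInterface):
    """Hypothetical file type that keeps its samples in a list of rows."""

    def __init__(self, channels):
        self.channels = channels
        self.rows = []
        self.closed = False

    def addSamples(self, samples):
        for row in samples:
            if len(row) != self.channels:
                raise ValueError("each row needs one value per channel")
            self.rows.append(list(row))

    def close(self):
        self.closed = True


with InMemoryInfo(channels=2) as info:
    info.addSamples([[0.1, 0.2], [0.3, 0.4]])

print(info.closed, len(info.rows))  # True 2
```

Leaving the `with` block triggers `close()`, which is the behavior the close method is meant to define.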
dscleaner.CSVFileInfo class
class dscleaner.csvfileinfo.CsvFileInfo(samples, samplerate)
Bases: dscleaner.ifileinfo.IFileInfo
CsvFileInfo is used when there is no actual file, only an array of samples.
The array must be shaped like (n, c), where c is the number of channels.
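To illustrate the (n, c) layout, the sketch below interleaves per-channel sequences into n rows of c values. It uses plain Python lists as a stand-in for an array; `to_rows`, `shape` and the sample values are hypothetical. With NumPy, `numpy.column_stack` builds the same layout.

```python
def to_rows(*channels):
    """Interleave per-channel sequences into n rows of c values each."""
    lengths = {len(channel) for channel in channels}
    if len(lengths) != 1:
        raise ValueError("all channels must have the same number of samples")
    return [list(row) for row in zip(*channels)]


def shape(rows):
    """Return (n, c) for a list of rows."""
    return (len(rows), len(rows[0]) if rows else 0)


voltage = [230.1, 229.8, 230.4]  # hypothetical channel 1
current = [1.2, 1.3, 1.1]        # hypothetical channel 2

rows = to_rows(voltage, current)
print(shape(rows))  # (3, 2)
print(rows[0])      # [230.1, 1.2]
```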
dscleaner.FileInfo class
class dscleaner.fileinfo.FileInfo(path)
Bases: dscleaner.ifileinfo.IFileInfo
Class used to manipulate soundfiles.
It receives a path to a file and copies that file to a temporary location.
The copy is edited through FileUtil.
FileWriter converts it and writes it to another location.
Note
The close method MUST always be called, or else the temporary file stays on disk. with statements should be used in order to close the files automatically.
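The pattern described in the note can be sketched with the standard library alone. `TempCopy` is a hypothetical helper, not the FileInfo implementation; it only shows why leaving a `with` block is a convenient place to delete the temporary copy.

```python
import os
import shutil
import tempfile


class TempCopy:
    """Hypothetical helper: works on a temporary copy of a source file."""

    def __init__(self, path):
        suffix = os.path.splitext(path)[1]
        fd, self.temp_path = tempfile.mkstemp(suffix=suffix)
        os.close(fd)  # only the path is needed here
        shutil.copyfile(path, self.temp_path)

    def close(self):
        # Remove the temporary copy; safe to call more than once.
        if os.path.exists(self.temp_path):
            os.remove(self.temp_path)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False  # do not swallow exceptions


# Create a small source file so the sketch runs on its own.
source_dir = tempfile.mkdtemp()
source = os.path.join(source_dir, "example.bin")
with open(source, "wb") as handle:
    handle.write(b"\x00\x01\x02\x03")

with TempCopy(source) as copy:
    temp_path = copy.temp_path
    existed_inside = os.path.exists(temp_path)

exists_after = os.path.exists(temp_path)
shutil.rmtree(source_dir)
print(existed_inside, exists_after)  # True False
```

If `close()` were never called, the temporary copy would remain on disk after the program finished.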
addSamples(samples)
Appends the samples to the file.
Similar to setSamples() but appends instead of truncating. See the base class ifileinfo.
close()
Must always be called, or the file won't be accessible by other processes AND the temp file will stay on disk. See the base class ifileinfo.
getSamples()
Reads all of the samples in the file.
Returns: numpy array containing the samples.
dscleaner.FileUtil class
class dscleaner.fileutil.FileUtil(f)
Bases: object
The FileUtil class is where the dataset manipulation occurs.
The class should be instantiated with a with statement.
Parameters: f – An IFileInfo specialization must be supplied.
fix_duration(expected_duration, grid_rate=50)
Fixes the file to the expected duration.
Parameters:
- expected_duration – Duration the file should have, in minutes.
- grid_rate – Frequency of the grid in hertz. This is used to discover the wave signal in order to upsample.
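The arithmetic that relates these parameters can be sketched as follows. This is not the library's repair algorithm, which is not described here; the helper names, the samplerate and the frame counts are hypothetical and only show how minutes, samplerate and grid frequency translate into frame counts.

```python
def expected_frames(expected_duration_min, samplerate):
    """Number of frames a file of the given duration (minutes) should hold."""
    return int(round(expected_duration_min * 60 * samplerate))


def frames_per_grid_cycle(samplerate, grid_rate=50):
    """Frames that make up one period of the grid waveform."""
    return samplerate / grid_rate


samplerate = 1000       # hypothetical samplerate in Hz
actual_frames = 59_900  # hypothetical length of a file that came up short

target = expected_frames(1, samplerate)    # expected duration: 1 minute
missing = target - actual_frames
cycle = frames_per_grid_cycle(samplerate)  # grid_rate defaults to 50 Hz

print(target, missing, cycle, missing / cycle)  # 60000 100 20.0 5.0
```

In this made-up case the file is short by 100 frames, which corresponds to five full grid cycles at 50 Hz.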
resample(new_framerate, method='kaiser_fast')
Resamples the data to the new framerate using librosa resample.
Parameters:
- data – numpy.array shaped like (num_frames, num_channels). It is expected to receive the output of soundfile.getSamples(), not the transposed array.
- original_framerate – The original framerate the data array uses.
- new_framerate – The new framerate that data will be resampled to.
- method – Methods that librosa accepts are also accepted here; kaiser_fast is used by default.
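To make the relation between framerates and the (num_frames, num_channels) layout concrete, here is a sketch that resamples rows by plain linear interpolation. This is a deliberately simpler technique than the kaiser_fast method used through librosa, and `resample_linear` is a hypothetical function written for this example.

```python
def resample_linear(rows, original_framerate, new_framerate):
    """Resample (n, c) rows by linear interpolation (not kaiser_fast)."""
    if not rows:
        return []
    n = len(rows)
    channels = len(rows[0])
    new_n = max(1, round(n * new_framerate / original_framerate))
    out = []
    for i in range(new_n):
        # Position of output frame i on the input frame axis.
        pos = i * original_framerate / new_framerate
        left = min(int(pos), n - 1)
        right = min(left + 1, n - 1)
        frac = pos - left
        out.append([
            rows[left][c] * (1 - frac) + rows[right][c] * frac
            for c in range(channels)
        ])
    return out


rows = [[0.0, 10.0], [1.0, 20.0], [2.0, 30.0], [3.0, 40.0]]
up = resample_linear(rows, original_framerate=2, new_framerate=4)

print(len(up))  # 8
print(up[1])    # [0.5, 15.0]
```

Doubling the framerate doubles the number of frames while the number of channels per row stays the same.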
standardize(*dividers)
This method transforms the values to fit between -1 and 1, so that they can be stored in soundfiles.
If the source file isn't a soundfile, the target file will not be well formatted, so you should run this method to make it well formatted.
Parameters: *dividers – The number by which each channel will be divided in order to standardize that channel.
Note
To maintain consistency throughout the dataset, it is advised to choose a divider for each channel that is a bit higher than that channel's maximum value. It is also advised to keep a record of the divider for each channel so that the standardization can be undone later.
Example: Max amplitude is 75, divider chosen: 90.
Returns: A tuple with the dividers used to standardize.
Example: (40, 30, 30) in a three-channel file.
You should keep these values for future reference.
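The per-channel division, and the reason for keeping the dividers, can be sketched like this. `standardize_rows` and `unstandardize_rows` are hypothetical helpers operating on plain lists of rows, not the FileUtil implementation.

```python
def standardize_rows(rows, *dividers):
    """Divide each channel by its divider; return the rows and the dividers."""
    if rows and len(dividers) != len(rows[0]):
        raise ValueError("one divider per channel is required")
    scaled = [[value / d for value, d in zip(row, dividers)] for row in rows]
    return scaled, tuple(dividers)


def unstandardize_rows(rows, dividers):
    """Undo the scaling by multiplying each channel by its divider."""
    return [[value * d for value, d in zip(row, dividers)] for row in rows]


# Two channels; channel 1 peaks at 75, so a slightly larger divider (90) is used.
rows = [[75.0, -30.0], [-45.0, 10.0]]
scaled, used = standardize_rows(rows, 90, 40)
restored = unstandardize_rows(scaled, used)

print(used)  # (90, 40)
print(all(-1 <= value <= 1 for row in scaled for value in row))  # True
```

Without the stored dividers, the original amplitudes could not be recovered from the standardized file.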
dscleaner.FileWriter class
class dscleaner.filewriter.FileWriter(file, mode='w')
Bases: object
Writes to a file.
The class should be instantiated with a with statement.
Parameters:
- file – Accepts either a FileUtil or an IFileInfo specialization.
- mode – Either w for writing or a for appending.
create_file(new_filepath, samplerate=None)
Creates a new file with the extension given in new_filepath.
If the source file isn't a soundfile, the target file will not be well formatted. To normalize it, run the FileUtil.standardize method first.
Parameters:
- new_filepath – The directory and name the new file will have. The file is converted based on the file extension.
- samplerate (optional) – If not supplied, the file's own samplerate is used.
create_file_EMDDF(new_filepath, json_file, samplerate=None)
Creates a soundfile in the EMD-DF format, relying on the pyemddf package.
Note
Only works on wave and wave64 files.
Parameters:
- samplerate (optional) – Samplerate of the file.
- json_file – A JSON file with the metadata fields. You can get a template for it by executing pyemddf.create_template_file().
dscleaner.Merger class
class dscleaner.merger.Merger(channels, path, samplerate, cutoff=None, mode='a')
Bases: object
Merger allows for the creation of an empty soundfile in which multiple datasets can easily be stored.
Note
The W64 file type is recommended, given that it can store up to 18 exabytes of data.
Parameters:
- channels – Number of channels the parsed files should have.
- path – The path where the new merger file should be created.
- samplerate – Samplerate to write to the file.
- cutoff – NOT IMPLEMENTED. How often a new file should be started (e.g. a new file is created for each 1024 MB of data reached).
- mode – Either 'a' or 'w', if the file should be appended to or truncated, respectively. Default behavior: append.
dscleaner.Splitter class
class dscleaner.splitter.Splitter(channels, path, samplerate, max_length)
Bases: object
Splitter allows splitting an existing file.
Parameters:
- channels – Number of channels the parsed files should have.
- path – The path where the new merger file should be created.
- samplerate – Samplerate to write to the file.
- max_length – Maximum file length in minutes.
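How a maximum length in minutes maps to frames per part can be sketched as below. `split_rows` is a hypothetical helper working on in-memory rows; the Splitter class itself operates on files, and its internals are not shown in this reference.

```python
def split_rows(rows, samplerate, max_length_min):
    """Cut (n, c) rows into parts of at most max_length_min minutes each."""
    frames_per_part = int(max_length_min * 60 * samplerate)
    if frames_per_part <= 0:
        raise ValueError("max_length_min and samplerate must be positive")
    return [
        rows[start:start + frames_per_part]
        for start in range(0, len(rows), frames_per_part)
    ]


# 250 single-channel frames at a hypothetical 2 Hz: one minute holds 120 frames.
rows = [[float(i)] for i in range(250)]
parts = split_rows(rows, samplerate=2, max_length_min=1)

print([len(part) for part in parts])  # [120, 120, 10]
```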
dscleaner.Utils module
dscleaner.utils.path_splitter(path)
Cleans extra / characters and splits the path into five parts, as shown in the example.
Parameters: path – The path to split.
Returns: A dictionary with the following keys: {full_path, path, file, file_name, extension}
Return type: dict
Example
>>> path_splitter('C:/Data/example.wav/')
{
    'full_path': 'C:/Data/example.wav',
    'path': 'C:/Data/',
    'file': 'example',
    'file_name': 'example.wav',
    'extension': 'wav'
}
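A function with the documented behavior could be sketched with the standard library as follows. `path_splitter_sketch` is a hypothetical reimplementation based only on the example output, not the code in dscleaner.utils.

```python
import posixpath
import re


def path_splitter_sketch(path):
    """Split a /-separated path into the five documented parts."""
    # Collapse repeated slashes and drop any trailing slash.
    cleaned = re.sub(r"/+", "/", path).rstrip("/")
    directory, file_name = posixpath.split(cleaned)
    stem, extension = posixpath.splitext(file_name)
    if directory and not directory.endswith("/"):
        directory += "/"
    return {
        "full_path": cleaned,
        "path": directory,
        "file": stem,
        "file_name": file_name,
        "extension": extension.lstrip("."),
    }


parts = path_splitter_sketch("C:/Data/example.wav/")
print(parts["path"], parts["file"], parts["extension"])  # C:/Data/ example wav
```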