Data API
The Data API makes it easy to read the preprocessed signal data that Nanocompore uses for its analysis, for example to produce custom plots. In general, you load the configuration file used for the analysis with the load_config function and then query the data with get_references, get_reads, and get_pos.
For example:
>>> from nanocompore.api import load_config, get_pos
# Load the YAML configuration file to a Config object.
>>> config = load_config('analysis.yaml')
# Get the signal data for a given position:
>>> ref_id = 'ENST00000464651.1|ENSG00000166136.16|OTTHUMG00000019346.4|OTTHUMT00000051221.1|NDUFB8-204|NDUFB8|390|retained_intron|'
>>> get_pos(config, ref_id, 243)
condition sample read intensity dwell
0 WT WT_2 a6f3e188-6288-4215-acdc-fe28beba411f -1624.0 27.0
1 WT WT_2 09923db6-eccc-497f-8621-8adeea9b1bfb 4072.0 20.0
2 WT WT_2 f65926cc-bf13-4396-ba92-7f2f690b71d9 -2571.0 5.0
3 WT WT_2 aebabd0a-5260-41c4-b38b-1ebb117dc0fb 586.0 16.0
4 WT WT_2 994256e9-afab-4b54-94ff-cc37ae4cbe08 5229.0 16.0
.. ... ... ... ... ...
383 WT WT_1 79df3c74-a4c6-4335-93c5-a0ca7e3aec78 -1067.0 25.0
384 WT WT_1 f7dad9c6-d3d9-4501-85fb-6c6246a03719 -2225.0 56.0
385 WT WT_1 8653efdc-943f-48f8-b6f1-174cc4bb1ad5 2837.0 12.0
386 WT WT_1 fdf524f0-5bb5-45fc-a783-7e3a592eb149 462.0 30.0
387 WT WT_1 b05c004e-5f58-4bfa-896b-ce28b4225ab2 -469.0 27.0
[388 rows x 5 columns]
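The get_pos reference notes that a position is the 0-based index of the first nucleotide of a k-mer. If you start from a single nucleotide of interest, possibly given in 1-based coordinates, the arithmetic for finding the k-mer start positions that cover it can be sketched as below. The helper name kmer_starts_covering and its parameters are hypothetical and not part of the Nanocompore API, and the k-mer length is an assumption that you need to supply for your own data.

```python
def kmer_starts_covering(nucleotide_pos, kmer_len, one_based=False, ref_len=None):
    """Return the 0-based start positions of all k-mers containing a nucleotide.

    nucleotide_pos: position of the nucleotide of interest.
    kmer_len: length of the k-mer (assumed; depends on your data).
    one_based: set to True if nucleotide_pos uses 1-based coordinates.
    ref_len: optional transcript length, used to drop k-mers that would
             run past the end of the reference.
    """
    if kmer_len < 1:
        raise ValueError("kmer_len must be positive")
    # Convert to the 0-based convention used by get_pos.
    pos = nucleotide_pos - 1 if one_based else nucleotide_pos
    if pos < 0 or (ref_len is not None and pos >= ref_len):
        raise ValueError("position is outside the reference")
    # The earliest k-mer containing pos starts kmer_len - 1 bases upstream.
    first = max(0, pos - kmer_len + 1)
    # The latest k-mer containing pos starts at pos itself, unless that
    # k-mer would extend beyond the end of the reference.
    last = pos if ref_len is None else min(pos, ref_len - kmer_len)
    return list(range(first, last + 1))


starts = kmer_starts_covering(243, kmer_len=5)
```

Each value in starts could then be passed as the pos argument of get_pos to inspect every k-mer overlapping the nucleotide.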
Reference
get_metadata(db)
Returns the metadata from the given SQLite database.
The metadata contains information such as input files, resquiggler used, and data types for the binary encoded fields.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
db | str | Path to the SQLite database produced by the preprocessing command of Nanocompore. | required |
Returns:
Type | Description |
---|---|
dict | Dictionary containing the metadata. |
Source code in nanocompore/api.py
get_pos(config, reference_id, pos)
Get the signal data for a given position of a reference transcript from all reads in all samples. Note that the position is the 0-based index of the first nucleotide of a k-mer.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
config | Config | Configuration object, as returned by load_config. | required |
reference_id | str | ID of a reference sequence (transcript). | required |
pos | int | Position on the transcript for which to get data. A 0-based index is assumed. | required |
Returns:
Type | Description |
---|---|
DataFrame | A DataFrame with one row per read and the columns condition, sample, read, intensity, and dwell, as shown in the example below. |
Examples:
>>> from nanocompore.api import load_config, get_pos
>>> config = load_config('analysis.yaml')
>>> get_pos(config, 'ENST00000674681.1|ENSG00000075624.17|OTTHUMG00000023268|-|ACTB-219|ACTB|2554|protein_coding|', 532)
condition sample read intensity dwell
0 WT WT1 a4395b0d-dd3b-48e3-8afb-4085374b1147 3800.0 7.0
1 WT WT1 f9733448-6e6b-47ba-9501-01eda2f5ea26 4865.0 126.0
2 WT WT1 6f5e3b2e-f27b-47ef-b3c6-2ab4fdefd20a 3272.0 42.0
3 WT WT2 2da07406-70c2-40a1-835a-6a7a2c914d49 6241.0 44.0
4 WT WT2 54fc1d38-5e3d-4d77-a717-2d41b4785af6 4047.0 9.0
5 WT WT2 3cfa90d1-7dfb-4398-a224-c75a3ab99873 3709.0 70.0
6 KD KD1 3f46f499-8ce4-4817-8177-8ad61b784f27 4807.0 57.0
7 KD KD1 73d62df4-f04a-4207-a4bc-7b9739b3c3b2 4336.0 132.0
8 KD KD1 b7bc9a36-318e-4be2-a90f-74a5aa6439bf -861.0 7.0
9 KD KD2 ac486e16-15be-47a8-902c-2cfa2887c534 2706.0 45.0
10 KD KD2 797fd991-570e-42d4-8292-0a7557b192d7 5450.0 24.0
11 KD KD2 4e1ad358-ec2b-40b4-8e9a-54db28a40551 206.0 47.0
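For custom comparisons, the per-read rows returned by get_pos can be summarized per condition. The following is a minimal sketch, not part of the Nanocompore API: the helper summarize_by_condition is hypothetical, and it works on plain dictionaries so that it runs without Nanocompore installed. The stand-in rows are copied from the WT1 and KD1 reads in the example output above.

```python
from collections import defaultdict
from statistics import median


def summarize_by_condition(records):
    """Return per-condition read counts and median intensity/dwell.

    records: iterable of dicts with the keys 'condition', 'intensity'
             and 'dwell' (other keys are ignored).
    """
    grouped = defaultdict(list)
    for row in records:
        grouped[row["condition"]].append(row)

    summary = {}
    for condition, rows in grouped.items():
        summary[condition] = {
            "n_reads": len(rows),
            "median_intensity": median(r["intensity"] for r in rows),
            "median_dwell": median(r["dwell"] for r in rows),
        }
    return summary


# Stand-in rows copied from the example output (WT1 and KD1 reads only).
records = [
    {"condition": "WT", "sample": "WT1", "read": "a4395b0d-dd3b-48e3-8afb-4085374b1147", "intensity": 3800.0, "dwell": 7.0},
    {"condition": "WT", "sample": "WT1", "read": "f9733448-6e6b-47ba-9501-01eda2f5ea26", "intensity": 4865.0, "dwell": 126.0},
    {"condition": "WT", "sample": "WT1", "read": "6f5e3b2e-f27b-47ef-b3c6-2ab4fdefd20a", "intensity": 3272.0, "dwell": 42.0},
    {"condition": "KD", "sample": "KD1", "read": "3f46f499-8ce4-4817-8177-8ad61b784f27", "intensity": 4807.0, "dwell": 57.0},
    {"condition": "KD", "sample": "KD1", "read": "73d62df4-f04a-4207-a4bc-7b9739b3c3b2", "intensity": 4336.0, "dwell": 132.0},
    {"condition": "KD", "sample": "KD1", "read": "b7bc9a36-318e-4be2-a90f-74a5aa6439bf", "intensity": -861.0, "dwell": 7.0},
]
summary = summarize_by_condition(records)
```

With real data, assuming the returned object is a pandas DataFrame, the rows could be obtained with get_pos(config, reference_id, pos).to_dict("records") and passed to the same helper.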
get_reads(config, reference_id, selected_reads=None)
Get the data for all reads mapping to the given reference.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
config | Config | Configuration object, as returned by load_config. | required |
reference_id | str | ID of a reference sequence (transcript). | required |
selected_reads | Optional[list[str]] | Optional list of UUIDs of the reads for which to get data. If None (the default), data for all reads is returned. | None |
Returns:
Type | Description |
---|---|
tuple[Float[np.ndarray, "reads positions variables"], list[str], list[str], list[str]] | A tuple with (signal_data, reads, samples, conditions). |
Raises:
Type | Description |
---|---|
KeyError | If the reference_id is not found in the data sources. |
Examples:
>>> from nanocompore.api import load_config, get_reads
>>> config = load_config('analysis.yaml')
>>> get_reads(config, 'ENST00000674681.1|ENSG00000075624.17|OTTHUMG00000023268|-|ACTB-219|ACTB|2554|protein_coding|')
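Because get_reads returns the read, sample, and condition labels as separate lists, a common next step is to group reads by label. The sketch below assumes that the three lists are parallel and ordered like the first (reads) axis of signal_data, which the shape annotation suggests but this page does not state explicitly. The helpers indices_by_label and select_read_ids are hypothetical and not part of the Nanocompore API, and the stand-in labels are copied from the get_pos example output.

```python
from collections import defaultdict


def indices_by_label(labels):
    """Map each distinct label to the list of indices where it occurs."""
    groups = defaultdict(list)
    for index, label in enumerate(labels):
        groups[label].append(index)
    return dict(groups)


def select_read_ids(reads, labels, wanted):
    """Return the read UUIDs whose parallel label equals `wanted`."""
    return [read for read, label in zip(reads, labels) if label == wanted]


# Stand-ins for the `reads` and `conditions` lists returned by get_reads.
reads = [
    "a4395b0d-dd3b-48e3-8afb-4085374b1147",
    "2da07406-70c2-40a1-835a-6a7a2c914d49",
    "3f46f499-8ce4-4817-8177-8ad61b784f27",
    "ac486e16-15be-47a8-902c-2cfa2887c534",
]
conditions = ["WT", "WT", "KD", "KD"]

by_condition = indices_by_label(conditions)
kd_reads = select_read_ids(reads, conditions, "KD")
```

Under the same assumption, signal_data[by_condition["WT"]] would select the rows of the signal array that belong to WT reads, and kd_reads could be passed back as the selected_reads argument of get_reads to load only those reads.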
get_references(config, has_data=True)
Returns a list of all references found in the list of samples defined in the configuration.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
config | Config | Configuration object, as returned by load_config. | required |
has_data | bool | If True (the default), only references that have mapped reads are returned. | True |
Returns:
Type | Description |
---|---|
list | List of transcript reference ID strings. |
Examples:
>>> from nanocompore.api import load_config, get_references
>>> config = load_config('analysis.yaml')
>>> get_references(config)
['ENST00000674681.1|ENSG00000075624.17|OTTHUMG00000023268|-|ACTB-219|ACTB|2554|protein_coding|', 'ENST00000642480.2|ENSG00000075624.17|OTTHUMG00000023268|OTTHUMT00000495153.1|ACTB-213|ACTB|2021|protein_coding|']
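The reference IDs in these examples are long pipe-delimited strings, so it can be convenient to look them up by gene name. The sketch below assumes that your IDs follow the same layout as the ones shown on this page, with the gene name as the sixth field. That layout depends on the reference FASTA you used and is not guaranteed by Nanocompore. The helpers gene_name and references_for_gene are hypothetical and not part of the API.

```python
def gene_name(reference_id):
    """Return the sixth pipe-delimited field of the ID, or None if absent."""
    fields = reference_id.split("|")
    if len(fields) > 5 and fields[5]:
        return fields[5]
    return None


def references_for_gene(reference_ids, gene):
    """Return the reference IDs whose gene-name field equals `gene`."""
    return [ref for ref in reference_ids if gene_name(ref) == gene]


# Stand-in for the list returned by get_references(config).
references = [
    "ENST00000674681.1|ENSG00000075624.17|OTTHUMG00000023268|-|ACTB-219|ACTB|2554|protein_coding|",
    "ENST00000642480.2|ENSG00000075624.17|OTTHUMG00000023268|OTTHUMT00000495153.1|ACTB-213|ACTB|2021|protein_coding|",
]
actb_references = references_for_gene(references, "ACTB")
```

Each ID in actb_references can be used unchanged as the reference_id argument of get_reads or get_pos.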
load_config(config_path)
Load a configuration file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
config_path | str | Path to the Nanocompore configuration file. | required |
Returns:
Type | Description |
---|---|
Config | A configuration object. |