Generate simulated reads

In brief

Simulate reads as a NanopolishComp-like file from a FASTA file and a built-in model.

...

Import the package and plotting tools

from nanocompore.SimReads import SimReads

# Plotting library imports
import matplotlib.pyplot as pl
%matplotlib inline

Generate reads without modifications

SimReads(
    fasta_fn="./references/simulated/ref.fa",
    ref_list=["ref_0000"],
    outpath="./results/",
    overwrite=True,  # boolean, not the string "True"
    plot=True,
    nreads_per_ref=100)
2020-12-08 11:20:48.610 | INFO     | nanocompore.SimReads:SimReads:90 - Checking and initialising Simreads
2020-12-08 11:20:48.613 | DEBUG    | nanocompore.common:log_init_state:50 -     package_name: nanocompore
2020-12-08 11:20:48.615 | DEBUG    | nanocompore.common:log_init_state:51 -     package_version: 1.0.1.dev0
2020-12-08 11:20:48.619 | DEBUG    | nanocompore.common:log_init_state:52 -     timestamp: 2020-12-08 11:20:48.619541
2020-12-08 11:20:48.620 | DEBUG    | nanocompore.common:log_init_state:55 -     fasta_fn: ./references/simulated/ref.fa
2020-12-08 11:20:48.622 | DEBUG    | nanocompore.common:log_init_state:55 -     outpath: ./results/
2020-12-08 11:20:48.623 | DEBUG    | nanocompore.common:log_init_state:55 -     outprefix: out
2020-12-08 11:20:48.625 | DEBUG    | nanocompore.common:log_init_state:55 -     overwrite: True
2020-12-08 11:20:48.626 | DEBUG    | nanocompore.common:log_init_state:55 -     run_type: RNA
2020-12-08 11:20:48.627 | DEBUG    | nanocompore.common:log_init_state:55 -     ref_list: ['ref_0000']
2020-12-08 11:20:48.628 | DEBUG    | nanocompore.common:log_init_state:55 -     nreads_per_ref: 100
2020-12-08 11:20:48.639 | DEBUG    | nanocompore.common:log_init_state:55 -     plot: True
2020-12-08 11:20:48.641 | DEBUG    | nanocompore.common:log_init_state:55 -     intensity_mod: 0
2020-12-08 11:20:48.642 | DEBUG    | nanocompore.common:log_init_state:55 -     dwell_mod: 0
2020-12-08 11:20:48.644 | DEBUG    | nanocompore.common:log_init_state:55 -     mod_reads_freq: 0
2020-12-08 11:20:48.646 | DEBUG    | nanocompore.common:log_init_state:55 -     mod_bases_freq: 0.25
2020-12-08 11:20:48.647 | DEBUG    | nanocompore.common:log_init_state:55 -     mod_bases_type: A
2020-12-08 11:20:48.649 | DEBUG    | nanocompore.common:log_init_state:55 -     mod_extend_context: 2
2020-12-08 11:20:48.650 | DEBUG    | nanocompore.common:log_init_state:55 -     min_mod_dist: 6
2020-12-08 11:20:48.651 | DEBUG    | nanocompore.common:log_init_state:55 -     pos_rand_seed: 42
2020-12-08 11:20:48.652 | DEBUG    | nanocompore.common:log_init_state:55 -     not_bound: False
2020-12-08 11:20:48.653 | DEBUG    | nanocompore.common:log_init_state:55 -     progress: False
2020-12-08 11:20:48.655 | INFO     | nanocompore.SimReads:SimReads:101 - Importing RNA model file
2020-12-08 11:20:48.682 | INFO     | nanocompore.SimReads:SimReads:108 - Reading Fasta file and simulate corresponding data
2020-12-08 11:20:48.686 | DEBUG    | nanocompore.SimReads:SimReads:126 - Processing reference ref_0000

Generate reads with modifications

SimReads(
    fasta_fn="./references/simulated/ref.fa",
    ref_list=["ref_0000"],
    outpath="./results/",
    overwrite=True,  # boolean, not the string "True"
    plot=True,
    mod_extend_context=3,
    nreads_per_ref=100,
    intensity_mod=5,
    dwell_mod=5,
    mod_reads_freq=0.5)
2020-12-08 11:20:55.001 | INFO     | nanocompore.SimReads:SimReads:90 - Checking and initialising Simreads
2020-12-08 11:20:55.005 | DEBUG    | nanocompore.common:log_init_state:50 -     package_name: nanocompore
2020-12-08 11:20:55.006 | DEBUG    | nanocompore.common:log_init_state:51 -     package_version: 1.0.1.dev0
2020-12-08 11:20:55.008 | DEBUG    | nanocompore.common:log_init_state:52 -     timestamp: 2020-12-08 11:20:55.008532
2020-12-08 11:20:55.009 | DEBUG    | nanocompore.common:log_init_state:55 -     fasta_fn: ./references/simulated/ref.fa
2020-12-08 11:20:55.010 | DEBUG    | nanocompore.common:log_init_state:55 -     outpath: ./results/
2020-12-08 11:20:55.011 | DEBUG    | nanocompore.common:log_init_state:55 -     outprefix: out
2020-12-08 11:20:55.012 | DEBUG    | nanocompore.common:log_init_state:55 -     overwrite: True
2020-12-08 11:20:55.013 | DEBUG    | nanocompore.common:log_init_state:55 -     run_type: RNA
2020-12-08 11:20:55.013 | DEBUG    | nanocompore.common:log_init_state:55 -     ref_list: ['ref_0000']
2020-12-08 11:20:55.014 | DEBUG    | nanocompore.common:log_init_state:55 -     nreads_per_ref: 100
2020-12-08 11:20:55.015 | DEBUG    | nanocompore.common:log_init_state:55 -     plot: True
2020-12-08 11:20:55.017 | DEBUG    | nanocompore.common:log_init_state:55 -     intensity_mod: 5
2020-12-08 11:20:55.018 | DEBUG    | nanocompore.common:log_init_state:55 -     dwell_mod: 5
2020-12-08 11:20:55.020 | DEBUG    | nanocompore.common:log_init_state:55 -     mod_reads_freq: 0.5
2020-12-08 11:20:55.021 | DEBUG    | nanocompore.common:log_init_state:55 -     mod_bases_freq: 0.25
2020-12-08 11:20:55.023 | DEBUG    | nanocompore.common:log_init_state:55 -     mod_bases_type: A
2020-12-08 11:20:55.025 | DEBUG    | nanocompore.common:log_init_state:55 -     mod_extend_context: 3
2020-12-08 11:20:55.026 | DEBUG    | nanocompore.common:log_init_state:55 -     min_mod_dist: 6
2020-12-08 11:20:55.027 | DEBUG    | nanocompore.common:log_init_state:55 -     pos_rand_seed: 42
2020-12-08 11:20:55.028 | DEBUG    | nanocompore.common:log_init_state:55 -     not_bound: False
2020-12-08 11:20:55.029 | DEBUG    | nanocompore.common:log_init_state:55 -     progress: False
2020-12-08 11:20:55.030 | INFO     | nanocompore.SimReads:SimReads:101 - Importing RNA model file
2020-12-08 11:20:55.051 | INFO     | nanocompore.SimReads:SimReads:108 - Reading Fasta file and simulate corresponding data
2020-12-08 11:20:55.053 | DEBUG    | nanocompore.SimReads:SimReads:126 - Processing reference ref_0000
2020-12-08 11:20:55.383 | DEBUG    | nanocompore.SimReads:find_valid_pos_list:293 -     Try to find 15 kmers to modify
2020-12-08 11:20:55.415 | DEBUG    | nanocompore.SimReads:find_valid_pos_list:301 -     Found valid combination for 15 kmers
2020-12-08 11:20:55.416 | DEBUG    | nanocompore.SimReads:find_valid_pos_list:302 -     modified positions: [  0  11  20  31  71  88 103 110 122 128 150 169 176 184 194]
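
The debug log above lists the positions selected for modification. As a quick sanity check, the sketch below confirms that these positions respect the min_mod_dist value reported in the log (6). It assumes that "distance" means the difference between consecutive positions; the helper name min_gap is hypothetical and not part of nanocompore.

```python
# Positions copied from the "modified positions" debug line above.
positions = [0, 11, 20, 31, 71, 88, 103, 110, 122, 128, 150, 169, 176, 184, 194]
min_mod_dist = 6  # value reported in the log


def min_gap(sorted_positions):
    """Return the smallest difference between consecutive positions (hypothetical helper)."""
    return min(b - a for a, b in zip(sorted_positions, sorted_positions[1:]))


smallest = min_gap(sorted(positions))
print(f"{len(positions)} positions, smallest gap = {smallest}")

# Under the stated assumption, no two modified positions are closer than min_mod_dist.
assert smallest >= min_mod_dist
```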

Generate a small dataset with both modified and unmodified conditions

# Options
fasta = "./references/simulated/ref.fa"
data_dir = "./eventalign_files/simulated/"

for replicate, nreads in [(1, 55), (2, 60)]:
    # Generate the unmodified control
    SimReads(
        fasta_fn=fasta,
        outpath=data_dir,
        outprefix=f"unmodified_rep_{replicate}",
        overwrite=True,
        nreads_per_ref=nreads)

    # Generate the modified condition
    SimReads(
        fasta_fn=fasta,
        outpath=data_dir,
        outprefix=f"modified_rep_{replicate}",
        overwrite=True,
        nreads_per_ref=nreads,
        intensity_mod=3,
        dwell_mod=3,
        mod_reads_freq=0.9,
        mod_bases_freq=0.25,
        pos_rand_seed=2)
2020-12-08 11:21:01.498 | INFO     | nanocompore.SimReads:SimReads:90 - Checking and initialising Simreads
2020-12-08 11:21:01.501 | DEBUG    | nanocompore.common:log_init_state:50 -     package_name: nanocompore
2020-12-08 11:21:01.502 | DEBUG    | nanocompore.common:log_init_state:51 -     package_version: 1.0.1.dev0
2020-12-08 11:21:01.503 | DEBUG    | nanocompore.common:log_init_state:52 -     timestamp: 2020-12-08 11:21:01.503063
2020-12-08 11:21:01.512 | DEBUG    | nanocompore.common:log_init_state:55 -     fasta_fn: ./references/simulated/ref.fa
2020-12-08 11:21:01.514 | DEBUG    | nanocompore.common:log_init_state:55 -     outpath: ./eventalign_files/simulated/
2020-12-08 11:21:01.515 | DEBUG    | nanocompore.common:log_init_state:55 -     outprefix: unmodified_rep_1
2020-12-08 11:21:01.516 | DEBUG    | nanocompore.common:log_init_state:55 -     overwrite: True
2020-12-08 11:21:01.517 | DEBUG    | nanocompore.common:log_init_state:55 -     run_type: RNA
2020-12-08 11:21:01.517 | DEBUG    | nanocompore.common:log_init_state:55 -     ref_list: []
2020-12-08 11:21:01.518 | DEBUG    | nanocompore.common:log_init_state:55 -     nreads_per_ref: 55
2020-12-08 11:21:01.519 | DEBUG    | nanocompore.common:log_init_state:55 -     plot: False
2020-12-08 11:21:01.520 | DEBUG    | nanocompore.common:log_init_state:55 -     intensity_mod: 0
2020-12-08 11:21:01.523 | DEBUG    | nanocompore.common:log_init_state:55 -     dwell_mod: 0
2020-12-08 11:21:01.525 | DEBUG    | nanocompore.common:log_init_state:55 -     mod_reads_freq: 0
2020-12-08 11:21:01.527 | DEBUG    | nanocompore.common:log_init_state:55 -     mod_bases_freq: 0.25
2020-12-08 11:21:01.529 | DEBUG    | nanocompore.common:log_init_state:55 -     mod_bases_type: A
2020-12-08 11:21:01.530 | DEBUG    | nanocompore.common:log_init_state:55 -     mod_extend_context: 2
2020-12-08 11:21:01.531 | DEBUG    | nanocompore.common:log_init_state:55 -     min_mod_dist: 6
2020-12-08 11:21:01.531 | DEBUG    | nanocompore.common:log_init_state:55 -     pos_rand_seed: 42
2020-12-08 11:21:01.532 | DEBUG    | nanocompore.common:log_init_state:55 -     not_bound: False
2020-12-08 11:21:01.533 | DEBUG    | nanocompore.common:log_init_state:55 -     progress: False
2020-12-08 11:21:01.534 | INFO     | nanocompore.SimReads:SimReads:101 - Importing RNA model file
2020-12-08 11:21:01.555 | INFO     | nanocompore.SimReads:SimReads:108 - Reading Fasta file and simulate corresponding data
2020-12-08 11:21:01.558 | DEBUG    | nanocompore.SimReads:SimReads:126 - Processing reference ref_0000
2020-12-08 11:21:01.806 | DEBUG    | nanocompore.SimReads:SimReads:126 - Processing reference ref_0001
2020-12-08 11:21:02.013 | DEBUG    | nanocompore.SimReads:SimReads:126 - Processing reference ref_0002
2020-12-08 11:21:02.223 | DEBUG    | nanocompore.SimReads:SimReads:126 - Processing reference ref_0003
2020-12-08 11:21:02.430 | DEBUG    | nanocompore.SimReads:SimReads:126 - Processing reference ref_0004
2020-12-08 11:21:02.651 | INFO     | nanocompore.SimReads:SimReads:90 - Checking and initialising Simreads
2020-12-08 11:21:02.652 | DEBUG    | nanocompore.common:log_init_state:50 -     package_name: nanocompore
2020-12-08 11:21:02.652 | DEBUG    | nanocompore.common:log_init_state:51 -     package_version: 1.0.1.dev0
2020-12-08 11:21:02.653 | DEBUG    | nanocompore.common:log_init_state:52 -     timestamp: 2020-12-08 11:21:02.653331
2020-12-08 11:21:02.654 | DEBUG    | nanocompore.common:log_init_state:55 -     fasta_fn: ./references/simulated/ref.fa
2020-12-08 11:21:02.655 | DEBUG    | nanocompore.common:log_init_state:55 -     outpath: ./eventalign_files/simulated/
2020-12-08 11:21:02.655 | DEBUG    | nanocompore.common:log_init_state:55 -     outprefix: modified_rep_1
2020-12-08 11:21:02.656 | DEBUG    | nanocompore.common:log_init_state:55 -     overwrite: True
2020-12-08 11:21:02.658 | DEBUG    | nanocompore.common:log_init_state:55 -     run_type: RNA
2020-12-08 11:21:02.659 | DEBUG    | nanocompore.common:log_init_state:55 -     ref_list: []
2020-12-08 11:21:02.660 | DEBUG    | nanocompore.common:log_init_state:55 -     nreads_per_ref: 55
2020-12-08 11:21:02.661 | DEBUG    | nanocompore.common:log_init_state:55 -     plot: False
2020-12-08 11:21:02.662 | DEBUG    | nanocompore.common:log_init_state:55 -     intensity_mod: 3
2020-12-08 11:21:02.663 | DEBUG    | nanocompore.common:log_init_state:55 -     dwell_mod: 3
2020-12-08 11:21:02.671 | DEBUG    | nanocompore.common:log_init_state:55 -     mod_reads_freq: 0.9
2020-12-08 11:21:02.673 | DEBUG    | nanocompore.common:log_init_state:55 -     mod_bases_freq: 0.25
2020-12-08 11:21:02.675 | DEBUG    | nanocompore.common:log_init_state:55 -     mod_bases_type: A
2020-12-08 11:21:02.676 | DEBUG    | nanocompore.common:log_init_state:55 -     mod_extend_context: 2
2020-12-08 11:21:02.678 | DEBUG    | nanocompore.common:log_init_state:55 -     min_mod_dist: 6
2020-12-08 11:21:02.679 | DEBUG    | nanocompore.common:log_init_state:55 -     pos_rand_seed: 2
2020-12-08 11:21:02.680 | DEBUG    | nanocompore.common:log_init_state:55 -     not_bound: False
2020-12-08 11:21:02.681 | DEBUG    | nanocompore.common:log_init_state:55 -     progress: False
2020-12-08 11:21:02.682 | INFO     | nanocompore.SimReads:SimReads:101 - Importing RNA model file
2020-12-08 11:21:02.701 | INFO     | nanocompore.SimReads:SimReads:108 - Reading Fasta file and simulate corresponding data
2020-12-08 11:21:02.706 | DEBUG    | nanocompore.SimReads:SimReads:126 - Processing reference ref_0000
2020-12-08 11:21:03.001 | DEBUG    | nanocompore.SimReads:find_valid_pos_list:293 -     Try to find 15 kmers to modify
2020-12-08 11:21:03.004 | DEBUG    | nanocompore.SimReads:find_valid_pos_list:301 -     Found valid combination for 15 kmers
2020-12-08 11:21:03.005 | DEBUG    | nanocompore.SimReads:find_valid_pos_list:302 -     modified positions: [  0  19  34  47  62  73  82  88  95 116 137 156 163 169 189]
2020-12-08 11:21:03.121 | DEBUG    | nanocompore.SimReads:SimReads:126 - Processing reference ref_0001
2020-12-08 11:21:03.290 | DEBUG    | nanocompore.SimReads:find_valid_pos_list:293 -     Try to find 12 kmers to modify
2020-12-08 11:21:03.296 | DEBUG    | nanocompore.SimReads:find_valid_pos_list:301 -     Found valid combination for 12 kmers
2020-12-08 11:21:03.297 | DEBUG    | nanocompore.SimReads:find_valid_pos_list:302 -     modified positions: [  1  12  44  64  93 110 121 133 144 164 180 187]
2020-12-08 11:21:03.390 | DEBUG    | nanocompore.SimReads:SimReads:126 - Processing reference ref_0002
2020-12-08 11:21:03.596 | DEBUG    | nanocompore.SimReads:find_valid_pos_list:293 -     Try to find 11 kmers to modify
2020-12-08 11:21:03.598 | DEBUG    | nanocompore.SimReads:find_valid_pos_list:301 -     Found valid combination for 11 kmers
2020-12-08 11:21:03.599 | DEBUG    | nanocompore.SimReads:find_valid_pos_list:302 -     modified positions: [  1  22  39  45  55  75 101 117 161 171 178]
2020-12-08 11:21:03.695 | DEBUG    | nanocompore.SimReads:SimReads:126 - Processing reference ref_0003
2020-12-08 11:21:03.875 | DEBUG    | nanocompore.SimReads:find_valid_pos_list:293 -     Try to find 14 kmers to modify
2020-12-08 11:21:03.923 | DEBUG    | nanocompore.SimReads:find_valid_pos_list:301 -     Found valid combination for 14 kmers
2020-12-08 11:21:03.924 | DEBUG    | nanocompore.SimReads:find_valid_pos_list:302 -     modified positions: [ 12  20  26  36  58  84  90  96 109 123 138 174 184 195]
2020-12-08 11:21:04.032 | DEBUG    | nanocompore.SimReads:SimReads:126 - Processing reference ref_0004
2020-12-08 11:21:04.209 | DEBUG    | nanocompore.SimReads:find_valid_pos_list:293 -     Try to find 13 kmers to modify
2020-12-08 11:21:04.211 | DEBUG    | nanocompore.SimReads:find_valid_pos_list:301 -     Found valid combination for 13 kmers
2020-12-08 11:21:04.212 | DEBUG    | nanocompore.SimReads:find_valid_pos_list:302 -     modified positions: [ 15  26  43  58  65  75  81 104 123 163 169 179 185]
2020-12-08 11:21:04.299 | INFO     | nanocompore.SimReads:SimReads:90 - Checking and initialising Simreads
2020-12-08 11:21:04.300 | DEBUG    | nanocompore.common:log_init_state:50 -     package_name: nanocompore
2020-12-08 11:21:04.300 | DEBUG    | nanocompore.common:log_init_state:51 -     package_version: 1.0.1.dev0
2020-12-08 11:21:04.301 | DEBUG    | nanocompore.common:log_init_state:52 -     timestamp: 2020-12-08 11:21:04.301363
2020-12-08 11:21:04.301 | DEBUG    | nanocompore.common:log_init_state:55 -     fasta_fn: ./references/simulated/ref.fa
2020-12-08 11:21:04.303 | DEBUG    | nanocompore.common:log_init_state:55 -     outpath: ./eventalign_files/simulated/
2020-12-08 11:21:04.303 | DEBUG    | nanocompore.common:log_init_state:55 -     outprefix: unmodified_rep_2
2020-12-08 11:21:04.304 | DEBUG    | nanocompore.common:log_init_state:55 -     overwrite: True
2020-12-08 11:21:04.305 | DEBUG    | nanocompore.common:log_init_state:55 -     run_type: RNA
2020-12-08 11:21:04.306 | DEBUG    | nanocompore.common:log_init_state:55 -     ref_list: []
2020-12-08 11:21:04.307 | DEBUG    | nanocompore.common:log_init_state:55 -     nreads_per_ref: 60
2020-12-08 11:21:04.307 | DEBUG    | nanocompore.common:log_init_state:55 -     plot: False
2020-12-08 11:21:04.308 | DEBUG    | nanocompore.common:log_init_state:55 -     intensity_mod: 0
2020-12-08 11:21:04.310 | DEBUG    | nanocompore.common:log_init_state:55 -     dwell_mod: 0
2020-12-08 11:21:04.311 | DEBUG    | nanocompore.common:log_init_state:55 -     mod_reads_freq: 0
2020-12-08 11:21:04.311 | DEBUG    | nanocompore.common:log_init_state:55 -     mod_bases_freq: 0.25
2020-12-08 11:21:04.312 | DEBUG    | nanocompore.common:log_init_state:55 -     mod_bases_type: A
2020-12-08 11:21:04.315 | DEBUG    | nanocompore.common:log_init_state:55 -     mod_extend_context: 2
2020-12-08 11:21:04.316 | DEBUG    | nanocompore.common:log_init_state:55 -     min_mod_dist: 6
2020-12-08 11:21:04.317 | DEBUG    | nanocompore.common:log_init_state:55 -     pos_rand_seed: 42
2020-12-08 11:21:04.318 | DEBUG    | nanocompore.common:log_init_state:55 -     not_bound: False
2020-12-08 11:21:04.319 | DEBUG    | nanocompore.common:log_init_state:55 -     progress: False
2020-12-08 11:21:04.320 | INFO     | nanocompore.SimReads:SimReads:101 - Importing RNA model file
2020-12-08 11:21:04.334 | INFO     | nanocompore.SimReads:SimReads:108 - Reading Fasta file and simulate corresponding data
2020-12-08 11:21:04.336 | DEBUG    | nanocompore.SimReads:SimReads:126 - Processing reference ref_0000
2020-12-08 11:21:04.556 | DEBUG    | nanocompore.SimReads:SimReads:126 - Processing reference ref_0001
2020-12-08 11:21:04.758 | DEBUG    | nanocompore.SimReads:SimReads:126 - Processing reference ref_0002
2020-12-08 11:21:04.973 | DEBUG    | nanocompore.SimReads:SimReads:126 - Processing reference ref_0003
2020-12-08 11:21:05.216 | DEBUG    | nanocompore.SimReads:SimReads:126 - Processing reference ref_0004
2020-12-08 11:21:05.426 | INFO     | nanocompore.SimReads:SimReads:90 - Checking and initialising Simreads
2020-12-08 11:21:05.427 | DEBUG    | nanocompore.common:log_init_state:50 -     package_name: nanocompore
2020-12-08 11:21:05.427 | DEBUG    | nanocompore.common:log_init_state:51 -     package_version: 1.0.1.dev0
2020-12-08 11:21:05.428 | DEBUG    | nanocompore.common:log_init_state:52 -     timestamp: 2020-12-08 11:21:05.428469
2020-12-08 11:21:05.429 | DEBUG    | nanocompore.common:log_init_state:55 -     fasta_fn: ./references/simulated/ref.fa
2020-12-08 11:21:05.430 | DEBUG    | nanocompore.common:log_init_state:55 -     outpath: ./eventalign_files/simulated/
2020-12-08 11:21:05.431 | DEBUG    | nanocompore.common:log_init_state:55 -     outprefix: modified_rep_2
2020-12-08 11:21:05.431 | DEBUG    | nanocompore.common:log_init_state:55 -     overwrite: True
2020-12-08 11:21:05.432 | DEBUG    | nanocompore.common:log_init_state:55 -     run_type: RNA
2020-12-08 11:21:05.433 | DEBUG    | nanocompore.common:log_init_state:55 -     ref_list: []
2020-12-08 11:21:05.434 | DEBUG    | nanocompore.common:log_init_state:55 -     nreads_per_ref: 60
2020-12-08 11:21:05.434 | DEBUG    | nanocompore.common:log_init_state:55 -     plot: False
2020-12-08 11:21:05.435 | DEBUG    | nanocompore.common:log_init_state:55 -     intensity_mod: 3
2020-12-08 11:21:05.436 | DEBUG    | nanocompore.common:log_init_state:55 -     dwell_mod: 3
2020-12-08 11:21:05.437 | DEBUG    | nanocompore.common:log_init_state:55 -     mod_reads_freq: 0.9
2020-12-08 11:21:05.437 | DEBUG    | nanocompore.common:log_init_state:55 -     mod_bases_freq: 0.25
2020-12-08 11:21:05.438 | DEBUG    | nanocompore.common:log_init_state:55 -     mod_bases_type: A
2020-12-08 11:21:05.439 | DEBUG    | nanocompore.common:log_init_state:55 -     mod_extend_context: 2
2020-12-08 11:21:05.440 | DEBUG    | nanocompore.common:log_init_state:55 -     min_mod_dist: 6
2020-12-08 11:21:05.441 | DEBUG    | nanocompore.common:log_init_state:55 -     pos_rand_seed: 2
2020-12-08 11:21:05.442 | DEBUG    | nanocompore.common:log_init_state:55 -     not_bound: False
2020-12-08 11:21:05.442 | DEBUG    | nanocompore.common:log_init_state:55 -     progress: False
2020-12-08 11:21:05.443 | INFO     | nanocompore.SimReads:SimReads:101 - Importing RNA model file
2020-12-08 11:21:05.460 | INFO     | nanocompore.SimReads:SimReads:108 - Reading Fasta file and simulate corresponding data
2020-12-08 11:21:05.462 | DEBUG    | nanocompore.SimReads:SimReads:126 - Processing reference ref_0000
2020-12-08 11:21:05.645 | DEBUG    | nanocompore.SimReads:find_valid_pos_list:293 -     Try to find 15 kmers to modify
2020-12-08 11:21:05.648 | DEBUG    | nanocompore.SimReads:find_valid_pos_list:301 -     Found valid combination for 15 kmers
2020-12-08 11:21:05.648 | DEBUG    | nanocompore.SimReads:find_valid_pos_list:302 -     modified positions: [  0  19  34  47  62  73  82  88  95 116 137 156 163 169 189]
2020-12-08 11:21:05.746 | DEBUG    | nanocompore.SimReads:SimReads:126 - Processing reference ref_0001
2020-12-08 11:21:05.912 | DEBUG    | nanocompore.SimReads:find_valid_pos_list:293 -     Try to find 12 kmers to modify
2020-12-08 11:21:05.917 | DEBUG    | nanocompore.SimReads:find_valid_pos_list:301 -     Found valid combination for 12 kmers
2020-12-08 11:21:05.918 | DEBUG    | nanocompore.SimReads:find_valid_pos_list:302 -     modified positions: [  1  12  44  64  93 110 121 133 144 164 180 187]
2020-12-08 11:21:06.008 | DEBUG    | nanocompore.SimReads:SimReads:126 - Processing reference ref_0002
2020-12-08 11:21:06.184 | DEBUG    | nanocompore.SimReads:find_valid_pos_list:293 -     Try to find 11 kmers to modify
2020-12-08 11:21:06.186 | DEBUG    | nanocompore.SimReads:find_valid_pos_list:301 -     Found valid combination for 11 kmers
2020-12-08 11:21:06.187 | DEBUG    | nanocompore.SimReads:find_valid_pos_list:302 -     modified positions: [  1  22  39  45  55  75 101 117 161 171 178]
2020-12-08 11:21:06.273 | DEBUG    | nanocompore.SimReads:SimReads:126 - Processing reference ref_0003
2020-12-08 11:21:06.446 | DEBUG    | nanocompore.SimReads:find_valid_pos_list:293 -     Try to find 14 kmers to modify
2020-12-08 11:21:06.484 | DEBUG    | nanocompore.SimReads:find_valid_pos_list:301 -     Found valid combination for 14 kmers
2020-12-08 11:21:06.485 | DEBUG    | nanocompore.SimReads:find_valid_pos_list:302 -     modified positions: [ 12  20  26  36  58  84  90  96 109 123 138 174 184 195]
2020-12-08 11:21:06.589 | DEBUG    | nanocompore.SimReads:SimReads:126 - Processing reference ref_0004
2020-12-08 11:21:06.838 | DEBUG    | nanocompore.SimReads:find_valid_pos_list:293 -     Try to find 13 kmers to modify
2020-12-08 11:21:06.840 | DEBUG    | nanocompore.SimReads:find_valid_pos_list:301 -     Found valid combination for 13 kmers
2020-12-08 11:21:06.841 | DEBUG    | nanocompore.SimReads:find_valid_pos_list:302 -     modified positions: [ 15  26  43  58  65  75  81 104 123 163 169 179 185]
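
In the log above, both modified replicates were generated with pos_rand_seed=2, and the reported positions for each reference are the same in replicate 1 and replicate 2. The sketch below shows one way to check this by parsing the "modified positions" lines. The helper parse_positions is hypothetical, and the two strings are copied from the ref_0000 lines of the log.

```python
import re

# Debug lines copied from the log above (ref_0000, replicates 1 and 2).
log_rep_1 = "modified positions: [  0  19  34  47  62  73  82  88  95 116 137 156 163 169 189]"
log_rep_2 = "modified positions: [  0  19  34  47  62  73  82  88  95 116 137 156 163 169 189]"


def parse_positions(line):
    """Extract the integer positions from a 'modified positions' log line (hypothetical helper)."""
    inside = re.search(r"modified positions: \[(.*?)\]", line)
    if inside is None:
        return []
    return [int(token) for token in inside.group(1).split()]


same = parse_positions(log_rep_1) == parse_positions(log_rep_2)
print("identical positions across replicates:", same)
```

Keeping the position seed fixed across replicates means the same sites are modified in each replicate, while the number of reads differs (55 and 60 here).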

Full CLI and API documentation

API documentation

API help can be obtained with the conventional Python methods (help or ?) or rendered nicely in Jupyter with the jhelp function from nanocompore.

from nanocompore.SimReads import SimReads
from nanocompore.common import jhelp
jhelp(SimReads)

SimReads (fasta_fn, outpath, outprefix, overwrite, run_type, ref_list, nreads_per_ref, plot, intensity_mod, dwell_mod, mod_reads_freq, mod_bases_freq, mod_bases_type, mod_extend_context, min_mod_dist, pos_rand_seed, data_rand_seed, not_bound, progress)

Simulate reads in a NanopolishComp-like file from a FASTA file and a built-in model. The simulated reads correspond to the sequences provided in the FASTA file and follow the intensity and dwell time distributions of the corresponding model (RNA or DNA).


  • fasta_fn (required) [str]

Fasta file containing references to use to generate artificial reads.

  • outpath (default: ./) [str]

Path to the output folder.

  • outprefix (default: out) [str]

Text prefix for all the files generated by the function.

  • overwrite (default: False) [bool]

If the output directory already exists, the standard behaviour is to raise an error to prevent overwriting existing data. This option ignores the error and overwrites the data if they have the same outpath and outprefix.

  • run_type (default: RNA) [str]

Define the run type model to import. {RNA,DNA}

  • ref_list (default: []) [list]

Restrict the references to the listed IDs.

  • nreads_per_ref (default: 100) [int]

Number of reads to generate per reference.

  • plot (default: False) [bool]

If true, generate an interactive plot of the trace generated.

  • intensity_mod (default: 0) [float]

Fraction of intensity distribution SD by which to modify the intensity distribution loc value.

  • dwell_mod (default: 0) [float]

Fraction of dwell time distribution SD by which to modify the dwell time distribution loc value.

  • mod_reads_freq (default: 0) [float]

Frequency of reads to modify.

  • mod_bases_freq (default: 0.25) [float]

Frequency of bases to modify in each read (if possible).

  • mod_bases_type (default: A) [str]

Base for which to modify the signal. {A,T,C,G}

  • mod_extend_context (default: 2) [int]

Number of adjacent bases affected by the signal modification, following a harmonic series.

  • min_mod_dist (default: 6) [int]

Minimal distance between 2 bases to modify.

  • pos_rand_seed (default: 42) [int]

Define a seed for random position picking to get a deterministic behaviour.

  • data_rand_seed (default: None) [int]

Define a seed for generating the data. If None (default) the seed is drawn from /dev/urandom.

  • not_bound (default: False) [bool]

Do not bind the values generated by the distributions to the min and max values observed in the model file.

  • progress (default: False) [bool]

Display a progress bar during execution.
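
The intensity_mod, dwell_mod and mod_extend_context descriptions above can be illustrated with a small sketch. It is one plausible reading of those descriptions, not the nanocompore implementation: the weighting 1 / (1 + distance) for the "harmonic series" is an assumption, the function names harmonic_weights and shifted_loc are hypothetical, and the loc and SD values are made up for illustration.

```python
def harmonic_weights(mod_extend_context):
    """Map each offset in [-context, +context] to a weight 1 / (1 + |offset|).

    Assumption: "harmonic series" means weights 1, 1/2, 1/3, ... when moving
    away from the modified base.
    """
    return {
        offset: 1 / (1 + abs(offset))
        for offset in range(-mod_extend_context, mod_extend_context + 1)
    }


def shifted_loc(loc, sd, mod, weight=1.0):
    """Shift a distribution location by `mod` standard deviations, scaled by `weight`."""
    return loc + mod * sd * weight


weights = harmonic_weights(2)  # mod_extend_context default is 2
for offset, weight in weights.items():
    # Illustrative values only: loc=100.0 and sd=2.0 are not taken from any model file.
    print(offset, round(shifted_loc(100.0, 2.0, mod=3, weight=weight), 2))
```

Under these assumptions the modified base itself receives the full shift of intensity_mod × SD, and its neighbours receive progressively smaller shifts.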

CLI documentation

nanocompore simreads --help
usage: nanocompore simreads [-h] --fasta FASTA [--intensity_mod INTENSITY_MOD]
                            [--dwell_mod DWELL_MOD]
                            [--mod_reads_freq MOD_READS_FREQ]
                            [--mod_bases_freq MOD_BASES_FREQ]
                            [--mod_bases_type {A,T,C,G}]
                            [--mod_extend_context MOD_EXTEND_CONTEXT]
                            [--min_mod_dist MIN_MOD_DIST]
                            [--run_type {RNA,DNA}]
                            [--nreads_per_ref NREADS_PER_REF]
                            [--pos_rand_seed POS_RAND_SEED] [--not_bound]
                            [--outpath OUTPATH] [--outprefix OUTPREFIX]
                            [--overwrite] [--log_level {warning,info,debug}]
                            [--progress]

Simulate reads as a NanopolishComp like file from a fasta file and an inbuild model

* Minimal example without model alteration
    nanocompore simreads -f ref.fa -o results -n 50

* Minimal example with alteration of model intensity loc parameter for 50% of the reads
nanocompore simreads -f ref.fa -o results -n 50 --intensity_mod 2 --mod_reads_freq 0.5 --mod_bases_freq 0.2

optional arguments:
  -h, --help            show this help message and exit

Input options:
  --fasta FASTA, -f FASTA
                        Fasta file containing references to use to generate
                        artificial reads

Signal modification options:
  --intensity_mod INTENSITY_MOD
                        Fraction of intensity distribution SD by which to
                        modify the intensity distribution loc value (default:
                        0)
  --dwell_mod DWELL_MOD
                        Fraction of dwell time distribution SD by which to
                        modify the intensity distribution loc value (default:
                        0)
  --mod_reads_freq MOD_READS_FREQ
                        Frequency of reads to modify (default: 0)
  --mod_bases_freq MOD_BASES_FREQ
                        Frequency of bases to modify in each read (if
                        possible) (default: 0.25)
  --mod_bases_type {A,T,C,G}
                        Base for which to modify the signal (default: A)
  --mod_extend_context MOD_EXTEND_CONTEXT
                        number of adjacent base affected by the signal
                        modification following an harmonic series (default: 2)
  --min_mod_dist MIN_MOD_DIST
                        Minimal distance between 2 bases to modify (default:
                        6)

Other options:
  --run_type {RNA,DNA}  Define the run type model to import (default: RNA)
  --nreads_per_ref NREADS_PER_REF, -n NREADS_PER_REF
                        Number of reads to generate per references (default:
                        100)
  --pos_rand_seed POS_RAND_SEED
                        Define a seed for randon position picking to get a
                        deterministic behaviour (default: 42)
  --not_bound           Do not bind the values generated by the distributions
                        to the observed min and max observed values from the
                        model file (default: False)

Output options:
  --outpath OUTPATH, -o OUTPATH
                        Path to the output folder (default: ./)
  --outprefix OUTPREFIX, -p OUTPREFIX
                        text outprefix for all the files generated (default:
                        out)
  --overwrite, -w       Use --outpath even if it exists already (default:
                        False)

Verbosity options:
  --log_level {warning,info,debug}
                        Set the log level (default: info)
  --progress            Display a progress bar during execution (default:
                        False)
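
When the CLI has to be driven from a Python script, the command line can be assembled as an argument list. The sketch below only uses flags listed in the help text above; the helper build_simreads_cmd is hypothetical, and the file names are the ones from the minimal examples.

```python
import shlex


def build_simreads_cmd(fasta, outpath, nreads, intensity_mod=None,
                       mod_reads_freq=None, mod_bases_freq=None):
    """Assemble a `nanocompore simreads` argument list (hypothetical helper)."""
    cmd = ["nanocompore", "simreads", "-f", fasta, "-o", outpath, "-n", str(nreads)]
    optional = {
        "--intensity_mod": intensity_mod,
        "--mod_reads_freq": mod_reads_freq,
        "--mod_bases_freq": mod_bases_freq,
    }
    # Only add the flags that were explicitly set, so CLI defaults apply otherwise.
    for flag, value in optional.items():
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


cmd = build_simreads_cmd("ref.fa", "results", 50, intensity_mod=2,
                         mod_reads_freq=0.5, mod_bases_freq=0.2)
print(shlex.join(cmd))

# To actually run the command (requires nanocompore to be installed and on PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing a list rather than a single string avoids shell quoting problems with paths that contain spaces.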