Generate plots and tables with SampCompDB¶
SampComp creates a Python object database (shelve DBM) containing the statistical analysis results. The API directly returns a SampCompDB object wrapping the shelve DB. It is also possible to reload the SampCompDB later using the db file path prefix. SampCompDB also needs a FASTA file to get the sequence corresponding to each reference id, and accepts an optional BED file containing genomic annotations. SampCompDB provides a large selection of simple high-level functions to plot and export the results.
At the moment SampCompDB is only accessible through the Python API.
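To make the "shelve DB reloaded from a path prefix" idea concrete, here is a minimal sketch using only the standard library `shelve` module. The key layout (one entry per reference id, holding per-position values) and the file name are hypothetical and are not nanocompore's actual schema; the sketch only illustrates how a shelf is written once and reopened later from the same prefix.

```python
import os
import shelve
import tempfile

# Hypothetical layout, for illustration only: NOT nanocompore's real schema.
with tempfile.TemporaryDirectory() as tmp_dir:
    db_prefix = os.path.join(tmp_dir, "example_SampComp.db")

    # Write: a shelf behaves like a persistent dict with string keys
    with shelve.open(db_prefix, flag="n") as db:
        db["ref_0000"] = {82: {"pvalue": 0.001}, 83: {"pvalue": 0.5}}

    # Reload later from the same path prefix, in read-only mode
    with shelve.open(db_prefix, flag="r") as db:
        ref_ids = sorted(db.keys())
        pvalue_at_82 = db["ref_0000"][82]["pvalue"]

print(ref_ids, pvalue_at_82)
```

Depending on the DBM backend available on the system, the files on disk may carry an extra extension, which is presumably why the database is addressed by a path prefix rather than by an exact file name.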
Import the package¶
from nanocompore.SampCompDB import SampCompDB, jhelp
Load the database with SampCompDB¶
jhelp (SampCompDB)
Basic initialisation¶
# Load database
db = SampCompDB(
    db_fn="results/simulated_SampComp.db",
    fasta_fn="references/simulated/ref.fa")

# Print general metadata information
print(db)

# Print list of references containing valid data
print(db.ref_id_list)
Generate text reports¶
SampCompDB can generate 3 types of text reports:
- Tabulated statistics => save_report
- Tabulated intensity and dwell values per condition => save_shift_stats
- BED file of significant genomic positions => save_to_bed
In addition, all 3 methods are conveniently wrapped in save_all.
save_report¶
jhelp(SampCompDB.save_report)
# Reload DB
db = SampCompDB(
    db_fn="results/simulated_SampComp.db",
    fasta_fn="references/simulated/ref.fa",
    log_level="warning")

# Save report
db.save_report(output_fn="./results/simulated_report.tsv")

# Visualise first lines
!head "./results/simulated_report.tsv"
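The `!head` line relies on the IPython shell escape, so it only runs inside Jupyter or IPython on a system that provides `head`. Outside a notebook, a minimal pure Python equivalent could look like the sketch below. The helper name `head` and the demo file are hypothetical and not part of nanocompore; the demo file only stands in for a report written by one of the save methods.

```python
import os
import tempfile
from itertools import islice


def head(path, n=10):
    """Return the first n lines of a text file, without trailing newlines."""
    with open(path, "r") as handle:
        return [line.rstrip("\n") for line in islice(handle, n)]


# Demo on a throwaway tab-separated file (stand-in for a real report)
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_fn = os.path.join(tmp_dir, "demo.tsv")
    with open(demo_fn, "w") as handle:
        for i in range(20):
            handle.write(f"row_{i}\t{i}\n")
    first_lines = head(demo_fn, n=3)

for line in first_lines:
    print(line)
```

The same helper can be pointed at any of the text outputs described in this section.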
save_shift_stats¶
jhelp(SampCompDB.save_shift_stats)
# Reload DB
db = SampCompDB(
    db_fn="results/simulated_SampComp.db",
    fasta_fn="references/simulated/ref.fa",
    log_level="warning")

# Save report
db.save_shift_stats(output_fn="./results/simulated_shift.tsv")

# Visualise first lines
!head "./results/simulated_shift.tsv"
save_to_bed¶
jhelp(SampCompDB.save_to_bed)
# Reload DB
db = SampCompDB(
    db_fn="results/simulated_SampComp.db",
    fasta_fn="references/simulated/ref.fa",
    bed_fn="references/simulated/annot.bed",
    log_level="warning")

# Save report
db.save_to_bed(output_fn="./results/simulated_sig_positions.bed")

# Visualise first lines
!head "./results/simulated_sig_positions.bed"
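For downstream processing of a BED file, the sketch below parses one line according to the generic BED convention (tab-separated chrom, start, end, then optional name, score and strand, with 0-based half-open coordinates). Which optional columns `save_to_bed` actually writes is not stated here, so the example line, the `BedInterval` type and the `parse_bed_line` helper are all hypothetical.

```python
from typing import NamedTuple


class BedInterval(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    name: str
    score: str
    strand: str


def parse_bed_line(line):
    """Parse one BED line, padding missing optional columns with '.'."""
    fields = line.rstrip("\n").split("\t")
    fields += ["."] * (6 - len(fields))
    return BedInterval(
        fields[0], int(fields[1]), int(fields[2]),
        fields[3], fields[4], fields[5])


# Hypothetical line, not taken from an actual nanocompore output
example = "chrI\t100\t105\texample_site\t3\t+"
interval = parse_bed_line(example)
length = interval.end - interval.start
print(interval, length)
```

Because the end coordinate is exclusive, the interval length is simply `end - start`.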
Generate plots¶
SampCompDB comes with a range of methods to visualise the data and explore candidates.
- plot_pvalue: Plot the -log10 of the p-values obtained for all the statistical methods at reference level
- plot_signal: Generate comparative plots of both median intensity and dwell time for each condition at read level
- plot_coverage: Plot the read coverage over a reference for all samples analysed
- plot_kmers_stats: Fancy version of plot_coverage that also reports missing, mismatching and undefined kmer status from Nanopolish
- plot_position: Visualise the distribution of intensity and dwell time in 2D for a single position
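As a reminder of what the -log10 scale used by plot_pvalue means, the sketch below applies the transform to a few made-up p-values. The `neg_log10` helper and its `floor` parameter are hypothetical, added only so that a p-value of exactly 0 does not raise an error; this is not necessarily how nanocompore handles that case.

```python
import math


def neg_log10(pvalue, floor=1e-300):
    """Return -log10(p), clamping p to a small floor to avoid log10(0)."""
    return -math.log10(max(pvalue, floor))


# Made-up p-values, for illustration only
pvalues = [0.05, 0.01, 1e-6]
scores = [neg_log10(p) for p in pvalues]

for p, s in zip(pvalues, scores):
    print(f"p={p:g} -> -log10(p)={s:.2f}")
```

On this scale smaller p-values give taller peaks: a p-value of 0.01 maps to 2, and each further tenfold decrease adds 1.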
Extra imports for the plotting library¶
Matplotlib is required to use the plotting methods in Jupyter
import matplotlib.pyplot as pl
%matplotlib inline
plot_pvalue¶
jhelp(SampCompDB.plot_pvalue)
Examples from simulated dataset¶
# Reload DB
db = SampCompDB(
    db_fn="results/simulated_SampComp.db",
    fasta_fn="references/simulated/ref.fa",
    log_level="warning")

# Plot
fig, ax = db.plot_pvalue("ref_0000")

# Reload DB
db = SampCompDB(
    db_fn="results/simulated_stats_SampComp.db",
    fasta_fn="references/simulated/ref.fa",
    log_level="warning")

# Plot
fig, ax = db.plot_pvalue("ref_0001", palette="Set1")
Example from real yeast dataset with extended sequence context¶
# Reload DB
db = SampCompDB(
    db_fn="results/yeast_SampComp.db",
    fasta_fn="references/yeast/Yeast_transcriptome.fa",
    log_level="warning")

# Plot
fig, ax = db.plot_pvalue("YHR174W")
plot_signal¶
jhelp(SampCompDB.plot_signal)
Examples from simulated dataset¶
# Reload DB
db = SampCompDB(
    db_fn="results/simulated_SampComp.db",
    fasta_fn="references/simulated/ref.fa",
    log_level="warning")

# Plot
fig, ax = db.plot_signal("ref_0000", start=75, end=100)

# Reload DB
db = SampCompDB(
    db_fn="results/simulated_SampComp.db",
    fasta_fn="references/simulated/ref.fa",
    log_level="warning")

# Plot
fig, ax = db.plot_signal("ref_0001", start=100, end=125, kind="swarmplot")
Example from real yeast dataset¶
# Reload DB
db = SampCompDB(
    db_fn="results/yeast_SampComp.db",
    fasta_fn="references/yeast/Yeast_transcriptome.fa",
    log_level="warning")

# Plot
fig, ax = db.plot_signal("YHR174W", start=665, end=700, kind="boxenplot")
plot_coverage¶
jhelp(SampCompDB.plot_coverage)
Example from real yeast dataset¶
# Reload DB
db = SampCompDB(
    db_fn="results/yeast_SampComp.db",
    fasta_fn="references/yeast/Yeast_transcriptome.fa",
    log_level="warning")

# Plot
fig, ax = db.plot_coverage("YHR174W")
plot_kmers_stats¶
jhelp(SampCompDB.plot_kmers_stats)
Example from real yeast dataset¶
# Reload DB
db = SampCompDB(
    db_fn="results/yeast_SampComp.db",
    fasta_fn="references/yeast/Yeast_transcriptome.fa",
    log_level="warning")

# Plot
fig, ax = db.plot_kmers_stats("YHR174W")
plot_position¶
jhelp(SampCompDB.plot_position)
Example from simulated dataset¶
# Reload DB
db = SampCompDB(
    db_fn="results/simulated_SampComp.db",
    fasta_fn="references/simulated/ref.fa",
    log_level="warning")

# Plot
fig, ax = db.plot_position("ref_0000", pos=82)